On Mon, Apr 21, 2025 at 08:37:40AM +0000, Tian, Kevin wrote:
From: Nicolin Chen <nicolinc@nvidia.com> Sent: Friday, April 11, 2025 2:38 PM
Add the support via vIOMMU infrastructure for virtualization use case.
This basically allows the VMM to allocate VINTFs (as vIOMMU objects) and assign VCMDQs (vCMDQ objects) to them. A VINTF's MMIO page0 can be mmap'd to user space, so the VM can access it directly without a VMEXIT and the corresponding hypercall.
it'd be helpful to add a bit more context, e.g. what page0 contains, sid slots, vcmdq_base (mapped in s2), etc., so it's easier for one to understand it from the start instead of by reading the code.
Will do. It basically contains all the control/status bits for direct vCMDQ control.
As an initial version, the number of VCMDQs per VINTF is fixed to two.
so a user could map both VCMDQs of a VINTF even when only one VCMDQ is created, given that the entire 64K page0 is eligible for mmap once the VINTF is associated with a viommu?
Oh, that's a good point!
If a guest OS ignores the total number of VCMDQs emulated by the VMM and tries to enable a VCMDQ via the "reserved" MMIO region in the mmap'd VINTF page0, the host system would be spammed with vCMDQ TIMEOUTs that shouldn't happen and shouldn't be forwarded back to the guest.
It looks like we need dynamic VCMDQ mapping to a VINTF vs. static allocation, though we can still set the max number to 2.
no security issue given the VINTF is not shared, but conceptually, if feasible (e.g. if the two VCMDQs' MMIO ranges sit in different 4K pages of the VINTF page0), does it make sense to do per-VCMDQ mmap control and return the mmap info at VCMDQ allocation?
Page size can be 64K on ARM, and each additional logical VCMDQ (in a VINTF page0) sits at an offset of only 0x80. So a VCMDQ cannot be mmap'd individually.
	if (vintf->lvcmdqs[arg.vcmdq_id]) {
		vcmdq = vintf->lvcmdqs[arg.vcmdq_id];

		/* deinit the previous setting as a reset, before re-init */
		tegra241_vcmdq_hw_deinit(vcmdq);
		vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
		vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
		tegra241_vcmdq_hw_init_user(vcmdq);
		return &vcmdq->core;
	}
why not return -EBUSY here?
Hmm, this seems to be a workaround (WAR) that I forgot to drop! Will check and remove this.
	vcmdq = iommufd_vcmdq_alloc(viommu, struct tegra241_vcmdq, core);
	if (!vcmdq)
		return ERR_PTR(-ENOMEM);

	ret = tegra241_vintf_init_lvcmdq(vintf, arg.vcmdq_id, vcmdq);
	if (ret)
		goto free_vcmdq;
	dev_dbg(cmdqv->dev, "%sallocated\n",
		lvcmdq_error_header(vcmdq, header, 64));

	vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
	vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
could the queue span multiple pages? there is no guarantee that the HPA of the guest queue would be contiguous :/
It certainly can. The VMM must make sure the guest PAs are contiguous by using huge pages to back the guest RAM. The kernel has no control over this and has to trust the VMM.
I'm adding a note here: /* User space ensures that the queue memory is physically contiguous */
And likely something similar in the uAPI header too.
Thanks! Nicolin