On Tue, Jul 29, 2025 at 3:26 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
On Mon, Jul 28, 2025 at 10:27:37AM -0600, Alex Williamson wrote:
On Fri, 25 Jul 2025 09:47:48 -0700 David Matlack <dmatlack@google.com> wrote:
I also was curious about your thoughts on maintenance of VFIO selftests, since I don't think we discussed that in the RFC. I am happy to help maintain VFIO selftests in whatever way makes the most sense. For now I added tools/testing/selftests/vfio under the top-level VFIO section in MAINTAINERS (so you would be the maintainer) and then also added a separate section for VFIO selftests with myself as a Reviewer (see PATCH 01). Reviewer felt like a better choice than Maintainer for myself since I am new to VFIO upstream (I've primarily worked on KVM in the past).
Hi David,
There's a lot of potential here and I'd like to see it proceed.
+1 too, I really lack time at the moment to do much with this but I'm half inclined to suggest Alex should say it should be merged in 6 weeks (to motivate any reviewing) and we can continue to work on it in-tree.
As they are selftests, I think there is a lot more value in having the tests than in having perfect tests.
They have been quite useful already within Google. Internally we have something almost identical to the RFC and have been using that for testing our 6.6-based kernel continuously since March. Already they have caught one (self-inflicted) regression where 1GiB HugeTLB pages started getting mapped with 2MiB mappings in the IOMMU, and have been very helpful with new development (e.g. Aaron's work, and Live Update support).
So I agree, it's probably net positive to merge early and then iterate in-tree. Especially since these are only tests and not e.g. load-bearing kernel code (although I still want to hold a high bar for the selftests code).
The only patches to hold off merging would be 31-33, since those should probably go through the KVM tree? And of course we need Acks for the drivers/dma/{ioat,idxd} changes, but the changes there are pretty minor.
Something that we should continue to try to improve is the automation. These tests are often targeting a specific feature, so matching a device to a unit test becomes a barrier to automated runs. I wonder if we might be able to reach a point where the test runner can select appropriate devices from a pool of devices specified via environment variables.
Yes, I'd like to improve on this as well. Within Google we've recently split up run.sh into separate setup.sh and cleanup.sh scripts, and now store metadata about what devices are set up for VFIO selftests in files. Storing metadata in files has been especially useful for testing Live Updates, since we need the automation to remember what devices are in use across a kexec.
For now we still pass in the BDFs to tests via the command line, but we could have the tests themselves check which devices are available for use and then pick one or more as needed. The metadata files can live at some reasonable default path, with an option to override it with a custom path via an environment variable.
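A minimal sketch of that lookup, written in Python for brevity (the selftests themselves are C and shell). The environment variable name, the default directory, and the one-file-per-BDF layout are all assumptions made up for illustration; none of them come from the series.

```python
import os
import re
from pathlib import Path

# Hypothetical names: neither the variable nor the path exists in the series.
ENV_VAR = "VFIO_SELFTESTS_DEVICE_DIR"
DEFAULT_DIR = "/run/vfio-selftests/devices"

# PCI address in domain:bus:device.function form, e.g. 0000:6a:01.0
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def metadata_dir(environ=os.environ):
    """Return the metadata directory, honoring the environment override."""
    return Path(environ.get(ENV_VAR) or DEFAULT_DIR)


def available_devices(directory):
    """List the BDFs recorded in the directory (assumed: one file per device)."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if BDF_RE.match(p.name))


def pick_devices(count, environ=os.environ):
    """Pick `count` devices from the pool, or fail if the pool is too small."""
    devices = available_devices(metadata_dir(environ))
    if len(devices) < count:
        raise LookupError(
            f"need {count} device(s), found {len(devices)} in {metadata_dir(environ)}"
        )
    return devices[:count]
```

A test that needs two devices would call pick_devices(2) and skip itself on LookupError, so the same test could run unmodified on machines with different device pools.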
There are lots of directions we could go in, and it really depends on how folks want to use/run VFIO selftests, so I would like to learn more about that as well.
Makes a lot of sense to me!
I'd just put Dave as the VFIO selftest co-maintainer though - a penance for doing so much work :)
Hah! I will accept my penance if necessary. :)
Jason