This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
I have this on github:
https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v2:
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Cc: Nicolin Chen <nicolinc(a)nvidia.com>
Cc: Yi Liu <yi.l.liu(a)intel.com>
Cc: kvm(a)vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Jason Gunthorpe (15):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 498 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 96 +++-
drivers/iommu/iommufd/io_pagetable.c | 25 +-
drivers/iommu/iommufd/iommufd_private.h | 51 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 64 ++-
.../selftests/iommu/iommufd_fail_nth.c | 52 +-
tools/testing/selftests/iommu/iommufd_utils.h | 61 ++-
14 files changed, 789 insertions(+), 199 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: 4ed4791afb34c61650b17407846174a72e4034f4
--
2.39.2
Add support for sockmap to vsock.
We're testing usage of vsock as a way to redirect guest-local UDS
requests to the host and this patch series greatly improves the
performance of such a setup.
Compared to copying packets via userspace, this improves throughput by
121% in basic testing.
Tested as follows.
Setup: guest unix dgram sender -> guest vsock redirector -> host vsock
server
Threads: 1
Payload: 64k
No sockmap:
- 76.3 MB/s
- The guest vsock redirector was
"socat VSOCK-CONNECT:2:1234 UNIX-RECV:/path/to/sock"
Using sockmap (this patch):
- 168.8 MB/s (+121%)
- The guest redirector was a simple sockmap echo server,
redirecting unix ingress to vsock 2:1234 egress.
- Same sender and server programs
*Note: these numbers are from RFC v1
Only the virtio transport has been tested. The loopback transport was
used in writing bpf/selftests, but not thoroughly tested otherwise.
This series requires the skb patch.
Changes in v3:
- vsock/bpf: Refactor wait logic in vsock_bpf_recvmsg() to avoid
backwards goto
- vsock/bpf: Check psock before acquiring slock
- vsock/bpf: Return bool instead of int of 0 or 1
- vsock/bpf: Wrap macro args __sk/__psock in parens
- vsock/bpf: Place comment trailer */ on separate line
Changes in v2:
- vsock/bpf: rename vsock_dgram_* -> vsock_*
- vsock/bpf: change sk_psock_{get,put} and {lock,release}_sock() order
to minimize slock hold time
- vsock/bpf: use "new style" wait
- vsock/bpf: fix bug in wait log
- vsock/bpf: add check that recvmsg sk_type is one dgram, seqpacket, or
stream. Return error if not one of the three.
- virtio/vsock: comment __skb_recv_datagram() usage
- virtio/vsock: do not init copied in read_skb()
- vsock/bpf: add ifdef guard around struct proto in dgram_recvmsg()
- selftests/bpf: add vsock loopback config for aarch64
- selftests/bpf: add vsock loopback config for s390x
- selftests/bpf: remove vsock device from vmtest.sh qemu machine
- selftests/bpf: remove CONFIG_VIRTIO_VSOCKETS=y from config.x86_64
- vsock/bpf: move transport-related (e.g., if (!vsk->transport)) checks
out of fast path
Signed-off-by: Bobby Eshleman <bobby.eshleman(a)bytedance.com>
---
Bobby Eshleman (3):
vsock: support sockmap
selftests/bpf: add vsock to vmtest.sh
selftests/bpf: Add a test case for vsock sockmap
drivers/vhost/vsock.c | 1 +
include/linux/virtio_vsock.h | 1 +
include/net/af_vsock.h | 17 ++
net/vmw_vsock/Makefile | 1 +
net/vmw_vsock/af_vsock.c | 55 ++++++-
net/vmw_vsock/virtio_transport.c | 2 +
net/vmw_vsock/virtio_transport_common.c | 24 +++
net/vmw_vsock/vsock_bpf.c | 175 +++++++++++++++++++++
net/vmw_vsock/vsock_loopback.c | 2 +
tools/testing/selftests/bpf/config.aarch64 | 2 +
tools/testing/selftests/bpf/config.s390x | 3 +
tools/testing/selftests/bpf/config.x86_64 | 3 +
.../selftests/bpf/prog_tests/sockmap_listen.c | 163 +++++++++++++++++++
13 files changed, 443 insertions(+), 6 deletions(-)
---
base-commit: d83115ce337a632f996e44c9f9e18cadfcf5a094
change-id: 20230118-support-vsock-sockmap-connectible-2e1297d2111a
Best regards,
--
Bobby Eshleman <bobby.eshleman(a)bytedance.com>
---
Bobby Eshleman (3):
vsock: support sockmap
selftests/bpf: add vsock to vmtest.sh
selftests/bpf: add a test case for vsock sockmap
drivers/vhost/vsock.c | 1 +
include/linux/virtio_vsock.h | 1 +
include/net/af_vsock.h | 17 ++
net/vmw_vsock/Makefile | 1 +
net/vmw_vsock/af_vsock.c | 55 ++++++-
net/vmw_vsock/virtio_transport.c | 2 +
net/vmw_vsock/virtio_transport_common.c | 25 +++
net/vmw_vsock/vsock_bpf.c | 174 +++++++++++++++++++++
net/vmw_vsock/vsock_loopback.c | 2 +
tools/testing/selftests/bpf/config.aarch64 | 2 +
tools/testing/selftests/bpf/config.s390x | 3 +
tools/testing/selftests/bpf/config.x86_64 | 3 +
.../selftests/bpf/prog_tests/sockmap_listen.c | 163 +++++++++++++++++++
13 files changed, 443 insertions(+), 6 deletions(-)
---
base-commit: c2ea552065e43d05bce240f53c3185fd3a066204
change-id: 20230227-vsock-sockmap-upstream-9d65c84174a2
Best regards,
--
Bobby Eshleman <bobby.eshleman(a)bytedance.com>
On Fri, 10 Mar 2023 at 07:25, Stephen Boyd <sboyd(a)kernel.org> wrote:
>
> Quoting David Gow (2023-03-02 23:15:31)
> > On Thu, 2 Mar 2023 at 09:38, Stephen Boyd <sboyd(a)kernel.org> wrote:
> > >
> > > Introduce KUnit resource wrappers around platform_driver_register(),
> > > platform_device_alloc(), and platform_device_add() so that test authors
> > > can register platform drivers/devices from their tests and have the
> > > drivers/devices automatically be unregistered when the test is done.
> > >
> > > This makes test setup code simpler when a platform driver or platform
> > > device is needed. Add a few test cases at the same time to make sure the
> > > APIs work as intended.
> > >
> > > Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
> > > Cc: David Gow <davidgow(a)google.com>
> > > Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> > > Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
> > > Signed-off-by: Stephen Boyd <sboyd(a)kernel.org>
> > > ---
> > >
> > > Should this be moved to drivers/base/ and called platform_kunit.c?
> > > The include/kunit/platform_driver.h could also be
> > > kunit/platform_device.h to match linux/platform_device.h if that is more
> > > familiar.
> >
> > DRM has a similar thing already (albeit with a root_device, which is
> > more common with KUnit tests generally):
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/inc…
> >
> > But that's reasonably drm-specific, so it makes sense that it lives
> > with DRM stuff. platform_device is a bit more generic.
> >
> > I'd probably personally err on the side of having these in
> > drivers/base/, as I think we'll ultimately need similar things for a
> > lot of different devices, and I'd rather not end up with things like
> > USB device helpers living in the lib/kunit directory alongside the
> > "core" KUnit code. But I could be persuaded otherwise.
>
> Ok no problem. I'll move it.
>
> >
> > >
> > > And I'm not super certain about allocating a driver structure and
> > > embedding it in a wrapper struct. Maybe the code should just use
> > > kunit_get_current_test() instead?
> >
> > I think there are enough cases througout the kernel where
> > device/driver structs are needed that having this makes sense.
> > Combined with the fact that, while kunit_get_current_test() can be
> > used even when KUnit is not loaded, actually doing anything with the
> > resulting struct kunit pointer will probably require (at least for the
> > moment) KUnit functions to be reachable, so would break if
> > CONFIG_KUNIT=m.
>
> Wouldn't it still work in that case? The unit tests would be modular as
> well because they depend on CONFIG_KUNIT.
>
Yeah, the only case where this starts to get hairy is if the tests end
up in the same module as the thing being tested (which sometimes
happens to avoid having to export a bunch of symbols: see, e.g.
thunderbolt and amdgpu), and then someone wants to build production
kernels with CONFIG_KUNIT=m (alas, Red Hat and Android).
So that's the only real place where you might need to avoid the
non-'hook' KUnit functions, but those drivers are pretty few and far
between, and most of the really useful functionality should be moving
to 'hooks' which will be patched out cleanly at runtime.
> >
> > So, unless you actually find kunit_get_current_test() and friends to
> > be easier to work with, I'd probably stick with this.
> >
>
> Alright thanks.
>
> > > diff --git a/lib/kunit/platform_driver.c b/lib/kunit/platform_driver.c
> > > new file mode 100644
> > > index 000000000000..11d155114936
> > > --- /dev/null
> > > +++ b/lib/kunit/platform_driver.c
> > > @@ -0,0 +1,207 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Test managed platform driver
> > > + */
> > > +
> > > +#include <linux/device/driver.h>
> > > +#include <linux/platform_device.h>
> > > +
> > > +#include <kunit/resource.h>
> > > +
> > > +struct kunit_platform_device_alloc_params {
> > > + const char *name;
> > > + int id;
> > > +};
> >
> > FYI: It's my plan to eventually get rid of (or at least de-emphasize)
> > the whole 'init' function aspect of KUnit resources so we don't need
> > all of these extra structs and the like. It probably won't make it in
> > for 6.4, but we'll see...
>
> Will we be able to get the error values out of the init function? It's
> annoying that the error values can't be returned as error pointers to
> kunit_alloc_resource(). I end up skipping init, and doing it directly
> before or after calling the kunit_alloc_resource() function. I'll try to
> avoid init functions in the allocations.
Yeah, that's largely why the plan is to get rid of them: it just made
passing things around an enormous pain.
Just doing your own initialisation before adding it as a resource is
usually the right thing to do.
There's also going to be a simpler kunit_defer() wrapper around it,
which would just allow you to schedule a cleanup function to be called
(without the need to keep kunit_resource pointers around, etc), for
the cases where you don't need to look up resources elsewhere.
But just doing your own thing and calling kunit_alloc_resource() is
probably best for now, and should map well onto whatever this ends up
evolving into.
Cheers,
-- David
On Fri, 10 Mar 2023 at 07:19, Stephen Boyd <sboyd(a)kernel.org> wrote:
>
> Quoting David Gow (2023-03-02 23:15:04)
> > On Thu, 2 Mar 2023 at 09:38, Stephen Boyd <sboyd(a)kernel.org> wrote:
> > >
> > > To fully exercise common clk framework code in KUnit we need to
> > > associate 'struct device' pointers with 'struct device_node' pointers so
> > > that things like clk_get() can parse DT nodes for 'clocks' and so that
> > > clk providers can use DT to provide clks; the most common mode of
> > > operation for clk providers.
> > >
> > > Adding support to KUnit so that it loads a DTB is fairly simple after
> > > commit b31297f04e86 ("um: Add devicetree support"). We can simply pass a
> > > pre-compiled deviectree blob (DTB) on the kunit.py commandline and UML
> > > will load it. The problem is that tests won't know that the commandline
> > > has been modified, nor that a DTB has been loaded. Take a different
> > > approach so that tests can skip if a DTB hasn't been loaded.
> > >
> > > Reuse the Makefile logic from the OF unittests to build a DTB into the
> > > kernel. This DTB will be for the mythical machine "linux,kunit", i.e.
> > > the devicetree for the KUnit "board". In practice, it is a dtsi file
> > > that will gather includes for kunit tests that rely in part on a
> > > devicetree being loaded. The devicetree should only be loaded if
> > > CONFIG_OF_KUNIT=y. Make that a choice config parallel to the existing
> > > CONFIG_OF_UNITTEST so that only one devicetree can be loaded in the
> > > system at a time. Similarly, the kernel commandline option to load a
> > > DTB is ignored if CONFIG_OF_KUNIT is enabled so that only one DTB is
> > > loaded at a time.
> >
> > This feels a little bit like it's just papering over the real problem,
> > which is that there's no way tests can skip themselves if no DTB is
> > loaded.
>
> Hmm. I think you're suggesting that the unit test data be loaded
> whenever CONFIG_OF=y and CONFIG_KUNIT=y. Then tests can check for
> CONFIG_OF and skip if it isn't enabled?
>
More of the opposite: that we should have some way of supporting tests
which might want to use a DTB other than the built-in one. Mostly for
non-UML situations where an actual devicetree is needed to even boot
far enough to get test output (so we wouldn't be able to override it
with a compiled-in test one).
I think moving to overlays probably will render this idea obsolete:
but the thought was to give test code a way to check for the required
devicetree nodes at runtime, and skip the test if they weren't found.
That way, the failure mode for trying to boot this on something which
required another device tree for, e.g., serial, would be "these tests
are skipped because the wrong device tree is loaded", not "I get no
output because serial isn't working".
Again, though, it's only really needed for non-UML, and just loading
overlays as needed should be much more sensible anyway.
> >
> > That being said, I do think that there's probably some sense in
> > supporting the compiled-in DTB as well (it's definitely simpler than
> > patching kunit.py to always pass the extra command-line option in, for
> > example).
> > But maybe it'd be nice to have the command-line option override the
> > built-in one if present.
>
> Got it. I need to test loading another DTB on the commandline still, but
> I think this won't be a problem. We'll load the unittest-data DTB even
> with KUnit on UML, so assuming that works on UML right now it should be
> unchanged by this series once I resend.
Again, moving to overlays should render this mostly obsolete, no? Or
am I misunderstanding how the overlay stuff will work?
One possible future advantage of being able to test with custom DTs at
boot time would be for fuzzing (provide random DT properties, see what
happens in the test). We've got some vague plans to support a way of
passing custom data to tests to support this kind of case (though, if
we're using overlays, maybe the test could just patch those if we
wanted to do that).
Cheers,
-- David