Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

20 Mar 2020

On Thu, Mar 19, 2020 at 03:56:50PM -0700, Ralph Campbell wrote:
...
Adding linux-kselftest@vger.kernel.org for the test config question.
On 3/19/20 11:17 AM, Jason Gunthorpe wrote:
...
On Tue, Mar 17, 2020 at 04:14:31PM -0700, Ralph Campbell wrote:
...
On 3/17/20 5:59 AM, Christoph Hellwig wrote:
...
On Tue, Mar 17, 2020 at 09:47:55AM -0300, Jason Gunthorpe wrote:
...
I've been using v7 of Ralph's tester and it is working well - it has
DEVICE_PRIVATE support so I think it can test this flow too. Ralph are
you able?
This hunk seems trivial enough to me, can we include it now?
I can send a separate patch for it once the tester covers it.  I don't
want to add it to the original patch as it is a significant behavior
change compared to the existing code.
Attached is an updated version of my HMM tests based on linux-5.6.0-rc6.
I ran this OK with Jason's 8+1 HMM patches, Christoph's 1-5 misc HMM clean ups,
and Christoph's 1-4 device private page changes applied.
I'd like to get this to mergable, it looks pretty good now, but I have
no idea about selftests - and I'm struggling to even compile the tools
dir
...

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 69def4a9df00..4d22ce7879a7 100644
+++ b/lib/Kconfig.debug
@@ -2162,6 +2162,18 @@ config TEST_MEMINIT
     If unsure, say N.
+config TEST_HMM

tristate "Test HMM (Heterogeneous Memory Management)"
depends on DEVICE_PRIVATE
select HMM_MIRROR
   select MMU_NOTIFIER



extra spaces
Will fix in v8.
...
In general I wonder if it even makes sense that DEVICE_PRIVATE is user
selectable?
Should tests enable the feature or the feature enable the test?
IMHO, if the feature is being compiled into the kernel, that should
enable the menu item for the test. If the feature isn't selected,
no need to test it :-)
I ment if DEVICE_PRIVATE should be a user selectable option at all, or
should it be turned on when a driver like nouveau is selected.
Is there some downside to enabling DEVICE_PRIVATE?
...
...
The notifier holds a mmgrab, no need for another one
OK. I'll replace dmirror->mm with dmirror->notifier.mm.
Right that is good too
...
...
...

filp->private_data = dmirror;

Not sure what this comment means
I'll change the comment to:
     /*
         * The first open of the device character file registers the address
         * space of the process doing the open() system call with the device.
         * Subsequent file opens by other processes will have access to the
         * first process' address space.
         */
How does this happen? The function looks like it always does the same thing
...
...
...
+static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,

		const struct mmu_notifier_range *range,


		unsigned long cur_seq)



+{

struct dmirror *dmirror = container_of(mni, struct dmirror, notifier);
struct mm_struct *mm = dmirror->mm;

/*
* If the process doesn't exist, we don't need to invalidate the


* device page table since the address space will be torn down.


*/


if (!mmget_not_zero(mm))
return true;



Why? Don't the notifiers provide for this already.
mmget_not_zero() is required before calling hmm_range_fault() though
Oh... This is the invalidate_all path during invalidation
IMHO you should test the invalidation reason in the range to exclude
this.
But xa_erase looks totally safe so there should be no reason to do
that.
...
This is a workaround for a problem I don't quite understand.
If you change tools/testing/selftests/vm/hmm-tests.c line 868 to
   ASSERT_EQ(ret, -1);
Then the test will abort, core dump, and cause two problems,

the migrated page will be faulted back to system memory in order to write
it to the core dump. This triggers lockdep_assert_held(&walk.mm->mmap_sem)
in walk_page_range().

Has the migration stuff become entangled with the xarray?
...
[  137.980718] Code: 80 2f 1a 83 c6 05 e9 8d 7b 01 01 e8 3e b1 b1 fe e9 05 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 56 41 55 41 54 55 <48> 89 fd 53 4c 8d 6d 10 e8 3c fc ff ff 49 89 c4 4c 89 e0 83 e0 03
[  137.999461] RSP: 0018:ffffc900015e77c8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  138.007028] RAX: ffff8886e508c408 RBX: 0000000000000000 RCX: ffffffff82626c89
[  138.014159] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffc900015e78a0
[  138.021293] RBP: ffffc900015e78a0 R08: ffffffff811461c4 R09: fffff520002bcf17
[  138.028426] R10: fffff520002bcf16 R11: 0000000000000003 R12: 0000000002606d10
[  138.035557] R13: ffff8886e508c448 R14: 0000000000000031 R15: ffffffffa06546a0
[  138.042701]  ? do_raw_spin_lock+0x104/0x1d0
[  138.046888]  ? xas_store+0x19/0xa60
[  138.050390]  xas_store+0x5b3/0xa60
[  138.053806]  ? register_lock_class+0x860/0x860
[  138.058267]  __xa_erase+0x96/0x110
[  138.061673]  ? xas_store+0xa60/0xa60
[  138.065267]  xa_erase+0x19/0x30
oh, it is doing this:
static void mn_itree_release(struct mmu_notifier_subscriptions *subscriptions,
                             struct mm_struct *mm)
{
        struct mmu_notifier_range range = {
                .flags = MMU_NOTIFIER_RANGE_BLOCKABLE,
                .event = MMU_NOTIFY_RELEASE,
                .mm = mm,
                .start = 0,
                .end = ULONG_MAX,
        };
ie it is sitting doing a huge number of xa_erases, I suppose. Probably
in normal exit the notifier is removed before the mm is destroyed.
The xa_erase needs to be a bit smarter to jump over gaps in the tree
perhaps some
xa_for_each()
   xa_erase()
pattern?
...
...
Also I get this:
lib/test_hmm.c: In function ‘dmirror_devmem_fault_alloc_and_copy’:
lib/test_hmm.c:1041:25: warning: unused variable ‘vma’ [-Wunused-variable]
  1041 |  struct vm_area_struct *vma = args->vma;
But this is a kernel bug, due to alloc_page_vma being a #define not a
static inline and me having CONFIG_NUMA off in this .config
Fixed.
in gfp.h?
Jason

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault