From: Ralph Campbell rcampbell@nvidia.com
The hmm_mirror_register() function registers a callback for when the CPU pagetable is modified. Normally, the device driver will call hmm_mirror_unregister() when the process using the device is finished. However, if the process exits uncleanly, the struct_mm can be destroyed with no warning to the device driver.
Changed since v1: - dropped VM_BUG_ON() - cc stable
Signed-off-by: Ralph Campbell rcampbell@nvidia.com Signed-off-by: Jérôme Glisse jglisse@redhat.com Cc: stable@vger.kernel.org Cc: Evgeny Baskakov ebaskakov@nvidia.com Cc: Mark Hairgrove mhairgrove@nvidia.com Cc: John Hubbard jhubbard@nvidia.com --- include/linux/hmm.h | 10 ++++++++++ mm/hmm.c | 18 +++++++++++++++++- 2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/include/linux/hmm.h b/include/linux/hmm.h index ef6044d08cc5..61b0e1c05ee1 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -218,6 +218,16 @@ enum hmm_update_type { * @update: callback to update range on a device */ struct hmm_mirror_ops { + /* release() - release hmm_mirror + * + * @mirror: pointer to struct hmm_mirror + * + * This is called when the mm_struct is being released. + * The callback should make sure no references to the mirror occur + * after the callback returns. + */ + void (*release)(struct hmm_mirror *mirror); + /* sync_cpu_device_pagetables() - synchronize page tables * * @mirror: pointer to struct hmm_mirror diff --git a/mm/hmm.c b/mm/hmm.c index 320545b98ff5..6088fa6ed137 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -160,6 +160,21 @@ static void hmm_invalidate_range(struct hmm *hmm, up_read(&hmm->mirrors_sem); }
+static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct hmm *hmm = mm->hmm; + struct hmm_mirror *mirror; + struct hmm_mirror *mirror_next; + + down_write(&hmm->mirrors_sem); + list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) { + list_del_init(&mirror->list); + if (mirror->ops->release) + mirror->ops->release(mirror); + } + up_write(&hmm->mirrors_sem); +} + static void hmm_invalidate_range_start(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long start, @@ -185,6 +200,7 @@ static void hmm_invalidate_range_end(struct mmu_notifier *mn, }
static const struct mmu_notifier_ops hmm_mmu_notifier_ops = { + .release = hmm_release, .invalidate_range_start = hmm_invalidate_range_start, .invalidate_range_end = hmm_invalidate_range_end, }; @@ -230,7 +246,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror) struct hmm *hmm = mirror->hmm;
down_write(&hmm->mirrors_sem); - list_del(&mirror->list); + list_del_init(&mirror->list); up_write(&hmm->mirrors_sem); } EXPORT_SYMBOL(hmm_mirror_unregister);
On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote:
The hmm_mirror_register() function registers a callback for when the CPU pagetable is modified. Normally, the device driver will call hmm_mirror_unregister() when the process using the device is finished. However, if the process exits uncleanly, the struct_mm can be destroyed with no warning to the device driver.
Again, what are the user-visible effects of the bug? Such info is needed when others review our request for a -stable backport. And the many people who review -stable patches for integration into their own kernel trees will want to understand the benefit of the patch to their users.
On Fri, Mar 16, 2018 at 02:12:21PM -0700, Andrew Morton wrote:
On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote:
The hmm_mirror_register() function registers a callback for when the CPU pagetable is modified. Normally, the device driver will call hmm_mirror_unregister() when the process using the device is finished. However, if the process exits uncleanly, the struct_mm can be destroyed with no warning to the device driver.
Again, what are the user-visible effects of the bug? Such info is needed when others review our request for a -stable backport. And the many people who review -stable patches for integration into their own kernel trees will want to understand the benefit of the patch to their users.
I have not had any issues in any of my own testing but nouveau driver is not as advance as the NVidia closed driver in respect to HMM inte- gration yet.
If any issues they will happen between exit_mm() and exit_files() in do_exit() (kernel/exit.c) exit_mm() tear down the mm struct but without this callback the device driver might still be handling page fault and thus might potentialy tries to handle them against a dead mm_struct.
So i am not sure what are the symptoms. To be fair there is no public driver using that part of HMM beside nouveau rfc patches. So at this point the impact on anybody is non existent. If anyone want to back- port nouveau HMM support once it make it upstream it will probably have to backport more things along the way. This is why i am not that aggressive on ccing stable so far.
Cheers, Jérôme
On 03/16/2018 02:26 PM, Jerome Glisse wrote:
On Fri, Mar 16, 2018 at 02:12:21PM -0700, Andrew Morton wrote:
On Fri, 16 Mar 2018 15:14:08 -0400 jglisse@redhat.com wrote:
The hmm_mirror_register() function registers a callback for when the CPU pagetable is modified. Normally, the device driver will call hmm_mirror_unregister() when the process using the device is finished. However, if the process exits uncleanly, the struct_mm can be destroyed with no warning to the device driver.
Again, what are the user-visible effects of the bug? Such info is needed when others review our request for a -stable backport. And the many people who review -stable patches for integration into their own kernel trees will want to understand the benefit of the patch to their users.
I have not had any issues in any of my own testing but nouveau driver is not as advance as the NVidia closed driver in respect to HMM inte- gration yet.
If any issues they will happen between exit_mm() and exit_files() in do_exit() (kernel/exit.c) exit_mm() tear down the mm struct but without this callback the device driver might still be handling page fault and thus might potentialy tries to handle them against a dead mm_struct.
So i am not sure what are the symptoms. To be fair there is no public driver using that part of HMM beside nouveau rfc patches. So at this point the impact on anybody is non existent. If anyone want to back- port nouveau HMM support once it make it upstream it will probably have to backport more things along the way. This is why i am not that aggressive on ccing stable so far.
The problem I'd like to avoid is: having a version of HMM in stable that is missing this new callback. And without it, once the driver starts doing actual concurrent operations, we can expect that the race condition will happen.
It just seems unfortunate to have stable versions out there that would be exposed to this, when it only require a small patch to avoid it.
On the other hand, it's also reasonable to claim that this is part of the evolving HMM feature, and as such, this new feature does not belong in stable. I'm not sure which argument carries more weight here.
thanks,
On 03/16/2018 12:14 PM, jglisse@redhat.com wrote:
From: Ralph Campbell rcampbell@nvidia.com
<snip>
+static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) +{
- struct hmm *hmm = mm->hmm;
- struct hmm_mirror *mirror;
- struct hmm_mirror *mirror_next;
- down_write(&hmm->mirrors_sem);
- list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
list_del_init(&mirror->list);
if (mirror->ops->release)
mirror->ops->release(mirror);
- }
- up_write(&hmm->mirrors_sem);
+}
OK, as for actual code review:
This part of the locking looks good. However, I think it can race against hmm_mirror_register(), because hmm_mirror_register() will just add a new mirror regardless.
So:
thread 1 thread 2 -------------- ----------------- hmm_release hmm_mirror_register down_write(&hmm->mirrors_sem); <blocked: waiting for sem> // deletes all list items up_write unblocked: adds new mirror
...so I think we need a way to back out of any pending hmm_mirror_register() calls, as part of the .release steps, right? It seems hard for the device driver, which could be inside of hmm_mirror_register(), to handle that. Especially considering that right now, hmm_mirror_register() will return success in this case--so there is no indication that anything is wrong.
Maybe hmm_mirror_register() could return an error (and not add to the mirror list), in such a situation, how's that sound?
thanks,
On 03/16/2018 07:36 PM, John Hubbard wrote:
On 03/16/2018 12:14 PM, jglisse@redhat.com wrote:
From: Ralph Campbell rcampbell@nvidia.com
<snip>
+static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) +{
- struct hmm *hmm = mm->hmm;
- struct hmm_mirror *mirror;
- struct hmm_mirror *mirror_next;
- down_write(&hmm->mirrors_sem);
- list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
list_del_init(&mirror->list);
if (mirror->ops->release)
mirror->ops->release(mirror);
- }
- up_write(&hmm->mirrors_sem);
+}
OK, as for actual code review:
This part of the locking looks good. However, I think it can race against hmm_mirror_register(), because hmm_mirror_register() will just add a new mirror regardless.
So:
thread 1 thread 2
hmm_release hmm_mirror_register down_write(&hmm->mirrors_sem); <blocked: waiting for sem> // deletes all list items up_write unblocked: adds new mirror
...so I think we need a way to back out of any pending hmm_mirror_register() calls, as part of the .release steps, right? It seems hard for the device driver, which could be inside of hmm_mirror_register(), to handle that. Especially considering that right now, hmm_mirror_register() will return success in this case--so there is no indication that anything is wrong.
Maybe hmm_mirror_register() could return an error (and not add to the mirror list), in such a situation, how's that sound?
In other words, I think this would help (not tested yet beyond a quick compile, but it's pretty simple):
diff --git a/mm/hmm.c b/mm/hmm.c index 7ccca5478ea1..da39f8522dca 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -66,6 +66,7 @@ struct hmm { struct list_head mirrors; struct mmu_notifier mmu_notifier; struct rw_semaphore mirrors_sem; + bool shutting_down; };
/* @@ -99,6 +100,7 @@ static struct hmm *hmm_register(struct mm_struct *mm) INIT_LIST_HEAD(&hmm->ranges); spin_lock_init(&hmm->lock); hmm->mm = mm; + hmm->shutting_down = false;
/* * We should only get here if hold the mmap_sem in write mode ie on @@ -167,6 +169,7 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) struct hmm_mirror *mirror_next;
down_write(&hmm->mirrors_sem); + hmm->shutting_down = true; list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) { list_del_init(&mirror->list); if (mirror->ops->release) @@ -227,6 +230,10 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm) return -ENOMEM;
down_write(&mirror->hmm->mirrors_sem); + if (mirror->hmm->shutting_down) { + up_write(&mirror->hmm->mirrors_sem); + return -ESRCH; + } list_add(&mirror->list, &mirror->hmm->mirrors); up_write(&mirror->hmm->mirrors_sem);
thanks,
On 03/16/2018 08:47 PM, John Hubbard wrote:
On 03/16/2018 07:36 PM, John Hubbard wrote:
On 03/16/2018 12:14 PM, jglisse@redhat.com wrote:
From: Ralph Campbell rcampbell@nvidia.com
<snip>
+static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm) +{
- struct hmm *hmm = mm->hmm;
- struct hmm_mirror *mirror;
- struct hmm_mirror *mirror_next;
- down_write(&hmm->mirrors_sem);
- list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
list_del_init(&mirror->list);
if (mirror->ops->release)
mirror->ops->release(mirror);
- }
- up_write(&hmm->mirrors_sem);
+}
OK, as for actual code review:
This part of the locking looks good. However, I think it can race against hmm_mirror_register(), because hmm_mirror_register() will just add a new mirror regardless.
So:
thread 1 thread 2
hmm_release hmm_mirror_register down_write(&hmm->mirrors_sem); <blocked: waiting for sem> // deletes all list items up_write unblocked: adds new mirror
Mark Hairgrove just pointed out some more fun facts:
1. Because hmm_mirror_register() needs to be called with an mm that has a non-zero refcount, you generally cannot get an hmm_release callback, so the above race should not happen.
2. We looked around, and the code is missing a call to mmu_notifier_unregister(). That means that it is going to leak memory and not let the mm get released either.
Maybe having each mirror have its own mmu notifier callback is a possible way to solve this.
thanks,
linux-stable-mirror@lists.linaro.org