On Wed, Aug 29, 2018 at 8:49 AM, Rik van Riel riel@surriel.com wrote:
On Wed, 2018-08-29 at 08:36 -0700, Andy Lutomirski wrote:
On Wed, Aug 29, 2018 at 8:17 AM, Rik van Riel riel@surriel.com wrote:
On Tue, 2018-08-28 at 20:46 -0700, Andy Lutomirski wrote:
On Tue, Aug 28, 2018 at 10:56 AM, Rik van Riel riel@surriel.com wrote:
On Mon, 27 Aug 2018 16:04:16 -0700 Andy Lutomirski luto@kernel.org wrote:
The 0day bot is still chewing on this, but I've tested it a bit locally and it seems to do the right thing.
Hi Andy,
the version of the patch below should fix the bug we talked about in email yesterday. It should automatically cover kernel threads in lazy TLB mode, because current->mm will be NULL, while the cpu_tlbstate.loaded_mm should never be NULL.
That's better than mine. I tweaked it a bit and added some debugging, and I got this:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
I made the loaded_mm handling a little more conservative to make it more obvious that switch_mm_irqs_off() is safe regardless of exactly when it gets called relative to switching current.
I am not convinced that the dance of writing cpu_tlbstate.loaded_mm twice, with a barrier on each end, is useful or necessary.
At the time switch_mm_irqs_off returns, nmi_uaccess_ok() will still return false, because we have not switched "current" to the task that owns the next mm_struct yet.
We just have to make sure to:
- Change cpu_tlbstate.loaded_mm before we manipulate CR3, and
- Change "current" only once enough of the mm stuff has been switched, __switch_to seems to get that right.
Between the time switch_mm_irqs_off() sets cpu_tlbstate to the next mm, and __switch_to moves() over current, nmi_uaccess_ok() will return false.
All true, but I think it stops working as soon as someone starts calling switch_mm_irqs_off() for some other reason, such as during text_poke(). And that was the original motivation for this patch.
How does calling switch_mm_irqs_off() for text_poke() change around current->mm and cpu_tlbstate.loaded_mm?
Does current->mm stay the same throughout the entire text_poke() chain, while cpustate.loaded_mm is the only thing that is changed out?
This is exactly what happens. It seemed considerably more complicated and error-prone to fiddle with current->mm. Instead the idea was to turn off IRQs, get NMI to stay out of the way, and put everything back the way it was before the scheduler, any syscalls, etc could notice that we were messing around.
--Andy