On Tue, Nov 5, 2019 at 9:06 AM John Stultz <john.stultz@linaro.org> wrote:
On Tue, Nov 5, 2019 at 2:29 AM Will Deacon <will@kernel.org> wrote:
Hi John,
On Mon, Nov 04, 2019 at 05:16:42PM -0800, John Stultz wrote:
On Tue, Oct 29, 2019 at 8:31 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
Shared and writable mappings (__S.1.) should be clean (!dirty) initially and made dirty on a subsequent write either through the hardware DBM (dirty bit management) mechanism or through a write page fault. A clean pte for the arm64 kernel is one that has PTE_RDONLY set and PTE_DIRTY clear.
The PAGE_SHARED{,_EXEC} attributes have PTE_WRITE set (PTE_DBM) and PTE_DIRTY clear. Prior to commit 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"), it was the responsibility of set_pte_at() to set the PTE_RDONLY bit and mark the pte clean if the software PTE_DIRTY bit was not set. However, the above commit removed the pte_sw_dirty() check and the subsequent setting of PTE_RDONLY in set_pte_at() while leaving the PAGE_SHARED{,_EXEC} definitions unchanged. The result is that shared+writable mappings are now dirty by default.
Fix the above by explicitly setting PTE_RDONLY in PAGE_SHARED{,_EXEC}. In addition, remove the superfluous PTE_DIRTY bit from the kernel PROT_* attributes.
Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
Cc: <stable@vger.kernel.org> # 4.14.x-
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
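To make the clean/dirty reasoning in the commit message above concrete, here is a small user-space sketch (not kernel code). The bit positions are assumed to match the arm64 page table headers, the helpers are simplified models of the kernel's pte_sw_dirty(), pte_hw_dirty(), pte_dirty() and pte_mkdirty(), and SHARED_BEFORE_FIX / SHARED_AFTER_FIX are hypothetical names that keep only the bits relevant to dirtiness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit positions mirror the arm64 page table headers. */
#define PTE_RDONLY (UINT64_C(1) << 7)   /* AP[2]: hardware read-only */
#define PTE_DBM    (UINT64_C(1) << 51)  /* hardware dirty bit management */
#define PTE_WRITE  PTE_DBM              /* software "writable" shares the DBM bit */
#define PTE_DIRTY  (UINT64_C(1) << 55)  /* software dirty bit */

/* Simplified models of the kernel helpers, operating on a raw 64-bit value. */
static bool pte_sw_dirty(uint64_t pte)
{
	return (pte & PTE_DIRTY) != 0;
}

static bool pte_hw_dirty(uint64_t pte)
{
	/* Writable and not read-only: treated as already written to. */
	return (pte & PTE_WRITE) && !(pte & PTE_RDONLY);
}

static bool pte_dirty(uint64_t pte)
{
	return pte_sw_dirty(pte) || pte_hw_dirty(pte);
}

/* Model of marking a pte dirty, e.g. from a write fault without h/w DBM. */
static uint64_t model_mkdirty(uint64_t pte)
{
	pte |= PTE_DIRTY;
	if (pte & PTE_WRITE)
		pte &= ~PTE_RDONLY;
	return pte;
}

/* Hypothetical names: only the dirtiness-related bits of PAGE_SHARED. */
#define SHARED_BEFORE_FIX (PTE_WRITE)               /* no PTE_RDONLY */
#define SHARED_AFTER_FIX  (PTE_WRITE | PTE_RDONLY)  /* starts out clean */
```

In this model, assert(pte_dirty(SHARED_BEFORE_FIX)) and assert(!pte_dirty(SHARED_AFTER_FIX)) both hold: without PTE_RDONLY a writable pte counts as dirty from the start, while with it the pte stays clean until the first write clears PTE_RDONLY, either through hardware DBM or through a write fault.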
Hey,
So I'm not yet sure why, but I've just validated that this patch is causing trouble with booting AOSP on HiKey960 with 5.4-rc6 (-rc5 works fine).
Hmm. Annoying this wasn't spotted by CI.
It's odd, because the system does boot and is alive, but it seems to stall out at the boot animation, and userland never finishes coming up to the home screen. It just sits there without a useful error message that I can find so far. Reverting just this patch seems to solve it, and it boots all the way.
Given that I don't think the HiKey960 supports h/w DBM, my initial guess is that the GPU is stuck on a page fault.
I'll try to dig further to see what might be going on (the mali driver is a prime suspect here), but I wanted to raise the flag since we're at the end of the -rc cycle.
What exactly are you using for the mali driver?
I've got an old r10p0 bifrost blob we were given and kernel patches I've carried forward since then.
Again, I don't want to distract you too much for something that may be related to a blob driver. I mostly just wanted to raise a flag in case there was something off that might affect others.
Just as a further detail (about to close up for the day), I'm also seeing this issue on the HiKey board. Similarly, reverting 747a70e60b72 resolves it. It's a mali blob driver too, but a different one (utgard), which makes me suspect this might be a real issue w/ something in AOSP.
I'll be testing on a db845c tomorrow morning to see if I can trigger it there as well.
thanks
-john