On Fri, Jan 31, 2025 at 10:56:56AM +0000, Mark Rutland wrote:
On Thu, Jan 30, 2025 at 12:46:15PM -0800, Oliver Upton wrote:
There are a few places where the idreg overrides are read w/ the MMU off, for example the VHE and hVHE checks in __finalise_el2. And while the infrastructure gets this _mostly_ right (i.e. does the appropriate cache maintenance), the placement of the data itself is problematic and could share a cache line with something else.
Depending on how unforgiving an implementation's handling of mismatched attributes is, this could lead to data corruption. In one observed case, the system_cpucaps shared a line with arm64_sw_feature_override and the cpucaps got nuked after entering the hyp stub...
This doesn't sound right. Non-cacheable/Device reads should not lead to corruption of a cached copy regardless of whether that cached copy is clean or dirty.
The corruption suggests that either we're performing a *write* with mismatched attributes (in which case the use of .mmuoff.data.read below isn't quite right), or we have a plan invalidate somewhere without a clean (and e.g. something else might need to be moved into .mmuoff.data.write).
Seconding Ard's point, I think we need to understand this scenario better.
Of course. So the write to the idreg override is fine and gets written back after cache maintenance.
What's happening afterwards is CPU0 pulls in the line to write to system_cpucaps which also happens to contain arm64_sw_feature_override. That line is in UD state when CPU0 calls HVC_FINALISE_EL2 and goes to EL2 with the MMU off.
__finalise_el2() does a load on arm64_sw_feature_override which goes out as a ReadNoSnp. I is the only cache state for this request (IHI 0050G B4.2.1.2) and the SF stops tracking the line, so writeback never actually goes anywhere.
This patch might have been a touch premature, let me double check a few things on our side.