On Thu, Jun 22, 2023 at 04:37:21PM -0700, Kenneth Graunke wrote:
On Thursday, June 22, 2023 11:27:30 AM PDT Lucas De Marchi wrote:
Most of the context workarounds tweak masked registers, but not all. For masked registers, when writing the value it's sufficient to just write the wa->set_bits since that will take care of both the clr and set bits as well as not overwriting other bits.
However there are some workarounds, the registers are non-masked. Up until now the driver was simply emitting a MI_LOAD_REGISTER_IMM with the set_bits to program the register via the GPU in the WA bb. This has the side effect of overwriting the content of the register outside of bits that should be set and also doesn't handle the bits that should be cleared.
Kenneth reported that on DG2, mesa was seeing a weird behavior due to the kernel programming of L3SQCREG5 in dg2_ctx_gt_tuning_init(). With the GPU idle, that register could be read via intel_reg as 0x00e001ff, but during a 3D workload it would change to 0x0000007f. So the programming of that tuning was affecting more than the bits in L3_PWM_TIMER_INIT_VAL_MASK. Matt Roper noticed the lack of rmw for the context workarounds due to the use of MI_LOAD_REGISTER_IMM.
So, for registers that are not masked, read its value via mmio, modify and then set it in the buffer to be written by the GPU. This should take care in a simple way of programming just the bits required by the tuning/workaround. If in future there are registers that involved that can't be read by the CPU, a more complex approach may be required like a) issuing additional instructions to read and modify; or b) scan the golden context and patch it in place before saving it; or something else. But for now this should suffice.
Scanning the context workarounds for all platforms, these are the impacted ones with the respective registers
mtl: DRAW_WATERMARK mtl/dg2: XEHP_L3SQCREG5, XEHP_FF_MODE2 gen12: GEN12_FF_MODE2
Speaking of GEN12_FF_MODE2...there's a big scary comment above that workaround write which says that register "will return the wrong value when read." I think with this patch, we'll start doing a RMW cycle for the register, which could mix in some of this "wrong value". The comment mentions that the intention is to write the whole register, as the default value is 0 for all fields.
Good point. That also means we don't need to backport this patch to stable kernel to any gen12, since overwritting the other bits is actually the intended behavior.
Maybe what we want to do is change gen12_ctx_gt_tuning_init to do
wa_write(wal, GEN12_FF_MODE2, FF_MODE2_TDS_TIMER_128);
so it has a clear mask of ~0 instead of FF_MODE2_TDS_TIMER_MASK, and
In order to ignore read back when verifying, we would still need to use wa_add(), but changing the mask. We don't have a wa_write() that ends up with { .clr = ~0, .read_mask = 0 }.
wa_add(wal, GEN12_FF_MODE2, ~0, FF_MODE2_TDS_TIMER_128, 0, false);
then in this patch update your condition below from
if (wa->masked_reg || wa->set == U32_MAX) {
to
if (wa->masked_reg || wa->set == U32_MAX || wa->clear == U32_MAX) {
yeah... and maybe also warn if wa->read is 0, which means it's one of the registers we can't/shouldn't read from the CPU.
because if we're clearing all bits then we don't care about doing a read-modify-write either.
thanks Lucas De Marchi
--Ken