Quoting Chris Wilson (2020-04-14 17:14:23)
Try to make RPS dramatically more responsive by shrinking the evaluation intervals (EI) by a factor of 100! The issue is that, as we now park the GPU rapidly upon idling, a short or bursty workload such as the composited desktop never sustains enough work to fill and complete an evaluation window. As such, the frequency we program remains stuck. This was first reported as once boosted, we never relinquished the boost [see commit 21abf0bf168d ("drm/i915/gt: Treat idling as a RPS downclock event")], but it equally applies in the other direction for bursty workloads that *need* low latency, like desktop animations.
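To put rough numbers on the window problem, here is a minimal sketch in C. It assumes a toy model in which parking the GPU discards whatever EI is in progress; completed_eis() and the durations used below are hypothetical and are not taken from the driver.

```c
/*
 * Toy model (an assumption, not i915 code): the GPU is parked as soon
 * as a burst ends, and parking discards the EI that is in progress.
 * Only whole evaluation intervals that fit inside the burst count.
 */
static inline unsigned int completed_eis(unsigned int burst_us,
					 unsigned int ei_us)
{
	/* Guard against a zero-length interval. */
	return ei_us ? burst_us / ei_us : 0;
}
```

With an illustrative 2ms burst, completed_eis(2000, 10000) is 0, so RPS never gets a verdict and the frequency stays where it was. Shrinking the same EI by a factor of 100 gives completed_eis(2000, 100), which is 20 evaluations within the one burst.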
What we could try is to preserve the incomplete EI history across idling, but it is not clear whether that would be effective, nor whether the presumption of continuous workloads is accurate. A clearer path seems to be to treat it as symptomatic that we fail to handle bursty workloads with the current EI, and to address that by shrinking the EI so the evaluations are run much more often.
This will likely entail more frequent interrupts, and by the time we process the interrupt in the bottom half [from inside a worker], the workload on the GPU has changed. To address the changeable nature, in the previous patch we compared the previous complete EI with the interrupt request and only up/down clock if both agree. The impact of asking for, and presumably receiving, more interrupts is still to be determined and mitigations sought. The first idea is to differentiate between up/down responsivity and make upclocking more responsive than downclocking. This should both help thwart jitter on bursty workloads by making it easier to increase than it is to decrease frequencies, and reduce the number of interrupts we would need to process.
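The "only reclock if both agree" rule can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the previous patch: enum rps_dir, classify_ei(), rps_decide(), rps_apply() and the percentage thresholds are all hypothetical names and values.

```c
/* Hypothetical direction hint; not the i915 driver's own types. */
enum rps_dir { RPS_DOWN = -1, RPS_HOLD = 0, RPS_UP = 1 };

/* Classify the busyness (0-100%) of one complete EI against thresholds. */
static inline enum rps_dir classify_ei(unsigned int busy_pct,
				       unsigned int up_pct,
				       unsigned int down_pct)
{
	if (busy_pct >= up_pct)
		return RPS_UP;
	if (busy_pct <= down_pct)
		return RPS_DOWN;
	return RPS_HOLD;
}

/*
 * Reclock only if the (possibly stale) interrupt request and the
 * previous complete EI point in the same direction; otherwise hold.
 */
static inline enum rps_dir rps_decide(enum rps_dir irq,
				      enum rps_dir prev_ei)
{
	return irq == prev_ei ? irq : RPS_HOLD;
}

/* Move one frequency step in the chosen direction, clamped to [min, max]. */
static inline int rps_apply(int freq, enum rps_dir dir, int min, int max)
{
	int next = freq + (int)dir;

	if (next < min)
		return min;
	if (next > max)
		return max;
	return next;
}
```

For example, an up interrupt that arrives after the previous complete EI was classified as RPS_DOWN yields RPS_HOLD, so a stale request does not bounce the frequency. Making upclocking more responsive than downclocking would then be a matter of picking asymmetric thresholds or intervals for the two directions; which values work is exactly the open question.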
Another worry I'd like to raise is that by reducing the EI we risk unstable evaluations. I'm not sure how accurate the HW is, and I worry about borderline workloads (if that is possible), but mainly the worry is how the HW is sampling.
The other unmentioned unknown is the latency in reprogramming the frequency. At what point does it start to become a significant factor? I'm presuming the RPS evaluation itself is free, until it has to talk across the chip to send an interrupt.
-Chris