On 17.10.2024 9.40, Michał Pecio wrote:
Prevent the xHC host from processing a cancelled URB by always turning cancelled URB TDs into no-op TRBs before queuing a 'Set TR Deq' command.
If the command fails, the xHC will start processing the cancelled TD instead of skipping it once the endpoint is restarted, causing issues such as Babble errors.
This is not a complete solution, as a failed 'Set TR Deq' command does not guarantee that the xHC TRB caches are cleared.
Hmm, wouldn't a long and partially cached TD basically become corrupted by this overwrite?
Unlikely but not impossible. We already turn all cancelled TDs that we don't stop on into no-ops, so those would already now experience the same problem.
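
For readers following along, here is a minimal sketch of what "turning a TD into no-ops" could look like. It is a simplified userspace model, not the driver code: struct trb, trb_to_noop() and td_to_noop() as written here are illustrative, the TD is assumed to sit contiguously in memory, and endianness handling is left out. The bit positions (cycle bit 0, chain bit 4, TRB type in bits 15:10, Link = 6, No Op = 8) follow the xHCI TRB layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified 16-byte TRB; a real driver stores these fields little-endian. */
struct trb {
    uint32_t field[4];
};

#define TRB_CYCLE      (1u << 0)              /* ownership bit, must survive */
#define TRB_CHAIN      (1u << 4)              /* TRB is chained to the next */
#define TRB_TYPE(t)    ((uint32_t)(t) << 10)  /* TRB type is in bits 15:10 */
#define TRB_TYPE_MASK  TRB_TYPE(0x3f)
#define TRB_LINK       6u
#define TRB_TR_NOOP    8u

/* Turn one transfer TRB into a No Op, or only unchain a Link TRB. */
void trb_to_noop(struct trb *trb)
{
    if ((trb->field[3] & TRB_TYPE_MASK) == TRB_TYPE(TRB_LINK)) {
        /* A Link TRB must keep pointing at the next segment. */
        trb->field[3] &= ~TRB_CHAIN;
        return;
    }
    trb->field[0] = 0;            /* buffer pointer, low half (Normal TRB) */
    trb->field[1] = 0;            /* buffer pointer, high half */
    trb->field[2] = 0;            /* transfer length and related fields */
    trb->field[3] &= TRB_CYCLE;   /* drop everything except the cycle bit */
    trb->field[3] |= TRB_TYPE(TRB_TR_NOOP);
}

/* Neutralise every TRB of a cancelled TD (assumed contiguous here). */
void td_to_noop(struct trb *first, size_t ntrbs)
{
    for (size_t i = 0; i < ntrbs; i++)
        trb_to_noop(&first[i]);
}
```

The point of clearing field[0] and field[1] is that no stale DMA address is left on the ring, whatever the controller later does with the TRB.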
We stopped the endpoint and issued a 'Set TR Deq' command, which is supposed to clear the xHC TRB cache. I find it hard to believe the xHC would keep only some select TRBs of a TD in its cache after that.
But let's say we end up corrupting the TD. It might still be better than allowing the xHC to process the TRBs and write to DMA addresses that might already be freed or reused.
For instance, No Op following a chain bit TRB is prohibited by 4.11.7.
4.11.5.1 even goes as far as saying that there are no constraints on the order in which TRBs are fetched from the ring. I'm not sure how "out of order" it can be, or whether a cached TD could be left with a hole.
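
To make the 4.11.7 concern concrete, here is a small sketch of a checker. It scans a snapshot of TRBs, as the controller might see them if it had cached the start of a TD before the overwrite, and reports the first No Op that directly follows a TRB with the chain bit set. The struct and the function name find_noop_after_chain() are made up for illustration, and whether real hardware can ever observe such a mix is exactly the open question here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified TRB model; only the control word (field[3]) matters here. */
struct trb {
    uint32_t field[4];
};

#define TRB_CHAIN      (1u << 4)              /* chain bit */
#define TRB_TYPE(t)    ((uint32_t)(t) << 10)  /* TRB type is in bits 15:10 */
#define TRB_TYPE_MASK  TRB_TYPE(0x3f)
#define TRB_TR_NOOP    8u

/*
 * Return the index of the first No Op TRB whose predecessor has the
 * chain bit set, or -1 if the sequence contains no such pair.
 */
long find_noop_after_chain(const struct trb *trbs, size_t ntrbs)
{
    for (size_t i = 1; i < ntrbs; i++) {
        bool prev_chained = (trbs[i - 1].field[3] & TRB_CHAIN) != 0;
        bool is_noop = (trbs[i].field[3] & TRB_TYPE_MASK) ==
                       TRB_TYPE(TRB_TR_NOOP);

        if (prev_chained && is_noop)
            return (long)i;
    }
    return -1;
}
```

A TD that was fully converted never trips this check, because the conversion clears the chain bit along with everything else; only a mix of stale and rewritten TRBs does.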
If the reason for the Set TR Deq failure is an earlier Stop Endpoint failure, the xHC is executing this TD right now. Or maybe the next one - I guess the driver already risks UB when it misses any Stop EP failure.
If it didn't fail, xHC may store some "state" which allows it to restart a TRB stopped in the middle. It might not expect the TRB to change.
This should not be an issue. We don't queue a 'Set TR Deq' command if we intend to continue processing a stopped TD, as the 'Set TR Deq' is designed to dump all transfer related state of the endpoint.
Actually, it would *almost* be better to deal with it by simply leaving the TRB on the ring and waiting for it to complete. Problem is when it doesn't execute soon, or ever, leaving the urb_dequeue() caller hanging.
We need to give back the cancelled URB at some point, and 'Set TR Deq' command completion is the latest reasonable place to do it.
After this we should prevent xHC hw from accessing URB DMA pointers.
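
The ordering argued for above can be sketched as a tiny model: strip the DMA pointers before the command is queued, then give the URB back at command completion whether or not the command succeeded. Everything here (struct cancelled_td, queue_set_deq(), set_deq_completed()) is hypothetical bookkeeping for illustration, not kernel code, and Link TRBs are ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct trb {
    uint32_t field[4];
};

#define TRB_CYCLE    (1u << 0)              /* ownership bit */
#define TRB_TYPE(t)  ((uint32_t)(t) << 10)  /* TRB type is in bits 15:10 */
#define TRB_TR_NOOP  8u

/* Hypothetical per-TD state; not a kernel structure. */
struct cancelled_td {
    struct trb *trbs;
    size_t ntrbs;
    bool set_deq_pending;
    bool urb_given_back;
};

/* Step 1: no-op the TD first, only then model queuing the command. */
void queue_set_deq(struct cancelled_td *td)
{
    for (size_t i = 0; i < td->ntrbs; i++) {
        td->trbs[i].field[0] = 0;          /* no DMA address left behind */
        td->trbs[i].field[1] = 0;
        td->trbs[i].field[2] = 0;
        td->trbs[i].field[3] &= TRB_CYCLE; /* keep only the cycle bit */
        td->trbs[i].field[3] |= TRB_TYPE(TRB_TR_NOOP);
    }
    td->set_deq_pending = true;
}

/* Step 2: command completion is the last point to give the URB back. */
void set_deq_completed(struct cancelled_td *td, bool success)
{
    /*
     * Even if the command failed, the TRBs no longer reference the
     * URB's buffers, so giving the URB back here is not gated on success.
     */
    (void)success;
    td->set_deq_pending = false;
    td->urb_given_back = true;
}
```

With this order a failed command can still leave the controller in an odd state, but it cannot make it write into memory the URB owner has already freed or reused.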
Thanks
Mathias