Hi Vladimir!
On Thu, 2023-02-16 at 17:58 +0200, Vladimir Oltean wrote:
On Thu, Feb 16, 2023 at 02:47:11PM +0100, Ferenc Fejes wrote:
To fix this, we need to keep a current_ipv variable according to the gate entry that's currently executed by act_gate, and use this IPV to overwrite the skb->priority.
In fact, a complication arises due to the following clause from 802.1Q:
8.6.6.1 PSFP queuing If PSFP is supported (8.6.5.1), and the IPV associated with the stream filter that passed the frame is anything other than the null value, then that IPV is used to determine the traffic class of the frame, in place of the frame's priority, via the Traffic Class Table specified in 8.6.6. In all other respects, the queuing actions specified in 8.6.6 are unchanged. The IPV is used only to determine the traffic class associated with a frame, and hence select an outbound queue; for all other purposes, the received priority is used.
Interesting. In my understanding this indeed something like "modify the queueing decision while preserve the original PCP value".
I actually wonder what the interpretation in the spirit of this clause is. I'm known to over-interpret the text of these IEEE standards and finding inconsistencies in the process (being able to prove both A and !A).
I agree, it takes time to guess what the intention behind the wording of the standard in some cases. I have the standard in front of me right now and its 2163 pages... Even if I grep to IPV, the context is overwhelmingly dense.
For example, if we consider the framePreemptionAdminStatus (express or preemptible). It is expressed per priority. So this means that the Internal Priority Value rewritten by PSFP will not affect the express/ preemptible nature of a frame, but it *will* affect the traffic class selection?
This is insane, because it means that to enforce the requirements of clause 12.30.1.1.1:
Priorities that all map to the same traffic class should be constrained to use the same value of preemption status.
it is insufficient to check that the prio_tc_map does not make some traffic classes both express and preemptible. One has to also check the tc-gate configuration, because the literal interpretation of the standard would suggest that a packet is preemptible or express based on skb-
priority,
but is classified to a traffic class based on skb->ipv. So I guess IPV rewriting can only take place between one preemptible priority and another, or only between one express priority and another, and that we should somehow test this. But PSFP is at ingress, and FP is at egress, so we'd somehow have to test the FP adminStatus of all the other net devices that we can forward to!!!
I'll try to ask around and see if anyone knows more details about what is the expected behavior.
I'll try to ask around too, thanks for pointing this out. My best understanding from the IPV that the standard treat it as skb->priority. It defines IPV as a 32bit signed value, which clearly imply similar semantics as skb->priority, which can be much larger than the number of the queues or traffic classes.
Your solution certainly correct and do the differentiation between the cases where we have PSFP at ingress or not. However in my understanding the only purpose of the IPV is the traffic class selection.
Setting skb->queue_mapping to IPV probably wont work, because of two reasons:
- it brings inconsistency with the mqprio/taprio queues and the
actual hw rings the traffic sent 2. some drivers dont check if skb->queue_mapping is bounded, they expect its smaller than the num_tx_queues.
The 2. might be solvable, but the 1. is more problematic. However with a helper, we might check if skb->queue_mapping is already set and use that as a traffic class. Is that possible? I dont really see any other codepath where that value can be other than zero before the qdisc layer. That way one flag (use_ipv) might be enough which tells that we should use the skb->queue_mapping as is (set by act_gate) and preserve skb->priority.
What do you think? Again, sorry for being late here, but I'm following the list and see that recently you did major mqprio/taprio fixes and refactors, so I hope your cache line is hot.
I'm afraid it's not so easy to reuse this field for the IPV, because skb->queue_mapping is already used between the RX and the TX path for "recording the RX queue", see skb_rx_queue_recorded(), skb_record_rx_queue(), skb_get_rx_queue().
Oh, alright. I continue to think about alternatives over introducing new members into sk_buff. It would be very nice to have proper act_gate IPV handling without hardware offload. Its great to see the support of frame preemption and PSFP support in more and more hardware but on the other hand it makes the lack of the proper software mode operation more and more awkward.
Best, Ferenc