I've sent another patch proposing these changes. I tested it with iperf3 traffic and by toggling offload features with ethtool -K on the bond device. With plain iperf3 TCP traffic and no other tweaks, I get 2x the performance over the bond device with my patch compared to without it.
I hope I didn't miss anything...
https://lore.kernel.org/netdev/20250123150909.387415-1-cratiu@nvidia.com/T/#...
Cosmin.