On Sat, Oct 22, 2022 at 05:49:51PM +0300, Vladimir Oltean wrote:
On Sat, Oct 22, 2022 at 04:49:50PM +0300, Ido Schimmel wrote:
In the above scenario, learning does not need to be on for the bridge to populate its FDB, but rather for the bridge to refresh the dynamic FDB entries installed by hostapd. This seems like a valid use case and one needs a good reason to break it in future kernels.
Before suggesting any alternatives, I'd like to know more details about how this will work in practice, because I'm aware of the limitations that come with DSA not syncing its hardware FDB with the software bridge.
So you add a dynamic FDB entry from user space, it gets propagated to hardware via SWITCHDEV_FDB_ADD_TO_DEVICE, and from there on, the software and hardware entries have completely independent ageing timers.
You'll still suffer interruptions in authorization, if the software FDB entry expires because it was never refreshed (which will happen if traffic is forwarded autonomously and not seen by software). And at this stage, you could just add static FDB entries which you periodically delete from user space, since the effect would be equivalent.
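To make the concern above concrete, here is a toy model (not kernel code) of two independent ageing timers. The 300 second ageing time, the 10 second traffic interval, and the names FdbEntry and first_software_expiry are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FdbEntry:
    """Toy FDB entry: remembers only when it was last refreshed."""
    last_refresh: float

    def expired(self, now: float, ageing_time: float) -> bool:
        return now - self.last_refresh >= ageing_time


def first_software_expiry(ageing_time=300.0, traffic_interval=10.0, end=600.0):
    """Return (time the software entry expires, whether the hardware entry expired too)."""
    sw = FdbEntry(last_refresh=0.0)
    hw = FdbEntry(last_refresh=0.0)
    now = 0.0
    while now < end:
        now += traffic_interval
        # Traffic is forwarded autonomously, so only the hardware entry is refreshed.
        if not hw.expired(now, ageing_time):
            hw.last_refresh = now
        # The software entry never sees a refresh and ages out on its own timer.
        if sw.expired(now, ageing_time):
            return now, hw.expired(now, ageing_time)
    return None, hw.expired(now, ageing_time)
```

In this model the software entry ages out after one ageing period even though the station never stopped sending traffic, while the hardware entry is still alive.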
If the mitigation to that is going to involve the extern_learn flag, the whole point becomes moot (for mv88e6xxx), since FDB refreshing does not happen in the bridge driver in that case (so the learning flag can be whatever).
Once a dynamic FDB entry is installed in hardware the software bridge no longer sees the majority of the traffic that refreshes this entry, which means we need to prevent the bridge from mindlessly ageing and removing the entry. I see two options, depending on the capabilities of the underlying hardware implementation:
1. If the hardware is capable of generating an event when an entry is aged out, then once the dynamic entry is installed in hardware, the device driver needs to let the bridge driver know that the bridge is no longer responsible for ageing the entry. This can be done by marking the entry as either extern_learn or offloaded. The latter is more accurate, but we need to patch br_fdb_cleanup(). Upon an ageing event, the device driver will tell the bridge to remove the entry via SWITCHDEV_FDB_DEL_TO_BRIDGE.
2. If the hardware is unable to generate ageing events, but allows querying the activity of the entry, then the device driver will need to emulate the behavior of the first option. This allows us to use the same interface between the bridge and device driver regardless of the underlying hardware implementation. My feeling is that most devices fall in the first category.
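The second option can be sketched as a small polling loop. This is a toy model, not driver code: PollingAger, read_and_clear_activity and notify_bridge_del are hypothetical names, with notify_bridge_del standing in for whatever the driver would do to emit SWITCHDEV_FDB_DEL_TO_BRIDGE.

```python
class PollingAger:
    """Emulate hardware ageing events by polling per-entry activity."""

    def __init__(self, ageing_time, read_and_clear_activity, notify_bridge_del):
        self.ageing_time = ageing_time
        # Hypothetical callback: returns True if the entry was hit since the last poll.
        self.read_and_clear_activity = read_and_clear_activity
        # Hypothetical callback: stands in for SWITCHDEV_FDB_DEL_TO_BRIDGE.
        self.notify_bridge_del = notify_bridge_del
        self.last_active = {}

    def add(self, key, now):
        """Start tracking an entry that was just installed in hardware."""
        self.last_active[key] = now

    def poll(self, now):
        """Refresh active entries; age out and report idle ones."""
        for key in list(self.last_active):
            if self.read_and_clear_activity(key):
                self.last_active[key] = now
            elif now - self.last_active[key] >= self.ageing_time:
                del self.last_active[key]
                self.notify_bridge_del(key)
```

From the bridge's point of view the result looks the same as in the first option: it only ever hears that an entry should be deleted, regardless of how the driver found out.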
Regarding learning from link-local frames, this can be mitigated by [2] without adding additional checks in the bridge. I don't know why this bridge option was originally added, but if it wasn't for this use case, then now it has another use case.
There is still the problem that link-local learning is on by default (follows the BR_LEARNING setting of the port). I don't feel exactly comfortable with the fact that it's easy for a user to miss this and leave the port completely insecure.
I'm willing to patch the man page and add a note near the 'locked' bridge port option.
Regarding MAB, from the above you can see that a pure 802.1X implementation that does not involve MAB can benefit from locked bridge ports with learning enabled. It is therefore not accurate to say that one wants MAB merely by enabling learning on a locked port. Given that MAB is a proprietary extension and much less secure than 802.1X, we can assume that there will be deployments out there that do not use MAB and do not care about notifications regarding locked FDB entries. I therefore think that MAB needs to be enabled by a separate bridge port flag that is rejected unless the bridge port is locked and has learning enabled.
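The constraint proposed above can be written down as a small validity check. This is only a sketch of the rule, not bridge code; PortFlags and validate are hypothetical names, and the default values are assumptions (learning defaults to on, per the man page text quoted later in this thread).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortFlags:
    """Toy model of the three bridge port flags discussed here."""
    learning: bool = True
    locked: bool = False
    mab: bool = False


def validate(flags: PortFlags) -> PortFlags:
    """Reject MAB unless the port is locked and has learning enabled."""
    if flags.mab and not (flags.locked and flags.learning):
        raise ValueError("MAB requires a locked port with learning enabled")
    return flags
```

A locked port with learning on and MAB off remains valid, which is the pure 802.1X case described above.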
I had missed the detail that dynamic FDB entries will be refreshed only with "learning" on. It makes the picture more complete. Only this is said in "man bridge":
learning on or learning off
    Controls whether a given port will learn MAC addresses from received traffic or not. If learning is off, the bridge will end up flooding any traffic for which it has no FDB entry. By default this flag is on.
Can live with MAB being a separate flag if it comes to that, as long as 'learning' will continue to have its own specific meaning, independent of it (right now that meaning is subtle and undocumented, but makes sense).
Yes, I agree it is subtle.
Regarding hardware offload, I have an idea (needs testing) on how to make mlxsw work in a similar way to mv88e6xxx, that is, in a way that does not involve injecting frames that incurred a miss into the Rx path. If you guys want, I'm willing to take a subset of the patches here, improve the commit messages, make some small changes and submit them along with an mlxsw implementation. My intention is not to discredit anyone (I will keep the original authorship), but to help push this forward and give another example of hardware offload.
[1] https://github.com/westermo/hostapd/commit/10c584b875a63a9e58b0ad39835282545...
[2] https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id...
I think it would be very nice if you could do that. As a middle ground between mv88e6xxx and mlxsw, I can also try to build a setup on ocelot, which should trap frames with MAC SA misses in a similar way to mlxsw, but which also does not sync its FDB with the bridge, similar to mv88e6xxx. (Not sure what to do with dynamic FDB entries.)
Will try to post my patches this week.
If only I could figure out how to configure that hostapd fork (something I have never done before).
Hans, would it be possible to lay out some usage instructions for this fork?
That would be good.