Hi Jason,
Thanks a lot for taking a look!
I'm actually less enthusiastic about this part, but mainly because I haven't looked closely at what the convention for this is. I was wondering though - this adds WGALLOWEDIP_A_FLAGS, which didn't exist before. Shouldn't some upper layer return a relevant value in that case? And even within the existing flags, for WGPEER_A_FLAGS, for example, old kernels check to see if there are new flags, for this purpose, e.g.:
if (attrs[WGPEER_A_FLAGS]) flags = nla_get_u32(attrs[WGPEER_A_FLAGS]); ret = -EOPNOTSUPP; if (flags & ~__WGPEER_F_ALL) goto out;
So I think we might be able to avoid having to bump the version number. GENL is supposed to be extensible like this.
I think the challenge with WGALLOWEDIP_A_FLAGS in particular is that because it didn't exist since the beginning like WGPEER_A_FLAGS, there are kernels out there that have no knowledge of it and wouldn't have this check in place. While I think it's a good idea to replicate this check for WGALLOWEDIP_A_FLAGS as well for future compatibility, we still need some way for clients to probe whether or not this feature is supported in case they're running on an older kernel. If we want to keep the version number as-is, I see a few alternatives:
1. Include WGALLOWEDIP_A_FLAGS in the response of WG_CMD_GET_DEVICE. This would be a new field inside each entry of WGPEER_A_ALLOWEDIPS. If the result of WG_CMD_GET_DEVICE contains WGALLOWEDIP_A_FLAGS then a client knows that it can use features WGALLOWEDIP_F_REMOVE_ME. It could potentially be used later to contain persistent flags for an IP, but for now would just be zeroed out. This fails if there are no allowed IPs configured for the device you're trying to configure, but in the case where you're trying to use WGALLOWEDIP_F_REMOVE_ME you probably would. 2. Keep everything the same. Don't bump the version number. Clients could in theory try to use WGALLOWEDIP_A_FLAGS with WG_CMD_SET_DEVICE then check if their request had the intended effect (e.g. checking if the IP was removed).
I slightly prefer approach 1 myself, as I feel it's a bit cleaner, but I'm happy to discuss some other approaches here. I'm trying to think about how these probes could be used inside the WireGuard Go library and wg itself. Assuming approach one,
* For libraries that manage WireGuard like golang.zx2c4.com/wireguard/wgctrl the first time a client uses .Device() (WG_CMD_GET_DEVICE under the hood) there would need to be some logic that looks at WGPEER_A_ALLOWEDIPS for one of the peers and sets some internal flag like c.supportsAllowedIPFlags. When a client later calls c.ConfigureDevice() the library would disallow direct IP removal if the feature is not supported (c.supportsAllowedIPFlags != nil && !*c.supportsAllowedIPFlags). Alternatively, you could do all this up front while initializing the client by creating some dummy device + peer, adding some IP, then using WG_CMD_GET_DEVICE to see if WGALLOWEDIP_A_FLAGS is present in the result. * Similarly for wg, if the user is trying to remove an allowed IP you'd first query the allowed IP for a peer and check for WGALLOWEDIP_A_FLAGS if it doesn't exist in the response then the command would fail and print something like "not supported".
Bumping WG_GENL_VERSION is cleaner still, since clients in userspace just need to check an integer value and avoid any probing logic. However, like you I am unsure what the convention is here and whether or not this has any unintended effects.
This file doesn't really do the _ prefix thing anywhere. Can you call this something more descriptive, like "remove_node"?
Sure. I'll update this in v3.
Reasoning for this is that it copies the logic in add()?
As for the cidr > bits check, the intent was simply to fail if the user passes an invalid value for WGALLOWEDIP_A_CIDR_MASK. Although, unlike the add() case, I suppose we could just remove this check. Worst case if a user passes something high like 255 remove() would just be a no-op. It looks safe to remove !peer check as well. I can update this in v3.
What's the reasoning behind returning success when it can't find the node? Because in that case it's already removed so the function is idempotent? And you checked that nothing really cares about the return value there anyway? Or is this a mistake and you meant to return something else? I can imagine good reasoning in either direction; I'd just like to learn that your choice is deliberate.
Yes, I opted for idempotence here intentionally. At least for the use cases I have in mind the intent is basically "remove all these allowed IPs from this peer if they exist". If an allowed IP already got removed or is mapped to another peer somehow then that's fine. I'd imagine it complicates the code in userspace to have to deal with corner cases involving possible error codes returned when removing a batch of allowed IPs. You'd need to query the device again, reform your request, etc. I /think/ add() is idempotent as well in cases where you re-add an IP to a peer, so there's some symmetry here. Perhaps if there are use cases that need more stricter checks in the future we could introduce a new flag to return an error code in this case.
As I mentioned above, you need to do the dance of:
ret = -EOPNOTSUPP; if (flags & ~__WGALLOWEDIP_F_ALL) goto out;
So that we can safely extend this later.
Good idea. I will add this in v3 as well.
We get 100 chars now, so you can rewrite this as:
ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, &peer->device->device_update_lock);
Nice, will do.
That comma should be a semicolon because what comes after is a complete sentence, and there's no conjunction.
Good point. I think we might actually need a comma /after/ otherwise: "...; otherwise, ...": "WGALLOWEDIP_F_REMOVE_ME if the specified IP should be removed; otherwise, this IP will be added if it is not already present".
I'm not so keen on this, simply because we've been able to do everything else in that script and keeping with the "make sure the userspace tooling" paradigm. There are two options:
Rewrite netns.sh all in C, only depending on libnl or whatever (which I would actually really *love* to see happen). This would change the testing paradigm, but I'd be okay with that if it's done well and all at once.
Add support for this new flag to wg(8) (which I think is necessary anyway for this to land; kernel features and userspace support oughta be posted at once).
Thanks for the patch. I like the feature and I'm happy you posted this.
Option 1 seems like a heavy lift for this patch. I think option 2 makes more sense, as long as there is an understanding that netns.sh needs to be run with the latest and greatest version of wg in order for all tests to pass. Maybe we can add a version check to selectively disable the remove IP tests if wg is on an older version. I agree though that long term option 1 would be nice, as it provides a finer level of control and tests could be run without as many external dependencies. I can get rid of the remove-ip cruft and send two patches to the wireguard mailing list if that works, one for the kernel and one for wireguard-tools.
However, this raises some questions about the wg interface itself which would be used both by netns.sh and end users. Looking at the current interface for wg, the way to configure allowed IPs currently is "wg set". For example,
wg set peer ABCD... allowed-ips 192.168.0.24/32,192.168.0.44/32,192.168.0.88/32
The intended effect is to completely replace the allowed IPs for that peer rather than to make an incremental change. I think we'll need to extend the interface a bit. There are a few directions we can take here:
1. Add a new flag to "wg set" like --incremental in which case it won't use WGPEER_F_REPLACE_ALLOWEDIPS under the hood. Support addition or removal of allowed IPs with a "-" suffix on CIDRs you want to remove (192.168.0.24/32-,192.168.0.44/32-,192.168.0.88/32). 2. Same as 1 but with a new argument name, allowed-ips-diff instead of the --incremental flag. This appears more in line with the current style of wg's arguments.
There are a lot more variations here, so I wanted to get your input here before just picking a direction. Thanks again for the thorough feedback!
-Jordan