PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING.
This scenario was identified while studying DPDK pmd_af_packet_drv. Since sockets are only created during initialization, there is no reason to fail the initialization if a single link is temporarily down.
This patch allows PACKET socket to join a fanout while not RUNNING.
Selftest psock_fanout is extended to test this "fanout while link down" scenario.
Selftest psock_fanout is also extended to test fanout create/join by socket that did not bind or specified a protocol, which carries an implicit bind.
This is the only test that was performed.
Changes:
V03: * psock_fanout: add test for joining fanout with unbound socket. * Test that socket can receive packets before adding it to a fanout match. This is kind of replaces the RUNNING test that was removed. * Initialize po->ifindex in packet_create. To -1 if no protocol is specified and add an explicit initialization to 0 if protocol is specified. * Refactor relevant code in fanout_add within bind_lock, as a sequence of if {} else if {}, in order to reduce indentation of nested if statements and provide specific error codes.
V02: https://lore.kernel.org/netdev/cover.1728382839.git.gur.stavi@huawei.com * psock_fanout: use explicit loopback up/down instead of toggle. * psock_fanout: don't try to restore loopback state on failure. * Rephrase commit message about "leaving a fanout".
V01: https://lore.kernel.org/netdev/cover.1728303615.git.gur.stavi@huawei.com/
Gur Stavi (3): af_packet: allow fanout_add when socket is not RUNNING selftests: net/psock_fanout: socket joins fanout when link is down selftests: net/psock_fanout: unbound socket fanout
net/packet/af_packet.c | 35 ++++++---- tools/testing/selftests/net/psock_fanout.c | 78 +++++++++++++++++++++- 2 files changed, 96 insertions(+), 17 deletions(-)
base-commit: 36efaca9cb28a893cad98f0448c39a8b698859e2
PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING.
This patch allows PACKET socket to join fanout while not RUNNING.
The previous test for RUNNING also implicitly tested that the socket is bound to a device. An explicit test of ifindex was added instead.
Signed-off-by: Gur Stavi gur.stavi@huawei.com --- net/packet/af_packet.c | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index f8942062f776..8137c33ab0fd 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1843,26 +1843,29 @@ static int fanout_add(struct sock *sk, struct fanout_args *args) match->prot_hook.ignore_outgoing = type_flags & PACKET_FANOUT_FLAG_IGNORE_OUTGOING; list_add(&match->list, &fanout_list); } - err = -EINVAL;
spin_lock(&po->bind_lock); - if (packet_sock_flag(po, PACKET_SOCK_RUNNING) && - match->type == type && - match->prot_hook.type == po->prot_hook.type && - match->prot_hook.dev == po->prot_hook.dev) { + if (po->ifindex == -1 || po->num == 0) { + /* Socket can not receive packets */ + err = -ENXIO; + } else if (match->type != type || + match->prot_hook.type != po->prot_hook.type || + match->prot_hook.dev != po->prot_hook.dev) { + /* Joining an existing group, properties must be identical */ + err = -EINVAL; + } else if (refcount_read(&match->sk_ref) >= match->max_num_members) { err = -ENOSPC; - if (refcount_read(&match->sk_ref) < match->max_num_members) { + } else { + /* Paired with packet_setsockopt(PACKET_FANOUT_DATA) */ + WRITE_ONCE(po->fanout, match); + po->rollover = rollover; + rollover = NULL; + refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1); + if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) { __dev_remove_pack(&po->prot_hook); - - /* Paired with packet_setsockopt(PACKET_FANOUT_DATA) */ - WRITE_ONCE(po->fanout, match); - - po->rollover = rollover; - rollover = NULL; - refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1); __fanout_link(sk, po); - err = 0; } + err = 0; } spin_unlock(&po->bind_lock);
@@ -3452,8 +3455,12 @@ static int packet_create(struct net *net, struct socket *sock, int protocol, po->prot_hook.af_packet_net = sock_net(sk);
if (proto) { + /* Implicitly bind socket to "any interface" */ + po->ifindex = 0; po->prot_hook.type = proto; __register_prot_hook(sk); + } else { + po->ifindex = -1; }
mutex_lock(&net->packet.sklist_lock);
Subject: [PATCH net-next v03 1/3] af_packet: allow fanout_add when socket is not RUNNING
PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING.
This patch allows PACKET socket to join fanout while not RUNNING.
The previous test for RUNNING also implicitly tested that the socket is bound to a device. An explicit test of ifindex was added instead.
I had some mess up with the send-email. I prepared the following more complete git comment but sent the wrong patch. The code is still the same.
--- PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING.
This patch allows PACKET socket to join fanout while not RUNNING.
Socket can be RUNNING if it has a specified protocol. Either directly from packet_create (being implicitly bound to any interface) or following a successful bind. Socket RUNNING state is switched off if it is bound to an interface that went down.
Instead of the test for RUNNING, this patch adds a test that socket can receive packets before allowing it into a fanout group. Socket can receive packets if it has a configured protocol and is bound to an interface or bound to any interface.
The only difference between the previous test and the current test is that the bound interface is allowed to have IFF_UP cleared. ---
Gur Stavi wrote:
PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING.
This patch allows PACKET socket to join fanout while not RUNNING.
The previous test for RUNNING also implicitly tested that the socket is bound to a device. An explicit test of ifindex was added instead.
Signed-off-by: Gur Stavi gur.stavi@huawei.com
net/packet/af_packet.c | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index f8942062f776..8137c33ab0fd 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1843,26 +1843,29 @@ static int fanout_add(struct sock *sk, struct fanout_args *args) match->prot_hook.ignore_outgoing = type_flags & PACKET_FANOUT_FLAG_IGNORE_OUTGOING; list_add(&match->list, &fanout_list); }
- err = -EINVAL;
spin_lock(&po->bind_lock);
- if (packet_sock_flag(po, PACKET_SOCK_RUNNING) &&
match->type == type &&
match->prot_hook.type == po->prot_hook.type &&
match->prot_hook.dev == po->prot_hook.dev) {
- if (po->ifindex == -1 || po->num == 0) {
This patch is more complex than it needs to be.
No need to block the case of ETH_P_NONE or not bound to a socket.
I would have discussed that in v2, but you already respun.
/* Socket can not receive packets */
err = -ENXIO;
- } else if (match->type != type ||
match->prot_hook.type != po->prot_hook.type ||
match->prot_hook.dev != po->prot_hook.dev) {
/* Joining an existing group, properties must be identical */
err = -EINVAL;
- } else if (refcount_read(&match->sk_ref) >= match->max_num_members) { err = -ENOSPC;
if (refcount_read(&match->sk_ref) < match->max_num_members) {
- } else {
/* Paired with packet_setsockopt(PACKET_FANOUT_DATA) */
WRITE_ONCE(po->fanout, match);
po->rollover = rollover;
rollover = NULL;
refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1);
if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) { __dev_remove_pack(&po->prot_hook);
/* Paired with packet_setsockopt(PACKET_FANOUT_DATA) */
WRITE_ONCE(po->fanout, match);
po->rollover = rollover;
rollover = NULL;
refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1); __fanout_link(sk, po);
}err = 0;
} spin_unlock(&po->bind_lock);err = 0;
@@ -3452,8 +3455,12 @@ static int packet_create(struct net *net, struct socket *sock, int protocol, po->prot_hook.af_packet_net = sock_net(sk); if (proto) {
/* Implicitly bind socket to "any interface" */
po->prot_hook.type = proto; __register_prot_hook(sk);po->ifindex = 0;
- } else {
}po->ifindex = -1;
mutex_lock(&net->packet.sklist_lock); -- 2.45.2
Gur Stavi wrote:
PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING.
This patch allows PACKET socket to join fanout while not RUNNING.
The previous test for RUNNING also implicitly tested that the socket is bound to a device. An explicit test of ifindex was added instead.
Signed-off-by: Gur Stavi gur.stavi@huawei.com
net/packet/af_packet.c | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index f8942062f776..8137c33ab0fd 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1843,26 +1843,29 @@ static int fanout_add(struct sock *sk, struct
fanout_args *args)
match->prot_hook.ignore_outgoing = type_flags &
PACKET_FANOUT_FLAG_IGNORE_OUTGOING;
list_add(&match->list, &fanout_list);
}
err = -EINVAL;
spin_lock(&po->bind_lock);
if (packet_sock_flag(po, PACKET_SOCK_RUNNING) &&
match->type == type &&
match->prot_hook.type == po->prot_hook.type &&
match->prot_hook.dev == po->prot_hook.dev) {
- if (po->ifindex == -1 || po->num == 0) {
This patch is more complex than it needs to be.
No need to block the case of ETH_P_NONE or not bound to a socket.
ETH_P_NONE was blocked before as well. packet_do_bind will not switch socket to RUNNING when proto is 0.
if (proto == 0 || !need_rehook) goto out_unlock;
Same for packet_create.
So the old condition could only pass the RUNNING condition if proto was non-zero. The new condition is exactly equivalent except for allowing IFF_UP to be cleared in the bound device.
Yes, the same result could be achieved with a FEW less line changes but I think that the new logic is more readable where every clause explains itself with a comment instead of constructing one large if statement. And since the solution does add another nested if for the RUNNING the extra indentation started to look ugly.
I would have discussed that in v2, but you already respun.
/* Socket can not receive packets */
err = -ENXIO;
- } else if (match->type != type ||
match->prot_hook.type != po->prot_hook.type ||
match->prot_hook.dev != po->prot_hook.dev) {
/* Joining an existing group, properties must be identical */
err = -EINVAL;
- } else if (refcount_read(&match->sk_ref) >= match->max_num_members)
{
err = -ENOSPC;
if (refcount_read(&match->sk_ref) < match->max_num_members) {
- } else {
/* Paired with packet_setsockopt(PACKET_FANOUT_DATA) */
WRITE_ONCE(po->fanout, match);
po->rollover = rollover;
rollover = NULL;
refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) +
1);
if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) { __dev_remove_pack(&po->prot_hook);
/* Paired with packet_setsockopt(PACKET_FANOUT_DATA) */
WRITE_ONCE(po->fanout, match);
po->rollover = rollover;
rollover = NULL;
refcount_set(&match->sk_ref, refcount_read(&match-
sk_ref) + 1); __fanout_link(sk, po);
}err = 0;
} spin_unlock(&po->bind_lock);err = 0;
@@ -3452,8 +3455,12 @@ static int packet_create(struct net *net, struct
socket *sock, int protocol,
po->prot_hook.af_packet_net = sock_net(sk);
if (proto) {
/* Implicitly bind socket to "any interface" */
po->ifindex = 0;
po->prot_hook.type = proto; __register_prot_hook(sk);
} else {
po->ifindex = -1;
}
mutex_lock(&net->packet.sklist_lock);
-- 2.45.2
Modify test_control_group to have toggle parameter. When toggle is non-zero, loopback device will be set down for the initialization of fd[1] which is still expected to successfully join the fanout.
Signed-off-by: Gur Stavi gur.stavi@huawei.com --- tools/testing/selftests/net/psock_fanout.c | 42 ++++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c index 4f31e92ebd96..acdfae8f8a9a 100644 --- a/tools/testing/selftests/net/psock_fanout.c +++ b/tools/testing/selftests/net/psock_fanout.c @@ -48,6 +48,7 @@ #include <string.h> #include <sys/mman.h> #include <sys/socket.h> +#include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <unistd.h> @@ -59,6 +60,33 @@
static uint32_t cfg_max_num_members;
+static void loopback_set_up_down(int state_up) +{ + struct ifreq ifreq = {}; + int fd, err; + + fd = socket(AF_PACKET, SOCK_RAW, 0); + if (fd < 0) { + perror("socket loopback"); + exit(1); + } + strcpy(ifreq.ifr_name, "lo"); + err = ioctl(fd, SIOCGIFFLAGS, &ifreq); + if (err) { + perror("SIOCGIFFLAGS"); + exit(1); + } + if (state_up != !!(ifreq.ifr_flags & IFF_UP)) { + ifreq.ifr_flags ^= IFF_UP; + err = ioctl(fd, SIOCSIFFLAGS, &ifreq); + if (err) { + perror("SIOCSIFFLAGS"); + exit(1); + } + } + close(fd); +} + /* Open a socket in a given fanout mode. * @return -1 if mode is bad, a valid socket otherwise */ static int sock_fanout_open(uint16_t typeflags, uint16_t group_id) @@ -264,17 +292,22 @@ static void test_control_single(void) }
/* Test illegal group with different modes or flags */ -static void test_control_group(void) +static void test_control_group(int toggle) { int fds[2];
- fprintf(stderr, "test: control multiple sockets\n"); + if (toggle) + fprintf(stderr, "test: control multiple sockets with link down toggle\n"); + else + fprintf(stderr, "test: control multiple sockets\n");
fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 0); if (fds[0] == -1) { fprintf(stderr, "ERROR: failed to open HASH socket\n"); exit(1); } + if (toggle) + loopback_set_up_down(0); if (sock_fanout_open(PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG, 0) != -1) { fprintf(stderr, "ERROR: joined group with wrong flag defrag\n"); @@ -294,6 +327,8 @@ static void test_control_group(void) fprintf(stderr, "ERROR: failed to join group\n"); exit(1); } + if (toggle) + loopback_set_up_down(1); if (close(fds[1]) || close(fds[0])) { fprintf(stderr, "ERROR: closing sockets\n"); exit(1); @@ -489,7 +524,8 @@ int main(int argc, char **argv) int port_off = 2, tries = 20, ret;
test_control_single(); - test_control_group(); + test_control_group(0); + test_control_group(1); test_control_group_max_num_members(); test_unique_fanout_group_ids();
Add a test that validates that an unbound packet socket cannot create/join a fanout group.
Signed-off-by: Gur Stavi gur.stavi@huawei.com --- tools/testing/selftests/net/psock_fanout.c | 36 ++++++++++++++++++++++ 1 file changed, 36 insertions(+)
diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c index acdfae8f8a9a..84c524357075 100644 --- a/tools/testing/selftests/net/psock_fanout.c +++ b/tools/testing/selftests/net/psock_fanout.c @@ -279,6 +279,41 @@ static int sock_fanout_read(int fds[], char *rings[], const int expect[]) return 0; }
+/* Test that creating/joining a fanout group fails for unbound socket without + * a specified protocol + */ +static void test_unbound_fanout(void) +{ + int val, fd0, fd1, err; + + fprintf(stderr, "test: unbound fanout\n"); + fd0 = socket(PF_PACKET, SOCK_RAW, 0); + if (fd0 < 0) { + perror("socket packet"); + exit(1); + } + /* Try to create a new fanout group. Should fail. */ + val = (PACKET_FANOUT_HASH << 16) | 1; + err = setsockopt(fd0, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val)); + if (!err) { + fprintf(stderr, "ERROR: unbound socket fanout create\n"); + exit(1); + } + fd1 = sock_fanout_open(PACKET_FANOUT_HASH, 1); + if (fd1 == -1) { + fprintf(stderr, "ERROR: failed to open HASH socket\n"); + exit(1); + } + /* Try to join an existing fanout group. Should fail. */ + err = setsockopt(fd0, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val)); + if (!err) { + fprintf(stderr, "ERROR: unbound socket fanout join\n"); + exit(1); + } + close(fd0); + close(fd1); +} + /* Test illegal mode + flag combination */ static void test_control_single(void) { @@ -523,6 +558,7 @@ int main(int argc, char **argv) const int expect_uniqueid[2][2] = { { 20, 20}, { 20, 20 } }; int port_off = 2, tries = 20, ret;
+ test_unbound_fanout(); test_control_single(); test_control_group(0); test_control_group(1);
linux-kselftest-mirror@lists.linaro.org