Hello Cassio,
On 10/06/2025 21:31:48+0100, Cassio Neri wrote:
> Hi all,
>
> Although untested, I'm pretty sure that with very small changes, the
> previous revision (1d1bb12) can handle dates prior to 1970-01-01 with no
> need to add extra branches or arithmetic operations. Indeed, 1d1bb12
> contains:
>
> <code>
> /* time must be positive */
> days = div_s64_rem(time, 86400, &secs);
>
> /* day of the week, 1970-01-01 was a Thursday */
> tm->tm_wday = (days + 4) % 7;
>
> /* long comments */
>
> udays = ((u32) days) + 719468;
> </code>
>
> This could have been changed to:
>
> <code>
> /* time must be >= -719468 * 86400 which corresponds to 0000-03-01 */
> udays = div_u64_rem(time + 719468 * 86400, 86400, &secs);
>
> /* day of the week, 0000-03-01 was a Wednesday (in the proleptic Gregorian
> calendar) */
> tm->tm_wday = (days + 3) % 7;
>
> /* long comments */
> </code>
>
> Indeed, the addition of 719468 * 86400 to `time` makes `days` to be 719468
> more than it should be. Therefore, in the calculation of `udays`, the
> addition of 719468 becomes unnecessary and thus, `udays == days`. Moreover,
> this means that `days` can be removed altogether and replaced by `udays`.
> (Not the other way around because in the remaining code `udays` must be
> u32.)
>
> Now, 719468 % 7 = 1 and thus tm->wday is 1 day after what it should be and
> we correct that by adding 3 instead of 4.
>
> Therefore, I suggest these changes on top of 1d1bb12 instead of those made
> in 7df4cfe. Since you're working on this, can I please kindly suggest two
> other changes?
>
> 1) Change the reference provided in the long comment. It should say, "The
> following algorithm is, basically, Figure 12 of Neri and Schneider [1]" and
> [1] should refer to the published article:
>
> Neri C, Schneider L. Euclidean affine functions and their application to
> calendar algorithms. Softw Pract Exper. 2023;53(4):937-970. doi:
> 10.1002/spe.3172
> https://doi.org/10.1002/spe.3172
>
> The article is much better written and clearer than the pre-print currently
> referred to.
>
Thanks for your input, I wanted to look again at your paper and make those
optimizations which is why I took so long to review the original patch.
Unfortunately, I didn't have the time before the merge window.
I would also gladly take patches for this if you are up for the task.
> 2) Function rtc_time64_to_tm_test_date_range in drivers/rtc/lib_test.c, is
> a kunit test that checks the result for everyday in a 160000 years range
> starting at 1970-01-01. It'd be nice if this test is adapted to the new
> code and starts at 1900-01-01 (technically, it could start at 0000-03-01
> but since tm->year counts from 1900, it would be weird to see tm->year ==
> -1900 to mean that the calendar year is 0.) Also 160000 is definitely an
> overkill (my bad!) and a couple of thousands of years, say 3000, should be
> more than safe for anyone. :-)
This is also something on my radar as some have been complaining about the time
it takes to run those tests.
>
> Many thanks,
> Cassio.
>
>
>
> On Tue, 10 Jun 2025 at 08:35, Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
> wrote:
>
> > From: Alexandre Mergnat <amergnat(a)baylibre.com>
> >
> > commit 7df4cfef8b351fec3156160bedfc7d6d29de4cce upstream.
> >
> > Conversion of dates before 1970 is still relevant today because these
> > dates are reused on some hardwares to store dates bigger than the
> > maximal date that is representable in the device's native format.
> > This prominently and very soon affects the hardware covered by the
> > rtc-mt6397 driver that can only natively store dates in the interval
> > 1900-01-01 up to 2027-12-31. So to store the date 2028-01-01 00:00:00
> > to such a device, rtc_time64_to_tm() must do the right thing for
> > time=-2208988800.
> >
> > Signed-off-by: Alexandre Mergnat <amergnat(a)baylibre.com>
> > Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
> > Link:
> > https://lore.kernel.org/r/20250428-enable-rtc-v4-1-2b2f7e3f9349@baylibre.com
> > Signed-off-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
> > ---
> > drivers/rtc/lib.c | 24 +++++++++++++++++++-----
> > 1 file changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/rtc/lib.c b/drivers/rtc/lib.c
> > index fe361652727a..13b5b1f20465 100644
> > --- a/drivers/rtc/lib.c
> > +++ b/drivers/rtc/lib.c
> > @@ -46,24 +46,38 @@ EXPORT_SYMBOL(rtc_year_days);
> > * rtc_time64_to_tm - converts time64_t to rtc_time.
> > *
> > * @time: The number of seconds since 01-01-1970 00:00:00.
> > - * (Must be positive.)
> > + * Works for values since at least 1900
> > * @tm: Pointer to the struct rtc_time.
> > */
> > void rtc_time64_to_tm(time64_t time, struct rtc_time *tm)
> > {
> > - unsigned int secs;
> > - int days;
> > + int days, secs;
> >
> > u64 u64tmp;
> > u32 u32tmp, udays, century, day_of_century, year_of_century, year,
> > day_of_year, month, day;
> > bool is_Jan_or_Feb, is_leap_year;
> >
> > - /* time must be positive */
> > + /*
> > + * Get days and seconds while preserving the sign to
> > + * handle negative time values (dates before 1970-01-01)
> > + */
> > days = div_s64_rem(time, 86400, &secs);
> >
> > + /*
> > + * We need 0 <= secs < 86400 which isn't given for negative
> > + * values of time. Fixup accordingly.
> > + */
> > + if (secs < 0) {
> > + days -= 1;
> > + secs += 86400;
> > + }
> > +
> > /* day of the week, 1970-01-01 was a Thursday */
> > tm->tm_wday = (days + 4) % 7;
> > + /* Ensure tm_wday is always positive */
> > + if (tm->tm_wday < 0)
> > + tm->tm_wday += 7;
> >
> > /*
> > * The following algorithm is, basically, Proposition 6.3 of Neri
> > @@ -93,7 +107,7 @@ void rtc_time64_to_tm(time64_t time, struct rtc_time
> > *tm)
> > * thus, is slightly different from [1].
> > */
> >
> > - udays = ((u32) days) + 719468;
> > + udays = days + 719468;
> >
> > u32tmp = 4 * udays + 3;
> > century = u32tmp / 146097;
> > --
> > 2.49.0
> >
> >
This reverts commit e70c301faece15b618e54b613b1fd6ece3dd05b4.
Commit <e70c301faece> ("block: don't reorder requests in
blk_add_rq_to_plug") reversed how requests are stored in the blk_plug
list, this had significant impact on bio merging with requests exist on
the plug list. This impact has been reported in [1] and could easily be
reproducible using 4k randwrite fio benchmark on an NVME based SSD without
having any filesystem on the disk.
My benchmark is:
fio --time_based --name=benchmark --size=50G --rw=randwrite \
--runtime=60 --filename="/dev/nvme1n1" --ioengine=psync \
--randrepeat=0 --iodepth=1 --fsync=64 --invalidate=1 \
--verify=0 --verify_fatal=0 --blocksize=4k --numjobs=4 \
--group_reporting
On 1.9TiB SSD(180K Max IOPS) attached to i3.16xlarge AWS EC2 instance.
Kernel | fio (B.W MiB/sec) | I/O size (iostat)
--------------+---------------------+--------------------
6.15.1 | 362 | 2KiB
6.15.1+revert | 660 (+82%) | 4KiB
--------------+---------------------+--------------------
I have run iostat while the fio benchmark was running and was able to
see that the I/O size seen on the disk is shown as 2KB without this revert
while it's 4KB with the revert. In the bad case the write bandwidth
is capped at around 362MiB/sec which almost 2KiB * 180K IOPS so we are
hitting the SSD Disk IOPS limit which is 180K. After the revert the I/O
size has been doubled to 4KiB hence the bandwidth has been almost doubled
as we no longer hit the Disk IOPS limit.
I have done some tracing using bpftrace & bcc and was able to conclude that
the reason behind the I/O size discrepancy with the revert is that this
fio benchmark is subimitting each 4k I/O as 2 contiguous 2KB bios.
In the good case each 2 bios are merged in a 4KB request that's then been
submitted to the disk while in the bad case 2K bios are submitted to the
disk without merging because blk_attempt_plug_merge() failed to merge
them as seen below.
**Without the revert**
[12:12:28]
r::blk_attempt_plug_merge():int:$retval
COUNT EVENT
5618 $retval = 1
176578 $retval = 0
**With the revert**
[12:11:43]
r::blk_attempt_plug_merge():int:$retval
COUNT EVENT
146684 $retval = 0
146686 $retval = 1
In blk_attempt_plug_merge() we are iterating ithrought the plug list
from head to tail looking for a request with which we can merge the
most recently submitted bio.
With commit <e70c301faece> ("block: don't reorder requests in
blk_add_rq_to_plug") the most recent request will be at the tail so
blk_attempt_plug_merge() will fail because it tries to merge bio with
the plug list head. In blk_attempt_plug_merge() we don't iterate across
the whole plug list because as we exit the loop once we fail merging in
blk_attempt_bio_merge().
In commit <bc490f81731> ("block: change plugging to use a singly linked
list") the plug list has been changed to single linked list so there's
no way to iterate the list from tail to head which is the only way to
mitigate the impact on bio merging if we want to keep commit <e70c301faece>
("block: don't reorder requests in blk_add_rq_to_plug").
Given that moving plug list to a single linked list was mainly for
performance reason then let's revert commit <e70c301faece> ("block: don't
reorder requests in blk_add_rq_to_plug") for now to mitigate the
reported performance regression.
[1] https://lore.kernel.org/lkml/202412122112.ca47bcec-lkp@intel.com/
Cc: stable(a)vger.kernel.org # 6.12
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Reported-by: Hagar Hemdan <hagarhem(a)amazon.com>
Reported-and-bisected-by: Shaoying Xu <shaoyi(a)amazon.com>
Signed-off-by: Hazem Mohamed Abuelfotoh <abuehaze(a)amazon.com>
---
block/blk-mq.c | 4 ++--
drivers/block/virtio_blk.c | 2 +-
drivers/nvme/host/pci.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c2697db59109..28965cac19fb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1394,7 +1394,7 @@ static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
*/
if (!plug->has_elevator && (rq->rq_flags & RQF_SCHED_TAGS))
plug->has_elevator = true;
- rq_list_add_tail(&plug->mq_list, rq);
+ rq_list_add_head(&plug->mq_list, rq);
plug->rq_count++;
}
@@ -2846,7 +2846,7 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
rq_list_add_tail(&requeue_list, rq);
continue;
}
- list_add_tail(&rq->queuelist, &list);
+ list_add(&rq->queuelist, &list);
depth++;
} while (!rq_list_empty(&plug->mq_list));
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 7cffea01d868..7992a171f905 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -513,7 +513,7 @@ static void virtio_queue_rqs(struct rq_list *rqlist)
vq = this_vq;
if (virtblk_prep_rq_batch(req))
- rq_list_add_tail(&submit_list, req);
+ rq_list_add_head(&submit_list, req); /* reverse order */
else
rq_list_add_tail(&requeue_list, req);
}
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f1dd804151b1..5f7da42f9dac 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1026,7 +1026,7 @@ static void nvme_queue_rqs(struct rq_list *rqlist)
nvmeq = req->mq_hctx->driver_data;
if (nvme_prep_rq_batch(nvmeq, req))
- rq_list_add_tail(&submit_list, req);
+ rq_list_add_head(&submit_list, req); /* reverse order */
else
rq_list_add_tail(&requeue_list, req);
}
--
2.47.1
From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
When requesting pins whose intr_detection_width setting is not 1 or 2
for interrupts (for example by running `gpiomon -c 0 113` on RB2), we'll
hit a BUG() in msm_gpio_irq_set_type(). Potentially crashing the kernel
due to an invalid request from user-space is not optimal, so let's go
through the pins and mark those that would fail the check as invalid for
the irq chip as we should not even register them as available irqs.
This function can be extended if we determine that there are more
corner-cases like this.
Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
drivers/pinctrl/qcom/pinctrl-msm.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index f012ea88aa22c..77e0c2f023455 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -1038,6 +1038,24 @@ static bool msm_gpio_needs_dual_edge_parent_workaround(struct irq_data *d,
test_bit(d->hwirq, pctrl->skip_wake_irqs);
}
+static void msm_gpio_irq_init_valid_mask(struct gpio_chip *gc,
+ unsigned long *valid_mask,
+ unsigned int ngpios)
+{
+ struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
+ const struct msm_pingroup *g;
+ int i;
+
+ bitmap_fill(valid_mask, ngpios);
+
+ for (i = 0; i < ngpios; i++) {
+ g = &pctrl->soc->groups[i];
+ if (g->intr_detection_width != 1 &&
+ g->intr_detection_width != 2)
+ clear_bit(i, valid_mask);
+ }
+}
+
static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -1441,6 +1459,7 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
girq->default_type = IRQ_TYPE_NONE;
girq->handler = handle_bad_irq;
girq->parents[0] = pctrl->irq;
+ girq->init_valid_mask = msm_gpio_irq_init_valid_mask;
ret = gpiochip_add_data(&pctrl->chip, pctrl);
if (ret) {
--
2.48.1
This series has a bug fix for the early bootconsole as well as some
related efficiency improvements and cleanup.
The relevant code is subject to CONSOLE_DEBUG, which is presently only
used with CONFIG_MAC. To test this series (in qemu-system-m68k, for example)
it's helpful to enable CONFIG_EARLY_PRINTK and
CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER and boot with
kernel parameters 'console=ttyS0 earlyprintk keep_bootcon'.
---
Changed since v1:
- Solved problem with line wrap while scrolling.
- Added two additional patches.
Changed since v2:
- Adopted addq and subq as suggested by Andreas.
Finn Thain (3):
m68k: Fix lost column on framebuffer debug console
m68k: Avoid pointless recursion in debug console rendering
m68k: Remove unused "cursor home" code from debug console
arch/m68k/kernel/head.S | 73 +++++++++++++++++++++--------------------
1 file changed, 37 insertions(+), 36 deletions(-)
--
2.45.3
From: Dan Aloni <dan.aloni(a)vastdata.com>
[ Upstream commit a9c10b5b3b67b3750a10c8b089b2e05f5e176e33 ]
If there are failures then we must not leave the non-NULL pointers with
the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries
free them, resulting in an Oops.
Signed-off-by: Dan Aloni <dan.aloni(a)vastdata.com>
Acked-by: Chuck Lever <chuck.lever(a)oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
(cherry picked from commit a9c10b5b3b67b3750a10c8b089b2e05f5e176e33)
[Larry: backport to 5.4.y. Minor conflict resolved due to missing commit 93aa8e0a9de80
xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep]
Signed-off-by: Larry Bassel <larry.bassel(a)oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cfae1a871578..4fd3f632a2af 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -525,6 +525,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
IB_POLL_WORKQUEUE);
if (IS_ERR(sendcq)) {
rc = PTR_ERR(sendcq);
+ sendcq = NULL;
goto out1;
}
@@ -533,6 +534,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
IB_POLL_WORKQUEUE);
if (IS_ERR(recvcq)) {
rc = PTR_ERR(recvcq);
+ recvcq = NULL;
goto out2;
}
--
2.46.0
Hi Kees,
Bill's PR to disable __counted_by for "whole struct" __bdos cases has now
been merged into 19.1.3 [1], so here's the patch to disable __counted_by
for clang versions < 19.1.3 in the kernel.
Hopefully in the near future __counted_by for whole struct __bdos can be
enabled once again in coordination between the kernel, gcc, and clang.
There has been recent progress on this in [2] thanks to Tavian.
Also see previous discussion on the mailing list [3]
Thanks to everyone for moving this issue along. In particular, Bill for
his PR to clang/llvm, Kees and Thorsten for reproducers of the two issues,
Nathan for Kconfig-ifying this patch, and Miguel for reviewing.
Info for the stable team:
This patch should be backported to kernels >= 6.6 to make sure that those
build correctly with the effected clang versions. This patch cherry-picks
cleanly onto linux-6.11.y. For linux-6.6.y three prerequiste commits are
neded:
16c31dd7fdf6: Compiler Attributes: counted_by: bump min gcc version
2993eb7a8d34: Compiler Attributes: counted_by: fixup clang URL
231dc3f0c936: lkdtm/bugs: Improve warning message for compilers without counted_by support
There are still two merge conflicts even with those prerequistes.
Here's the correct resolution:
1. include/linux/compiler_types.h:
use the incoming change until before (but not including) the
"Apply __counted_by() when the Endianness matches to increase test coverage."
comment
2. lib/overflow_kunit.c:
HEAD is correct
[1] https://github.com/llvm/llvm-project/pull/112786
[2] https://github.com/llvm/llvm-project/pull/112636
[3] https://lore.kernel.org/lkml/3E304FB2-799D-478F-889A-CDFC1A52DCD8@toblux.co…
Best Regards
Jan
Jan Hendrik Farr (1):
Compiler Attributes: disable __counted_by for clang < 19.1.3
drivers/misc/lkdtm/bugs.c | 2 +-
include/linux/compiler_attributes.h | 13 -------------
include/linux/compiler_types.h | 19 +++++++++++++++++++
init/Kconfig | 9 +++++++++
lib/overflow_kunit.c | 2 +-
5 files changed, 30 insertions(+), 15 deletions(-)
--
2.47.0
Hi Greg,
Please cherry-pick this patch series into 5.10.y stable. It
includes a feature that fixes CVE-2022-0500 which allows a user with
cap_bpf privileges to get root privileges. The patch that fixes
the bug is
patch 6/8: bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM
The rest are the depedences required by the fix patch.
This patchset has been merged in mainline v5.17 and backported to v5.16[1]
and v5.15[2]
Tested by compile, build and run through the bpf selftest test_progs.
Before:
./test_progs -t ksyms_btf/write_check
test_ksyms_btf:PASS:btf_exists 0 nsec
test_write_check:FAIL:skel_open unexpected load of a prog writing to ksym memory
#44/3 write_check:FAIL
#44 ksyms_btf:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 2 FAILED
After:
./test_progs -t ksyms_btf/write_check
#44/3 write_check:OK
#44 ksyms_btf:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
[1] https://lore.kernel.org/all/Yg6cixLJFoxDmp+I@kroah.com/
[2] https://lore.kernel.org/all/Ymupcl2JshcWjmMD@kroah.com/
Hao Luo (8):
bpf: Introduce composable reg, ret and arg types.
bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL
bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL
bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL
bpf: Introduce MEM_RDONLY flag
bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.
bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
bpf/selftests: Test PTR_TO_RDONLY_MEM
include/linux/bpf.h | 98 +++-
include/linux/bpf_verifier.h | 18 +
kernel/bpf/btf.c | 8 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/helpers.c | 10 +-
kernel/bpf/map_iter.c | 4 +-
kernel/bpf/ringbuf.c | 2 +-
kernel/bpf/verifier.c | 477 +++++++++---------
kernel/trace/bpf_trace.c | 22 +-
net/core/bpf_sk_storage.c | 2 +-
net/core/filter.c | 62 +--
net/core/sock_map.c | 2 +-
.../selftests/bpf/prog_tests/ksyms_btf.c | 14 +
.../bpf/progs/test_ksyms_btf_write_check.c | 29 ++
14 files changed, 441 insertions(+), 309 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf_write_check.c
--
2.47.1
Hello,
this is a followup to
https://lore.kernel.org/stable/cover.1749223334.git.u.kleine-koenig@baylibr…
that handled backporting the two patches by Alexandre to the active
stable kernels between 6.15 and 5.15. Here comes a backport to 5.10.y, git
am handles application to 5.4.y just fine.
Compared to the backport for later kernels I included a major rework of
rtc_time64_to_tm() by Cassio Neri. (FTR: I checked, that commit by
Cassio Neri isn't the reason we need to fix rtc_time64_to_tm(), the
actual problem is older.)
Now that I completed the backport and did some final checks on it I
noticed that the problem fixed here is (TTBOMK) a theoretic one because
only drivers with .start_secs < 0 are known to have issues and in 5.10
and before there is no such driver. I'm uncertain if this should result
in not backporting the changes. I would tend to pick them anyhow, but
I won't argue on a veto.
Best regards
Uwe
Alexandre Mergnat (2):
rtc: Make rtc_time64_to_tm() support dates before 1970
rtc: Fix offset calculation for .start_secs < 0
Cassio Neri (1):
rtc: Improve performance of rtc_time64_to_tm(). Add tests.
drivers/rtc/Kconfig | 10 ++++
drivers/rtc/Makefile | 1 +
drivers/rtc/class.c | 2 +-
drivers/rtc/lib.c | 121 ++++++++++++++++++++++++++++++++---------
drivers/rtc/lib_test.c | 79 +++++++++++++++++++++++++++
5 files changed, 185 insertions(+), 28 deletions(-)
create mode 100644 drivers/rtc/lib_test.c
base-commit: 01e7e36b8606e5d4fddf795938010f7bfa3aa277
--
2.49.0
From: Jakub Kicinski <kuba(a)kernel.org>
commit f22b4b55edb507a2b30981e133b66b642be4d13f upstream.
I find the behavior of xa_for_each_start() slightly counter-intuitive.
It doesn't end the iteration by making the index point after the last
element. IOW calling xa_for_each_start() again after it "finished"
will run the body of the loop for the last valid element, instead
of doing nothing.
This works fine for netlink dumps if they terminate correctly
(i.e. coalesce or carefully handle NLM_DONE), but as we keep getting
reminded legacy dumps are unlikely to go away.
Fixing this generically at the xa_for_each_start() level seems hard -
there is no index reserved for "end of iteration".
ifindexes are 31b wide, tho, and iterator is ulong so for
for_each_netdev_dump() it's safe to go to the next element.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Jeremy Kerr <jk(a)codeconstruct.com.au>
---
The mctp RTM_GETADDR rework backport of acab78ae12c7 ("net: mctp: Don't
access ifa_index when missing") pulled 2d45eeb7d5d7 ("mctp: no longer
rely on net->dev_index_head[]") as a dependency. However, that change
relies on this backport for correct behaviour of for_each_netdev_dump().
Jakub mentions[1] that nothing should be relying on the old behaviour of
for_each_netdev_dump(), hence the backport.
[1]: https://lore.kernel.org/netdev/20250609083749.741c27f5@kernel.org/
This backport is only applicable to 6.6.y; the change hit upstream in
6.10.
---
include/linux/netdevice.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0b0a172337dbac5716e5e5556befd95b4c201f5b..030d9de2ba2d23aa80b4b02182883f022f553964 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3036,7 +3036,8 @@ extern rwlock_t dev_base_lock; /* Device list lock */
#define net_device_entry(lh) list_entry(lh, struct net_device, dev_list)
#define for_each_netdev_dump(net, d, ifindex) \
- xa_for_each_start(&(net)->dev_by_index, (ifindex), (d), (ifindex))
+ for (; (d = xa_find(&(net)->dev_by_index, &ifindex, \
+ ULONG_MAX, XA_PRESENT)); ifindex++)
static inline struct net_device *next_net_device(struct net_device *dev)
{
---
base-commit: c2603c511feb427b2b09f74b57816a81272932a1
change-id: 20250610-nl-dump-618700905d4f
Best regards,
--
Jeremy Kerr <jk(a)codeconstruct.com.au>