Hello my dear,
I am sorry to disturb you and intrude on your privacy. I am
single, lonely and in need of a caring, loving and romantic
companion.
I am a secret admirer and would like to explore the opportunity to
learn more about each other. I know it is strange to contact you
this way and I hope you can forgive me. I am a shy
person and this is the only way I know I can get your
attention. I just want to know what you think, and my intention is not
to offend you. I hope we can be friends if that is what you
want, although I wish to be more than just a friend. I know you have a few
questions to ask and I hope I can satisfy some of
your curiosity with a few answers.
I believe in the saying that to the world you are just one person, but to
someone special you are the world. All I want is love, romantic
care and attention from a special companion, which I hope
would be you.
I hope this message will be the beginning of a long-term
communication between us. Just send a reply to this message; it
will make me happy.
Hugs and kisses,
Marion.
I am sorry to disturb you and invade your privacy. I am single,
lonely and in need of a caring, loving and romantic companion.
I am a secret admirer and would like to explore the opportunity to
learn more about each other. I know it is strange to contact you
this way and I hope you can forgive me. I am a shy person and
this is the only way I know I can get your attention. I simply want
to know what you think, and my intention is not to offend you.
I hope we can be friends if that is what you want, although I would like
to be more than just a friend. I know you have a few questions to
ask and I hope I can satisfy some of your curiosity with a few
answers.
I believe in the saying that "to the world you are just one person,
but to someone special you are the world". All I want is love,
romantic care and attention from a special companion, which I am
hoping would be you.
I hope this message will be the beginning of a long-term
communication between us. Simply send a reply to this message; it
will make me happy.
Kisses and hugs,
Marion.
> This is for pre-6.4 kernels, as scrub code goes through a huge rework.
>
> [BUG]
> Even before the scrub rework, if we have some corrupted metadata failed
> to be repaired during replace, we still continue replace and let it
> finish just as there is nothing wrong:
>
> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5
> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5
> BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752
> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1
> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished
>
> This can lead to unexpected problems for the result fs.
>
> [CAUSE]
> Btrfs reuses scrub code path for dev-replace to iterate all dev extents.
>
> But unlike scrub, dev-replace doesn't really bother to check the scrub
> progress, which records all the errors found during replace.
>
> And even if we checks the progress, we can not really determine which
> errors are minor, which are critical just by the plain numbers.
> (remember we don't treat metadata/data checksum error differently).
>
> This behavior is there from the very beginning.
>
> [FIX]
> Instead of continue the replace, just error out if we hit an unrepaired
> metadata sector.
>
> Now the dev-replace would be rejected with -EIO, to inform the user.
> Although it also means, the fs has some metadata error which can not be
> repaired, the user would be super upset anyway.
If one metadata sector is bad, how much of the data that depends on it is damaged?
As someone who recently had to remove a disk, and is currently replacing one,
I would find it upsetting if the replace stopped because 0.01% of the data is
unrepairable when the other 99.99% could be saved. Could it instead
continue, print an error message to standard output, and give the
user a way to either delete the damaged block or copy it, error included,
to the new disk (with some option like --force-delete or --force-copy)?
For example:
"Metadata at block 5578752 is damaged and unrepaired. Skipping. Read
`man btrfs-replace` for more info."
At the end, if possible, list the files affected by the damaged metadata blocks.
The man page should answer:
How can the user know which files are connected to the metadata?
How can a user decide what to do with the damaged metadata?
At minimum, could the error output include some useful information? Something like:
"Replace has stopped due to reading an unrepaired metadata block; was
working on block 5578752, see `dmesg` for more details. (\s Sorry but
you are currently f..k)"
>
> The new dmesg would look like this:
>
> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5570560 on dev /dev/mapper/test-scratch1 physical 5570560
> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5
> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5
> BTRFS error (device dm-4): stripe 5570560 has unrepaired metadata sector at 5578752
> BTRFS error (device dm-4): btrfs_scrub_dev(/dev/mapper/test-scratch1, 1, /dev/mapper/test-scratch2) failed -5
>
> CC: stable(a)vger.kernel.org
> Signed-off-by: Qu Wenruo <wqu(a)suse.com>
> ---
> I'm not sure how should we merge this patch.
>
> The misc-next is already merging the new scrub code, but the problem is
> there for all old kernels thus we need such fixes.
>
> Maybe we can merge this fix before the scrub rework, then the rework,
> and finally the better fix using reworked interface?
> ---
> fs/btrfs/scrub.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ef4046a2572c..71f64b9bcd9f 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -195,6 +195,7 @@ struct scrub_ctx {
> struct mutex wr_lock;
> struct btrfs_device *wr_tgtdev;
> bool flush_all_writes;
> + bool has_meta_failed;
>
> /*
> * statistics
> @@ -1380,6 +1381,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
> btrfs_err_rl_in_rcu(fs_info,
> "unable to fixup (regular) error at logical %llu on dev %s",
> logical, btrfs_dev_name(dev));
> + if (is_metadata)
> + sctx->has_meta_failed = true;
> }
>
> out:
> @@ -3838,6 +3841,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
>
> blk_finish_plug(&plug);
>
> + /*
> + * If we have metadata unable to be repaired, we should error
> + * out the dev-replace.
> + */
> + if (sctx->is_dev_replace && sctx->has_meta_failed && ret >= 0)
> + ret = -EIO;
> if (sctx->is_dev_replace && ret >= 0) {
> int ret2;
>
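To make the behaviour under discussion concrete, here is a minimal userspace sketch of the flag-and-fail pattern the quoted patch describes: remember that a metadata block could not be repaired, then turn an otherwise successful pass into -EIO when running as dev-replace. The struct and function names (`replace_ctx`, `block_result`, `note_block`, `process_stripe`) are hypothetical stand-ins, not btrfs code; only the final condition mirrors the quoted diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two scrub_ctx fields the patch relies on. */
struct replace_ctx {
	bool is_dev_replace;   /* running as dev-replace rather than plain scrub */
	bool has_meta_failed;  /* set once a metadata block could not be repaired */
};

/* Hypothetical summary of one block visited during the copy. */
struct block_result {
	bool is_metadata;
	bool repaired;
};

/* Record an unrepairable block; only metadata failures set the flag. */
void note_block(struct replace_ctx *ctx, const struct block_result *blk)
{
	if (!blk->repaired && blk->is_metadata)
		ctx->has_meta_failed = true;
}

/* Walk one stripe, then fail the pass with -EIO if metadata was lost. */
int process_stripe(struct replace_ctx *ctx,
		   const struct block_result *blocks, size_t count)
{
	int ret = 0;

	assert(ctx != NULL);
	for (size_t i = 0; i < count; i++)
		note_block(ctx, &blocks[i]);

	/* Same shape as the check added to scrub_stripe() in the quoted diff. */
	if (ctx->is_dev_replace && ctx->has_meta_failed && ret >= 0)
		ret = -EIO;
	return ret;
}
```

In this sketch an unrepaired data block does not set the flag, and a plain scrub (is_dev_replace == false) still returns 0, so only dev-replace is rejected.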
--
Torstein Eide
Torsteine(a)gmail.com
I sent an email to you yesterday but since I did not get a response,
I thought probably you did not receive it so I decided to send it
again and hopefully I will get a response this time around.
I am a secret admirer and would like to explore the opportunity to
learn more about each other. I know it is strange to contact you
this way and I hope you can forgive me. I am a shy person and
this is the only way I know I could get your attention. I just want
to know what you think and my intention is not to offend you.
I hope we can be friends if that is what you want, although I wish
to be more than just a friend. I know you have a few questions to
ask and I hope I can satisfy some of your curiosity with a few
answers.
I believe in the saying that 'to the world you are just one person,
but to someone special you are the world'. All I want is love,
romantic care and attention from a special companion which I am
hoping would be you.
I hope this message will be the beginning of a long term
communication between us, simply send a reply to this message, it
will make me happy.
Hugs and kisses,
Marion.
The patch titled
Subject: relayfs: fix out-of-bounds access in relay_file_read
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
relayfs-fix-out-of-bounds-access-in-relay_file_read.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Zhang Zhengming <zhang.zhengming(a)h3c.com>
Subject: relayfs: fix out-of-bounds access in relay_file_read
Date: Wed, 19 Apr 2023 12:02:03 +0800
There is a crash in relay_file_read, as the variable `from`
points to the end of the last subbuf.
The oops looks something like:
pc : __arch_copy_to_user+0x180/0x310
lr : relay_file_read+0x20c/0x2c8
Call trace:
__arch_copy_to_user+0x180/0x310
full_proxy_read+0x68/0x98
vfs_read+0xb0/0x1d0
ksys_read+0x6c/0xf0
__arm64_sys_read+0x20/0x28
el0_svc_common.constprop.3+0x84/0x108
do_el0_svc+0x74/0x90
el0_svc+0x1c/0x28
el0_sync_handler+0x88/0xb0
el0_sync+0x148/0x180
We get the condition by analyzing the vmcore:
1). The last produced byte and the last consumed byte
are both at the end of the last subbuf
2). A softirq calls a function (e.g. __blk_add_trace)
that writes to the relay buffer while a program is calling
relay_file_read_avail().
relay_file_read
relay_file_read_avail
relay_file_read_consume(buf, 0, 0);
//interrupted by softirq who will write subbuf
....
return 1;
//read_start point to the end of the last subbuf
read_start = relay_file_read_start_pos
//avail is equal to subsize
avail = relay_file_read_subbuf_avail
//from points to an invalid memory address
from = buf->start + read_start
//system is crashed
copy_to_user(buffer, from, avail)
Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com
Fixes: 341a7213e5c1 ("kernel/relay.c: fix read_pos error when multiple readers")
Signed-off-by: Zhang Zhengming <zhang.zhengming(a)h3c.com>
Reviewed-by: Zhao Lei <zhao_lei1(a)hoperun.com>
Reviewed-by: Zhou Kete <zhou.kete(a)h3c.com>
Cc: Pengcheng Yang <yangpc(a)wangsu.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/relay.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/relay.c~relayfs-fix-out-of-bounds-access-in-relay_file_read
+++ a/kernel/relay.c
@@ -989,7 +989,8 @@ static size_t relay_file_read_start_pos(
size_t subbuf_size = buf->chan->subbuf_size;
size_t n_subbufs = buf->chan->n_subbufs;
size_t consumed = buf->subbufs_consumed % n_subbufs;
- size_t read_pos = consumed * subbuf_size + buf->bytes_consumed;
+ size_t read_pos = (consumed * subbuf_size + buf->bytes_consumed)
+ % (n_subbufs * subbuf_size);
read_subbuf = read_pos / subbuf_size;
padding = buf->padding[read_subbuf];
_
Patches currently in -mm which might be from zhang.zhengming(a)h3c.com are
relayfs-fix-out-of-bounds-access-in-relay_file_read.patch
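The arithmetic behind the relayfs fix can be illustrated with a small standalone sketch. It assumes the state described in the report (the last subbuf fully consumed, so bytes_consumed equals subbuf_size) and uses a hypothetical `ring_state` struct in place of the real relay buffer; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of the fields the read-position math uses. */
struct ring_state {
	size_t subbuf_size;
	size_t n_subbufs;
	size_t subbufs_consumed;
	size_t bytes_consumed;
};

/* Pre-patch arithmetic: may return n_subbufs * subbuf_size, one past the end. */
size_t read_pos_unwrapped(const struct ring_state *rs)
{
	assert(rs->n_subbufs > 0 && rs->subbuf_size > 0);
	size_t consumed = rs->subbufs_consumed % rs->n_subbufs;
	return consumed * rs->subbuf_size + rs->bytes_consumed;
}

/* Patched arithmetic: wrap back into [0, n_subbufs * subbuf_size). */
size_t read_pos_wrapped(const struct ring_state *rs)
{
	return read_pos_unwrapped(rs) % (rs->n_subbufs * rs->subbuf_size);
}
```

With four subbufs of 4096 bytes, three subbufs consumed and 4096 bytes consumed in the last one, the unwrapped position is 16384, which is the total buffer size and therefore outside the buffer; the wrapped position is 0.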
The patch titled
Subject: maple_tree: make maple state reusable after mas_empty_area_rev()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-make-maple-state-reusable-after-mas_empty_area_rev.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: make maple state reusable after mas_empty_area_rev()
Date: Fri, 14 Apr 2023 10:57:26 -0400
Stop using maple state min/max for the range by passing through pointers
for those values. This will allow the maple state to be reused without
resetting.
Also add some logic to fail out early on searching with invalid
arguments.
Link: https://lkml.kernel.org/r/20230414145728.4067069-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
--- a/lib/maple_tree.c~maple_tree-make-maple-state-reusable-after-mas_empty_area_rev
+++ a/lib/maple_tree.c
@@ -4965,7 +4965,8 @@ not_found:
* Return: True if found in a leaf, false otherwise.
*
*/
-static bool mas_rev_awalk(struct ma_state *mas, unsigned long size)
+static bool mas_rev_awalk(struct ma_state *mas, unsigned long size,
+ unsigned long *gap_min, unsigned long *gap_max)
{
enum maple_type type = mte_node_type(mas->node);
struct maple_node *node = mas_mn(mas);
@@ -5030,8 +5031,8 @@ static bool mas_rev_awalk(struct ma_stat
if (unlikely(ma_is_leaf(type))) {
mas->offset = offset;
- mas->min = min;
- mas->max = min + gap - 1;
+ *gap_min = min;
+ *gap_max = min + gap - 1;
return true;
}
@@ -5309,6 +5310,9 @@ int mas_empty_area(struct ma_state *mas,
unsigned long *pivots;
enum maple_type mt;
+ if (min >= max)
+ return -EINVAL;
+
if (mas_is_start(mas))
mas_start(mas);
else if (mas->offset >= 2)
@@ -5363,6 +5367,9 @@ int mas_empty_area_rev(struct ma_state *
{
struct maple_enode *last = mas->node;
+ if (min >= max)
+ return -EINVAL;
+
if (mas_is_start(mas)) {
mas_start(mas);
mas->offset = mas_data_end(mas);
@@ -5382,7 +5389,7 @@ int mas_empty_area_rev(struct ma_state *
mas->index = min;
mas->last = max;
- while (!mas_rev_awalk(mas, size)) {
+ while (!mas_rev_awalk(mas, size, &min, &max)) {
if (last == mas->node) {
if (!mas_rewind_node(mas))
return -EBUSY;
@@ -5397,17 +5404,9 @@ int mas_empty_area_rev(struct ma_state *
if (unlikely(mas->offset == MAPLE_NODE_SLOTS))
return -EBUSY;
- /*
- * mas_rev_awalk() has set mas->min and mas->max to the gap values. If
- * the maximum is outside the window we are searching, then use the last
- * location in the search.
- * mas->max and mas->min is the range of the gap.
- * mas->index and mas->last are currently set to the search range.
- */
-
/* Trim the upper limit to the max. */
- if (mas->max <= mas->last)
- mas->last = mas->max;
+ if (max <= mas->last)
+ mas->last = max;
mas->index = mas->last - size + 1;
return 0;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-make-maple-state-reusable-after-mas_empty_area_rev.patch
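The idea of reporting the gap through pointers instead of overwriting the search state can be sketched outside the kernel as follows. This is a toy model, not the maple tree: `cursor` holds a single free range, and `empty_area_rev` is a hypothetical helper that only borrows the name. The `size == 0` rejection is an addition of this sketch; the `min >= max` check corresponds to the early failure the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy search state: one free range that gaps can be handed out from. */
struct cursor {
	unsigned long min;
	unsigned long max;
};

/*
 * Find the highest gap of `size` inside the window [min, max]. The gap is
 * returned through *index and *last; the cursor is const, so it can be
 * reused for another search without being reset.
 */
int empty_area_rev(const struct cursor *cur, unsigned long min,
		   unsigned long max, unsigned long size,
		   unsigned long *index, unsigned long *last)
{
	unsigned long gap_min, gap_max;

	assert(cur != NULL && index != NULL && last != NULL);

	/* Fail early on invalid arguments (size == 0 is this sketch's extra). */
	if (min >= max || size == 0)
		return -EINVAL;

	/* Local gap bounds replace the old writes to the state's min/max. */
	gap_min = cur->min > min ? cur->min : min;
	gap_max = cur->max < max ? cur->max : max;
	if (gap_min > gap_max || gap_max - gap_min < size - 1)
		return -EBUSY;

	*last = gap_max;
	*index = *last - size + 1;
	return 0;
}
```

Because the cursor is never written, a second call with a narrower window works on the same cursor without any reset step.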
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
The skl+ scalers only sample 12 bits of PIPESRC so we can't
do any plane scaling at all when the pipe source size is >4k.
Make sure the pipe source size is also below the scaler's src
size limits. Might not be 100% accurate, but should at least be
safe. We can refine the limits later if we discover that recent
hw is less restricted.
Cc: stable(a)vger.kernel.org
Tested-by: Ross Zwisler <zwisler(a)google.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8357
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/skl_scaler.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/skl_scaler.c b/drivers/gpu/drm/i915/display/skl_scaler.c
index 473d53610b92..0e7e014fcc71 100644
--- a/drivers/gpu/drm/i915/display/skl_scaler.c
+++ b/drivers/gpu/drm/i915/display/skl_scaler.c
@@ -111,6 +111,8 @@ skl_update_scaler(struct intel_crtc_state *crtc_state, bool force_detach,
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
const struct drm_display_mode *adjusted_mode =
&crtc_state->hw.adjusted_mode;
+ int pipe_src_w = drm_rect_width(&crtc_state->pipe_src);
+ int pipe_src_h = drm_rect_height(&crtc_state->pipe_src);
int min_src_w, min_src_h, min_dst_w, min_dst_h;
int max_src_w, max_src_h, max_dst_w, max_dst_h;
@@ -207,6 +209,21 @@ skl_update_scaler(struct intel_crtc_state *crtc_state, bool force_detach,
return -EINVAL;
}
+ /*
+ * The pipe scaler does not use all the bits of PIPESRC, at least
+ * on the earlier platforms. So even when we're scaling a plane
+ * the *pipe* source size must not be too large. For simplicity
+ * we assume the limits match the scaler source size limits. Might
+ * not be 100% accurate on all platforms, but good enough for now.
+ */
+ if (pipe_src_w > max_src_w || pipe_src_h > max_src_h) {
+ drm_dbg_kms(&dev_priv->drm,
+ "scaler_user index %u.%u: pipe src size %ux%u "
+ "is out of scaler range\n",
+ crtc->pipe, scaler_user, pipe_src_w, pipe_src_h);
+ return -EINVAL;
+ }
+
/* mark this plane as a scaler user in crtc_state */
scaler_state->scaler_users |= (1 << scaler_user);
drm_dbg_kms(&dev_priv->drm, "scaler_user index %u.%u: "
--
2.39.2
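As a standalone illustration of the check this patch adds, the sketch below rejects a scaler request when the pipe source size exceeds the scaler source limits. The `scaler_limits` struct and `check_pipe_src` function are hypothetical, and the caller supplies the limits; the real per-platform values are not stated in the patch text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical container for the scaler source size limits. */
struct scaler_limits {
	int max_src_w;
	int max_src_h;
};

/*
 * Return -EINVAL when the pipe source size is larger than the scaler can
 * sample, 0 otherwise. Mirrors the comparison added to skl_update_scaler().
 */
int check_pipe_src(const struct scaler_limits *lim,
		   int pipe_src_w, int pipe_src_h)
{
	assert(lim != NULL);
	if (pipe_src_w > lim->max_src_w || pipe_src_h > lim->max_src_h)
		return -EINVAL;
	return 0;
}
```

For example, with assumed limits of 4096x4096 (a 12-bit field can express 4096 distinct values), a 3840x2160 pipe source passes and a 5120x2880 one is rejected.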
This is the start of the stable review cycle for the 4.19.281 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.281-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.281-rc1
Marc Zyngier <marc.zyngier(a)arm.com>
arm64: KVM: Fix system register enumeration
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm64: Factor out core register ID enumeration
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: nVMX: add missing consistency checks for CR0 and CR4
Steve Clevenger <scclevenger(a)os.amperecomputing.com>
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
George Cherian <george.cherian(a)marvell.com>
watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
Waiman Long <longman(a)redhat.com>
cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
ZhaoLong Wang <wangzhaolong1(a)huawei.com>
ubi: Fix deadlock caused by recursively holding work_sem
Lee Jones <lee.jones(a)linaro.org>
mtd: ubi: wl: Fix a couple of kernel-doc issues
Zhihao Cheng <chengzhihao1(a)huawei.com>
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
Basavaraj Natikar <Basavaraj.Natikar(a)amd.com>
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
Jiri Kosina <jkosina(a)suse.cz>
scsi: ses: Handle enclosure with just a primary component gracefully
Robbie Harwood <rharwood(a)redhat.com>
verify_pefile: relax wrapper length check
Hans de Goede <hdegoede(a)redhat.com>
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
Alexander Stein <alexander.stein(a)ew.tq-group.com>
i2c: imx-lpi2c: clean rx/tx buffers upon new message
Grant Grundler <grundler(a)chromium.org>
power: supply: cros_usbpd: reclassify "default case!" as debug
Eric Dumazet <edumazet(a)google.com>
udp6: fix potential access to stale information
Roman Gushchin <roman.gushchin(a)linux.dev>
net: macb: fix a memory corruption in extended buffer descriptor mode
Xin Long <lucien.xin(a)gmail.com>
sctp: fix a potential overflow in sctp_ifwdtsn_skip
Denis Plotnikov <den-plotnikov(a)yandex-team.ru>
qlcnic: check pci_reset_function result
Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
niu: Fix missing unwind goto in niu_alloc_channels()
Zheng Wang <zyytlz.wz(a)163.com>
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
Bang Li <libang.linuxer(a)gmail.com>
mtdblock: tolerate corrected bit-flips
Min Li <lm0963hack(a)gmail.com>
Bluetooth: Fix race condition in hidp_session_thread
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: emu10k1: fix capture interrupt handler unlinking
Kornel Dulęba <korneld(a)chromium.org>
Revert "pinctrl: amd: Disable and mask interrupts on resume"
Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Zheng Yejian <zhengyejian1(a)huawei.com>
ring-buffer: Fix race while reader and writer are on the same page
John Keeping <john(a)metanate.com>
ftrace: Mark get_lock_parent_ip() __always_inline
Kan Liang <kan.liang(a)linux.intel.com>
perf/core: Fix the same task check in perf_event_set_output
Jeremy Soller <jeremy(a)system76.com>
ALSA: hda/realtek: Add quirk for Clevo X370SNW
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix sysfs interface lifetime
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
Biju Das <biju.das.jz(a)bp.renesas.com>
tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
Biju Das <biju.das.jz(a)bp.renesas.com>
tty: serial: sh-sci: Fix transmit end interrupt handler
William Breathitt Gray <william.gray(a)linaro.org>
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Quectel RM500U-CN modem
Enrico Sau <enrico.sau(a)gmail.com>
USB: serial: option: add Telit FE990 compositions
Kees Jan Koster <kjkoster(a)kjkoster.org>
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
Dhruva Gole <d-gole(a)ti.com>
gpio: davinci: Add irq chip flag to skip set wake
Ziyang Xuan <william.xuanziyang(a)huawei.com>
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
Xin Long <lucien.xin(a)gmail.com>
sctp: check send stream number after wait_for_sndbuf
Jakub Kicinski <kuba(a)kernel.org>
net: don't let netpoll invoke NAPI if in xmit context
Eric Dumazet <edumazet(a)google.com>
icmp: guard against too small mtu
Felix Fietkau <nbd(a)nbd.name>
wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
pwm: cros-ec: Explicitly set .polarity in .get_state()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Fix hangs when recovering open state after a server reboot
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Check the return value of update_open_stateid()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Convert struct nfs4_state to use refcount_t
Kornel Dulęba <korneld(a)chromium.org>
pinctrl: amd: Disable and mask interrupts on resume
Sachi King <nakato(a)nakato.io>
pinctrl: amd: disable and mask interrupts on probe
Linus Walleij <linus.walleij(a)linaro.org>
pinctrl: amd: Use irqchip template
Sandeep Singh <sandeep.singh(a)amd.com>
pinctrl: Added IRQF_SHARED flag for amd-pinctrl driver
-------------
Diffstat:
Documentation/sound/hd-audio/models.rst | 2 +-
Makefile | 4 +-
arch/arm64/kvm/guest.c | 83 ++++++++++++++++++++-----
arch/x86/kernel/sysfb_efi.c | 8 +++
arch/x86/kvm/vmx/vmx.c | 10 ++-
arch/x86/pci/fixup.c | 21 +++++++
crypto/asymmetric_keys/verify_pefile.c | 12 ++--
drivers/gpio/gpio-davinci.c | 2 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 2 +-
drivers/i2c/busses/i2c-imx-lpi2c.c | 2 +
drivers/iio/dac/cio-dac.c | 4 +-
drivers/mtd/mtdblock.c | 12 ++--
drivers/mtd/ubi/build.c | 21 +++++--
drivers/mtd/ubi/wl.c | 5 +-
drivers/net/ethernet/cadence/macb_main.c | 4 ++
drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 8 ++-
drivers/net/ethernet/sun/niu.c | 2 +-
drivers/pinctrl/pinctrl-amd.c | 56 +++++++++++++----
drivers/power/supply/cros_usbpd-charger.c | 2 +-
drivers/pwm/pwm-cros-ec.c | 1 +
drivers/scsi/ses.c | 20 +++---
drivers/tty/serial/sh-sci.c | 9 ++-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/option.c | 10 +++
drivers/watchdog/sbsa_gwdt.c | 1 +
fs/nfs/nfs4_fs.h | 2 +-
fs/nfs/nfs4proc.c | 25 ++++----
fs/nfs/nfs4state.c | 8 +--
fs/nilfs2/segment.c | 3 +-
fs/nilfs2/super.c | 2 +
fs/nilfs2/the_nilfs.c | 12 ++--
include/linux/ftrace.h | 2 +-
kernel/cgroup/cpuset.c | 4 +-
kernel/events/core.c | 2 +-
kernel/trace/ring_buffer.c | 13 +++-
mm/swapfile.c | 3 +-
net/9p/trans_xen.c | 4 ++
net/bluetooth/hidp/core.c | 2 +-
net/bluetooth/l2cap_core.c | 24 ++-----
net/core/netpoll.c | 19 +++++-
net/ipv4/icmp.c | 5 ++
net/ipv6/ip6_output.c | 7 ++-
net/ipv6/udp.c | 8 ++-
net/mac80211/sta_info.c | 3 +-
net/sctp/socket.c | 4 ++
net/sctp/stream_interleave.c | 3 +-
sound/i2c/cs8427.c | 7 ++-
sound/pci/emu10k1/emupcm.c | 4 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/pci/hda/patch_sigmatel.c | 10 +++
50 files changed, 350 insertions(+), 129 deletions(-)
This is the start of the stable review cycle for the 4.14.313 release.
There are 37 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.313-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.313-rc1
Marc Zyngier <marc.zyngier(a)arm.com>
arm64: KVM: Fix system register enumeration
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm64: Factor out core register ID enumeration
Steve Clevenger <scclevenger(a)os.amperecomputing.com>
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
George Cherian <george.cherian(a)marvell.com>
watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
Waiman Long <longman(a)redhat.com>
cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
Zhihao Cheng <chengzhihao1(a)huawei.com>
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
Robbie Harwood <rharwood(a)redhat.com>
verify_pefile: relax wrapper length check
Hans de Goede <hdegoede(a)redhat.com>
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
Alexander Stein <alexander.stein(a)ew.tq-group.com>
i2c: imx-lpi2c: clean rx/tx buffers upon new message
Roman Gushchin <roman.gushchin(a)linux.dev>
net: macb: fix a memory corruption in extended buffer descriptor mode
Denis Plotnikov <den-plotnikov(a)yandex-team.ru>
qlcnic: check pci_reset_function result
Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
niu: Fix missing unwind goto in niu_alloc_channels()
Zheng Wang <zyytlz.wz(a)163.com>
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
Bang Li <libang.linuxer(a)gmail.com>
mtdblock: tolerate corrected bit-flips
Min Li <lm0963hack(a)gmail.com>
Bluetooth: Fix race condition in hidp_session_thread
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: emu10k1: fix capture interrupt handler unlinking
Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Zheng Yejian <zhengyejian1(a)huawei.com>
ring-buffer: Fix race while reader and writer are on the same page
John Keeping <john(a)metanate.com>
ftrace: Mark get_lock_parent_ip() __always_inline
Kan Liang <kan.liang(a)linux.intel.com>
perf/core: Fix the same task check in perf_event_set_output
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix sysfs interface lifetime
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
Biju Das <biju.das.jz(a)bp.renesas.com>
tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
William Breathitt Gray <william.gray(a)linaro.org>
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Quectel RM500U-CN modem
Enrico Sau <enrico.sau(a)gmail.com>
USB: serial: option: add Telit FE990 compositions
Kees Jan Koster <kjkoster(a)kjkoster.org>
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
Dhruva Gole <d-gole(a)ti.com>
gpio: davinci: Add irq chip flag to skip set wake
Ziyang Xuan <william.xuanziyang(a)huawei.com>
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
Eric Dumazet <edumazet(a)google.com>
icmp: guard against too small mtu
Felix Fietkau <nbd(a)nbd.name>
wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
pwm: cros-ec: Explicitly set .polarity in .get_state()
-------------
Diffstat:
Documentation/sound/hd-audio/models.rst | 2 +-
Makefile | 4 +-
arch/arm64/kvm/guest.c | 83 ++++++++++++++++++++-----
arch/x86/kernel/sysfb_efi.c | 8 +++
crypto/asymmetric_keys/verify_pefile.c | 12 ++--
drivers/gpio/gpio-davinci.c | 2 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 2 +-
drivers/i2c/busses/i2c-imx-lpi2c.c | 2 +
drivers/iio/dac/cio-dac.c | 4 +-
drivers/mtd/mtdblock.c | 12 ++--
drivers/mtd/ubi/build.c | 21 +++++--
drivers/net/ethernet/cadence/macb_main.c | 4 ++
drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 8 ++-
drivers/net/ethernet/sun/niu.c | 2 +-
drivers/pwm/pwm-cros-ec.c | 1 +
drivers/tty/serial/sh-sci.c | 2 +-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/option.c | 10 +++
drivers/watchdog/sbsa_gwdt.c | 1 +
fs/nilfs2/segment.c | 3 +-
fs/nilfs2/super.c | 2 +
fs/nilfs2/the_nilfs.c | 12 ++--
include/linux/ftrace.h | 2 +-
kernel/cgroup/cpuset.c | 4 +-
kernel/events/core.c | 2 +-
kernel/trace/ring_buffer.c | 13 +++-
mm/swapfile.c | 3 +-
net/9p/trans_xen.c | 4 ++
net/bluetooth/hidp/core.c | 2 +-
net/bluetooth/l2cap_core.c | 24 ++-----
net/ipv4/icmp.c | 5 ++
net/ipv6/ip6_output.c | 7 ++-
net/mac80211/sta_info.c | 3 +-
sound/i2c/cs8427.c | 7 ++-
sound/pci/emu10k1/emupcm.c | 4 +-
sound/pci/hda/patch_sigmatel.c | 10 +++
36 files changed, 211 insertions(+), 77 deletions(-)
--
Ant Group runs 5.10.y in its production environment, and we found that several
patches are missing from the 5.10.y tree. We need these patches, so we backported
them to 5.10.y. They are also backported to 5.15.y and 6.1.y to prevent regressions.
Jiachen Zhang (1):
fuse: always revalidate rename target dentry
fs/fuse/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.40.0
Ant Group runs 5.10.y in its production environment, and we found that several
patches are missing from the 5.10.y tree. We need these patches, so we backported
them to 5.10.y. They are also backported to 5.15.y and 6.1.y to prevent regressions.
Jiachen Zhang (1):
fuse: always revalidate rename target dentry
Miklos Szeredi (2):
fuse: fix attr version comparison in fuse_read_update_size()
fuse: fix deadlock between atomic O_TRUNC and page invalidation
fs/fuse/dir.c | 9 +++++++--
fs/fuse/file.c | 31 ++++++++++++++++++-------------
2 files changed, 25 insertions(+), 15 deletions(-)
--
2.40.0
Ant Group runs 5.10.y in its production environment, and we found that several
patches are missing from the 5.10.y tree. We need these patches, so we backported
them to 5.10.y. They are also backported to 5.15.y and 6.1.y to prevent regressions.
Connor Kuehl (1):
virtiofs: split requests that exceed virtqueue size
Jiachen Zhang (1):
fuse: always revalidate rename target dentry
Miklos Szeredi (4):
virtiofs: clean up error handling in virtio_fs_get_tree()
fuse: check s_root when destroying sb
fuse: fix attr version comparison in fuse_read_update_size()
fuse: fix deadlock between atomic O_TRUNC and page invalidation
fs/fuse/dir.c | 7 ++++++-
fs/fuse/file.c | 31 +++++++++++++++++-------------
fs/fuse/fuse_i.h | 3 +++
fs/fuse/inode.c | 5 +++--
fs/fuse/virtio_fs.c | 46 +++++++++++++++++++++++++++++----------------
5 files changed, 60 insertions(+), 32 deletions(-)
--
2.40.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f1581626071c8e37c58c5e8f0b4126b17172a211
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041926-clique-washout-2197@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f1581626071c ("riscv: Do not set initial_boot_params to the linear address of the dtb")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f1581626071c8e37c58c5e8f0b4126b17172a211 Mon Sep 17 00:00:00 2001
From: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Date: Wed, 29 Mar 2023 10:19:31 +0200
Subject: [PATCH] riscv: Do not set initial_boot_params to the linear address
of the dtb
early_init_dt_verify() is already called in parse_dtb() and since the dtb
address does not change anymore (it is now in the fixmap region), no need
to reset initial_boot_params by calling early_init_dt_verify() again.
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Link: https://lore.kernel.org/r/20230329081932.79831-3-alexghiti@rivosinc.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 542eed85ad2c..a059b73f4ddb 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -278,10 +278,7 @@ void __init setup_arch(char **cmdline_p)
#if IS_ENABLED(CONFIG_BUILTIN_DTB)
unflatten_and_copy_device_tree();
#else
- if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))
- unflatten_device_tree();
- else
- pr_err("No DTB found in kernel mappings\n");
+ unflatten_device_tree();
#endif
misc_mem_init();
Per-vcpu flags are updated using a non-atomic RMW operation,
which means it is possible to get preempted between the read and
write operations.
Another interesting thing to note is that preemption also updates
flags, as we have some flag manipulation in both the load and put
operations.
It is thus possible to lose information communicated by either
load or put, as the preempted flag update will overwrite the flags
when the thread is resumed. This is especially critical if either
load or put has stored information which depends on the physical
CPU the vcpu runs on.
This results in really elusive bugs, and kudos must be given to
Mostafa for the long hours of debugging, and finally spotting
the problem.
Fix it by disabling preemption during the RMW operation, which
ensures that the state stays consistent. Also upgrade vcpu_get_flag
path to use READ_ONCE() to make sure the field is always atomically
accessed.
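The lost update described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `Vcpu` class, the `FLAG_A`/`FLAG_B` values and the `put_hook` callback are hypothetical stand-ins, and "preemption" is simulated by calling a hook between the read and the write. It is not the kernel code.

```python
class Vcpu:
    """Toy model of a vcpu with one flags word (hypothetical, not kernel code)."""

    def __init__(self):
        self.flags = 0
        self.preempt_disabled = False
        self.pending_preempt = None

    def maybe_preempt(self, hook):
        """Run `hook` now, or defer it while preemption is disabled."""
        if self.preempt_disabled:
            self.pending_preempt = hook
        else:
            hook(self)

    def preempt_enable(self):
        """Re-enable preemption and run a deferred hook, if any."""
        self.preempt_disabled = False
        if self.pending_preempt is not None:
            hook, self.pending_preempt = self.pending_preempt, None
            hook(self)


FLAG_A = 0x1  # set by the interrupted thread
FLAG_B = 0x2  # set by the load/put path that runs on preemption


def put_hook(vcpu):
    """Stand-in for the flag update done by load/put."""
    vcpu.flags |= FLAG_B


def set_flag_racy(vcpu, preempt_hook):
    old = vcpu.flags                   # read
    vcpu.maybe_preempt(preempt_hook)   # preempted between read and write
    vcpu.flags = old | FLAG_A          # write back a stale value


def set_flag_safe(vcpu, preempt_hook):
    vcpu.preempt_disabled = True       # models preempt_disable()
    old = vcpu.flags
    vcpu.maybe_preempt(preempt_hook)   # deferred until preempt_enable()
    vcpu.flags = old | FLAG_A
    vcpu.preempt_enable()              # hook runs here, on fresh state


racy = Vcpu()
set_flag_racy(racy, put_hook)
safe = Vcpu()
set_flag_safe(safe, put_hook)
print(hex(racy.flags), hex(safe.flags))
```

In the racy variant the bit written by the hook is overwritten by the stale value, while in the safe variant both bits survive.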
Fixes: e87abb73e594 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set")
Reported-by: Mostafa Saleh <smostafa(a)google.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
Notes:
v2: add READ_ONCE() on the read path, expand commit message
arch/arm64/include/asm/kvm_host.h | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bcd774d74f34..3dd691c85ca0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -576,9 +576,22 @@ struct kvm_vcpu_arch {
({ \
__build_check_flag(v, flagset, f, m); \
\
- v->arch.flagset & (m); \
+ READ_ONCE(v->arch.flagset) & (m); \
})
+/*
+ * Note that the set/clear accessors must be preempt-safe in order to
+ * avoid nesting them with load/put which also manipulate flags...
+ */
+#ifdef __KVM_NVHE_HYPERVISOR__
+/* the nVHE hypervisor is always non-preemptible */
+#define __vcpu_flags_preempt_disable()
+#define __vcpu_flags_preempt_enable()
+#else
+#define __vcpu_flags_preempt_disable() preempt_disable()
+#define __vcpu_flags_preempt_enable() preempt_enable()
+#endif
+
#define __vcpu_set_flag(v, flagset, f, m) \
do { \
typeof(v->arch.flagset) *fset; \
@@ -586,9 +599,11 @@ struct kvm_vcpu_arch {
__build_check_flag(v, flagset, f, m); \
\
fset = &v->arch.flagset; \
+ __vcpu_flags_preempt_disable(); \
if (HWEIGHT(m) > 1) \
*fset &= ~(m); \
*fset |= (f); \
+ __vcpu_flags_preempt_enable(); \
} while (0)
#define __vcpu_clear_flag(v, flagset, f, m) \
@@ -598,7 +613,9 @@ struct kvm_vcpu_arch {
__build_check_flag(v, flagset, f, m); \
\
fset = &v->arch.flagset; \
+ __vcpu_flags_preempt_disable(); \
*fset &= ~(m); \
+ __vcpu_flags_preempt_enable(); \
} while (0)
#define vcpu_get_flag(v, ...) __vcpu_get_flag((v), __VA_ARGS__)
--
2.34.1
The following commit has been merged into the timers/core branch of tip:
Commit-ID: 1bb5b68fd3aabb6b9d6b9e9bb092bb8f3c2ade62
Gitweb: https://git.kernel.org/tip/1bb5b68fd3aabb6b9d6b9e9bb092bb8f3c2ade62
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Mon, 17 Apr 2023 15:37:55 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 19 Apr 2023 10:29:00 +02:00
posix-cpu-timers: Implement the missing timer_wait_running callback
For some unknown reason the introduction of the timer_wait_running callback
failed to fix up posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.
Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
1) If not enabled, the expiry happens in hard interrupt context so
spin waiting on the remote CPU is reasonably time bound.
Implement an empty stub function for that case.
2) If enabled, the expiry happens in task work before returning to user
space or guest mode. The expired timers are marked as firing and moved
from the timer queue to a local list head with sighand lock held. Once
the timers are moved, sighand lock is dropped and the expiry happens in
fully preemptible context. That means the expiring task can be scheduled
out, migrated, interrupted etc. So spin waiting on it is more than
suboptimal.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way:
- Add a mutex to struct posix_cputimers_work. This struct is per task
and used to schedule the expiry task work from the timer interrupt.
- Add a task_struct pointer to struct cpu_timer which is used to store
the task which runs the expiry. That's filled in when the task
moves the expired timers to the local expiry list. That's not
affecting the size of the k_itimer union as there are bigger union
members already.
- Let the task take the expiry mutex around the expiry function
- Let the waiter acquire a task reference with rcu_read_lock() held and
block on the expiry mutex
This avoids spin-waiting on a task which might not even be on a CPU and
works nicely for RT too.
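The wait-on-the-expiry-mutex idea can be sketched in user space with Python threads. This is an illustration under assumptions, not the kernel implementation: the `Task` and `Timer` classes and the `expire()`/`wait_running()` helpers are hypothetical, a `threading.Lock` stands in for the mutex, and the RCU and task reference handling is left out.

```python
import threading


class Task:
    """Hypothetical stand-in for a task that owns an expiry mutex."""

    def __init__(self):
        self.expiry_mutex = threading.Lock()


class Timer:
    """Hypothetical stand-in for a CPU timer."""

    def __init__(self):
        self.handling = None  # task currently expiring this timer, if any
        self.fired = False


def expire(task, timer, started, release):
    """Expiry side: run the expiry function with the task's mutex held."""
    with task.expiry_mutex:
        timer.handling = task
        started.set()      # expiry is now in progress
        release.wait()     # simulate a slow, preemptible expiry
        timer.fired = True
        timer.handling = None


def wait_running(timer, results):
    """Waiter side: sleep on the handler's mutex instead of spinning."""
    task = timer.handling
    if task is not None:
        with task.expiry_mutex:  # blocks until the expiry has finished
            pass                 # nothing to do, release it again at once
    results.append(timer.fired)


task, timer, results = Task(), Timer(), []
started, release = threading.Event(), threading.Event()

worker = threading.Thread(target=expire, args=(task, timer, started, release))
worker.start()
started.wait()

waiter = threading.Thread(target=wait_running, args=(timer, results))
waiter.start()
release.set()

waiter.join()
worker.join()
print(results)
```

Whether the waiter sees a handling task or finds that the expiry has already completed, it only proceeds once the timer has fired, and it sleeps rather than spins while it waits.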
Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Reported-by: Marco Elver <elver(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Marco Elver <elver(a)google.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
---
include/linux/posix-timers.h | 17 ++++---
kernel/time/posix-cpu-timers.c | 81 +++++++++++++++++++++++++++------
kernel/time/posix-timers.c | 4 ++-
3 files changed, 82 insertions(+), 20 deletions(-)
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 2c6e99c..d607f51 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -4,6 +4,7 @@
#include <linux/spinlock.h>
#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/alarmtimer.h>
#include <linux/timerqueue.h>
@@ -62,16 +63,18 @@ static inline int clockid_to_fd(const clockid_t clk)
* cpu_timer - Posix CPU timer representation for k_itimer
* @node: timerqueue node to queue in the task/sig
* @head: timerqueue head on which this timer is queued
- * @task: Pointer to target task
+ * @pid: Pointer to target task PID
* @elist: List head for the expiry list
* @firing: Timer is currently firing
+ * @handling: Pointer to the task which handles expiry
*/
struct cpu_timer {
- struct timerqueue_node node;
- struct timerqueue_head *head;
- struct pid *pid;
- struct list_head elist;
- int firing;
+ struct timerqueue_node node;
+ struct timerqueue_head *head;
+ struct pid *pid;
+ struct list_head elist;
+ int firing;
+ struct task_struct __rcu *handling;
};
static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
@@ -135,10 +138,12 @@ struct posix_cputimers {
/**
* posix_cputimers_work - Container for task work based posix CPU timer expiry
* @work: The task work to be scheduled
+ * @mutex: Mutex held around expiry in context of this task work
* @scheduled: @work has been scheduled already, no further processing
*/
struct posix_cputimers_work {
struct callback_head work;
+ struct mutex mutex;
unsigned int scheduled;
};
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 2f5e9b3..fb56e02 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -846,6 +846,8 @@ static u64 collect_timerqueue(struct timerqueue_head *head,
return expires;
ctmr->firing = 1;
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(ctmr->handling, current);
cpu_timer_dequeue(ctmr);
list_add_tail(&ctmr->elist, firing);
}
@@ -1161,7 +1163,49 @@ static void handle_posix_cpu_timers(struct task_struct *tsk);
#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
static void posix_cpu_timers_work(struct callback_head *work)
{
+ struct posix_cputimers_work *cw = container_of(work, typeof(*cw), work);
+
+ mutex_lock(&cw->mutex);
handle_posix_cpu_timers(current);
+ mutex_unlock(&cw->mutex);
+}
+
+/*
+ * Invoked from the posix-timer core when a cancel operation failed because
+ * the timer is marked firing. The caller holds rcu_read_lock(), which
+ * protects the timer and the task which is expiring it from being freed.
+ */
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ struct task_struct *tsk = rcu_dereference(timr->it.cpu.handling);
+
+ /* Has the handling task completed expiry already? */
+ if (!tsk)
+ return;
+
+ /* Ensure that the task cannot go away */
+ get_task_struct(tsk);
+ /* Now drop the RCU protection so the mutex can be locked */
+ rcu_read_unlock();
+ /* Wait on the expiry mutex */
+ mutex_lock(&tsk->posix_cputimers_work.mutex);
+ /* Release it immediately again. */
+ mutex_unlock(&tsk->posix_cputimers_work.mutex);
+ /* Drop the task reference. */
+ put_task_struct(tsk);
+ /* Relock RCU so the callsite is balanced */
+ rcu_read_lock();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ /* Ensure that timr->it.cpu.handling task cannot go away */
+ rcu_read_lock();
+ spin_unlock_irq(&timr->it_lock);
+ posix_cpu_timer_wait_running(timr);
+ rcu_read_unlock();
+ /* @timr is on stack and is valid */
+ spin_lock_irq(&timr->it_lock);
}
/*
@@ -1177,6 +1221,7 @@ void clear_posix_cputimers_work(struct task_struct *p)
sizeof(p->posix_cputimers_work.work));
init_task_work(&p->posix_cputimers_work.work,
posix_cpu_timers_work);
+ mutex_init(&p->posix_cputimers_work.mutex);
p->posix_cputimers_work.scheduled = false;
}
@@ -1255,6 +1300,18 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
lockdep_posixtimer_exit();
}
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ cpu_relax();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ spin_unlock_irq(&timr->it_lock);
+ cpu_relax();
+ spin_lock_irq(&timr->it_lock);
+}
+
static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
{
return false;
@@ -1363,6 +1420,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk)
*/
if (likely(cpu_firing >= 0))
cpu_timer_fire(timer);
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(timer->it.cpu.handling, NULL);
spin_unlock(&timer->it_lock);
}
}
@@ -1497,23 +1556,16 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
expires = cpu_timer_getexpires(&timer.it.cpu);
error = posix_cpu_timer_set(&timer, 0, &zero_it, &it);
if (!error) {
- /*
- * Timer is now unarmed, deletion can not fail.
- */
+ /* Timer is now unarmed, deletion can not fail. */
posix_cpu_timer_del(&timer);
+ } else {
+ while (error == TIMER_RETRY) {
+ posix_cpu_timer_wait_running_nsleep(&timer);
+ error = posix_cpu_timer_del(&timer);
+ }
}
- spin_unlock_irq(&timer.it_lock);
- while (error == TIMER_RETRY) {
- /*
- * We need to handle case when timer was or is in the
- * middle of firing. In other cases we already freed
- * resources.
- */
- spin_lock_irq(&timer.it_lock);
- error = posix_cpu_timer_del(&timer);
- spin_unlock_irq(&timer.it_lock);
- }
+ spin_unlock_irq(&timer.it_lock);
if ((it.it_value.tv_sec | it.it_value.tv_nsec) == 0) {
/*
@@ -1623,6 +1675,7 @@ const struct k_clock clock_posix_cpu = {
.timer_del = posix_cpu_timer_del,
.timer_get = posix_cpu_timer_get,
.timer_rearm = posix_cpu_timer_rearm,
+ .timer_wait_running = posix_cpu_timer_wait_running,
};
const struct k_clock clock_process = {
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 0c8a87a..808a247 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -846,6 +846,10 @@ static struct k_itimer *timer_wait_running(struct k_itimer *timer,
rcu_read_lock();
unlock_timer(timer, *flags);
+ /*
+ * kc->timer_wait_running() might drop RCU lock. So @timer
+ * cannot be touched anymore after the function returns!
+ */
if (!WARN_ON_ONCE(!kc->timer_wait_running))
kc->timer_wait_running(timer);
Christoph Paasch reported a couple of issues found by syzkaller and
linked to operations done by the MPTCP worker on (un)accepted sockets.
Fixing these issues was not obvious and rather complex but Paolo Abeni
nicely managed to propose these excellent patches that seem to satisfy
syzkaller.
Patch 1 partially reverts a recent fix. While still providing a solution
for the previous issue, it also prevents the MPTCP worker from running
concurrently with inet_csk_listen_stop(), which avoids a warning. The
partially reverted patch was introduced in v6.3-rc3, backported up to
v6.1, and fixed an issue visible from v5.18.
Patch 2 prevents the MPTCP worker from racing with mptcp_accept(), which
causes a UaF when a fallback to TCP is done while, in parallel, the socket
is being accepted by userspace. This is also a fix of a previous fix
introduced in v6.3-rc3 and backported up to v6.1, but here it fixes an
issue that has in theory been present since v5.7. There is no need to
backport it that far, as the issue appears to be visible only later,
around v5.18; see the previous cover letter linked to this original fix.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Paolo Abeni (2):
mptcp: stops worker on unaccepted sockets at listener close
mptcp: fix accept vs worker race
net/mptcp/protocol.c | 74 ++++++++++++++++++++++++++++++++----------------
net/mptcp/protocol.h | 2 ++
net/mptcp/subflow.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 129 insertions(+), 27 deletions(-)
---
base-commit: 338469d677e5d426f5ada88761f16f6d2c7c1981
change-id: 20230417-upstream-net-20230417-mptcp-worker-acceptw-31f35d7c3e9a
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Commit b94f9ac79a7395c2d6171cc753cc27942df0be73 upstream.
Since commit 1243dc518c9d ("cgroup/cpuset: Convert cpuset_mutex to
percpu_rwsem"), cpuset_mutex has been replaced by cpuset_rwsem which is
a percpu rwsem. However, the comments in kernel/cgroup/cpuset.c still
reference cpuset_mutex, which is now incorrect.
Change all the references of cpuset_mutex to cpuset_rwsem.
Fixes: 1243dc518c9d ("cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
---
kernel/cgroup/cpuset.c | 56 ++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 27 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 43270b07b2e0..fc6aedb84b80 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -299,17 +299,19 @@ static struct cpuset top_cpuset = {
if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
/*
- * There are two global locks guarding cpuset structures - cpuset_mutex and
+ * There are two global locks guarding cpuset structures - cpuset_rwsem and
* callback_lock. We also require taking task_lock() when dereferencing a
* task's cpuset pointer. See "The task_lock() exception", at the end of this
- * comment.
+ * comment. The cpuset code uses only cpuset_rwsem write lock. Other
+ * kernel subsystems can use cpuset_read_lock()/cpuset_read_unlock() to
+ * prevent change to cpuset structures.
*
* A task must hold both locks to modify cpusets. If a task holds
- * cpuset_mutex, then it blocks others wanting that mutex, ensuring that it
+ * cpuset_rwsem, it blocks others wanting that rwsem, ensuring that it
* is the only task able to also acquire callback_lock and be able to
* modify cpusets. It can perform various checks on the cpuset structure
* first, knowing nothing will change. It can also allocate memory while
- * just holding cpuset_mutex. While it is performing these checks, various
+ * just holding cpuset_rwsem. While it is performing these checks, various
* callback routines can briefly acquire callback_lock to query cpusets.
* Once it is ready to make the changes, it takes callback_lock, blocking
* everyone else.
@@ -380,7 +382,7 @@ static inline bool is_in_v2_mode(void)
* One way or another, we guarantee to return some non-empty subset
* of cpu_online_mask.
*
- * Call with callback_lock or cpuset_mutex held.
+ * Call with callback_lock or cpuset_rwsem held.
*/
static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
{
@@ -410,7 +412,7 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
* One way or another, we guarantee to return some non-empty subset
* of node_states[N_MEMORY].
*
- * Call with callback_lock or cpuset_mutex held.
+ * Call with callback_lock or cpuset_rwsem held.
*/
static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
{
@@ -422,7 +424,7 @@ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
/*
* update task's spread flag if cpuset's page/slab spread flag is set
*
- * Call with callback_lock or cpuset_mutex held.
+ * Call with callback_lock or cpuset_rwsem held.
*/
static void cpuset_update_task_spread_flag(struct cpuset *cs,
struct task_struct *tsk)
@@ -443,7 +445,7 @@ static void cpuset_update_task_spread_flag(struct cpuset *cs,
*
* One cpuset is a subset of another if all its allowed CPUs and
* Memory Nodes are a subset of the other, and its exclusive flags
- * are only set if the other's are set. Call holding cpuset_mutex.
+ * are only set if the other's are set. Call holding cpuset_rwsem.
*/
static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
@@ -552,7 +554,7 @@ static inline void free_cpuset(struct cpuset *cs)
* If we replaced the flag and mask values of the current cpuset
* (cur) with those values in the trial cpuset (trial), would
* our various subset and exclusive rules still be valid? Presumes
- * cpuset_mutex held.
+ * cpuset_rwsem held.
*
* 'cur' is the address of an actual, in-use cpuset. Operations
* such as list traversal that depend on the actual address of the
@@ -675,7 +677,7 @@ static void update_domain_attr_tree(struct sched_domain_attr *dattr,
rcu_read_unlock();
}
-/* Must be called with cpuset_mutex held. */
+/* Must be called with cpuset_rwsem held. */
static inline int nr_cpusets(void)
{
/* jump label reference count + the top-level cpuset */
@@ -701,7 +703,7 @@ static inline int nr_cpusets(void)
* domains when operating in the severe memory shortage situations
* that could cause allocation failures below.
*
- * Must be called with cpuset_mutex held.
+ * Must be called with cpuset_rwsem held.
*
* The three key local variables below are:
* cp - cpuset pointer, used (together with pos_css) to perform a
@@ -980,7 +982,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
* 'cpus' is removed, then call this routine to rebuild the
* scheduler's dynamic sched domains.
*
- * Call with cpuset_mutex held. Takes get_online_cpus().
+ * Call with cpuset_rwsem held. Takes get_online_cpus().
*/
static void rebuild_sched_domains_locked(void)
{
@@ -1053,7 +1055,7 @@ void rebuild_sched_domains(void)
* @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
*
* Iterate through each task of @cs updating its cpus_allowed to the
- * effective cpuset's. As this function is called with cpuset_mutex held,
+ * effective cpuset's. As this function is called with cpuset_rwsem held,
* cpuset membership stays stable.
*/
static void update_tasks_cpumask(struct cpuset *cs)
@@ -1328,7 +1330,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
*
* On legacy hierachy, effective_cpus will be the same with cpu_allowed.
*
- * Called with cpuset_mutex held
+ * Called with cpuset_rwsem held
*/
static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
{
@@ -1688,12 +1690,12 @@ static void *cpuset_being_rebound;
* @cs: the cpuset in which each task's mems_allowed mask needs to be changed
*
* Iterate through each task of @cs updating its mems_allowed to the
- * effective cpuset's. As this function is called with cpuset_mutex held,
+ * effective cpuset's. As this function is called with cpuset_rwsem held,
* cpuset membership stays stable.
*/
static void update_tasks_nodemask(struct cpuset *cs)
{
- static nodemask_t newmems; /* protected by cpuset_mutex */
+ static nodemask_t newmems; /* protected by cpuset_rwsem */
struct css_task_iter it;
struct task_struct *task;
@@ -1706,7 +1708,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
* take while holding tasklist_lock. Forks can happen - the
* mpol_dup() cpuset_being_rebound check will catch such forks,
* and rebind their vma mempolicies too. Because we still hold
- * the global cpuset_mutex, we know that no other rebind effort
+ * the global cpuset_rwsem, we know that no other rebind effort
* will be contending for the global variable cpuset_being_rebound.
* It's ok if we rebind the same mm twice; mpol_rebind_mm()
* is idempotent. Also migrate pages in each mm to new nodes.
@@ -1752,7 +1754,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
*
* On legacy hiearchy, effective_mems will be the same with mems_allowed.
*
- * Called with cpuset_mutex held
+ * Called with cpuset_rwsem held
*/
static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
{
@@ -1805,7 +1807,7 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
* mempolicies and if the cpuset is marked 'memory_migrate',
* migrate the tasks pages to the new memory.
*
- * Call with cpuset_mutex held. May take callback_lock during call.
+ * Call with cpuset_rwsem held. May take callback_lock during call.
* Will take tasklist_lock, scan tasklist for tasks in cpuset cs,
* lock each such tasks mm->mmap_lock, scan its vma's and rebind
* their mempolicies to the cpusets new mems_allowed.
@@ -1895,7 +1897,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
* @cs: the cpuset in which each task's spread flags needs to be changed
*
* Iterate through each task of @cs updating its spread flags. As this
- * function is called with cpuset_mutex held, cpuset membership stays
+ * function is called with cpuset_rwsem held, cpuset membership stays
* stable.
*/
static void update_tasks_flags(struct cpuset *cs)
@@ -1915,7 +1917,7 @@ static void update_tasks_flags(struct cpuset *cs)
* cs: the cpuset to update
* turning_on: whether the flag is being set or cleared
*
- * Call with cpuset_mutex held.
+ * Call with cpuset_rwsem held.
*/
static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
@@ -1964,7 +1966,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
* cs: the cpuset to update
* new_prs: new partition root state
*
- * Call with cpuset_mutex held.
+ * Call with cpuset_rwsem held.
*/
static int update_prstate(struct cpuset *cs, int new_prs)
{
@@ -2145,7 +2147,7 @@ static int fmeter_getrate(struct fmeter *fmp)
static struct cpuset *cpuset_attach_old_cs;
-/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
+/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
struct cgroup_subsys_state *css;
@@ -2197,7 +2199,7 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
}
/*
- * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach()
+ * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach()
* but we can't allocate it dynamically there. Define it global and
* allocate from cpuset_init().
*/
@@ -2205,7 +2207,7 @@ static cpumask_var_t cpus_attach;
static void cpuset_attach(struct cgroup_taskset *tset)
{
- /* static buf protected by cpuset_mutex */
+ /* static buf protected by cpuset_rwsem */
static nodemask_t cpuset_attach_nodemask_to;
struct task_struct *task;
struct task_struct *leader;
@@ -2398,7 +2400,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
* operation like this one can lead to a deadlock through kernfs
* active_ref protection. Let's break the protection. Losing the
* protection is okay as we check whether @cs is online after
- * grabbing cpuset_mutex anyway. This only happens on the legacy
+ * grabbing cpuset_rwsem anyway. This only happens on the legacy
* hierarchies.
*/
css_get(&cs->css);
@@ -3637,7 +3639,7 @@ void __cpuset_memory_pressure_bump(void)
* - Used for /proc/<pid>/cpuset.
* - No need to task_lock(tsk) on this tsk->cpuset reference, as it
* doesn't really matter if tsk->cpuset changes after we read it,
- * and we take cpuset_mutex, keeping cpuset_attach() from changing it
+ * and we take cpuset_rwsem, keeping cpuset_attach() from changing it
* anyway.
*/
int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
--
2.31.1
The quilt patch titled
Subject: scripts/gdb: fix lx-timerlist for Python3
has been removed from the -mm tree. Its filename was
scripts-gdb-fix-lx-timerlist-for-python3.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peng Liu <liupeng17(a)lenovo.com>
Subject: scripts/gdb: fix lx-timerlist for Python3
Date: Tue, 21 Mar 2023 14:19:29 +0800
The following incompatibilities between Python2 and Python3 made
lx-timerlist fail to run under Python3.
o xrange() is replaced by range() in Python3
o bytes and str are different types in Python3
o the return value of Inferior.read_memory() is a memoryview object in
Python3
akpm: cc stable so that older kernels are properly debuggable under newer
Python.
Link: https://lkml.kernel.org/r/TYCP286MB2146EE1180A4D5176CBA8AB2C6819@TYCP286MB2…
Signed-off-by: Peng Liu <liupeng17(a)lenovo.com>
Reviewed-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Florian Fainelli <f.fainelli(a)gmail.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/timerlist.py | 4 +++-
scripts/gdb/linux/utils.py | 5 ++++-
2 files changed, 7 insertions(+), 2 deletions(-)
--- a/scripts/gdb/linux/timerlist.py~scripts-gdb-fix-lx-timerlist-for-python3
+++ a/scripts/gdb/linux/timerlist.py
@@ -72,7 +72,7 @@ def print_cpu(hrtimer_bases, cpu, max_cl
ts = cpus.per_cpu(tick_sched_ptr, cpu)
text = "cpu: {}\n".format(cpu)
- for i in xrange(max_clock_bases):
+ for i in range(max_clock_bases):
text += " clock {}:\n".format(i)
text += print_base(cpu_base['clock_base'][i])
@@ -157,6 +157,8 @@ def pr_cpumask(mask):
num_bytes = (nr_cpu_ids + 7) / 8
buf = utils.read_memoryview(inf, bits, num_bytes).tobytes()
buf = binascii.b2a_hex(buf)
+ if type(buf) is not str:
+ buf=buf.decode()
chunks = []
i = num_bytes
--- a/scripts/gdb/linux/utils.py~scripts-gdb-fix-lx-timerlist-for-python3
+++ a/scripts/gdb/linux/utils.py
@@ -88,7 +88,10 @@ def get_target_endianness():
def read_memoryview(inf, start, length):
- return memoryview(inf.read_memory(start, length))
+ m = inf.read_memory(start, length)
+ if type(m) is memoryview:
+ return m
+ return memoryview(m)
def read_u16(buffer, offset):
_
Patches currently in -mm which might be from liupeng17(a)lenovo.com are
The quilt patch titled
Subject: nilfs2: initialize unused bytes in segment summary blocks
has been removed from the -mm tree. Its filename was
nilfs2-initialize-unused-bytes-in-segment-summary-blocks.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: initialize unused bytes in segment summary blocks
Date: Tue, 18 Apr 2023 02:35:13 +0900
Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for
KMSAN enabled kernels after applying commit 7397031622e0 ("nilfs2:
initialize "struct nilfs_binfo_dat"->bi_pad field").
This is because the unused bytes at the end of each block in segment
summaries are not initialized. So this fixes the issue by padding the
unused bytes with null bytes.
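The padding step can be illustrated outside the kernel. The following is only a minimal userspace sketch, assuming a hypothetical struct block_buf in place of the nilfs2 buffer structures; it is not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block buffer; not the nilfs2 structures. */
struct block_buf {
	unsigned char *data;	/* start of the block */
	size_t size;		/* block size in bytes */
};

/*
 * Zero-fill everything from 'offset' to the end of the block.
 * Returns the number of bytes that were cleared.
 */
static size_t zeropad_tail(struct block_buf *b, size_t offset)
{
	assert(b && b->data);
	if (offset >= b->size)
		return 0;	/* block is already full, nothing to pad */
	memset(b->data + offset, 0, b->size - offset);
	return b->size - offset;
}
```

Calling zeropad_tail() before a block is handed to a checksum routine means the checksum never reads bytes that were left unwritten.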
Link: https://lkml.kernel.org/r/20230417173513.12598-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+048585f3f4227bb2b49b(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b
Cc: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
--- a/fs/nilfs2/segment.c~nilfs2-initialize-unused-bytes-in-segment-summary-blocks
+++ a/fs/nilfs2/segment.c
@@ -430,6 +430,23 @@ static int nilfs_segctor_reset_segment_b
return 0;
}
+/**
+ * nilfs_segctor_zeropad_segsum - zero pad the rest of the segment summary area
+ * @sci: segment constructor object
+ *
+ * nilfs_segctor_zeropad_segsum() zero-fills unallocated space at the end of
+ * the current segment summary block.
+ */
+static void nilfs_segctor_zeropad_segsum(struct nilfs_sc_info *sci)
+{
+ struct nilfs_segsum_pointer *ssp;
+
+ ssp = sci->sc_blk_cnt > 0 ? &sci->sc_binfo_ptr : &sci->sc_finfo_ptr;
+ if (ssp->offset < ssp->bh->b_size)
+ memset(ssp->bh->b_data + ssp->offset, 0,
+ ssp->bh->b_size - ssp->offset);
+}
+
static int nilfs_segctor_feed_segment(struct nilfs_sc_info *sci)
{
sci->sc_nblk_this_inc += sci->sc_curseg->sb_sum.nblocks;
@@ -438,6 +455,7 @@ static int nilfs_segctor_feed_segment(st
* The current segment is filled up
* (internal code)
*/
+ nilfs_segctor_zeropad_segsum(sci);
sci->sc_curseg = NILFS_NEXT_SEGBUF(sci->sc_curseg);
return nilfs_segctor_reset_segment_buffer(sci);
}
@@ -542,6 +560,7 @@ static int nilfs_segctor_add_file_block(
goto retry;
}
if (unlikely(required)) {
+ nilfs_segctor_zeropad_segsum(sci);
err = nilfs_segbuf_extend_segsum(segbuf);
if (unlikely(err))
goto failed;
@@ -1533,6 +1552,7 @@ static int nilfs_segctor_collect(struct
nadd = min_t(int, nadd << 1, SC_MAX_SEGDELTA);
sci->sc_stage = prev_stage;
}
+ nilfs_segctor_zeropad_segsum(sci);
nilfs_segctor_truncate_segments(sci, sci->sc_curseg, nilfs->ns_sufile);
return 0;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
has been removed from the -mm tree. Its filename was
mm-page_alloc-skip-regions-with-hugetlbfs-pages-when-allocating-1g-pages.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
Date: Fri, 14 Apr 2023 15:14:29 +0100
A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is
taking an excessive amount of time for large amounts of memory. Further
testing of huge page allocation showed that the cost is linear, i.e. if allocating
1G pages in batches of 10 then the time to allocate nr_hugepages from
10->20->30->etc increases linearly even though 10 pages are allocated at
each step. Profiles indicated that much of the time is spent checking the
validity within already existing huge pages and then attempting a
migration that fails after isolating the range, draining pages and a whole
lot of other useless work.
Commit eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from
pfn_range_valid_contig") removed two checks, one which ignored huge pages
for contiguous allocations as huge pages can sometimes migrate. While
there may be value on migrating a 2M page to satisfy a 1G allocation, it's
potentially expensive if the 1G allocation fails and it's pointless to try
moving a 1G page for a new 1G allocation or scan the tail pages for valid
PFNs.
Reintroduce the PageHuge check and assume any contiguous region with
hugetlbfs pages is unsuitable for a new 1G allocation.
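As a rough illustration of the check, here is a toy userspace model. The flag names and the flat flags array are assumptions made for this sketch and do not correspond to the kernel's struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page flags for a toy model; not the kernel's struct page. */
enum toy_page_flag {
	TOY_PAGE_RESERVED = 1u << 0,
	TOY_PAGE_HUGE     = 1u << 1,
};

/*
 * Return true only if no page in [start, start + nr) is reserved or
 * belongs to a huge page. Rejecting the range here avoids the later,
 * much more expensive isolate-and-migrate attempt.
 */
static bool toy_range_valid_contig(const unsigned int *flags,
				   size_t start, size_t nr)
{
	assert(flags);
	for (size_t i = start; i < start + nr; i++) {
		if (flags[i] & TOY_PAGE_RESERVED)
			return false;
		if (flags[i] & TOY_PAGE_HUGE)
			return false;
	}
	return true;
}
```

The point of the sketch is only the early exit: a single huge page anywhere in the candidate range disqualifies the whole range.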
The hpagealloc test allocates huge pages in batches and reports the
average latency per page over time. This test happens just after boot
when fragmentation is not an issue. Units are in milliseconds.
hpagealloc
                             6.3.0-rc6              6.3.0-rc6              6.3.0-rc6
                               vanilla   hugeallocrevert-v1r1   hugeallocsimple-v1r2
Min       Latency        26.42 (  0.00%)        5.07 ( 80.82%)       18.94 ( 28.30%)
1st-qrtle Latency       356.61 (  0.00%)        5.34 ( 98.50%)       19.85 ( 94.43%)
2nd-qrtle Latency       697.26 (  0.00%)        5.47 ( 99.22%)       20.44 ( 97.07%)
3rd-qrtle Latency       972.94 (  0.00%)        5.50 ( 99.43%)       20.81 ( 97.86%)
Max-1     Latency        26.42 (  0.00%)        5.07 ( 80.82%)       18.94 ( 28.30%)
Max-5     Latency        82.14 (  0.00%)        5.11 ( 93.78%)       19.31 ( 76.49%)
Max-10    Latency       150.54 (  0.00%)        5.20 ( 96.55%)       19.43 ( 87.09%)
Max-90    Latency      1164.45 (  0.00%)        5.53 ( 99.52%)       20.97 ( 98.20%)
Max-95    Latency      1223.06 (  0.00%)        5.55 ( 99.55%)       21.06 ( 98.28%)
Max-99    Latency      1278.67 (  0.00%)        5.57 ( 99.56%)       22.56 ( 98.24%)
Max       Latency      1310.90 (  0.00%)        8.06 ( 99.39%)       26.62 ( 97.97%)
Amean     Latency       678.36 (  0.00%)        5.44 * 99.20%*       20.44 * 96.99%*
                    6.3.0-rc6   6.3.0-rc6         6.3.0-rc6
                      vanilla   revert-v1   hugeallocfix-v2
Duration User            0.28        0.27              0.30
Duration System        808.66       17.77             35.99
Duration Elapsed       830.87       18.08             36.33
The vanilla kernel is poor, taking up to 1.3 seconds to allocate a huge
page and almost 10 minutes in total to run the test. Reverting the
problematic commit reduces it to 8ms at worst and the patch takes 26ms.
This patch fixes the main issue with skipping huge pages but leaves the
page_count() out because a page with an elevated count potentially can
migrate.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022
Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.n…
Fixes: eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reported-by: Yuanxi Liu <y.liu(a)naruida.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/page_alloc.c~mm-page_alloc-skip-regions-with-hugetlbfs-pages-when-allocating-1g-pages
+++ a/mm/page_alloc.c
@@ -9466,6 +9466,9 @@ static bool pfn_range_valid_contig(struc
if (PageReserved(page))
return false;
+
+ if (PageHuge(page))
+ return false;
}
return true;
}
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
The quilt patch titled
Subject: mm/mmap: regression fix for unmapped_area{_topdown}
has been removed from the -mm tree. Its filename was
mm-mmap-regression-fix-for-unmapped_area_topdown.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm/mmap: regression fix for unmapped_area{_topdown}
Date: Fri, 14 Apr 2023 14:59:19 -0400
The maple tree limits the gap returned to a window that specifically fits
what was asked. This may not be optimal in the case of switching search
directions or a gap that does not satisfy the requested space for other
reasons. Fix the search by retrying the operation and limiting the search
window in the rare occasion that a conflict occurs.
Link: https://lkml.kernel.org/r/20230414185919.4175572-1-Liam.Howlett@oracle.com
Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 43 insertions(+), 5 deletions(-)
--- a/mm/mmap.c~mm-mmap-regression-fix-for-unmapped_area_topdown
+++ a/mm/mmap.c
@@ -1518,7 +1518,8 @@ static inline int accountable_mapping(st
*/
static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
{
- unsigned long length, gap;
+ unsigned long length, gap, low_limit;
+ struct vm_area_struct *tmp;
MA_STATE(mas, &current->mm->mm_mt, 0, 0);
@@ -1527,12 +1528,29 @@ static unsigned long unmapped_area(struc
if (length < info->length)
return -ENOMEM;
- if (mas_empty_area(&mas, info->low_limit, info->high_limit - 1,
- length))
+ low_limit = info->low_limit;
+retry:
+ if (mas_empty_area(&mas, low_limit, info->high_limit - 1, length))
return -ENOMEM;
gap = mas.index;
gap += (info->align_offset - gap) & info->align_mask;
+ tmp = mas_next(&mas, ULONG_MAX);
+ if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */
+ if (vm_start_gap(tmp) < gap + length - 1) {
+ low_limit = tmp->vm_end;
+ mas_reset(&mas);
+ goto retry;
+ }
+ } else {
+ tmp = mas_prev(&mas, 0);
+ if (tmp && vm_end_gap(tmp) > gap) {
+ low_limit = vm_end_gap(tmp);
+ mas_reset(&mas);
+ goto retry;
+ }
+ }
+
return gap;
}
@@ -1548,7 +1566,8 @@ static unsigned long unmapped_area(struc
*/
static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
{
- unsigned long length, gap;
+ unsigned long length, gap, high_limit, gap_end;
+ struct vm_area_struct *tmp;
MA_STATE(mas, &current->mm->mm_mt, 0, 0);
/* Adjust search length to account for worst case alignment overhead */
@@ -1556,12 +1575,31 @@ static unsigned long unmapped_area_topdo
if (length < info->length)
return -ENOMEM;
- if (mas_empty_area_rev(&mas, info->low_limit, info->high_limit - 1,
+ high_limit = info->high_limit;
+retry:
+ if (mas_empty_area_rev(&mas, info->low_limit, high_limit - 1,
length))
return -ENOMEM;
gap = mas.last + 1 - info->length;
gap -= (gap - info->align_offset) & info->align_mask;
+ gap_end = mas.last;
+ tmp = mas_next(&mas, ULONG_MAX);
+ if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */
+ if (vm_start_gap(tmp) <= gap_end) {
+ high_limit = vm_start_gap(tmp);
+ mas_reset(&mas);
+ goto retry;
+ }
+ } else {
+ tmp = mas_prev(&mas, 0);
+ if (tmp && vm_end_gap(tmp) > gap) {
+ high_limit = tmp->vm_start;
+ mas_reset(&mas);
+ goto retry;
+ }
+ }
+
return gap;
}
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: maple_tree: fix mas_empty_area() search
has been removed from the -mm tree. Its filename was
maple_tree-fix-mas_empty_area-search.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: fix mas_empty_area() search
Date: Fri, 14 Apr 2023 10:57:27 -0400
The internal function of mas_awalk() was incorrectly skipping the last
entry in a node, which could potentially be NULL. This is only a problem
for the left-most node in the tree - otherwise that NULL would not exist.
Fix mas_awalk() by using the metadata to obtain the end of the node for
the loop and the logical pivot as opposed to the raw pivot value.
Link: https://lkml.kernel.org/r/20230414145728.4067069-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-mas_empty_area-search
+++ a/lib/maple_tree.c
@@ -5056,10 +5056,10 @@ static inline bool mas_anode_descend(str
{
enum maple_type type = mte_node_type(mas->node);
unsigned long pivot, min, gap = 0;
- unsigned char offset;
- unsigned long *gaps;
- unsigned long *pivots = ma_pivots(mas_mn(mas), type);
- void __rcu **slots = ma_slots(mas_mn(mas), type);
+ unsigned char offset, data_end;
+ unsigned long *gaps, *pivots;
+ void __rcu **slots;
+ struct maple_node *node;
bool found = false;
if (ma_is_dense(type)) {
@@ -5067,13 +5067,15 @@ static inline bool mas_anode_descend(str
return true;
}
- gaps = ma_gaps(mte_to_node(mas->node), type);
+ node = mas_mn(mas);
+ pivots = ma_pivots(node, type);
+ slots = ma_slots(node, type);
+ gaps = ma_gaps(node, type);
offset = mas->offset;
min = mas_safe_min(mas, pivots, offset);
- for (; offset < mt_slots[type]; offset++) {
- pivot = mas_safe_pivot(mas, pivots, offset, type);
- if (offset && !pivot)
- break;
+ data_end = ma_data_end(node, type, pivots, mas->max);
+ for (; offset <= data_end; offset++) {
+ pivot = mas_logical_pivot(mas, pivots, offset, type);
/* Not within lower bounds */
if (mas->index > pivot)
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: maple_tree: make maple state reusable after mas_empty_area_rev()
has been removed from the -mm tree. Its filename was
maple_tree-make-maple-state-reusable-after-mas_empty_area_rev.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: make maple state reusable after mas_empty_area_rev()
Date: Fri, 14 Apr 2023 10:57:26 -0400
Stop using maple state min/max for the range by passing through pointers
for those values. This will allow the maple state to be reused without
resetting.
Also add some logic to fail out early on searching with invalid
arguments.
Link: https://lkml.kernel.org/r/20230414145728.4067069-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
--- a/lib/maple_tree.c~maple_tree-make-maple-state-reusable-after-mas_empty_area_rev
+++ a/lib/maple_tree.c
@@ -4965,7 +4965,8 @@ not_found:
* Return: True if found in a leaf, false otherwise.
*
*/
-static bool mas_rev_awalk(struct ma_state *mas, unsigned long size)
+static bool mas_rev_awalk(struct ma_state *mas, unsigned long size,
+ unsigned long *gap_min, unsigned long *gap_max)
{
enum maple_type type = mte_node_type(mas->node);
struct maple_node *node = mas_mn(mas);
@@ -5030,8 +5031,8 @@ static bool mas_rev_awalk(struct ma_stat
if (unlikely(ma_is_leaf(type))) {
mas->offset = offset;
- mas->min = min;
- mas->max = min + gap - 1;
+ *gap_min = min;
+ *gap_max = min + gap - 1;
return true;
}
@@ -5307,6 +5308,9 @@ int mas_empty_area(struct ma_state *mas,
unsigned long *pivots;
enum maple_type mt;
+ if (min >= max)
+ return -EINVAL;
+
if (mas_is_start(mas))
mas_start(mas);
else if (mas->offset >= 2)
@@ -5361,6 +5365,9 @@ int mas_empty_area_rev(struct ma_state *
{
struct maple_enode *last = mas->node;
+ if (min >= max)
+ return -EINVAL;
+
if (mas_is_start(mas)) {
mas_start(mas);
mas->offset = mas_data_end(mas);
@@ -5380,7 +5387,7 @@ int mas_empty_area_rev(struct ma_state *
mas->index = min;
mas->last = max;
- while (!mas_rev_awalk(mas, size)) {
+ while (!mas_rev_awalk(mas, size, &min, &max)) {
if (last == mas->node) {
if (!mas_rewind_node(mas))
return -EBUSY;
@@ -5395,17 +5402,9 @@ int mas_empty_area_rev(struct ma_state *
if (unlikely(mas->offset == MAPLE_NODE_SLOTS))
return -EBUSY;
- /*
- * mas_rev_awalk() has set mas->min and mas->max to the gap values. If
- * the maximum is outside the window we are searching, then use the last
- * location in the search.
- * mas->max and mas->min is the range of the gap.
- * mas->index and mas->last are currently set to the search range.
- */
-
/* Trim the upper limit to the max. */
- if (mas->max <= mas->last)
- mas->last = mas->max;
+ if (max <= mas->last)
+ mas->last = max;
mas->index = mas->last - size + 1;
return 0;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: mm: fix memory leak on mm_init error handling
has been removed from the -mm tree. Its filename was
mm-fix-memory-leak-on-mm_init-error-handling.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: fix memory leak on mm_init error handling
Date: Thu, 30 Mar 2023 09:38:22 -0400
Commit f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
introduces a memory leak by missing a call to destroy_context() when a
percpu_counter fails to allocate.
Before introducing the per-cpu counter allocations, init_new_context() was
the last call that could fail in mm_init(), and thus there was no need to
ever invoke destroy_context() in the error paths. Adding the following
percpu counter allocations adds error paths after init_new_context(),
which means its associated destroy_context() needs to be called when
percpu counters fail to allocate.
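The unwinding rule described above can be shown with a small userspace model. This is a sketch only: struct toy_mm, its fields and the fail_at_counter parameter are hypothetical and exist just to make the teardown order observable.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_COUNTERS 4

/* Hypothetical bookkeeping so the unwinding can be observed. */
struct toy_mm {
	bool pgd_allocated;
	bool context_live;
	int counters_live;
};

/*
 * Set up three resources in order. 'fail_at_counter' is the index of the
 * counter whose allocation should fail, or -1 for success. On failure,
 * everything that was set up before the failing step is torn down in
 * reverse order.
 */
static int toy_mm_init(struct toy_mm *mm, int fail_at_counter)
{
	int i;

	assert(mm);
	mm->counters_live = 0;
	mm->pgd_allocated = true;	/* step 1: page directory */
	mm->context_live = true;	/* step 2: architecture context */

	for (i = 0; i < TOY_NR_COUNTERS; i++) {	/* step 3: counters */
		if (i == fail_at_counter)
			goto fail_counters;
		mm->counters_live++;
	}
	return 0;

fail_counters:
	while (i > 0) {
		mm->counters_live--;
		i--;
	}
	mm->context_live = false;	/* the teardown that was missing */
	mm->pgd_allocated = false;
	return -1;
}
```

If the context_live teardown line is dropped, a failure in step 3 leaves the context allocated, which is the shape of the leak the commit message describes.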
Link: https://lkml.kernel.org/r/20230330133822.66271-1-mathieu.desnoyers@efficios…
Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Acked-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/fork.c~mm-fix-memory-leak-on-mm_init-error-handling
+++ a/kernel/fork.c
@@ -1174,6 +1174,7 @@ static struct mm_struct *mm_init(struct
fail_pcpu:
while (i > 0)
percpu_counter_destroy(&mm->rss_stat[--i]);
+ destroy_context(mm);
fail_nocontext:
mm_free_pgd(mm);
fail_nopgd:
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
The quilt patch titled
Subject: mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-potential-deadlock-on-zonelist_update_seq-seqlock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Subject: mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
Date: Tue, 4 Apr 2023 23:31:58 +0900
syzbot is reporting a circular locking dependency which involves the
zonelist_update_seq seqlock [1], because this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
  o preventing interrupt context from calling kmalloc(GFP_ATOMIC), and
  o preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
  o disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC), and
  o disabling synchronous printk() in order to avoid console_flush_all().
As a side effect of minimizing the duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. From a lockdep
perspective, however, not calling read_seqbegin(&zonelist_update_seq) from
interrupt context (i.e. not recording an unnecessary locking dependency) is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside
the
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section.
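The odd/even counter behaviour that both deadlock scenarios rely on can be modelled in a few lines. This is a single-threaded userspace sketch under stated assumptions: the toy_* names are hypothetical, there are no memory barriers, and the reader returns false where a real read_seqbegin() would keep spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a sequence counter. */
struct toy_seqcount {
	unsigned int sequence;
};

/* Entering the write side makes the count odd. */
static void toy_write_begin(struct toy_seqcount *s)
{
	s->sequence++;
}

/* Leaving the write side makes the count even again. */
static void toy_write_end(struct toy_seqcount *s)
{
	s->sequence++;
}

/*
 * One iteration of a reader's begin loop. Returns true and stores the
 * snapshot if no write is in progress; returns false in the case where
 * a real reader would keep spinning.
 */
static bool toy_read_try_begin(const struct toy_seqcount *s,
			       unsigned int *snap)
{
	assert(s && snap);
	if (s->sequence & 1)
		return false;
	*snap = s->sequence;
	return true;
}

/* A reader must retry if the counter moved since its snapshot. */
static bool toy_read_retry(const struct toy_seqcount *s, unsigned int snap)
{
	return s->sequence != snap;
}
```

If the reader runs on the same CPU that is inside the write section (for example from an interrupt handler), the false case can never turn true, because the writer cannot resume until the reader returns.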
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKUR…
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10(a)syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: John Ogness <john.ogness(a)linutronix.de>
Cc: Patrick Daly <quic_pdaly(a)quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/mm/page_alloc.c~mm-page_alloc-fix-potential-deadlock-on-zonelist_update_seq-seqlock
+++ a/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *
int nid;
int __maybe_unused cpu;
pg_data_t *self = data;
+ unsigned long flags;
+ /*
+ * Explicitly disable this CPU's interrupts before taking seqlock
+ * to prevent any IRQ handler from calling into the page allocator
+ * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ */
+ local_irq_save(flags);
+ /*
+ * Explicitly disable this CPU's synchronous printk() before taking
+ * seqlock to prevent any printk() from trying to hold port->lock, for
+ * tty_insert_flip_string_and_push_buffer() on other CPU might be
+ * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+ */
+ printk_deferred_enter();
write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *
}
write_sequnlock(&zonelist_update_seq);
+ printk_deferred_exit();
+ local_irq_restore(flags);
}
static noinline void __init
_
Patches currently in -mm which might be from penguin-kernel(a)I-love.SAKURA.ne.jp are
The quilt patch titled
Subject: kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
has been removed from the -mm tree. Its filename was
kernel-sysc-fix-and-improve-control-flow-in-__sys_setresid.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Subject: kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
Date: Fri, 17 Feb 2023 17:21:54 +0100
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if it is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
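The effect of the reordering can be sketched with a toy policy object. Everything here is hypothetical (struct toy_policy, toy_capable(), toy_set_id()); the sketch only shows why evaluating the "is the capability needed" condition first avoids spurious denial records.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: what the policy allows and how many denials were logged. */
struct toy_policy {
	bool allow;
	int denials_logged;
};

/* Stand-in for a capability check that logs every denial it issues. */
static bool toy_capable(struct toy_policy *p)
{
	assert(p);
	if (!p->allow)
		p->denials_logged++;
	return p->allow;
}

/*
 * Ask for the capability only when the requested change actually needs it.
 * The && short-circuits, so no check (and no log entry) happens otherwise.
 * Returns 0 on success, -1 where the kernel would return -EPERM.
 */
static int toy_set_id(struct toy_policy *p, bool needs_capability)
{
	if (needs_capability && !toy_capable(p))
		return -1;
	return 0;
}
```

With the operands swapped, a request that does not need the capability would still succeed, but it would leave a denial in the log whenever the policy disallows the capability.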
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/sys.c | 69 ++++++++++++++++++++++++++++---------------------
1 file changed, 40 insertions(+), 29 deletions(-)
--- a/kernel/sys.c~kernel-sysc-fix-and-improve-control-flow-in-__sys_setresid
+++ a/kernel/sys.c
@@ -664,6 +664,7 @@ long __sys_setresuid(uid_t ruid, uid_t e
struct cred *new;
int retval;
kuid_t kruid, keuid, ksuid;
+ bool ruid_new, euid_new, suid_new;
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
@@ -678,25 +679,29 @@ long __sys_setresuid(uid_t ruid, uid_t e
if ((suid != (uid_t) -1) && !uid_valid(ksuid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) &&
+ (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) &&
+ uid_eq(keuid, old->fsuid))) &&
+ (suid == (uid_t) -1 || uid_eq(ksuid, old->suid)))
+ return 0;
+
+ ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
+ !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid);
+ euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
+ !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid);
+ suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
+ !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid);
+ if ((ruid_new || euid_new || suid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
- if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
- !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
- goto error;
- if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
- !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
- goto error;
- if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
- !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
- goto error;
- }
-
if (ruid != (uid_t) -1) {
new->uid = kruid;
if (!uid_eq(kruid, old->uid)) {
@@ -761,6 +766,7 @@ long __sys_setresgid(gid_t rgid, gid_t e
struct cred *new;
int retval;
kgid_t krgid, kegid, ksgid;
+ bool rgid_new, egid_new, sgid_new;
krgid = make_kgid(ns, rgid);
kegid = make_kgid(ns, egid);
@@ -773,23 +779,28 @@ long __sys_setresgid(gid_t rgid, gid_t e
if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) &&
+ (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) &&
+ gid_eq(kegid, old->fsgid))) &&
+ (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid)))
+ return 0;
+
+ rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
+ !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid);
+ egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
+ !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid);
+ sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
+ !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid);
+ if ((rgid_new || egid_new || sgid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETGID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETGID)) {
- if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
- !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
- goto error;
- if (egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
- !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
- goto error;
- if (sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
- !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
- goto error;
- }
if (rgid != (gid_t) -1)
new->gid = krgid;
_
Patches currently in -mm which might be from omosnace(a)redhat.com are