This is a note to let you know that I've just added the patch titled
iio: adc: ad_sigma_delta: do not use internal iio_dev lock
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 20228a1d5a55e7db0c6720840f2c7d2b48c55f69 Mon Sep 17 00:00:00 2001
From: Nuno Sá <nuno.sa(a)analog.com>
Date: Tue, 20 Sep 2022 13:28:07 +0200
Subject: iio: adc: ad_sigma_delta: do not use internal iio_dev lock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Drop 'mlock' usage by making use of iio_device_claim_direct_mode().
This change actually makes sure we cannot do a single conversion while
buffering is enabled. Note there was a potential race in the previous
code since we were only acquiring the lock after checking if the buffer
is enabled.
Fixes: af3008485ea0 ("iio:adc: Add common code for ADI Sigma Delta devices")
Signed-off-by: Nuno Sá <nuno.sa(a)analog.com>
Reviewed-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Cc: <Stable(a)vger.kernel.org> #No rush as race is very old.
Link: https://lore.kernel.org/r/20220920112821.975359-2-nuno.sa@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad_sigma_delta.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/iio/adc/ad_sigma_delta.c b/drivers/iio/adc/ad_sigma_delta.c
index 261a9a6b45e1..d8570f620785 100644
--- a/drivers/iio/adc/ad_sigma_delta.c
+++ b/drivers/iio/adc/ad_sigma_delta.c
@@ -281,10 +281,10 @@ int ad_sigma_delta_single_conversion(struct iio_dev *indio_dev,
unsigned int data_reg;
int ret = 0;
- if (iio_buffer_enabled(indio_dev))
- return -EBUSY;
+ ret = iio_device_claim_direct_mode(indio_dev);
+ if (ret)
+ return ret;
- mutex_lock(&indio_dev->mlock);
ad_sigma_delta_set_channel(sigma_delta, chan->address);
spi_bus_lock(sigma_delta->spi->master);
@@ -323,7 +323,7 @@ int ad_sigma_delta_single_conversion(struct iio_dev *indio_dev,
ad_sigma_delta_set_mode(sigma_delta, AD_SD_MODE_IDLE);
sigma_delta->bus_locked = false;
spi_bus_unlock(sigma_delta->spi->master);
- mutex_unlock(&indio_dev->mlock);
+ iio_device_release_direct_mode(indio_dev);
if (ret)
return ret;
--
2.38.1
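For readers not familiar with the helpers this patch switches to, here is a
minimal sketch of the claim/release pattern, using a hypothetical driver
callback; only iio_device_claim_direct_mode() and
iio_device_release_direct_mode() are real IIO API, the rest is illustrative:

#include <linux/iio/iio.h>

/* Hypothetical read path illustrating the pattern adopted by the patch. */
static int my_adc_read_sample(struct iio_dev *indio_dev, int *val)
{
	int ret;

	/*
	 * Fails with -EBUSY if the buffer is already enabled, and keeps
	 * buffered mode from being entered until the matching release.
	 */
	ret = iio_device_claim_direct_mode(indio_dev);
	if (ret)
		return ret;

	ret = my_adc_do_conversion(indio_dev, val); /* hypothetical helper */

	iio_device_release_direct_mode(indio_dev);
	return ret;
}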
This is a note to let you know that I've just added the patch titled
iio: adc: ad_sigma_delta: do not use internal iio_dev lock
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 20228a1d5a55e7db0c6720840f2c7d2b48c55f69 Mon Sep 17 00:00:00 2001
From: Nuno Sá <nuno.sa(a)analog.com>
Date: Tue, 20 Sep 2022 13:28:07 +0200
Subject: iio: adc: ad_sigma_delta: do not use internal iio_dev lock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Drop 'mlock' usage by making use of iio_device_claim_direct_mode().
This change actually makes sure we cannot do a single conversion while
buffering is enabled. Note there was a potential race in the previous
code since we were only acquiring the lock after checking if the buffer
is enabled.
Fixes: af3008485ea0 ("iio:adc: Add common code for ADI Sigma Delta devices")
Signed-off-by: Nuno Sá <nuno.sa(a)analog.com>
Reviewed-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Cc: <Stable(a)vger.kernel.org> #No rush as race is very old.
Link: https://lore.kernel.org/r/20220920112821.975359-2-nuno.sa@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad_sigma_delta.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/iio/adc/ad_sigma_delta.c b/drivers/iio/adc/ad_sigma_delta.c
index 261a9a6b45e1..d8570f620785 100644
--- a/drivers/iio/adc/ad_sigma_delta.c
+++ b/drivers/iio/adc/ad_sigma_delta.c
@@ -281,10 +281,10 @@ int ad_sigma_delta_single_conversion(struct iio_dev *indio_dev,
unsigned int data_reg;
int ret = 0;
- if (iio_buffer_enabled(indio_dev))
- return -EBUSY;
+ ret = iio_device_claim_direct_mode(indio_dev);
+ if (ret)
+ return ret;
- mutex_lock(&indio_dev->mlock);
ad_sigma_delta_set_channel(sigma_delta, chan->address);
spi_bus_lock(sigma_delta->spi->master);
@@ -323,7 +323,7 @@ int ad_sigma_delta_single_conversion(struct iio_dev *indio_dev,
ad_sigma_delta_set_mode(sigma_delta, AD_SD_MODE_IDLE);
sigma_delta->bus_locked = false;
spi_bus_unlock(sigma_delta->spi->master);
- mutex_unlock(&indio_dev->mlock);
+ iio_device_release_direct_mode(indio_dev);
if (ret)
return ret;
--
2.38.1
Hi Greg
6.0.10-rc2
compiles [1], boots and runs here on x86_64
(Intel i5-11400, Fedora 37)
[1]
no rtc_wake_setup error seen here
Thanks
Tested-by: Ronald Warsow <rwarsow(a)gmx.de>
>
> On 24.11.22 02:08, Dominic Jones wrote:
> >> On Fri, Oct 28, 2022 at 02:51:43PM +0000, Dominic Jones wrote:
> >>> Updating the machine's kernel from v5.19.x to v6.0.x causes the machine to not
> >>> successfully boot. The machine boots successfully (and exhibits stable operation)
> >>> with version v5.19.17 and multiple earlier releases in the 5.19 line. Multiple releases
> >>> from the 6.0 line (including 6.0.0, 6.0.3, and 6.0.5), with no other changes to the
> >>> software environment, do not boot. Instead, the machine hangs after loading services
> >>> but before presenting a display manager; the machine instead shows repetitive hard
> >>> drive activity at this point and then no apparent activity.
> >>>
> >>> ''uname'' output for the machine successfully running v5.19.17 is:
> >>>
> >>> Linux [MACHINE_NAME] 5.19.17 #1 SMP PREEMPT_DYNAMIC Mon Oct 24 13:32:29 2022 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz GenuineIntel GNU/Linux
> >>>
> >>> The machine is an OCZ Neutrino netbook, running a custom OS build largely similar to
> >>> LFS development. The kernel update uses ''make olddefconfig''.
> >>
> >> Can you use 'git bisect' to find the offending change that causes this
> >> to happen?
> >
> > Bisection is complete. Here's what it returned.
> >
> > ---
> >
> > 3a194f3f8ad01bce00bd7174aaba1563bcc827eb is the first bad commit
>
> Many thx for this. A fix for that particular commit was recently
> committed to 6.0.y:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
>
> That thus raises the question: does your problem still happen with the
> latest 6.0.y version?
I'll test; it looks like 6.0.9 is the current stable release so I'll give that a
try.
Dominic
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: dvb-core: Fix double free in dvb_register_device()
Author: Keita Suzuki <keitasuzuki.park(a)sslab.ics.keio.ac.jp>
Date: Tue Apr 26 06:29:19 2022 +0100
In the call chain dvb_register_device() -> dvb_register_media_device() ->
dvb_create_media_entity(), dvb->entity is allocated and initialized. If
the initialization fails, dvb->entity is freed and an error code is
returned. The caller takes the error code and handles the error by calling
dvb_media_device_free(), which unregisters the entity and frees the
field again if it is not NULL. As dvb->entity may not be NULLed in
dvb_create_media_entity() when the allocation of dvbdev->pads fails, a
double free may occur. This may also cause a use-after-free in
media_device_unregister_entity().
Fix this by storing NULL to dvb->entity when it is freed.
Link: https://lore.kernel.org/linux-media/20220426052921.2088416-1-keitasuzuki.pa…
Fixes: fcd5ce4b3936 ("media: dvb-core: fix a memory leak bug")
Cc: stable(a)vger.kernel.org
Cc: Wenwen Wang <wenwen(a)cs.uga.edu>
Signed-off-by: Keita Suzuki <keitasuzuki.park(a)sslab.ics.keio.ac.jp>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
drivers/media/dvb-core/dvbdev.c | 1 +
1 file changed, 1 insertion(+)
---
diff --git a/drivers/media/dvb-core/dvbdev.c b/drivers/media/dvb-core/dvbdev.c
index d5a142ef9876..5b275a9395c1 100644
--- a/drivers/media/dvb-core/dvbdev.c
+++ b/drivers/media/dvb-core/dvbdev.c
@@ -333,6 +333,7 @@ static int dvb_create_media_entity(struct dvb_device *dvbdev,
GFP_KERNEL);
if (!dvbdev->pads) {
kfree(dvbdev->entity);
+ dvbdev->entity = NULL;
return -ENOMEM;
}
}
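As a standalone illustration of the pattern the one-line fix relies on
(plain C, not the dvb-core code; struct and field names are made up):

#include <stdlib.h>

struct ctx {
	void *entity;
	void *pads;
};

/*
 * If the second allocation fails, free 'entity' and NULL the pointer so
 * that a later cleanup path may free it again without triggering a
 * double free (free(NULL) is a no-op).
 */
static int ctx_setup(struct ctx *c)
{
	c->entity = malloc(32);
	if (!c->entity)
		return -1;

	c->pads = malloc(64);
	if (!c->pads) {
		free(c->entity);
		c->entity = NULL;	/* the equivalent of the fix above */
		return -1;
	}
	return 0;
}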
From: Thomas Huth <thuth(a)redhat.com>
We recently experienced some weird huge time jumps in nested guests when
rebooting them in certain cases. After adding some debug code to the epoch
handling in vsie.c (thanks to David Hildenbrand for the idea!), it was
obvious that the "epdx" field (the multi-epoch extension) did not get set
to 0xff in case the "epoch" field was negative.
It seems the code failed to copy the value of the epdx field from
the guest to the shadow control block. With that copy in place, the
weird time jumps are gone in our scenarios.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2140899
Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support")
Signed-off-by: Thomas Huth <thuth(a)redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Reviewed-by: Janosch Frank <frankja(a)linux.ibm.com>
Cc: stable(a)vger.kernel.org # 4.19+
Link: https://lore.kernel.org/r/20221123090833.292938-1-thuth@redhat.com
Message-Id: <20221123090833.292938-1-thuth(a)redhat.com>
Signed-off-by: Janosch Frank <frankja(a)linux.ibm.com>
---
arch/s390/kvm/vsie.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 94138f8f0c1c..ace2541ababd 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -546,8 +546,10 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
if (test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_CEI))
scb_s->eca |= scb_o->eca & ECA_CEI;
/* Epoch Extension */
- if (test_kvm_facility(vcpu->kvm, 139))
+ if (test_kvm_facility(vcpu->kvm, 139)) {
scb_s->ecd |= scb_o->ecd & ECD_MEF;
+ scb_s->epdx = scb_o->epdx;
+ }
/* etoken */
if (test_kvm_facility(vcpu->kvm, 156))
--
2.38.1
From: Ziyang Xuan <william.xuanziyang(a)huawei.com>
can327_feed_frame_to_netdev() does not free the skb when the netdev is
down, and none of its callers free the allocated skb either. That leaks
the skb.
Fix it by calling kfree_skb() in can327_feed_frame_to_netdev() when the
netdev is down. Not tested, just compiled.
Fixes: 43da2f07622f ("can: can327: CAN/ldisc driver for ELM327 based OBD-II adapters")
Signed-off-by: Ziyang Xuan <william.xuanziyang(a)huawei.com>
Link: https://lore.kernel.org/all/20221110061437.411525-1-william.xuanziyang@huaw…
Reviewed-by: Max Staudt <max(a)enpas.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/can327.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/can327.c b/drivers/net/can/can327.c
index 094197780776..ed3d0b8989a0 100644
--- a/drivers/net/can/can327.c
+++ b/drivers/net/can/can327.c
@@ -263,8 +263,10 @@ static void can327_feed_frame_to_netdev(struct can327 *elm, struct sk_buff *skb)
{
lockdep_assert_held(&elm->lock);
- if (!netif_running(elm->dev))
+ if (!netif_running(elm->dev)) {
+ kfree_skb(skb);
return;
+ }
/* Queue for NAPI pickup.
* rx-offload will update stats and LEDs for us.
base-commit: ad17c2a3f11b0f6b122e7842d8f7d9a5fcc7ac63
--
2.35.1
The patch titled
Subject: mm: migrate: fix THP's mapcount on isolation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-fix-thps-mapcount-on-isolation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gavin Shan <gshan(a)redhat.com>
Subject: mm: migrate: fix THP's mapcount on isolation
Date: Thu, 24 Nov 2022 17:55:23 +0800
The issue is reported when removing memory through a virtio_mem device.
A transparent huge page that experienced a copy-on-write fault is wrongly
regarded as pinned, so it escapes isolation in
isolate_migratepages_block(). The transparent huge page can't be migrated
and the corresponding memory block can't be put into the offline state.
Fix it by replacing page_mapcount() with total_mapcount(). With this, the
transparent huge page can be isolated and migrated, and the memory block
can be put into the offline state. Besides, the page's refcount is now
taken a bit earlier to avoid the page being released while the check is
executed.
Link: https://lkml.kernel.org/r/20221124095523.31061-1-gshan@redhat.com
Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org> [5.7+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
--- a/mm/compaction.c~mm-migrate-fix-thps-mapcount-on-isolation
+++ a/mm/compaction.c
@@ -985,28 +985,28 @@ isolate_migratepages_block(struct compac
}
/*
+ * Be careful not to clear PageLRU until after we're
+ * sure the page is not being freed elsewhere -- the
+ * page release code relies on it.
+ */
+ if (unlikely(!get_page_unless_zero(page)))
+ goto isolate_fail;
+
+ /*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
* admittedly racy check.
*/
mapping = page_mapping(page);
- if (!mapping && page_count(page) > page_mapcount(page))
- goto isolate_fail;
+ if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+ goto isolate_fail_put;
/*
* Only allow to migrate anonymous pages in GFP_NOFS context
* because those do not depend on fs locks.
*/
if (!(cc->gfp_mask & __GFP_FS) && mapping)
- goto isolate_fail;
-
- /*
- * Be careful not to clear PageLRU until after we're
- * sure the page is not being freed elsewhere -- the
- * page release code relies on it.
- */
- if (unlikely(!get_page_unless_zero(page)))
- goto isolate_fail;
+ goto isolate_fail_put;
/* Only take pages on LRU: a check now makes later tests safe */
if (!PageLRU(page))
_
Patches currently in -mm which might be from gshan(a)redhat.com are
mm-migrate-fix-thps-mapcount-on-isolation.patch
The quilt patch titled
Subject: mm: migrate: Fix THP's mapcount on isolation
has been removed from the -mm tree. Its filename was
mm-migrate-fix-thps-mapcount-on-isolation.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Gavin Shan <gshan(a)redhat.com>
Subject: mm: migrate: Fix THP's mapcount on isolation
Date: Wed, 23 Nov 2022 08:57:52 +0800
The issue is reported when removing memory through a virtio_mem device.
A transparent huge page that experienced a copy-on-write fault is wrongly
regarded as pinned, so it escapes isolation in
isolate_migratepages_block(). The transparent huge page can't be migrated
and the corresponding memory block can't be put into the offline state.
Fix it by replacing page_mapcount() with total_mapcount(). With this, the
transparent huge page can be isolated and migrated, and the memory block
can be put into the offline state.
Link: https://lkml.kernel.org/r/20221123005752.161003-1-gshan@redhat.com
Fixes: 3917c80280c9 ("thp: change CoW semantics for anon-THP")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org> [v5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/compaction.c~mm-migrate-fix-thps-mapcount-on-isolation
+++ a/mm/compaction.c
@@ -990,7 +990,7 @@ isolate_migratepages_block(struct compac
* admittedly racy check.
*/
mapping = page_mapping(page);
- if (!mapping && page_count(page) > page_mapcount(page))
+ if (!mapping && page_count(page) > total_mapcount(page))
goto isolate_fail;
/*
_
Patches currently in -mm which might be from gshan(a)redhat.com are
Make nfsd_splice_actor() work with reads at a non-zero offset that doesn't end on a page boundary.
This was found when virtual machines with nfs-mounted qcow2 disks failed to boot properly (originally found
on v6.0.5, fix also needed and tested on v6.0.9 and v6.1-rc6).
Signed-off-by: Anders Blomdell <anders.blomdell(a)control.lth.se>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132
Fixes: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages")
Cc: stable(a)vger.kernel.org # v6.0+
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -869,12 +869,13 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
struct splice_desc *sd)
{
struct svc_rqst *rqstp = sd->u.data;
- struct page *page = buf->page; // may be a compound one
+ // buf->page may be a compound one
unsigned offset = buf->offset;
+ struct page *first = buf->page + offset / PAGE_SIZE;
+ struct page *last = buf->page + (offset + sd->len - 1) / PAGE_SIZE;
- page += offset / PAGE_SIZE;
- for (int i = sd->len; i > 0; i -= PAGE_SIZE)
- svc_rqst_replace_page(rqstp, page++);
+ for (struct page *page = first; page <= last; page++)
+ svc_rqst_replace_page(rqstp, page);
if (rqstp->rq_res.page_len == 0) // first call
rqstp->rq_res.page_base = offset % PAGE_SIZE;
rqstp->rq_res.page_len += sd->len;
--
Anders Blomdell Email: anders.blomdell(a)control.lth.se
Department of Automatic Control
Lund University Phone: +46 46 222 4625
P.O. Box 118
SE-221 00 Lund, Sweden
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before
the event") fixed an issue where if an event is soft disabled, and the
trigger is being added, there's a small window where the event sees that
there's a trigger but does not see that it requires reading the event yet,
and then calls the trigger with the record == NULL.
This could be solved by adding memory barriers in the hot path, or by
making sure that all the triggers requiring a record check for NULL. The
latter was chosen.
Commit 94eedf3dded5 made the eprobe trigger handler check for NULL, but
the same needs to be done with histograms.
Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0d…
Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.…
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Reported-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events_hist.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 087c19548049..1c82478e8dff 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -5143,6 +5143,9 @@ static void event_hist_trigger(struct event_trigger_data *data,
void *key = NULL;
unsigned int i;
+ if (unlikely(!rbe))
+ return;
+
memset(compound_key, 0, hist_data->key_size);
for_each_hist_key_field(i, hist_data) {
--
2.35.1
From: Keith Busch <kbusch(a)kernel.org>
commit 23e085b2dead13b51fe86d27069895b740f749c0 upstream.
The passthrough commands already have this restriction, but the other
operations do not. Require the same capabilities for all users as all of
these operations, which include resets and rescans, can be disruptive.
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
These backports are for CVE-2022-3169.
drivers/nvme/host/core.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3f106771d15b..d9c78fe85cb3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3330,11 +3330,17 @@ static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
case NVME_IOCTL_IO_CMD:
return nvme_dev_user_cmd(ctrl, argp);
case NVME_IOCTL_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
dev_warn(ctrl->device, "resetting controller\n");
return nvme_reset_ctrl_sync(ctrl);
case NVME_IOCTL_SUBSYS_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
return nvme_reset_subsystem(ctrl);
case NVME_IOCTL_RESCAN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
nvme_queue_scan(ctrl);
return 0;
default:
--
2.38.1
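The gating pattern used by this backport, shown as a hedged sketch of a
generic ioctl handler (the MYDEV_* command and mydev_* helpers are made up;
capable(), CAP_SYS_ADMIN, -EACCES and -ENOTTY are the real kernel
interfaces):

/*
 * Hypothetical ioctl handler: disruptive, privileged commands bail out
 * early with -EACCES for callers lacking CAP_SYS_ADMIN.
 */
static long mydev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	switch (cmd) {
	case MYDEV_IOCTL_RESET:				/* hypothetical */
		if (!capable(CAP_SYS_ADMIN))
			return -EACCES;
		return mydev_reset(file->private_data);	/* hypothetical */
	default:
		return -ENOTTY;
	}
}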
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 6532783310e2b2f50dc13f46c49aa6546cb6e7a3
Gitweb: https://git.kernel.org/tip/6532783310e2b2f50dc13f46c49aa6546cb6e7a3
Author: Alexander Antonov <alexander.antonov(a)linux.intel.com>
AuthorDate: Thu, 17 Nov 2022 12:28:25
Committer: Peter Zijlstra <peterz(a)infradead.org>
CommitterDate: Thu, 24 Nov 2022 11:09:20 +01:00
perf/x86/intel/uncore: Clear attr_update properly
The current clear_attr_update error path in pmu_set_mapping() sets the
attr_update field to NULL, which is not correct because intel_uncore_type
pmu types can contain several groups in the attr_update field. For
example, the SPR platform already has uncore_alias_group to update, and a
UPI topology group will be added in later patches.
Fix this by clearing only the attr_update group related to the mapping.
Fixes: bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")
Signed-off-by: Alexander Antonov <alexander.antonov(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221117122833.3103580-4-alexander.antonov@linux.…
---
arch/x86/events/intel/uncore_snbep.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index d3323f1..0d06b56 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -3872,6 +3872,21 @@ static const struct attribute_group *skx_iio_attr_update[] = {
NULL,
};
+static void pmu_clear_mapping_attr(const struct attribute_group **groups,
+ struct attribute_group *ag)
+{
+ int i;
+
+ for (i = 0; groups[i]; i++) {
+ if (groups[i] == ag) {
+ for (i++; groups[i]; i++)
+ groups[i - 1] = groups[i];
+ groups[i - 1] = NULL;
+ break;
+ }
+ }
+}
+
static int
pmu_set_mapping(struct intel_uncore_type *type, struct attribute_group *ag,
ssize_t (*show)(struct device*, struct device_attribute*, char*),
@@ -3926,7 +3941,7 @@ clear_attrs:
clear_topology:
pmu_free_topology(type);
clear_attr_update:
- type->attr_update = NULL;
+ pmu_clear_mapping_attr(type->attr_update, ag);
return ret;
}
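The compaction loop added as pmu_clear_mapping_attr() can be exercised on
its own; here is a small standalone C sketch of the same algorithm,
operating on strings instead of attribute_group pointers:

#include <stdio.h>

/*
 * Remove one entry from a NULL-terminated array of pointers by shifting
 * the tail down, mirroring the loop in pmu_clear_mapping_attr().
 */
static void remove_entry(const char **arr, const char *victim)
{
	for (int i = 0; arr[i]; i++) {
		if (arr[i] == victim) {
			for (i++; arr[i]; i++)
				arr[i - 1] = arr[i];
			arr[i - 1] = NULL;
			break;
		}
	}
}

int main(void)
{
	const char *alias = "alias_group", *mapping = "mapping_group";
	const char *groups[] = { alias, mapping, "other_group", NULL };

	remove_entry(groups, mapping);
	for (int i = 0; groups[i]; i++)
		printf("%s\n", groups[i]);	/* alias_group, other_group */
	return 0;
}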
Hi,
I'd like to request the following commits to be backported to 5.15.y.
- dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
- 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly")
- a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
These patches fix a data loss issue by preventing shmem pagecache from
being removed on memory error. They were not tagged for stable originally,
but that has been revisited recently.
Thanks,
Naoya Horiguchi
On Wed, 23 Nov 2022 14:25:58 -0500
Steven Rostedt <rostedt(a)goodmis.org> wrote:
> From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
>
> After 65536 dynamic events have been added and removed, the "type" field
> of the event then uses the first type number that is available (not
> currently used by other events). A type number is the identifier of the
> binary blobs in the tracing ring buffer (known as events) to map them to
> logic that can parse the binary blob.
>
> The issue is that if a dynamic event (like a kprobe event) is traced and
> is in the ring buffer, and then that event is removed (because it is
> dynamic, which means it can be created and destroyed), if another dynamic
> event is created that has the same number that new event's logic on
> parsing the binary blob will be used.
>
> To show how this can be an issue, the following can crash the kernel:
>
> # cd /sys/kernel/tracing
> # for i in `seq 65536`; do
> echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
> # done
>
> For every iteration of the above, the write to kprobe_events will
> remove the old event and create a new one (with the same format) and
> increase the type number to the next available one until the type number
> goes over 65535, which is the max number for the 16-bit type. After it
> reaches that number, the logic to allocate a new number simply looks for
> the next available number. When a dynamic event is removed, that number
> is then available to be reused by the next dynamic event created. That is,
> once the above reaches the max number, the number assigned to the event in
> that loop will remain the same.
>
> Now that means deleting one dynamic event and creating another will reuse
> the previous event's type number. This is where bad things can happen.
> After the above loop finishes, the kprobes/foo event reads the
> do_sys_openat2 function call's first parameter as an integer.
>
> # echo 1 > kprobes/foo/enable
> # cat /etc/passwd > /dev/null
> # cat trace
> cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> # echo 0 > kprobes/foo/enable
>
> Now if we delete the kprobe and create a new one that reads a string:
>
> # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
>
> And now we can see the trace:
>
> # cat trace
> sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
Aah, right. Even if we remove one dynamic event, its records were shown
as unknown events.
>
> And dmesg has:
>
> ==================================================================
> BUG: KASAN: use-after-free in string+0xd4/0x1c0
> Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
>
> CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5b/0x77
> print_report+0x17f/0x47b
> kasan_report+0xad/0x130
> string+0xd4/0x1c0
> vsnprintf+0x500/0x840
> seq_buf_vprintf+0x62/0xc0
> trace_seq_printf+0x10e/0x1e0
> print_type_string+0x90/0xa0
> print_kprobe_event+0x16b/0x290
> print_trace_line+0x451/0x8e0
> s_show+0x72/0x1f0
> seq_read_iter+0x58e/0x750
> seq_read+0x115/0x160
> vfs_read+0x11d/0x460
> ksys_read+0xa9/0x130
> do_syscall_64+0x3a/0x90
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fc2e972ade2
> Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
> RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
> RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
> RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
> R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
> R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
> </TASK>
>
> The buggy address belongs to the physical page:
> page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
> raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ^
> ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==================================================================
>
> This was found when Zheng Yejian sent a patch to convert the event type
> number assignment to use IDA, which gives the next available number, and
> this bug showed up in the fuzz testing by Yujie Liu and the kernel test
> robot. But after further analysis, I found that this behavior is the same
> as when the event type numbers go past the 16-bit max (and the above shows
> that).
>
> Modules have a similar issue, but it is dealt with by setting a
> "WAS_ENABLED" flag when a module event is enabled. When the module is
> freed, if any of its events were enabled, the ring buffer that holds those
> events is also cleared, to prevent reading stale events. The same can be
> done for dynamic events.
Indeed. If the dynamic event had not been enabled, there is no reason
to clear the buffer.
This looks good to me.
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Thank you!
>
> If any dynamic event that is being removed was enabled, then make sure the
> buffers they were enabled in are now cleared.
>
> Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.co…
>
> Cc: stable(a)vger.kernel.org
> Depends-on: TBD ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
> Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework")
> Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events")
> Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in")
> Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced")
> Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
> Reported-by: Zheng Yejian <zhengyejian1(a)huawei.com>
> Reported-by: Yujie Liu <yujie.liu(a)intel.com>
> Reported-by: kernel test robot <yujie.liu(a)intel.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
> ---
> kernel/trace/trace_dynevent.c | 2 ++
> kernel/trace/trace_events.c | 11 ++++++++++-
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
> index 154996684fb5..4376887e0d8a 100644
> --- a/kernel/trace/trace_dynevent.c
> +++ b/kernel/trace/trace_dynevent.c
> @@ -118,6 +118,7 @@ int dyn_event_release(const char *raw_command, struct dyn_event_operations *type
> if (ret)
> break;
> }
> + tracing_reset_all_online_cpus();
> mutex_unlock(&event_mutex);
> out:
> argv_free(argv);
> @@ -214,6 +215,7 @@ int dyn_events_release_all(struct dyn_event_operations *type)
> break;
> }
> out:
> + tracing_reset_all_online_cpus();
> mutex_unlock(&event_mutex);
>
> return ret;
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 0449e3c7d327..3bfaf560ecc4 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2947,7 +2947,10 @@ static int probe_remove_event_call(struct trace_event_call *call)
> * TRACE_REG_UNREGISTER.
> */
> if (file->flags & EVENT_FILE_FL_ENABLED)
> - return -EBUSY;
> + goto busy;
> +
> + if (file->flags & EVENT_FILE_FL_WAS_ENABLED)
> + tr->clear_trace = true;
> /*
> * The do_for_each_event_file_safe() is
> * a double loop. After finding the call for this
> @@ -2960,6 +2963,12 @@ static int probe_remove_event_call(struct trace_event_call *call)
> __trace_remove_event_call(call);
>
> return 0;
> + busy:
> + /* No need to clear the trace now */
> + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> + tr->clear_trace = false;
> + }
> + return -EBUSY;
> }
>
> /* Remove an event_call */
> --
> 2.35.1
>
>
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
The patch titled
Subject: mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
Date: Wed, 23 Nov 2022 17:56:52 +0100
Any codepath that zaps page table entries must invoke MMU notifiers to
ensure that secondary MMUs (like KVM) don't keep accessing pages which
aren't mapped anymore. Secondary MMUs don't hold their own references to
pages that are mirrored over, so failing to notify them can lead to page
use-after-free.
I'm marking this as addressing an issue introduced in commit f3f0e1d2150b
("khugepaged: add support of collapse for tmpfs/shmem pages"), but most of
the security impact of this only came in commit 27e1f8273113 ("khugepaged:
enable collapse pmd for pte-mapped THP"), which actually omitted flushes
for the removal of present PTEs, not just for the removal of empty page
tables.
Link: https://lkml.kernel.org/r/20221123165652.2204925-5-jannh@google.com
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths
+++ a/mm/khugepaged.c
@@ -1421,6 +1421,7 @@ static void collapse_and_free_pmd(struct
{
pmd_t pmd;
struct mmu_gather tlb;
+ struct mmu_notifier_range range;
mmap_assert_write_locked(mm);
if (vma->vm_file)
@@ -1433,12 +1434,16 @@ static void collapse_and_free_pmd(struct
lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
page_table_check_pte_clear_range(mm, addr, pmd);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, addr,
+ addr + HPAGE_PMD_SIZE);
+ mmu_notifier_invalidate_range_start(&range);
tlb_gather_mmu(&tlb, mm);
pmd = READ_ONCE(*pmdp);
pmd_clear(pmdp);
tlb_flush_pte_range(&tlb, addr, HPAGE_PMD_SIZE);
pte_free_tlb(&tlb, pmd_pgtable(pmd), addr);
tlb_finish_mmu(&tlb);
+ mmu_notifier_invalidate_range_end(&range);
mm_dec_nr_ptes(mm);
}
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-khugepaged-take-the-right-locks-for-page-table-retraction.patch
mmu_gather-use-macro-arguments-more-carefully.patch
mm-khugepaged-fix-gup-fast-interaction-by-freeing-ptes-via-mmu_gather.patch
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch
The patch titled
Subject: mm/khugepaged: fix GUP-fast interaction by freeing ptes via mmu_gather
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-khugepaged-fix-gup-fast-interaction-by-freeing-ptes-via-mmu_gather.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/khugepaged: fix GUP-fast interaction by freeing ptes via mmu_gather
Date: Wed, 23 Nov 2022 17:56:51 +0100
Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
ensure that the page table was not removed by khugepaged in between.
However, lockless_pages_from_mm() still requires that the page table is
not concurrently freed. We could provide this guarantee in khugepaged by
using some variant of pte_free() with appropriate delay; but such a helper
doesn't really exist outside the mmu_gather infrastructure.
To avoid having to wire up a new codepath for freeing page tables that
might have been in use in the past, fix the issue by letting khugepaged
deposit a fresh page table (if required) instead of depositing the
existing page table, and free the old page table via mmu_gather.
Link: https://lkml.kernel.org/r/20221123165652.2204925-4-jannh@google.com
Fixes: ba76149f47d8 ("thp: khugepaged")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 47 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 37 insertions(+), 10 deletions(-)
--- a/mm/khugepaged.c~mm-khugepaged-fix-gup-fast-interaction-by-freeing-ptes-via-mmu_gather
+++ a/mm/khugepaged.c
@@ -975,6 +975,8 @@ static int collapse_huge_page(struct mm_
int result = SCAN_FAIL;
struct vm_area_struct *vma;
struct mmu_notifier_range range;
+ struct mmu_gather tlb;
+ pgtable_t deposit_table = NULL;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
@@ -989,6 +991,11 @@ static int collapse_huge_page(struct mm_
result = alloc_charge_hpage(&hpage, mm, cc);
if (result != SCAN_SUCCEED)
goto out_nolock;
+ deposit_table = pte_alloc_one(mm);
+ if (!deposit_table) {
+ result = SCAN_FAIL;
+ goto out_nolock;
+ }
mmap_read_lock(mm);
result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
@@ -1041,12 +1048,12 @@ static int collapse_huge_page(struct mm_
pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
/*
- * This removes any huge TLB entry from the CPU so we won't allow
- * huge and small TLB entries for the same virtual address to
- * avoid the risk of CPU bugs in that area.
- *
- * Parallel fast GUP is fine since fast GUP will back off when
- * it detects PMD is changed.
+ * Unlink the page table from the PMD and do a TLB flush.
+ * This ensures that the CPUs can't write to the old pages anymore by
+ * the time __collapse_huge_page_copy() copies their contents, and it
+ * allows __collapse_huge_page_copy() to free the old pages.
+ * This also prevents lockless_pages_from_mm() from grabbing references
+ * on the old pages from here on.
*/
_pmd = pmdp_collapse_flush(vma, address, pmd);
spin_unlock(pmd_ptl);
@@ -1090,6 +1097,16 @@ static int collapse_huge_page(struct mm_
__SetPageUptodate(hpage);
pgtable = pmd_pgtable(_pmd);
+ /*
+ * Discard the old page table.
+ * The TLB flush that's implied here is redundant, but hard to avoid
+ * with the current API.
+ */
+ tlb_gather_mmu(&tlb, mm);
+ tlb_flush_pte_range(&tlb, address, HPAGE_PMD_SIZE);
+ pte_free_tlb(&tlb, pgtable, address);
+ tlb_finish_mmu(&tlb);
+
_pmd = mk_huge_pmd(hpage, vma->vm_page_prot);
_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
@@ -1097,7 +1114,8 @@ static int collapse_huge_page(struct mm_
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(hpage, vma, address);
lru_cache_add_inactive_or_unevictable(hpage, vma);
- pgtable_trans_huge_deposit(mm, pmd, pgtable);
+ pgtable_trans_huge_deposit(mm, pmd, deposit_table);
+ deposit_table = NULL;
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
spin_unlock(pmd_ptl);
@@ -1112,6 +1130,8 @@ out_nolock:
mem_cgroup_uncharge(page_folio(hpage));
put_page(hpage);
}
+ if (deposit_table)
+ pte_free(mm, deposit_table);
trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result);
return result;
}
@@ -1393,11 +1413,14 @@ static int set_huge_pmd(struct vm_area_s
* The mmap lock together with this VMA's rmap locks covers all paths towards
* the page table entries we're messing with here, except for hardware page
* table walks and lockless_pages_from_mm().
+ *
+ * This function is similar to free_pte_range().
*/
static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
pmd_t pmd;
+ struct mmu_gather tlb;
mmap_assert_write_locked(mm);
if (vma->vm_file)
@@ -1408,11 +1431,15 @@ static void collapse_and_free_pmd(struct
*/
if (vma->anon_vma)
lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
+ page_table_check_pte_clear_range(mm, addr, pmd);
- pmd = pmdp_collapse_flush(vma, addr, pmdp);
+ tlb_gather_mmu(&tlb, mm);
+ pmd = READ_ONCE(*pmdp);
+ pmd_clear(pmdp);
+ tlb_flush_pte_range(&tlb, addr, HPAGE_PMD_SIZE);
+ pte_free_tlb(&tlb, pmd_pgtable(pmd), addr);
+ tlb_finish_mmu(&tlb);
mm_dec_nr_ptes(mm);
- page_table_check_pte_clear_range(mm, addr, pmd);
- pte_free(mm, pmd_pgtable(pmd));
}
/**
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-khugepaged-take-the-right-locks-for-page-table-retraction.patch
mmu_gather-use-macro-arguments-more-carefully.patch
mm-khugepaged-fix-gup-fast-interaction-by-freeing-ptes-via-mmu_gather.patch
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch
The patch titled
Subject: mmu_gather: Use macro arguments more carefully
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mmu_gather-use-macro-arguments-more-carefully.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mmu_gather: Use macro arguments more carefully
Date: Wed, 23 Nov 2022 17:56:50 +0100
Avoid breaking stuff when the tlb parameter is an expression like "&tlb".
The following commit relies on this when calling pte_free_tlb().
(Going forward it would probably be a good idea to change macros like this
into inline functions...)
Link: https://lkml.kernel.org/r/20221123165652.2204925-3-jannh@google.com
Fixes: a6d60245d6d9 ("asm-generic/tlb: Track which levels of the page tables have been cleared")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/asm-generic/tlb.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/include/asm-generic/tlb.h~mmu_gather-use-macro-arguments-more-carefully
+++ a/include/asm-generic/tlb.h
@@ -630,7 +630,7 @@ static inline void tlb_flush_p4d_range(s
#define pte_free_tlb(tlb, ptep, address) \
do { \
tlb_flush_pmd_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ (tlb)->freed_tables = 1; \
__pte_free_tlb(tlb, ptep, address); \
} while (0)
#endif
@@ -639,7 +639,7 @@ static inline void tlb_flush_p4d_range(s
#define pmd_free_tlb(tlb, pmdp, address) \
do { \
tlb_flush_pud_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ (tlb)->freed_tables = 1; \
__pmd_free_tlb(tlb, pmdp, address); \
} while (0)
#endif
@@ -648,7 +648,7 @@ static inline void tlb_flush_p4d_range(s
#define pud_free_tlb(tlb, pudp, address) \
do { \
tlb_flush_p4d_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ (tlb)->freed_tables = 1; \
__pud_free_tlb(tlb, pudp, address); \
} while (0)
#endif
@@ -657,7 +657,7 @@ static inline void tlb_flush_p4d_range(s
#define p4d_free_tlb(tlb, pudp, address) \
do { \
__tlb_adjust_range(tlb, address, PAGE_SIZE); \
- tlb->freed_tables = 1; \
+ (tlb)->freed_tables = 1; \
__p4d_free_tlb(tlb, pudp, address); \
} while (0)
#endif
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-khugepaged-take-the-right-locks-for-page-table-retraction.patch
mmu_gather-use-macro-arguments-more-carefully.patch
mm-khugepaged-fix-gup-fast-interaction-by-freeing-ptes-via-mmu_gather.patch
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch
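A standalone C sketch of the macro-argument pitfall this patch addresses
(toy struct and macros, not the kernel ones); without parentheses, passing
an expression such as &tlb makes the expansion parse incorrectly:

#include <stdio.h>

struct mmu_gather { int freed_tables; };

/*
 * Without parentheses, MARK_FREED_BAD(&tlb) expands to
 * "&tlb->freed_tables = 1", which parses as "&(tlb->freed_tables) = 1"
 * and fails to compile.
 */
#define MARK_FREED_BAD(x)	do { x->freed_tables = 1; } while (0)

/*
 * With parentheses, MARK_FREED_GOOD(&tlb) expands to
 * "(&tlb)->freed_tables = 1", which is what was intended.
 */
#define MARK_FREED_GOOD(x)	do { (x)->freed_tables = 1; } while (0)

int main(void)
{
	struct mmu_gather tlb = { 0 };

	/* MARK_FREED_BAD(&tlb);  -- would not compile */
	MARK_FREED_GOOD(&tlb);
	printf("freed_tables = %d\n", tlb.freed_tables);	/* 1 */
	return 0;
}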
The patch titled
Subject: mm/khugepaged: take the right locks for page table retraction
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-khugepaged-take-the-right-locks-for-page-table-retraction.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/khugepaged: take the right locks for page table retraction
Date: Wed, 23 Nov 2022 17:56:49 +0100
Patch series "khugepaged fixes, take two", v2.
This patch (of 4):
Page table walks on address ranges mapped by VMAs can be done under the
mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
VMA's address_space. Only one of these needs to be held, and it does not
need to be held in exclusive mode.
Under those circumstances, the rules for concurrent access to page table
entries are:
- Terminal page table entries (entries that don't point to another page
table) can be arbitrarily changed under the page table lock, with the
exception that they always need to be consistent for
hardware page table walks and lockless_pages_from_mm().
This includes that they can be changed into non-terminal entries.
- Non-terminal page table entries (which point to another page table)
can not be modified; readers are allowed to READ_ONCE() an entry, verify
that it is non-terminal, and then assume that its value will stay as-is.
Retracting a page table involves modifying a non-terminal entry, so
page-table-level locks are insufficient to protect against concurrent page
table traversal; it requires taking all the higher-level locks under which
it is possible to start a page walk in the relevant range in exclusive
mode.
The collapse_huge_page() path for anonymous THP already follows this rule,
but the shmem/file THP path was getting it wrong, making it possible for
concurrent rmap-based operations to cause corruption.
Link: https://lkml.kernel.org/r/20221123165652.2204925-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221123165652.2204925-2-jannh@google.com
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 55 ++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 51 insertions(+), 4 deletions(-)
--- a/mm/khugepaged.c~mm-khugepaged-take-the-right-locks-for-page-table-retraction
+++ a/mm/khugepaged.c
@@ -1379,16 +1379,37 @@ static int set_huge_pmd(struct vm_area_s
return SCAN_SUCCEED;
}
+/*
+ * A note about locking:
+ * Trying to take the page table spinlocks would be useless here because those
+ * are only used to synchronize:
+ *
+ * - modifying terminal entries (ones that point to a data page, not to another
+ * page table)
+ * - installing *new* non-terminal entries
+ *
+ * Instead, we need roughly the same kind of protection as free_pgtables() or
+ * mm_take_all_locks() (but only for a single VMA):
+ * The mmap lock together with this VMA's rmap locks covers all paths towards
+ * the page table entries we're messing with here, except for hardware page
+ * table walks and lockless_pages_from_mm().
+ */
static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
- spinlock_t *ptl;
pmd_t pmd;
mmap_assert_write_locked(mm);
- ptl = pmd_lock(vma->vm_mm, pmdp);
+ if (vma->vm_file)
+ lockdep_assert_held_write(&vma->vm_file->f_mapping->i_mmap_rwsem);
+ /*
+ * All anon_vmas attached to the VMA have the same root and are
+ * therefore locked by the same lock.
+ */
+ if (vma->anon_vma)
+ lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
+
pmd = pmdp_collapse_flush(vma, addr, pmdp);
- spin_unlock(ptl);
mm_dec_nr_ptes(mm);
page_table_check_pte_clear_range(mm, addr, pmd);
pte_free(mm, pmd_pgtable(pmd));
@@ -1439,6 +1460,14 @@ int collapse_pte_mapped_thp(struct mm_st
if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
return SCAN_VMA_CHECK;
+ /*
+ * Symmetry with retract_page_tables(): Exclude MAP_PRIVATE mappings
+ * that got written to. Without this, we'd have to also lock the
+ * anon_vma if one exists.
+ */
+ if (vma->anon_vma)
+ return SCAN_VMA_CHECK;
+
/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
if (userfaultfd_wp(vma))
return SCAN_PTE_UFFD_WP;
@@ -1472,6 +1501,20 @@ int collapse_pte_mapped_thp(struct mm_st
goto drop_hpage;
}
+ /*
+ * We need to lock the mapping so that from here on, only GUP-fast and
+ * hardware page walks can access the parts of the page tables that
+ * we're operating on.
+ * See collapse_and_free_pmd().
+ */
+ i_mmap_lock_write(vma->vm_file->f_mapping);
+
+ /*
+ * This spinlock should be unnecessary: Nobody else should be accessing
+ * the page tables under spinlock protection here, only
+ * lockless_pages_from_mm() and the hardware page walker can access page
+ * tables while all the high-level locks are held in write mode.
+ */
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
result = SCAN_FAIL;
@@ -1526,6 +1569,8 @@ int collapse_pte_mapped_thp(struct mm_st
/* step 4: remove pte entries */
collapse_and_free_pmd(mm, vma, haddr, pmd);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+
maybe_install_pmd:
/* step 5: install pmd entry */
result = install_pmd
@@ -1539,6 +1584,7 @@ drop_hpage:
abort:
pte_unmap_unlock(start_pte, ptl);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
goto drop_hpage;
}
@@ -1595,7 +1641,8 @@ static int retract_page_tables(struct ad
* An alternative would be drop the check, but check that page
* table is clear before calling pmdp_collapse_flush() under
* ptl. It has higher chance to recover THP for the VMA, but
- * has higher cost too.
+ * has higher cost too. It would also probably require locking
+ * the anon_vma.
*/
if (vma->anon_vma) {
result = SCAN_PAGE_ANON;
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-khugepaged-take-the-right-locks-for-page-table-retraction.patch
mmu_gather-use-macro-arguments-more-carefully.patch
mm-khugepaged-fix-gup-fast-interaction-by-freeing-ptes-via-mmu_gather.patch
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch
Make nfsd_splice_actor() work with reads at a non-zero offset that do not end on a page boundary.
This was found when virtual machines with NFS-mounted qcow2 disks failed to boot properly (originally
observed on v6.0.5; the fix is also needed and was tested on v6.0.9 and v6.1-rc6).
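To make the off-by-one-page case concrete, here is a rough worked example with hypothetical numbers
(assuming PAGE_SIZE = 4096): for sd->len = 300 and buf->offset = 8100, the data starts 4004 bytes into
a page (offset % PAGE_SIZE = 4004) and therefore spans two pages (4004 + 300 = 4304 > 4096). The old
loop counted down from sd->len = 300 and replaced only one page; counting down from
sd->len + offset % PAGE_SIZE = 4304 replaces both.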
Signed-off-by: Anders Blomdell <anders.blomdell(a)control.lth.se>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132
Fixes: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages")
Cc: stable(a)vger.kernel.org # v6.0+
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -873,7 +873,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
unsigned offset = buf->offset;
page += offset / PAGE_SIZE;
- for (int i = sd->len; i > 0; i -= PAGE_SIZE)
+ for (int i = sd->len + offset % PAGE_SIZE; i > 0; i -= PAGE_SIZE)
svc_rqst_replace_page(rqstp, page++);
if (rqstp->rq_res.page_len == 0) // first call
rqstp->rq_res.page_base = offset % PAGE_SIZE;
--
Anders Blomdell Email: anders.blomdell(a)control.lth.se
Department of Automatic Control
Lund University Phone: +46 46 222 4625
P.O. Box 118
SE-221 00 Lund, Sweden
From: Lukas Wunner <lukas(a)wunner.de>
[ Upstream commit 038ee49fef18710bedd38b531d173ccd746b2d8d ]
RS485-enabled UART ports on TI Sitara SoCs with active-low polarity
exhibit a Transmit Enable glitch on ->set_termios():
omap8250_restore_regs(), which is called from omap_8250_set_termios(),
sets the TCRTLR bit in the MCR register and clears all other bits,
including RTS. If RTS uses active-low polarity, it is now asserted
for no reason.
The TCRTLR bit is subsequently cleared by writing up->mcr to the MCR
register. That variable is always zero, so the RTS bit is still cleared
(incorrectly so if RTS is active-high).
(up->mcr is not, as one might think, a cache of the MCR register's
current value. Rather, it only caches a single bit of that register,
the AFE bit. And it only does so if the UART supports the AFE bit,
which OMAP does not. For details see serial8250_do_set_termios() and
serial8250_do_set_mctrl().)
Finally at the end of omap8250_restore_regs(), the MCR register is
restored (and RTS deasserted) by a call to up->port.ops->set_mctrl()
(which equals serial8250_set_mctrl()) and serial8250_em485_stop_tx().
So there's an RTS glitch between setting TCRTLR and calling
serial8250_em485_stop_tx(). Avoid this by using a read-modify-write
when setting TCRTLR.
While at it, drop a redundant initialization of up->mcr. As explained
above, the variable isn't used by the driver and it is already
initialized to zero because it is part of the static struct
serial8250_ports[] declared in 8250_core.c. (Static structs are
initialized to zero per section 6.7.8 nr. 10 of the C99 standard.)
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Su Bao Cheng <baocheng.su(a)siemens.com>
Tested-by: Matthias Schiffer <matthias.schiffer(a)ew.tq-group.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Link: https://lore.kernel.org/r/6554b0241a2c7fd50f32576fdbafed96709e11e8.16642789…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/8250/8250_omap.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index c551407bee07..7795e6401a93 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -259,6 +259,7 @@ static void omap8250_restore_regs(struct uart_8250_port *up)
{
struct omap8250_priv *priv = up->port.private_data;
struct uart_8250_dma *dma = up->dma;
+ u8 mcr = serial8250_in_MCR(up);
if (dma && dma->tx_running) {
/*
@@ -275,7 +276,7 @@ static void omap8250_restore_regs(struct uart_8250_port *up)
serial_out(up, UART_EFR, UART_EFR_ECB);
serial_out(up, UART_LCR, UART_LCR_CONF_MODE_A);
- serial8250_out_MCR(up, UART_MCR_TCRTLR);
+ serial8250_out_MCR(up, mcr | UART_MCR_TCRTLR);
serial_out(up, UART_FCR, up->fcr);
omap8250_update_scr(up, priv);
@@ -291,7 +292,8 @@ static void omap8250_restore_regs(struct uart_8250_port *up)
serial_out(up, UART_LCR, 0);
/* drop TCR + TLR access, we setup XON/XOFF later */
- serial8250_out_MCR(up, up->mcr);
+ serial8250_out_MCR(up, mcr);
+
serial_out(up, UART_IER, up->ier);
serial_out(up, UART_LCR, UART_LCR_CONF_MODE_B);
@@ -600,7 +602,6 @@ static int omap_8250_startup(struct uart_port *port)
pm_runtime_get_sync(port->dev);
- up->mcr = 0;
serial_out(up, UART_FCR, UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
serial_out(up, UART_LCR, UART_LCR_WLEN8);
--
2.35.1
It can take more than one second to check each connector
when the system is resumed. So if you have, say, eight
connectors, it may take eight seconds for ucsi_resume() to
finish. That's a bit too much.
Modify ucsi_resume() so that it schedules a work item in which the
interface is actually resumed, instead of checking the connectors
directly. The connections are also checked in separate tasks that are
queued for each connector individually.
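Roughly, this is the usual deferred-work pattern. The sketch below is
illustrative only (the struct and function names are made up; it is not
the actual ucsi code):
  #include <linux/workqueue.h>

  struct my_dev {
          struct work_struct resume_work;
          /* ... driver state ... */
  };

  /* runs later on a workqueue, so it may take its time */
  static void my_resume_work(struct work_struct *work)
  {
          struct my_dev *dev = container_of(work, struct my_dev, resume_work);

          /* do the slow per-connector checks here */
  }

  /* the system resume callback only queues the work and returns */
  static int my_resume(struct my_dev *dev)
  {
          queue_work(system_long_wq, &dev->resume_work);
          return 0;
  }

  /* at probe:  INIT_WORK(&dev->resume_work, my_resume_work); */
  /* at remove: cancel_work_sync(&dev->resume_work);          */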
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Fixes: 99f6d4361113 ("usb: typec: ucsi: Check the connection on resume")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216706
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
---
drivers/usb/typec/ucsi/ucsi.c | 17 +++++++++++++----
drivers/usb/typec/ucsi/ucsi.h | 1 +
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index a7987fc764cc6..eabe519013e78 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -1270,8 +1270,9 @@ static int ucsi_init(struct ucsi *ucsi)
return ret;
}
-int ucsi_resume(struct ucsi *ucsi)
+static void ucsi_resume_work(struct work_struct *work)
{
+ struct ucsi *ucsi = container_of(work, struct ucsi, resume_work);
struct ucsi_connector *con;
u64 command;
int ret;
@@ -1279,15 +1280,21 @@ int ucsi_resume(struct ucsi *ucsi)
/* Restore UCSI notification enable mask after system resume */
command = UCSI_SET_NOTIFICATION_ENABLE | ucsi->ntfy;
ret = ucsi_send_command(ucsi, command, NULL, 0);
- if (ret < 0)
- return ret;
+ if (ret < 0) {
+ dev_err(ucsi->dev, "failed to re-enable notifications (%d)\n", ret);
+ return;
+ }
for (con = ucsi->connector; con->port; con++) {
mutex_lock(&con->lock);
- ucsi_check_connection(con);
+ ucsi_partner_task(con, ucsi_check_connection, 1, 0);
mutex_unlock(&con->lock);
}
+}
+int ucsi_resume(struct ucsi *ucsi)
+{
+ queue_work(system_long_wq, &ucsi->resume_work);
return 0;
}
EXPORT_SYMBOL_GPL(ucsi_resume);
@@ -1347,6 +1354,7 @@ struct ucsi *ucsi_create(struct device *dev, const struct ucsi_operations *ops)
if (!ucsi)
return ERR_PTR(-ENOMEM);
+ INIT_WORK(&ucsi->resume_work, ucsi_resume_work);
INIT_DELAYED_WORK(&ucsi->work, ucsi_init_work);
mutex_init(&ucsi->ppm_lock);
ucsi->dev = dev;
@@ -1401,6 +1409,7 @@ void ucsi_unregister(struct ucsi *ucsi)
/* Make sure that we are not in the middle of driver initialization */
cancel_delayed_work_sync(&ucsi->work);
+ cancel_work_sync(&ucsi->resume_work);
/* Disable notifications */
ucsi->ops->async_write(ucsi, UCSI_CONTROL, &cmd, sizeof(cmd));
diff --git a/drivers/usb/typec/ucsi/ucsi.h b/drivers/usb/typec/ucsi/ucsi.h
index 8eb391e3e592c..c968474ee5473 100644
--- a/drivers/usb/typec/ucsi/ucsi.h
+++ b/drivers/usb/typec/ucsi/ucsi.h
@@ -287,6 +287,7 @@ struct ucsi {
struct ucsi_capability cap;
struct ucsi_connector *connector;
+ struct work_struct resume_work;
struct delayed_work work;
int work_count;
#define UCSI_ROLE_SWITCH_RETRY_PER_HZ 10
--
2.35.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5121197ecc5d ("kcm: close race conditions on sk_receive_queue")
bbb03029a899 ("strparser: Generalize strparser")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5121197ecc5db58c07da95eb1ff82b98b121a221 Mon Sep 17 00:00:00 2001
From: Cong Wang <cong.wang(a)bytedance.com>
Date: Sun, 13 Nov 2022 16:51:19 -0800
Subject: [PATCH] kcm: close race conditions on sk_receive_queue
sk->sk_receive_queue is protected by skb queue lock, but for KCM
sockets its RX path takes mux->rx_lock to protect more than just
skb queue. However, kcm_recvmsg() still only grabs the skb queue
lock, so race conditions still exist.
We can teach kcm_recvmsg() to grab mux->rx_lock too but this would
introduce a potential performance regression as struct kcm_mux can
be shared by multiple KCM sockets.
So we have to enforce skb queue lock in requeue_rx_msgs() and handle
skb peek case carefully in kcm_wait_data(). Fortunately,
skb_recv_datagram() already handles it nicely and is widely used by
other sockets, so we can just switch to skb_recv_datagram() after
getting rid of the unnecessary sock lock in kcm_recvmsg() and
kcm_splice_read(). Side note: SOCK_DONE is not used by KCM sockets,
so it is safe to get rid of this check too.
I ran the original syzbot reproducer for 30 min without seeing any
issue.
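For reference, a minimal sketch of the receive pattern the patch switches
to (illustrative only; the function is made up and error handling is
trimmed):
  #include <linux/skbuff.h>
  #include <net/sock.h>

  static int my_recvmsg(struct sock *sk, struct msghdr *msg, int flags)
  {
          struct sk_buff *skb;
          int err = 0;

          /* Blocks as needed (honouring MSG_DONTWAIT and the receive
           * timeout) and takes the receive-queue lock internally, so no
           * lock_sock() and no hand-rolled wait loop are required. */
          skb = skb_recv_datagram(sk, flags, &err);
          if (!skb)
                  return err;

          /* ... copy the message data out of skb into msg ... */

          /* Release our reference to the skb. */
          skb_free_datagram(sk, skb);
          return 0;
  }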
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+278279efdd2730dd14bf(a)syzkaller.appspotmail.com
Reported-by: shaozhengchao <shaozhengchao(a)huawei.com>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Tom Herbert <tom(a)herbertland.com>
Signed-off-by: Cong Wang <cong.wang(a)bytedance.com>
Link: https://lore.kernel.org/r/20221114005119.597905-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index a5004228111d..890a2423f559 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -222,7 +222,7 @@ static void requeue_rx_msgs(struct kcm_mux *mux, struct sk_buff_head *head)
struct sk_buff *skb;
struct kcm_sock *kcm;
- while ((skb = __skb_dequeue(head))) {
+ while ((skb = skb_dequeue(head))) {
/* Reset destructor to avoid calling kcm_rcv_ready */
skb->destructor = sock_rfree;
skb_orphan(skb);
@@ -1085,53 +1085,17 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
return err;
}
-static struct sk_buff *kcm_wait_data(struct sock *sk, int flags,
- long timeo, int *err)
-{
- struct sk_buff *skb;
-
- while (!(skb = skb_peek(&sk->sk_receive_queue))) {
- if (sk->sk_err) {
- *err = sock_error(sk);
- return NULL;
- }
-
- if (sock_flag(sk, SOCK_DONE))
- return NULL;
-
- if ((flags & MSG_DONTWAIT) || !timeo) {
- *err = -EAGAIN;
- return NULL;
- }
-
- sk_wait_data(sk, &timeo, NULL);
-
- /* Handle signals */
- if (signal_pending(current)) {
- *err = sock_intr_errno(timeo);
- return NULL;
- }
- }
-
- return skb;
-}
-
static int kcm_recvmsg(struct socket *sock, struct msghdr *msg,
size_t len, int flags)
{
struct sock *sk = sock->sk;
struct kcm_sock *kcm = kcm_sk(sk);
int err = 0;
- long timeo;
struct strp_msg *stm;
int copied = 0;
struct sk_buff *skb;
- timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
-
- lock_sock(sk);
-
- skb = kcm_wait_data(sk, flags, timeo, &err);
+ skb = skb_recv_datagram(sk, flags, &err);
if (!skb)
goto out;
@@ -1162,14 +1126,11 @@ static int kcm_recvmsg(struct socket *sock, struct msghdr *msg,
/* Finished with message */
msg->msg_flags |= MSG_EOR;
KCM_STATS_INCR(kcm->stats.rx_msgs);
- skb_unlink(skb, &sk->sk_receive_queue);
- kfree_skb(skb);
}
}
out:
- release_sock(sk);
-
+ skb_free_datagram(sk, skb);
return copied ? : err;
}
@@ -1179,7 +1140,6 @@ static ssize_t kcm_splice_read(struct socket *sock, loff_t *ppos,
{
struct sock *sk = sock->sk;
struct kcm_sock *kcm = kcm_sk(sk);
- long timeo;
struct strp_msg *stm;
int err = 0;
ssize_t copied;
@@ -1187,11 +1147,7 @@ static ssize_t kcm_splice_read(struct socket *sock, loff_t *ppos,
/* Only support splice for SOCKSEQPACKET */
- timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
-
- lock_sock(sk);
-
- skb = kcm_wait_data(sk, flags, timeo, &err);
+ skb = skb_recv_datagram(sk, flags, &err);
if (!skb)
goto err_out;
@@ -1219,13 +1175,11 @@ static ssize_t kcm_splice_read(struct socket *sock, loff_t *ppos,
* finish reading the message.
*/
- release_sock(sk);
-
+ skb_free_datagram(sk, skb);
return copied;
err_out:
- release_sock(sk);
-
+ skb_free_datagram(sk, skb);
return err;
}
I'm announcing the release of the 4.19.266 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/admin-guide/kernel-parameters.txt | 13
Makefile | 2
arch/x86/entry/calling.h | 68 +++-
arch/x86/entry/entry_32.S | 2
arch/x86/entry/entry_64.S | 34 +-
arch/x86/entry/entry_64_compat.S | 11
arch/x86/include/asm/cpu_device_id.h | 168 ++++++++++
arch/x86/include/asm/cpufeatures.h | 18 -
arch/x86/include/asm/intel-family.h | 6
arch/x86/include/asm/msr-index.h | 10
arch/x86/include/asm/nospec-branch.h | 53 +--
arch/x86/kernel/cpu/amd.c | 21 -
arch/x86/kernel/cpu/bugs.c | 368 +++++++++++++++++++-----
arch/x86/kernel/cpu/common.c | 60 ++-
arch/x86/kernel/cpu/match.c | 44 ++
arch/x86/kernel/cpu/scattered.c | 1
arch/x86/kernel/process.c | 2
arch/x86/kvm/svm.c | 1
arch/x86/kvm/vmx.c | 53 +++
arch/x86/kvm/x86.c | 4
drivers/base/cpu.c | 8
drivers/cpufreq/acpi-cpufreq.c | 1
drivers/cpufreq/amd_freq_sensitivity.c | 1
drivers/idle/intel_idle.c | 43 ++
include/linux/cpu.h | 2
include/linux/kvm_host.h | 2
include/linux/mod_devicetable.h | 4
tools/arch/x86/include/asm/cpufeatures.h | 1
28 files changed, 814 insertions(+), 187 deletions(-)
Alexandre Chartre (2):
x86/bugs: Report AMD retbleed vulnerability
x86/bugs: Add AMD retbleed= boot parameter
Andrew Cooper (1):
x86/cpu/amd: Enumerate BTC_NO
Daniel Sneddon (1):
x86/speculation: Add RSB VM Exit protections
Greg Kroah-Hartman (1):
Linux 4.19.266
Ingo Molnar (1):
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Josh Poimboeuf (8):
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
x86/speculation: Fix firmware entry SPEC_CTRL handling
x86/speculation: Fix SPEC_CTRL write on SMT state change
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
x86/speculation: Remove x86_spec_ctrl_mask
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
KVM: VMX: Fix IBRS handling after vmexit
x86/speculation: Fill RSB on vmexit for IBRS
Kan Liang (1):
x86/cpufeature: Add facility to check for min microcode revisions
Mark Gross (1):
x86/cpu: Add a steppings field to struct x86_cpu_id
Nathan Chancellor (1):
x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Pawan Gupta (4):
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
x86/bugs: Add Cannon lake to RETBleed affected CPU list
x86/speculation: Disable RRSBA behavior
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
Peter Zijlstra (10):
x86/cpufeatures: Move RETPOLINE flags to word 11
x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value
x86/entry: Remove skip_r11rcx
x86/entry: Add kernel IBRS implementation
x86/bugs: Optimize SPEC_CTRL MSR writes
x86/bugs: Split spectre_v2_select_mitigation() and spectre_v2_user_select_mitigation()
x86/bugs: Report Intel retbleed vulnerability
intel_idle: Disable IBRS during long idle
x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
x86/common: Stamp out the stepping madness
Suleiman Souhlal (2):
Revert "x86/speculation: Add RSB VM Exit protections"
Revert "x86/cpu: Add a steppings field to struct x86_cpu_id"
Thomas Gleixner (2):
x86/devicetable: Move x86 specific macro out of generic code
x86/cpu: Add consistent CPU match macros
The quilt patch titled
Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
has been removed from the -mm tree. Its filename was
mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
Date: Fri, 18 Nov 2022 12:36:03 +0530
balance_dirty_pages doesn't do the required dirty throttling on cgroupv1.
See commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback
on traditional hierarchies"). Instead, the kernel depends on writeback
throttling in shrink_folio_list to achieve the same goal. On large
memory systems the flusher may not be able to write back quickly
enough, so we start finding pages in shrink_folio_list that are
already under writeback. Hence, for cgroup v1, do a reclaim throttle
after waking up the flusher.
With this change, the test below, which used to fail on a 256GB
system, completes until the file system is full.
root@lp2:/sys/fs/cgroup/memory# mkdir test
root@lp2:/sys/fs/cgroup/memory# cd test/
root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
Killed
Link: https://lkml.kernel.org/r/20221118070603.84081-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Suggested-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: zefan li <lizefan.x(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1
+++ a/mm/vmscan.c
@@ -2514,8 +2514,20 @@ static unsigned long shrink_inactive_lis
* the flushers simply cannot keep up with the allocation
* rate. Nudge the flusher threads in case they are asleep.
*/
- if (stat.nr_unqueued_dirty == nr_taken)
+ if (stat.nr_unqueued_dirty == nr_taken) {
wakeup_flusher_threads(WB_REASON_VMSCAN);
+ /*
+ * For cgroupv1 dirty throttling is achieved by waking up
+ * the kernel flusher here and later waiting on folios
+ * which are in writeback to finish (see shrink_folio_list()).
+ *
+ * Flusher may not be able to issue writeback quickly
+ * enough for cgroupv1 writeback throttling to work
+ * on a large system.
+ */
+ if (!writeback_throttling_sane(sc))
+ reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+ }
sc->nr.dirty += stat.nr_dirty;
sc->nr.congested += stat.nr_congested;
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
The quilt patch titled
Subject: kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
has been removed from the -mm tree. Its filename was
kbuild-fix-wimplicit-function-declaration-in-license_is_gpl_compatible.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sam James <sam(a)gentoo.org>
Subject: kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
Date: Wed, 16 Nov 2022 18:26:34 +0000
Add missing <linux/string.h> include for strcmp.
Clang 16 makes -Wimplicit-function-declaration an error by default.
Unfortunately, out of tree modules may use this in configure scripts,
which means failure might cause silent miscompilation or misconfiguration.
For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].
[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-…
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6…
[3] hosted at lists.linux.dev.
[akpm(a)linux-foundation.org: remember "linux/"]
Link: https://lkml.kernel.org/r/20221116182634.2823136-1-sam@gentoo.org
Signed-off-by: Sam James <sam(a)gentoo.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/license.h | 2 ++
1 file changed, 2 insertions(+)
--- a/include/linux/license.h~kbuild-fix-wimplicit-function-declaration-in-license_is_gpl_compatible
+++ a/include/linux/license.h
@@ -2,6 +2,8 @@
#ifndef __LICENSE_H
#define __LICENSE_H
+#include <linux/string.h>
+
static inline int license_is_gpl_compatible(const char *license)
{
return (strcmp(license, "GPL") == 0
_
Patches currently in -mm which might be from sam(a)gentoo.org are
The quilt patch titled
Subject: mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
has been removed from the -mm tree. Its filename was
mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
Date: Mon, 14 Nov 2022 17:55:52 +0000
A DAMON sysfs interface user can start DAMON with a scheme, remove the
sysfs directory for the scheme, and then request an update of the scheme's
stats. Because the scheme stats update logic isn't aware of this
situation, it results in an invalid memory access. Fix the bug by
checking whether the scheme sysfs directory still exists.
Link: https://lkml.kernel.org/r/20221114175552.1951-1-sj@kernel.org
Fixes: 0ac32b8affb5 ("mm/damon/sysfs: support DAMOS stats")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [v5.18]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed
+++ a/mm/damon/sysfs.c
@@ -2339,6 +2339,10 @@ static int damon_sysfs_upd_schemes_stats
damon_for_each_scheme(scheme, ctx) {
struct damon_sysfs_stats *sysfs_stats;
+ /* user could have removed the scheme sysfs dir */
+ if (schemes_idx >= sysfs_schemes->nr)
+ break;
+
sysfs_stats = sysfs_schemes->schemes_arr[schemes_idx++]->stats;
sysfs_stats->nr_tried = scheme->stat.nr_tried;
sysfs_stats->sz_tried = scheme->stat.sz_tried;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-sysfs-fix-wrong-empty-schemes-assumption-under-online-tuning-in-damon_sysfs_set_schemes.patch
mm-damon-core-split-out-damos-charged-region-skip-logic-into-a-new-function.patch
mm-damon-core-split-damos-application-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-stat-update-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-quota-adjustment-logic-into-a-new-function.patch
mm-damon-sysfs-use-damon_addr_range-for-regions-start-and-end-values.patch
mm-damon-sysfs-remove-parameters-of-damon_sysfs_region_alloc.patch
mm-damon-sysfs-move-sysfs_lock-to-common-module.patch
mm-damon-sysfs-move-unsigned-long-range-directory-to-common-module.patch
mm-damon-sysfs-split-out-kdamond-independent-schemes-stats-update-logic-into-a-new-function.patch
mm-damon-sysfs-split-out-schemes-directory-implementation-to-separate-file.patch
mm-damon-modules-deduplicate-init-steps-for-damon-context-setup.patch
mm-damon-reclaimlru_sort-remove-unnecessarily-included-headers.patch
mm-damon-reclaim-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_reclaims-enabled-parameter.patch
mm-damon-lru_sort-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_lru_sorts-enabled-parameter.patch
docs-admin-guide-mm-damon-usage-describe-the-rules-of-sysfs-region-directories.patch
docs-admin-guide-mm-damon-usage-fix-wrong-usage-example-of-init_regions-file.patch
mm-damon-core-add-a-callback-for-scheme-target-regions-check.patch
mm-damon-sysfs-schemes-implement-schemes-tried_regions-directory.patch
mm-damon-sysfs-schemes-implement-scheme-region-directory.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command-fix.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command-fix.patch
tools-selftets-damon-sysfs-test-tried_regions-directory-existence.patch
docs-admin-guide-mm-damon-usage-document-schemes-s-tried_regions-sysfs-directory.patch
docs-abi-damon-document-schemes-s-tried_regions-sysfs-directory.patch
selftests-damon-test-non-context-inputs-to-rm_contexts-file.patch
The quilt patch titled
Subject: mm: correctly charge compressed memory to its memcg
has been removed from the -mm tree. Its filename was
mm-correctly-charge-compressed-memory-to-its-memcg.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Li Liguang <liliguang(a)baidu.com>
Subject: mm: correctly charge compressed memory to its memcg
Date: Mon, 14 Nov 2022 14:48:28 -0500
Kswapd reclaims memory when memory pressure is high; if zswap is enabled,
the anonymous memory is compressed and stored in the zpool.
The memcg_kmem_bypass() check in get_obj_cgroup_from_page() bypasses
kernel threads and causes the compressed memory not to be charged to its
memory cgroup.
Remove the memcg_kmem_bypass() call and properly charge compressed memory
to its corresponding memory cgroup.
Link: https://lore.kernel.org/linux-mm/CALvZod4nnn8BHYqAM4xtcR0Ddo2-Wr8uKm9h_CHWU…
Link: https://lkml.kernel.org/r/20221114194828.100822-1-hannes@cmpxchg.org
Fixes: f4840ccfca25 ("zswap: memcg accounting")
Signed-off-by: Li Liguang <liliguang(a)baidu.com>
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-correctly-charge-compressed-memory-to-its-memcg
+++ a/mm/memcontrol.c
@@ -3026,7 +3026,7 @@ struct obj_cgroup *get_obj_cgroup_from_p
{
struct obj_cgroup *objcg;
- if (!memcg_kmem_enabled() || memcg_kmem_bypass())
+ if (!memcg_kmem_enabled())
return NULL;
if (PageMemcgKmem(page)) {
_
Patches currently in -mm which might be from liliguang(a)baidu.com are
The quilt patch titled
Subject: mm: vmscan: fix extreme overreclaim and swap floods
has been removed from the -mm tree. Its filename was
mm-vmscan-fix-extreme-overreclaim-and-swap-floods.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: vmscan: fix extreme overreclaim and swap floods
Date: Tue, 2 Aug 2022 12:28:11 -0400
During proactive reclaim, we sometimes observe severe overreclaim, with
several thousand times more pages reclaimed than requested.
This trace was obtained from shrink_lruvec() during such an instance:
prio:0 anon_cost:1141521 file_cost:7767
nr_reclaimed:4387406 nr_to_reclaim:1047 (or_factor:4190)
nr=[7161123 345 578 1111]
While the reclaimer requested 4M, vmscan reclaimed close to 16G, most of it
by swapping. These requests take over a minute, during which the write()
to memory.reclaim is unkillably stuck inside the kernel.
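For scale (assuming 4 KiB pages): nr_to_reclaim = 1047 pages is roughly
4 MiB, while nr_reclaimed = 4387406 pages is roughly 16.7 GiB, which
matches the or_factor of about 4190 in the trace above.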
Digging into the source, this is caused by the proportional reclaim
bailout logic. This code tries to resolve a fundamental conflict: to
reclaim roughly what was requested, while also aging all LRUs fairly and
in accordance to their size, swappiness, refault rates etc. The way it
attempts fairness is that once the reclaim goal has been reached, it stops
scanning the LRUs with the smaller remaining scan targets, and adjusts the
remainder of the bigger LRUs according to how much of the smaller LRUs was
scanned. It then finishes scanning that remainder regardless of the
reclaim goal.
This works fine if priority levels are low and the LRU lists are
comparable in size. However, in this instance, the cgroup that is
targeted by proactive reclaim has almost no files left - they've already
been squeezed out by proactive reclaim earlier - and the remaining anon
pages are hot. Anon rotations cause the priority level to drop to 0,
which results in reclaim targeting all of anon (a lot) and all of file
(almost nothing). By the time reclaim decides to bail, it has scanned
most or all of the file target, and therefore must also scan most or all of
the enormous anon target. This target is thousands of times larger than
the reclaim goal, thus causing the overreclaim.
The bailout code hasn't changed in years, why is this failing now? The
most likely explanations are two other recent changes in anon reclaim:
1. Before the series starting with commit 5df741963d52 ("mm: fix LRU
balancing effect of new transparent huge pages"), the VM was
overall relatively reluctant to swap at all, even if swap was
configured. This means the LRU balancing code didn't come into play
as often as it does now, and mostly in high pressure situations
where pronounced swap activity wouldn't be as surprising.
2. For historic reasons, shrink_lruvec() loops on the scan targets of
all LRU lists except the active anon one, meaning it would bail if
the only remaining pages to scan were active anon - even if there
were a lot of them.
Before the series starting with commit ccc5dc67340c ("mm/vmscan:
make active/inactive ratio as 1:1 for anon lru"), most anon pages
would live on the active LRU; the inactive one would contain only a
handful of preselected reclaim candidates. After the series, anon
gets aged similarly to file, and the inactive list is the default
for new anon pages as well, making it often the much bigger list.
As a result, the VM is now more likely to actually finish large
anon targets than before.
Change the code such that only one SWAP_CLUSTER_MAX-sized nudge toward the
larger LRU lists is made before bailing out on a met reclaim goal.
This fixes the extreme overreclaim problem.
Fairness is more subtle and harder to evaluate. No obvious misbehavior
was observed on the test workload, in any case. Conceptually, fairness
should primarily be a cumulative effect from regular, lower priority
scans. Once the VM is in trouble and needs to escalate scan targets to
make forward progress, fairness needs to take a backseat. This is also
acknowledged by the myriad exceptions in get_scan_count(). This patch
makes fairness decrease gradually, as it keeps fairness work static over
increasing priority levels with growing scan targets. This should make
more sense - although we may have to re-visit the exact values.
Link: https://lkml.kernel.org/r/20220802162811.39216-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/mm/vmscan.c~mm-vmscan-fix-extreme-overreclaim-and-swap-floods
+++ a/mm/vmscan.c
@@ -5844,8 +5844,8 @@ static void shrink_lruvec(struct lruvec
enum lru_list lru;
unsigned long nr_reclaimed = 0;
unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+ bool proportional_reclaim;
struct blk_plug plug;
- bool scan_adjusted;
if (lru_gen_enabled()) {
lru_gen_shrink_lruvec(lruvec, sc);
@@ -5868,8 +5868,8 @@ static void shrink_lruvec(struct lruvec
* abort proportional reclaim if either the file or anon lru has already
* dropped to zero at the first pass.
*/
- scan_adjusted = (!cgroup_reclaim(sc) && !current_is_kswapd() &&
- sc->priority == DEF_PRIORITY);
+ proportional_reclaim = (!cgroup_reclaim(sc) && !current_is_kswapd() &&
+ sc->priority == DEF_PRIORITY);
blk_start_plug(&plug);
while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
@@ -5889,7 +5889,7 @@ static void shrink_lruvec(struct lruvec
cond_resched();
- if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
+ if (nr_reclaimed < nr_to_reclaim || proportional_reclaim)
continue;
/*
@@ -5940,8 +5940,6 @@ static void shrink_lruvec(struct lruvec
nr_scanned = targets[lru] - nr[lru];
nr[lru] = targets[lru] * (100 - percentage) / 100;
nr[lru] -= min(nr[lru], nr_scanned);
-
- scan_adjusted = true;
}
blk_finish_plug(&plug);
sc->nr_reclaimed += nr_reclaimed;
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-vmscan-split-khugepaged-stats-from-direct-reclaim-stats.patch
zswap-fix-writeback-lock-ordering-for-zsmalloc.patch
zpool-clean-out-dead-code.patch
The patch titled
Subject: mm: migrate: Fix THP's mapcount on isolation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-fix-thps-mapcount-on-isolation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gavin Shan <gshan(a)redhat.com>
Subject: mm: migrate: Fix THP's mapcount on isolation
Date: Wed, 23 Nov 2022 08:57:52 +0800
The issue was reported when removing memory through a virtio_mem device.
A transparent huge page that experienced a copy-on-write fault is wrongly
regarded as pinned and therefore escapes isolation in
isolate_migratepages_block(). As a result, the transparent huge page
can't be migrated and the corresponding memory block can't be put into
the offline state.
Fix it by replacing page_mapcount() with total_mapcount(). With this, the
transparent huge page can be isolated and migrated, and the memory block
can be put into the offline state.
Link: https://lkml.kernel.org/r/20221123005752.161003-1-gshan@redhat.com
Fixes: 3917c80280c9 ("thp: change CoW semantics for anon-THP")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org> [v5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/compaction.c~mm-migrate-fix-thps-mapcount-on-isolation
+++ a/mm/compaction.c
@@ -990,7 +990,7 @@ isolate_migratepages_block(struct compac
* admittedly racy check.
*/
mapping = page_mapping(page);
- if (!mapping && page_count(page) > page_mapcount(page))
+ if (!mapping && page_count(page) > total_mapcount(page))
goto isolate_fail;
/*
_
Patches currently in -mm which might be from gshan(a)redhat.com are
mm-migrate-fix-thps-mapcount-on-isolation.patch
Hi Sasha,
On Tue, Nov 22, 2022 at 10:19 AM -05, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> to the 6.0-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> l2tp-serialize-access-to-sk_user_data-with-sk_callba.patch
> and it can be found in the queue-6.0 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
"Just when I thought I was out, they pull me back in!"
Greg assured me yesterday that this was dropped from stable queues:
https://lore.kernel.org/stable/Y3thooxAN2Are7Ai@kroah.com/
> commit 1ea60c1db42da0b5b40eb7e9bf8d5937f6f475cc
> Author: Jakub Sitnicki <jakub(a)cloudflare.com>
> Date: Mon Nov 14 20:16:19 2022 +0100
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> [ Upstream commit b68777d54fac21fc833ec26ea1a2a84f975ab035 ]
>
> sk->sk_user_data has multiple users, which are not compatible with each
> other. Writers must synchronize by grabbing the sk->sk_callback_lock.
>
> l2tp currently fails to grab the lock when modifying the underlying tunnel
> socket fields. Fix it by adding appropriate locking.
>
> We err on the side of safety and grab the sk_callback_lock also inside the
> sk_destruct callback overridden by l2tp, even though there should be no
> refs allowing access to the sock at the time when sk_destruct gets called.
>
> v4:
> - serialize write to sk_user_data in l2tp sk_destruct
>
> v3:
> - switch from sock lock to sk_callback_lock
> - document write-protection for sk_user_data
>
> v2:
> - update Fixes to point to origin of the bug
> - use real names in Reported/Tested-by tags
>
> Cc: Tom Parkin <tparkin(a)katalix.com>
> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
> Reported-by: Haowei Yan <g1042620637(a)gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub(a)cloudflare.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index f6e6838c82df..03a4ebe3ccc8 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -323,7 +323,7 @@ struct sk_filter;
> * @sk_tskey: counter to disambiguate concurrent tstamp requests
> * @sk_zckey: counter to order MSG_ZEROCOPY notifications
> * @sk_socket: Identd and reporting IO signals
> - * @sk_user_data: RPC layer private data
> + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock.
> * @sk_frag: cached page frag
> * @sk_peek_off: current peek_offset value
> * @sk_send_head: front of stuff to transmit
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 7499c51b1850..754fdda8a5f5 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk)
> }
>
> /* Remove hooks into tunnel socket */
> + write_lock_bh(&sk->sk_callback_lock);
> sk->sk_destruct = tunnel->old_sk_destruct;
> sk->sk_user_data = NULL;
> + write_unlock_bh(&sk->sk_callback_lock);
>
> /* Call the original destructor */
> if (sk->sk_destruct)
> @@ -1469,16 +1471,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock = sockfd_lookup(tunnel->fd, &ret);
> if (!sock)
> goto err;
> -
> - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap);
> - if (ret < 0)
> - goto err_sock;
> }
>
> + sk = sock->sk;
> + write_lock(&sk->sk_callback_lock);
> +
> + ret = l2tp_validate_socket(sk, net, tunnel->encap);
> + if (ret < 0)
> + goto err_sock;
> +
> tunnel->l2tp_net = net;
> pn = l2tp_pernet(net);
>
> - sk = sock->sk;
> sock_hold(sk);
> tunnel->sock = sk;
>
> @@ -1504,7 +1508,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
>
> setup_udp_tunnel_sock(net, sock, &udp_cfg);
> } else {
> - sk->sk_user_data = tunnel;
> + rcu_assign_sk_user_data(sk, tunnel);
> }
>
> tunnel->old_sk_destruct = sk->sk_destruct;
> @@ -1518,6 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> if (tunnel->fd >= 0)
> sockfd_put(sock);
>
> + write_unlock(&sk->sk_callback_lock);
> return 0;
>
> err_sock:
> @@ -1525,6 +1530,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock_release(sock);
> else
> sockfd_put(sock);
> +
> + write_unlock(&sk->sk_callback_lock);
> err:
> return ret;
> }
The patch titled
Subject: mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-sysfs-fix-wrong-empty-schemes-assumption-under-online-tuning-in-damon_sysfs_set_schemes.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
Date: Tue, 22 Nov 2022 19:48:31 +0000
Commit da87878010e5 ("mm/damon/sysfs: support online inputs update") made
'damon_sysfs_set_schemes()' to be called for running DAMON context, which
could have schemes. In the case, DAMON sysfs interface is supposed to
update, remove, or add schemes to reflect the sysfs files. However, the
code is assuming the DAMON context wouldn't have schemes at all, and
therefore creates and adds new schemes. As a result, the code doesn't
work as intended for online schemes tuning and could have more than
expected memory footprint. The schemes are all in the DAMON context, so
it doesn't leak the memory, though.
Remove the wrong asssumption (the DAMON context wouldn't have schemes) in
'damon_sysfs_set_schemes()' to fix the bug.
Link: https://lkml.kernel.org/r/20221122194831.3472-1-sj@kernel.org
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 46 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-fix-wrong-empty-schemes-assumption-under-online-tuning-in-damon_sysfs_set_schemes
+++ a/mm/damon/sysfs.c
@@ -2283,12 +2283,54 @@ static struct damos *damon_sysfs_mk_sche
&wmarks);
}
+static void damon_sysfs_update_scheme(struct damos *scheme,
+ struct damon_sysfs_scheme *sysfs_scheme)
+{
+ struct damon_sysfs_access_pattern *access_pattern =
+ sysfs_scheme->access_pattern;
+ struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas;
+ struct damon_sysfs_weights *sysfs_weights = sysfs_quotas->weights;
+ struct damon_sysfs_watermarks *sysfs_wmarks = sysfs_scheme->watermarks;
+
+ scheme->pattern.min_sz_region = access_pattern->sz->min;
+ scheme->pattern.max_sz_region = access_pattern->sz->max;
+ scheme->pattern.min_nr_accesses = access_pattern->nr_accesses->min;
+ scheme->pattern.max_nr_accesses = access_pattern->nr_accesses->max;
+ scheme->pattern.min_age_region = access_pattern->age->min;
+ scheme->pattern.max_age_region = access_pattern->age->max;
+
+ scheme->action = sysfs_scheme->action;
+
+ scheme->quota.ms = sysfs_quotas->ms;
+ scheme->quota.sz = sysfs_quotas->sz;
+ scheme->quota.reset_interval = sysfs_quotas->reset_interval_ms;
+ scheme->quota.weight_sz = sysfs_weights->sz;
+ scheme->quota.weight_nr_accesses = sysfs_weights->nr_accesses;
+ scheme->quota.weight_age = sysfs_weights->age;
+
+ scheme->wmarks.metric = sysfs_wmarks->metric;
+ scheme->wmarks.interval = sysfs_wmarks->interval_us;
+ scheme->wmarks.high = sysfs_wmarks->high;
+ scheme->wmarks.mid = sysfs_wmarks->mid;
+ scheme->wmarks.low = sysfs_wmarks->low;
+}
+
static int damon_sysfs_set_schemes(struct damon_ctx *ctx,
struct damon_sysfs_schemes *sysfs_schemes)
{
- int i;
+ struct damos *scheme, *next;
+ int i = 0;
+
+ damon_for_each_scheme_safe(scheme, next, ctx) {
+ if (i < sysfs_schemes->nr)
+ damon_sysfs_update_scheme(scheme,
+ sysfs_schemes->schemes_arr[i]);
+ else
+ damon_destroy_scheme(scheme);
+ i++;
+ }
- for (i = 0; i < sysfs_schemes->nr; i++) {
+ for (; i < sysfs_schemes->nr; i++) {
struct damos *scheme, *next;
scheme = damon_sysfs_mk_scheme(sysfs_schemes->schemes_arr[i]);
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed.patch
mm-damon-sysfs-fix-wrong-empty-schemes-assumption-under-online-tuning-in-damon_sysfs_set_schemes.patch
mm-damon-core-split-out-damos-charged-region-skip-logic-into-a-new-function.patch
mm-damon-core-split-damos-application-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-stat-update-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-quota-adjustment-logic-into-a-new-function.patch
mm-damon-sysfs-use-damon_addr_range-for-regions-start-and-end-values.patch
mm-damon-sysfs-remove-parameters-of-damon_sysfs_region_alloc.patch
mm-damon-sysfs-move-sysfs_lock-to-common-module.patch
mm-damon-sysfs-move-unsigned-long-range-directory-to-common-module.patch
mm-damon-sysfs-split-out-kdamond-independent-schemes-stats-update-logic-into-a-new-function.patch
mm-damon-sysfs-split-out-schemes-directory-implementation-to-separate-file.patch
mm-damon-modules-deduplicate-init-steps-for-damon-context-setup.patch
mm-damon-reclaimlru_sort-remove-unnecessarily-included-headers.patch
mm-damon-reclaim-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_reclaims-enabled-parameter.patch
mm-damon-lru_sort-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_lru_sorts-enabled-parameter.patch
docs-admin-guide-mm-damon-usage-describe-the-rules-of-sysfs-region-directories.patch
docs-admin-guide-mm-damon-usage-fix-wrong-usage-example-of-init_regions-file.patch
mm-damon-core-add-a-callback-for-scheme-target-regions-check.patch
mm-damon-sysfs-schemes-implement-schemes-tried_regions-directory.patch
mm-damon-sysfs-schemes-implement-scheme-region-directory.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command-fix.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command-fix.patch
tools-selftets-damon-sysfs-test-tried_regions-directory-existence.patch
docs-admin-guide-mm-damon-usage-document-schemes-s-tried_regions-sysfs-directory.patch
docs-abi-damon-document-schemes-s-tried_regions-sysfs-directory.patch
selftests-damon-test-non-context-inputs-to-rm_contexts-file.patch
The size of the device tree node secmon (bl31_secmon_reserved) was
incorrect. It should be increased to 2 MiB (0x200000).
The original setting causes abnormal behavior because trusted-firmware-a
and related firmware do not load correctly. The incorrect behavior may
vary between software stacks. For example, it causes a build error in
some Yocto projects because they check whether there is enough memory in
the reserved region to load trusted-firmware-a.
When mt8195-demo.dts was first sent upstream, BL31 was small because only
a basic set of functions and modules was supported while the board was in
the early development stage. Now BL31 includes more coprocessor firmware
and more mature functions, so its size has grown in real applications.
Based on the values reported by customers, we think reserving 2 MiB for
BL31 should be enough for the next two or three years.
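For reference, the arithmetic behind the size change in the diff below:
the old reservation of 0x30000 bytes is 192 KiB, while the new reservation
of 0x200000 bytes is 2 MiB, i.e. roughly ten times larger.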
Cc: stable(a)vger.kernel.org # v5.19
Fixes: 6147314aeedc ("arm64: dts: mediatek: Add device-tree for MT8195 Demo board")
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
Reviewed-by: Miles Chen <miles.chen(a)mediatek.com>
---
Changes for v2
- Add more information about the size difference for BL31 in commit message.
Thanks for Miles's review.
arch/arm64/boot/dts/mediatek/mt8195-demo.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/mediatek/mt8195-demo.dts b/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
index 4fbd99eb496a..dec85d254838 100644
--- a/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
@@ -56,10 +56,10 @@
#size-cells = <2>;
ranges;
- /* 192 KiB reserved for ARM Trusted Firmware (BL31) */
+ /* 2 MiB reserved for ARM Trusted Firmware (BL31) */
bl31_secmon_reserved: secmon@54600000 {
no-map;
- reg = <0 0x54600000 0x0 0x30000>;
+ reg = <0 0x54600000 0x0 0x200000>;
};
/* 12 MiB reserved for OP-TEE (BL32)
--
2.18.0
This is a note to let you know that I've just added the patch titled
dt-bindings: iio: adc: Remove the property "aspeed,trim-data-valid"
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 398e3479874f381cca8726ca5d8a31e1bf35a3cd Mon Sep 17 00:00:00 2001
From: Billy Tsai <billy_tsai(a)aspeedtech.com>
Date: Mon, 14 Nov 2022 10:50:57 +0800
Subject: dt-bindings: iio: adc: Remove the property "aspeed,trim-data-valid"
If the property is set on a device without valid trimming data in the OTP,
the ADC will not function correctly. Therefore, this patch drops the use
of this property to avoid this scenario.
Fixes: 2bdb2f00a895 ("dt-bindings: iio: adc: Add ast2600-adc bindings")
Signed-off-by: Billy Tsai <billy_tsai(a)aspeedtech.com>
Acked-by: Rob Herring <robh(a)kernel.org>
Link: https://lore.kernel.org/r/20221114025057.10843-2-billy_tsai@aspeedtech.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
.../devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml | 7 -------
1 file changed, 7 deletions(-)
diff --git a/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml b/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml
index b283c8ca2bbf..5c08d8b6e995 100644
--- a/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml
+++ b/Documentation/devicetree/bindings/iio/adc/aspeed,ast2600-adc.yaml
@@ -62,13 +62,6 @@ properties:
description:
Inform the driver that last channel will be used to sensor battery.
- aspeed,trim-data-valid:
- type: boolean
- description: |
- The ADC reference voltage can be calibrated to obtain the trimming
- data which will be stored in otp. This property informs the driver that
- the data store in the otp is valid.
-
required:
- compatible
- reg
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: adc: aspeed: Remove the trim valid dts property.
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From fdd0d6b2eb35c83d6b1226ad20b346a4b45ddfb8 Mon Sep 17 00:00:00 2001
From: Billy Tsai <billy_tsai(a)aspeedtech.com>
Date: Mon, 14 Nov 2022 10:50:56 +0800
Subject: iio: adc: aspeed: Remove the trim valid dts property.
The dts property "aspeed,trim-data-valid" is currently used to determine
whether to read trimming data from the OTP register. If this is set on
a device without valid trimming data in the OTP, the ADC will not function
correctly. This patch drops the use of this property and instead uses the
default (unprogrammed) OTP value of 0 to detect when a fallback value of
0x8 should be used rather than the value read from the OTP.
Fixes: d0a4c17b4073 ("iio: adc: aspeed: Get and set trimming data.")
Signed-off-by: Billy Tsai <billy_tsai(a)aspeedtech.com>
Link: https://lore.kernel.org/r/20221114025057.10843-1-billy_tsai@aspeedtech.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/aspeed_adc.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/adc/aspeed_adc.c b/drivers/iio/adc/aspeed_adc.c
index 9341e0e0eb55..998e8bcc06e1 100644
--- a/drivers/iio/adc/aspeed_adc.c
+++ b/drivers/iio/adc/aspeed_adc.c
@@ -202,6 +202,8 @@ static int aspeed_adc_set_trim_data(struct iio_dev *indio_dev)
((scu_otp) &
(data->model_data->trim_locate->field)) >>
__ffs(data->model_data->trim_locate->field);
+ if (!trimming_val)
+ trimming_val = 0x8;
}
dev_dbg(data->dev,
"trimming val = %d, offset = %08x, fields = %08x\n",
@@ -563,12 +565,9 @@ static int aspeed_adc_probe(struct platform_device *pdev)
if (ret)
return ret;
- if (of_find_property(data->dev->of_node, "aspeed,trim-data-valid",
- NULL)) {
- ret = aspeed_adc_set_trim_data(indio_dev);
- if (ret)
- return ret;
- }
+ ret = aspeed_adc_set_trim_data(indio_dev);
+ if (ret)
+ return ret;
if (of_find_property(data->dev->of_node, "aspeed,battery-sensing",
NULL)) {
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: core: Fix entry not deleted when iio_register_sw_trigger_type()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 4ad09d956f8eacff61e67e5b13ba8ebec3232f76 Mon Sep 17 00:00:00 2001
From: Chen Zhongjin <chenzhongjin(a)huawei.com>
Date: Tue, 8 Nov 2022 11:28:02 +0800
Subject: iio: core: Fix entry not deleted when iio_register_sw_trigger_type()
fails
In iio_register_sw_trigger_type(), configfs_register_default_group() can
fail, but the entry added to iio_trigger_types_list is not deleted.
This leaves a dangling entry in iio_trigger_types_list, which can cause a
page fault when the module is loaded again. Fix this by calling
list_del(&t->list) in the error path.
BUG: unable to handle page fault for address: fffffbfff81d7400
Call Trace:
<TASK>
iio_register_sw_trigger_type
do_one_initcall
do_init_module
load_module
...
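As a simplified sketch (not the exact kernel source; the real function also
rejects duplicate names before adding to the list), the corrected flow looks
roughly like this:
	int iio_register_sw_trigger_type(struct iio_sw_trigger_type *t)
	{
		int ret = 0;

		mutex_lock(&iio_trigger_types_lock);
		list_add_tail(&t->list, &iio_trigger_types_list);
		mutex_unlock(&iio_trigger_types_lock);

		t->group = configfs_register_default_group(iio_triggers_group, t->name,
							   &iio_trigger_type_group_type);
		if (IS_ERR(t->group)) {
			/* Undo the list insertion so no stale entry survives. */
			mutex_lock(&iio_trigger_types_lock);
			list_del(&t->list);
			mutex_unlock(&iio_trigger_types_lock);
			ret = PTR_ERR(t->group);
		}

		return ret;
	}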
Fixes: b662f809d410 ("iio: core: Introduce IIO software triggers")
Signed-off-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Link: https://lore.kernel.org/r/20221108032802.168623-1-chenzhongjin@huawei.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/industrialio-sw-trigger.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/industrialio-sw-trigger.c b/drivers/iio/industrialio-sw-trigger.c
index 994f03a71520..d86a3305d9e8 100644
--- a/drivers/iio/industrialio-sw-trigger.c
+++ b/drivers/iio/industrialio-sw-trigger.c
@@ -58,8 +58,12 @@ int iio_register_sw_trigger_type(struct iio_sw_trigger_type *t)
t->group = configfs_register_default_group(iio_triggers_group, t->name,
&iio_trigger_type_group_type);
- if (IS_ERR(t->group))
+ if (IS_ERR(t->group)) {
+ mutex_lock(&iio_trigger_types_lock);
+ list_del(&t->list);
+ mutex_unlock(&iio_trigger_types_lock);
ret = PTR_ERR(t->group);
+ }
return ret;
}
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: accel: bma400: Fix memory leak in bma400_get_steps_reg()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 20690cd50e68c0313472c7539460168b8ea6444d Mon Sep 17 00:00:00 2001
From: Dong Chenchen <dongchenchen2(a)huawei.com>
Date: Thu, 10 Nov 2022 09:07:26 +0800
Subject: iio: accel: bma400: Fix memory leak in bma400_get_steps_reg()
When regmap_bulk_read() fails, steps_raw is not freed, which causes a
memory leak. This patch fixes it.
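For reference, the function with the fix applied reads roughly as follows
(lightly condensed from the driver source shown in the diff below):
	static int bma400_get_steps_reg(struct bma400_data *data, int *val)
	{
		u8 *steps_raw;
		int ret;

		steps_raw = kmalloc(BMA400_STEP_RAW_LEN, GFP_KERNEL);
		if (!steps_raw)
			return -ENOMEM;

		ret = regmap_bulk_read(data->regmap, BMA400_STEP_CNT0_REG,
				       steps_raw, BMA400_STEP_RAW_LEN);
		if (ret) {
			kfree(steps_raw);	/* this free was missing before */
			return ret;
		}
		*val = get_unaligned_le24(steps_raw);
		kfree(steps_raw);
		return IIO_VAL_INT;
	}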
Fixes: d221de60eee3 ("iio: accel: bma400: Add separate channel for step counter")
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
Reviewed-by: Jagath Jog J <jagathjog1996(a)gmail.com>
Link: https://lore.kernel.org/r/20221110010726.235601-1-dongchenchen2@huawei.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/accel/bma400_core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
index 490c342ef72a..6e4d10a7cd32 100644
--- a/drivers/iio/accel/bma400_core.c
+++ b/drivers/iio/accel/bma400_core.c
@@ -805,8 +805,10 @@ static int bma400_get_steps_reg(struct bma400_data *data, int *val)
ret = regmap_bulk_read(data->regmap, BMA400_STEP_CNT0_REG,
steps_raw, BMA400_STEP_RAW_LEN);
- if (ret)
+ if (ret) {
+ kfree(steps_raw);
return ret;
+ }
*val = get_unaligned_le24(steps_raw);
kfree(steps_raw);
return IIO_VAL_INT;
--
2.38.1
This is a note to let you know that I've just added the patch titled
iio: light: apds9960: fix wrong register for gesture gain
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 0aa60ff5d996d4ecdd4a62699c01f6d00f798d59 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alejandro=20Concepci=C3=B3n=20Rodr=C3=ADguez?=
<asconcepcion(a)acoro.eu>
Date: Sun, 6 Nov 2022 01:56:51 +0000
Subject: iio: light: apds9960: fix wrong register for gesture gain
Gesture Gain Control is in REG_GCONF_2 (0xa3), not in REG_CONFIG_2 (0x90).
Fixes: aff268cd532e ("iio: light: add APDS9960 ALS + promixity driver")
Signed-off-by: Alejandro Concepcion-Rodriguez <asconcepcion(a)acoro.eu>
Acked-by: Matt Ranostay <matt.ranostay(a)konsulko.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/EaT-NKC-H4DNX5z4Lg9B6IWPD5TrTrYBr5DYB784wfDKQkTmz…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/light/apds9960.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/light/apds9960.c b/drivers/iio/light/apds9960.c
index b62c139baf41..38d4c7644bef 100644
--- a/drivers/iio/light/apds9960.c
+++ b/drivers/iio/light/apds9960.c
@@ -54,9 +54,6 @@
#define APDS9960_REG_CONTROL_PGAIN_MASK_SHIFT 2
#define APDS9960_REG_CONFIG_2 0x90
-#define APDS9960_REG_CONFIG_2_GGAIN_MASK 0x60
-#define APDS9960_REG_CONFIG_2_GGAIN_MASK_SHIFT 5
-
#define APDS9960_REG_ID 0x92
#define APDS9960_REG_STATUS 0x93
@@ -77,6 +74,9 @@
#define APDS9960_REG_GCONF_1_GFIFO_THRES_MASK_SHIFT 6
#define APDS9960_REG_GCONF_2 0xa3
+#define APDS9960_REG_GCONF_2_GGAIN_MASK 0x60
+#define APDS9960_REG_GCONF_2_GGAIN_MASK_SHIFT 5
+
#define APDS9960_REG_GOFFSET_U 0xa4
#define APDS9960_REG_GOFFSET_D 0xa5
#define APDS9960_REG_GPULSE 0xa6
@@ -396,9 +396,9 @@ static int apds9960_set_pxs_gain(struct apds9960_data *data, int val)
}
ret = regmap_update_bits(data->regmap,
- APDS9960_REG_CONFIG_2,
- APDS9960_REG_CONFIG_2_GGAIN_MASK,
- idx << APDS9960_REG_CONFIG_2_GGAIN_MASK_SHIFT);
+ APDS9960_REG_GCONF_2,
+ APDS9960_REG_GCONF_2_GGAIN_MASK,
+ idx << APDS9960_REG_GCONF_2_GGAIN_MASK_SHIFT);
if (!ret)
data->pxs_gain = idx;
mutex_unlock(&data->lock);
--
2.38.1
This is the start of the stable review cycle for the 4.19.266 release.
There are 34 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 23 Nov 2022 12:41:40 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.266-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.266-rc1
Daniel Sneddon <daniel.sneddon(a)linux.intel.com>
x86/speculation: Add RSB VM Exit protections
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
Nathan Chancellor <nathan(a)kernel.org>
x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation: Disable RRSBA behavior
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Add Cannon lake to RETBleed affected CPU list
Andrew Cooper <andrew.cooper3(a)citrix.com>
x86/cpu/amd: Enumerate BTC_NO
Peter Zijlstra <peterz(a)infradead.org>
x86/common: Stamp out the stepping madness
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fill RSB on vmexit for IBRS
Josh Poimboeuf <jpoimboe(a)kernel.org>
KVM: VMX: Fix IBRS handling after vmexit
Josh Poimboeuf <jpoimboe(a)kernel.org>
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Remove x86_spec_ctrl_mask
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix SPEC_CTRL write on SMT state change
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix firmware entry SPEC_CTRL handling
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
Peter Zijlstra <peterz(a)infradead.org>
intel_idle: Disable IBRS during long idle
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Report Intel retbleed vulnerability
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Split spectre_v2_select_mitigation() and spectre_v2_user_select_mitigation()
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Optimize SPEC_CTRL MSR writes
Peter Zijlstra <peterz(a)infradead.org>
x86/entry: Add kernel IBRS implementation
Peter Zijlstra <peterz(a)infradead.org>
x86/entry: Remove skip_r11rcx
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bugs: Add AMD retbleed= boot parameter
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bugs: Report AMD retbleed vulnerability
Peter Zijlstra <peterz(a)infradead.org>
x86/cpufeatures: Move RETPOLINE flags to word 11
Mark Gross <mgross(a)linux.intel.com>
x86/cpu: Add a steppings field to struct x86_cpu_id
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu: Add consistent CPU match macros
Thomas Gleixner <tglx(a)linutronix.de>
x86/devicetable: Move x86 specific macro out of generic code
Ingo Molnar <mingo(a)kernel.org>
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Kan Liang <kan.liang(a)linux.intel.com>
x86/cpufeature: Add facility to check for min microcode revisions
Suleiman Souhlal <suleiman(a)google.com>
Revert "x86/cpu: Add a steppings field to struct x86_cpu_id"
Suleiman Souhlal <suleiman(a)google.com>
Revert "x86/speculation: Add RSB VM Exit protections"
-------------
Diffstat:
Documentation/admin-guide/kernel-parameters.txt | 13 +
Makefile | 4 +-
arch/x86/entry/calling.h | 68 ++++-
arch/x86/entry/entry_32.S | 2 -
arch/x86/entry/entry_64.S | 34 ++-
arch/x86/entry/entry_64_compat.S | 11 +-
arch/x86/include/asm/cpu_device_id.h | 168 ++++++++++-
arch/x86/include/asm/cpufeatures.h | 18 +-
arch/x86/include/asm/intel-family.h | 6 +
arch/x86/include/asm/msr-index.h | 10 +
arch/x86/include/asm/nospec-branch.h | 53 ++--
arch/x86/kernel/cpu/amd.c | 21 +-
arch/x86/kernel/cpu/bugs.c | 368 +++++++++++++++++++-----
arch/x86/kernel/cpu/common.c | 60 ++--
arch/x86/kernel/cpu/match.c | 44 ++-
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/process.c | 2 +-
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx.c | 53 +++-
arch/x86/kvm/x86.c | 4 +-
drivers/base/cpu.c | 8 +
drivers/cpufreq/acpi-cpufreq.c | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 1 +
drivers/idle/intel_idle.c | 43 ++-
include/linux/cpu.h | 2 +
include/linux/kvm_host.h | 2 +-
include/linux/mod_devicetable.h | 4 +-
tools/arch/x86/include/asm/cpufeatures.h | 1 +
28 files changed, 815 insertions(+), 188 deletions(-)
This is a note to let you know that I've just added the patch titled
usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 7a21b27aafa3edead79ed97e6f22236be6b9f447 Mon Sep 17 00:00:00 2001
From: Pawel Laszczak <pawell(a)cadence.com>
Date: Tue, 15 Nov 2022 04:22:18 -0500
Subject: usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1
This patch modifies the TD_SIZE in the TRB before a ZLP TRB.
The TD_SIZE in the TRB before a ZLP TRB must be set to 1 to force
the controller to process the ZLP TRB.
cc: <stable(a)vger.kernel.org>
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20221115092218.421267-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdnsp-ring.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index 04dfcaa08dc4..2f29431f612e 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -1763,10 +1763,15 @@ static u32 cdnsp_td_remainder(struct cdnsp_device *pdev,
int trb_buff_len,
unsigned int td_total_len,
struct cdnsp_request *preq,
- bool more_trbs_coming)
+ bool more_trbs_coming,
+ bool zlp)
{
u32 maxp, total_packet_count;
+ /* Before ZLP driver needs set TD_SIZE = 1. */
+ if (zlp)
+ return 1;
+
/* One TRB with a zero-length data packet. */
if (!more_trbs_coming || (transferred == 0 && trb_buff_len == 0) ||
trb_buff_len == td_total_len)
@@ -1960,7 +1965,8 @@ int cdnsp_queue_bulk_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
/* Set the TRB length, TD size, and interrupter fields. */
remainder = cdnsp_td_remainder(pdev, enqd_len, trb_buff_len,
full_len, preq,
- more_trbs_coming);
+ more_trbs_coming,
+ zero_len_trb);
length_field = TRB_LEN(trb_buff_len) | TRB_TD_SIZE(remainder) |
TRB_INTR_TARGET(0);
@@ -2025,7 +2031,7 @@ int cdnsp_queue_ctrl_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
if (preq->request.length > 0) {
remainder = cdnsp_td_remainder(pdev, 0, preq->request.length,
- preq->request.length, preq, 1);
+ preq->request.length, preq, 1, 0);
length_field = TRB_LEN(preq->request.length) |
TRB_TD_SIZE(remainder) | TRB_INTR_TARGET(0);
@@ -2226,7 +2232,7 @@ static int cdnsp_queue_isoc_tx(struct cdnsp_device *pdev,
/* Set the TRB length, TD size, & interrupter fields. */
remainder = cdnsp_td_remainder(pdev, running_total,
trb_buff_len, td_len, preq,
- more_trbs_coming);
+ more_trbs_coming, 0);
length_field = TRB_LEN(trb_buff_len) | TRB_INTR_TARGET(0);
--
2.38.1
This is a note to let you know that I've just added the patch titled
usb: cdnsp: Fix issue with Clear Feature Halt Endpoint
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From b25264f22b498dff3fa5c70c9bea840e83fff0d1 Mon Sep 17 00:00:00 2001
From: Pawel Laszczak <pawell(a)cadence.com>
Date: Thu, 10 Nov 2022 01:30:05 -0500
Subject: usb: cdnsp: Fix issue with Clear Feature Halt Endpoint
While handling a Clear Halt Endpoint Feature request, the driver invokes
the Reset Endpoint command. Because this command has an issue with
transitioning the endpoint from the Running to the Idle state, the driver
must first stop the endpoint using the Stop Endpoint command.
cc: <stable(a)vger.kernel.org>
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
Link: https://lore.kernel.org/r/20221110063005.370656-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdnsp-gadget.c | 12 ++++--------
drivers/usb/cdns3/cdnsp-ring.c | 3 ++-
2 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c
index c67715f6f756..f9aa50ff14d4 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.c
+++ b/drivers/usb/cdns3/cdnsp-gadget.c
@@ -600,11 +600,11 @@ int cdnsp_halt_endpoint(struct cdnsp_device *pdev,
trace_cdnsp_ep_halt(value ? "Set" : "Clear");
- if (value) {
- ret = cdnsp_cmd_stop_ep(pdev, pep);
- if (ret)
- return ret;
+ ret = cdnsp_cmd_stop_ep(pdev, pep);
+ if (ret)
+ return ret;
+ if (value) {
if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_STOPPED) {
cdnsp_queue_halt_endpoint(pdev, pep->idx);
cdnsp_ring_cmd_db(pdev);
@@ -613,10 +613,6 @@ int cdnsp_halt_endpoint(struct cdnsp_device *pdev,
pep->ep_state |= EP_HALTED;
} else {
- /*
- * In device mode driver can call reset endpoint command
- * from any endpoint state.
- */
cdnsp_queue_reset_ep(pdev, pep->idx);
cdnsp_ring_cmd_db(pdev);
ret = cdnsp_wait_for_cmd_compl(pdev);
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index 794e413800ae..04dfcaa08dc4 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -2076,7 +2076,8 @@ int cdnsp_cmd_stop_ep(struct cdnsp_device *pdev, struct cdnsp_ep *pep)
u32 ep_state = GET_EP_CTX_STATE(pep->out_ctx);
int ret = 0;
- if (ep_state == EP_STATE_STOPPED || ep_state == EP_STATE_DISABLED) {
+ if (ep_state == EP_STATE_STOPPED || ep_state == EP_STATE_DISABLED ||
+ ep_state == EP_STATE_HALTED) {
trace_cdnsp_ep_stopped_or_disabled(pep->out_ctx);
goto ep_stopped;
}
--
2.38.1
commit 1980860e0c8299316cddaf0992dd9e1258ec9d88 upstream.
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters,
jumping them ahead of the characters that are still pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_port.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 97a64bc79350..b47335dfae16 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1824,10 +1824,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- /* fall-through */
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
--
2.30.2
We face some regressions on a few IXP42x systems when
accessing flash; the following unrelated error prints
appear from the PCI driver:
ixp4xx-pci c0000000.pci: PCI: abort_handler addr = 0xff9ffb5f,
isr = 0x0, status = 0x22a0
ixp4xx-pci c0000000.pci: imprecise abort
(...)
It turns out that while bit 7 is marked "reserved" it is
not unused, so masking it off to zero is dangerous and
breaks flash access on some systems such as the NSLU2.
Be more careful and avoid masking off any of the reserved
bits 7, 8, 9 or 30. Only keep masking EXP_WORD (bit 2)
on IXP43x, which is necessary in some setups.
Cc: stable(a)vger.kernel.org
Fixes: 1c953bda90ca ("bus: ixp4xx: Add a driver for IXP4xx expansion bus")
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
SoC folks: please apply this directly for fixes since it
fixes a regression.
---
drivers/bus/intel-ixp4xx-eb.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/bus/intel-ixp4xx-eb.c b/drivers/bus/intel-ixp4xx-eb.c
index a4388440aca7..91db001eb69a 100644
--- a/drivers/bus/intel-ixp4xx-eb.c
+++ b/drivers/bus/intel-ixp4xx-eb.c
@@ -49,7 +49,7 @@
#define IXP4XX_EXP_SIZE_SHIFT 10
#define IXP4XX_EXP_CNFG_0 BIT(9) /* Always zero */
#define IXP43X_EXP_SYNC_INTEL BIT(8) /* Only on IXP43x */
-#define IXP43X_EXP_EXP_CHIP BIT(7) /* Only on IXP43x */
+#define IXP43X_EXP_EXP_CHIP BIT(7) /* Only on IXP43x, dangerous to touch on IXP42x */
#define IXP4XX_EXP_BYTE_RD16 BIT(6)
#define IXP4XX_EXP_HRDY_POL BIT(5) /* Only on IXP42x */
#define IXP4XX_EXP_MUX_EN BIT(4)
@@ -57,8 +57,6 @@
#define IXP4XX_EXP_WORD BIT(2) /* Always zero */
#define IXP4XX_EXP_WR_EN BIT(1)
#define IXP4XX_EXP_BYTE_EN BIT(0)
-#define IXP42X_RESERVED (BIT(30)|IXP4XX_EXP_CNFG_0|BIT(8)|BIT(7)|IXP4XX_EXP_WORD)
-#define IXP43X_RESERVED (BIT(30)|IXP4XX_EXP_CNFG_0|BIT(5)|IXP4XX_EXP_WORD)
#define IXP4XX_EXP_CNFG0 0x20
#define IXP4XX_EXP_CNFG0_MEM_MAP BIT(31)
@@ -252,10 +250,9 @@ static void ixp4xx_exp_setup_chipselect(struct ixp4xx_eb *eb,
cs_cfg |= val << IXP4XX_EXP_CYC_TYPE_SHIFT;
}
- if (eb->is_42x)
- cs_cfg &= ~IXP42X_RESERVED;
if (eb->is_43x) {
- cs_cfg &= ~IXP43X_RESERVED;
+ /* Should always be zero */
+ cs_cfg &= ~IXP4XX_EXP_WORD;
/*
* This bit for Intel strata flash is currently unused, but let's
* report it if we find one.
--
2.38.1
commit 7090abd6ad0610a144523ce4ffcb8560909bf2a8 upstream.
Configure DMA to use a 16-byte burst size with Elkhart Lake. This makes
bus use more efficient and works around an issue which occurs with the
previously used 1-byte burst size.
The fix was initially developed by Srikanth Thokala and Aman Kumar.
Together with the previous config change, this is the cleaned-up version
of the original fix.
Fixes: 0a9410b981e9 ("serial: 8250_lpss: Enable DMA on Intel Elkhart Lake")
Cc: <stable(a)vger.kernel.org> # serial: 8250_lpss: Configure DMA also w/o DMA filter
Reported-by: Wentong Wu <wentong.wu(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-4-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_lpss.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_lpss.c b/drivers/tty/serial/8250/8250_lpss.c
index 49ae73f4d3a0..078f12f856f6 100644
--- a/drivers/tty/serial/8250/8250_lpss.c
+++ b/drivers/tty/serial/8250/8250_lpss.c
@@ -177,6 +177,9 @@ static int ehl_serial_setup(struct lpss8250 *lpss, struct uart_port *port)
* matching with the registered General Purpose DMA controllers.
*/
up->dma = dma;
+
+ lpss->dma_maxburst = 16;
+
return 0;
}
--
2.30.2
commit 8cccf05fe857a18ee26e20d11a8455a73ffd4efd upstream.
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1                                   Task2
--------------------------------        ------------------------------
nilfs_construct_segment
  nilfs_segctor_sync
    init_wait
    init_waitqueue_entry
    add_wait_queue
    schedule
                                        nilfs_remount (R/W remount case)
                                          nilfs_attach_log_writer
                                            nilfs_detach_log_writer
                                              nilfs_segctor_destroy
                                                kfree
    finish_wait
      _raw_spin_lock_irqsave
        __raw_spin_lock_irqsave
          do_raw_spin_lock
            debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
wakes up, it accesses nilfs->ns_writer, which has already been freed. This
scenario diagram is based on Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks using the superblock
instance where previously only the ns_writer pointer was used to check if
the filesystem is read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please apply this patch to stable-4.9 instead of the patch that could
not be applied to it last time.
A rejected hunk has been manually resolved, and sb_rdonly() has been
replaced with its equivalent bitwise operation since that helper does not
yet exist in this kernel. The result has also been retested against that
stable tree.
fs/nilfs2/segment.c | 15 ++++++++-------
fs/nilfs2/super.c | 2 --
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 450bfdee513b..86769ecfe61f 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -329,7 +329,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb->s_flags & MS_RDONLY || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2252,7 +2252,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_transaction_info *ti;
int err;
- if (!sci)
+ if (sb->s_flags & MS_RDONLY || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2291,7 +2291,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb->s_flags & MS_RDONLY || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2788,11 +2788,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index c95d369e90aa..9e8f186b14d1 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1147,8 +1147,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
goto out;
if (*flags & MS_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= MS_RDONLY;
/*
--
2.31.1
From: Keith Busch <kbusch(a)kernel.org>
commit 23e085b2dead13b51fe86d27069895b740f749c0 upstream.
The passthrough commands already have this restriction, but the other
operations do not. Require the same capabilities for all users as all of
these operations, which include resets and rescans, can be disruptive.
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
These backports are for CVE-2022-3169.
drivers/nvme/host/ioctl.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index d3281f87cd6e..a48a79ed5c4c 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -764,11 +764,17 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
case NVME_IOCTL_IO_CMD:
return nvme_dev_user_cmd(ctrl, argp);
case NVME_IOCTL_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
dev_warn(ctrl->device, "resetting controller\n");
return nvme_reset_ctrl_sync(ctrl);
case NVME_IOCTL_SUBSYS_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
return nvme_reset_subsystem(ctrl);
case NVME_IOCTL_RESCAN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
nvme_queue_scan(ctrl);
return 0;
default:
--
2.38.1
From: Keith Busch <kbusch(a)kernel.org>
commit 23e085b2dead13b51fe86d27069895b740f749c0 upstream.
The passthrough commands already have this restriction, but the other
operations do not. Require the same capabilities for all users as all of
these operations, which include resets and rescans, can be disruptive.
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
These backports are for CVE-2022-3169.
drivers/nvme/host/ioctl.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 22314962842d..7397fad4c96f 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -484,11 +484,17 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
case NVME_IOCTL_IO_CMD:
return nvme_dev_user_cmd(ctrl, argp);
case NVME_IOCTL_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
dev_warn(ctrl->device, "resetting controller\n");
return nvme_reset_ctrl_sync(ctrl);
case NVME_IOCTL_SUBSYS_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
return nvme_reset_subsystem(ctrl);
case NVME_IOCTL_RESCAN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
nvme_queue_scan(ctrl);
return 0;
default:
--
2.38.1
test_bpf tail call tests end up as:
test_bpf: #0 Tail call leaf jited:1 85 PASS
test_bpf: #1 Tail call 2 jited:1 111 PASS
test_bpf: #2 Tail call 3 jited:1 145 PASS
test_bpf: #3 Tail call 4 jited:1 170 PASS
test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
test_bpf: #5 Tail call load/store jited:1
BUG: Unable to handle kernel data access on write at 0xf1b4e000
Faulting instruction address: 0xbe86b710
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash PowerMac
Modules linked in: test_bpf(+)
CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
NIP: be86b710 LR: be857e88 CTR: be86b704
REGS: f1b4df20 TRAP: 0300 Not tainted (6.1.0-rc4+)
MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28008242 XER: 00000000
DAR: f1b4e000 DSISR: 42000000
GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000
GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
NIP [be86b710] 0xbe86b710
LR [be857e88] __run_one+0xec/0x264 [test_bpf]
Call Trace:
[f1b4dfe0] [00000002] 0x2 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
This is an attempt to write above the stack. The problem is encountered
with the tests added by commit 38608ee7b690 ("bpf, tests: Add load store
test case for tail call").
This happens because the tail call is made to a BPF prog with a different
stack_depth. At the time being, the stack is kept as-is when the caller
tail calls its callee. But at exit, the callee restores the stack based
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.
This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary to maintain the stack as-is for the tail
call. But it was not anticipated that the BPF frame size could be
different.
Let's take a new approach: use register r0 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. That's a deviation from the ppc32 ABI, but after all, the way
tail calls are implemented is already not in accordance with the ABI.
With the fix, the tail call tests are now successful:
test_bpf: #0 Tail call leaf jited:1 53 PASS
test_bpf: #1 Tail call 2 jited:1 115 PASS
test_bpf: #2 Tail call 3 jited:1 154 PASS
test_bpf: #3 Tail call 4 jited:1 165 PASS
test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
test_bpf: #5 Tail call load/store jited:1 141 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
Fixes: 51c66ad849a7 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
---
arch/powerpc/net/bpf_jit_comp32.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 43f1c76d48ce..97e75b8181ca 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -115,21 +115,19 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
/* First arg comes in as a 32 bits pointer. */
EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_1), _R3));
- EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
+ EMIT(PPC_RAW_LI(_R0, 0));
+
+#define BPF_TAILCALL_PROLOGUE_SIZE 8
+
EMIT(PPC_RAW_STWU(_R1, _R1, -BPF_PPC_STACKFRAME(ctx)));
/*
- * Initialize tail_call_cnt in stack frame if we do tail calls.
- * Otherwise, put in NOPs so that it can be skipped when we are
- * invoked through a tail call.
+ * Save tail_call_cnt in stack frame if we do tail calls.
*/
if (ctx->seen & SEEN_TAILCALL)
- EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_1) - 1, _R1,
- bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
- else
- EMIT(PPC_RAW_NOP());
+ EMIT(PPC_RAW_STW(_R0, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
-#define BPF_TAILCALL_PROLOGUE_SIZE 16
+ EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
/*
* We need a stack frame, but we don't necessarily need to
@@ -244,7 +242,6 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
EMIT(PPC_RAW_RLWINM(_R3, b2p_index, 2, 0, 29));
EMIT(PPC_RAW_ADD(_R3, _R3, b2p_bpf_array));
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_array, ptrs)));
- EMIT(PPC_RAW_STW(_R0, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
/*
* if (prog == NULL)
@@ -257,20 +254,20 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_prog, bpf_func)));
if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_LWZ(_R0, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
+ EMIT(PPC_RAW_LWZ(_R5, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
EMIT(PPC_RAW_ADDIC(_R3, _R3, BPF_TAILCALL_PROLOGUE_SIZE));
if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_MTLR(_R0));
+ EMIT(PPC_RAW_MTLR(_R5));
EMIT(PPC_RAW_MTCTR(_R3));
- EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_1)));
-
/* tear restore NVRs, ... */
bpf_jit_emit_common_epilogue(image, ctx);
+ EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_PPC_STACKFRAME(ctx)));
+
EMIT(PPC_RAW_BCTR());
/* out: */
--
2.38.1
VM_SOFTDIRTY should be set in the vma flags before they are tested to
decide whether a new allocation can be merged into the previous vma. With
this patch, new allocations are merged into the previous VMAs.
I've tested it by reverting commit 34228d473efe ("mm: ignore
VM_SOFTDIRTY on VMA merging") and, after adding the following patch,
I'm seeing that all new allocations done through mmap() are merged
into the previous VMAs. The number of VMAs doesn't increase drastically,
which is what had contributed to the crash of gimp. If I run the same test
after reverting and without this patch, the number of VMAs keeps
increasing with every mmap() syscall, which validates this patch.
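A user-space check along these lines can illustrate the effect (a
hypothetical sketch, not part of the patch or its test setup; whether
consecutive anonymous mappings end up adjacent, and therefore mergeable,
depends on the kernel's placement policy):
	/* Map many anonymous pages and watch the VMA count in /proc/self/maps.
	 * With merging working, the count stays roughly flat instead of
	 * growing by one per mmap() call. */
	#include <stdio.h>
	#include <sys/mman.h>

	static int count_vmas(void)
	{
		FILE *f = fopen("/proc/self/maps", "r");
		int lines = 0, c;

		if (!f)
			return -1;
		while ((c = fgetc(f)) != EOF)
			if (c == '\n')
				lines++;
		fclose(f);
		return lines;
	}

	int main(void)
	{
		for (int i = 0; i < 1000; i++) {
			if (mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) == MAP_FAILED) {
				perror("mmap");
				return 1;
			}
			if (i % 100 == 0)
				printf("after %4d mmaps: %d VMAs\n", i, count_vmas());
		}
		return 0;
	}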
Commit 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging")
seems like a workaround: it lets soft-dirty and non-soft-dirty VMAs get
merged, which helps avoid the creation of too many VMAs. But it creates a
problem when adding the feature of clearing the soft-dirty status of only
a part of a memory region.
Cc: <stable(a)vger.kernel.org>
Fixes: d9104d1ca966 ("mm: track vma changes with VM_SOFTDIRTY bit")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
We need more testing of this patch.
While implementing clearing of the soft-dirty bit for a range of the address
space, I'm facing an issue: a non-soft-dirty VMA sometimes gets merged with a
soft-dirty VMA, and thus the non-soft-dirty VMA becomes dirty, which is
undesirable. When I discussed this with some other developers, they
considered it a regression: why should a non-soft-dirty page appear as soft
dirty when it isn't soft dirty in reality? I agree with them. Should we
revert 34228d473efe or find a workaround in the IOCTL?
* A revert may cause the number of VMAs to grow uncontrollably in situations
where the soft-dirty bit of a lot of memory regions, or of the whole address
space, is cleared again and again. AFAIK a normal process should only be
clearing a few memory regions, so applications should be okay. There is still
a chance of regressions if some applications are already using the
soft-dirty bit. I'm not sure how to test it.
* Add a flag in the IOCTL to ignore the dirtiness of the VMA. The user will
surely lose the ability to detect reused memory regions, but the extraneous
soft-dirty pages would not appear. I'm trying to do this in the patch series
[1]. There is some ongoing discussion that this fails with an mprotect use
case [2]. I still need to have a look at the mprotect selftest to see how and
why this fails. I think this can be implemented after some more work,
probably on the mprotect side.
[1] https://lore.kernel.org/all/20221109102303.851281-1-usama.anjum@collabora.c…
[2] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/
---
mm/mmap.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index f9b96b387a6f..6934b8f61fdc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1708,6 +1708,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags |= VM_ACCOUNT;
}
+ /*
+ * New (or expanded) vma always get soft dirty status.
+ * Otherwise user-space soft-dirty page tracker won't
+ * be able to distinguish situation when vma area unmapped,
+ * then new mapped in-place (which must be aimed as
+ * a completely new data area).
+ */
+ vm_flags |= VM_SOFTDIRTY;
+
/*
* Can we just expand an old mapping?
*/
@@ -1823,15 +1832,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (file)
uprobe_mmap(vma);
- /*
- * New (or expanded) vma always get soft dirty status.
- * Otherwise user-space soft-dirty page tracker won't
- * be able to distinguish situation when vma area unmapped,
- * then new mapped in-place (which must be aimed as
- * a completely new data area).
- */
- vma->vm_flags |= VM_SOFTDIRTY;
-
vma_set_page_prot(vma);
return addr;
@@ -2998,6 +2998,8 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
return -ENOMEM;
+ flags |= VM_SOFTDIRTY;
+
/* Can we just expand an old private anonymous mapping? */
vma = vma_merge(mm, prev, addr, addr + len, flags,
NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
@@ -3026,7 +3028,6 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
mm->data_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
- vma->vm_flags |= VM_SOFTDIRTY;
return 0;
}
--
2.30.2
From: Joe Korty <joe.korty(a)concurrent-rt.com>
The TVAL register is 32 bit signed. Thus only the lower 31 bits are
available to specify when an interrupt is to occur at some time in the
near future. Attempting to specify a larger interval with TVAL results
in a negative time delta which means the timer fires immediately upon
being programmed, rather than firing at that expected future time.
The solution is for Linux to declare that TVAL is a 31 bit register rather
than give its true size of 32 bits. This prevents Linux from programming
TVAL with a too-large value. Note that, prior to 5.16, this little trick
was the standard way to handle TVAL in Linux, so there is nothing new
happening here on that front.
The softlockup detector hides the issue, because it keeps generating
short timer deadlines that are within the scope of the broken timer.
Disable it, and you start using NO_HZ with much longer timer deadlines,
which turns into an interrupt flood:
11: 1124855130 949168462 758009394 76417474 104782230 30210281
310890 1734323687 GICv2 29 Level arch_timer
And "much longer" isn't that long: it takes less than 43s to underflow
TVAL at 50MHz (the frequency of the counter on XGene-1).
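As a quick sanity check on that figure: a signed 32-bit TVAL offers at most
2^31 = 2147483648 positive ticks, and 2147483648 / 50,000,000 Hz is roughly
42.9 seconds, which matches the "less than 43s" quoted above.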
Some comments on the v1 version of this patch by Marc Zyngier:
XGene implements CVAL (a 64bit comparator) in terms of TVAL (a countdown
register) instead of the other way around. TVAL being a 32bit register,
the width of the counter should equally be 32. However, TVAL is a
*signed* value, and keeps counting down in the negative range once the
timer fires.
It means that any TVAL value with bit 31 set will fire immediately,
as it cannot be distinguished from an already expired timer. Reducing
the timer range back to a paltry 31 bits papers over the issue.
Another problem cannot be fixed though, which is that the timer interrupt
*must* be handled within the negative countdown period, or the interrupt
will be lost (TVAL will rollover to a positive value, indicative of a
new timer deadline).
Cc: stable(a)vger.kernel.org # 5.16+
Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations")
Signed-off-by: Joe Korty <joe.korty(a)concurrent-rt.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
[maz: revamped the commit message]
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20221024165422.GA51107@zipoli.concurrent-rt.com
---
Notes:
v3: rejigged commit message along the lines of the v2 review.
drivers/clocksource/arm_arch_timer.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index a7ff77550e17..41323c2bc0b1 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -806,6 +806,9 @@ static u64 __arch_timer_check_delta(void)
/*
* XGene-1 implements CVAL in terms of TVAL, meaning
* that the maximum timer range is 32bit. Shame on them.
+ *
+ * Note that TVAL is signed, thus has only 31 of its
+ * 32 bits to express magnitude.
*/
MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM,
APM_CPU_PART_POTENZA)),
@@ -813,8 +816,8 @@ static u64 __arch_timer_check_delta(void)
};
if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) {
- pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits");
- return CLOCKSOURCE_MASK(32);
+ pr_warn_once("Broken CNTx_CVAL_EL1, using 32 bit TVAL instead.\n");
+ return CLOCKSOURCE_MASK(31);
}
#endif
return CLOCKSOURCE_MASK(arch_counter_get_width());
--
2.34.1
Since commit 0f0101719138 ("usb: dwc3: Don't switch OTG -> peripheral if extcon is present"),
Dual Role support on the Intel Merrifield platform is broken due to the
rearranged call to dwc3_get_extcon().
It appears to be caused by ulpi_read_id() failing with -ETIMEDOUT on the
first test write. Currently ulpi_read_id() expects to discover the phy via
DT when the test write fails and returns 0 in that case, even if DT does not
provide the phy. As a result, USB probe completes without a phy.
On Intel Merrifield the issue is reproducible but the root cause is
difficult to find. Using an ordinary kernel and rootfs with buildroot,
ulpi_read_id() succeeds. As soon as ftrace / bootconfig is added to find out
why, ulpi_read_id() fails and we can't analyze the flow. Using another
rootfs, ulpi_read_id() fails even without adding ftrace. Apparently the
issue is some kind of timing / race, but merely retrying ulpi_read_id() does
not resolve it.
This patch makes ulpi_read_id() return -ETIMEDOUT to its caller if the first
test write fails. The caller should then handle it appropriately. A follow-up
patch will make dwc3_core_init() set -EPROBE_DEFER in this case and bail out.
Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ferry Toth <ftoth(a)exalondelft.nl>
---
drivers/usb/common/ulpi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/common/ulpi.c b/drivers/usb/common/ulpi.c
index d7c8461976ce..60e8174686a1 100644
--- a/drivers/usb/common/ulpi.c
+++ b/drivers/usb/common/ulpi.c
@@ -207,7 +207,7 @@ static int ulpi_read_id(struct ulpi *ulpi)
/* Test the interface */
ret = ulpi_write(ulpi, ULPI_SCRATCH, 0xaa);
if (ret < 0)
- goto err;
+ return ret;
ret = ulpi_read(ulpi, ULPI_SCRATCH);
if (ret < 0)
--
2.37.2
The patch titled
Subject: tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
Date: Sat, 19 Nov 2022 10:36:59 +0800
The latest version of grep claims that egrep is now obsolete, so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
Fix this up by converting the related file to use "grep -E" instead:
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/vm`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Link: https://lkml.kernel.org/r/1668825419-30584-1-git-send-email-yangtiezhu@loon…
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/vm/slabinfo-gnuplot.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/vm/slabinfo-gnuplot.sh~tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep
+++ a/tools/vm/slabinfo-gnuplot.sh
@@ -150,7 +150,7 @@ do_preprocess()
let lines=3
out=`basename "$in"`"-slabs-by-loss"
`cat "$in" | grep -A "$lines" 'Slabs sorted by loss' |\
- egrep -iv '\-\-|Name|Slabs'\
+ grep -E -iv '\-\-|Name|Slabs'\
| awk '{print $1" "$4+$2*$3" "$4}' > "$out"`
if [ $? -eq 0 ]; then
do_slabs_plotting "$out"
@@ -159,7 +159,7 @@ do_preprocess()
let lines=3
out=`basename "$in"`"-slabs-by-size"
`cat "$in" | grep -A "$lines" 'Slabs sorted by size' |\
- egrep -iv '\-\-|Name|Slabs'\
+ grep -E -iv '\-\-|Name|Slabs'\
| awk '{print $1" "$4" "$4-$2*$3}' > "$out"`
if [ $? -eq 0 ]; then
do_slabs_plotting "$out"
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep.patch
The patch titled
Subject: nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: ZhangPeng <zhangpeng362(a)huawei.com>
Subject: nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
Date: Sat, 19 Nov 2022 21:05:42 +0900
Syzbot reported a null-ptr-deref bug:
NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
frequency < 30 seconds
general protection fault, probably for non-canonical address
0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 PID: 3603 Comm: segctord Not tainted
6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google
10/11/2022
RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0
fs/nilfs2/alloc.c:608
Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00
00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02
00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7
RSP: 0018:ffffc90003dff830 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d
RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010
RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f
R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158
R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0
Call Trace:
<TASK>
nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline]
nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193
nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236
nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940
nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline]
nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline]
nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088
nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568
nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018
nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline]
nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045
nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline]
nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
...
If DAT metadata file is corrupted on disk, there is a case where
req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during
a b-tree operation that cascadingly updates ancestor nodes of the b-tree,
because nilfs_dat_commit_alloc() for a lower level block can initialize
the blocknr on the same DAT entry between nilfs_dat_prepare_end() and
nilfs_dat_commit_end().
If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free()
without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and
causes the NULL pointer dereference above in the
nilfs_palloc_commit_free_entry() function, which leads to a crash.
Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh
before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free().
This also calls nilfs_error() in that case to notify that there is a fatal
flaw in the filesystem metadata and prevent further operations.
Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com
Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com
Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com
Signed-off-by: ZhangPeng <zhangpeng362(a)huawei.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+ebe05ee8e98f755f61d0(a)syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/dat.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/fs/nilfs2/dat.c~nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry
+++ a/fs/nilfs2/dat.c
@@ -111,6 +111,13 @@ static void nilfs_dat_commit_free(struct
kunmap_atomic(kaddr);
nilfs_dat_commit_entry(dat, req);
+
+ if (unlikely(req->pr_desc_bh == NULL || req->pr_bitmap_bh == NULL)) {
+ nilfs_error(dat->i_sb,
+ "state inconsistency probably due to duplicate use of vblocknr = %llu",
+ (unsigned long long)req->pr_entry_nr);
+ return;
+ }
nilfs_palloc_commit_free_entry(dat, req);
}
_
Patches currently in -mm which might be from zhangpeng362(a)huawei.com are
nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry.patch
The patch titled
Subject: error-injection: add prompt for function error injection
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
error-injection-add-prompt-for-function-error-injection.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Subject: error-injection: add prompt for function error injection
Date: Mon, 21 Nov 2022 10:44:03 -0500
The config to be able to inject error codes into any function annotated
with ALLOW_ERROR_INJECTION() is enabled when
CONFIG_FUNCTION_ERROR_INJECTION is enabled. But unfortunately, this is
always enabled on x86 when KPROBES is enabled, and there's no way to turn
it off.
As kprobes is useful for observability of the kernel, it is useful to have
it enabled in production environments. But error injection should be
avoided. Add a prompt to the config to allow it to be disabled even when
kprobes is enabled, and get rid of the "def_bool y".
This is a kernel debug feature (it's in Kconfig.debug), and should have
never been something enabled by default.
Link: https://lkml.kernel.org/r/20221121104403.1545f9b5@gandalf.local.home
Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Borislav Petkov <bp(a)suse.de>
Cc: Alexei Starovoitov <alexei.starovoitov(a)gmail.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Florent Revest <revest(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.debug | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/lib/Kconfig.debug~error-injection-add-prompt-for-function-error-injection
+++ a/lib/Kconfig.debug
@@ -1874,8 +1874,14 @@ config NETDEV_NOTIFIER_ERROR_INJECT
If unsure, say N.
config FUNCTION_ERROR_INJECTION
- def_bool y
+ bool "Fault-injections of functions"
depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
+ help
+ Add fault injections into various functions that are annotated with
+ ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
+ value of theses functions. This is useful to test error paths of code.
+
+ If unsure, say N
config FAULT_INJECTION
bool "Fault-injection framework"
_
Patches currently in -mm which might be from rostedt(a)goodmis.org are
error-injection-add-prompt-for-function-error-injection.patch