On 4/10/2023 12:03 AM, Mario Limonciello wrote:
> On 3/20/23 04:32, Kornel Dulęba wrote:
>
>> This fixes a similar problem to the one observed in:
>> commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on
>> probe").
>>
>> On some systems, during suspend/resume cycle firmware leaves
>> an interrupt enabled on a pin that is not used by the kernel.
>> This confuses the AMD pinctrl driver and causes spurious interrupts.
>>
>> The driver already has logic to detect if a pin is used by the kernel.
>> Leverage it to re-initialize interrupt fields of a pin only if it's not
>> used by us.
>>
>> Signed-off-by: Kornel Dulęba <korneld(a)chromium.org>
>> ---
>> drivers/pinctrl/pinctrl-amd.c | 36 +++++++++++++++++++----------------
>> 1 file changed, 20 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/pinctrl/pinctrl-amd.c
>> b/drivers/pinctrl/pinctrl-amd.c
>> index 9236a132c7ba..609821b756c2 100644
>> --- a/drivers/pinctrl/pinctrl-amd.c
>> +++ b/drivers/pinctrl/pinctrl-amd.c
>> @@ -872,32 +872,34 @@ static const struct pinconf_ops amd_pinconf_ops
>> = {
>> .pin_config_group_set = amd_pinconf_group_set,
>> };
>> -static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
>> +static void amd_gpio_irq_init_pin(struct amd_gpio *gpio_dev, int pin)
>> {
>> - struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
>> + const struct pin_desc *pd;
>> unsigned long flags;
>> u32 pin_reg, mask;
>> - int i;
>> mask = BIT(WAKE_CNTRL_OFF_S0I3) | BIT(WAKE_CNTRL_OFF_S3) |
>> BIT(INTERRUPT_MASK_OFF) | BIT(INTERRUPT_ENABLE_OFF) |
>> BIT(WAKE_CNTRL_OFF_S4);
>> - for (i = 0; i < desc->npins; i++) {
>> - int pin = desc->pins[i].number;
>> - const struct pin_desc *pd = pin_desc_get(gpio_dev->pctrl, pin);
>> -
>> - if (!pd)
>> - continue;
>> + pd = pin_desc_get(gpio_dev->pctrl, pin);
>> + if (!pd)
>> + return;
>> - raw_spin_lock_irqsave(&gpio_dev->lock, flags);
>> + raw_spin_lock_irqsave(&gpio_dev->lock, flags);
>> + pin_reg = readl(gpio_dev->base + pin * 4);
>> + pin_reg &= ~mask;
>> + writel(pin_reg, gpio_dev->base + pin * 4);
>> + raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
>> +}
>> - pin_reg = readl(gpio_dev->base + i * 4);
>> - pin_reg &= ~mask;
>> - writel(pin_reg, gpio_dev->base + i * 4);
>> +static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
>> +{
>> + struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
>> + int i;
>> - raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
>> - }
>> + for (i = 0; i < desc->npins; i++)
>> + amd_gpio_irq_init_pin(gpio_dev, i);
>> }
>> #ifdef CONFIG_PM_SLEEP
>> @@ -950,8 +952,10 @@ static int amd_gpio_resume(struct device *dev)
>> for (i = 0; i < desc->npins; i++) {
>> int pin = desc->pins[i].number;
>> - if (!amd_gpio_should_save(gpio_dev, pin))
>> + if (!amd_gpio_should_save(gpio_dev, pin)) {
>> + amd_gpio_irq_init_pin(gpio_dev, pin);
>> continue;
>> + }
>> raw_spin_lock_irqsave(&gpio_dev->lock, flags);
>> gpio_dev->saved_regs[i] |= readl(gpio_dev->base + pin * 4)
>> & PIN_IRQ_PENDING;
>
> Hello Kornel,
>
> I've found that this commit which was included in 6.3-rc5 is causing a
> regression waking up from lid on a Lenovo Z13.
observed "unable to wake from power button" on AMD based Dell platform.
Reverting "pinctrl: amd: Disable and mask interrupts on resume" on the
top of 6.3-rc6 does fix the issue.
>
> Reverting it on top of 6.3-rc6 resolves the problem.
>
> I've collected what I can into this bug report:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217315
>
> Linus Walleij,
>
> It looks like this was CC to stable. If we can't get a quick solution
> we might want to pull this from stable.
this commit landed into 6.1.23 as well
d9c63daa576b2 pinctrl: amd: Disable and mask interrupts on resume
>
> Thanks,
>
>
Regards,
Richard
Greg,
Here are the backports of the patches for the maple tree fixes from
6.3-rc6 for 6.1 and 6.2.
The patches differ with a few extra patches for the maple tree, and
changes to the mm code patch to work without the vma iterator change.
Liam R. Howlett (14):
maple_tree: remove GFP_ZERO from kmem_cache_alloc() and
kmem_cache_alloc_bulk()
maple_tree: fix potential rcu issue
maple_tree: reduce user error potential
maple_tree: fix handle of invalidated state in mas_wr_store_setup()
maple_tree: fix mas_prev() and mas_find() state handling
maple_tree: fix mas_skip_node() end slot detection
maple_tree: be more cautious about dead nodes
maple_tree: refine ma_state init from mas_start()
maple_tree: detect dead nodes in mas_start()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: add smp_rmb() to dead node detection
maple_tree: add RCU lock checking to rcu callback functions
mm: enable maple tree RCU mode by default.
include/linux/mm_types.h | 3 +-
kernel/fork.c | 3 +
lib/maple_tree.c | 389 ++++++++++++++++++++-----------
mm/mmap.c | 3 +-
tools/testing/radix-tree/maple.c | 18 +-
5 files changed, 263 insertions(+), 153 deletions(-)
--
2.39.2
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 7c7b962938ddda6a9cd095de557ee5250706ea88
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041153-unlikable-steam-cf2b@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
7c7b962938dd ("mm: take a page reference when removing device exclusive entries")
7d4a8be0c4b2 ("mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c7b962938ddda6a9cd095de557ee5250706ea88 Mon Sep 17 00:00:00 2001
From: Alistair Popple <apopple(a)nvidia.com>
Date: Thu, 30 Mar 2023 12:25:19 +1100
Subject: [PATCH] mm: take a page reference when removing device exclusive
entries
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index f456f3b5049c..01a23ad48a04 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3563,8 +3563,21 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a reference to lock the folio because we don't hold
+ * the PTL so a racing thread can remove the device-exclusive
+ * entry and unmap it. If the folio is free the entry must
+ * have been removed already. If it happens to have already
+ * been re-allocated after being freed all we do is lock and
+ * unlock it.
+ */
+ if (!folio_try_get(folio))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ folio_put(folio);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3577,6 +3590,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ folio_put(folio);
mmu_notifier_invalidate_range_end(&range);
return 0;
This is a note to let you know that I've just added the patch titled
iio: addac: stx104: Fix race condition when converting
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 4f9b80aefb9e2f542a49d9ec087cf5919730e1dd Mon Sep 17 00:00:00 2001
From: William Breathitt Gray <william.gray(a)linaro.org>
Date: Thu, 6 Apr 2023 10:40:11 -0400
Subject: iio: addac: stx104: Fix race condition when converting
analog-to-digital
The ADC conversion procedure requires several device I/O operations
performed in a particular sequence. If stx104_read_raw() is called
concurrently, the ADC conversion procedure could be clobbered. Prevent
such a race condition by utilizing a mutex.
Fixes: 4075a283ae83 ("iio: stx104: Add IIO support for the ADC channels")
Signed-off-by: William Breathitt Gray <william.gray(a)linaro.org>
Link: https://lore.kernel.org/r/2ae5e40eed5006ca735e4c12181a9ff5ced65547.16807905…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/addac/stx104.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/iio/addac/stx104.c b/drivers/iio/addac/stx104.c
index 4239aafe42fc..8730b79e921c 100644
--- a/drivers/iio/addac/stx104.c
+++ b/drivers/iio/addac/stx104.c
@@ -117,6 +117,8 @@ static int stx104_read_raw(struct iio_dev *indio_dev,
return IIO_VAL_INT;
}
+ mutex_lock(&priv->lock);
+
/* select ADC channel */
iowrite8(chan->channel | (chan->channel << 4), ®->achan);
@@ -127,6 +129,8 @@ static int stx104_read_raw(struct iio_dev *indio_dev,
while (ioread8(®->cir_asr) & BIT(7));
*val = ioread16(®->ssr_ad);
+
+ mutex_unlock(&priv->lock);
return IIO_VAL_INT;
case IIO_CHAN_INFO_OFFSET:
/* get ADC bipolar/unipolar configuration */
--
2.40.0
This is a note to let you know that I've just added the patch titled
iio: addac: stx104: Fix race condition for stx104_write_raw()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 9740827468cea80c42db29e7171a50e99acf7328 Mon Sep 17 00:00:00 2001
From: William Breathitt Gray <william.gray(a)linaro.org>
Date: Thu, 6 Apr 2023 10:40:10 -0400
Subject: iio: addac: stx104: Fix race condition for stx104_write_raw()
The priv->chan_out_states array and actual DAC value can become
mismatched if stx104_write_raw() is called concurrently. Prevent such a
race condition by utilizing a mutex.
Fixes: 97a445dad37a ("iio: Add IIO support for the DAC on the Apex Embedded Systems STX104")
Signed-off-by: William Breathitt Gray <william.gray(a)linaro.org>
Link: https://lore.kernel.org/r/c95c9a77fcef36b2a052282146950f23bbc1ebdc.16807905…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/addac/stx104.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/iio/addac/stx104.c b/drivers/iio/addac/stx104.c
index e45b70aa5bb7..4239aafe42fc 100644
--- a/drivers/iio/addac/stx104.c
+++ b/drivers/iio/addac/stx104.c
@@ -15,6 +15,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
+#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/types.h>
@@ -69,10 +70,12 @@ struct stx104_reg {
/**
* struct stx104_iio - IIO device private data structure
+ * @lock: synchronization lock to prevent I/O race conditions
* @chan_out_states: channels' output states
* @reg: I/O address offset for the device registers
*/
struct stx104_iio {
+ struct mutex lock;
unsigned int chan_out_states[STX104_NUM_OUT_CHAN];
struct stx104_reg __iomem *reg;
};
@@ -178,9 +181,12 @@ static int stx104_write_raw(struct iio_dev *indio_dev,
if ((unsigned int)val > 65535)
return -EINVAL;
+ mutex_lock(&priv->lock);
+
priv->chan_out_states[chan->channel] = val;
iowrite16(val, &priv->reg->dac[chan->channel]);
+ mutex_unlock(&priv->lock);
return 0;
}
return -EINVAL;
@@ -351,6 +357,8 @@ static int stx104_probe(struct device *dev, unsigned int id)
indio_dev->name = dev_name(dev);
+ mutex_init(&priv->lock);
+
/* configure device for software trigger operation */
iowrite8(0, &priv->reg->acr);
--
2.40.0
From: Paolo Bonzini <pbonzini(a)redhat.com>
commit 112e66017bff7f2837030f34c2bc19501e9212d5 upstream.
The effective values of the guest CR0 and CR4 registers may differ from
those included in the VMCS12. In particular, disabling EPT forces
CR4.PAE=1 and disabling unrestricted guest mode forces CR0.PG=CR0.PE=1.
Therefore, checks on these bits cannot be delegated to the processor
and must be performed by KVM.
Reported-by: Reima ISHII <ishiir(a)g.ecc.u-tokyo.ac.jp>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[OP: drop CC() macro calls, as tracing is not implemented in 4.19]
[OP: adjust "return -EINVAL" -> "return 1" to match current return logic]
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
arch/x86/kvm/vmx/vmx.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ec821a5d131a..265e70b0eb79 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -12752,7 +12752,7 @@ static int nested_vmx_check_vmcs_link_ptr(struct kvm_vcpu *vcpu,
static int check_vmentry_postreqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
u32 *exit_qual)
{
- bool ia32e;
+ bool ia32e = !!(vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE);
*exit_qual = ENTRY_FAIL_DEFAULT;
@@ -12765,6 +12765,13 @@ static int check_vmentry_postreqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
return 1;
}
+ if ((vmcs12->guest_cr0 & (X86_CR0_PG | X86_CR0_PE)) == X86_CR0_PG)
+ return 1;
+
+ if ((ia32e && !(vmcs12->guest_cr4 & X86_CR4_PAE)) ||
+ (ia32e && !(vmcs12->guest_cr0 & X86_CR0_PG)))
+ return 1;
+
/*
* If the load IA32_EFER VM-entry control is 1, the following checks
* are performed on the field for the IA32_EFER MSR:
@@ -12776,7 +12783,6 @@ static int check_vmentry_postreqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
*/
if (to_vmx(vcpu)->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)) {
- ia32e = (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) != 0;
if (!kvm_valid_efer(vcpu, vmcs12->guest_ia32_efer) ||
ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
((vmcs12->guest_cr0 & X86_CR0_PG) &&
--
2.39.1
As per the RZ/G2L users hardware manual (Rev.1.20 Sep, 2022), section
23.3.7 Serial Data Transmission (Asynchronous Mode) it is mentioned
that the TE (transmit enable) must be set after setting TIE (transmit
interrupt enable) or these 2 bits are set to 1 simultaneously by a
single instruction. So set these 2 bits in single instruction.
Fixes: 93bcefd4c6ba ("serial: sh-sci: Fix setting SCSCR_TIE while transferring data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Biju Das <biju.das.jz(a)bp.renesas.com>
---
v3:
* New patch moved here from Renesas SCI fixes patch series
* Updated commit description
* Moved handling of clearing TE bit as separate patch#5.
---
drivers/tty/serial/sh-sci.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 15743c2f3d3d..32f5c1f7d697 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -601,6 +601,15 @@ static void sci_start_tx(struct uart_port *port)
port->type == PORT_SCIFA || port->type == PORT_SCIFB) {
/* Set TIE (Transmit Interrupt Enable) bit in SCSCR */
ctrl = serial_port_in(port, SCSCR);
+
+ /*
+ * For SCI, TE (transmit enable) must be set after setting TIE
+ * (transmit interrupt enable) or in the same instruction to start
+ * the transmit process.
+ */
+ if (port->type == PORT_SCI)
+ ctrl |= SCSCR_TE;
+
serial_port_out(port, SCSCR, ctrl | SCSCR_TIE);
}
}
@@ -2600,8 +2609,14 @@ static void sci_set_termios(struct uart_port *port, struct ktermios *termios,
sci_set_mctrl(port, port->mctrl);
}
- scr_val |= SCSCR_RE | SCSCR_TE |
- (s->cfg->scscr & ~(SCSCR_CKE1 | SCSCR_CKE0));
+ /*
+ * For SCI, TE (transmit enable) must be set after setting TIE
+ * (transmit interrupt enable) or in the same instruction to
+ * start the transmitting process. So skip setting TE here for SCI.
+ */
+ if (port->type != PORT_SCI)
+ scr_val |= SCSCR_TE;
+ scr_val |= SCSCR_RE | (s->cfg->scscr & ~(SCSCR_CKE1 | SCSCR_CKE0));
serial_port_out(port, SCSCR, scr_val | s->hscif_tot);
if ((srr + 1 == 5) &&
(port->type == PORT_SCIFA || port->type == PORT_SCIFB)) {
--
2.25.1
Hi Dave,
Please pull this branch with changes for xfs.
As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts. Please let me know if you
encounter any problems.
--D
The following changes since commit 7ba83850ca2691865713b307ed001bde5fddb084:
xfs: deprecate the ascii-ci feature (2023-04-11 19:05:19 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/fix-bugs-6.4_2023-04-11
for you to fetch changes up to 4fe9cd8a34a1f934e5f936d4245e19300b52d440:
xfs: fix BUG_ON in xfs_getbmap() (2023-04-11 19:48:08 -0700)
----------------------------------------------------------------
xfs: rollup of various bug fixes
This is a rollup of miscellaneous bug fixes.
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
----------------------------------------------------------------
Darrick J. Wong (2):
xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done
xfs: verify buffer contents when we skip log replay
Dave Chinner (2):
xfs: remove WARN when dquot cache insertion fails
xfs: don't consider future format versions valid
Ye Bin (1):
xfs: fix BUG_ON in xfs_getbmap()
fs/xfs/libxfs/xfs_bmap.c | 6 ++++++
fs/xfs/libxfs/xfs_inode_fork.c | 16 +++++++++++++++-
fs/xfs/libxfs/xfs_inode_fork.h | 6 ++++--
fs/xfs/libxfs/xfs_sb.c | 11 ++++++-----
fs/xfs/xfs_bmap_util.c | 14 ++++++--------
fs/xfs/xfs_buf_item_recover.c | 10 ++++++++++
fs/xfs/xfs_dquot.c | 1 -
7 files changed, 47 insertions(+), 17 deletions(-)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7c7b962938ddda6a9cd095de557ee5250706ea88
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041157-hyperlink-prognosis-e6db@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7c7b962938dd ("mm: take a page reference when removing device exclusive entries")
7d4a8be0c4b2 ("mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export")
369258ce41c6 ("hugetlb: remove duplicate mmu notifications")
f268f6cf875f ("mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths")
2ba99c5e0881 ("mm/khugepaged: fix GUP-fast interaction by sending IPI")
8d3c106e19e8 ("mm/khugepaged: take the right locks for page table retraction")
21b85b09527c ("madvise: use zap_page_range_single for madvise dontneed")
131a79b474e9 ("hugetlb: fix vma lock handling during split vma and range unmapping")
34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
58ac9a8993a1 ("mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds")
780a4b6fb865 ("mm/khugepaged: check compound_order() in collapse_pte_mapped_thp()")
40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
378397ccb8e5 ("hugetlb: create hugetlb_unmap_file_folio to unmap single file folio")
8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
12710fd69634 ("hugetlb: rename vma_shareable() and refactor code")
c86272287bc6 ("hugetlb: create remove_inode_single_folio to remove single file folio")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
3a47c54f09c4 ("hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization")
188a39725ad7 ("hugetlbfs: revert use i_mmap_rwsem to address page fault/truncate race")
19672a9e4a75 ("mm: convert lock_page_or_retry() to folio_lock_or_retry()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c7b962938ddda6a9cd095de557ee5250706ea88 Mon Sep 17 00:00:00 2001
From: Alistair Popple <apopple(a)nvidia.com>
Date: Thu, 30 Mar 2023 12:25:19 +1100
Subject: [PATCH] mm: take a page reference when removing device exclusive
entries
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index f456f3b5049c..01a23ad48a04 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3563,8 +3563,21 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a reference to lock the folio because we don't hold
+ * the PTL so a racing thread can remove the device-exclusive
+ * entry and unmap it. If the folio is free the entry must
+ * have been removed already. If it happens to have already
+ * been re-allocated after being freed all we do is lock and
+ * unlock it.
+ */
+ if (!folio_try_get(folio))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ folio_put(folio);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3577,6 +3590,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ folio_put(folio);
mmu_notifier_invalidate_range_end(&range);
return 0;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7c7b962938ddda6a9cd095de557ee5250706ea88
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041155-spiny-taking-d67f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7c7b962938dd ("mm: take a page reference when removing device exclusive entries")
7d4a8be0c4b2 ("mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export")
369258ce41c6 ("hugetlb: remove duplicate mmu notifications")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c7b962938ddda6a9cd095de557ee5250706ea88 Mon Sep 17 00:00:00 2001
From: Alistair Popple <apopple(a)nvidia.com>
Date: Thu, 30 Mar 2023 12:25:19 +1100
Subject: [PATCH] mm: take a page reference when removing device exclusive
entries
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index f456f3b5049c..01a23ad48a04 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3563,8 +3563,21 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a reference to lock the folio because we don't hold
+ * the PTL so a racing thread can remove the device-exclusive
+ * entry and unmap it. If the folio is free the entry must
+ * have been removed already. If it happens to have already
+ * been re-allocated after being freed all we do is lock and
+ * unlock it.
+ */
+ if (!folio_try_get(folio))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ folio_put(folio);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3577,6 +3590,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ folio_put(folio);
mmu_notifier_invalidate_range_end(&range);
return 0;
The patch titled
Subject: writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
writeback-cgroup-fix-null-ptr-deref-write-in-bdi_split_work_to_wbs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Baokun Li <libaokun1(a)huawei.com>
Subject: writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
Date: Mon, 10 Apr 2023 21:08:26 +0800
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fs-writeback.c | 17 ++++++++++-------
mm/backing-dev.c | 12 ++++++++++--
2 files changed, 20 insertions(+), 9 deletions(-)
--- a/fs/fs-writeback.c~writeback-cgroup-fix-null-ptr-deref-write-in-bdi_split_work_to_wbs
+++ a/fs/fs-writeback.c
@@ -978,6 +978,16 @@ restart:
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ restart:
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
--- a/mm/backing-dev.c~writeback-cgroup-fix-null-ptr-deref-write-in-bdi_split_work_to_wbs
+++ a/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct w
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
_
Patches currently in -mm which might be from libaokun1(a)huawei.com are
writeback-cgroup-fix-null-ptr-deref-write-in-bdi_split_work_to_wbs.patch
There's a few reasons the kernel should not spam dmesg on bad
userspace ioctl input:
- at warning level it results in CI false positives
- it allows userspace to drown dmesg output, potentially hiding real
issues.
None of the other generic EINVAL checks report in the
FBIOPUT_VSCREENINFO ioctl do this, so it's also inconsistent.
I guess the intent of the patch which introduced this warning was that
the drivers ->fb_check_var routine should fail in that case. Reality
is that there's too many fbdev drivers and not enough people
maintaining them by far, and so over the past few years we've simply
handled all these validation gaps by tighning the checks in the core,
because that's realistically really all that will ever happen.
Reported-by: syzbot+20dcf81733d43ddff661(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=c5faf983bfa4a607de530cd3bb008888bf06ce…
Fixes: 6c11df58fd1a ("fbmem: Check virtual screen sizes in fb_set_var()")
Cc: Helge Deller <deller(a)gmx.de>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: stable(a)vger.kernel.org # v5.4+
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/video/fbdev/core/fbmem.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 875541ff185b..9757f4bcdf57 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -1021,10 +1021,6 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo *var)
/* verify that virtual resolution >= physical resolution */
if (var->xres_virtual < var->xres ||
var->yres_virtual < var->yres) {
- pr_warn("WARNING: fbcon: Driver '%s' missed to adjust virtual screen size (%ux%u vs. %ux%u)\n",
- info->fix.id,
- var->xres_virtual, var->yres_virtual,
- var->xres, var->yres);
return -EINVAL;
}
--
2.40.0
In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
node_count field of the new node, but the node may not be a new node.
It may be a node that existed before and node_count has a value, setting
it to 0 will cause a memory leak. At this time, mas->alloc->total will
be greater than the actual number of nodes in the linked list, which may
cause many other errors. For example, out-of-bounds access in mas_pop_node(),
and mas_pop_node() may return addresses that should not be used. Fix it
by initializing node_count only for new nodes.
Also, by the way, an if-else statement was removed to simplify the code.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
---
lib/maple_tree.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index dd1a114d9e2b..938634bea2d6 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1303,26 +1303,21 @@ static inline void mas_alloc_nodes(struct ma_state *mas, gfp_t gfp)
node = mas->alloc;
node->request_count = 0;
while (requested) {
- max_req = MAPLE_ALLOC_SLOTS;
- if (node->node_count) {
- unsigned int offset = node->node_count;
-
- slots = (void **)&node->slot[offset];
- max_req -= offset;
- } else {
- slots = (void **)&node->slot;
- }
-
+ max_req = MAPLE_ALLOC_SLOTS - node->node_count;
+ slots = (void **)&node->slot[node->node_count];
max_req = min(requested, max_req);
count = mt_alloc_bulk(gfp, max_req, slots);
if (!count)
goto nomem_bulk;
+ if (node->node_count == 0) {
+ node->slot[0]->node_count = 0;
+ node->slot[0]->request_count = 0;
+ }
+
node->node_count += count;
allocated += count;
node = node->slot[0];
- node->node_count = 0;
- node->request_count = 0;
requested -= count;
}
mas->alloc->total = allocated;
--
2.20.1
Commit 414428c5da1c ("PCI: hv: Lock PCI bus on device eject") added
pci_lock_rescan_remove() and pci_unlock_rescan_remove() in
create_root_hv_pci_bus() and in hv_eject_device_work() to address the
race between create_root_hv_pci_bus() and hv_eject_device_work(), but it
turns that grubing the pci_rescan_remove_lock mutex is not enough:
refer to the earlier fix "PCI: hv: Add a per-bus mutex state_lock".
Now with hbus->state_lock and other fixes, the race is resolved, so
remove pci_{lock,unlock}_rescan_remove() in create_root_hv_pci_bus():
this removes the serialization in hv_pci_probe() and hence allows
async-probing (PROBE_PREFER_ASYNCHRONOUS) to work.
Add the async-probing flag to hv_pci_drv.
pci_{lock,unlock}_rescan_remove() in hv_eject_device_work() and in
hv_pci_remove() are still kept: according to the comment before
drivers/pci/probe.c: static DEFINE_MUTEX(pci_rescan_remove_lock),
"PCI device removal routines should always be executed under this mutex".
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
No change to the patch body.
Improved the commit message [Michael Kelley]
Added Cc:stable
drivers/pci/controller/pci-hyperv.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 3ae2f99dea8c2..2ea2b1b8a4c9a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2312,12 +2312,16 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus)
if (error)
return error;
- pci_lock_rescan_remove();
+ /*
+ * pci_lock_rescan_remove() and pci_unlock_rescan_remove() are
+ * unnecessary here, because we hold the hbus->state_lock, meaning
+ * hv_eject_device_work() and pci_devices_present_work() can't race
+ * with create_root_hv_pci_bus().
+ */
hv_pci_assign_numa_node(hbus);
pci_bus_assign_resources(bridge->bus);
hv_pci_assign_slots(hbus);
pci_bus_add_devices(bridge->bus);
- pci_unlock_rescan_remove();
hbus->state = hv_pcibus_installed;
return 0;
}
@@ -4003,6 +4007,9 @@ static struct hv_driver hv_pci_drv = {
.remove = hv_pci_remove,
.suspend = hv_pci_suspend,
.resume = hv_pci_resume,
+ .driver = {
+ .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ },
};
static void __exit exit_hv_pci_drv(void)
--
2.25.1
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: ov8856: Do not check for for module version
Author: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Thu Mar 23 23:44:20 2023 +0100
It the device is probed in non-zero ACPI D state, the module
identification is delayed until the first streamon.
The module identification has two parts: deviceID and version. To rea
the version we have to enable OTP read. This cannot be done during
streamon, becase it modifies REG_MODE_SELECT.
Since the driver has the same behaviour for all the module versions, do
not read the module version from the sensor's OTP.
Cc: stable(a)vger.kernel.org
Fixes: 0e014f1a8d54 ("media: ov8856: support device probe in non-zero ACPI D state")
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/i2c/ov8856.c | 40 ----------------------------------------
1 file changed, 40 deletions(-)
---
diff --git a/drivers/media/i2c/ov8856.c b/drivers/media/i2c/ov8856.c
index cf8384e09413..b5c7881383ca 100644
--- a/drivers/media/i2c/ov8856.c
+++ b/drivers/media/i2c/ov8856.c
@@ -1709,46 +1709,6 @@ static int ov8856_identify_module(struct ov8856 *ov8856)
return -ENXIO;
}
- ret = ov8856_write_reg(ov8856, OV8856_REG_MODE_SELECT,
- OV8856_REG_VALUE_08BIT, OV8856_MODE_STREAMING);
- if (ret)
- return ret;
-
- ret = ov8856_write_reg(ov8856, OV8856_OTP_MODE_CTRL,
- OV8856_REG_VALUE_08BIT, OV8856_OTP_MODE_AUTO);
- if (ret) {
- dev_err(&client->dev, "failed to set otp mode");
- return ret;
- }
-
- ret = ov8856_write_reg(ov8856, OV8856_OTP_LOAD_CTRL,
- OV8856_REG_VALUE_08BIT,
- OV8856_OTP_LOAD_CTRL_ENABLE);
- if (ret) {
- dev_err(&client->dev, "failed to enable load control");
- return ret;
- }
-
- ret = ov8856_read_reg(ov8856, OV8856_MODULE_REVISION,
- OV8856_REG_VALUE_08BIT, &val);
- if (ret) {
- dev_err(&client->dev, "failed to read module revision");
- return ret;
- }
-
- dev_info(&client->dev, "OV8856 revision %x (%s) at address 0x%02x\n",
- val,
- val == OV8856_2A_MODULE ? "2A" :
- val == OV8856_1B_MODULE ? "1B" : "unknown revision",
- client->addr);
-
- ret = ov8856_write_reg(ov8856, OV8856_REG_MODE_SELECT,
- OV8856_REG_VALUE_08BIT, OV8856_MODE_STANDBY);
- if (ret) {
- dev_err(&client->dev, "failed to exit streaming mode");
- return ret;
- }
-
ov8856->identified = true;
return 0;
[CCing the stable list as well as Greg and Sasha so they can correct me
if I write something stupid]
On 06.04.23 10:27, Ricardo Cañuelo wrote:
>
> On 5/4/23 19:14, Thorsten Leemhuis wrote:
>> Wait, what? A patch (5225e1b87432 ("ARM: dts: meson: Fix the UART
>> compatible strings")) that was merged for v5.17-rc4 and is not in the
>> list of patches that were in 4.14.312-rc1
>> (https://lore.kernel.org/all/20230403140351.636471867@linuxfoundation.org/
>> ) is meant to suddenly cause this? How is this possible? Am I totally on
>> the wrong track here and misunderstanding something, or is this a
>> bisection that went horribly sideways?
>
> I didn't say this was introduced in 4.14.312-rc1, this has been failing
> for a long time and it was merged for 4.14.267:
> https://lwn.net/Articles/884977/
>
> Sorry I wasn't clear before.
Ahh, no worries and thx for this. But well, in that case let me get back
to something from your report:
>>> KernelCI detected that this patch introduced a regression in
>>> stable-rc/linux-4.14.y on a meson8b-odroidc1.
>>> After this patch was applied the tests running on this platform don't
>>> show any serial output.
>>>
>>> This doesn't happen in other stable branches nor in mainline, but 4.14
>>> hasn't still reached EOL and it'd be good to find a fix.
Well, the stable maintainers may correct me if I'm wrong, but as far as
I know in that case it's the duty of the stable team (which was not even
CCed on the report afaics) to look into this for two reasons:
* the regression does not happened in mainline (and maybe never has)
* mainline developers never signed up for maintaining their work in
longterm kernels; quite a few nevertheless help in situation like this,
at least for recent series and if they asked for a backport through a
"CC: <stable@" tag – but the latter doesn't seem to be the case here
(not totally sure, but it looks like AUTOSEL picked this up) and it's a
quite old series.
>>> #regzbot introduced: 5225e1b87432dcf0d0fc3440824b91d04c1d6cc1
Thx for getting regzbot involved, but due to your usage it now considers
this a mainline regression, as 5225e1b87432 is a mainline commit. As
this only happens in a particular stable tree, it should use a commit id
from there instead:
#regzbot introduced: 23dfa42a0a2a91d640ef3fce585194b970d8680c
(above line will make regzbot adjust this)
Ciao, Thorsten
my greeting to you my friend i sent you messages many times without response?
----------------------------------------------------------------------
mans sveiciens tev, mans draugs, es tev vairākas reizes nosūtīju ziņas
bez atbildes?
This is an oversight from dc5bdb68b5b3 ("drm/fb-helper: Fix vt
restore") - I failed to realize that nasty userspace could set this.
It's not pretty to mix up kernel-internal and userspace uapi flags
like this, but since the entire fb_var_screeninfo structure is uapi
we'd need to either add a new parameter to the ->fb_set_par callback
and fb_set_par() function, which has a _lot_ of users. Or some other
fairly ugly side-channel int fb_info. Neither is a pretty prospect.
Instead just correct the issue at hand by filtering out this
kernel-internal flag in the ioctl handling code.
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Fixes: dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore")
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: shlomo(a)fastmail.com
Cc: Michel Dänzer <michel(a)daenzer.net>
Cc: Noralf Trønnes <noralf(a)tronnes.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: David Airlie <airlied(a)linux.ie>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie(a)samsung.com>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Nathan Chancellor <natechancellor(a)gmail.com>
Cc: Qiujun Huang <hqjagain(a)gmail.com>
Cc: Peter Rosin <peda(a)axentia.se>
Cc: linux-fbdev(a)vger.kernel.org
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Geert Uytterhoeven <geert+renesas(a)glider.be>
Cc: Samuel Thibault <samuel.thibault(a)ens-lyon.org>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Shigeru Yoshida <syoshida(a)redhat.com>
---
drivers/video/fbdev/core/fbmem.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 875541ff185b..3fd95a79e4c3 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -1116,6 +1116,8 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd,
case FBIOPUT_VSCREENINFO:
if (copy_from_user(&var, argp, sizeof(var)))
return -EFAULT;
+ /* only for kernel-internal use */
+ var.activate &= ~FB_ACTIVATE_KD_TEXT;
console_lock();
lock_fb_info(info);
ret = fbcon_modechange_possible(info, &var);
--
2.40.0
UNICEF is looking for a Personal assistant to Justin McCarthy, Esq. who is currently in the Ukraine assisting the Justice for children Program. You will be responsible for handling Mr McCarthy's purchase of items such as computed learning materials, food items and shelter related items for UNICEF specific affairs in the City while he is away. You will be paid $59 per hour and your hours will be flexible with a minimum of 1 hours every week day remotely. Please submit your Resume to: justinmccarthy.esq(a)gmail.com to apply for this position.
I found a regression while updating from 6.2.9 to 6.2.10 (Arch Linux).
After upgrading to 6.2.10, my external monitors stopped working (no
input) when starting my display manager.
My hardware:
Lenovo T14s AMD gen 1
Lenovo USB-C Dock Gen 2 40AS (firmware up to date: 13.24)
2 monitors connected via dock and thus via an MST hub
Reverting commit d7b5638bd3374a47f0b038449118b12d8d6e391c fixes the issue.
Best regards,
Veronika
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0145462fc802cd447ef5d029758043c7f15b4b1e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041107-basically-gas-eb2c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
0145462fc802 ("can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0145462fc802cd447ef5d029758043c7f15b4b1e Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Date: Thu, 30 Mar 2023 19:02:48 +0200
Subject: [PATCH] can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get
SOCK_RXQ_OVFL infos
isotp.c was still using sock_recv_timestamp() which does not provide
control messages to detect dropped PDUs in the receive path.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 9bc344851704..47c2ebad10ed 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1120,7 +1120,7 @@ static int isotp_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
if (ret < 0)
goto out_err;
- sock_recv_timestamp(msg, sk, skb);
+ sock_recv_cmsgs(msg, sk, skb);
if (msg->msg_name) {
__sockaddr_check_size(ISOTP_MIN_NAMELEN);
From: Robert Foss <robert.foss(a)linaro.org>
[ Upstream commit 2a9df204be0bbb896e087f00b9ee3fc559d5a608 ]
This fixes PLL being unable to lock, and is derived from an equivalent
downstream commit.
Available LT9611 documentation does not list this register, neither does
LT9611UXC (which is a different chip).
This commit has been confirmed to fix HDMI output on DragonBoard 845c.
Suggested-by: Amit Pundir <amit.pundir(a)linaro.org>
Reviewed-by: Amit Pundir <amit.pundir(a)linaro.org>
Signed-off-by: Robert Foss <robert.foss(a)linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221213150304.4189760-1-robe…
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
---
To be cherry-picked on v5.10.y, v5.15.y and v6.1.y.
drivers/gpu/drm/bridge/lontium-lt9611.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/bridge/lontium-lt9611.c b/drivers/gpu/drm/bridge/lontium-lt9611.c
index 3b77238ca4af..ae8c6d9d4095 100644
--- a/drivers/gpu/drm/bridge/lontium-lt9611.c
+++ b/drivers/gpu/drm/bridge/lontium-lt9611.c
@@ -258,6 +258,7 @@ static int lt9611_pll_setup(struct lt9611 *lt9611, const struct drm_display_mode
{ 0x8126, 0x55 },
{ 0x8127, 0x66 },
{ 0x8128, 0x88 },
+ { 0x812a, 0x20 },
};
regmap_multi_reg_write(lt9611->regmap, reg_cfg, ARRAY_SIZE(reg_cfg));
--
2.25.1
Hi,
Some MST fixups that landed in 6.3-rc1 were CC to stable and successfully
applied to 6.2.y, but failed to apply to 6.1.y even though they are needed
there as well.
They fail to apply due to this commit missing in 6.1.y:
commit 8c7d980da9ba ("drm/nouveau/disp: move DP MST payload config method")
Backporting that is a rabbit hole of other work and makes this no longer
viable for stable, so instead hand modify
commit e761cc20946a ("drm/display/dp_mst: Handle old/new payload states in drm_dp_remove_payload()")
for the missing contextual changes in 6.1.y in nouveau.
I've tested that these work on a Rembrandt laptop connected to WD19TB
with two monitors connected.
Imre Deak (2):
drm/display/dp_mst: Handle old/new payload states in
drm_dp_remove_payload()
drm/i915/dp_mst: Fix payload removal during output disabling
.../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 2 +-
drivers/gpu/drm/display/drm_dp_mst_topology.c | 26 ++++++++++---------
drivers/gpu/drm/i915/display/intel_dp_mst.c | 14 +++++++---
drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +-
include/drm/display/drm_dp_mst_helper.h | 3 ++-
5 files changed, 28 insertions(+), 19 deletions(-)
--
2.34.1
[Public]
Hi,
Rajib recently found that GPU reset is failing on DCN 3.1.4 with 6.1.y.
It's because of some missing commits in 6.1.y that skip the appropriate functions to match the GPU's graphics architecture. Backporting these and GPU reset works again.
Can you please bring these back to 6.1.y and 6.2.y?
2a7798ea7390 ("drm/amdgpu: for S0ix, skip SDMA 5.x+ suspend/resume")
e11c775030c5 ("drm/amdgpu: skip psp suspend for IMU enabled ASICs mode2 reset")
Thanks,
[Public]
Hi,
There is a bug present in 6.2.y and 6.1.y where if a dock that supports MST is unplugged during suspend it's not possible to get MST working after resume. This is because the cleanup/error path doesn't actually clear the MST tree.
It's fixed in 6.3 with:
3f6752b4de41 ("drm/amd/display: Clear MST topology if it fails to resume")
Can this please be backported to 6.2.y and 6.1.y?
Thanks,
Please take 84aca0a7e039 ("blk-throttle: Fix that bps of child could
exceed bps limited in parent") for stable 6.1
Cherry pick claims that the patch applies cleanly to previous trees.
Don't believe it. This fixes a regression introduced in 5.18 abouts
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 3dd4432549415f3c65dd52d5c687629efbf4ece1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041140-childcare-profusely-9882@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
3dd443254941 ("mm: enable maple tree RCU mode by default")
3b9dbd5e91b1 ("kernel/fork: convert forking to using the vmi iterator")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3dd4432549415f3c65dd52d5c687629efbf4ece1 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:07 -0800
Subject: [PATCH] mm: enable maple tree RCU mode by default
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[Liam.Howlett(a)Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: syzbot+8d95422d3537159ca390(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0722859c3647..a57e6ae78e65 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -774,7 +774,8 @@ struct mm_struct {
unsigned long cpu_bitmap[];
};
-#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+ MT_FLAGS_USE_RCU)
extern struct mm_struct init_mm;
/* Pointer magic because the dynamic array size confuses some compilers. */
diff --git a/kernel/fork.c b/kernel/fork.c
index c0257cbee093..0c92f224c68c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
if (retval)
goto out;
+ mt_clear_in_rcu(vmi.mas.tree);
for_each_vma(old_vmi, mpnt) {
struct file *file;
@@ -700,6 +701,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
retval = arch_dup_mmap(oldmm, mm);
loop_out:
vma_iter_free(&vmi);
+ if (!retval)
+ mt_set_in_rcu(vmi.mas.tree);
out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
diff --git a/mm/mmap.c b/mm/mmap.c
index ad499f7b767f..ff68a67a2a7c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2277,7 +2277,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
int count = 0;
int error = -ENOMEM;
MA_STATE(mas_detach, &mt_detach, 0, 0);
- mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+ mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
mt_set_external_lock(&mt_detach, &mm->mmap_lock);
/*
@@ -3037,6 +3037,7 @@ void exit_mmap(struct mm_struct *mm)
*/
set_bit(MMF_OOM_SKIP, &mm->flags);
mmap_write_lock(mm);
+ mt_clear_in_rcu(&mm->mm_mt);
free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3dd4432549415f3c65dd52d5c687629efbf4ece1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041138-delay-translate-7f35@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
3dd443254941 ("mm: enable maple tree RCU mode by default")
3b9dbd5e91b1 ("kernel/fork: convert forking to using the vmi iterator")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3dd4432549415f3c65dd52d5c687629efbf4ece1 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:07 -0800
Subject: [PATCH] mm: enable maple tree RCU mode by default
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[Liam.Howlett(a)Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: syzbot+8d95422d3537159ca390(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0722859c3647..a57e6ae78e65 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -774,7 +774,8 @@ struct mm_struct {
unsigned long cpu_bitmap[];
};
-#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+ MT_FLAGS_USE_RCU)
extern struct mm_struct init_mm;
/* Pointer magic because the dynamic array size confuses some compilers. */
diff --git a/kernel/fork.c b/kernel/fork.c
index c0257cbee093..0c92f224c68c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
if (retval)
goto out;
+ mt_clear_in_rcu(vmi.mas.tree);
for_each_vma(old_vmi, mpnt) {
struct file *file;
@@ -700,6 +701,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
retval = arch_dup_mmap(oldmm, mm);
loop_out:
vma_iter_free(&vmi);
+ if (!retval)
+ mt_set_in_rcu(vmi.mas.tree);
out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
diff --git a/mm/mmap.c b/mm/mmap.c
index ad499f7b767f..ff68a67a2a7c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2277,7 +2277,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
int count = 0;
int error = -ENOMEM;
MA_STATE(mas_detach, &mt_detach, 0, 0);
- mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+ mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
mt_set_external_lock(&mt_detach, &mm->mmap_lock);
/*
@@ -3037,6 +3037,7 @@ void exit_mmap(struct mm_struct *mm)
*/
set_bit(MMF_OOM_SKIP, &mm->flags);
mmap_write_lock(mm);
+ mt_clear_in_rcu(&mm->mm_mt);
free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 0a2b18d948838e16912b3b627b504ab062b7d02a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041133-corroding-underdone-8e9f@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
0a2b18d94883 ("maple_tree: add smp_rmb() to dead node detection")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0a2b18d948838e16912b3b627b504ab062b7d02a Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:05 -0800
Subject: [PATCH] maple_tree: add smp_rmb() to dead node detection
Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.
The is necessary for RCU mode.
Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 5202d89ba56e..72c89eb03393 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -539,9 +539,11 @@ static inline struct maple_node *mte_parent(const struct maple_enode *enode)
*/
static inline bool ma_dead_node(const struct maple_node *node)
{
- struct maple_node *parent = (void *)((unsigned long)
- node->parent & ~MAPLE_NODE_MASK);
+ struct maple_node *parent;
+ /* Do not reorder reads from the node prior to the parent check */
+ smp_rmb();
+ parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
return (parent == node);
}
@@ -556,6 +558,8 @@ static inline bool mte_dead_node(const struct maple_enode *enode)
struct maple_node *parent, *node;
node = mte_to_node(enode);
+ /* Do not reorder reads from the node prior to the parent check */
+ smp_rmb();
parent = mte_parent(enode);
return (parent == node);
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0a2b18d948838e16912b3b627b504ab062b7d02a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041132-hundredth-delouse-b680@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0a2b18d94883 ("maple_tree: add smp_rmb() to dead node detection")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0a2b18d948838e16912b3b627b504ab062b7d02a Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:05 -0800
Subject: [PATCH] maple_tree: add smp_rmb() to dead node detection
Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.
The is necessary for RCU mode.
Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 5202d89ba56e..72c89eb03393 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -539,9 +539,11 @@ static inline struct maple_node *mte_parent(const struct maple_enode *enode)
*/
static inline bool ma_dead_node(const struct maple_node *node)
{
- struct maple_node *parent = (void *)((unsigned long)
- node->parent & ~MAPLE_NODE_MASK);
+ struct maple_node *parent;
+ /* Do not reorder reads from the node prior to the parent check */
+ smp_rmb();
+ parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
return (parent == node);
}
@@ -556,6 +558,8 @@ static inline bool mte_dead_node(const struct maple_enode *enode)
struct maple_node *parent, *node;
node = mte_to_node(enode);
+ /* Do not reorder reads from the node prior to the parent check */
+ smp_rmb();
parent = mte_parent(enode);
return (parent == node);
}
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 8372f4d83f96f35915106093cde4565836587123
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041120-outbid-gloomy-afbe@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
8372f4d83f96 ("maple_tree: remove extra smp_wmb() from mas_dead_leaves()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8372f4d83f96f35915106093cde4565836587123 Mon Sep 17 00:00:00 2001
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:03 -0800
Subject: [PATCH] maple_tree: remove extra smp_wmb() from mas_dead_leaves()
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed. This is an optimization for the RCU mode
of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 946acda29521..96d673e4ba5b 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5503,7 +5503,6 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
break;
mte_set_node_dead(entry);
- smp_wmb(); /* Needed for RCU */
node->type = type;
rcu_assign_pointer(slots[offset], node);
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8372f4d83f96f35915106093cde4565836587123
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041119-pacemaker-division-7192@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8372f4d83f96 ("maple_tree: remove extra smp_wmb() from mas_dead_leaves()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8372f4d83f96f35915106093cde4565836587123 Mon Sep 17 00:00:00 2001
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:03 -0800
Subject: [PATCH] maple_tree: remove extra smp_wmb() from mas_dead_leaves()
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed. This is an optimization for the RCU mode
of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 946acda29521..96d673e4ba5b 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5503,7 +5503,6 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
break;
mte_set_node_dead(entry);
- smp_wmb(); /* Needed for RCU */
node->type = type;
rcu_assign_pointer(slots[offset], node);
}
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x a7b92d59c885018cb7bb88539892278e4fd64b29
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041103-washcloth-overplay-32db@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
a7b92d59c885 ("maple_tree: detect dead nodes in mas_start()")
46b345848261 ("maple_tree: refine ma_state init from mas_start()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a7b92d59c885018cb7bb88539892278e4fd64b29 Mon Sep 17 00:00:00 2001
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:01 -0800
Subject: [PATCH] maple_tree: detect dead nodes in mas_start()
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode. Detect and restart the walk if
this is the case. This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 095b9cb1f4f1..3d53339656e1 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1360,12 +1360,16 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
mas->max = ULONG_MAX;
mas->depth = 0;
+retry:
root = mas_root(mas);
/* Tree with nodes */
if (likely(xa_is_node(root))) {
mas->depth = 1;
mas->node = mte_safe_root(root);
mas->offset = 0;
+ if (mte_dead_node(mas->node))
+ goto retry;
+
return NULL;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a7b92d59c885018cb7bb88539892278e4fd64b29
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041102-dust-ecosystem-f498@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a7b92d59c885 ("maple_tree: detect dead nodes in mas_start()")
46b345848261 ("maple_tree: refine ma_state init from mas_start()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a7b92d59c885018cb7bb88539892278e4fd64b29 Mon Sep 17 00:00:00 2001
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:01 -0800
Subject: [PATCH] maple_tree: detect dead nodes in mas_start()
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode. Detect and restart the walk if
this is the case. This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 095b9cb1f4f1..3d53339656e1 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1360,12 +1360,16 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
mas->max = ULONG_MAX;
mas->depth = 0;
+retry:
root = mas_root(mas);
/* Tree with nodes */
if (likely(xa_is_node(root))) {
mas->depth = 1;
mas->node = mte_safe_root(root);
mas->offset = 0;
+ if (mte_dead_node(mas->node))
+ goto retry;
+
return NULL;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 39d0bd86c499ecd6abae42a9b7112056c5560691
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041149-mashed-decompose-eca7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
39d0bd86c499 ("maple_tree: be more cautious about dead nodes")
65be6f058b0e ("maple_tree: fix potential rcu issue")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 39d0bd86c499ecd6abae42a9b7112056c5560691 Mon Sep 17 00:00:00 2001
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:00 -0800
Subject: [PATCH] maple_tree: be more cautious about dead nodes
Patch series "Fix VMA tree modification under mmap read lock".
Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree. The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.
This only happens for stack VMAs. We had thought this was safe as it only
modifies a single pivot in the tree. Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA. This causes us to delete the NULL entry between the two VMAs
and rewrite the node.
We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node. We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.
These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset. They
were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.
Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes. This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.
This patch (of 8):
ma_pivots() and ma_data_end() may be called with a dead node. Ensure to
that the node isn't dead before using the returned values.
This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Arjun Roy <arjunroy(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Chris Li <chriscli(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: freak07 <michalechner92(a)googlemail.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Laurent Dufour <ldufour(a)linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Minchan Kim <minchan(a)google.com>
Cc: Paul E. McKenney <paulmck(a)kernel.org>
Cc: Peter Oskolkov <posk(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Punit Agrawal <punit.agrawal(a)bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Soheil Hassas Yeganeh <soheil(a)google.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 9e2735cbc2b4..095b9cb1f4f1 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -544,6 +544,7 @@ static inline bool ma_dead_node(const struct maple_node *node)
return (parent == node);
}
+
/*
* mte_dead_node() - check if the @enode is dead.
* @enode: The encoded maple node
@@ -625,6 +626,8 @@ static inline unsigned int mas_alloc_req(const struct ma_state *mas)
* @node - the maple node
* @type - the node type
*
+ * In the event of a dead node, this array may be %NULL
+ *
* Return: A pointer to the maple node pivots
*/
static inline unsigned long *ma_pivots(struct maple_node *node,
@@ -1096,8 +1099,11 @@ static int mas_ascend(struct ma_state *mas)
a_type = mas_parent_enum(mas, p_enode);
a_node = mte_parent(p_enode);
a_slot = mte_parent_slot(p_enode);
- pivots = ma_pivots(a_node, a_type);
a_enode = mt_mk_node(a_node, a_type);
+ pivots = ma_pivots(a_node, a_type);
+
+ if (unlikely(ma_dead_node(a_node)))
+ return 1;
if (!set_min && a_slot) {
set_min = true;
@@ -1401,6 +1407,9 @@ static inline unsigned char ma_data_end(struct maple_node *node,
{
unsigned char offset;
+ if (!pivots)
+ return 0;
+
if (type == maple_arange_64)
return ma_meta_end(node, type);
@@ -1436,6 +1445,9 @@ static inline unsigned char mas_data_end(struct ma_state *mas)
return ma_meta_end(node, type);
pivots = ma_pivots(node, type);
+ if (unlikely(ma_dead_node(node)))
+ return 0;
+
offset = mt_pivots[type] - 1;
if (likely(!pivots[offset]))
return ma_meta_end(node, type);
@@ -4505,6 +4517,9 @@ static inline int mas_prev_node(struct ma_state *mas, unsigned long min)
node = mas_mn(mas);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
mas->max = pivots[offset];
if (offset)
mas->min = pivots[offset - 1] + 1;
@@ -4526,6 +4541,9 @@ static inline int mas_prev_node(struct ma_state *mas, unsigned long min)
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
offset = ma_data_end(node, mt, pivots, mas->max);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
if (offset)
mas->min = pivots[offset - 1] + 1;
@@ -4574,6 +4592,7 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
struct maple_enode *enode;
int level = 0;
unsigned char offset;
+ unsigned char node_end;
enum maple_type mt;
void __rcu **slots;
@@ -4597,7 +4616,11 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
node = mas_mn(mas);
mt = mte_node_type(mas->node);
pivots = ma_pivots(node, mt);
- } while (unlikely(offset == ma_data_end(node, mt, pivots, mas->max)));
+ node_end = ma_data_end(node, mt, pivots, mas->max);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
+ } while (unlikely(offset == node_end));
slots = ma_slots(node, mt);
pivot = mas_safe_pivot(mas, pivots, ++offset, mt);
@@ -4613,6 +4636,9 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
mt = mte_node_type(mas->node);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
offset = 0;
pivot = pivots[0];
}
@@ -4659,11 +4685,14 @@ static inline void *mas_next_nentry(struct ma_state *mas,
return NULL;
}
- pivots = ma_pivots(node, type);
slots = ma_slots(node, type);
- mas->index = mas_safe_min(mas, pivots, mas->offset);
+ pivots = ma_pivots(node, type);
count = ma_data_end(node, type, pivots, mas->max);
- if (ma_dead_node(node))
+ if (unlikely(ma_dead_node(node)))
+ return NULL;
+
+ mas->index = mas_safe_min(mas, pivots, mas->offset);
+ if (unlikely(ma_dead_node(node)))
return NULL;
if (mas->index > max)
@@ -4817,6 +4846,11 @@ static inline void *mas_prev_nentry(struct ma_state *mas, unsigned long limit,
slots = ma_slots(mn, mt);
pivots = ma_pivots(mn, mt);
+ if (unlikely(ma_dead_node(mn))) {
+ mas_rewalk(mas, index);
+ goto retry;
+ }
+
if (offset == mt_pivots[mt])
pivot = mas->max;
else
@@ -6617,11 +6651,11 @@ static inline void *mas_first_entry(struct ma_state *mas, struct maple_node *mn,
while (likely(!ma_is_leaf(mt))) {
MT_BUG_ON(mas->tree, mte_dead_node(mas->node));
slots = ma_slots(mn, mt);
- pivots = ma_pivots(mn, mt);
- max = pivots[0];
entry = mas_slot(mas, slots, 0);
+ pivots = ma_pivots(mn, mt);
if (unlikely(ma_dead_node(mn)))
return NULL;
+ max = pivots[0];
mas->node = entry;
mn = mas_mn(mas);
mt = mte_node_type(mas->node);
@@ -6641,13 +6675,13 @@ static inline void *mas_first_entry(struct ma_state *mas, struct maple_node *mn,
if (likely(entry))
return entry;
- pivots = ma_pivots(mn, mt);
- mas->index = pivots[0] + 1;
mas->offset = 1;
entry = mas_slot(mas, slots, 1);
+ pivots = ma_pivots(mn, mt);
if (unlikely(ma_dead_node(mn)))
return NULL;
+ mas->index = pivots[0] + 1;
if (mas->index > limit)
goto none;
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 39d0bd86c499ecd6abae42a9b7112056c5560691
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041146-spout-exterior-7270@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
39d0bd86c499 ("maple_tree: be more cautious about dead nodes")
65be6f058b0e ("maple_tree: fix potential rcu issue")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 39d0bd86c499ecd6abae42a9b7112056c5560691 Mon Sep 17 00:00:00 2001
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Date: Mon, 27 Feb 2023 09:36:00 -0800
Subject: [PATCH] maple_tree: be more cautious about dead nodes
Patch series "Fix VMA tree modification under mmap read lock".
Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree. The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.
This only happens for stack VMAs. We had thought this was safe as it only
modifies a single pivot in the tree. Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA. This causes us to delete the NULL entry between the two VMAs
and rewrite the node.
We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node. We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.
These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset. They
were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.
Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes. This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.
This patch (of 8):
ma_pivots() and ma_data_end() may be called with a dead node. Ensure to
that the node isn't dead before using the returned values.
This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Arjun Roy <arjunroy(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Chris Li <chriscli(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: freak07 <michalechner92(a)googlemail.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Laurent Dufour <ldufour(a)linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Minchan Kim <minchan(a)google.com>
Cc: Paul E. McKenney <paulmck(a)kernel.org>
Cc: Peter Oskolkov <posk(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Punit Agrawal <punit.agrawal(a)bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Soheil Hassas Yeganeh <soheil(a)google.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 9e2735cbc2b4..095b9cb1f4f1 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -544,6 +544,7 @@ static inline bool ma_dead_node(const struct maple_node *node)
return (parent == node);
}
+
/*
* mte_dead_node() - check if the @enode is dead.
* @enode: The encoded maple node
@@ -625,6 +626,8 @@ static inline unsigned int mas_alloc_req(const struct ma_state *mas)
* @node - the maple node
* @type - the node type
*
+ * In the event of a dead node, this array may be %NULL
+ *
* Return: A pointer to the maple node pivots
*/
static inline unsigned long *ma_pivots(struct maple_node *node,
@@ -1096,8 +1099,11 @@ static int mas_ascend(struct ma_state *mas)
a_type = mas_parent_enum(mas, p_enode);
a_node = mte_parent(p_enode);
a_slot = mte_parent_slot(p_enode);
- pivots = ma_pivots(a_node, a_type);
a_enode = mt_mk_node(a_node, a_type);
+ pivots = ma_pivots(a_node, a_type);
+
+ if (unlikely(ma_dead_node(a_node)))
+ return 1;
if (!set_min && a_slot) {
set_min = true;
@@ -1401,6 +1407,9 @@ static inline unsigned char ma_data_end(struct maple_node *node,
{
unsigned char offset;
+ if (!pivots)
+ return 0;
+
if (type == maple_arange_64)
return ma_meta_end(node, type);
@@ -1436,6 +1445,9 @@ static inline unsigned char mas_data_end(struct ma_state *mas)
return ma_meta_end(node, type);
pivots = ma_pivots(node, type);
+ if (unlikely(ma_dead_node(node)))
+ return 0;
+
offset = mt_pivots[type] - 1;
if (likely(!pivots[offset]))
return ma_meta_end(node, type);
@@ -4505,6 +4517,9 @@ static inline int mas_prev_node(struct ma_state *mas, unsigned long min)
node = mas_mn(mas);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
mas->max = pivots[offset];
if (offset)
mas->min = pivots[offset - 1] + 1;
@@ -4526,6 +4541,9 @@ static inline int mas_prev_node(struct ma_state *mas, unsigned long min)
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
offset = ma_data_end(node, mt, pivots, mas->max);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
if (offset)
mas->min = pivots[offset - 1] + 1;
@@ -4574,6 +4592,7 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
struct maple_enode *enode;
int level = 0;
unsigned char offset;
+ unsigned char node_end;
enum maple_type mt;
void __rcu **slots;
@@ -4597,7 +4616,11 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
node = mas_mn(mas);
mt = mte_node_type(mas->node);
pivots = ma_pivots(node, mt);
- } while (unlikely(offset == ma_data_end(node, mt, pivots, mas->max)));
+ node_end = ma_data_end(node, mt, pivots, mas->max);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
+ } while (unlikely(offset == node_end));
slots = ma_slots(node, mt);
pivot = mas_safe_pivot(mas, pivots, ++offset, mt);
@@ -4613,6 +4636,9 @@ static inline int mas_next_node(struct ma_state *mas, struct maple_node *node,
mt = mte_node_type(mas->node);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
offset = 0;
pivot = pivots[0];
}
@@ -4659,11 +4685,14 @@ static inline void *mas_next_nentry(struct ma_state *mas,
return NULL;
}
- pivots = ma_pivots(node, type);
slots = ma_slots(node, type);
- mas->index = mas_safe_min(mas, pivots, mas->offset);
+ pivots = ma_pivots(node, type);
count = ma_data_end(node, type, pivots, mas->max);
- if (ma_dead_node(node))
+ if (unlikely(ma_dead_node(node)))
+ return NULL;
+
+ mas->index = mas_safe_min(mas, pivots, mas->offset);
+ if (unlikely(ma_dead_node(node)))
return NULL;
if (mas->index > max)
@@ -4817,6 +4846,11 @@ static inline void *mas_prev_nentry(struct ma_state *mas, unsigned long limit,
slots = ma_slots(mn, mt);
pivots = ma_pivots(mn, mt);
+ if (unlikely(ma_dead_node(mn))) {
+ mas_rewalk(mas, index);
+ goto retry;
+ }
+
if (offset == mt_pivots[mt])
pivot = mas->max;
else
@@ -6617,11 +6651,11 @@ static inline void *mas_first_entry(struct ma_state *mas, struct maple_node *mn,
while (likely(!ma_is_leaf(mt))) {
MT_BUG_ON(mas->tree, mte_dead_node(mas->node));
slots = ma_slots(mn, mt);
- pivots = ma_pivots(mn, mt);
- max = pivots[0];
entry = mas_slot(mas, slots, 0);
+ pivots = ma_pivots(mn, mt);
if (unlikely(ma_dead_node(mn)))
return NULL;
+ max = pivots[0];
mas->node = entry;
mn = mas_mn(mas);
mt = mte_node_type(mas->node);
@@ -6641,13 +6675,13 @@ static inline void *mas_first_entry(struct ma_state *mas, struct maple_node *mn,
if (likely(entry))
return entry;
- pivots = ma_pivots(mn, mt);
- mas->index = pivots[0] + 1;
mas->offset = 1;
entry = mas_slot(mas, slots, 1);
+ pivots = ma_pivots(mn, mt);
if (unlikely(ma_dead_node(mn)))
return NULL;
+ mas->index = pivots[0] + 1;
if (mas->index > limit)
goto none;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x dc30c011469165d57af9adac5baff7d767d20e5c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041105-shakily-screen-fbb6@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
dc30c0114691 ("drm/i915: fix race condition UAF in i915_perf_add_config_ioctl")
2fec539112e8 ("i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call")
046d1660daee ("drm/i915/gem: Return an error ptr from context_lookup")
a4839cb1137b ("drm/i915: Stop manually RCU banging in reset_stats_ioctl (v2)")
651e7d48577a ("drm/i915: replace IS_GEN and friends with GRAPHICS_VER")
ec2b1485a065 ("drm/i915/dmc: s/HAS_CSR/HAS_DMC")
c24760cf42c3 ("drm/i915/dmc: s/intel_csr/intel_dmc")
93e7e61eb448 ("drm/i915/display: rename display version macros")
4df9c1ae7a4b ("drm/i915: rename display.version to display.ver")
6c51f288b41f ("drm/i915: Don't use {skl, cnl}_hpd_pin() for bxt/glk")
0fe6637d9852 ("drm/i915: Restore lost glk ccs w/a")
87b8c3bc8d27 ("drm/i915: Restore lost glk FBC 16bpp w/a")
2446e1d6433b ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
9c0fed84d575 ("Merge tag 'drm-intel-next-2021-04-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dc30c011469165d57af9adac5baff7d767d20e5c Mon Sep 17 00:00:00 2001
From: Min Li <lm0963hack(a)gmail.com>
Date: Tue, 28 Mar 2023 17:36:27 +0800
Subject: [PATCH] drm/i915: fix race condition UAF in
i915_perf_add_config_ioctl
Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.
Signed-off-by: Min Li <lm0963hack(a)gmail.com>
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable(a)vger.kernel.org> # v4.14+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230328093627.5067-1-lm0963h…
[tursulin: Manually added stable tag.]
(cherry picked from commit 49f6f6483b652108bcb73accd0204a464b922395)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 283a4a3c6862..004074936300 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4638,13 +4638,13 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
err = oa_config->id;
goto sysfs_err;
}
-
- mutex_unlock(&perf->metrics_lock);
+ id = oa_config->id;
drm_dbg(&perf->i915->drm,
"Added config %s id=%i\n", oa_config->uuid, oa_config->id);
+ mutex_unlock(&perf->metrics_lock);
- return oa_config->id;
+ return id;
sysfs_err:
mutex_unlock(&perf->metrics_lock);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x dc30c011469165d57af9adac5baff7d767d20e5c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041104-implant-passport-b83d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
dc30c0114691 ("drm/i915: fix race condition UAF in i915_perf_add_config_ioctl")
2fec539112e8 ("i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call")
046d1660daee ("drm/i915/gem: Return an error ptr from context_lookup")
a4839cb1137b ("drm/i915: Stop manually RCU banging in reset_stats_ioctl (v2)")
651e7d48577a ("drm/i915: replace IS_GEN and friends with GRAPHICS_VER")
ec2b1485a065 ("drm/i915/dmc: s/HAS_CSR/HAS_DMC")
c24760cf42c3 ("drm/i915/dmc: s/intel_csr/intel_dmc")
93e7e61eb448 ("drm/i915/display: rename display version macros")
4df9c1ae7a4b ("drm/i915: rename display.version to display.ver")
6c51f288b41f ("drm/i915: Don't use {skl, cnl}_hpd_pin() for bxt/glk")
0fe6637d9852 ("drm/i915: Restore lost glk ccs w/a")
87b8c3bc8d27 ("drm/i915: Restore lost glk FBC 16bpp w/a")
2446e1d6433b ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
9c0fed84d575 ("Merge tag 'drm-intel-next-2021-04-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dc30c011469165d57af9adac5baff7d767d20e5c Mon Sep 17 00:00:00 2001
From: Min Li <lm0963hack(a)gmail.com>
Date: Tue, 28 Mar 2023 17:36:27 +0800
Subject: [PATCH] drm/i915: fix race condition UAF in
i915_perf_add_config_ioctl
Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.
Signed-off-by: Min Li <lm0963hack(a)gmail.com>
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable(a)vger.kernel.org> # v4.14+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230328093627.5067-1-lm0963h…
[tursulin: Manually added stable tag.]
(cherry picked from commit 49f6f6483b652108bcb73accd0204a464b922395)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 283a4a3c6862..004074936300 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4638,13 +4638,13 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
err = oa_config->id;
goto sysfs_err;
}
-
- mutex_unlock(&perf->metrics_lock);
+ id = oa_config->id;
drm_dbg(&perf->i915->drm,
"Added config %s id=%i\n", oa_config->uuid, oa_config->id);
+ mutex_unlock(&perf->metrics_lock);
- return oa_config->id;
+ return id;
sysfs_err:
mutex_unlock(&perf->metrics_lock);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x dc30c011469165d57af9adac5baff7d767d20e5c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041103-fading-coexist-fbc0@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
dc30c0114691 ("drm/i915: fix race condition UAF in i915_perf_add_config_ioctl")
2fec539112e8 ("i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call")
046d1660daee ("drm/i915/gem: Return an error ptr from context_lookup")
a4839cb1137b ("drm/i915: Stop manually RCU banging in reset_stats_ioctl (v2)")
651e7d48577a ("drm/i915: replace IS_GEN and friends with GRAPHICS_VER")
ec2b1485a065 ("drm/i915/dmc: s/HAS_CSR/HAS_DMC")
c24760cf42c3 ("drm/i915/dmc: s/intel_csr/intel_dmc")
93e7e61eb448 ("drm/i915/display: rename display version macros")
4df9c1ae7a4b ("drm/i915: rename display.version to display.ver")
6c51f288b41f ("drm/i915: Don't use {skl, cnl}_hpd_pin() for bxt/glk")
0fe6637d9852 ("drm/i915: Restore lost glk ccs w/a")
87b8c3bc8d27 ("drm/i915: Restore lost glk FBC 16bpp w/a")
2446e1d6433b ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
9c0fed84d575 ("Merge tag 'drm-intel-next-2021-04-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dc30c011469165d57af9adac5baff7d767d20e5c Mon Sep 17 00:00:00 2001
From: Min Li <lm0963hack(a)gmail.com>
Date: Tue, 28 Mar 2023 17:36:27 +0800
Subject: [PATCH] drm/i915: fix race condition UAF in
i915_perf_add_config_ioctl
Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.
Signed-off-by: Min Li <lm0963hack(a)gmail.com>
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable(a)vger.kernel.org> # v4.14+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230328093627.5067-1-lm0963h…
[tursulin: Manually added stable tag.]
(cherry picked from commit 49f6f6483b652108bcb73accd0204a464b922395)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 283a4a3c6862..004074936300 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4638,13 +4638,13 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
err = oa_config->id;
goto sysfs_err;
}
-
- mutex_unlock(&perf->metrics_lock);
+ id = oa_config->id;
drm_dbg(&perf->i915->drm,
"Added config %s id=%i\n", oa_config->uuid, oa_config->id);
+ mutex_unlock(&perf->metrics_lock);
- return oa_config->id;
+ return id;
sysfs_err:
mutex_unlock(&perf->metrics_lock);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x dc30c011469165d57af9adac5baff7d767d20e5c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041102-expenses-unwoven-6355@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
dc30c0114691 ("drm/i915: fix race condition UAF in i915_perf_add_config_ioctl")
2fec539112e8 ("i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call")
046d1660daee ("drm/i915/gem: Return an error ptr from context_lookup")
a4839cb1137b ("drm/i915: Stop manually RCU banging in reset_stats_ioctl (v2)")
651e7d48577a ("drm/i915: replace IS_GEN and friends with GRAPHICS_VER")
ec2b1485a065 ("drm/i915/dmc: s/HAS_CSR/HAS_DMC")
c24760cf42c3 ("drm/i915/dmc: s/intel_csr/intel_dmc")
93e7e61eb448 ("drm/i915/display: rename display version macros")
4df9c1ae7a4b ("drm/i915: rename display.version to display.ver")
6c51f288b41f ("drm/i915: Don't use {skl, cnl}_hpd_pin() for bxt/glk")
0fe6637d9852 ("drm/i915: Restore lost glk ccs w/a")
87b8c3bc8d27 ("drm/i915: Restore lost glk FBC 16bpp w/a")
2446e1d6433b ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
9c0fed84d575 ("Merge tag 'drm-intel-next-2021-04-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dc30c011469165d57af9adac5baff7d767d20e5c Mon Sep 17 00:00:00 2001
From: Min Li <lm0963hack(a)gmail.com>
Date: Tue, 28 Mar 2023 17:36:27 +0800
Subject: [PATCH] drm/i915: fix race condition UAF in
i915_perf_add_config_ioctl
Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.
Signed-off-by: Min Li <lm0963hack(a)gmail.com>
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable(a)vger.kernel.org> # v4.14+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230328093627.5067-1-lm0963h…
[tursulin: Manually added stable tag.]
(cherry picked from commit 49f6f6483b652108bcb73accd0204a464b922395)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 283a4a3c6862..004074936300 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4638,13 +4638,13 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
err = oa_config->id;
goto sysfs_err;
}
-
- mutex_unlock(&perf->metrics_lock);
+ id = oa_config->id;
drm_dbg(&perf->i915->drm,
"Added config %s id=%i\n", oa_config->uuid, oa_config->id);
+ mutex_unlock(&perf->metrics_lock);
- return oa_config->id;
+ return id;
sysfs_err:
mutex_unlock(&perf->metrics_lock);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x dc30c011469165d57af9adac5baff7d767d20e5c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041101-slackness-maturing-0041@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
dc30c0114691 ("drm/i915: fix race condition UAF in i915_perf_add_config_ioctl")
2fec539112e8 ("i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dc30c011469165d57af9adac5baff7d767d20e5c Mon Sep 17 00:00:00 2001
From: Min Li <lm0963hack(a)gmail.com>
Date: Tue, 28 Mar 2023 17:36:27 +0800
Subject: [PATCH] drm/i915: fix race condition UAF in
i915_perf_add_config_ioctl
Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.
Signed-off-by: Min Li <lm0963hack(a)gmail.com>
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable(a)vger.kernel.org> # v4.14+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230328093627.5067-1-lm0963h…
[tursulin: Manually added stable tag.]
(cherry picked from commit 49f6f6483b652108bcb73accd0204a464b922395)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 283a4a3c6862..004074936300 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4638,13 +4638,13 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
err = oa_config->id;
goto sysfs_err;
}
-
- mutex_unlock(&perf->metrics_lock);
+ id = oa_config->id;
drm_dbg(&perf->i915->drm,
"Added config %s id=%i\n", oa_config->uuid, oa_config->id);
+ mutex_unlock(&perf->metrics_lock);
- return oa_config->id;
+ return id;
sysfs_err:
mutex_unlock(&perf->metrics_lock);
If qdisc_create_dflt() fails, it returns NULL. With CONFIG_NET_SCHED
enabled, the check qdisc != &noop_qdisc passes and qdisc will be passed
to qdisc_hash_add(), which dereferences it.
This assignment was present in the upstream commit
5891cd5ec46c2 ("net_sched: add __rcu annotation to netdev->qdisc") but
was missed in the backport 22d95b5449249 ("net_sched: add __rcu
annotation to netdev->qdisc"), perhaps due to merge conflicts.
dev->qdisc is &noop_qdisc by default and if qdisc_create_dflt() fails,
this assignment will make sure qdisc == &noop_qdisc and no NULL
dereference will take place.
This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
Fixes: 22d95b5449249 ("net_sched: add __rcu annotation to netdev->qdisc")
Signed-off-by: Pratyush Yadav <ptyadav(a)amazon.de>
---
As usual, this was found by our static code analysis bot. I have
compile-tested this patch and ran a simple boot test. Did not do any
testing specifically to hit this bug.
net/sched/sch_generic.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 1f055c21be4cf..4250f3cf30e72 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1116,6 +1116,7 @@ static void attach_default_qdiscs(struct net_device *dev)
qdisc->ops->attach(qdisc);
}
}
+ qdisc = rtnl_dereference(dev->qdisc);
#ifdef CONFIG_NET_SCHED
if (qdisc != &noop_qdisc)
--
2.39.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0145462fc802cd447ef5d029758043c7f15b4b1e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041108-careless-semifinal-6234@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
0145462fc802 ("can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0145462fc802cd447ef5d029758043c7f15b4b1e Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Date: Thu, 30 Mar 2023 19:02:48 +0200
Subject: [PATCH] can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get
SOCK_RXQ_OVFL infos
isotp.c was still using sock_recv_timestamp() which does not provide
control messages to detect dropped PDUs in the receive path.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 9bc344851704..47c2ebad10ed 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1120,7 +1120,7 @@ static int isotp_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
if (ret < 0)
goto out_err;
- sock_recv_timestamp(msg, sk, skb);
+ sock_recv_cmsgs(msg, sk, skb);
if (msg->msg_name) {
__sockaddr_check_size(ISOTP_MIN_NAMELEN);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 9d52727f8043cfda241ae96896628d92fa9c50bb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041141-footsore-nanny-d8c6@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d52727f8043cfda241ae96896628d92fa9c50bb Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 4 Apr 2023 22:21:14 -0400
Subject: [PATCH] tracing: Have tracing_snapshot_instance_cond() write errors
to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 9d52727f8043cfda241ae96896628d92fa9c50bb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041139-posted-attribute-6004@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d52727f8043cfda241ae96896628d92fa9c50bb Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 4 Apr 2023 22:21:14 -0400
Subject: [PATCH] tracing: Have tracing_snapshot_instance_cond() write errors
to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9d52727f8043cfda241ae96896628d92fa9c50bb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041138-snaking-richness-7cd0@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d52727f8043cfda241ae96896628d92fa9c50bb Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 4 Apr 2023 22:21:14 -0400
Subject: [PATCH] tracing: Have tracing_snapshot_instance_cond() write errors
to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9d52727f8043cfda241ae96896628d92fa9c50bb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041137-compound-rush-1a0a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d52727f8043cfda241ae96896628d92fa9c50bb Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 4 Apr 2023 22:21:14 -0400
Subject: [PATCH] tracing: Have tracing_snapshot_instance_cond() write errors
to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 9d52727f8043cfda241ae96896628d92fa9c50bb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041136-wham-same-24c8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d52727f8043cfda241ae96896628d92fa9c50bb Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 4 Apr 2023 22:21:14 -0400
Subject: [PATCH] tracing: Have tracing_snapshot_instance_cond() write errors
to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 9d52727f8043cfda241ae96896628d92fa9c50bb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041135-platypus-depletion-872e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d52727f8043cfda241ae96896628d92fa9c50bb Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 4 Apr 2023 22:21:14 -0400
Subject: [PATCH] tracing: Have tracing_snapshot_instance_cond() write errors
to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 9d52727f8043cfda241ae96896628d92fa9c50bb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041134-deploy-disparity-5f63@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d52727f8043cfda241ae96896628d92fa9c50bb Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 4 Apr 2023 22:21:14 -0400
Subject: [PATCH] tracing: Have tracing_snapshot_instance_cond() write errors
to the appropriate instance
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x e5c972c1fadacc858b6a564d056f177275238040
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041122-judiciary-maker-f29c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
e5c972c1fada ("KVM: SVM: Flush Hyper-V TLB when required")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e5c972c1fadacc858b6a564d056f177275238040 Mon Sep 17 00:00:00 2001
From: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Date: Fri, 24 Mar 2023 15:52:33 +0100
Subject: [PATCH] KVM: SVM: Flush Hyper-V TLB when required
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM
is running on top of Hyper-V and Hyper-V exposes support for it (which
is always). On AMD CPUs this enlightenment results in ASID invalidations
not flushing TLB entries derived from the NPT. To force the underlying
(L0) hypervisor to rebuild its shadow page tables, an explicit hypercall
is needed.
The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM
only added remote TLB flush hooks. This worked out fine for a while, as
sufficient remote TLB flushes where being issued in KVM to mask the
problem. Since v5.17, changes in the TDP code reduced the number of
flushes and the out-of-sync TLB prevents guests from booting
successfully.
Split svm_flush_tlb_current() into separate callbacks for the 3 cases
(guest/all/current), and issue the required Hyper-V hypercall when a
Hyper-V TLB flush is needed. The most important case where the TLB flush
was missing is when loading a new PGD, which is followed by what is now
svm_flush_tlb_current().
Cc: stable(a)vger.kernel.org # v5.17+
Fixes: 1e0c7d40758b ("KVM: SVM: hyper-v: Remote TLB flush for SVM")
Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.mic…
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Message-Id: <20230324145233.4585-1-jpiotrowski(a)linux.microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..6272dabec02d 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -12,6 +12,11 @@ int hv_remote_flush_tlb_with_range(struct kvm *kvm,
int hv_remote_flush_tlb(struct kvm *kvm);
void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
#else /* !CONFIG_HYPERV */
+static inline int hv_remote_flush_tlb(struct kvm *kvm)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
{
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 252e7f37e4e2..f25bc3cbb250 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3729,7 +3729,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
}
-static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3753,6 +3753,37 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
svm->current_vmcb->asid_generation--;
}
+static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+ hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
+
+ /*
+ * When running on Hyper-V with EnlightenedNptTlb enabled, explicitly
+ * flush the NPT mappings via hypercall as flushing the ASID only
+ * affects virtual to physical mappings, it does not invalidate guest
+ * physical to host physical mappings.
+ */
+ if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
+ hyperv_flush_guest_mapping(root_tdp);
+
+ svm_flush_tlb_asid(vcpu);
+}
+
+static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+ /*
+ * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
+ * flushes should be routed to hv_remote_flush_tlb() without requesting
+ * a "regular" remote flush. Reaching this point means either there's
+ * a KVM bug or a prior hv_remote_flush_tlb() call failed, both of
+ * which might be fatal to the guest. Yell, but try to recover.
+ */
+ if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
+ hv_remote_flush_tlb(vcpu->kvm);
+
+ svm_flush_tlb_asid(vcpu);
+}
+
static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -4745,10 +4776,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.set_rflags = svm_set_rflags,
.get_if_flag = svm_get_if_flag,
- .flush_tlb_all = svm_flush_tlb_current,
+ .flush_tlb_all = svm_flush_tlb_all,
.flush_tlb_current = svm_flush_tlb_current,
.flush_tlb_gva = svm_flush_tlb_gva,
- .flush_tlb_guest = svm_flush_tlb_current,
+ .flush_tlb_guest = svm_flush_tlb_asid,
.vcpu_pre_run = svm_vcpu_pre_run,
.vcpu_run = svm_vcpu_run,
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index cff838f15db5..786d46d73a8e 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -6,6 +6,8 @@
#ifndef __ARCH_X86_KVM_SVM_ONHYPERV_H__
#define __ARCH_X86_KVM_SVM_ONHYPERV_H__
+#include <asm/mshyperv.h>
+
#if IS_ENABLED(CONFIG_HYPERV)
#include "kvm_onhyperv.h"
@@ -15,6 +17,14 @@ static struct kvm_x86_ops svm_x86_ops;
int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
+static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
+{
+ struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
+
+ return ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB &&
+ !!hve->hv_enlightenments_control.enlightened_npt_tlb;
+}
+
static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
{
struct hv_vmcb_enlightenments *hve = &vmcb->control.hv_enlightenments;
@@ -80,6 +90,11 @@ static inline void svm_hv_update_vp_id(struct vmcb *vmcb, struct kvm_vcpu *vcpu)
}
#else
+static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
{
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 00f4bc5184c19cb33f468f1ea409d70d19f8f502
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041103-squeegee-calcium-abaa@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
00f4bc5184c1 ("counter: 104-quad-8: Fix Synapse action reported for Index signals")
aaec1a0f76ec ("counter: Internalize sysfs interface code")
ea434ff82649 ("counter: stm32-timer-cnt: Provide defines for slave mode selection")
05593a3fd103 ("counter: stm32-lptimer-cnt: Provide defines for clock polarities")
394a0150a064 ("counter: Rename counter_count_function to counter_function")
493b938a14ed ("counter: Rename counter_signal_value to counter_signal_level")
b11eed1554e8 ("counter: Return error code on invalid modes")
728246e8f726 ("counter: 104-quad-8: Return error when invalid mode during ceiling_write")
d0ce3d5cf77d ("counter: stm32-timer-cnt: Add const qualifier for actions_list array")
f83e6e59366b ("counter: stm32-lptimer-cnt: Add const qualifier for actions_list array")
0056a405c7ad ("counter: microchip-tcb-capture: Add const qualifier for actions_list array")
9b2574f61c49 ("counter: ftm-quaddec: Add const qualifier for actions_list array")
6a9eb0e31044 ("counter: 104-quad-8: Add const qualifier for actions_list array")
45af9ae84c60 ("counter: stm32-timer-cnt: Add const qualifier for functions_list array")
8a00fed665ad ("counter: stm32-lptimer-cnt: Add const qualifier for functions_list array")
7e0dcfcefeca ("counter: microchip-tcb-capture: Add const qualifier for functions_list array")
891b58b35fd6 ("counter: interrupt-cnt: Add const qualifier for functions_list array")
fca2534fddfa ("counter: 104-quad-8: Add const qualifier for functions_list array")
b711f687a1c1 ("counter: Add support for Intel Quadrature Encoder Peripheral")
9c15db92a8e5 ("Merge tag 'iio-for-5.13a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00f4bc5184c19cb33f468f1ea409d70d19f8f502 Mon Sep 17 00:00:00 2001
From: William Breathitt Gray <william.gray(a)linaro.org>
Date: Thu, 16 Mar 2023 16:34:26 -0400
Subject: [PATCH] counter: 104-quad-8: Fix Synapse action reported for Index
signals
Signal 16 and higher represent the device's Index lines. The
priv->preset_enable array holds the device configuration for these Index
lines. The preset_enable configuration is active low on the device, so
invert the conditional check in quad8_action_read() to properly handle
the logical state of preset_enable.
Fixes: f1d8a071d45b ("counter: 104-quad-8: Add Generic Counter interface support")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230316203426.224745-1-william.gray@linaro.org/
Signed-off-by: William Breathitt Gray <william.gray(a)linaro.org>
diff --git a/drivers/counter/104-quad-8.c b/drivers/counter/104-quad-8.c
index d59e4f34a680..d9cb937665cf 100644
--- a/drivers/counter/104-quad-8.c
+++ b/drivers/counter/104-quad-8.c
@@ -368,7 +368,7 @@ static int quad8_action_read(struct counter_device *counter,
/* Handle Index signals */
if (synapse->signal->id >= 16) {
- if (priv->preset_enable[count->id])
+ if (!priv->preset_enable[count->id])
*action = COUNTER_SYNAPSE_ACTION_RISING_EDGE;
else
*action = COUNTER_SYNAPSE_ACTION_NONE;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 00f4bc5184c19cb33f468f1ea409d70d19f8f502
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041101-politely-properly-fe73@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
00f4bc5184c1 ("counter: 104-quad-8: Fix Synapse action reported for Index signals")
aaec1a0f76ec ("counter: Internalize sysfs interface code")
ea434ff82649 ("counter: stm32-timer-cnt: Provide defines for slave mode selection")
05593a3fd103 ("counter: stm32-lptimer-cnt: Provide defines for clock polarities")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00f4bc5184c19cb33f468f1ea409d70d19f8f502 Mon Sep 17 00:00:00 2001
From: William Breathitt Gray <william.gray(a)linaro.org>
Date: Thu, 16 Mar 2023 16:34:26 -0400
Subject: [PATCH] counter: 104-quad-8: Fix Synapse action reported for Index
signals
Signal 16 and higher represent the device's Index lines. The
priv->preset_enable array holds the device configuration for these Index
lines. The preset_enable configuration is active low on the device, so
invert the conditional check in quad8_action_read() to properly handle
the logical state of preset_enable.
Fixes: f1d8a071d45b ("counter: 104-quad-8: Add Generic Counter interface support")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230316203426.224745-1-william.gray@linaro.org/
Signed-off-by: William Breathitt Gray <william.gray(a)linaro.org>
diff --git a/drivers/counter/104-quad-8.c b/drivers/counter/104-quad-8.c
index d59e4f34a680..d9cb937665cf 100644
--- a/drivers/counter/104-quad-8.c
+++ b/drivers/counter/104-quad-8.c
@@ -368,7 +368,7 @@ static int quad8_action_read(struct counter_device *counter,
/* Handle Index signals */
if (synapse->signal->id >= 16) {
- if (priv->preset_enable[count->id])
+ if (!priv->preset_enable[count->id])
*action = COUNTER_SYNAPSE_ACTION_RISING_EDGE;
else
*action = COUNTER_SYNAPSE_ACTION_NONE;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 00f4bc5184c19cb33f468f1ea409d70d19f8f502
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041102-fleshy-condiment-a713@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
00f4bc5184c1 ("counter: 104-quad-8: Fix Synapse action reported for Index signals")
aaec1a0f76ec ("counter: Internalize sysfs interface code")
ea434ff82649 ("counter: stm32-timer-cnt: Provide defines for slave mode selection")
05593a3fd103 ("counter: stm32-lptimer-cnt: Provide defines for clock polarities")
394a0150a064 ("counter: Rename counter_count_function to counter_function")
493b938a14ed ("counter: Rename counter_signal_value to counter_signal_level")
b11eed1554e8 ("counter: Return error code on invalid modes")
728246e8f726 ("counter: 104-quad-8: Return error when invalid mode during ceiling_write")
d0ce3d5cf77d ("counter: stm32-timer-cnt: Add const qualifier for actions_list array")
f83e6e59366b ("counter: stm32-lptimer-cnt: Add const qualifier for actions_list array")
0056a405c7ad ("counter: microchip-tcb-capture: Add const qualifier for actions_list array")
9b2574f61c49 ("counter: ftm-quaddec: Add const qualifier for actions_list array")
6a9eb0e31044 ("counter: 104-quad-8: Add const qualifier for actions_list array")
45af9ae84c60 ("counter: stm32-timer-cnt: Add const qualifier for functions_list array")
8a00fed665ad ("counter: stm32-lptimer-cnt: Add const qualifier for functions_list array")
7e0dcfcefeca ("counter: microchip-tcb-capture: Add const qualifier for functions_list array")
891b58b35fd6 ("counter: interrupt-cnt: Add const qualifier for functions_list array")
fca2534fddfa ("counter: 104-quad-8: Add const qualifier for functions_list array")
b711f687a1c1 ("counter: Add support for Intel Quadrature Encoder Peripheral")
9c15db92a8e5 ("Merge tag 'iio-for-5.13a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00f4bc5184c19cb33f468f1ea409d70d19f8f502 Mon Sep 17 00:00:00 2001
From: William Breathitt Gray <william.gray(a)linaro.org>
Date: Thu, 16 Mar 2023 16:34:26 -0400
Subject: [PATCH] counter: 104-quad-8: Fix Synapse action reported for Index
signals
Signal 16 and higher represent the device's Index lines. The
priv->preset_enable array holds the device configuration for these Index
lines. The preset_enable configuration is active low on the device, so
invert the conditional check in quad8_action_read() to properly handle
the logical state of preset_enable.
Fixes: f1d8a071d45b ("counter: 104-quad-8: Add Generic Counter interface support")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230316203426.224745-1-william.gray@linaro.org/
Signed-off-by: William Breathitt Gray <william.gray(a)linaro.org>
diff --git a/drivers/counter/104-quad-8.c b/drivers/counter/104-quad-8.c
index d59e4f34a680..d9cb937665cf 100644
--- a/drivers/counter/104-quad-8.c
+++ b/drivers/counter/104-quad-8.c
@@ -368,7 +368,7 @@ static int quad8_action_read(struct counter_device *counter,
/* Handle Index signals */
if (synapse->signal->id >= 16) {
- if (priv->preset_enable[count->id])
+ if (!priv->preset_enable[count->id])
*action = COUNTER_SYNAPSE_ACTION_RISING_EDGE;
else
*action = COUNTER_SYNAPSE_ACTION_NONE;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1f2803b2660f4b04d48d065072c0ae0c9ca255fd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041153-figment-fanfare-e9c7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1f2803b2660f ("mm: kfence: fix handling discontiguous page")
3ee2d7471fa4 ("mm: kfence: fix PG_slab and memcg_data clearing")
8f0b36497303 ("mm: kfence: fix objcgs vector allocation")
b33f778bba5e ("kfence: alloc kfence_pool after system startup")
698361bca2d5 ("kfence: allow re-enabling KFENCE after system startup")
07e8481d3c38 ("kfence: always use static branches to guard kfence_alloc()")
08f6b10630f2 ("kfence: limit currently covered allocations when pool nearly full")
a9ab52bbcb52 ("kfence: move saving stack trace of allocations into __kfence_alloc()")
9a19aeb56650 ("kfence: count unexpectedly skipped allocations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1f2803b2660f4b04d48d065072c0ae0c9ca255fd Mon Sep 17 00:00:00 2001
From: Muchun Song <muchun.song(a)linux.dev>
Date: Thu, 23 Mar 2023 10:50:03 +0800
Subject: [PATCH] mm: kfence: fix handling discontiguous page
The struct pages could be discontiguous when the kfence pool is allocated
via alloc_contig_pages() with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP.
This may result in setting PG_slab and memcg_data to a arbitrary
address (may be not used as a struct page), which in the worst case
might corrupt the kernel.
So the iteration should use nth_page().
Link: https://lkml.kernel.org/r/20230323025003.94447-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: SeongJae Park <sjpark(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index d66092dd187c..1065e0568d05 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -556,7 +556,7 @@ static unsigned long kfence_init_pool(void)
* enters __slab_free() slow-path.
*/
for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
- struct slab *slab = page_slab(&pages[i]);
+ struct slab *slab = page_slab(nth_page(pages, i));
if (!i || (i % 2))
continue;
@@ -602,7 +602,7 @@ static unsigned long kfence_init_pool(void)
reset_slab:
for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
- struct slab *slab = page_slab(&pages[i]);
+ struct slab *slab = page_slab(nth_page(pages, i));
if (!i || (i % 2))
continue;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3ee2d7471fa4963a2ced0a84f0653ce88b43c5b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041141-negligent-sappy-dd84@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
3ee2d7471fa4 ("mm: kfence: fix PG_slab and memcg_data clearing")
8f0b36497303 ("mm: kfence: fix objcgs vector allocation")
b33f778bba5e ("kfence: alloc kfence_pool after system startup")
698361bca2d5 ("kfence: allow re-enabling KFENCE after system startup")
07e8481d3c38 ("kfence: always use static branches to guard kfence_alloc()")
08f6b10630f2 ("kfence: limit currently covered allocations when pool nearly full")
a9ab52bbcb52 ("kfence: move saving stack trace of allocations into __kfence_alloc()")
9a19aeb56650 ("kfence: count unexpectedly skipped allocations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ee2d7471fa4963a2ced0a84f0653ce88b43c5b2 Mon Sep 17 00:00:00 2001
From: Muchun Song <muchun.song(a)linux.dev>
Date: Mon, 20 Mar 2023 11:00:59 +0800
Subject: [PATCH] mm: kfence: fix PG_slab and memcg_data clearing
It does not reset PG_slab and memcg_data when KFENCE fails to initialize
kfence pool at runtime. It is reporting a "Bad page state" message when
kfence pool is freed to buddy. The checking of whether it is a compound
head page seems unnecessary since we already guarantee this when
allocating kfence pool. Remove the check to simplify the code.
Link: https://lkml.kernel.org/r/20230320030059.20189-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: SeongJae Park <sjpark(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 79c94ee55f97..d66092dd187c 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -561,10 +561,6 @@ static unsigned long kfence_init_pool(void)
if (!i || (i % 2))
continue;
- /* Verify we do not have a compound head page. */
- if (WARN_ON(compound_head(&pages[i]) != &pages[i]))
- return addr;
-
__folio_set_slab(slab_folio(slab));
#ifdef CONFIG_MEMCG
slab->memcg_data = (unsigned long)&kfence_metadata[i / 2 - 1].objcg |
@@ -597,12 +593,26 @@ static unsigned long kfence_init_pool(void)
/* Protect the right redzone. */
if (unlikely(!kfence_protect(addr + PAGE_SIZE)))
- return addr;
+ goto reset_slab;
addr += 2 * PAGE_SIZE;
}
return 0;
+
+reset_slab:
+ for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
+ struct slab *slab = page_slab(&pages[i]);
+
+ if (!i || (i % 2))
+ continue;
+#ifdef CONFIG_MEMCG
+ slab->memcg_data = 0;
+#endif
+ __folio_clear_slab(slab_folio(slab));
+ }
+
+ return addr;
}
static bool __init kfence_init_pool_early(void)
@@ -632,16 +642,6 @@ static bool __init kfence_init_pool_early(void)
* fails for the first page, and therefore expect addr==__kfence_pool in
* most failure cases.
*/
- for (char *p = (char *)addr; p < __kfence_pool + KFENCE_POOL_SIZE; p += PAGE_SIZE) {
- struct slab *slab = virt_to_slab(p);
-
- if (!slab)
- continue;
-#ifdef CONFIG_MEMCG
- slab->memcg_data = 0;
-#endif
- __folio_clear_slab(slab_folio(slab));
- }
memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool));
__kfence_pool = NULL;
return false;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e5c972c1fadacc858b6a564d056f177275238040
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041120-hesitant-rekindle-faac@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e5c972c1fada ("KVM: SVM: Flush Hyper-V TLB when required")
5be2226f417d ("KVM: x86: allow defining return-0 static calls")
abb6d479e226 ("KVM: x86: make several APIC virtualization callbacks optional")
dd2319c61888 ("KVM: x86: warn on incorrectly NULL members of kvm_x86_ops")
e4fc23bad813 ("KVM: x86: remove KVM_X86_OP_NULL and mark optional kvm_x86_ops")
8a2897853c53 ("KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC")
db6e7adf8de9 ("KVM: SVM: Rename AVIC helpers to use "avic" prefix instead of "svm"")
4e71cad31c62 ("Merge remote-tracking branch 'kvm/master' into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e5c972c1fadacc858b6a564d056f177275238040 Mon Sep 17 00:00:00 2001
From: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Date: Fri, 24 Mar 2023 15:52:33 +0100
Subject: [PATCH] KVM: SVM: Flush Hyper-V TLB when required
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM
is running on top of Hyper-V and Hyper-V exposes support for it (which
is always). On AMD CPUs this enlightenment results in ASID invalidations
not flushing TLB entries derived from the NPT. To force the underlying
(L0) hypervisor to rebuild its shadow page tables, an explicit hypercall
is needed.
The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM
only added remote TLB flush hooks. This worked out fine for a while, as
sufficient remote TLB flushes where being issued in KVM to mask the
problem. Since v5.17, changes in the TDP code reduced the number of
flushes and the out-of-sync TLB prevents guests from booting
successfully.
Split svm_flush_tlb_current() into separate callbacks for the 3 cases
(guest/all/current), and issue the required Hyper-V hypercall when a
Hyper-V TLB flush is needed. The most important case where the TLB flush
was missing is when loading a new PGD, which is followed by what is now
svm_flush_tlb_current().
Cc: stable(a)vger.kernel.org # v5.17+
Fixes: 1e0c7d40758b ("KVM: SVM: hyper-v: Remote TLB flush for SVM")
Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.mic…
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Message-Id: <20230324145233.4585-1-jpiotrowski(a)linux.microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..6272dabec02d 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -12,6 +12,11 @@ int hv_remote_flush_tlb_with_range(struct kvm *kvm,
int hv_remote_flush_tlb(struct kvm *kvm);
void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
#else /* !CONFIG_HYPERV */
+static inline int hv_remote_flush_tlb(struct kvm *kvm)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
{
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 252e7f37e4e2..f25bc3cbb250 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3729,7 +3729,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
}
-static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3753,6 +3753,37 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
svm->current_vmcb->asid_generation--;
}
+static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+ hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
+
+ /*
+ * When running on Hyper-V with EnlightenedNptTlb enabled, explicitly
+ * flush the NPT mappings via hypercall as flushing the ASID only
+ * affects virtual to physical mappings, it does not invalidate guest
+ * physical to host physical mappings.
+ */
+ if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
+ hyperv_flush_guest_mapping(root_tdp);
+
+ svm_flush_tlb_asid(vcpu);
+}
+
+static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+ /*
+ * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
+ * flushes should be routed to hv_remote_flush_tlb() without requesting
+ * a "regular" remote flush. Reaching this point means either there's
+ * a KVM bug or a prior hv_remote_flush_tlb() call failed, both of
+ * which might be fatal to the guest. Yell, but try to recover.
+ */
+ if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
+ hv_remote_flush_tlb(vcpu->kvm);
+
+ svm_flush_tlb_asid(vcpu);
+}
+
static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -4745,10 +4776,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.set_rflags = svm_set_rflags,
.get_if_flag = svm_get_if_flag,
- .flush_tlb_all = svm_flush_tlb_current,
+ .flush_tlb_all = svm_flush_tlb_all,
.flush_tlb_current = svm_flush_tlb_current,
.flush_tlb_gva = svm_flush_tlb_gva,
- .flush_tlb_guest = svm_flush_tlb_current,
+ .flush_tlb_guest = svm_flush_tlb_asid,
.vcpu_pre_run = svm_vcpu_pre_run,
.vcpu_run = svm_vcpu_run,
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index cff838f15db5..786d46d73a8e 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -6,6 +6,8 @@
#ifndef __ARCH_X86_KVM_SVM_ONHYPERV_H__
#define __ARCH_X86_KVM_SVM_ONHYPERV_H__
+#include <asm/mshyperv.h>
+
#if IS_ENABLED(CONFIG_HYPERV)
#include "kvm_onhyperv.h"
@@ -15,6 +17,14 @@ static struct kvm_x86_ops svm_x86_ops;
int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
+static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
+{
+ struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
+
+ return ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB &&
+ !!hve->hv_enlightenments_control.enlightened_npt_tlb;
+}
+
static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
{
struct hv_vmcb_enlightenments *hve = &vmcb->control.hv_enlightenments;
@@ -80,6 +90,11 @@ static inline void svm_hv_update_vp_id(struct vmcb *vmcb, struct kvm_vcpu *vcpu)
}
#else
+static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
{
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 80962ec912db56d323883154efc2297473e692cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041108-tumble-width-e1b4@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
80962ec912db ("KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
54744e17f031 ("KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80962ec912db56d323883154efc2297473e692cb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:33:00 -0700
Subject: [PATCH] KVM: nVMX: Do not report error code when synthesizing VM-Exit
from Real Mode
Don't report an error code to L1 when synthesizing a nested VM-Exit and
L2 is in Real Mode. Per Intel's SDM, regarding the error code valid bit:
This bit is always 0 if the VM exit occurred while the logical processor
was in real-address mode (CR0.PE=0).
The bug was introduced by a recent fix for AMD's Paged Real Mode, which
moved the error code suppression from the common "queue exception" path
to the "inject exception" path, but missed VMX's "synthesize VM-Exit"
path.
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1bc2b80273c9..768487611db7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3868,7 +3868,12 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
exit_qual = 0;
}
- if (ex->has_error_code) {
+ /*
+ * Unlike AMD's Paged Real Mode, which reports an error code on #PF
+ * VM-Exits even if the CPU is in Real Mode, Intel VMX never sets the
+ * "has error code" flags on VM-Exit if the CPU is in Real Mode.
+ */
+ if (ex->has_error_code && is_protmode(vcpu)) {
/*
* Intel CPUs do not generate error codes with bits 31:16 set,
* and more importantly VMX disallows setting bits 31:16 in the
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 80962ec912db56d323883154efc2297473e692cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041107-parameter-dissuade-6553@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
80962ec912db ("KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
54744e17f031 ("KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80962ec912db56d323883154efc2297473e692cb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:33:00 -0700
Subject: [PATCH] KVM: nVMX: Do not report error code when synthesizing VM-Exit
from Real Mode
Don't report an error code to L1 when synthesizing a nested VM-Exit and
L2 is in Real Mode. Per Intel's SDM, regarding the error code valid bit:
This bit is always 0 if the VM exit occurred while the logical processor
was in real-address mode (CR0.PE=0).
The bug was introduced by a recent fix for AMD's Paged Real Mode, which
moved the error code suppression from the common "queue exception" path
to the "inject exception" path, but missed VMX's "synthesize VM-Exit"
path.
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1bc2b80273c9..768487611db7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3868,7 +3868,12 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
exit_qual = 0;
}
- if (ex->has_error_code) {
+ /*
+ * Unlike AMD's Paged Real Mode, which reports an error code on #PF
+ * VM-Exits even if the CPU is in Real Mode, Intel VMX never sets the
+ * "has error code" flags on VM-Exit if the CPU is in Real Mode.
+ */
+ if (ex->has_error_code && is_protmode(vcpu)) {
/*
* Intel CPUs do not generate error codes with bits 31:16 set,
* and more importantly VMX disallows setting bits 31:16 in the
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 80962ec912db56d323883154efc2297473e692cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041106-splinter-basin-0152@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
80962ec912db ("KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
54744e17f031 ("KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80962ec912db56d323883154efc2297473e692cb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:33:00 -0700
Subject: [PATCH] KVM: nVMX: Do not report error code when synthesizing VM-Exit
from Real Mode
Don't report an error code to L1 when synthesizing a nested VM-Exit and
L2 is in Real Mode. Per Intel's SDM, regarding the error code valid bit:
This bit is always 0 if the VM exit occurred while the logical processor
was in real-address mode (CR0.PE=0).
The bug was introduced by a recent fix for AMD's Paged Real Mode, which
moved the error code suppression from the common "queue exception" path
to the "inject exception" path, but missed VMX's "synthesize VM-Exit"
path.
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1bc2b80273c9..768487611db7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3868,7 +3868,12 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
exit_qual = 0;
}
- if (ex->has_error_code) {
+ /*
+ * Unlike AMD's Paged Real Mode, which reports an error code on #PF
+ * VM-Exits even if the CPU is in Real Mode, Intel VMX never sets the
+ * "has error code" flags on VM-Exit if the CPU is in Real Mode.
+ */
+ if (ex->has_error_code && is_protmode(vcpu)) {
/*
* Intel CPUs do not generate error codes with bits 31:16 set,
* and more importantly VMX disallows setting bits 31:16 in the
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 80962ec912db56d323883154efc2297473e692cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041104-scarf-keg-0b67@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
80962ec912db ("KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
54744e17f031 ("KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80962ec912db56d323883154efc2297473e692cb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:33:00 -0700
Subject: [PATCH] KVM: nVMX: Do not report error code when synthesizing VM-Exit
from Real Mode
Don't report an error code to L1 when synthesizing a nested VM-Exit and
L2 is in Real Mode. Per Intel's SDM, regarding the error code valid bit:
This bit is always 0 if the VM exit occurred while the logical processor
was in real-address mode (CR0.PE=0).
The bug was introduced by a recent fix for AMD's Paged Real Mode, which
moved the error code suppression from the common "queue exception" path
to the "inject exception" path, but missed VMX's "synthesize VM-Exit"
path.
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1bc2b80273c9..768487611db7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3868,7 +3868,12 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
exit_qual = 0;
}
- if (ex->has_error_code) {
+ /*
+ * Unlike AMD's Paged Real Mode, which reports an error code on #PF
+ * VM-Exits even if the CPU is in Real Mode, Intel VMX never sets the
+ * "has error code" flags on VM-Exit if the CPU is in Real Mode.
+ */
+ if (ex->has_error_code && is_protmode(vcpu)) {
/*
* Intel CPUs do not generate error codes with bits 31:16 set,
* and more importantly VMX disallows setting bits 31:16 in the
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 80962ec912db56d323883154efc2297473e692cb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041103-bulb-unaired-ceb0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
80962ec912db ("KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
54744e17f031 ("KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80962ec912db56d323883154efc2297473e692cb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:33:00 -0700
Subject: [PATCH] KVM: nVMX: Do not report error code when synthesizing VM-Exit
from Real Mode
Don't report an error code to L1 when synthesizing a nested VM-Exit and
L2 is in Real Mode. Per Intel's SDM, regarding the error code valid bit:
This bit is always 0 if the VM exit occurred while the logical processor
was in real-address mode (CR0.PE=0).
The bug was introduced by a recent fix for AMD's Paged Real Mode, which
moved the error code suppression from the common "queue exception" path
to the "inject exception" path, but missed VMX's "synthesize VM-Exit"
path.
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1bc2b80273c9..768487611db7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3868,7 +3868,12 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
exit_qual = 0;
}
- if (ex->has_error_code) {
+ /*
+ * Unlike AMD's Paged Real Mode, which reports an error code on #PF
+ * VM-Exits even if the CPU is in Real Mode, Intel VMX never sets the
+ * "has error code" flags on VM-Exit if the CPU is in Real Mode.
+ */
+ if (ex->has_error_code && is_protmode(vcpu)) {
/*
* Intel CPUs do not generate error codes with bits 31:16 set,
* and more importantly VMX disallows setting bits 31:16 in the
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 6c41468c7c12d74843bb414fc00307ea8a6318c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041138-dallying-idly-b9ed@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
6c41468c7c12 ("KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
6ad75c5c99f7 ("KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c41468c7c12d74843bb414fc00307ea8a6318c3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:32:59 -0700
Subject: [PATCH] KVM: x86: Clear "has_error_code", not "error_code", for RM
exception injection
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45017576ad5e..7d6f98b7635f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9908,13 +9908,20 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
+ /*
+ * Suppress the error code if the vCPU is in Real Mode, as Real Mode
+ * exceptions don't report error codes. The presence of an error code
+ * is carried with the exception and only stripped when the exception
+ * is injected as intercepted #PF VM-Exits for AMD's Paged Real Mode do
+ * report an error code despite the CPU being in Real Mode.
+ */
+ vcpu->arch.exception.has_error_code &= is_protmode(vcpu);
+
trace_kvm_inj_exception(vcpu->arch.exception.vector,
vcpu->arch.exception.has_error_code,
vcpu->arch.exception.error_code,
vcpu->arch.exception.injected);
- if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
- vcpu->arch.exception.error_code = false;
static_call(kvm_x86_inject_exception)(vcpu);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6c41468c7c12d74843bb414fc00307ea8a6318c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041137-rockfish-condone-6161@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
6c41468c7c12 ("KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
6ad75c5c99f7 ("KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c41468c7c12d74843bb414fc00307ea8a6318c3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:32:59 -0700
Subject: [PATCH] KVM: x86: Clear "has_error_code", not "error_code", for RM
exception injection
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45017576ad5e..7d6f98b7635f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9908,13 +9908,20 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
+ /*
+ * Suppress the error code if the vCPU is in Real Mode, as Real Mode
+ * exceptions don't report error codes. The presence of an error code
+ * is carried with the exception and only stripped when the exception
+ * is injected as intercepted #PF VM-Exits for AMD's Paged Real Mode do
+ * report an error code despite the CPU being in Real Mode.
+ */
+ vcpu->arch.exception.has_error_code &= is_protmode(vcpu);
+
trace_kvm_inj_exception(vcpu->arch.exception.vector,
vcpu->arch.exception.has_error_code,
vcpu->arch.exception.error_code,
vcpu->arch.exception.injected);
- if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
- vcpu->arch.exception.error_code = false;
static_call(kvm_x86_inject_exception)(vcpu);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6c41468c7c12d74843bb414fc00307ea8a6318c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041136-spoiling-scoured-c3b4@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
6c41468c7c12 ("KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
6ad75c5c99f7 ("KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c41468c7c12d74843bb414fc00307ea8a6318c3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:32:59 -0700
Subject: [PATCH] KVM: x86: Clear "has_error_code", not "error_code", for RM
exception injection
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45017576ad5e..7d6f98b7635f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9908,13 +9908,20 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
+ /*
+ * Suppress the error code if the vCPU is in Real Mode, as Real Mode
+ * exceptions don't report error codes. The presence of an error code
+ * is carried with the exception and only stripped when the exception
+ * is injected as intercepted #PF VM-Exits for AMD's Paged Real Mode do
+ * report an error code despite the CPU being in Real Mode.
+ */
+ vcpu->arch.exception.has_error_code &= is_protmode(vcpu);
+
trace_kvm_inj_exception(vcpu->arch.exception.vector,
vcpu->arch.exception.has_error_code,
vcpu->arch.exception.error_code,
vcpu->arch.exception.injected);
- if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
- vcpu->arch.exception.error_code = false;
static_call(kvm_x86_inject_exception)(vcpu);
}
From: Eric Biggers <ebiggers(a)google.com>
Commit 56124d6c87fd ("fsverity: support enabling with tree block size <
PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read
the file's data, instead of direct pagecache accesses.
An unintended consequence of this is that the
'WARN_ON_ONCE(!(file->f_mode & FMODE_READ))' in __kernel_read() became
reachable by fuzz tests. This happens if FS_IOC_ENABLE_VERITY is called
on a fd opened with access mode 3, which means "ioctl access only".
Arguably, FS_IOC_ENABLE_VERITY should work on ioctl-only fds. But
ioctl-only fds are a weird Linux extension that is rarely used and that
few people even know about. (The documentation for FS_IOC_ENABLE_VERITY
even specifically says it requires O_RDONLY.) It's probably not
worthwhile to make the ioctl internally open a new fd just to handle
this case. Thus, just reject the ioctl on such fds for now.
Fixes: 56124d6c87fd ("fsverity: support enabling with tree block size < PAGE_SIZE")
Reported-by: syzbot+51177e4144d764827c45(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=2281afcbbfa8fdb92f9887479cc0e4180f1c6b…
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/verity/enable.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index bbec6f93172cf..fc4c50e5219dc 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -357,6 +357,13 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
err = file_permission(filp, MAY_WRITE);
if (err)
return err;
+ /*
+ * __kernel_read() is used while building the Merkle tree. So, we can't
+ * allow file descriptors that were opened for ioctl access only, using
+ * the special nonstandard access mode 3. O_RDONLY only, please!
+ */
+ if (!(filp->f_mode & FMODE_READ))
+ return -EBADF;
if (IS_APPEND(inode))
return -EPERM;
base-commit: dbd91ed3b5acb7acba2cac2d38e7aec57a5f1e96
--
2.40.0
Dzień dobry,
chciałbym poinformować Państwa o możliwości pozyskania nowych zleceń ze strony www.
Widzimy zainteresowanie potencjalnych Klientów Państwa firmą, dlatego chętnie pomożemy Państwu dotrzeć z ofertą do większego grona odbiorców poprzez efektywne metody pozycjonowania strony w Google.
Czy mógłbym liczyć na kontakt zwrotny?
Pozdrawiam serdecznie,
Wiktor Nurek
This is a note to let you know that I've just added the patch titled
fpga: m10bmc-sec: Fix rsu_send_data() to return
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From c3d79fda250ac5df73d089f08311eb87138b04f3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Wed, 8 Feb 2023 10:08:46 +0200
Subject: fpga: m10bmc-sec: Fix rsu_send_data() to return
FW_UPLOAD_ERR_HW_ERROR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
rsu_send_data() should return FW_UPLOAD_ERR_* error codes instead of
normal -Exxxx codes. Convert <0 return from ->rsu_status() to
FW_UPLOAD_ERR_HW_ERROR.
Fixes: 001a734a55d0 ("fpga: m10bmc-sec: Make rsu status type specific")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Russ Weight <russell.h.weight(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Acked-by: Xu Yilun <yilun.xu(a)intel.com>
Link: https://lore.kernel.org/r/20230208080846.10795-1-ilpo.jarvinen@linux.intel.…
Signed-off-by: Xu Yilun <yilun.xu(a)intel.com>
---
drivers/fpga/intel-m10-bmc-sec-update.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/fpga/intel-m10-bmc-sec-update.c b/drivers/fpga/intel-m10-bmc-sec-update.c
index f0acedc80182..d7e2f9f461bc 100644
--- a/drivers/fpga/intel-m10-bmc-sec-update.c
+++ b/drivers/fpga/intel-m10-bmc-sec-update.c
@@ -474,7 +474,7 @@ static enum fw_upload_err rsu_send_data(struct m10bmc_sec *sec)
ret = sec->ops->rsu_status(sec);
if (ret < 0)
- return ret;
+ return FW_UPLOAD_ERR_HW_ERROR;
status = ret;
if (!rsu_status_ok(status)) {
--
2.40.0
This is a note to let you know that I've just added the patch titled
iio: dac: ad5755: Add missing fwnode_handle_put()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From ffef73791574b8da872cfbf881d8e3e9955fc130 Mon Sep 17 00:00:00 2001
From: Liang He <windhl(a)126.com>
Date: Wed, 22 Mar 2023 11:56:27 +0800
Subject: iio: dac: ad5755: Add missing fwnode_handle_put()
In ad5755_parse_fw(), we should add fwnode_handle_put()
when break out of the iteration device_for_each_child_node()
as it will automatically increase and decrease the refcounter.
Fixes: 3ac27afefd5d ("iio:dac:ad5755: Switch to generic firmware properties and drop pdata")
Signed-off-by: Liang He <windhl(a)126.com>
Link: https://lore.kernel.org/r/20230322035627.1856421-1-windhl@126.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/dac/ad5755.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/iio/dac/ad5755.c b/drivers/iio/dac/ad5755.c
index beadfa938d2d..404865e35460 100644
--- a/drivers/iio/dac/ad5755.c
+++ b/drivers/iio/dac/ad5755.c
@@ -802,6 +802,7 @@ static struct ad5755_platform_data *ad5755_parse_fw(struct device *dev)
return pdata;
error_out:
+ fwnode_handle_put(pp);
devm_kfree(dev, pdata);
return NULL;
}
--
2.40.0
This is a note to let you know that I've just added the patch titled
iio: light: tsl2772: fix reading proximity-diodes from device tree
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From b1cb00d51e361cf5af93649917d9790e1623647e Mon Sep 17 00:00:00 2001
From: Brian Masney <bmasney(a)redhat.com>
Date: Mon, 3 Apr 2023 21:14:55 -0400
Subject: iio: light: tsl2772: fix reading proximity-diodes from device tree
tsl2772_read_prox_diodes() will correctly parse the properties from
device tree to determine which proximity diode(s) to read from, however
it didn't actually set this value on the struct tsl2772_settings. Let's
go ahead and fix that.
Reported-by: Tom Rix <trix(a)redhat.com>
Link: https://lore.kernel.org/lkml/20230327120823.1369700-1-trix@redhat.com/
Fixes: 94cd1113aaa0 ("iio: tsl2772: add support for reading proximity led settings from device tree")
Signed-off-by: Brian Masney <bmasney(a)redhat.com>
Link: https://lore.kernel.org/r/20230404011455.339454-1-bmasney@redhat.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/light/tsl2772.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/iio/light/tsl2772.c b/drivers/iio/light/tsl2772.c
index ad50baa0202c..e823c145f679 100644
--- a/drivers/iio/light/tsl2772.c
+++ b/drivers/iio/light/tsl2772.c
@@ -601,6 +601,7 @@ static int tsl2772_read_prox_diodes(struct tsl2772_chip *chip)
return -EINVAL;
}
}
+ chip->settings.prox_diode = prox_diode_mask;
return 0;
}
--
2.40.0
The patch titled
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
Date: Tue, 11 Apr 2023 12:10:04 +0800
In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
node_count field of the new node, but the node may not be a new node. It
may be a node that existed before and node_count has a value, setting it
to 0 will cause a memory leak. At this time, mas->alloc->total will be
greater than the actual number of nodes in the linked list, which may
cause many other errors. For example, out-of-bounds access in
mas_pop_node(), and mas_pop_node() may return addresses that should not be
used. Fix it by initializing node_count only for new nodes.
Also, by the way, an if-else statement was removed to simplify the code.
Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug
+++ a/lib/maple_tree.c
@@ -1303,26 +1303,21 @@ static inline void mas_alloc_nodes(struc
node = mas->alloc;
node->request_count = 0;
while (requested) {
- max_req = MAPLE_ALLOC_SLOTS;
- if (node->node_count) {
- unsigned int offset = node->node_count;
-
- slots = (void **)&node->slot[offset];
- max_req -= offset;
- } else {
- slots = (void **)&node->slot;
- }
-
+ max_req = MAPLE_ALLOC_SLOTS - node->node_count;
+ slots = (void **)&node->slot[node->node_count];
max_req = min(requested, max_req);
count = mt_alloc_bulk(gfp, max_req, slots);
if (!count)
goto nomem_bulk;
+ if (node->node_count == 0) {
+ node->slot[0]->node_count = 0;
+ node->slot[0]->request_count = 0;
+ }
+
node->node_count += count;
allocated += count;
node = node->slot[0];
- node->node_count = 0;
- node->request_count = 0;
requested -= count;
}
mas->alloc->total = allocated;
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug.patch
mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch
maple_tree-simplify-mas_wr_node_walk.patch
maple_tree-use-correct-variable-type-in-sizeof.patch
The patch titled
Subject: tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
tools-mm-page_owner_sortc-fix-tgid-output-when-cull=tg-is-used.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Steve Chou <steve_chou(a)pesi.com.tw>
Subject: tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
Date: Tue, 11 Apr 2023 11:49:28 +0800
When using cull option with 'tg' flag, the fprintf is using pid instead
of tgid. It should use tgid instead.
Link: https://lkml.kernel.org/r/20230411034929.2071501-1-steve_chou@pesi.com.tw
Fixes: 9c8a0a8e599f4a ("tools/vm/page_owner_sort.c: support for user-defined culling rules")
Signed-off-by: Steve Chou <steve_chou(a)pesi.com.tw>
Cc: Jiajian Ye <yejiajian2018(a)email.szu.edu.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/mm/page_owner_sort.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/mm/page_owner_sort.c~tools-mm-page_owner_sortc-fix-tgid-output-when-cull=tg-is-used
+++ a/tools/mm/page_owner_sort.c
@@ -857,7 +857,7 @@ int main(int argc, char **argv)
if (cull & CULL_PID || filter & FILTER_PID)
fprintf(fout, ", PID %d", list[i].pid);
if (cull & CULL_TGID || filter & FILTER_TGID)
- fprintf(fout, ", TGID %d", list[i].pid);
+ fprintf(fout, ", TGID %d", list[i].tgid);
if (cull & CULL_COMM || filter & FILTER_COMM)
fprintf(fout, ", task_comm_name: %s", list[i].comm);
if (cull & CULL_ALLOCATOR) {
_
Patches currently in -mm which might be from steve_chou(a)pesi.com.tw are
tools-mm-page_owner_sortc-fix-tgid-output-when-cull=tg-is-used.patch
The quilt patch titled
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
has been removed from the -mm tree. Its filename was
maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
Date: Fri, 7 Apr 2023 12:07:18 +0800
In mas_alloc_nodes(), there is such a piece of code:
while (requested) {
...
node->node_count = 0;
...
}
"node->node_count = 0" means to initialize the node_count field of the new
node, but the node may not be a new node. It may be a node that existed
before and node_count has a value, setting it to 0 will cause a memory
leak. At this time, mas->alloc->total will be greater than the actual
number of nodes in the linked list, which may cause many other errors.
For example, out-of-bounds access in mas_pop_node(), and mas_pop_node()
may return addresses that should not be used. Fix it by initializing
node_count only for new nodes.
Link: https://lkml.kernel.org/r/20230407040718.99064-2-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug
+++ a/lib/maple_tree.c
@@ -1303,26 +1303,18 @@ static inline void mas_alloc_nodes(struc
node = mas->alloc;
node->request_count = 0;
while (requested) {
- max_req = MAPLE_ALLOC_SLOTS;
- if (node->node_count) {
- unsigned int offset = node->node_count;
-
- slots = (void **)&node->slot[offset];
- max_req -= offset;
- } else {
- slots = (void **)&node->slot;
- }
-
+ max_req = MAPLE_ALLOC_SLOTS - node->node_count;
+ slots = (void **)&node->slot[node->node_count];
max_req = min(requested, max_req);
count = mt_alloc_bulk(gfp, max_req, slots);
if (!count)
goto nomem_bulk;
+ if (node->node_count == 0)
+ node->slot[0]->node_count = 0;
node->node_count += count;
allocated += count;
node = node->slot[0];
- node->node_count = 0;
- node->request_count = 0;
requested -= count;
}
mas->alloc->total = allocated;
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
maple_tree-use-correct-variable-type-in-sizeof.patch
mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch
maple_tree-simplify-mas_wr_node_walk.patch
maple_tree-add-a-test-case-to-check-maple_alloc.patch
The patch titled
Subject: mm/mempolicy: fix use-after-free of VMA iterator
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mempolicy-fix-use-after-free-of-vma-iterator.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm/mempolicy: fix use-after-free of VMA iterator
Date: Mon, 10 Apr 2023 11:22:05 -0400
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 102 ++++++++++++++++++++++-------------------------
1 file changed, 48 insertions(+), 54 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-fix-use-after-free-of-vma-iterator
+++ a/mm/mempolicy.c
@@ -790,61 +790,50 @@ static int vma_replace_policy(struct vm_
return err;
}
-/* Step 2: apply policy to a range and do splits. */
-static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+/* Split or merge the VMA (if required) and apply the new policy */
+static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, struct mempolicy *new_pol)
{
- VMA_ITERATOR(vmi, mm, start);
- struct vm_area_struct *prev;
- struct vm_area_struct *vma;
- int err = 0;
+ struct vm_area_struct *merged;
+ unsigned long vmstart, vmend;
pgoff_t pgoff;
+ int err;
- prev = vma_prev(&vmi);
- vma = vma_find(&vmi, end);
- if (WARN_ON(!vma))
+ vmend = min(end, vma->vm_end);
+ if (start > vma->vm_start) {
+ *prev = vma;
+ vmstart = start;
+ } else {
+ vmstart = vma->vm_start;
+ }
+
+ if (mpol_equal(vma_policy(vma), new_pol))
return 0;
- if (start > vma->vm_start)
- prev = vma;
+ pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
+ merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, new_pol,
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ if (merged) {
+ *prev = merged;
+ return vma_replace_policy(merged, new_pol);
+ }
- do {
- unsigned long vmstart = max(start, vma->vm_start);
- unsigned long vmend = min(end, vma->vm_end);
-
- if (mpol_equal(vma_policy(vma), new_pol))
- goto next;
-
- pgoff = vma->vm_pgoff +
- ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
- if (prev) {
- vma = prev;
- goto replace;
- }
- if (vma->vm_start != vmstart) {
- err = split_vma(&vmi, vma, vmstart, 1);
- if (err)
- goto out;
- }
- if (vma->vm_end != vmend) {
- err = split_vma(&vmi, vma, vmend, 0);
- if (err)
- goto out;
- }
-replace:
- err = vma_replace_policy(vma, new_pol);
+ if (vma->vm_start != vmstart) {
+ err = split_vma(vmi, vma, vmstart, 1);
if (err)
- goto out;
-next:
- prev = vma;
- } for_each_vma_range(vmi, vma, end);
+ return err;
+ }
-out:
- return err;
+ if (vma->vm_end != vmend) {
+ err = split_vma(vmi, vma, vmend, 0);
+ if (err)
+ return err;
+ }
+
+ *prev = vma;
+ return vma_replace_policy(vma, new_pol);
}
/* Set the process memory policy */
@@ -1259,6 +1248,8 @@ static long do_mbind(unsigned long start
nodemask_t *nmask, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev;
+ struct vma_iterator vmi;
struct mempolicy *new;
unsigned long end;
int err;
@@ -1328,7 +1319,13 @@ static long do_mbind(unsigned long start
goto up_out;
}
- err = mbind_range(mm, start, end, new);
+ vma_iter_init(&vmi, mm, start);
+ prev = vma_prev(&vmi);
+ for_each_vma_range(vmi, vma, end) {
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
+ if (err)
+ break;
+ }
if (!err) {
int nr_failed = 0;
@@ -1489,10 +1486,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
unsigned long, home_node, unsigned long, flags)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma, *prev;
struct mempolicy *new, *old;
- unsigned long vmstart;
- unsigned long vmend;
unsigned long end;
int err = -ENOENT;
VMA_ITERATOR(vmi, mm, start);
@@ -1521,6 +1516,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
if (end == start)
return 0;
mmap_write_lock(mm);
+ prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
* If any vma in the range got policy other than MPOL_BIND
@@ -1541,9 +1537,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
}
new->home_node = home_node;
- vmstart = max(start, vma->vm_start);
- vmend = min(end, vma->vm_end);
- err = mbind_range(mm, vmstart, vmend, new);
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
if (err)
break;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
mm-mprotect-fix-do_mprotect_pkey-return-on-error.patch
mm-mempolicy-fix-use-after-free-of-vma-iterator.patch
The patch titled
Subject: maple_tree: use correct variable type in sizeof
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-use-correct-variable-type-in-sizeof.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: use correct variable type in sizeof
Date: Tue, 11 Apr 2023 10:35:13 +0800
The type of variable pointed to by pivs is unsigned long, but the type
used in sizeof is a pointer type. Change it to unsigned long.
Link: https://lkml.kernel.org/r/20230411023513.15227-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Reported-by: David Binderman <dcb314(a)hotmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/maple_tree.c~maple_tree-use-correct-variable-type-in-sizeof
+++ a/lib/maple_tree.c
@@ -3279,7 +3279,7 @@ static inline void mas_destroy_rebalance
if (tmp < max_p)
memset(pivs + tmp, 0,
- sizeof(unsigned long *) * (max_p - tmp));
+ sizeof(unsigned long) * (max_p - tmp));
if (tmp < mt_slots[mt])
memset(slots + tmp, 0, sizeof(void *) * (max_s - tmp));
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug.patch
maple_tree-use-correct-variable-type-in-sizeof.patch
mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch
maple_tree-simplify-mas_wr_node_walk.patch
maple_tree-add-a-test-case-to-check-maple_alloc.patch
From: "Liam R. Howlett" <Liam.Howlett(a)Oracle.com>
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be
a NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a
VMA to grow and consume the empty space. Overwriting the entire NULL
entry causes the tree to be altered in a way that is not safe for
concurrent readers; the readers may see a node being rewritten or one
that does not match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Cc: stable(a)vger.kernel.org
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+8d95422d3537159ca390(a)syzkaller.appspotmail.com
---
include/linux/mm_types.h | 3 ++-
kernel/fork.c | 3 +++
mm/mmap.c | 3 ++-
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0722859c3647..a57e6ae78e65 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -774,7 +774,8 @@ struct mm_struct {
unsigned long cpu_bitmap[];
};
-#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+ MT_FLAGS_USE_RCU)
extern struct mm_struct init_mm;
/* Pointer magic because the dynamic array size confuses some compilers. */
diff --git a/kernel/fork.c b/kernel/fork.c
index d8cda4c6de6c..1bf31ba07e85 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
if (retval)
goto out;
+ mt_clear_in_rcu(vmi.mas.tree);
for_each_vma(old_vmi, mpnt) {
struct file *file;
@@ -700,6 +701,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
retval = arch_dup_mmap(oldmm, mm);
loop_out:
vma_iter_free(&vmi);
+ if (!retval)
+ mt_set_in_rcu(vmi.mas.tree);
out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 740b54be3ed4..16cbb83b3ec6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2277,7 +2277,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
int count = 0;
int error = -ENOMEM;
MA_STATE(mas_detach, &mt_detach, 0, 0);
- mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+ mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
mt_set_external_lock(&mt_detach, &mm->mmap_lock);
/*
@@ -3042,6 +3042,7 @@ void exit_mmap(struct mm_struct *mm)
*/
set_bit(MMF_OOM_SKIP, &mm->flags);
mmap_write_lock(mm);
+ mt_clear_in_rcu(&mm->mm_mt);
free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
--
2.39.2
The following changes since commit 7e364e56293bb98cae1b55fd835f5991c4e96e7d:
Linux 6.3-rc5 (2023-04-02 14:29:29 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 9da667e50c7e62266f3c2f8ad57b32fca40716b1:
vdpa_sim_net: complete the initialization before register the device (2023-04-04 14:22:12 -0400)
----------------------------------------------------------------
virtio: last minute fixes
Some last minute fixes - most of them for regressions.
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Dmitry Fomichev (2):
virtio-blk: fix to match virtio spec
virtio-blk: fix ZBD probe in kernels without ZBD support
Eli Cohen (1):
vdpa/mlx5: Add and remove debugfs in setup/teardown driver
Mike Christie (2):
vhost-scsi: Fix vhost_scsi struct use after free
vhost-scsi: Fix crash during LUN unmapping
Ross Zwisler (1):
tools/virtio: fix typo in README instructions
Stefano Garzarella (1):
vdpa_sim_net: complete the initialization before register the device
drivers/block/virtio_blk.c | 269 ++++++++++++++++++++++-------------
drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 +-
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 13 +-
drivers/vhost/scsi.c | 39 +----
include/uapi/linux/virtio_blk.h | 18 +--
tools/virtio/virtio-trace/README | 2 +-
6 files changed, 205 insertions(+), 144 deletions(-)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ed1f4ccfe947a3e1018a3bd7325134574c7ff9b3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041033-paragraph-uselessly-114f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ed1f4ccfe947 ("clk: imx: imx8mp: add shared clk gate for usb suspend clk")
cf7f3f4fa9e5 ("clk: imx8mp: fix usb_root_clk parent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed1f4ccfe947a3e1018a3bd7325134574c7ff9b3 Mon Sep 17 00:00:00 2001
From: Li Jun <jun.li(a)nxp.com>
Date: Fri, 30 Sep 2022 22:54:22 +0800
Subject: [PATCH] clk: imx: imx8mp: add shared clk gate for usb suspend clk
32K usb suspend clock gate is shared with usb_root_clk, this
shared clock gate was initially defined only for usb suspend
clock, usb suspend clk is kept on while system is active or
system sleep with usb wakeup enabled, so usb root clock is
fine with this situation; with the commit cf7f3f4fa9e5
("clk: imx8mp: fix usb_root_clk parent"), this clock gate is
changed to be for usb root clock, but usb root clock will
be off while usb is suspended, so usb suspend clock will be
gated too, this cause some usb functionalities will not work,
so define this clock to be a shared clock gate to conform with
the real HW status.
Fixes: 9c140d9926761 ("clk: imx: Add support for i.MX8MP clock driver")
Cc: stable(a)vger.kernel.org # v5.19+
Tested-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
Signed-off-by: Li Jun <jun.li(a)nxp.com>
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
Link: https://lore.kernel.org/r/1664549663-20364-2-git-send-email-jun.li@nxp.com
diff --git a/drivers/clk/imx/clk-imx8mp.c b/drivers/clk/imx/clk-imx8mp.c
index 652ae58c2735..5d68d975b4eb 100644
--- a/drivers/clk/imx/clk-imx8mp.c
+++ b/drivers/clk/imx/clk-imx8mp.c
@@ -17,6 +17,7 @@
static u32 share_count_nand;
static u32 share_count_media;
+static u32 share_count_usb;
static const char * const pll_ref_sels[] = { "osc_24m", "dummy", "dummy", "dummy", };
static const char * const audio_pll1_bypass_sels[] = {"audio_pll1", "audio_pll1_ref_sel", };
@@ -673,7 +674,8 @@ static int imx8mp_clocks_probe(struct platform_device *pdev)
hws[IMX8MP_CLK_UART2_ROOT] = imx_clk_hw_gate4("uart2_root_clk", "uart2", ccm_base + 0x44a0, 0);
hws[IMX8MP_CLK_UART3_ROOT] = imx_clk_hw_gate4("uart3_root_clk", "uart3", ccm_base + 0x44b0, 0);
hws[IMX8MP_CLK_UART4_ROOT] = imx_clk_hw_gate4("uart4_root_clk", "uart4", ccm_base + 0x44c0, 0);
- hws[IMX8MP_CLK_USB_ROOT] = imx_clk_hw_gate4("usb_root_clk", "hsio_axi", ccm_base + 0x44d0, 0);
+ hws[IMX8MP_CLK_USB_ROOT] = imx_clk_hw_gate2_shared2("usb_root_clk", "hsio_axi", ccm_base + 0x44d0, 0, &share_count_usb);
+ hws[IMX8MP_CLK_USB_SUSP] = imx_clk_hw_gate2_shared2("usb_suspend_clk", "osc_32k", ccm_base + 0x44d0, 0, &share_count_usb);
hws[IMX8MP_CLK_USB_PHY_ROOT] = imx_clk_hw_gate4("usb_phy_root_clk", "usb_phy_ref", ccm_base + 0x44f0, 0);
hws[IMX8MP_CLK_USDHC1_ROOT] = imx_clk_hw_gate4("usdhc1_root_clk", "usdhc1", ccm_base + 0x4510, 0);
hws[IMX8MP_CLK_USDHC2_ROOT] = imx_clk_hw_gate4("usdhc2_root_clk", "usdhc2", ccm_base + 0x4520, 0);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x b43a18647f03c87e77d50d6fe74904b61b96323e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041046-synthetic-urgent-3126@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
b43a18647f03 ("tty: serial: sh-sci: Fix transmit end interrupt handler")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b43a18647f03c87e77d50d6fe74904b61b96323e Mon Sep 17 00:00:00 2001
From: Biju Das <biju.das.jz(a)bp.renesas.com>
Date: Fri, 17 Mar 2023 15:04:03 +0000
Subject: [PATCH] tty: serial: sh-sci: Fix transmit end interrupt handler
The fourth interrupt on SCI port is transmit end interrupt compared to
the break interrupt on other port types. So, shuffle the interrupts to fix
the transmit end interrupt handler.
Fixes: e1d0be616186 ("sh-sci: Add h8300 SCI")
Cc: stable <stable(a)kernel.org>
Suggested-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Signed-off-by: Biju Das <biju.das.jz(a)bp.renesas.com>
Link: https://lore.kernel.org/r/20230317150403.154094-1-biju.das.jz@bp.renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 7bd080720929..c07663fe80bf 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -31,6 +31,7 @@
#include <linux/ioport.h>
#include <linux/ktime.h>
#include <linux/major.h>
+#include <linux/minmax.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/of.h>
@@ -2864,6 +2865,13 @@ static int sci_init_single(struct platform_device *dev,
sci_port->irqs[i] = platform_get_irq(dev, i);
}
+ /*
+ * The fourth interrupt on SCI port is transmit end interrupt, so
+ * shuffle the interrupts.
+ */
+ if (p->type == PORT_SCI)
+ swap(sci_port->irqs[SCIx_BRI_IRQ], sci_port->irqs[SCIx_TEI_IRQ]);
+
/* The SCI generates several interrupts. They can be muxed together or
* connected to different interrupt lines. In the muxed case only one
* interrupt resource is specified as there is only one interrupt ID.
If firmware loading fails, the controller's pm_state is updated to
MHI_PM_FW_DL_ERR unconditionally. This can corrupt the pm_state as the
update is not done under the proper lock, and also does not validate
the state transition. The firmware loading can fail due to a detected
syserr, but if MHI_PM_FW_DL_ERR is unconditionally set as the pm_state,
the handling of the syserr can break when it attempts to transition from
syserr detect, to syserr process.
By grabbing the lock, we ensure we don't race with some other pm_state
update. By using mhi_try_set_pm_state(), we check that the transition
to MHI_PM_FW_DL_ERR is valid via the state machine logic. If it is not
valid, then some other transition is occurring like syserr processing, and
we assume that will resolve the firmware loading error.
Fixes: 12e050c77be0 ("bus: mhi: core: Move to an error state on any firmware load failure")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv(a)quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
---
drivers/bus/mhi/host/boot.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index 1c69fee..d2a19b07 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -391,6 +391,7 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
{
const struct firmware *firmware = NULL;
struct device *dev = &mhi_cntrl->mhi_dev->dev;
+ enum mhi_pm_state new_state;
const char *fw_name;
void *buf;
dma_addr_t dma_addr;
@@ -508,14 +509,18 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
}
error_fw_load:
- mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
- wake_up_all(&mhi_cntrl->state_event);
+ write_lock_irq(&mhi_cntrl->pm_lock);
+ new_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_FW_DL_ERR);
+ write_unlock_irq(&mhi_cntrl->pm_lock);
+ if (new_state == MHI_PM_FW_DL_ERR)
+ wake_up_all(&mhi_cntrl->state_event);
}
int mhi_download_amss_image(struct mhi_controller *mhi_cntrl)
{
struct image_info *image_info = mhi_cntrl->fbc_image;
struct device *dev = &mhi_cntrl->mhi_dev->dev;
+ enum mhi_pm_state new_state;
int ret;
if (!image_info)
@@ -526,8 +531,11 @@ int mhi_download_amss_image(struct mhi_controller *mhi_cntrl)
&image_info->mhi_buf[image_info->entries - 1]);
if (ret) {
dev_err(dev, "MHI did not load AMSS, ret:%d\n", ret);
- mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
- wake_up_all(&mhi_cntrl->state_event);
+ write_lock_irq(&mhi_cntrl->pm_lock);
+ new_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_FW_DL_ERR);
+ write_unlock_irq(&mhi_cntrl->pm_lock);
+ if (new_state == MHI_PM_FW_DL_ERR)
+ wake_up_all(&mhi_cntrl->state_event);
}
return ret;
--
2.7.4
If we detect a system error via intvec, we only process the syserr if the
current ee is different than the last observed ee. The reason for this
check is to prevent bhie from running multiple times, but with the single
queue handling syserr, that is not possible.
The check can cause an issue with device recovery. If PBL loads a bad SBL
via BHI, but that SBL hangs before notifying the host of an ee change,
then issuing soc_reset to crash the device and retry (after supplying a
fixed SBL) will not recover the device as the host will observe a PBL->PBL
transition and not process the syserr. The device will be stuck until
either the driver is reloaded, or the host is rebooted. Instead, remove
the check so that we can attempt to recover the device.
Fixes: ef2126c4e2ea ("bus: mhi: core: Process execution environment changes serially")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv(a)quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
---
drivers/bus/mhi/host/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index 4fa0969..3a08518 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -503,7 +503,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
}
write_unlock_irq(&mhi_cntrl->pm_lock);
- if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
+ if (pm_state != MHI_PM_SYS_ERR_DETECT)
goto exit_intvec;
switch (ee) {
--
2.7.4
In mas_alloc_nodes(), there is such a piece of code:
while (requested) {
...
node->node_count = 0;
...
}
"node->node_count = 0" means to initialize the node_count field of the
new node, but the node may not be a new node. It may be a node that
existed before and node_count has a value, setting it to 0 will cause a
memory leak. At this time, mas->alloc->total will be greater than the
actual number of nodes in the linked list, which may cause many other
errors. For example, out-of-bounds access in mas_pop_node(), and
mas_pop_node() may return addresses that should not be used.
Fix it by initializing node_count only for new nodes.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
---
lib/maple_tree.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 65fd861b30e1..9e25b3215803 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1249,26 +1249,18 @@ static inline void mas_alloc_nodes(struct ma_state *mas, gfp_t gfp)
node = mas->alloc;
node->request_count = 0;
while (requested) {
- max_req = MAPLE_ALLOC_SLOTS;
- if (node->node_count) {
- unsigned int offset = node->node_count;
-
- slots = (void **)&node->slot[offset];
- max_req -= offset;
- } else {
- slots = (void **)&node->slot;
- }
-
+ max_req = MAPLE_ALLOC_SLOTS - node->node_count;
+ slots = (void **)&node->slot[node->node_count];
max_req = min(requested, max_req);
count = mt_alloc_bulk(gfp, max_req, slots);
if (!count)
goto nomem_bulk;
+ if (node->node_count == 0)
+ node->slot[0]->node_count = 0;
node->node_count += count;
allocated += count;
node = node->slot[0];
- node->node_count = 0;
- node->request_count = 0;
requested -= count;
}
mas->alloc->total = allocated;
--
2.20.1
This bug is marked as fixed by commit:
net: core: netlink: add helper refcount dec and lock function
net: sched: add helper function to take reference to Qdisc
net: sched: extend Qdisc with rcu
net: sched: rename qdisc_destroy() to qdisc_put()
net: sched: use Qdisc rcu API instead of relying on rtnl lock
But I can't find it in the tested trees[1] for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and new crashes with
the same signature are ignored.
Kernel: Linux 4.19
Dashboard link: https://syzkaller.appspot.com/bug?extid=5f229e48cccc804062c0
---
[1] I expect the commit to be present in:
1. linux-4.19.y branch of
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Hi there, hello,
I've noticed that the recent (at the moment) releases 6.2.10 and 6.1.23 weren't merged into linux-rolling-stable and linux-rolling-lts respectively. Is there something wrong with them that they require the delay?
Kind regards,
~acidbong
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 9425914f3de6febbd6250395f56c8279676d9c3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041012-drippy-italics-387e@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
9425914f3de6 ("tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty")
46dd6d779dcc ("serial: fsl_lpuart: consider TX FIFO too in lpuart32_tx_empty")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9425914f3de6febbd6250395f56c8279676d9c3c Mon Sep 17 00:00:00 2001
From: Sherry Sun <sherry.sun(a)nxp.com>
Date: Thu, 23 Mar 2023 13:44:15 +0800
Subject: [PATCH] tty: serial: fsl_lpuart: avoid checking for transfer complete
when UARTCTRL_SBK is asserted in lpuart32_tx_empty
According to LPUART RM, Transmission Complete Flag becomes 0 if queuing
a break character by writing 1 to CTRL[SBK], so here need to avoid
checking for transmission complete when UARTCTRL_SBK is asserted,
otherwise the lpuart32_tx_empty may never get TIOCSER_TEMT.
Commit 2411fd94ceaa("tty: serial: fsl_lpuart: skip waiting for
transmission complete when UARTCTRL_SBK is asserted") only fix it in
lpuart32_set_termios(), here also fix it in lpuart32_tx_empty().
Fixes: 380c966c093e ("tty: serial: fsl_lpuart: add 32-bit register interface support")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Sherry Sun <sherry.sun(a)nxp.com>
Link: https://lore.kernel.org/r/20230323054415.20363-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 56e6ba3250cd..edc6e35b701a 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -858,11 +858,17 @@ static unsigned int lpuart32_tx_empty(struct uart_port *port)
struct lpuart_port, port);
unsigned long stat = lpuart32_read(port, UARTSTAT);
unsigned long sfifo = lpuart32_read(port, UARTFIFO);
+ unsigned long ctrl = lpuart32_read(port, UARTCTRL);
if (sport->dma_tx_in_progress)
return 0;
- if (stat & UARTSTAT_TC && sfifo & UARTFIFO_TXEMPT)
+ /*
+ * LPUART Transmission Complete Flag may never be set while queuing a break
+ * character, so avoid checking for transmission complete when UARTCTRL_SBK
+ * is asserted.
+ */
+ if ((stat & UARTSTAT_TC && sfifo & UARTFIFO_TXEMPT) || ctrl & UARTCTRL_SBK)
return TIOCSER_TEMT;
return 0;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 9425914f3de6febbd6250395f56c8279676d9c3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041011-vanilla-unpicked-92cc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
9425914f3de6 ("tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty")
46dd6d779dcc ("serial: fsl_lpuart: consider TX FIFO too in lpuart32_tx_empty")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9425914f3de6febbd6250395f56c8279676d9c3c Mon Sep 17 00:00:00 2001
From: Sherry Sun <sherry.sun(a)nxp.com>
Date: Thu, 23 Mar 2023 13:44:15 +0800
Subject: [PATCH] tty: serial: fsl_lpuart: avoid checking for transfer complete
when UARTCTRL_SBK is asserted in lpuart32_tx_empty
According to LPUART RM, Transmission Complete Flag becomes 0 if queuing
a break character by writing 1 to CTRL[SBK], so here need to avoid
checking for transmission complete when UARTCTRL_SBK is asserted,
otherwise the lpuart32_tx_empty may never get TIOCSER_TEMT.
Commit 2411fd94ceaa("tty: serial: fsl_lpuart: skip waiting for
transmission complete when UARTCTRL_SBK is asserted") only fix it in
lpuart32_set_termios(), here also fix it in lpuart32_tx_empty().
Fixes: 380c966c093e ("tty: serial: fsl_lpuart: add 32-bit register interface support")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Sherry Sun <sherry.sun(a)nxp.com>
Link: https://lore.kernel.org/r/20230323054415.20363-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 56e6ba3250cd..edc6e35b701a 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -858,11 +858,17 @@ static unsigned int lpuart32_tx_empty(struct uart_port *port)
struct lpuart_port, port);
unsigned long stat = lpuart32_read(port, UARTSTAT);
unsigned long sfifo = lpuart32_read(port, UARTFIFO);
+ unsigned long ctrl = lpuart32_read(port, UARTCTRL);
if (sport->dma_tx_in_progress)
return 0;
- if (stat & UARTSTAT_TC && sfifo & UARTFIFO_TXEMPT)
+ /*
+ * LPUART Transmission Complete Flag may never be set while queuing a break
+ * character, so avoid checking for transmission complete when UARTCTRL_SBK
+ * is asserted.
+ */
+ if ((stat & UARTSTAT_TC && sfifo & UARTFIFO_TXEMPT) || ctrl & UARTCTRL_SBK)
return TIOCSER_TEMT;
return 0;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x eddebe39602efe631b83ff8d03f26eba12cfd760
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041032-pasty-shifty-db8c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
eddebe39602e ("usb: typec: altmodes/displayport: Fix configure initial pin assignment")
09fed4d64d3f ("usb: typec: altmodes/displayport: Fall back to multi-func pins")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eddebe39602efe631b83ff8d03f26eba12cfd760 Mon Sep 17 00:00:00 2001
From: RD Babiera <rdbabiera(a)google.com>
Date: Wed, 29 Mar 2023 21:51:59 +0000
Subject: [PATCH] usb: typec: altmodes/displayport: Fix configure initial pin
assignment
While determining the initial pin assignment to be sent in the configure
message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to
send both Pin Assignment C and E when both are supported by the DFP_U and
UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates,
VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never
selects Pin Assignment E when Pin Assignment C is offered.
Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only
Pin Assignment C if it is available.
Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera(a)google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230329215159.2046932-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/altmodes/displayport.c b/drivers/usb/typec/altmodes/displayport.c
index 662cd043b50e..8f3e884222ad 100644
--- a/drivers/usb/typec/altmodes/displayport.c
+++ b/drivers/usb/typec/altmodes/displayport.c
@@ -112,8 +112,12 @@ static int dp_altmode_configure(struct dp_altmode *dp, u8 con)
if (dp->data.status & DP_STATUS_PREFER_MULTI_FUNC &&
pin_assign & DP_PIN_ASSIGN_MULTI_FUNC_MASK)
pin_assign &= DP_PIN_ASSIGN_MULTI_FUNC_MASK;
- else if (pin_assign & DP_PIN_ASSIGN_DP_ONLY_MASK)
+ else if (pin_assign & DP_PIN_ASSIGN_DP_ONLY_MASK) {
pin_assign &= DP_PIN_ASSIGN_DP_ONLY_MASK;
+ /* Default to pin assign C if available */
+ if (pin_assign & BIT(DP_PIN_ASSIGN_C))
+ pin_assign = BIT(DP_PIN_ASSIGN_C);
+ }
if (!pin_assign)
return -EINVAL;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ecaa4902439298f6b0e29f47424a86b310a9ff4f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041023-unweave-bolt-64d3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ecaa49024392 ("xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu")
f7fac17ca925 ("xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()")
05afde1a7ef3 ("xhci: Use device_iommu_mapped()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ecaa4902439298f6b0e29f47424a86b310a9ff4f Mon Sep 17 00:00:00 2001
From: D Scott Phillips <scott(a)os.amperecomputing.com>
Date: Thu, 30 Mar 2023 17:30:54 +0300
Subject: [PATCH] xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a
passthrough iommu
Previously the quirk was skipped when no iommu was present. The same
rationale for skipping the quirk also applies in the iommu.passthrough=1
case.
Skip applying the XHCI_ZERO_64B_REGS quirk if the device's iommu domain is
passthrough.
Fixes: 12de0a35c996 ("xhci: Add quirk to zero 64bit registers on Renesas PCIe controllers")
Cc: stable <stable(a)kernel.org>
Signed-off-by: D Scott Phillips <scott(a)os.amperecomputing.com>
Acked-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230330143056.1390020-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 6183ce8574b1..bdb6dd819a3b 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -9,6 +9,7 @@
*/
#include <linux/pci.h>
+#include <linux/iommu.h>
#include <linux/iopoll.h>
#include <linux/irq.h>
#include <linux/log2.h>
@@ -228,6 +229,7 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us)
static void xhci_zero_64b_regs(struct xhci_hcd *xhci)
{
struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
+ struct iommu_domain *domain;
int err, i;
u64 val;
u32 intrs;
@@ -246,7 +248,9 @@ static void xhci_zero_64b_regs(struct xhci_hcd *xhci)
* an iommu. Doing anything when there is no iommu is definitely
* unsafe...
*/
- if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
+ domain = iommu_get_domain_for_dev(dev);
+ if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !domain ||
+ domain->type == IOMMU_DOMAIN_IDENTITY)
return;
xhci_info(xhci, "Zeroing 64bit base registers, expecting fault\n");
Greetings.
I am Ms Nadage Lassou,I have something important to discuss with you.
i will send you the details once i hear from you.
Thanks,
Ms Nadage Lassou
In dual-bridges scenario, some bugs were found for irq
controllers drivers, so the patch serie is used to fix them.
V2->V3:
1. Add Cc: stable(a)vger.kernel.org to make them be backported to stable branch
V1->V2:
1. Remove all of ChangeID in patches
2. Exchange the sequence of some patches
3. Adjust code style of if...else... in patch[2]
Jianmin Lv (5):
irqchip/loongson-eiointc: Fix returned value on parsing MADT
irqchip/loongson-eiointc: Fix incorrect use of acpi_get_vec_parent
irqchip/loongson-eiointc: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix pch_pic_acpi_init calling
drivers/irqchip/irq-loongson-eiointc.c | 32 ++++++++++++++++++--------
drivers/irqchip/irq-loongson-pch-pic.c | 6 ++++-
2 files changed, 27 insertions(+), 11 deletions(-)
--
2.31.1
The patch titled
Subject: scripts/gdb: fix lx-timerlist for Python3
has been added to the -mm mm-nonmm-unstable branch. Its filename is
scripts-gdb-fix-lx-timerlist-for-python3.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peng Liu <liupeng17(a)lenovo.com>
Subject: scripts/gdb: fix lx-timerlist for Python3
Date: Tue, 21 Mar 2023 14:19:29 +0800
Below incompatibilities between Python2 and Python3 made lx-timerlist fail
to run under Python3.
o xrange() is replaced by range() in Python3
o bytes and str are different types in Python3
o the return value of Inferior.read_memory() is memoryview object in
Python3
akpm: cc stable so that older kernels are properly debuggable under newer
Python.
Link: https://lkml.kernel.org/r/TYCP286MB2146EE1180A4D5176CBA8AB2C6819@TYCP286MB2…
Signed-off-by: Peng Liu <liupeng17(a)lenovo.com>
Reviewed-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Florian Fainelli <f.fainelli(a)gmail.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/timerlist.py | 4 +++-
scripts/gdb/linux/utils.py | 5 ++++-
2 files changed, 7 insertions(+), 2 deletions(-)
--- a/scripts/gdb/linux/timerlist.py~scripts-gdb-fix-lx-timerlist-for-python3
+++ a/scripts/gdb/linux/timerlist.py
@@ -72,7 +72,7 @@ def print_cpu(hrtimer_bases, cpu, max_cl
ts = cpus.per_cpu(tick_sched_ptr, cpu)
text = "cpu: {}\n".format(cpu)
- for i in xrange(max_clock_bases):
+ for i in range(max_clock_bases):
text += " clock {}:\n".format(i)
text += print_base(cpu_base['clock_base'][i])
@@ -157,6 +157,8 @@ def pr_cpumask(mask):
num_bytes = (nr_cpu_ids + 7) / 8
buf = utils.read_memoryview(inf, bits, num_bytes).tobytes()
buf = binascii.b2a_hex(buf)
+ if type(buf) is not str:
+ buf=buf.decode()
chunks = []
i = num_bytes
--- a/scripts/gdb/linux/utils.py~scripts-gdb-fix-lx-timerlist-for-python3
+++ a/scripts/gdb/linux/utils.py
@@ -88,7 +88,10 @@ def get_target_endianness():
def read_memoryview(inf, start, length):
- return memoryview(inf.read_memory(start, length))
+ m = inf.read_memory(start, length)
+ if type(m) is memoryview:
+ return m
+ return memoryview(m)
def read_u16(buffer, offset):
_
Patches currently in -mm which might be from liupeng17(a)lenovo.com are
scripts-gdb-fix-lx-timerlist-for-struct-timequeue_head-change.patch
scripts-gdb-fix-lx-timerlist-for-python3.patch
scripts-gdb-fix-lx-timerlist-for-hrtimer_max_clock_bases-printing.patch
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x6ce/0x6e0
Write of size 8 at addr 0000000000000000 by task syz-executor.3/3514
CPU: 3 PID: 3514 Comm: syz-executor.3 Not tainted 5.10.0-dirty #1
Call Trace:
dump_stack+0xbe/0xfd
__kasan_report.cold+0x34/0x84
kasan_report+0x3a/0x50
check_memory_region+0xfd/0x1f0
bdi_split_work_to_wbs+0x6ce/0x6e0
__writeback_inodes_sb_nr+0x184/0x1f0
try_to_writeback_inodes_sb+0x7f/0xa0
ext4_nonda_switch+0x125/0x130
ext4_da_write_begin+0x126/0x6e0
generic_perform_write+0x199/0x3a0
ext4_buffered_write_iter+0x16d/0x2b0
ext4_file_write_iter+0xea/0x140
new_sync_write+0x2fa/0x430
vfs_write+0x4a1/0x570
ksys_write+0xf6/0x1f0
do_syscall_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x45513d
[...]
==================================================================
Above issue may happen as follows:
cpu1 cpu2
----------------------------|----------------------------
ext4_nonda_switch
try_to_writeback_inodes_sb
__writeback_inodes_sb_nr
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC) ---> alloc mem failed
inode_switch_wbs
inode_switch_wbs_work_fn
wb_put_many
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
&wb->release_work
cgwb_release_workfn
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count) ---> ref->data = NULL
atomic64_add(i, v) ---> trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all wbs.
If the allocation of new work fails, the on-stack fallback will be used and
the reference count of the current wb is increased afterwards. If cgroup
writeback membership switches occur before getting the reference count and
the current wb is released as old_wd, then calling wb_get() or wb_put()
will trigger the null pointer dereference above.
A similar problem is fixed in commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches"), but the patch only
adds locks to sync_inodes_sb() and not to the __writeback_inodes_sb_nr()
function that also calls bdi_split_work_to_wbs() function. So avoid the
above race by adding the same lock to __writeback_inodes_sb_nr() and
expanding the range of wb_switch_rwsem held in inode_switch_wbs_work_fn().
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Cc: stable(a)vger.kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
fs/fs-writeback.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..52825aaf549b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -506,13 +506,13 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
spin_unlock(&new_wb->list_lock);
spin_unlock(&old_wb->list_lock);
- up_read(&bdi->wb_switch_rwsem);
-
if (nr_switched) {
wb_wakeup(new_wb);
wb_put_many(old_wb, nr_switched);
}
+ up_read(&bdi->wb_switch_rwsem);
+
for (inodep = isw->inodes; *inodep; inodep++)
iput(*inodep);
wb_put(new_wb);
@@ -936,6 +936,11 @@ static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
* have dirty inodes. If @base_work->nr_page isn't %LONG_MAX, it's
* distributed to the busy wbs according to each wb's proportion in the
* total active write bandwidth of @bdi.
+ *
+ * Called under &bdi->wb_switch_rwsem, otherwise bdi_split_work_to_wbs()
+ * may race against cgwb (cgroup writeback) membership switches, which may
+ * cause some inodes to fail to write back, or even trigger a null pointer
+ * dereference using a freed wb.
*/
static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
struct wb_writeback_work *base_work,
@@ -2637,8 +2642,11 @@ static void __writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr,
return;
WARN_ON(!rwsem_is_locked(&sb->s_umount));
+ /* protect against inode wb switch, see inode_switch_wbs_work_fn() */
+ bdi_down_write_wb_switch_rwsem(bdi);
bdi_split_work_to_wbs(sb->s_bdi, &work, skip_if_busy);
wb_wait_for_completion(&done);
+ bdi_up_write_wb_switch_rwsem(bdi);
}
/**
--
2.31.1
Any multiplication between GENMASK(31, 0) and a number bigger than 1
will be truncated because of the overflow, if the size of unsigned long
is 32 bits.
Replaced GENMASK with GENMASK_ULL to make sure that multiplication will
be between 64 bits values.
Cc: <stable(a)vger.kernel.org> # 5.15+
Fixes: 514def5dd339 ("phy: nxp-c45-tja11xx: add timestamping support")
Signed-off-by: Radu Pirea (OSS) <radu-nicolae.pirea(a)oss.nxp.com>
---
drivers/net/phy/nxp-c45-tja11xx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/nxp-c45-tja11xx.c b/drivers/net/phy/nxp-c45-tja11xx.c
index 27738d1ae9ea..029875a59ff8 100644
--- a/drivers/net/phy/nxp-c45-tja11xx.c
+++ b/drivers/net/phy/nxp-c45-tja11xx.c
@@ -191,7 +191,7 @@
#define MAX_ID_PS 2260U
#define DEFAULT_ID_PS 2000U
-#define PPM_TO_SUBNS_INC(ppb) div_u64(GENMASK(31, 0) * (ppb) * \
+#define PPM_TO_SUBNS_INC(ppb) div_u64(GENMASK_ULL(31, 0) * (ppb) * \
PTP_CLK_PERIOD_100BT1, NSEC_PER_SEC)
#define NXP_C45_SKB_CB(skb) ((struct nxp_c45_skb_cb *)(skb)->cb)
--
2.34.1
By default the indirect state sampler data (border colors) are stored
in the same heap as the SAMPLER_STATE structure. For userspace drivers
that can be 2 different heaps (dynamic state heap & bindless sampler
state heap). This means that border colors have to copied in 2
different places so that the same SAMPLER_STATE structure find the
right data.
This change is forcing the indirect state sampler data to only be in
the dynamic state pool (more convinient for userspace drivers, they
only have to have one copy of the border colors). This is reproducing
the behavior of the Windows drivers.
BSpec: 46052
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin(a)intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Haridhar Kalvala <haridhar.kalvala(a)intel.com>
---
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
drivers/gpu/drm/i915/gt/intel_workarounds.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 492b3de6678d7..fd1f9cd35e9d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1145,6 +1145,7 @@
#define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9)
#define GEN11_SAMPLER_ENABLE_HEADLESS_MSG REG_BIT(5)
#define MTL_DISABLE_SAMPLER_SC_OOO REG_BIT(3)
+#define GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE REG_BIT(0)
#define GEN9_HALF_SLICE_CHICKEN7 MCR_REG(0xe194)
#define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA REG_BIT(15)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 6ea453ddd0116..b925ef47304b6 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2971,6 +2971,25 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
add_render_compute_tuning_settings(i915, wal);
+ if (GRAPHICS_VER(i915) >= 11) {
+ /* This is not a Wa (although referred to as
+ * WaSetInidrectStateOverride in places), this allows
+ * applications that reference sampler states through
+ * the BindlessSamplerStateBaseAddress to have their
+ * border color relative to DynamicStateBaseAddress
+ * rather than BindlessSamplerStateBaseAddress.
+ *
+ * Otherwise SAMPLER_STATE border colors have to be
+ * copied in multiple heaps (DynamicStateBaseAddress &
+ * BindlessSamplerStateBaseAddress)
+ *
+ * BSpec: 46052
+ */
+ wa_mcr_masked_en(wal,
+ GEN10_SAMPLER_MODE,
+ GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE);
+ }
+
if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_B0, STEP_FOREVER) ||
IS_MTL_GRAPHICS_STEP(i915, P, STEP_B0, STEP_FOREVER))
/* Wa_14017856879 */
--
2.34.1
The patch titled
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
Date: Fri, 7 Apr 2023 12:07:18 +0800
In mas_alloc_nodes(), there is such a piece of code:
while (requested) {
...
node->node_count = 0;
...
}
"node->node_count = 0" means to initialize the node_count field of the new
node, but the node may not be a new node. It may be a node that existed
before and node_count has a value, setting it to 0 will cause a memory
leak. At this time, mas->alloc->total will be greater than the actual
number of nodes in the linked list, which may cause many other errors.
For example, out-of-bounds access in mas_pop_node(), and mas_pop_node()
may return addresses that should not be used. Fix it by initializing
node_count only for new nodes.
Link: https://lkml.kernel.org/r/20230407040718.99064-2-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug
+++ a/lib/maple_tree.c
@@ -1303,26 +1303,18 @@ static inline void mas_alloc_nodes(struc
node = mas->alloc;
node->request_count = 0;
while (requested) {
- max_req = MAPLE_ALLOC_SLOTS;
- if (node->node_count) {
- unsigned int offset = node->node_count;
-
- slots = (void **)&node->slot[offset];
- max_req -= offset;
- } else {
- slots = (void **)&node->slot;
- }
-
+ max_req = MAPLE_ALLOC_SLOTS - node->node_count;
+ slots = (void **)&node->slot[node->node_count];
max_req = min(requested, max_req);
count = mt_alloc_bulk(gfp, max_req, slots);
if (!count)
goto nomem_bulk;
+ if (node->node_count == 0)
+ node->slot[0]->node_count = 0;
node->node_count += count;
allocated += count;
node = node->slot[0];
- node->node_count = 0;
- node->request_count = 0;
requested -= count;
}
mas->alloc->total = allocated;
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug.patch
mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch
maple_tree-simplify-mas_wr_node_walk.patch
This reverts commit d6af2ed29c7c1c311b96dac989dcb991e90ee195.
The statement "the hv_pci_bus_exit() call releases structures of all its
child devices" in commit d6af2ed29c7c is not true: in the path
hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the
parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the
child "struct hv_pci_dev *hpdev" that is created earlier in
pci_devices_present_work() -> new_pcichild_device().
The commit d6af2ed29c7c was originally made in July 2020 for RHEL 7.7,
where the old version of hv_pci_bus_exit() was used; when the commit was
rebased and merged into the upstream, people didn't notice that it's
not really necessary. The commit itself doesn't cause any issue, but it
makes hv_pci_probe() more complicated. Revert it to facilitate some
upcoming changes to hv_pci_probe().
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Acked-by: Wei Hu <weh(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
No change to the patch body.
Added Wei Hu's Acked-by.
Added Cc:stable
drivers/pci/controller/pci-hyperv.c | 71 ++++++++++++++---------------
1 file changed, 34 insertions(+), 37 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 46df6d093d683..48feab095a144 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3225,8 +3225,10 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
struct pci_bus_d0_entry *d0_entry;
struct hv_pci_compl comp_pkt;
struct pci_packet *pkt;
+ bool retry = true;
int ret;
+enter_d0_retry:
/*
* Tell the host that the bus is ready to use, and moved into the
* powered-on state. This includes telling the host which region
@@ -3253,6 +3255,38 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
if (ret)
goto exit;
+ /*
+ * In certain case (Kdump) the pci device of interest was
+ * not cleanly shut down and resource is still held on host
+ * side, the host could return invalid device status.
+ * We need to explicitly request host to release the resource
+ * and try to enter D0 again.
+ */
+ if (comp_pkt.completion_status < 0 && retry) {
+ retry = false;
+
+ dev_err(&hdev->device, "Retrying D0 Entry\n");
+
+ /*
+ * Hv_pci_bus_exit() calls hv_send_resource_released()
+ * to free up resources of its child devices.
+ * In the kdump kernel we need to set the
+ * wslot_res_allocated to 255 so it scans all child
+ * devices to release resources allocated in the
+ * normal kernel before panic happened.
+ */
+ hbus->wslot_res_allocated = 255;
+
+ ret = hv_pci_bus_exit(hdev, true);
+
+ if (ret == 0) {
+ kfree(pkt);
+ goto enter_d0_retry;
+ }
+ dev_err(&hdev->device,
+ "Retrying D0 failed with ret %d\n", ret);
+ }
+
if (comp_pkt.completion_status < 0) {
dev_err(&hdev->device,
"PCI Pass-through VSP failed D0 Entry with status %x\n",
@@ -3493,7 +3527,6 @@ static int hv_pci_probe(struct hv_device *hdev,
struct hv_pcibus_device *hbus;
u16 dom_req, dom;
char *name;
- bool enter_d0_retry = true;
int ret;
/*
@@ -3633,47 +3666,11 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_fwnode;
-retry:
ret = hv_pci_query_relations(hdev);
if (ret)
goto free_irq_domain;
ret = hv_pci_enter_d0(hdev);
- /*
- * In certain case (Kdump) the pci device of interest was
- * not cleanly shut down and resource is still held on host
- * side, the host could return invalid device status.
- * We need to explicitly request host to release the resource
- * and try to enter D0 again.
- * Since the hv_pci_bus_exit() call releases structures
- * of all its child devices, we need to start the retry from
- * hv_pci_query_relations() call, requesting host to send
- * the synchronous child device relations message before this
- * information is needed in hv_send_resources_allocated()
- * call later.
- */
- if (ret == -EPROTO && enter_d0_retry) {
- enter_d0_retry = false;
-
- dev_err(&hdev->device, "Retrying D0 Entry\n");
-
- /*
- * Hv_pci_bus_exit() calls hv_send_resources_released()
- * to free up resources of its child devices.
- * In the kdump kernel we need to set the
- * wslot_res_allocated to 255 so it scans all child
- * devices to release resources allocated in the
- * normal kernel before panic happened.
- */
- hbus->wslot_res_allocated = 255;
- ret = hv_pci_bus_exit(hdev, true);
-
- if (ret == 0)
- goto retry;
-
- dev_err(&hdev->device,
- "Retrying D0 failed with ret %d\n", ret);
- }
if (ret)
goto free_irq_domain;
--
2.25.1
In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(): when
hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
loaded, the PCI device driver's probe() function is already called in
create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
to call the PCI device driver's remove() function and remove the device
reliably; if the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called; if the PCI device driver's probe() is already
running (e.g., systemd-udev is loading the PCI device driver), it must
be holding the per-device lock, and after the probe() finishes and releases
the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
able to proceed to remove the device reliably.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
Removed the "debug code".
Fixed the "goto out" in hv_pci_resume() [Michael Kelley]
Added Cc:stable
drivers/pci/controller/pci-hyperv.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 48feab095a144..3ae2f99dea8c2 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -489,7 +489,10 @@ struct hv_pcibus_device {
struct fwnode_handle *fwnode;
/* Protocol version negotiated with the host */
enum pci_protocol_version_t protocol_version;
+
+ struct mutex state_lock;
enum hv_pcibus_state state;
+
struct hv_device *hdev;
resource_size_t low_mmio_space;
resource_size_t high_mmio_space;
@@ -2512,6 +2515,8 @@ static void pci_devices_present_work(struct work_struct *work)
if (!dr)
return;
+ mutex_lock(&hbus->state_lock);
+
/* First, mark all existing children as reported missing. */
spin_lock_irqsave(&hbus->device_list_lock, flags);
list_for_each_entry(hpdev, &hbus->children, list_entry) {
@@ -2593,6 +2598,8 @@ static void pci_devices_present_work(struct work_struct *work)
break;
}
+ mutex_unlock(&hbus->state_lock);
+
kfree(dr);
}
@@ -2741,6 +2748,8 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
+ mutex_lock(&hbus->state_lock);
+
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2777,6 +2786,8 @@ static void hv_eject_device_work(struct work_struct *work)
put_pcichild(hpdev);
put_pcichild(hpdev);
/* hpdev has been freed. Do not use it any more. */
+
+ mutex_unlock(&hbus->state_lock);
}
/**
@@ -3562,6 +3573,7 @@ static int hv_pci_probe(struct hv_device *hdev,
return -ENOMEM;
hbus->bridge = bridge;
+ mutex_init(&hbus->state_lock);
hbus->state = hv_pcibus_init;
hbus->wslot_res_allocated = -1;
@@ -3670,9 +3682,11 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_irq_domain;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto free_irq_domain;
+ goto release_state_lock;
ret = hv_pci_allocate_bridge_windows(hbus);
if (ret)
@@ -3690,12 +3704,15 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_windows;
+ mutex_unlock(&hbus->state_lock);
return 0;
free_windows:
hv_pci_free_bridge_windows(hbus);
exit_d0:
(void) hv_pci_bus_exit(hdev, true);
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
free_irq_domain:
irq_domain_remove(hbus->irq_domain);
free_fwnode:
@@ -3945,20 +3962,26 @@ static int hv_pci_resume(struct hv_device *hdev)
if (ret)
goto out;
+ mutex_lock(&hbus->state_lock);
+
ret = hv_pci_enter_d0(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
ret = hv_send_resources_allocated(hdev);
if (ret)
- goto out;
+ goto release_state_lock;
prepopulate_bars(hbus);
hv_pci_restore_msi_state(hbus);
hbus->state = hv_pcibus_installed;
+ mutex_unlock(&hbus->state_lock);
return 0;
+
+release_state_lock:
+ mutex_unlock(&hbus->state_lock);
out:
vmbus_close(hdev->channel);
return ret;
--
2.25.1
The hpdev->state is never really useful. The only use in
hv_pci_eject_device() and hv_eject_device_work() is not really necessary.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
drivers/pci/controller/pci-hyperv.c | 12 ------------
1 file changed, 12 deletions(-)
v2:
No change to the patch body.
Added Cc:stable
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 1b11cf7391933..46df6d093d683 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -553,19 +553,10 @@ struct hv_dr_state {
struct hv_pcidev_description func[];
};
-enum hv_pcichild_state {
- hv_pcichild_init = 0,
- hv_pcichild_requirements,
- hv_pcichild_resourced,
- hv_pcichild_ejecting,
- hv_pcichild_maximum
-};
-
struct hv_pci_dev {
/* List protected by pci_rescan_remove_lock */
struct list_head list_entry;
refcount_t refs;
- enum hv_pcichild_state state;
struct pci_slot *pci_slot;
struct hv_pcidev_description desc;
bool reported_missing;
@@ -2750,8 +2741,6 @@ static void hv_eject_device_work(struct work_struct *work)
hpdev = container_of(work, struct hv_pci_dev, wrk);
hbus = hpdev->hbus;
- WARN_ON(hpdev->state != hv_pcichild_ejecting);
-
/*
* Ejection can come before or after the PCI bus has been set up, so
* attempt to find it and tear down the bus state, if it exists. This
@@ -2808,7 +2797,6 @@ static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
return;
}
- hpdev->state = hv_pcichild_ejecting;
get_pcichild(hpdev);
INIT_WORK(&hpdev->wrk, hv_eject_device_work);
queue_work(hbus->wq, &hpdev->wrk);
--
2.25.1
When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work;
if the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
hv_eject_device_work(); if the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
leave data->chip_data with its default value of NULL; later, when the PCI
device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
Removed the "debug code".
No change to the patch body.
Added Cc:stable
drivers/pci/controller/pci-hyperv.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index b82c7cde19e66..1b11cf7391933 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -643,6 +643,11 @@ static void hv_arch_irq_unmask(struct irq_data *data)
pbus = pdev->bus;
hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
int_desc = data->chip_data;
+ if (!int_desc) {
+ dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n",
+ __func__, data->irq);
+ return;
+ }
spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
@@ -1911,12 +1916,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
hv_pci_onchannelcallback(hbus);
spin_unlock_irqrestore(&channel->sched_lock, flags);
- if (hpdev->state == hv_pcichild_ejecting) {
- dev_err_once(&hbus->hdev->device,
- "the device is being ejected\n");
- goto enable_tasklet;
- }
-
udelay(100);
}
--
2.25.1
Fix the longstanding race between hv_pci_query_relations() and
survey_child_resources() by flushing the workqueue before we exit from
hv_pci_query_relations().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
Removed the "debug code".
No change to the patch body.
Added Cc:stable
drivers/pci/controller/pci-hyperv.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index f33370b756283..b82c7cde19e66 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3308,6 +3308,19 @@ static int hv_pci_query_relations(struct hv_device *hdev)
if (!ret)
ret = wait_for_response(hdev, &comp);
+ /*
+ * In the case of fast device addition/removal, it's possible that
+ * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
+ * already got a PCI_BUS_RELATIONS* message from the host and the
+ * channel callback already scheduled a work to hbus->wq, which can be
+ * running survey_child_resources() -> complete(&hbus->survey_event),
+ * even after hv_pci_query_relations() exits and the stack variable
+ * 'comp' is no longer valid. This can cause a strange hang issue
+ * or sometimes a page fault. Flush hbus->wq before we exit from
+ * hv_pci_query_relations() to avoid the issues.
+ */
+ flush_workqueue(hbus->wq);
+
return ret;
}
--
2.25.1
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
---
Difference from v2: replace each AFLAGS_REMOVE_* assignment with a
single aflags-remove-y line, and use foreach to add the -Wa versions,
as suggested by Masahiro Yamada.
arch/riscv/purgatory/Makefile | 7 +------
arch/x86/purgatory/Makefile | 3 +--
2 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index d16bf715a586..5730797a6b40 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -84,12 +84,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_ctype.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strcmp.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strlen.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strncmp.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 17f09dc26381..82fec66d46d2 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
base-commit: da8e7da11e4ba758caf4c149cc8d8cd555aefe5f
--
2.37.1
Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
key on the keyboard fails. However, if the PC was switched off and on
again (or reset), the resume is OK. The APU is a Ryzen 5600G.
Bisecting between 6.1.21/22 turned up this:
Author: Tim Huang <tim.huang(a)amd.com>
Date: Thu Mar 9 16:27:51 2023 +0800
drm/amdgpu: skip ASIC reset for APUs when go to S4
commit b589626674de94d977e81c99bf7905872b991197 upstream.
For GC IP v11.0.4/11, PSP TMR need to be reserved
for ASIC mode2 reset. But for S4, when psp suspend,
it will destroy the TMR that fails the ASIC reset.
[...]
Reverting the commit solves the problem.
Thanks.
Rainer Fiebig
--
The truth always turns out to be simpler than you thought.
Richard Feynman