This patch series backports a few VM preemption_status, steal_time and
PV TLB flushing fixes to 5.10 stable kernel.
Most of the changes backport cleanly except i had to work around a few
becauseof missing support/APIs in 5.10 kernel. I have captured those in
the changelog as well in the individual patches.
Changelog
- Use mark_page_dirty_in_slot api without kvm argument (KVM: x86: Fix
recording of guest steal time / preempted status)
- Avoid checking for xen_msr and SEV-ES conditions (KVM: x86:
do not set st->preempted when going back to user space)
- Use VCPU_STAT macro to expose preemption_reported and
preemption_other fields (KVM: x86: do not report a vCPU as preempted
outside instruction boundaries)
David Woodhouse (2):
KVM: x86: Fix recording of guest steal time / preempted status
KVM: Fix steal time asm constraints
Lai Jiangshan (1):
KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior
Paolo Bonzini (5):
KVM: x86: do not set st->preempted when going back to user space
KVM: x86: do not report a vCPU as preempted outside instruction
boundaries
KVM: x86: revalidate steal time cache if MSR value changes
KVM: x86: do not report preemption if the steal time cache is stale
KVM: x86: move guest_pv_has out of user_access section
Sean Christopherson (1):
KVM: x86: Remove obsolete disabling of page faults in
kvm_arch_vcpu_put()
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 164 ++++++++++++++++++++++----------
4 files changed, 122 insertions(+), 50 deletions(-)
--
2.37.1
Apparently despite it being marked inline, the compiler
may not inline __down_read_common() which makes it difficult
to identify the cause of lock contention, as the blocked
function in traceevents will always be listed as
__down_read_common().
So this patch adds __always_inline annotation to the common
function (as well as the inlined helper callers) to force it to
be inlined so the blocking function will be listed (via Wchan)
in traceevents.
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Murray <timmurray(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Waiman Long <longman(a)redhat.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: kernel-team(a)android.com
Cc: stable(a)vger.kernel.org
Fixes: c995e638ccbb ("locking/rwsem: Fold __down_{read,write}*()")
Reviewed-by: Waiman Long <longman(a)redhat.com>
Reported-by: Tim Murray <timmurray(a)google.com>
Signed-off-by: John Stultz <jstultz(a)google.com>
---
v2: Reworked to use __always_inline instead of __sched as
suggested by Waiman Long
v3: Add __always_inline annotations to currently inlined users
of __down_read_common() to avoid the compiler later doing the
same thing there. (Suggested by Peter).
---
kernel/locking/rwsem.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index acb5a50309a1..9eabd585ce7a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1240,7 +1240,7 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
/*
* lock for reading
*/
-static inline int __down_read_common(struct rw_semaphore *sem, int state)
+static __always_inline int __down_read_common(struct rw_semaphore *sem, int state)
{
int ret = 0;
long count;
@@ -1258,17 +1258,17 @@ static inline int __down_read_common(struct rw_semaphore *sem, int state)
return ret;
}
-static inline void __down_read(struct rw_semaphore *sem)
+static __always_inline void __down_read(struct rw_semaphore *sem)
{
__down_read_common(sem, TASK_UNINTERRUPTIBLE);
}
-static inline int __down_read_interruptible(struct rw_semaphore *sem)
+static __always_inline int __down_read_interruptible(struct rw_semaphore *sem)
{
return __down_read_common(sem, TASK_INTERRUPTIBLE);
}
-static inline int __down_read_killable(struct rw_semaphore *sem)
+static __always_inline int __down_read_killable(struct rw_semaphore *sem)
{
return __down_read_common(sem, TASK_KILLABLE);
}
--
2.40.1.495.gc816e09b53d-goog
The "serial/8250: Use fifo in 8250 console driver" commit
has revealed an issue of not re-enabling FIFO after resume.
The problematic path is inside uart_resume_port() function.
First, when the console device is re-enabled,
a call to uport->ops->set_termios() internally initializes FIFO
(in serial8250_do_set_termios()), although further code
disables it by issuing ops->startup() (pointer to serial8250_do_startup,
internally calling serial8250_clear_fifos()).
There is even a comment saying "Clear the FIFO buffers and disable them.
(they will be reenabled in set_termios())", but in this scenario,
set_termios() has been called already and FIFO remains disabled.
This patch address the issue by reversing the order - first checks
if tty port is suspended and performs actions accordingly
(e.g. call to ops->startup()), then tries to re-enable
the console device after suspend (and call to uport->ops->set_termios()).
Signed-off-by: Lukasz Majczak <lma(a)semihalf.com>
Cc: <stable(a)vger.kernel.org> # 6.1+
---
drivers/tty/serial/serial_core.c | 54 ++++++++++++++++----------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 394a05c09d87..57a153adba3a 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2406,33 +2406,6 @@ int uart_resume_port(struct uart_driver *drv, struct uart_port *uport)
put_device(tty_dev);
uport->suspended = 0;
- /*
- * Re-enable the console device after suspending.
- */
- if (uart_console(uport)) {
- /*
- * First try to use the console cflag setting.
- */
- memset(&termios, 0, sizeof(struct ktermios));
- termios.c_cflag = uport->cons->cflag;
- termios.c_ispeed = uport->cons->ispeed;
- termios.c_ospeed = uport->cons->ospeed;
-
- /*
- * If that's unset, use the tty termios setting.
- */
- if (port->tty && termios.c_cflag == 0)
- termios = port->tty->termios;
-
- if (console_suspend_enabled)
- uart_change_pm(state, UART_PM_STATE_ON);
- uport->ops->set_termios(uport, &termios, NULL);
- if (!console_suspend_enabled && uport->ops->start_rx)
- uport->ops->start_rx(uport);
- if (console_suspend_enabled)
- console_start(uport->cons);
- }
-
if (tty_port_suspended(port)) {
const struct uart_ops *ops = uport->ops;
int ret;
@@ -2471,6 +2444,33 @@ int uart_resume_port(struct uart_driver *drv, struct uart_port *uport)
tty_port_set_suspended(port, false);
}
+ /*
+ * Re-enable the console device after suspending.
+ */
+ if (uart_console(uport)) {
+ /*
+ * First try to use the console cflag setting.
+ */
+ memset(&termios, 0, sizeof(struct ktermios));
+ termios.c_cflag = uport->cons->cflag;
+ termios.c_ispeed = uport->cons->ispeed;
+ termios.c_ospeed = uport->cons->ospeed;
+
+ /*
+ * If that's unset, use the tty termios setting.
+ */
+ if (port->tty && termios.c_cflag == 0)
+ termios = port->tty->termios;
+
+ if (console_suspend_enabled)
+ uart_change_pm(state, UART_PM_STATE_ON);
+ uport->ops->set_termios(uport, &termios, NULL);
+ if (!console_suspend_enabled && uport->ops->start_rx)
+ uport->ops->start_rx(uport);
+ if (console_suspend_enabled)
+ console_start(uport->cons);
+ }
+
mutex_unlock(&port->mutex);
return 0;
--
2.40.0.577.gac1e443424-goog
From: Xiubo Li <xiubli(a)redhat.com>
Blindly expanding the readahead windows will cause unneccessary
pagecache thrashing and also will introdue the network workload.
We should disable expanding the windows if the readahead is disabled
and also shouldn't expand the windows too much.
Expanding forward firstly instead of expanding backward for possible
sequential reads.
Bound `rreq->len` to the actual file size to restore the previous page
cache usage.
Cc: stable(a)vger.kernel.org
Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.ed…
URL: https://www.spinics.net/lists/ceph-users/msg76183.html
Cc: Hu Weiwen <sehuww(a)mail.scut.edu.cn>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
V4:
- two small cleanup from Ilya's comments. Thanks
fs/ceph/addr.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index ca4dc6450887..683ba9fbd590 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -188,16 +188,30 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
struct inode *inode = rreq->inode;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_file_layout *lo = &ci->i_layout;
+ unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
+ unsigned long max_len = max_pages << PAGE_SHIFT;
+ loff_t end = rreq->start + rreq->len, new_end;
u32 blockoff;
- u64 blockno;
- /* Expand the start downward */
- blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
- rreq->start = blockno * lo->stripe_unit;
- rreq->len += blockoff;
+ /* Readahead is disabled */
+ if (!max_pages)
+ return;
- /* Now, round up the length to the next block */
- rreq->len = roundup(rreq->len, lo->stripe_unit);
+ /*
+ * Try to expand the length forward by rounding up it to the next
+ * block, but do not exceed the file size, unless the original
+ * request already exceeds it.
+ */
+ new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
+ if (new_end > end && new_end <= rreq->start + max_len)
+ rreq->len = new_end - rreq->start;
+
+ /* Try to expand the start downward */
+ div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
+ if (rreq->len + blockoff <= max_len) {
+ rreq->start -= blockoff;
+ rreq->len += blockoff;
+ }
}
static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)
--
2.40.0
When mgag200 switched from simple KMS to regular atomic helpers,
the initialization of the gamma settings was lost.
This leads to a black screen, if the bios/uefi doesn't use the same
pixel color depth.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2171155
Fixes: 1baf9127c482 ("drm/mgag200: Replace simple-KMS with regular atomic helpers")
Tested-by: Phil Oester <kernel(a)linuxace.com>
Signed-off-by: Jocelyn Falempe <jfalempe(a)redhat.com>
---
drivers/gpu/drm/mgag200/mgag200_mode.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
index 461da1409fdf..911d46741e40 100644
--- a/drivers/gpu/drm/mgag200/mgag200_mode.c
+++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
@@ -819,6 +819,11 @@ static void mgag200_crtc_helper_atomic_enable(struct drm_crtc *crtc,
else if (mdev->type == G200_EV)
mgag200_g200ev_set_hiprilvl(mdev);
+ if (crtc_state->gamma_lut)
+ mgag200_crtc_set_gamma(mdev, format, crtc_state->gamma_lut->data);
+ else
+ mgag200_crtc_set_gamma_linear(mdev, format);
+
mgag200_enable_display(mdev);
if (mdev->type == G200_WB || mdev->type == G200_EW3)
base-commit: 1baf9127c482a3a58aef81d92ae751798e2db202
--
2.39.2
The current uses of PageAnon in page table check functions can lead to
type confusion bugs between struct page and slab [1], if slab pages are
accidentally mapped into the user space. This is because slab reuses the
bits in struct page to store its internal states, which renders PageAnon
ineffective on slab pages.
Since slab pages are not expected to be mapped into user spaces, this
patch adds BUG_ON(PageSlab(page)) checks to ensure that slab pages are
not inadvertently mapped. Otherwise, there must be some bugs in the
kernel.
Reported-by: syzbot+fcf1a817ceb50935ce99(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: df4e817b7108 ("mm: page table check")
Cc: <stable(a)vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000(a)pku.edu.cn>
---
include/linux/page-flags.h | 6 ++++++
mm/page_table_check.c | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1c68d67b8..7475a5399 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -617,6 +617,12 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
* Please note that, confusingly, "page_mapping" refers to the inode
* address_space which maps the page from disk; whereas "page_mapped"
* refers to user virtual address space into which the page is mapped.
+ *
+ * For slab pages, since slab reuses the bits in struct page to store its
+ * internal states, the page->mapping does not exist as such, nor do these
+ * flags below. So in order to avoid testing non-existent bits, please
+ * make sure that PageSlab(page) actually evaluates to false before calling
+ * the following functions (e.g., PageAnon). See slab.h.
*/
#define PAGE_MAPPING_ANON 0x1
#define PAGE_MAPPING_MOVABLE 0x2
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 25d8610c0..f2baf97d5 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -71,6 +71,8 @@ static void page_table_check_clear(struct mm_struct *mm, unsigned long addr,
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+
+ BUG_ON(PageSlab(page));
anon = PageAnon(page);
for (i = 0; i < pgcnt; i++) {
@@ -107,6 +109,8 @@ static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+
+ BUG_ON(PageSlab(page));
anon = PageAnon(page);
for (i = 0; i < pgcnt; i++) {
@@ -133,6 +137,8 @@ void __page_table_check_zero(struct page *page, unsigned int order)
struct page_ext *page_ext;
unsigned long i;
+ BUG_ON(PageSlab(page));
+
page_ext = page_ext_get(page);
BUG_ON(!page_ext);
for (i = 0; i < (1ul << order); i++) {
--
2.40.1