Hi Jason and Doug,
Here are most of our updates for 4.17. I will follow this up with a small
16B series then I still have a few more patches that are waiting on
some more thorough testing. Should be able to get them on the list tomrrow or
Friday at the latest, wanted to get these out now. I don't think anything
is really that scary in here.
These are mostly driver fixes. Patches 4,7,8,11, and 14 are marked stable.
They didn't get sent for the -rc because they fix really old issues.
Patch 5 is a core fix. I should have sent it a bit sooner, sorry about that but
it's pretty trivial so I decided to include it as well rather than wait for
4.18.
---
Alex Estrin (2):
IB/hfi1: Complete check for locally terminated smp
IB/{hfi1,qib}: Add handling of kernel restart
Ashutosh Dixit (1):
IB/core: Fix rkey invalidation from user space into the kernel
Michael J. Ruhl (5):
IB/hfi1: Return actual error value from program_rcvarray()
IB/hfi1: Use after free race condition in send context error path
IB/hfi1 Use correct type for num_user_context
IB/hfi1: Return correct value for device state
IB/hfi1: Reorder incorrect send context disable
Mike Marciniszyn (3):
IB/hfi1: Fix handling of FECN marked multicast packet
IB/hfi1: Fix fault injection init/exit issues
IB/hfi1: Fix loss of BECN with AHG
Sebastian Sanchez (2):
IB/hfi1: Prevent LNI hang when LCB can't obtain lanes
IB/{hfi1,rdmavt,qib}: Fit kernel completions into single aligned cache-line
drivers/infiniband/core/uverbs_cmd.c | 4 +
drivers/infiniband/hw/hfi1/chip.c | 59 ++++++++---
drivers/infiniband/hw/hfi1/chip.h | 15 ++-
drivers/infiniband/hw/hfi1/chip_registers.h | 7 +
drivers/infiniband/hw/hfi1/debugfs.c | 8 +
drivers/infiniband/hw/hfi1/driver.c | 19 +++-
drivers/infiniband/hw/hfi1/file_ops.c | 2
drivers/infiniband/hw/hfi1/hfi.h | 9 +-
drivers/infiniband/hw/hfi1/init.c | 9 +-
drivers/infiniband/hw/hfi1/mad.c | 36 ++++---
drivers/infiniband/hw/hfi1/pio.c | 44 ++++++--
drivers/infiniband/hw/hfi1/rc.c | 2
drivers/infiniband/hw/hfi1/ruc.c | 54 ++++++++--
drivers/infiniband/hw/hfi1/uc.c | 2
drivers/infiniband/hw/hfi1/ud.c | 10 +-
drivers/infiniband/hw/hfi1/user_exp_rcv.c | 1
drivers/infiniband/hw/qib/qib.h | 1
drivers/infiniband/hw/qib/qib_init.c | 5 +
drivers/infiniband/hw/qib/qib_rc.c | 2
drivers/infiniband/hw/qib/qib_ruc.c | 4 -
drivers/infiniband/hw/qib/qib_uc.c | 2
drivers/infiniband/hw/qib/qib_ud.c | 4 -
drivers/infiniband/sw/rdmavt/cq.c | 146 ++++++++++++++++++---------
drivers/infiniband/sw/rdmavt/qp.c | 4 -
drivers/infiniband/sw/rdmavt/trace_cq.h | 6 +
include/rdma/ib_verbs.h | 5 +
include/rdma/rdmavt_cq.h | 35 +++++-
include/rdma/rdmavt_qp.h | 2
28 files changed, 344 insertions(+), 153 deletions(-)
--
-Denny
The patch titled
Subject: mm, oom: fix concurrent munlock and oom reaper unmap
has been added to the -mm tree. Its filename is
mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-oom-fix-concurrent-munlock-and-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-oom-fix-concurrent-munlock-and-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Rientjes <rientjes(a)google.com>
Subject: mm, oom: fix concurrent munlock and oom reaper unmap
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires kick_all_cpus_sync(). If the pmd is zapped
by the oom reaper during follow_page_mask() after the check for pmd_none()
is bypassed, this ends up deferencing a NULL ptl.
Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
reaped. This prevents the concurrent munlock_vma_pages_range() and
unmap_page_range(). The oom reaper will simply not operate on an mm that
has the bit set and leave the unmapping to exit_mmap().
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804171545460.53786@chino.kir.corp…
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 38 ++++++++++++++++++++------------------
mm/oom_kill.c | 19 ++++++++-----------
2 files changed, 28 insertions(+), 29 deletions(-)
diff -puN mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/mmap.c
--- a/mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/mmap.c
@@ -3015,6 +3015,25 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) {
+ /*
+ * Wait for oom_reap_task() to stop working on this mm. Because
+ * MMF_UNSTABLE is already set before calling down_read(),
+ * oom_reap_task() will not run on this mm after up_write().
+ * oom_reap_task() also depends on a stable VM_LOCKED flag to
+ * indicate it should not unmap during munlock_vma_pages_all().
+ *
+ * mm_is_oom_victim() cannot be set from under us because
+ * victim->mm is already set to NULL under task_lock before
+ * calling mmput() and victim->signal->oom_mm is set by the oom
+ * killer only if victim->mm is non-NULL while holding
+ * task_lock().
+ */
+ set_bit(MMF_UNSTABLE, &mm->flags);
+ down_write(&mm->mmap_sem);
+ up_write(&mm->mmap_sem);
+ }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3036,26 +3055,9 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
- * calling down_read(), oom_reap_task() will not run
- * on this "mm" post up_write().
- *
- * mm_is_oom_victim() cannot be set from under us
- * either because victim->mm is already set to NULL
- * under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if victim->mm
- * is found not NULL while holding the task_lock.
- */
- set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
- }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
+ set_bit(MMF_OOM_SKIP, &mm->flags);
/*
* Walk the list again, actually closing and freeing it,
diff -puN mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/oom_kill.c
@@ -521,12 +521,17 @@ static bool __oom_reap_task_mm(struct ta
}
/*
- * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
- * work on the mm anymore. The check for MMF_OOM_SKIP must run
+ * Tell all users of get_user/copy_from_user etc... that the content
+ * is no longer stable. No barriers really needed because unmapping
+ * should imply barriers already and the reader would hit a page fault
+ * if it stumbled over reaped memory.
+ *
+ * MMF_UNSTABLE is also set by exit_mmap when the OOM reaper shouldn't
+ * work on the mm anymore. The check for MMF_OOM_UNSTABLE must run
* under mmap_sem for reading because it serializes against the
* down_write();up_write() cycle in exit_mmap().
*/
- if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+ if (test_and_set_bit(MMF_UNSTABLE, &mm->flags)) {
up_read(&mm->mmap_sem);
trace_skip_task_reaping(tsk->pid);
goto unlock_oom;
@@ -534,14 +539,6 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /*
- * Tell all users of get_user/copy_from_user etc... that the content
- * is no longer stable. No barriers really needed because unmapping
- * should imply barriers already and the reader would hit a page fault
- * if it stumbled over a reaped memory.
- */
- set_bit(MMF_UNSTABLE, &mm->flags);
-
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
_
Patches currently in -mm which might be from rientjes(a)google.com are
mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap.patch
Since SCSI scanning occurs asynchronously, since sd_revalidate_disk()
is called from sd_probe_async() and since sd_revalidate_disk() calls
sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
concurrently with blkdev_report_zones() and/or blkdev_reset_zones().
That can cause these functions to fail with -EIO because
sd_zbc_read_zones() e.g. sets q->nr_zones to zero before restoring it
to the actual value, even if no drive characteristics have changed.
Avoid that this can happen by making the following changes:
- Protect the code that updates zone information with blk_queue_enter()
and blk_queue_exit().
- Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
these functions do not modify struct scsi_disk before all zone
information has been obtained.
Note: since commit 055f6e18e08f ("block: Make q_usage_counter also
track legacy requests"; kernel v4.15) the request queue freezing
mechanism also affects legacy request queues.
Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Damien Le Moal <damien.lemoal(a)wdc.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # v4.10
---
drivers/scsi/sd_zbc.c | 140 +++++++++++++++++++++++++++++--------------------
include/linux/blkdev.h | 5 ++
2 files changed, 87 insertions(+), 58 deletions(-)
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 2d0c06f7db3e..323e3dc4bc59 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -390,8 +390,10 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf)
*
* Check that all zones of the device are equal. The last zone can however
* be smaller. The zone size must also be a power of two number of LBAs.
+ *
+ * Returns the zone size in bytes upon success or an error code upon failure.
*/
-static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
+static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp)
{
u64 zone_blocks = 0;
sector_t block = 0;
@@ -402,8 +404,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
int ret;
u8 same;
- sdkp->zone_blocks = 0;
-
/* Get a buffer */
buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
if (!buf)
@@ -435,16 +435,17 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
/* Parse zone descriptors */
while (rec < buf + buf_len) {
- zone_blocks = get_unaligned_be64(&rec[8]);
- if (sdkp->zone_blocks == 0) {
- sdkp->zone_blocks = zone_blocks;
- } else if (zone_blocks != sdkp->zone_blocks &&
- (block + zone_blocks < sdkp->capacity
- || zone_blocks > sdkp->zone_blocks)) {
- zone_blocks = 0;
+ u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
+
+ if (zone_blocks == 0) {
+ zone_blocks = this_zone_blocks;
+ } else if (this_zone_blocks != zone_blocks &&
+ (block + this_zone_blocks < sdkp->capacity
+ || this_zone_blocks > zone_blocks)) {
+ this_zone_blocks = 0;
goto out;
}
- block += zone_blocks;
+ block += this_zone_blocks;
rec += 64;
}
@@ -457,8 +458,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
} while (block < sdkp->capacity);
- zone_blocks = sdkp->zone_blocks;
-
out:
if (!zone_blocks) {
if (sdkp->first_scan)
@@ -478,8 +477,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
"Zone size too large\n");
ret = -ENODEV;
} else {
- sdkp->zone_blocks = zone_blocks;
- sdkp->zone_shift = ilog2(zone_blocks);
+ ret = zone_blocks;
}
out_free:
@@ -490,15 +488,14 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
/**
* sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone).
- * @sdkp: The disk of the bitmap
+ * @nr_zones: Number of zones to allocate space for.
+ * @numa_node: NUMA node to allocate the memory from.
*/
-static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
+static inline unsigned long *
+sd_zbc_alloc_zone_bitmap(u32 nr_zones, int numa_node)
{
- struct request_queue *q = sdkp->disk->queue;
-
- return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones)
- * sizeof(unsigned long),
- GFP_KERNEL, q->node);
+ return kzalloc_node(BITS_TO_LONGS(nr_zones) * sizeof(unsigned long),
+ GFP_KERNEL, numa_node);
}
/**
@@ -506,6 +503,7 @@ static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
* @sdkp: disk used
* @buf: report reply buffer
* @buflen: length of @buf
+ * @zone_shift: logarithm base 2 of the number of blocks in a zone
* @seq_zones_bitmap: bitmap of sequential zones to set
*
* Parse reported zone descriptors in @buf to identify sequential zones and
@@ -515,7 +513,7 @@ static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
* Return the LBA after the last zone reported.
*/
static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
- unsigned int buflen,
+ unsigned int buflen, u32 zone_shift,
unsigned long *seq_zones_bitmap)
{
sector_t lba, next_lba = sdkp->capacity;
@@ -534,7 +532,7 @@ static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
if (type != ZBC_ZONE_TYPE_CONV &&
cond != ZBC_ZONE_COND_READONLY &&
cond != ZBC_ZONE_COND_OFFLINE)
- set_bit(lba >> sdkp->zone_shift, seq_zones_bitmap);
+ set_bit(lba >> zone_shift, seq_zones_bitmap);
next_lba = lba + get_unaligned_be64(&rec[8]);
rec += 64;
}
@@ -543,12 +541,16 @@ static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
}
/**
- * sd_zbc_setup_seq_zones_bitmap - Initialize the disk seq zone bitmap.
+ * sd_zbc_setup_seq_zones_bitmap - Initialize a seq zone bitmap.
* @sdkp: target disk
+ * @zone_shift: logarithm base 2 of the number of blocks in a zone
+ * @nr_zones: number of zones to set up a seq zone bitmap for
*
* Allocate a zone bitmap and initialize it by identifying sequential zones.
*/
-static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
+static unsigned long *
+sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp, u32 zone_shift,
+ u32 nr_zones)
{
struct request_queue *q = sdkp->disk->queue;
unsigned long *seq_zones_bitmap;
@@ -556,9 +558,9 @@ static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
unsigned char *buf;
int ret = -ENOMEM;
- seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(sdkp);
+ seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(nr_zones, q->node);
if (!seq_zones_bitmap)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
if (!buf)
@@ -569,7 +571,7 @@ static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
if (ret)
goto out;
lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
- seq_zones_bitmap);
+ zone_shift, seq_zones_bitmap);
}
if (lba != sdkp->capacity) {
@@ -581,12 +583,9 @@ static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
kfree(buf);
if (ret) {
kfree(seq_zones_bitmap);
- return ret;
+ return ERR_PTR(ret);
}
-
- q->seq_zones_bitmap = seq_zones_bitmap;
-
- return 0;
+ return seq_zones_bitmap;
}
static void sd_zbc_cleanup(struct scsi_disk *sdkp)
@@ -602,44 +601,64 @@ static void sd_zbc_cleanup(struct scsi_disk *sdkp)
q->nr_zones = 0;
}
-static int sd_zbc_setup(struct scsi_disk *sdkp)
+static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks)
{
struct request_queue *q = sdkp->disk->queue;
+ u32 zone_shift = ilog2(zone_blocks);
+ u32 nr_zones;
int ret;
- /* READ16/WRITE16 is mandatory for ZBC disks */
- sdkp->device->use_16_for_rw = 1;
- sdkp->device->use_10_for_rw = 0;
-
/* chunk_sectors indicates the zone size */
- blk_queue_chunk_sectors(sdkp->disk->queue,
- logical_to_sectors(sdkp->device, sdkp->zone_blocks));
- sdkp->nr_zones =
- round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
+ blk_queue_chunk_sectors(q,
+ logical_to_sectors(sdkp->device, zone_blocks));
+ nr_zones = round_up(sdkp->capacity, zone_blocks) >> zone_shift;
/*
* Initialize the device request queue information if the number
* of zones changed.
*/
- if (sdkp->nr_zones != q->nr_zones) {
-
- sd_zbc_cleanup(sdkp);
-
- q->nr_zones = sdkp->nr_zones;
- if (sdkp->nr_zones) {
- q->seq_zones_wlock = sd_zbc_alloc_zone_bitmap(sdkp);
- if (!q->seq_zones_wlock) {
+ if (nr_zones != sdkp->nr_zones || nr_zones != q->nr_zones) {
+ unsigned long *seq_zones_wlock = NULL, *seq_zones_bitmap = NULL;
+ size_t zone_bitmap_size;
+
+ if (nr_zones) {
+ seq_zones_wlock = sd_zbc_alloc_zone_bitmap(nr_zones,
+ q->node);
+ if (!seq_zones_wlock) {
ret = -ENOMEM;
goto err;
}
- ret = sd_zbc_setup_seq_zones_bitmap(sdkp);
- if (ret) {
- sd_zbc_cleanup(sdkp);
+ seq_zones_bitmap = sd_zbc_setup_seq_zones_bitmap(sdkp,
+ zone_shift, nr_zones);
+ if (IS_ERR(seq_zones_bitmap)) {
+ ret = PTR_ERR(seq_zones_bitmap);
+ kfree(seq_zones_wlock);
goto err;
}
}
-
+ zone_bitmap_size = BITS_TO_LONGS(nr_zones) *
+ sizeof(unsigned long);
+ blk_mq_freeze_queue(q);
+ if (q->nr_zones != nr_zones) {
+ /* READ16/WRITE16 is mandatory for ZBC disks */
+ sdkp->device->use_16_for_rw = 1;
+ sdkp->device->use_10_for_rw = 0;
+
+ sdkp->zone_blocks = zone_blocks;
+ sdkp->zone_shift = zone_shift;
+ sdkp->nr_zones = nr_zones;
+ q->nr_zones = nr_zones;
+ swap(q->seq_zones_wlock, seq_zones_wlock);
+ swap(q->seq_zones_bitmap, seq_zones_bitmap);
+ } else if (memcmp(q->seq_zones_bitmap, seq_zones_bitmap,
+ zone_bitmap_size) != 0) {
+ memcpy(q->seq_zones_bitmap, seq_zones_bitmap,
+ zone_bitmap_size);
+ }
+ blk_mq_unfreeze_queue(q);
+ kfree(seq_zones_wlock);
+ kfree(seq_zones_bitmap);
}
return 0;
@@ -651,6 +670,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
{
+ int64_t zone_blocks;
int ret;
if (!sd_is_zoned(sdkp))
@@ -687,12 +707,16 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
* Check zone size: only devices with a constant zone size (except
* an eventual last runt zone) that is a power of 2 are supported.
*/
- ret = sd_zbc_check_zone_size(sdkp);
- if (ret)
+ zone_blocks = sd_zbc_check_zone_size(sdkp);
+ ret = -EFBIG;
+ if (zone_blocks != (u32)zone_blocks)
+ goto err;
+ ret = zone_blocks;
+ if (ret < 0)
goto err;
/* The drive satisfies the kernel restrictions: set it up */
- ret = sd_zbc_setup(sdkp);
+ ret = sd_zbc_setup(sdkp, zone_blocks);
if (ret)
goto err;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9af3e0f430bc..21e21f273a21 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -605,6 +605,11 @@ struct request_queue {
* initialized by the low level device driver (e.g. scsi/sd.c).
* Stacking drivers (device mappers) may or may not initialize
* these fields.
+ *
+ * Reads of this information must be protected with blk_queue_enter() /
+ * blk_queue_exit(). Modifying this information is only allowed while
+ * no requests are being processed. See also blk_mq_freeze_queue() and
+ * blk_mq_unfreeze_queue().
*/
unsigned int nr_zones;
unsigned long *seq_zones_bitmap;
--
2.16.3
From: Long Li <longli(a)microsoft.com>
When sending the last iov that breaks into smaller buffers to fit the
transfer size, it's necessary to check if this is the last iov.
If this is the latest iov, stop and proceed to send pages.
Signed-off-by: Long Li <longli(a)microsoft.com>
---
fs/cifs/smbdirect.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 90e673c..b5c6c0d 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2197,6 +2197,8 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
goto done;
}
i++;
+ if (i == rqst->rq_nvec)
+ break;
}
start = i;
buflen = 0;
--
2.7.4
The patch titled
Subject: mm, slab: reschedule cache_reap() on the same CPU
has been removed from the -mm tree. Its filename was
mm-slab-reschedule-cache_reap-on-the-same-cpu.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, slab: reschedule cache_reap() on the same CPU
cache_reap() is initially scheduled in start_cpu_timer() via
schedule_delayed_work_on(). But then the next iterations are scheduled
via schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.
Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work
on wq_unbound_cpumask CPUs") there is no guarantee the future iterations
will run on the originally intended cpu, although it's still preferred. I
was able to demonstrate this with
/sys/module/workqueue/parameters/debug_force_rr_cpu. IIUC, it may also
happen due to migrating timers in nohz context. As a result, some cpu's
would be calling cache_reap() more frequently and others never.
This patch uses schedule_delayed_work_on() with the current cpu when
scheduling the next iteration.
Link: http://lkml.kernel.org/r/20180411070007.32225-1-vbabka@suse.cz
Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Pekka Enberg <penberg(a)kernel.org>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN mm/slab.c~mm-slab-reschedule-cache_reap-on-the-same-cpu mm/slab.c
--- a/mm/slab.c~mm-slab-reschedule-cache_reap-on-the-same-cpu
+++ a/mm/slab.c
@@ -4086,7 +4086,8 @@ next:
next_reap_node();
out:
/* Set up the next iteration */
- schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
+ schedule_delayed_work_on(smp_processor_id(), work,
+ round_jiffies_relative(REAPTIMEOUT_AC));
}
void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
The patch titled
Subject: ipc/shm: fix use-after-free of shm file via remap_file_pages()
has been removed from the -mm tree. Its filename was
ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Eric Biggers <ebiggers(a)google.com>
Subject: ipc/shm: fix use-after-free of shm file via remap_file_pages()
syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately
it couldn't generate a reproducer, but I found a bug which I think caused
it. When remap_file_pages() is passed a full System V shared memory
segment, the memory is first unmapped, then a new map is created using the
->vm_file. Between these steps, the shm ID can be removed and reused for
a new shm segment. But, shm_mmap() only checks whether the ID is
currently valid before calling the underlying file's ->mmap(); it doesn't
check whether it was reused. Thus it can use the wrong underlying file,
one that was already freed.
Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches the
one associated with the "outer" file. Taking the reference to the real
shm file is needed to fully solve the problem, since otherwise sfd->file
could point to a freed file, which then could be reallocated for the
reused shm ID, causing the wrong shm segment to be mapped (and without the
required permission checks).
Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because it
didn't consider the case where the shm ID is reused.
The following program usually reproduces this bug:
#include <stdlib.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>
int main()
{
int is_parent = (fork() != 0);
srand(getpid());
for (;;) {
int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
if (is_parent) {
void *addr = shmat(id, NULL, 0);
usleep(rand() % 50);
while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
} else {
usleep(rand() % 50);
shmctl(id, IPC_RMID, NULL);
}
}
}
It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed. (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report. But I think it's possible
with this bug; it would just take a more extraordinary race...)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
[...]
Call Trace:
file_accessed include/linux/fs.h:2063 [inline]
shmem_mmap+0x25/0x40 mm/shmem.c:2149
call_mmap include/linux/fs.h:1789 [inline]
shm_mmap+0x34/0x80 ipc/shm.c:465
call_mmap include/linux/fs.h:1789 [inline]
mmap_region+0x309/0x5b0 mm/mmap.c:1712
do_mmap+0x294/0x4a0 mm/mmap.c:1483
do_mmap_pgoff include/linux/mm.h:2235 [inline]
SYSC_remap_file_pages mm/mmap.c:2853 [inline]
SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ebiggers(a)google.com: add comment]
Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439(a)syzkaller.appspotmail.com
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso(a)suse.de>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff -puN ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages ipc/shm.c
--- a/ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages
+++ a/ipc/shm.c
@@ -225,6 +225,12 @@ static int __shm_open(struct vm_area_str
if (IS_ERR(shp))
return PTR_ERR(shp);
+ if (shp->shm_file != sfd->file) {
+ /* ID was reused */
+ shm_unlock(shp);
+ return -EINVAL;
+ }
+
shp->shm_atim = ktime_get_real_seconds();
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_nattch++;
@@ -455,8 +461,9 @@ static int shm_mmap(struct file *file, s
int ret;
/*
- * In case of remap_file_pages() emulation, the file can represent
- * removed IPC ID: propogate shm_lock() error to caller.
+ * In case of remap_file_pages() emulation, the file can represent an
+ * IPC ID that was removed, and possibly even reused by another shm
+ * segment already. Propagate this case as an error to caller.
*/
ret = __shm_open(vma);
if (ret)
@@ -480,6 +487,7 @@ static int shm_release(struct inode *ino
struct shm_file_data *sfd = shm_file_data(file);
put_ipc_ns(sfd->ns);
+ fput(sfd->file);
shm_file_data(file) = NULL;
kfree(sfd);
return 0;
@@ -1445,7 +1453,16 @@ long do_shmat(int shmid, char __user *sh
file->f_mapping = shp->shm_file->f_mapping;
sfd->id = shp->shm_perm.id;
sfd->ns = get_ipc_ns(ns);
- sfd->file = shp->shm_file;
+ /*
+ * We need to take a reference to the real shm file to prevent the
+ * pointer from becoming stale in cases where the lifetime of the outer
+ * file extends beyond that of the shm segment. It's not usually
+ * possible, but it can happen during remap_file_pages() emulation as
+ * that unmaps the memory, then does ->mmap() via file reference only.
+ * We'll deny the ->mmap() if the shm segment was since removed, but to
+ * detect shm ID reuse we need to compare the file pointers.
+ */
+ sfd->file = get_file(shp->shm_file);
sfd->vm_ops = NULL;
err = security_mmap_file(file, prot, flags);
_
Patches currently in -mm which might be from ebiggers(a)google.com are
The patch titled
Subject: get_user_pages_fast(): return -EFAULT on access_ok failure
has been removed from the -mm tree. Its filename was
gup-return-efault-on-access_ok-failure.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Michael S. Tsirkin" <mst(a)redhat.com>
Subject: get_user_pages_fast(): return -EFAULT on access_ok failure
get_user_pages_fast is supposed to be a faster drop-in equivalent of
get_user_pages. As such, callers expect it to return a negative return
code when passed an invalid address, and never expect it to return 0 when
passed a positive number of pages, since its documentation says:
* Returns number of pages pinned. This may be fewer than the number
* requested. If nr_pages is 0 or negative, returns 0. If no pages
* were pinned, returns -errno.
When get_user_pages_fast fall back on get_user_pages this is exactly what
happens. Unfortunately the implementation is inconsistent: it returns 0
if passed a kernel address, confusing callers: for example, the following
is pretty common but does not appear to do the right thing with a kernel
address:
ret = get_user_pages_fast(addr, 1, writeable, &page);
if (ret < 0)
return ret;
Change get_user_pages_fast to return -EFAULT when supplied a kernel
address to make it match expectations.
All callers have been audited for consistency with the documented
semantics.
Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com
Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Reported-by: syzbot+6304bf97ef436580fede(a)syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff -puN mm/gup.c~gup-return-efault-on-access_ok-failure mm/gup.c
--- a/mm/gup.c~gup-return-efault-on-access_ok-failure
+++ a/mm/gup.c
@@ -1806,9 +1806,12 @@ int get_user_pages_fast(unsigned long st
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
+ if (nr_pages <= 0)
+ return 0;
+
if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
(void __user *)start, len)))
- return 0;
+ return -EFAULT;
if (gup_fast_permitted(start, nr_pages, write)) {
local_irq_disable();
_
Patches currently in -mm which might be from mst(a)redhat.com are
The patch titled
Subject: mm/gup_benchmark: handle gup failures
has been removed from the -mm tree. Its filename was
mm-gup_benchmark-handle-gup-failures.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Michael S. Tsirkin" <mst(a)redhat.com>
Subject: mm/gup_benchmark: handle gup failures
Patch series "mm/get_user_pages_fast fixes, cleanups", v2.
Turns out get_user_pages_fast and __get_user_pages_fast return different
values on error when given a single page: __get_user_pages_fast returns 0.
get_user_pages_fast returns either 0 or an error.
Callers of get_user_pages_fast expect an error so fix it up to return an
error consistently.
Stress the difference between get_user_pages_fast and
__get_user_pages_fast to make sure callers aren't confused.
This patch (of 3):
__gup_benchmark_ioctl does not handle the case where get_user_pages_fast
fails:
- a negative return code will cause a buffer overrun
- returning with partial success will cause use of
uninitialized memory.
[akpm(a)linux-foundation.org: simplification]
Link: http://lkml.kernel.org/r/1522962072-182137-3-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup_benchmark.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures mm/gup_benchmark.c
--- a/mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures
+++ a/mm/gup_benchmark.c
@@ -23,7 +23,7 @@ static int __gup_benchmark_ioctl(unsigne
struct page **pages;
nr_pages = gup->size / PAGE_SIZE;
- pages = kvmalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
+ pages = kvzalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
if (!pages)
return -ENOMEM;
@@ -41,6 +41,8 @@ static int __gup_benchmark_ioctl(unsigne
}
nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
+ if (nr <= 0)
+ break;
i += nr;
}
end_time = ktime_get();
_
Patches currently in -mm which might be from mst(a)redhat.com are
The patch titled
Subject: resource: fix integer overflow at reallocation
has been removed from the -mm tree. Its filename was
resource-fix-integer-overflow-at-reallocation-v1.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Takashi Iwai <tiwai(a)suse.de>
Subject: resource: fix integer overflow at reallocation
We've got a bug report indicating a kernel panic at booting on an x86-32
system, and it turned out to be the invalid PCI resource assigned after
reallocation. __find_resource() first aligns the resource start address
and resets the end address with start+size-1 accordingly, then checks
whether it's contained. Here the end address may overflow the integer,
although resource_contains() still returns true because the function
validates only start and end address. So this ends up with returning an
invalid resource (start > end).
There was already an attempt to cover such a problem in the commit
47ea91b4052d ("Resource: fix wrong resource window calculation"), but this
case is an overseen one.
This patch adds the validity check of the newly calculated resource for
avoiding the integer overflow problem.
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
Link: http://lkml.kernel.org/r/s5hpo37d5l8.wl-tiwai@suse.de
Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Reported-by: Michael Henders <hendersm(a)shaw.ca>
Tested-by: Michael Henders <hendersm(a)shaw.ca>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ram Pai <linuxram(a)us.ibm.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/resource.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN kernel/resource.c~resource-fix-integer-overflow-at-reallocation-v1 kernel/resource.c
--- a/kernel/resource.c~resource-fix-integer-overflow-at-reallocation-v1
+++ a/kernel/resource.c
@@ -651,7 +651,8 @@ static int __find_resource(struct resour
alloc.start = constraint->alignf(constraint->alignf_data, &avail,
size, constraint->align);
alloc.end = alloc.start + size - 1;
- if (resource_contains(&avail, &alloc)) {
+ if (alloc.start <= alloc.end &&
+ resource_contains(&avail, &alloc)) {
new->start = alloc.start;
new->end = alloc.end;
return 0;
_
Patches currently in -mm which might be from tiwai(a)suse.de are
The __clear_user function is defined to return the number of bytes that
could not be cleared. From the underlying memset / bzero implementation
this means setting register a2 to that number on return. Currently if a
page fault is triggered within the memset_partial block, the value
loaded into a2 on return is meaningless.
The label .Lpartial_fixup\@ is jumped to on page fault. Currently it
masks the remaining count of bytes (a2) with STORMASK, meaning that the
least significant 2 (32bit) or 3 (64bit) bits of the remaining count are
always clear.
Secondly, .Lpartial_fixup\@ expects t1 to contain the end address of the
copy. This is set up by the initial block:
PTR_ADDU t1, a0 /* end address */
However, the .Lmemset_partial\@ block then reuses register t1 to
calculate a jump through a block of word copies. This leaves it no
longer containing the end address of the copy operation if a page fault
occurs, and the remaining bytes calculation is incorrect.
Fix these issues by removing the and of a2 with STORMASK, and replace t1
with register t2 in the .Lmemset_partial\@ block.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matt Redfearn <matt.redfearn(a)mips.com>
---
arch/mips/lib/memset.S | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 90bcdf1224ee..3257dca58cad 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -161,19 +161,19 @@
.Lmemset_partial\@:
R10KCBARRIER(0(ra))
- PTR_LA t1, 2f /* where to start */
+ PTR_LA t2, 2f /* where to start */
#ifdef CONFIG_CPU_MICROMIPS
LONG_SRL t7, t0, 1
#endif
#if LONGSIZE == 4
- PTR_SUBU t1, FILLPTRG
+ PTR_SUBU t2, FILLPTRG
#else
.set noat
LONG_SRL AT, FILLPTRG, 1
- PTR_SUBU t1, AT
+ PTR_SUBU t2, AT
.set at
#endif
- jr t1
+ jr t2
PTR_ADDU a0, t0 /* dest ptr */
.set push
@@ -250,7 +250,6 @@
.Lpartial_fixup\@:
PTR_L t0, TI_TASK($28)
- andi a2, STORMASK
LONG_L t0, THREAD_BUADDR(t0)
LONG_ADDU a2, t1
jr ra
--
2.7.4
On Sat, Dec 23, 2017 at 07:41:47PM +0100, Hans de Goede wrote:
> We're seeing a lot of bogus backlight interfaces on newer machines without
> a LCD such as desktops, servers and HDMI sticks. This causes userspace to
> show a non-functional brightness slider in e.g. the GNOME3 system menu,
> which is undesirable. More in general we should simply just not register
> a non functional backlight interface.
>
> Checking the lcd flag causes the bogus acpi_video backlight interfaces to
> go away (on the machines this was tested on).
>
> This commit enables the lcd_only option by default on any machines which
> are win8 ready, fixing this.
>
> This is not entirely without risk of regressions, but video_detect.c
> already prefers native-backlight interfaces over the acpi_video one
> on win8 ready machines, calling acpi_video_unregister_backlight() as soon
> as a native interface shows up. This is done because the acpi backlight
> interface often is broken on win8 ready machines, because win8 does not
> seem to actually use it.
This patch (in the form of commit 965736ee654d ("ACPI / video: Default
lcd_only to true on Win8-ready and newer machines") in stable v4.15.17),
breaks backlight control on my 2013 XPS13 laptop.
It normally uses the acpi backlight device, but after this patch that
device no longer shows up in sysfs.
This isn't the first time the backlight has gotton broken on this
system, though I think last time it was because the intel backlight
driver got used instead of the ACPI one and that didn't work properly
with it, so it needed a quirk to make it use ACPI instead.
Is some other quirk needed around here too?
Cheers
James
The patch below does not apply to the 4.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9f99e50d460ac7fd5f6c9b97aad0088c28c8656d Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il(a)gmail.com>
Date: Wed, 11 Apr 2018 20:09:29 +0300
Subject: [PATCH] ovl: set lower layer st_dev only if setting lower st_ino
For broken hardlinks, we do not return lower st_ino, so we should
also not return lower pseudo st_dev.
Fixes: a0c5ad307ac0 ("ovl: relax same fs constraint for constant st_ino")
Cc: <stable(a)vger.kernel.org> #v4.15
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 4689716f23d8..1d75b2e96c96 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -118,13 +118,10 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
*/
if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) ||
(!ovl_verify_lower(dentry->d_sb) &&
- (is_dir || lowerstat.nlink == 1)))
+ (is_dir || lowerstat.nlink == 1))) {
stat->ino = lowerstat.ino;
-
- if (samefs)
- WARN_ON_ONCE(stat->dev != lowerstat.dev);
- else
stat->dev = ovl_get_pseudo_dev(dentry);
+ }
}
if (samefs) {
/*
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 41387bb7d869e96df4b870e1880ad49f053cc755 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <mawilcox(a)microsoft.com>
Date: Fri, 2 Mar 2018 10:40:14 -0800
Subject: [PATCH] Documentation/sphinx: Fix Directive import error
Sphinx 1.7 removed sphinx.util.compat.Directive so people
who have upgraded cannot build the documentation. Switch to
docutils.parsers.rst.Directive which has been available since
docutils 0.5 released in 2009.
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1083694
Co-developed-by: Takashi Iwai <tiwai(a)suse.de>
Acked-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Signed-off-by: Jonathan Corbet <corbet(a)lwn.net>
diff --git a/Documentation/sphinx/kerneldoc.py b/Documentation/sphinx/kerneldoc.py
index 39aa9e8697cc..fbedcc39460b 100644
--- a/Documentation/sphinx/kerneldoc.py
+++ b/Documentation/sphinx/kerneldoc.py
@@ -36,8 +36,7 @@ import glob
from docutils import nodes, statemachine
from docutils.statemachine import ViewList
-from docutils.parsers.rst import directives
-from sphinx.util.compat import Directive
+from docutils.parsers.rst import directives, Directive
from sphinx.ext.autodoc import AutodocReporter
__version__ = '1.0'
Hi Greg,
Upstream commit 0c4c5860e998 ("hwmon: (ina2xx) Fix access to uninitialized
mutex") fixes commit 5d389b125186 ("hwmon: (ina2xx) Make calibration register
value fixed"), which has found its way into v4.4.y, v4.9.y, v4.14.y, and
v4.15.y.
Please apply commit 0c4c5860e998 to all affected releases to fix the
resulting regression.
Maybe Sasha's script could be improved to look for Fixes: tags when
suggesting to apply patches to older releases.
Thanks,
Guenter
在 2018-04-17 18:32,Greg KH 写道:
> On Tue, Apr 17, 2018 at 11:32:42AM +0800, leiwan(a)codeaurora.org wrote:
>>
>> xhci-plat Shutdown callback should check HCD_FLAG_HW_ACCESSIBLE
>> before accessing any register. This should avoid hung with access
>> controllers which support runtime suspend
>>
>> This can fix for issue of https://patchwork.kernel.org/patch/10339317/
>> corresponding upload in CAF:
>> https://source.codeaurora.org/quic/la/kernel/msm-4.4/commit/?h=LV.HB.1.1.5-…
>>
>> full patch refer attachment.
>> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
>> index 9b27798..bdf914d 100644
>> --- a/drivers/usb/host/xhci.c
>> +++ b/drivers/usb/host/xhci.c
>> @@ -702,6 +702,10 @@ static void xhci_shutdown(struct usb_hcd *hcd)
>> usb_disable_xhci_ports(to_pci_dev(hcd->self.sysdev));
>>
>> spin_lock_irq(&xhci->lock);
>> + if (!HCD_HW_ACCESSIBLE(hcd)) {
>> + spin_unlock_irq(&xhci->lock);
>> + return;
>> + }
>> xhci_halt(xhci);
>
> A blank line after the if statement?
> >> [lei]yes
> What about all of the other places in this driver that you should also
> check for this? Look at the other host controllers, shouldn't you
> mirror what they are doing?
> >> [lei]I checked other usb host module shutdown and suspend workflow.
>> All usb host driver need to check hw accessable before
>> read/write usb register especially in runtime PM case..
> And this needs a Fixes: tag, along with a cc: stable so as to properly
> get backported as this is broken in some stable kernels right now.
> >> [lei] Added by v2 patch
> thanks,
>
> greg k-h
From c03697fa259ab38d1002598ec2ccfac37607ca0b Mon Sep 17 00:00:00 2001
From: Lei wang <leiwan(a)codeaurora.org>
Date: Tue, 17 Apr 2018 10:55:35 +0800
Subject: [PATCH v2] xhci: plat: Fix xhci_plat shutdown hung
xhci-plat Shutdown callback should check HCD_FLAG_HW_ACCESSIBLE
before accessing any register. This should avoid hung with access
controllers which support runtime suspend
Fixes: b07c12517f2a ("xhci: plat: Register shutdown for xhci_plat")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Lei wang <leiwan(a)codeaurora.org>
---
drivers/usb/host/xhci.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 9b27798..bdf914d 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -702,6 +702,10 @@ static void xhci_shutdown(struct usb_hcd *hcd)
usb_disable_xhci_ports(to_pci_dev(hcd->self.sysdev));
spin_lock_irq(&xhci->lock);
+ if (!HCD_HW_ACCESSIBLE(hcd)) {
+ spin_unlock_irq(&xhci->lock);
+ return;
+ }
xhci_halt(xhci);
/* Workaround for spurious wakeups at shutdown with HSW */
if (xhci->quirks & XHCI_SPURIOUS_WAKEUP)
--
1.9.1
The patch below was submitted to be applied to the 4.16-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3140c156e919b0f5fad5c5f6cf7876c39d1d4f06 Mon Sep 17 00:00:00 2001
From: Peng Hao <peng.hao2(a)zte.com.cn>
Date: Mon, 2 Apr 2018 09:15:32 +0800
Subject: [PATCH] kvm: x86: fix a compile warning
fix a "warning: no previous prototype".
Cc: stable(a)vger.kernel.org
Signed-off-by: Peng Hao <peng.hao2(a)zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f108131d85d..b2ff74b12ec4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7943,7 +7943,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
}
EXPORT_SYMBOL_GPL(kvm_task_switch);
-int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+static int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
{
if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
/*
The label .Llast_fixup\@ is jumped to on page fault within the final
byte set loop of memset (on < MIPSR6 architectures). For some reason, in
this fault handler, the v1 register is randomly set to a2 & STORMASK.
This clobbers v1 for the calling function. This can be observed with the
following test code:
static int __init __attribute__((optimize("O0"))) test_clear_user(void)
{
register int t asm("v1");
char *test;
int j, k;
pr_info("\n\n\nTesting clear_user\n");
test = vmalloc(PAGE_SIZE);
for (j = 256; j < 512; j++) {
t = 0xa5a5a5a5;
if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
pr_err("clear_user (%px %d) returned %d\n", test + PAGE_SIZE - 256, j, k);
}
if (t != 0xa5a5a5a5) {
pr_err("v1 was clobbered to 0x%x!\n", t);
}
}
return 0;
}
late_initcall(test_clear_user);
Which demonstrates that v1 is indeed clobbered (MIPS64):
Testing clear_user
v1 was clobbered to 0x1!
v1 was clobbered to 0x2!
v1 was clobbered to 0x3!
v1 was clobbered to 0x4!
v1 was clobbered to 0x5!
v1 was clobbered to 0x6!
v1 was clobbered to 0x7!
Since the number of bytes that could not be set is already contained in
a2, the andi placing a value in v1 is not necessary and actively
harmful in clobbering v1.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Reported-by: James Hogan <jhogan(a)kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn(a)mips.com>
---
arch/mips/lib/memset.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index fa3bec269331..84e91f4fdf53 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -258,7 +258,7 @@
.Llast_fixup\@:
jr ra
- andi v1, a2, STORMASK
+ nop
.Lsmall_fixup\@:
PTR_SUBU a2, t1, a0
--
2.7.4
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 41387bb7d869e96df4b870e1880ad49f053cc755 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <mawilcox(a)microsoft.com>
Date: Fri, 2 Mar 2018 10:40:14 -0800
Subject: [PATCH] Documentation/sphinx: Fix Directive import error
Sphinx 1.7 removed sphinx.util.compat.Directive so people
who have upgraded cannot build the documentation. Switch to
docutils.parsers.rst.Directive which has been available since
docutils 0.5 released in 2009.
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1083694
Co-developed-by: Takashi Iwai <tiwai(a)suse.de>
Acked-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Signed-off-by: Jonathan Corbet <corbet(a)lwn.net>
diff --git a/Documentation/sphinx/kerneldoc.py b/Documentation/sphinx/kerneldoc.py
index 39aa9e8697cc..fbedcc39460b 100644
--- a/Documentation/sphinx/kerneldoc.py
+++ b/Documentation/sphinx/kerneldoc.py
@@ -36,8 +36,7 @@ import glob
from docutils import nodes, statemachine
from docutils.statemachine import ViewList
-from docutils.parsers.rst import directives
-from sphinx.util.compat import Directive
+from docutils.parsers.rst import directives, Directive
from sphinx.ext.autodoc import AutodocReporter
__version__ = '1.0'
The patch below does not apply to the 4.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 41387bb7d869e96df4b870e1880ad49f053cc755 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <mawilcox(a)microsoft.com>
Date: Fri, 2 Mar 2018 10:40:14 -0800
Subject: [PATCH] Documentation/sphinx: Fix Directive import error
Sphinx 1.7 removed sphinx.util.compat.Directive so people
who have upgraded cannot build the documentation. Switch to
docutils.parsers.rst.Directive which has been available since
docutils 0.5 released in 2009.
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1083694
Co-developed-by: Takashi Iwai <tiwai(a)suse.de>
Acked-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Signed-off-by: Jonathan Corbet <corbet(a)lwn.net>
diff --git a/Documentation/sphinx/kerneldoc.py b/Documentation/sphinx/kerneldoc.py
index 39aa9e8697cc..fbedcc39460b 100644
--- a/Documentation/sphinx/kerneldoc.py
+++ b/Documentation/sphinx/kerneldoc.py
@@ -36,8 +36,7 @@ import glob
from docutils import nodes, statemachine
from docutils.statemachine import ViewList
-from docutils.parsers.rst import directives
-from sphinx.util.compat import Directive
+from docutils.parsers.rst import directives, Directive
from sphinx.ext.autodoc import AutodocReporter
__version__ = '1.0'
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9e7f06c8beee304ee21b791653fefcd713f48b9a Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 10 Apr 2018 19:05:13 +0200
Subject: [PATCH] swiotlb: fix unexpected swiotlb_alloc_coherent failures
The code refactoring by commit 0176adb00406 ("swiotlb: refactor coherent
buffer allocation") made swiotlb_alloc_buffer almost always failing due
to a thinko: namely, the function evaluates the dma_coherent_ok call
incorrectly and dealing as if it's invalid. This ends up with weird
errors like iwlwifi probe failure or amdgpu screen flickering.
This patch corrects the logic error.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1088658
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1088902
Fixes: 0176adb00406 ("swiotlb: refactor coherent buffer allocation")
Cc: <stable(a)vger.kernel.org> # v4.16+
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 47aeb04c1997..de7cc540450f 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -719,7 +719,7 @@ swiotlb_alloc_buffer(struct device *dev, size_t size, dma_addr_t *dma_handle,
goto out_warn;
*dma_handle = __phys_to_dma(dev, phys_addr);
- if (dma_coherent_ok(dev, *dma_handle, size))
+ if (!dma_coherent_ok(dev, *dma_handle, size))
goto out_unmap;
memset(phys_to_virt(phys_addr), 0, size);