The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f21c601a2bb319ec19eb4562eadc7797d90fd90e Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Fri, 15 Jun 2018 09:35:33 -0400
Subject: [PATCH] dm: use bio_split() when splitting out the already processed
bio
Use of bio_clone_bioset() is inefficient if there is no need to clone
the original bio's bio_vec array. Best to use the bio_clone_fast()
variant. Also, just using bio_advance() is only part of what is needed
to properly setup the clone -- it doesn't account for the various
bio_integrity() related work that also needs to be performed (see
bio_split).
Address both of these issues by switching from bio_clone_bioset() to
bio_split().
Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable(a)vger.kernel.org # 4.15+, requires removal of '&' before md->queue->bio_split
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e65429a29c06..a3b103e8e3ce 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1606,10 +1606,9 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
* the usage of io->orig_bio in dm_remap_zone_report()
* won't be affected by this reassignment.
*/
- struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
- &md->queue->bio_split);
+ struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
+ GFP_NOIO, &md->queue->bio_split);
ci.io->orig_bio = b;
- bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
bio_chain(b, bio);
ret = generic_make_request(bio);
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f21c601a2bb319ec19eb4562eadc7797d90fd90e Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Fri, 15 Jun 2018 09:35:33 -0400
Subject: [PATCH] dm: use bio_split() when splitting out the already processed
bio
Use of bio_clone_bioset() is inefficient if there is no need to clone
the original bio's bio_vec array. Best to use the bio_clone_fast()
variant. Also, just using bio_advance() is only part of what is needed
to properly setup the clone -- it doesn't account for the various
bio_integrity() related work that also needs to be performed (see
bio_split).
Address both of these issues by switching from bio_clone_bioset() to
bio_split().
Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable(a)vger.kernel.org # 4.15+, requires removal of '&' before md->queue->bio_split
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e65429a29c06..a3b103e8e3ce 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1606,10 +1606,9 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
* the usage of io->orig_bio in dm_remap_zone_report()
* won't be affected by this reassignment.
*/
- struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
- &md->queue->bio_split);
+ struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
+ GFP_NOIO, &md->queue->bio_split);
ci.io->orig_bio = b;
- bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
bio_chain(b, bio);
ret = generic_make_request(bio);
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From dbc626597c39b24cefce09fbd8e9dea85869a801 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Date: Tue, 26 Jun 2018 16:30:41 -0600
Subject: [PATCH] dm: prevent DAX mounts if not supported
Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX
flag is set on the device's request queue to decide whether or not the
device supports filesystem DAX. Really we should be using
bdev_dax_supported() like filesystems do at mount time. This performs
other tests like checking to make sure the dax_direct_access() path works.
We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if
any of the underlying devices do not support DAX. This makes the handling
of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags
in dm_table_set_restrictions().
Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this
will ensure that filesystems built upon DM devices will only be able to
mount with DAX if all underlying devices also support DAX.
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support")
Cc: stable(a)vger.kernel.org
Acked-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 938766794c2e..3d0e2c198f06 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -885,9 +885,7 @@ EXPORT_SYMBOL_GPL(dm_table_set_type);
static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
- struct request_queue *q = bdev_get_queue(dev->bdev);
-
- return q && blk_queue_dax(q);
+ return bdev_dax_supported(dev->bdev, PAGE_SIZE);
}
static bool dm_table_supports_dax(struct dm_table *t)
@@ -1907,6 +1905,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (dm_table_supports_dax(t))
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+ else
+ blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
+
if (dm_table_supports_dax_write_cache(t))
dax_write_cache(t->md->dax_dev, true);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a3b103e8e3ce..b0dd7027848b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1056,8 +1056,7 @@ static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
if (len < 1)
goto out;
nr_pages = min(len, nr_pages);
- if (ti->type->direct_access)
- ret = ti->type->direct_access(ti, pgoff, nr_pages, kaddr, pfn);
+ ret = ti->type->direct_access(ti, pgoff, nr_pages, kaddr, pfn);
out:
dm_put_live_table(md, srcu_idx);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ce99319a182fe766be67f96338386f3ec73e321c Mon Sep 17 00:00:00 2001
From: Maxime Chevallier <maxime.chevallier(a)bootlin.com>
Date: Fri, 2 Mar 2018 15:55:09 +0100
Subject: [PATCH] spi: Fix scatterlist elements size in spi_map_buf
When SPI transfers can be offloaded using DMA, the SPI core need to
build a scatterlist to make sure that the buffer to be transferred is
dma-able.
This patch fixes the scatterlist entry size computation in the case
where the maximum acceptable scatterlist entry supported by the DMA
controller is less than PAGE_SIZE, when the buffer is vmalloced.
For each entry, the actual size is given by the minimum between the
desc_len (which is the max buffer size supported by the DMA controller)
and the remaining buffer length until we cross a page boundary.
Fixes: 65598c13fd66 ("spi: Fix per-page mapping of unaligned vmalloc-ed buffer")
Signed-off-by: Maxime Chevallier <maxime.chevallier(a)bootlin.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index b33a727a0158..4153f959f28c 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -779,8 +779,14 @@ static int spi_map_buf(struct spi_controller *ctlr, struct device *dev,
for (i = 0; i < sgs; i++) {
if (vmalloced_buf || kmap_buf) {
- min = min_t(size_t,
- len, desc_len - offset_in_page(buf));
+ /*
+ * Next scatterlist entry size is the minimum between
+ * the desc_len and the remaining buffer length that
+ * fits in a page.
+ */
+ min = min_t(size_t, desc_len,
+ min_t(size_t, len,
+ PAGE_SIZE - offset_in_page(buf)));
if (vmalloced_buf)
vm_page = vmalloc_to_page(buf);
else
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5811375325420052fcadd944792a416a43072b7f Mon Sep 17 00:00:00 2001
From: Liu Bo <bo.li.liu(a)oracle.com>
Date: Wed, 31 Jan 2018 17:09:13 -0700
Subject: [PATCH] Btrfs: fix unexpected cow in run_delalloc_nocow
Fstests generic/475 provides a way to fail metadata reads while
checking if checksum exists for the inode inside run_delalloc_nocow(),
and csum_exist_in_range() interprets error (-EIO) as inode having
checksum and makes its caller enter the cow path.
In case of free space inode, this ends up with a warning in
cow_file_range().
The same problem applies to btrfs_cross_ref_exist() since it may also
read metadata in between.
With this, run_delalloc_nocow() bails out when errors occur at the two
places.
cc: <stable(a)vger.kernel.org> v2.6.28+
Fixes: 17d217fe970d ("Btrfs: fix nodatasum handling in balancing code")
Signed-off-by: Liu Bo <bo.li.liu(a)oracle.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6504e63b2317..491a7397f6fa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1256,6 +1256,8 @@ static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
list_del(&sums->list);
kfree(sums);
}
+ if (ret < 0)
+ return ret;
return 1;
}
@@ -1388,10 +1390,23 @@ static noinline int run_delalloc_nocow(struct inode *inode,
goto out_check;
if (btrfs_extent_readonly(fs_info, disk_bytenr))
goto out_check;
- if (btrfs_cross_ref_exist(root, ino,
- found_key.offset -
- extent_offset, disk_bytenr))
+ ret = btrfs_cross_ref_exist(root, ino,
+ found_key.offset -
+ extent_offset, disk_bytenr);
+ if (ret) {
+ /*
+ * ret could be -EIO if the above fails to read
+ * metadata.
+ */
+ if (ret < 0) {
+ if (cow_start != (u64)-1)
+ cur_offset = cow_start;
+ goto error;
+ }
+
+ WARN_ON_ONCE(nolock);
goto out_check;
+ }
disk_bytenr += extent_offset;
disk_bytenr += cur_offset - found_key.offset;
num_bytes = min(end + 1, extent_end) - cur_offset;
@@ -1409,10 +1424,22 @@ static noinline int run_delalloc_nocow(struct inode *inode,
* this ensure that csum for a given extent are
* either valid or do not exist.
*/
- if (csum_exist_in_range(fs_info, disk_bytenr,
- num_bytes)) {
+ ret = csum_exist_in_range(fs_info, disk_bytenr,
+ num_bytes);
+ if (ret) {
if (!nolock)
btrfs_end_write_no_snapshotting(root);
+
+ /*
+ * ret could be -EIO if the above fails to read
+ * metadata.
+ */
+ if (ret < 0) {
+ if (cow_start != (u64)-1)
+ cur_offset = cow_start;
+ goto error;
+ }
+ WARN_ON_ONCE(nolock);
goto out_check;
}
if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) {
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fdcb613d49321b5bf5d5a1bd0fba8e7c241dcc70 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Thu, 26 Apr 2018 14:10:24 +0200
Subject: [PATCH] ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM
devices
The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set
of private registers at offset 0x800, the current lpss_device_desc for
them already sets the LPSS_SAVE_CTX flag to have these saved/restored
over device-suspend, but the current lpss_device_desc was not setting
the prv_offset field, leading to the regular device registers getting
saved/restored instead.
This is causing the PWM controller to no longer work, resulting in a black
screen, after a suspend/resume on systems where the firmware clears the
APB clock and reset bits at offset 0x804.
This commit fixes this by properly setting prv_offset to 0x800 for
the PWM devices.
Cc: stable(a)vger.kernel.org
Fixes: e1c748179754 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
Fixes: 1bfbd8eb8a7f ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Acked-by: Rafael J . Wysocki <rjw(a)rjwysocki.net>
Signed-off-by: Thierry Reding <thierry.reding(a)gmail.com>
diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
index 2bcffec8dbf0..c4ba9164e582 100644
--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -229,11 +229,13 @@ static const struct lpss_device_desc lpt_sdio_dev_desc = {
static const struct lpss_device_desc byt_pwm_dev_desc = {
.flags = LPSS_SAVE_CTX,
+ .prv_offset = 0x800,
.setup = byt_pwm_setup,
};
static const struct lpss_device_desc bsw_pwm_dev_desc = {
.flags = LPSS_SAVE_CTX | LPSS_NO_D3_DELAY,
+ .prv_offset = 0x800,
.setup = bsw_pwm_setup,
};
Subject of the patch:
perf/x86/intel/uncore: Add event constraint for BDX PCU
Commit id:
bb9fbe1b57503f790dbbf9f06e72cb0fb9e60740
Wish to apply to:
4.14
Fix an issue on BDX.
Thanks
Jin Yao
Subject of the patch:
perf vendor events: Add Goldmont Plus V1 event file
Commit id:
65db92e0965ab56e8031d5c804f26d5be0e47047
Wish to apply to:
4.14, 4.9, 4.4
Wish to support Goldmont Plus event since it's needed for some
productions which run with stable kernel.
Thanks
Jin Yao
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c0b0d540db1a8bfb041166c4991dd6f624e8de45 Mon Sep 17 00:00:00 2001
From: Sean Wang <sean.wang(a)mediatek.com>
Date: Wed, 11 Apr 2018 16:53:56 +0800
Subject: [PATCH] arm: dts: mt7623: fix invalid memory node being generated
Below two wrong nodes in existing DTS files would cause a fail boot since
in fact the address 0 is not the correct place the memory device locates
at.
memory {
device_type = "memory";
reg = <0x0 0x0 0x0 0x0>;
};
memory@80000000 {
reg = <0x0 0x80000000 0x0 0x40000000>;
};
In order to avoid having a memory node starting at address 0, we can't
include file skeleton64.dtsi and instead need to explicitly manually
define a few of properties the DTS relies on such as #address-cells
and #size-cells in root node and device_type in the node memory@80000000.
Cc: stable(a)vger.kernel.org
Fixes: 31ac0d69a1d4 ("ARM: dts: mediatek: add MT7623 basic support")
Signed-off-by: Sean Wang <sean.wang(a)mediatek.com>
Cc: Rob Herring <robh+dt(a)kernel.org>
Signed-off-by: Matthias Brugger <matthias.bgg(a)gmail.com>
diff --git a/arch/arm/boot/dts/mt7623.dtsi b/arch/arm/boot/dts/mt7623.dtsi
index 68e987ddedc7..d4d04c365960 100644
--- a/arch/arm/boot/dts/mt7623.dtsi
+++ b/arch/arm/boot/dts/mt7623.dtsi
@@ -15,11 +15,12 @@
#include <dt-bindings/phy/phy.h>
#include <dt-bindings/reset/mt2701-resets.h>
#include <dt-bindings/thermal/thermal.h>
-#include "skeleton64.dtsi"
/ {
compatible = "mediatek,mt7623";
interrupt-parent = <&sysirq>;
+ #address-cells = <2>;
+ #size-cells = <2>;
cpu_opp_table: opp-table {
compatible = "operating-points-v2";
diff --git a/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts b/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts
index bbf56f855e46..5938e4c79deb 100644
--- a/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts
+++ b/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts
@@ -109,6 +109,7 @@
};
memory@80000000 {
+ device_type = "memory";
reg = <0 0x80000000 0 0x40000000>;
};
};
diff --git a/arch/arm/boot/dts/mt7623n-rfb.dtsi b/arch/arm/boot/dts/mt7623n-rfb.dtsi
index a199ae78dd25..343e8efe5f25 100644
--- a/arch/arm/boot/dts/mt7623n-rfb.dtsi
+++ b/arch/arm/boot/dts/mt7623n-rfb.dtsi
@@ -40,6 +40,7 @@
};
memory@80000000 {
+ device_type = "memory";
reg = <0 0x80000000 0 0x40000000>;
};
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 81114baa835b59ed02d14aa1d67f91ea874077cd Mon Sep 17 00:00:00 2001
From: Chao Yu <yuchao0(a)huawei.com>
Date: Mon, 9 Apr 2018 20:25:06 +0800
Subject: [PATCH] f2fs: don't use GFP_ZERO for page caches
Related to https://lkml.org/lkml/2018/4/8/661
Sometimes, we need to write meta data to new allocated block address,
then we will allocate a zeroed page in inner inode's address space, and
fill partial data in it, and leave other place with zero value which means
some fields are initial status.
There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
I have just checked them, for both of them, we can avoid using __GFP_ZERO,
and do initialization by ourselves to avoid unneeded/redundant zeroing
from mm.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 3d3a1f6b21a1..8fc55d42ba25 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -100,8 +100,10 @@ repeat:
* readonly and make sure do not write checkpoint with non-uptodate
* meta page.
*/
- if (unlikely(!PageUptodate(page)))
+ if (unlikely(!PageUptodate(page))) {
+ memset(page_address(page), 0, PAGE_SIZE);
f2fs_stop_checkpoint(sbi, false);
+ }
out:
return page;
}
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 417c9dcd0269..87535bf63421 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -320,10 +320,10 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
make_now:
if (ino == F2FS_NODE_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_node_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (ino == F2FS_META_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_meta_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (S_ISREG(inode->i_mode)) {
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a7ec952093f8..0fb006f591a4 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1979,6 +1979,7 @@ static void write_current_sum_page(struct f2fs_sb_info *sbi,
struct f2fs_summary_block *dst;
dst = (struct f2fs_summary_block *)page_address(page);
+ memset(dst, 0, PAGE_SIZE);
mutex_lock(&curseg->curseg_mutex);
@@ -3133,6 +3134,7 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
page = grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
/* Step 1: write nat cache */
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
@@ -3157,6 +3159,7 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
if (!page) {
page = grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
written_size = 0;
}
summary = (struct f2fs_summary *)(kaddr + written_size);
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 3325d0769723..492ad0c86fa9 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -375,6 +375,7 @@ static inline void seg_info_to_sit_page(struct f2fs_sb_info *sbi,
int i;
raw_sit = (struct f2fs_sit_block *)page_address(page);
+ memset(raw_sit, 0, PAGE_SIZE);
for (i = 0; i < end - start; i++) {
rs = &raw_sit->entries[i];
se = get_seg_entry(sbi, start + i);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 81114baa835b59ed02d14aa1d67f91ea874077cd Mon Sep 17 00:00:00 2001
From: Chao Yu <yuchao0(a)huawei.com>
Date: Mon, 9 Apr 2018 20:25:06 +0800
Subject: [PATCH] f2fs: don't use GFP_ZERO for page caches
Related to https://lkml.org/lkml/2018/4/8/661
Sometimes, we need to write meta data to new allocated block address,
then we will allocate a zeroed page in inner inode's address space, and
fill partial data in it, and leave other place with zero value which means
some fields are initial status.
There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
I have just checked them, for both of them, we can avoid using __GFP_ZERO,
and do initialization by ourselves to avoid unneeded/redundant zeroing
from mm.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 3d3a1f6b21a1..8fc55d42ba25 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -100,8 +100,10 @@ repeat:
* readonly and make sure do not write checkpoint with non-uptodate
* meta page.
*/
- if (unlikely(!PageUptodate(page)))
+ if (unlikely(!PageUptodate(page))) {
+ memset(page_address(page), 0, PAGE_SIZE);
f2fs_stop_checkpoint(sbi, false);
+ }
out:
return page;
}
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 417c9dcd0269..87535bf63421 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -320,10 +320,10 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
make_now:
if (ino == F2FS_NODE_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_node_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (ino == F2FS_META_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_meta_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (S_ISREG(inode->i_mode)) {
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a7ec952093f8..0fb006f591a4 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1979,6 +1979,7 @@ static void write_current_sum_page(struct f2fs_sb_info *sbi,
struct f2fs_summary_block *dst;
dst = (struct f2fs_summary_block *)page_address(page);
+ memset(dst, 0, PAGE_SIZE);
mutex_lock(&curseg->curseg_mutex);
@@ -3133,6 +3134,7 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
page = grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
/* Step 1: write nat cache */
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
@@ -3157,6 +3159,7 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
if (!page) {
page = grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
written_size = 0;
}
summary = (struct f2fs_summary *)(kaddr + written_size);
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 3325d0769723..492ad0c86fa9 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -375,6 +375,7 @@ static inline void seg_info_to_sit_page(struct f2fs_sb_info *sbi,
int i;
raw_sit = (struct f2fs_sit_block *)page_address(page);
+ memset(raw_sit, 0, PAGE_SIZE);
for (i = 0; i < end - start; i++) {
rs = &raw_sit->entries[i];
se = get_seg_entry(sbi, start + i);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 81114baa835b59ed02d14aa1d67f91ea874077cd Mon Sep 17 00:00:00 2001
From: Chao Yu <yuchao0(a)huawei.com>
Date: Mon, 9 Apr 2018 20:25:06 +0800
Subject: [PATCH] f2fs: don't use GFP_ZERO for page caches
Related to https://lkml.org/lkml/2018/4/8/661
Sometimes, we need to write meta data to new allocated block address,
then we will allocate a zeroed page in inner inode's address space, and
fill partial data in it, and leave other place with zero value which means
some fields are initial status.
There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
I have just checked them, for both of them, we can avoid using __GFP_ZERO,
and do initialization by ourselves to avoid unneeded/redundant zeroing
from mm.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 3d3a1f6b21a1..8fc55d42ba25 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -100,8 +100,10 @@ repeat:
* readonly and make sure do not write checkpoint with non-uptodate
* meta page.
*/
- if (unlikely(!PageUptodate(page)))
+ if (unlikely(!PageUptodate(page))) {
+ memset(page_address(page), 0, PAGE_SIZE);
f2fs_stop_checkpoint(sbi, false);
+ }
out:
return page;
}
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 417c9dcd0269..87535bf63421 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -320,10 +320,10 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
make_now:
if (ino == F2FS_NODE_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_node_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (ino == F2FS_META_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_meta_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (S_ISREG(inode->i_mode)) {
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a7ec952093f8..0fb006f591a4 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1979,6 +1979,7 @@ static void write_current_sum_page(struct f2fs_sb_info *sbi,
struct f2fs_summary_block *dst;
dst = (struct f2fs_summary_block *)page_address(page);
+ memset(dst, 0, PAGE_SIZE);
mutex_lock(&curseg->curseg_mutex);
@@ -3133,6 +3134,7 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
page = grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
/* Step 1: write nat cache */
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
@@ -3157,6 +3159,7 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
if (!page) {
page = grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
written_size = 0;
}
summary = (struct f2fs_summary *)(kaddr + written_size);
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 3325d0769723..492ad0c86fa9 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -375,6 +375,7 @@ static inline void seg_info_to_sit_page(struct f2fs_sb_info *sbi,
int i;
raw_sit = (struct f2fs_sit_block *)page_address(page);
+ memset(raw_sit, 0, PAGE_SIZE);
for (i = 0; i < end - start; i++) {
rs = &raw_sit->entries[i];
se = get_seg_entry(sbi, start + i);
Hello,
Syzkaller has reported a crash here[1] for a slab OOB read in
xfrm_hash_rebuild.
Could the following 2 patches be applied in order to on 4.4.y?
6916fb3b10("xfrm: Ignore socket policies when rebuilding hash tables")
862591bf4f("xfrm: skip policies marked as dead while rehashing")
[1] https://syzkaller.appspot.com/bug?id=1c11a638b7d27e871aa297f3b4d5fd5bc90f0c…
Thanks,
- Zubin
From: Martin Kelly <mkelly(a)xevo.com>
commit c043ec1ca5baae63726aae32abbe003192bc6eec upstream.
Currently, we use int for buffer length and bytes_per_datum. However,
kfifo uses unsigned int for length and size_t for element size. We need
to make sure these matches or we will have bugs related to overflow (in
the range between INT_MAX and UINT_MAX for length, for example).
In addition, set_bytes_per_datum uses size_t while bytes_per_datum is an
int, which would cause bugs for large values of bytes_per_datum.
Change buffer length to use unsigned int and bytes_per_datum to use
size_t.
Signed-off-by: Martin Kelly <mkelly(a)xevo.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
[bwh: Backported to 4.9:
- Drop change to iio_dma_buffer_set_length()
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
---
drivers/iio/buffer/kfifo_buf.c | 4 ++--
include/linux/iio/buffer.h | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/iio/buffer/kfifo_buf.c b/drivers/iio/buffer/kfifo_buf.c
index 7ef9b13262a8..e44181f9eb36 100644
--- a/drivers/iio/buffer/kfifo_buf.c
+++ b/drivers/iio/buffer/kfifo_buf.c
@@ -19,7 +19,7 @@ struct iio_kfifo {
#define iio_to_kfifo(r) container_of(r, struct iio_kfifo, buffer)
static inline int __iio_allocate_kfifo(struct iio_kfifo *buf,
- int bytes_per_datum, int length)
+ size_t bytes_per_datum, unsigned int length)
{
if ((length == 0) || (bytes_per_datum == 0))
return -EINVAL;
@@ -71,7 +71,7 @@ static int iio_set_bytes_per_datum_kfifo(struct iio_buffer *r, size_t bpd)
return 0;
}
-static int iio_set_length_kfifo(struct iio_buffer *r, int length)
+static int iio_set_length_kfifo(struct iio_buffer *r, unsigned int length)
{
/* Avoid an invalid state */
if (length < 2)
diff --git a/include/linux/iio/buffer.h b/include/linux/iio/buffer.h
index 70a5164f4728..821965c90070 100644
--- a/include/linux/iio/buffer.h
+++ b/include/linux/iio/buffer.h
@@ -61,7 +61,7 @@ struct iio_buffer_access_funcs {
int (*request_update)(struct iio_buffer *buffer);
int (*set_bytes_per_datum)(struct iio_buffer *buffer, size_t bpd);
- int (*set_length)(struct iio_buffer *buffer, int length);
+ int (*set_length)(struct iio_buffer *buffer, unsigned int length);
int (*enable)(struct iio_buffer *buffer, struct iio_dev *indio_dev);
int (*disable)(struct iio_buffer *buffer, struct iio_dev *indio_dev);
@@ -96,8 +96,8 @@ struct iio_buffer_access_funcs {
* @watermark: [INTERN] number of datums to wait for poll/read.
*/
struct iio_buffer {
- int length;
- int bytes_per_datum;
+ unsigned int length;
+ size_t bytes_per_datum;
struct attribute_group *scan_el_attrs;
long *scan_mask;
bool scan_timestamp;
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca4c8fc97e669d7464a95e006d0ca4eadfa4e4bc Mon Sep 17 00:00:00 2001
From: Jean-Baptiste Maneyrol <jmaneyrol(a)invensense.com>
Date: Mon, 30 Apr 2018 12:14:09 +0200
Subject: [PATCH] iio: imu: inv_mpu6050: fix possible deadlock between mutex
and iio
Detected by kernel circular locking dependency checker.
We are locking iio mutex (iio_device_claim_direct_mode) after
locking our internal mutex. But when the buffer starts, iio first
locks its mutex and then we lock our internal one.
To avoid possible deadlock, we need to use the same order
everwhere. So we change the ordering by locking first iio mutex,
then our internal mutex.
Fixes: 68cd6e5b206b ("iio: imu: inv_mpu6050: fix lock issues by using our own mutex")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol(a)invensense.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
index aafa77766b08..7358935fb391 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
@@ -340,12 +340,9 @@ static int inv_mpu6050_read_channel_data(struct iio_dev *indio_dev,
int result;
int ret;
- result = iio_device_claim_direct_mode(indio_dev);
- if (result)
- return result;
result = inv_mpu6050_set_power_itg(st, true);
if (result)
- goto error_release;
+ return result;
switch (chan->type) {
case IIO_ANGL_VEL:
@@ -386,14 +383,11 @@ static int inv_mpu6050_read_channel_data(struct iio_dev *indio_dev,
result = inv_mpu6050_set_power_itg(st, false);
if (result)
goto error_power_off;
- iio_device_release_direct_mode(indio_dev);
return ret;
error_power_off:
inv_mpu6050_set_power_itg(st, false);
-error_release:
- iio_device_release_direct_mode(indio_dev);
return result;
}
@@ -407,9 +401,13 @@ inv_mpu6050_read_raw(struct iio_dev *indio_dev,
switch (mask) {
case IIO_CHAN_INFO_RAW:
+ ret = iio_device_claim_direct_mode(indio_dev);
+ if (ret)
+ return ret;
mutex_lock(&st->lock);
ret = inv_mpu6050_read_channel_data(indio_dev, chan, val);
mutex_unlock(&st->lock);
+ iio_device_release_direct_mode(indio_dev);
return ret;
case IIO_CHAN_INFO_SCALE:
switch (chan->type) {
@@ -532,17 +530,18 @@ static int inv_mpu6050_write_raw(struct iio_dev *indio_dev,
struct inv_mpu6050_state *st = iio_priv(indio_dev);
int result;
- mutex_lock(&st->lock);
/*
* we should only update scale when the chip is disabled, i.e.
* not running
*/
result = iio_device_claim_direct_mode(indio_dev);
if (result)
- goto error_write_raw_unlock;
+ return result;
+
+ mutex_lock(&st->lock);
result = inv_mpu6050_set_power_itg(st, true);
if (result)
- goto error_write_raw_release;
+ goto error_write_raw_unlock;
switch (mask) {
case IIO_CHAN_INFO_SCALE:
@@ -581,10 +580,9 @@ static int inv_mpu6050_write_raw(struct iio_dev *indio_dev,
}
result |= inv_mpu6050_set_power_itg(st, false);
-error_write_raw_release:
- iio_device_release_direct_mode(indio_dev);
error_write_raw_unlock:
mutex_unlock(&st->lock);
+ iio_device_release_direct_mode(indio_dev);
return result;
}
@@ -643,17 +641,18 @@ inv_mpu6050_fifo_rate_store(struct device *dev, struct device_attribute *attr,
fifo_rate > INV_MPU6050_MAX_FIFO_RATE)
return -EINVAL;
+ result = iio_device_claim_direct_mode(indio_dev);
+ if (result)
+ return result;
+
mutex_lock(&st->lock);
if (fifo_rate == st->chip_config.fifo_rate) {
result = 0;
goto fifo_rate_fail_unlock;
}
- result = iio_device_claim_direct_mode(indio_dev);
- if (result)
- goto fifo_rate_fail_unlock;
result = inv_mpu6050_set_power_itg(st, true);
if (result)
- goto fifo_rate_fail_release;
+ goto fifo_rate_fail_unlock;
d = INV_MPU6050_ONE_K_HZ / fifo_rate - 1;
result = regmap_write(st->map, st->reg->sample_rate_div, d);
@@ -667,10 +666,9 @@ inv_mpu6050_fifo_rate_store(struct device *dev, struct device_attribute *attr,
fifo_rate_fail_power_off:
result |= inv_mpu6050_set_power_itg(st, false);
-fifo_rate_fail_release:
- iio_device_release_direct_mode(indio_dev);
fifo_rate_fail_unlock:
mutex_unlock(&st->lock);
+ iio_device_release_direct_mode(indio_dev);
if (result)
return result;
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca4c8fc97e669d7464a95e006d0ca4eadfa4e4bc Mon Sep 17 00:00:00 2001
From: Jean-Baptiste Maneyrol <jmaneyrol(a)invensense.com>
Date: Mon, 30 Apr 2018 12:14:09 +0200
Subject: [PATCH] iio: imu: inv_mpu6050: fix possible deadlock between mutex
and iio
Detected by kernel circular locking dependency checker.
We are locking iio mutex (iio_device_claim_direct_mode) after
locking our internal mutex. But when the buffer starts, iio first
locks its mutex and then we lock our internal one.
To avoid possible deadlock, we need to use the same order
everwhere. So we change the ordering by locking first iio mutex,
then our internal mutex.
Fixes: 68cd6e5b206b ("iio: imu: inv_mpu6050: fix lock issues by using our own mutex")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol(a)invensense.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
index aafa77766b08..7358935fb391 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
@@ -340,12 +340,9 @@ static int inv_mpu6050_read_channel_data(struct iio_dev *indio_dev,
int result;
int ret;
- result = iio_device_claim_direct_mode(indio_dev);
- if (result)
- return result;
result = inv_mpu6050_set_power_itg(st, true);
if (result)
- goto error_release;
+ return result;
switch (chan->type) {
case IIO_ANGL_VEL:
@@ -386,14 +383,11 @@ static int inv_mpu6050_read_channel_data(struct iio_dev *indio_dev,
result = inv_mpu6050_set_power_itg(st, false);
if (result)
goto error_power_off;
- iio_device_release_direct_mode(indio_dev);
return ret;
error_power_off:
inv_mpu6050_set_power_itg(st, false);
-error_release:
- iio_device_release_direct_mode(indio_dev);
return result;
}
@@ -407,9 +401,13 @@ inv_mpu6050_read_raw(struct iio_dev *indio_dev,
switch (mask) {
case IIO_CHAN_INFO_RAW:
+ ret = iio_device_claim_direct_mode(indio_dev);
+ if (ret)
+ return ret;
mutex_lock(&st->lock);
ret = inv_mpu6050_read_channel_data(indio_dev, chan, val);
mutex_unlock(&st->lock);
+ iio_device_release_direct_mode(indio_dev);
return ret;
case IIO_CHAN_INFO_SCALE:
switch (chan->type) {
@@ -532,17 +530,18 @@ static int inv_mpu6050_write_raw(struct iio_dev *indio_dev,
struct inv_mpu6050_state *st = iio_priv(indio_dev);
int result;
- mutex_lock(&st->lock);
/*
* we should only update scale when the chip is disabled, i.e.
* not running
*/
result = iio_device_claim_direct_mode(indio_dev);
if (result)
- goto error_write_raw_unlock;
+ return result;
+
+ mutex_lock(&st->lock);
result = inv_mpu6050_set_power_itg(st, true);
if (result)
- goto error_write_raw_release;
+ goto error_write_raw_unlock;
switch (mask) {
case IIO_CHAN_INFO_SCALE:
@@ -581,10 +580,9 @@ static int inv_mpu6050_write_raw(struct iio_dev *indio_dev,
}
result |= inv_mpu6050_set_power_itg(st, false);
-error_write_raw_release:
- iio_device_release_direct_mode(indio_dev);
error_write_raw_unlock:
mutex_unlock(&st->lock);
+ iio_device_release_direct_mode(indio_dev);
return result;
}
@@ -643,17 +641,18 @@ inv_mpu6050_fifo_rate_store(struct device *dev, struct device_attribute *attr,
fifo_rate > INV_MPU6050_MAX_FIFO_RATE)
return -EINVAL;
+ result = iio_device_claim_direct_mode(indio_dev);
+ if (result)
+ return result;
+
mutex_lock(&st->lock);
if (fifo_rate == st->chip_config.fifo_rate) {
result = 0;
goto fifo_rate_fail_unlock;
}
- result = iio_device_claim_direct_mode(indio_dev);
- if (result)
- goto fifo_rate_fail_unlock;
result = inv_mpu6050_set_power_itg(st, true);
if (result)
- goto fifo_rate_fail_release;
+ goto fifo_rate_fail_unlock;
d = INV_MPU6050_ONE_K_HZ / fifo_rate - 1;
result = regmap_write(st->map, st->reg->sample_rate_div, d);
@@ -667,10 +666,9 @@ inv_mpu6050_fifo_rate_store(struct device *dev, struct device_attribute *attr,
fifo_rate_fail_power_off:
result |= inv_mpu6050_set_power_itg(st, false);
-fifo_rate_fail_release:
- iio_device_release_direct_mode(indio_dev);
fifo_rate_fail_unlock:
mutex_unlock(&st->lock);
+ iio_device_release_direct_mode(indio_dev);
if (result)
return result;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5811375325420052fcadd944792a416a43072b7f Mon Sep 17 00:00:00 2001
From: Liu Bo <bo.li.liu(a)oracle.com>
Date: Wed, 31 Jan 2018 17:09:13 -0700
Subject: [PATCH] Btrfs: fix unexpected cow in run_delalloc_nocow
Fstests generic/475 provides a way to fail metadata reads while
checking if checksum exists for the inode inside run_delalloc_nocow(),
and csum_exist_in_range() interprets error (-EIO) as inode having
checksum and makes its caller enter the cow path.
In case of free space inode, this ends up with a warning in
cow_file_range().
The same problem applies to btrfs_cross_ref_exist() since it may also
read metadata in between.
With this, run_delalloc_nocow() bails out when errors occur at the two
places.
cc: <stable(a)vger.kernel.org> v2.6.28+
Fixes: 17d217fe970d ("Btrfs: fix nodatasum handling in balancing code")
Signed-off-by: Liu Bo <bo.li.liu(a)oracle.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6504e63b2317..491a7397f6fa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1256,6 +1256,8 @@ static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
list_del(&sums->list);
kfree(sums);
}
+ if (ret < 0)
+ return ret;
return 1;
}
@@ -1388,10 +1390,23 @@ static noinline int run_delalloc_nocow(struct inode *inode,
goto out_check;
if (btrfs_extent_readonly(fs_info, disk_bytenr))
goto out_check;
- if (btrfs_cross_ref_exist(root, ino,
- found_key.offset -
- extent_offset, disk_bytenr))
+ ret = btrfs_cross_ref_exist(root, ino,
+ found_key.offset -
+ extent_offset, disk_bytenr);
+ if (ret) {
+ /*
+ * ret could be -EIO if the above fails to read
+ * metadata.
+ */
+ if (ret < 0) {
+ if (cow_start != (u64)-1)
+ cur_offset = cow_start;
+ goto error;
+ }
+
+ WARN_ON_ONCE(nolock);
goto out_check;
+ }
disk_bytenr += extent_offset;
disk_bytenr += cur_offset - found_key.offset;
num_bytes = min(end + 1, extent_end) - cur_offset;
@@ -1409,10 +1424,22 @@ static noinline int run_delalloc_nocow(struct inode *inode,
* this ensure that csum for a given extent are
* either valid or do not exist.
*/
- if (csum_exist_in_range(fs_info, disk_bytenr,
- num_bytes)) {
+ ret = csum_exist_in_range(fs_info, disk_bytenr,
+ num_bytes);
+ if (ret) {
if (!nolock)
btrfs_end_write_no_snapshotting(root);
+
+ /*
+ * ret could be -EIO if the above fails to read
+ * metadata.
+ */
+ if (ret < 0) {
+ if (cow_start != (u64)-1)
+ cur_offset = cow_start;
+ goto error;
+ }
+ WARN_ON_ONCE(nolock);
goto out_check;
}
if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b5c40d598f5408bd0ca22dfffa82f03cd9433f23 Mon Sep 17 00:00:00 2001
From: Omar Sandoval <osandov(a)fb.com>
Date: Tue, 22 May 2018 15:02:12 -0700
Subject: [PATCH] Btrfs: fix clone vs chattr NODATASUM race
In btrfs_clone_files(), we must check the NODATASUM flag while the
inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags()
will change the flags after we check and we can end up with a party
checksummed file.
The race window is only a few instructions in size, between the if and
the locks which is:
3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
3835 return -EISDIR;
where the setflags must be run and toggle the NODATASUM flag (provided
the file size is 0). The clone will block on the inode lock, segflags
takes the inode lock, changes flags, releases log and clone continues.
Not impossible but still needs a lot of bad luck to hit unintentionally.
Fixes: 0e7b824c4ef9 ("Btrfs: don't make a file partly checksummed through file clone")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Omar Sandoval <osandov(a)fb.com>
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 743c4f1b8001..b9b779a4ab6e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3808,11 +3808,6 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
src->i_sb != inode->i_sb)
return -EXDEV;
- /* don't make the dst file partly checksummed */
- if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
- (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
- return -EINVAL;
-
if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
return -EISDIR;
@@ -3822,6 +3817,13 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
inode_lock(src);
}
+ /* don't make the dst file partly checksummed */
+ if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
+ (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
/* determine range to clone */
ret = -EINVAL;
if (off + len > src->i_size || off + len < off)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ff7c9917143b3a6cf2fa61212a32d67cf259bf9c Mon Sep 17 00:00:00 2001
From: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Date: Mon, 18 Jun 2018 12:47:45 -0700
Subject: [PATCH] cpufreq: intel_pstate: Fix scaling max/min limits with Turbo
3.0
When scaling max/min settings are changed, internally they are converted
to a ratio using the max turbo 1 core turbo frequency. This works fine
when 1 core max is same irrespective of the core. But under Turbo 3.0,
this will not be the case. For example:
Core 0: max turbo pstate: 43 (4.3GHz)
Core 1: max turbo pstate: 45 (4.5GHz)
In this case 1 core turbo ratio will be maximum of all, so it will be
45 (4.5GHz). Suppose scaling max is set to 4GHz (ratio 40) for all cores
,then on core one it will be
= max_state * policy->max / max_freq;
= 43 * (4000000/4500000) = 38 (3.8GHz)
= 38
which is 200MHz less than the desired.
On core2, it will be correctly set to ratio 40 (4GHz). Same holds true
for scaling min frequency limit. So this requires usage of correct turbo
max frequency for core one, which in this case is 4.3GHz. So we need to
adjust per CPU cpu->pstate.turbo_freq using the maximum HWP ratio of that
core.
This change uses the HWP capability of a core to adjust max turbo
frequency. But since Broadwell HWP doesn't use ratios in the HWP
capabilities, we have to use legacy max 1 core turbo ratio. This is not
a problem as the HWP capabilities don't differ among cores in Broadwell.
We need to check for non Broadwell CPU model for applying this change,
though.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: 4.6+ <stable(a)vger.kernel.org> # 4.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 1de5ec8d5ea3..ece120da3353 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -294,6 +294,7 @@ struct pstate_funcs {
static struct pstate_funcs pstate_funcs __read_mostly;
static int hwp_active __read_mostly;
+static int hwp_mode_bdw __read_mostly;
static bool per_cpu_limits __read_mostly;
static bool hwp_boost __read_mostly;
@@ -1413,7 +1414,15 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
cpu->pstate.turbo_pstate = pstate_funcs.get_turbo();
cpu->pstate.scaling = pstate_funcs.get_scaling();
cpu->pstate.max_freq = cpu->pstate.max_pstate * cpu->pstate.scaling;
- cpu->pstate.turbo_freq = cpu->pstate.turbo_pstate * cpu->pstate.scaling;
+
+ if (hwp_active && !hwp_mode_bdw) {
+ unsigned int phy_max, current_max;
+
+ intel_pstate_get_hwp_max(cpu->cpu, &phy_max, ¤t_max);
+ cpu->pstate.turbo_freq = phy_max * cpu->pstate.scaling;
+ } else {
+ cpu->pstate.turbo_freq = cpu->pstate.turbo_pstate * cpu->pstate.scaling;
+ }
if (pstate_funcs.get_aperf_mperf_shift)
cpu->aperf_mperf_shift = pstate_funcs.get_aperf_mperf_shift();
@@ -2467,28 +2476,36 @@ static inline bool intel_pstate_has_acpi_ppc(void) { return false; }
static inline void intel_pstate_request_control_from_smm(void) {}
#endif /* CONFIG_ACPI */
+#define INTEL_PSTATE_HWP_BROADWELL 0x01
+
+#define ICPU_HWP(model, hwp_mode) \
+ { X86_VENDOR_INTEL, 6, model, X86_FEATURE_HWP, hwp_mode }
+
static const struct x86_cpu_id hwp_support_ids[] __initconst = {
- { X86_VENDOR_INTEL, 6, X86_MODEL_ANY, X86_FEATURE_HWP },
+ ICPU_HWP(INTEL_FAM6_BROADWELL_X, INTEL_PSTATE_HWP_BROADWELL),
+ ICPU_HWP(INTEL_FAM6_BROADWELL_XEON_D, INTEL_PSTATE_HWP_BROADWELL),
+ ICPU_HWP(X86_MODEL_ANY, 0),
{}
};
static int __init intel_pstate_init(void)
{
+ const struct x86_cpu_id *id;
int rc;
if (no_load)
return -ENODEV;
- if (x86_match_cpu(hwp_support_ids)) {
+ id = x86_match_cpu(hwp_support_ids);
+ if (id) {
copy_cpu_funcs(&core_funcs);
if (!no_hwp) {
hwp_active++;
+ hwp_mode_bdw = id->driver_data;
intel_pstate.attr = hwp_cpufreq_attrs;
goto hwp_cpu_matched;
}
} else {
- const struct x86_cpu_id *id;
-
id = x86_match_cpu(intel_pstate_cpu_ids);
if (!id)
return -ENODEV;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 988a35f8da1dec5a8cd2788054d1e717be61bf25 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Fri, 11 May 2018 19:54:19 +0900
Subject: [PATCH] printk: fix possible reuse of va_list variable
I noticed that there is a possibility that printk_safe_log_store() causes
kernel oops because "args" parameter is passed to vsnprintf() again when
atomic_cmpxchg() detected that we raced. Fix this by using va_copy().
Link: http://lkml.kernel.org/r/201805112002.GIF21216.OFVHFOMLJtQFSO@I-love.SAKURA…
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: dvyukov(a)google.com
Cc: syzkaller(a)googlegroups.com
Cc: fengguang.wu(a)intel.com
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Fixes: 42a0bb3f71383b45 ("printk/nmi: generic solution for safe printk in NMI")
Cc: 4.7+ <stable(a)vger.kernel.org> # v4.7+
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..449d67edfa4b 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -82,6 +82,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
{
int add;
size_t len;
+ va_list ap;
again:
len = atomic_read(&s->len);
@@ -100,7 +101,9 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
if (!len)
smp_rmb();
- add = vscnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args);
+ va_copy(ap, args);
+ add = vscnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, ap);
+ va_end(ap);
if (!add)
return 0;
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2deae44616dab0112d965a0dc72d053b5727b4b Mon Sep 17 00:00:00 2001
From: Koen Vandeputte <koen.vandeputte(a)ncentric.com>
Date: Mon, 15 Jan 2018 10:36:08 +0100
Subject: [PATCH] PCI: dwc: Fix enumeration end when reaching root subordinate
The subordinate value indicates the highest bus number which can be
reached downstream though a certain device.
Commit a20c7f36bd3d ("PCI: Do not allocate more buses than available in
parent")
ensures that downstream devices cannot assign busnumbers higher than the
upstream device subordinate number, which was indeed illogical.
By default, dw_pcie_setup_rc() inits the Root Complex subordinate to a
value of 0x01.
Due to this combined with above commit, enumeration stops digging deeper
downstream as soon as bus num 0x01 has been assigned, which is always
the case for a bridge device.
This results in all devices behind a bridge bus to remain undetected, as
these would be connected to bus 0x02 or higher.
Fix this by initializing the RC to a subordinate value of 0xff, which is
not altering hardware behaviour in any way, but informs probing
function pci_scan_bridge() later on which reads this value back from
register.
Following nasty errors during boot are also fixed by this:
[ 0.459145] pci_bus 0000:02: busn_res: can not insert [bus 02-ff]
under [bus 01] (conflicts with (null) [bus 01])
...
[ 0.464515] pci_bus 0000:03: [bus 03] partially hidden behind bridge
0000:01 [bus 01]
...
[ 0.464892] pci_bus 0000:04: [bus 04] partially hidden behind bridge
0000:01 [bus 01]
...
[ 0.466488] pci_bus 0000:05: [bus 05] partially hidden behind bridge
0000:01 [bus 01]
[ 0.466506] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to
05
[ 0.466517] pci_bus 0000:02: busn_res: can not insert [bus 02-05]
under [bus 01] (conflicts with (null) [bus 01])
[ 0.466534] pci_bus 0000:02: [bus 02-05] partially hidden behind
bridge 0000:01 [bus 01]
Fixes: a20c7f36bd3d ("PCI: Do not allocate more buses than available in
parent")
Tested-by: Niklas Cassel <niklas.cassel(a)axis.com>
Tested-by: Fabio Estevam <fabio.estevam(a)nxp.com>
Tested-by: Sebastian Reichel <sebastian.reichel(a)collabora.co.uk>
Signed-off-by: Koen Vandeputte <koen.vandeputte(a)ncentric.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Acked-by: Lucas Stach <l.stach(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org> # 4.15
Cc: Binghui Wang <wangbinghui(a)hisilicon.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Jesper Nilsson <jesper.nilsson(a)axis.com>
Cc: Jianguo Sun <sunjianguo1(a)huawei.com>
Cc: Jingoo Han <jingoohan1(a)gmail.com>
Cc: Kishon Vijay Abraham I <kishon(a)ti.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: Lucas Stach <l.stach(a)pengutronix.de>
Cc: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: Minghuan Lian <minghuan.Lian(a)freescale.com>
Cc: Mingkai Hu <mingkai.hu(a)freescale.com>
Cc: Murali Karicheri <m-karicheri2(a)ti.com>
Cc: Pratyush Anand <pratyush.anand(a)gmail.com>
Cc: Richard Zhu <hongxing.zhu(a)nxp.com>
Cc: Roy Zang <tie-fei.zang(a)freescale.com>
Cc: Shawn Guo <shawn.guo(a)linaro.org>
Cc: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: Thomas Petazzoni <thomas.petazzoni(a)free-electrons.com>
Cc: Xiaowei Song <songxiaowei(a)hisilicon.com>
Cc: Zhou Wang <wangzhou1(a)hisilicon.com>
diff --git a/drivers/pci/dwc/pcie-designware-host.c b/drivers/pci/dwc/pcie-designware-host.c
index 8de2d5c69b1d..dc9303abda42 100644
--- a/drivers/pci/dwc/pcie-designware-host.c
+++ b/drivers/pci/dwc/pcie-designware-host.c
@@ -613,7 +613,7 @@ void dw_pcie_setup_rc(struct pcie_port *pp)
/* setup bus numbers */
val = dw_pcie_readl_dbi(pci, PCI_PRIMARY_BUS);
val &= 0xff000000;
- val |= 0x00010100;
+ val |= 0x00ff0100;
dw_pcie_writel_dbi(pci, PCI_PRIMARY_BUS, val);
/* setup command register */
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3f77f244d8ec28e3a0a81240ffac7d626390060c Mon Sep 17 00:00:00 2001
From: Martin Kaiser <martin(a)kaiser.cx>
Date: Mon, 18 Jun 2018 22:41:03 +0200
Subject: [PATCH] mtd: rawnand: mxc: set spare area size register explicitly
The v21 version of the NAND flash controller contains a Spare Area Size
Register (SPAS) at offset 0x10. Its setting defaults to the maximum
spare area size of 218 bytes. The size that is set in this register is
used by the controller when it calculates the ECC bytes internally in
hardware.
Usually, this register is updated from settings in the IIM fuses when
the system is booting from NAND flash. For other boot media, however,
the SPAS register remains at the default setting, which may not work for
the particular flash chip on the board. The same goes for flash chips
whose configuration cannot be set in the IIM fuses (e.g. chips with 2k
sector size and 128 bytes spare area size can't be configured in the IIM
fuses on imx25 systems).
Set the SPAS register explicitly during the preset operation. Derive the
register value from mtd->oobsize that was detected during probe by
decoding the flash chip's ID bytes.
While at it, rename the define for the spare area register's offset to
NFC_V21_RSLTSPARE_AREA. The register at offset 0x10 on v1 controllers is
different from the register on v21 controllers.
Fixes: d484018 ("mtd: mxc_nand: set NFC registers after reset")
Cc: stable(a)vger.kernel.org
Signed-off-by: Martin Kaiser <martin(a)kaiser.cx>
Reviewed-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Reviewed-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Signed-off-by: Boris Brezillon <boris.brezillon(a)bootlin.com>
diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
index 45786e707b7b..26cef218bb43 100644
--- a/drivers/mtd/nand/raw/mxc_nand.c
+++ b/drivers/mtd/nand/raw/mxc_nand.c
@@ -48,7 +48,7 @@
#define NFC_V1_V2_CONFIG (host->regs + 0x0a)
#define NFC_V1_V2_ECC_STATUS_RESULT (host->regs + 0x0c)
#define NFC_V1_V2_RSLTMAIN_AREA (host->regs + 0x0e)
-#define NFC_V1_V2_RSLTSPARE_AREA (host->regs + 0x10)
+#define NFC_V21_RSLTSPARE_AREA (host->regs + 0x10)
#define NFC_V1_V2_WRPROT (host->regs + 0x12)
#define NFC_V1_UNLOCKSTART_BLKADDR (host->regs + 0x14)
#define NFC_V1_UNLOCKEND_BLKADDR (host->regs + 0x16)
@@ -1274,6 +1274,9 @@ static void preset_v2(struct mtd_info *mtd)
writew(config1, NFC_V1_V2_CONFIG1);
/* preset operation */
+ /* spare area size in 16-bit half-words */
+ writew(mtd->oobsize / 2, NFC_V21_RSLTSPARE_AREA);
+
/* Unlock the internal RAM Buffer */
writew(0x2, NFC_V1_V2_CONFIG);