This is the start of the stable review cycle for the 4.19.245 release. There are 44 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 25 May 2022 16:56:55 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.245-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.19.245-rc1
David Howells dhowells@redhat.com afs: Fix afs_getattr() to refetch file status if callback break occurred
Linus Torvalds torvalds@linux-foundation.org Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""
Halil Pasic pasic@linux.ibm.com swiotlb: fix info leak with DMA_FROM_DEVICE
Grant Grundler grundler@chromium.org net: atlantic: verify hw_head_ lies within TX buffer ring
Yang Yingliang yangyingliang@huawei.com net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
Yang Yingliang yangyingliang@huawei.com ethernet: tulip: fix missing pci_disable_device() on error in tulip_init_one()
Felix Fietkau nbd@nbd.name mac80211: fix rx reordering with non explicit / psmp ack policy
Gleb Chesnokov Chesnokov.G@raidix.com scsi: qla2xxx: Fix missed DMA unmap for aborted commands
Thomas Richter tmricht@linux.ibm.com perf bench numa: Address compiler error on s390
Uwe Kleine-König u.kleine-koenig@pengutronix.de gpio: mvebu/pwm: Refuse requests with inverted polarity
Haibo Chen haibo.chen@nxp.com gpio: gpio-vf610: do not touch other bits when set the target bit
Andrew Lunn andrew@lunn.ch net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
Kevin Mitchell kevmitch@arista.com igb: skip phy status check where unavailable
Ard Biesheuvel ardb@kernel.org ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2
Ard Biesheuvel ardb@kernel.org ARM: 9196/1: spectre-bhb: enable for Cortex-A15
Jiasheng Jiang jiasheng@iscas.ac.cn net: af_key: add check for pfkey_broadcast in function pfkey_process
Maxim Mikityanskiy maximmi@nvidia.com net/mlx5e: Properly block LRO when XDP is enabled
Duoming Zhou duoming@zju.edu.cn NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc
Christophe JAILLET christophe.jaillet@wanadoo.fr net/qla3xxx: Fix a test in ql_reset_work()
Codrin Ciubotariu codrin.ciubotariu@microchip.com clk: at91: generated: consider range when calculating best rate
Zixuan Fu r33s3n6@gmail.com net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup()
Zixuan Fu r33s3n6@gmail.com net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf()
Paolo Abeni pabeni@redhat.com net/sched: act_pedit: sanitize shift argument before usage
Harini Katakam harini.katakam@xilinx.com net: macb: Increment rx bd head after allocating skb and buffer
Ulf Hansson ulf.hansson@linaro.org mmc: core: Default to generic_cmd6_time as timeout in __mmc_switch()
Ulf Hansson ulf.hansson@linaro.org mmc: block: Use generic_cmd6_time when modifying INAND_CMD38_ARG_EXT_CSD
Ulf Hansson ulf.hansson@linaro.org mmc: core: Specify timeouts for BKOPS and CACHE_FLUSH for eMMC
Ulf Hansson ulf.hansson@linaro.org mmc: core: Cleanup BKOPS support
Hangyu Hua hbh25y@gmail.com drm/dp/mst: fix a possible memory leak in fetch_monitor_name()
Ondrej Mosnacek omosnace@redhat.com crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ
Rafael J. Wysocki rafael.j.wysocki@intel.com PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold
Al Viro viro@zeniv.linux.org.uk Fix double fget() in vhost_net_set_backend()
Peter Zijlstra peterz@infradead.org perf: Fix sys_perf_event_open() race against self
Takashi Iwai tiwai@suse.de ALSA: wavefront: Proper check of get_user() error
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix lockdep warnings during disk space reclamation
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix lockdep warnings in page operations for btree nodes
linyujun linyujun809@huawei.com ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame()
Jakob Koschel jakobkoschel@gmail.com drbd: remove usage of list iterator variable after loop
Xiaoke Wang xkernel.wang@foxmail.com MIPS: lantiq: check the return value of kzalloc()
Zheng Yongjun zhengyongjun3@huawei.com crypto: stm32 - fix reference leak in stm32_crc_remove
Zheng Yongjun zhengyongjun3@huawei.com Input: stmfts - fix reference leak in stmfts_input_open
Jeff LaBundy jeff@labundy.com Input: add bounds checking to input_set_capability()
David Gow davidgow@google.com um: Cleanup syscall_handler_t definition/cast, fix warning
Willy Tarreau w@1wt.eu floppy: use a statically allocated error counter
-------------
Diffstat:
Makefile | 4 +- arch/arm/kernel/entry-armv.S | 2 +- arch/arm/kernel/stacktrace.c | 10 +- arch/arm/mm/proc-v7-bugs.c | 1 + arch/mips/lantiq/falcon/sysctrl.c | 2 + arch/mips/lantiq/xway/gptu.c | 2 + arch/mips/lantiq/xway/sysctrl.c | 46 +++--- arch/x86/um/shared/sysdep/syscalls_64.h | 5 +- drivers/block/drbd/drbd_main.c | 7 +- drivers/block/floppy.c | 17 +-- drivers/clk/at91/clk-generated.c | 4 + drivers/crypto/qcom-rng.c | 1 + drivers/crypto/stm32/stm32_crc32.c | 4 +- drivers/gpio/gpio-mvebu.c | 3 + drivers/gpio/gpio-vf610.c | 8 +- drivers/gpu/drm/drm_dp_mst_topology.c | 1 + drivers/input/input.c | 19 +++ drivers/input/touchscreen/stmfts.c | 8 +- drivers/mmc/core/block.c | 8 +- drivers/mmc/core/card.h | 6 +- drivers/mmc/core/mmc.c | 6 - drivers/mmc/core/mmc_ops.c | 110 ++++---------- drivers/mmc/core/mmc_ops.h | 3 +- .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 7 + drivers/net/ethernet/cadence/macb_main.c | 2 +- drivers/net/ethernet/dec/tulip/tulip_core.c | 5 +- drivers/net/ethernet/intel/igb/igb_main.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 + drivers/net/ethernet/qlogic/qla3xxx.c | 3 +- drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 4 +- drivers/net/vmxnet3/vmxnet3_drv.c | 6 + drivers/pci/pci.c | 10 ++ drivers/scsi/qla2xxx/qla_target.c | 3 + drivers/vhost/net.c | 15 +- fs/afs/inode.c | 14 +- fs/nilfs2/btnode.c | 23 ++- fs/nilfs2/btnode.h | 1 + fs/nilfs2/btree.c | 27 ++-- fs/nilfs2/dat.c | 4 +- fs/nilfs2/gcinode.c | 7 +- fs/nilfs2/inode.c | 159 +++++++++++++++++++-- fs/nilfs2/mdt.c | 43 ++++-- fs/nilfs2/mdt.h | 6 +- fs/nilfs2/nilfs.h | 16 +-- fs/nilfs2/page.c | 7 +- fs/nilfs2/segment.c | 9 +- fs/nilfs2/super.c | 5 +- kernel/dma/swiotlb.c | 12 +- kernel/events/core.c | 14 ++ net/bridge/br_input.c | 7 + net/key/af_key.c | 6 +- net/mac80211/rx.c | 3 +- net/nfc/nci/data.c | 2 +- net/nfc/nci/hci.c | 4 +- net/sched/act_pedit.c | 4 + sound/isa/wavefront/wavefront_synth.c | 3 +- tools/perf/bench/numa.c | 2 +- 57 files changed, 483 insertions(+), 237 deletions(-)
From: Willy Tarreau w@1wt.eu
commit f71f01394f742fc4558b3f9f4c7ef4c4cf3b07c8 upstream.
Interrupt handler bad_flp_intr() may cause a UAF on the recently freed request just to increment the error count. There's no point keeping that one in the request anyway, and since the interrupt handler uses a static pointer to the error which cannot be kept in sync with the pending request, better make it use a static error counter that's reset for each new request. This reset now happens when entering redo_fd_request() for a new request via set_next_request().
One initial concern about a single error counter was that errors on one floppy drive could be reported on another one, but this problem is not real given that the driver uses a single drive at a time, as that PC-compatible controllers also have this limitation by using shared signals. As such the error count is always for the "current" drive.
Reported-by: Minh Yuan yuanmingbuaa@gmail.com Suggested-by: Linus Torvalds torvalds@linuxfoundation.org Tested-by: Denis Efremov efremov@linux.com Signed-off-by: Willy Tarreau w@1wt.eu Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Denis Efremov efremov@linux.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/floppy.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
--- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -520,8 +520,8 @@ static unsigned long fdc_busy; static DECLARE_WAIT_QUEUE_HEAD(fdc_wait); static DECLARE_WAIT_QUEUE_HEAD(command_done);
-/* Errors during formatting are counted here. */ -static int format_errors; +/* errors encountered on the current (or last) request */ +static int floppy_errors;
/* Format request descriptor. */ static struct format_descr format_req; @@ -541,7 +541,6 @@ static struct format_descr format_req; static char *floppy_track_buffer; static int max_buffer_sectors;
-static int *errors; typedef void (*done_f)(int); static const struct cont_t { void (*interrupt)(void); @@ -1434,7 +1433,7 @@ static int interpret_errors(void) if (DP->flags & FTD_MSG) DPRINT("Over/Underrun - retrying\n"); bad = 0; - } else if (*errors >= DP->max_errors.reporting) { + } else if (floppy_errors >= DP->max_errors.reporting) { print_errors(); } if (ST2 & ST2_WC || ST2 & ST2_BC) @@ -2054,7 +2053,7 @@ static void bad_flp_intr(void) if (!next_valid_format()) return; } - err_count = ++(*errors); + err_count = ++floppy_errors; INFBOUND(DRWE->badness, err_count); if (err_count > DP->max_errors.abort) cont->done(0); @@ -2199,9 +2198,8 @@ static int do_format(int drive, struct f return -EINVAL; } format_req = *tmp_format_req; - format_errors = 0; cont = &format_cont; - errors = &format_errors; + floppy_errors = 0; ret = wait_til_done(redo_format, true); if (ret == -EINTR) return -EINTR; @@ -2684,7 +2682,7 @@ static int make_raw_rw_request(void) */ if (!direct || (indirect * 2 > direct * 3 && - *errors < DP->max_errors.read_track && + floppy_errors < DP->max_errors.read_track && ((!probing || (DP->read_track & (1 << DRS->probed_format)))))) { max_size = blk_rq_sectors(current_req); @@ -2818,7 +2816,7 @@ static int set_next_request(void) if (q) { current_req = blk_fetch_request(q); if (current_req) { - current_req->error_count = 0; + floppy_errors = 0; break; } } @@ -2880,7 +2878,6 @@ do_request: _floppy = floppy_type + DP->autodetect[DRS->probed_format]; } else probing = 0; - errors = &(current_req->error_count); tmp = make_raw_rw_request(); if (tmp < 2) { request_done(tmp);
From: David Gow davidgow@google.com
[ Upstream commit f4f03f299a56ce4d73c5431e0327b3b6cb55ebb9 ]
The syscall_handler_t type for x86_64 was defined as 'long (*)(void)', but always cast to 'long (*)(long, long, long, long, long, long)' before use. This now triggers a warning (see below).
Define syscall_handler_t as the latter instead, and remove the cast. This simplifies the code, and fixes the warning.
Warning: In file included from ../arch/um/include/asm/processor-generic.h:13 from ../arch/x86/um/asm/processor.h:41 from ../include/linux/rcupdate.h:30 from ../include/linux/rculist.h:11 from ../include/linux/pid.h:5 from ../include/linux/sched.h:14 from ../include/linux/ptrace.h:6 from ../arch/um/kernel/skas/syscall.c:7: ../arch/um/kernel/skas/syscall.c: In function ‘handle_syscall’: ../arch/x86/um/shared/sysdep/syscalls_64.h:18:11: warning: cast between incompatible function types from ‘long int (*)(void)’ to ‘long int (*)(long int, long int, long int, long int, long int, long int)’ [ -Wcast-function-type] 18 | (((long (*)(long, long, long, long, long, long)) \ | ^ ../arch/x86/um/asm/ptrace.h:36:62: note: in definition of macro ‘PT_REGS_SET_SYSCALL_RETURN’ 36 | #define PT_REGS_SET_SYSCALL_RETURN(r, res) (PT_REGS_AX(r) = (res)) | ^~~ ../arch/um/kernel/skas/syscall.c:46:33: note: in expansion of macro ‘EXECUTE_SYSCALL’ 46 | EXECUTE_SYSCALL(syscall, regs)); | ^~~~~~~~~~~~~~~
Signed-off-by: David Gow davidgow@google.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/um/shared/sysdep/syscalls_64.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/um/shared/sysdep/syscalls_64.h b/arch/x86/um/shared/sysdep/syscalls_64.h index 8a7d5e1da98e..1e6875b4ffd8 100644 --- a/arch/x86/um/shared/sysdep/syscalls_64.h +++ b/arch/x86/um/shared/sysdep/syscalls_64.h @@ -10,13 +10,12 @@ #include <linux/msg.h> #include <linux/shm.h>
-typedef long syscall_handler_t(void); +typedef long syscall_handler_t(long, long, long, long, long, long);
extern syscall_handler_t *sys_call_table[];
#define EXECUTE_SYSCALL(syscall, regs) \ - (((long (*)(long, long, long, long, long, long)) \ - (*sys_call_table[syscall]))(UPT_SYSCALL_ARG1(®s->regs), \ + (((*sys_call_table[syscall]))(UPT_SYSCALL_ARG1(®s->regs), \ UPT_SYSCALL_ARG2(®s->regs), \ UPT_SYSCALL_ARG3(®s->regs), \ UPT_SYSCALL_ARG4(®s->regs), \
From: Jeff LaBundy jeff@labundy.com
[ Upstream commit 409353cbe9fe48f6bc196114c442b1cff05a39bc ]
Update input_set_capability() to prevent kernel panic in case the event code exceeds the bitmap for the given event type.
Suggested-by: Tomasz Moń tomasz.mon@camlingroup.com Signed-off-by: Jeff LaBundy jeff@labundy.com Reviewed-by: Tomasz Moń tomasz.mon@camlingroup.com Link: https://lore.kernel.org/r/20220320032537.545250-1-jeff@labundy.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/input.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/drivers/input/input.c b/drivers/input/input.c index a0d90022fcf7..dcbf53b5b2bc 100644 --- a/drivers/input/input.c +++ b/drivers/input/input.c @@ -50,6 +50,17 @@ static DEFINE_MUTEX(input_mutex);
static const struct input_value input_value_sync = { EV_SYN, SYN_REPORT, 1 };
+static const unsigned int input_max_code[EV_CNT] = { + [EV_KEY] = KEY_MAX, + [EV_REL] = REL_MAX, + [EV_ABS] = ABS_MAX, + [EV_MSC] = MSC_MAX, + [EV_SW] = SW_MAX, + [EV_LED] = LED_MAX, + [EV_SND] = SND_MAX, + [EV_FF] = FF_MAX, +}; + static inline int is_event_supported(unsigned int code, unsigned long *bm, unsigned int max) { @@ -1915,6 +1926,14 @@ EXPORT_SYMBOL(input_free_device); */ void input_set_capability(struct input_dev *dev, unsigned int type, unsigned int code) { + if (type < EV_CNT && input_max_code[type] && + code > input_max_code[type]) { + pr_err("%s: invalid code %u for type %u\n", __func__, code, + type); + dump_stack(); + return; + } + switch (type) { case EV_KEY: __set_bit(code, dev->keybit);
From: Zheng Yongjun zhengyongjun3@huawei.com
[ Upstream commit 26623eea0da3476446909af96c980768df07bbd9 ]
pm_runtime_get_sync() will increment pm usage counter even it failed. Forgetting to call pm_runtime_put_noidle will result in reference leak in stmfts_input_open, so we should fix it.
Signed-off-by: Zheng Yongjun zhengyongjun3@huawei.com Link: https://lore.kernel.org/r/20220317131604.53538-1-zhengyongjun3@huawei.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/touchscreen/stmfts.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/input/touchscreen/stmfts.c b/drivers/input/touchscreen/stmfts.c index cd8805d71d97..be1dd504d5b1 100644 --- a/drivers/input/touchscreen/stmfts.c +++ b/drivers/input/touchscreen/stmfts.c @@ -339,11 +339,11 @@ static int stmfts_input_open(struct input_dev *dev)
err = pm_runtime_get_sync(&sdata->client->dev); if (err < 0) - return err; + goto out;
err = i2c_smbus_write_byte(sdata->client, STMFTS_MS_MT_SENSE_ON); if (err) - return err; + goto out;
mutex_lock(&sdata->mutex); sdata->running = true; @@ -366,7 +366,9 @@ static int stmfts_input_open(struct input_dev *dev) "failed to enable touchkey\n"); }
- return 0; +out: + pm_runtime_put_noidle(&sdata->client->dev); + return err; }
static void stmfts_input_close(struct input_dev *dev)
From: Zheng Yongjun zhengyongjun3@huawei.com
[ Upstream commit e9a36feecee0ee5845f2e0656f50f9942dd0bed3 ]
pm_runtime_get_sync() will increment pm usage counter even it failed. Forgetting to call pm_runtime_put_noidle will result in reference leak in stm32_crc_remove, so we should fix it.
Signed-off-by: Zheng Yongjun zhengyongjun3@huawei.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/crypto/stm32/stm32_crc32.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/crypto/stm32/stm32_crc32.c b/drivers/crypto/stm32/stm32_crc32.c index 6848f34a9e66..de645bf84980 100644 --- a/drivers/crypto/stm32/stm32_crc32.c +++ b/drivers/crypto/stm32/stm32_crc32.c @@ -334,8 +334,10 @@ static int stm32_crc_remove(struct platform_device *pdev) struct stm32_crc *crc = platform_get_drvdata(pdev); int ret = pm_runtime_get_sync(crc->dev);
- if (ret < 0) + if (ret < 0) { + pm_runtime_put_noidle(crc->dev); return ret; + }
spin_lock(&crc_list.lock); list_del(&crc->list);
From: Xiaoke Wang xkernel.wang@foxmail.com
[ Upstream commit 34123208bbcc8c884a0489f543a23fe9eebb5514 ]
kzalloc() is a memory allocation function which can return NULL when some internal memory errors happen. So it is better to check the return value of it to prevent potential wrong memory access or memory leak.
Signed-off-by: Xiaoke Wang xkernel.wang@foxmail.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/lantiq/falcon/sysctrl.c | 2 ++ arch/mips/lantiq/xway/gptu.c | 2 ++ arch/mips/lantiq/xway/sysctrl.c | 46 ++++++++++++++++++++----------- 3 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/arch/mips/lantiq/falcon/sysctrl.c b/arch/mips/lantiq/falcon/sysctrl.c index 82bbd0e2e298..714d92659489 100644 --- a/arch/mips/lantiq/falcon/sysctrl.c +++ b/arch/mips/lantiq/falcon/sysctrl.c @@ -169,6 +169,8 @@ static inline void clkdev_add_sys(const char *dev, unsigned int module, { struct clk *clk = kzalloc(sizeof(struct clk), GFP_KERNEL);
+ if (!clk) + return; clk->cl.dev_id = dev; clk->cl.con_id = NULL; clk->cl.clk = clk; diff --git a/arch/mips/lantiq/xway/gptu.c b/arch/mips/lantiq/xway/gptu.c index e304aabd6678..7d4081d67d61 100644 --- a/arch/mips/lantiq/xway/gptu.c +++ b/arch/mips/lantiq/xway/gptu.c @@ -124,6 +124,8 @@ static inline void clkdev_add_gptu(struct device *dev, const char *con, { struct clk *clk = kzalloc(sizeof(struct clk), GFP_KERNEL);
+ if (!clk) + return; clk->cl.dev_id = dev_name(dev); clk->cl.con_id = con; clk->cl.clk = clk; diff --git a/arch/mips/lantiq/xway/sysctrl.c b/arch/mips/lantiq/xway/sysctrl.c index e0af39b33e28..293ebb833659 100644 --- a/arch/mips/lantiq/xway/sysctrl.c +++ b/arch/mips/lantiq/xway/sysctrl.c @@ -313,6 +313,8 @@ static void clkdev_add_pmu(const char *dev, const char *con, bool deactivate, { struct clk *clk = kzalloc(sizeof(struct clk), GFP_KERNEL);
+ if (!clk) + return; clk->cl.dev_id = dev; clk->cl.con_id = con; clk->cl.clk = clk; @@ -336,6 +338,8 @@ static void clkdev_add_cgu(const char *dev, const char *con, { struct clk *clk = kzalloc(sizeof(struct clk), GFP_KERNEL);
+ if (!clk) + return; clk->cl.dev_id = dev; clk->cl.con_id = con; clk->cl.clk = clk; @@ -354,24 +358,28 @@ static void clkdev_add_pci(void) struct clk *clk_ext = kzalloc(sizeof(struct clk), GFP_KERNEL);
/* main pci clock */ - clk->cl.dev_id = "17000000.pci"; - clk->cl.con_id = NULL; - clk->cl.clk = clk; - clk->rate = CLOCK_33M; - clk->rates = valid_pci_rates; - clk->enable = pci_enable; - clk->disable = pmu_disable; - clk->module = 0; - clk->bits = PMU_PCI; - clkdev_add(&clk->cl); + if (clk) { + clk->cl.dev_id = "17000000.pci"; + clk->cl.con_id = NULL; + clk->cl.clk = clk; + clk->rate = CLOCK_33M; + clk->rates = valid_pci_rates; + clk->enable = pci_enable; + clk->disable = pmu_disable; + clk->module = 0; + clk->bits = PMU_PCI; + clkdev_add(&clk->cl); + }
/* use internal/external bus clock */ - clk_ext->cl.dev_id = "17000000.pci"; - clk_ext->cl.con_id = "external"; - clk_ext->cl.clk = clk_ext; - clk_ext->enable = pci_ext_enable; - clk_ext->disable = pci_ext_disable; - clkdev_add(&clk_ext->cl); + if (clk_ext) { + clk_ext->cl.dev_id = "17000000.pci"; + clk_ext->cl.con_id = "external"; + clk_ext->cl.clk = clk_ext; + clk_ext->enable = pci_ext_enable; + clk_ext->disable = pci_ext_disable; + clkdev_add(&clk_ext->cl); + } }
/* xway socs can generate clocks on gpio pins */ @@ -391,9 +399,15 @@ static void clkdev_add_clkout(void) char *name;
name = kzalloc(sizeof("clkout0"), GFP_KERNEL); + if (!name) + continue; sprintf(name, "clkout%d", i);
clk = kzalloc(sizeof(struct clk), GFP_KERNEL); + if (!clk) { + kfree(name); + continue; + } clk->cl.dev_id = "1f103000.cgu"; clk->cl.con_id = name; clk->cl.clk = clk;
From: Jakob Koschel jakobkoschel@gmail.com
[ Upstream commit 901aeda62efa21f2eae937bccb71b49ae531be06 ]
In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to iterate through the list [1].
Since that variable should not be used past the loop iteration, a separate variable is used to 'remember the current location within the loop'.
To either continue iterating from that position or skip the iteration (if the previous iteration was complete) list_prepare_entry() is used.
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWX... [1] Signed-off-by: Jakob Koschel jakobkoschel@gmail.com Link: https://lore.kernel.org/r/20220331220349.885126-1-jakobkoschel@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/drbd/drbd_main.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index 5e3885f5729b..c3e4f9d83b29 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -195,7 +195,7 @@ void tl_release(struct drbd_connection *connection, unsigned int barrier_nr, unsigned int set_size) { struct drbd_request *r; - struct drbd_request *req = NULL; + struct drbd_request *req = NULL, *tmp = NULL; int expect_epoch = 0; int expect_size = 0;
@@ -249,8 +249,11 @@ void tl_release(struct drbd_connection *connection, unsigned int barrier_nr, * to catch requests being barrier-acked "unexpectedly". * It usually should find the same req again, or some READ preceding it. */ list_for_each_entry(req, &connection->transfer_log, tl_requests) - if (req->epoch == expect_epoch) + if (req->epoch == expect_epoch) { + tmp = req; break; + } + req = list_prepare_entry(tmp, &connection->transfer_log, tl_requests); list_for_each_entry_safe_from(req, r, &connection->transfer_log, tl_requests) { if (req->epoch != expect_epoch) break;
From: linyujun linyujun809@huawei.com
[ Upstream commit 9be4c88bb7924f68f88cfd47d925c2d046f51a73 ]
The following KASAN warning is detected by QEMU.
================================================================== BUG: KASAN: stack-out-of-bounds in unwind_frame+0x508/0x870 Read of size 4 at addr c36bba90 by task cat/163
CPU: 1 PID: 163 Comm: cat Not tainted 5.10.0-rc1 #40 Hardware name: ARM-Versatile Express [<c0113fac>] (unwind_backtrace) from [<c010e71c>] (show_stack+0x10/0x14) [<c010e71c>] (show_stack) from [<c0b805b4>] (dump_stack+0x98/0xb0) [<c0b805b4>] (dump_stack) from [<c0b7d658>] (print_address_description.constprop.0+0x58/0x4bc) [<c0b7d658>] (print_address_description.constprop.0) from [<c031435c>] (kasan_report+0x154/0x170) [<c031435c>] (kasan_report) from [<c0113c44>] (unwind_frame+0x508/0x870) [<c0113c44>] (unwind_frame) from [<c010e298>] (__save_stack_trace+0x110/0x134) [<c010e298>] (__save_stack_trace) from [<c01ce0d8>] (stack_trace_save+0x8c/0xb4) [<c01ce0d8>] (stack_trace_save) from [<c0313520>] (kasan_set_track+0x38/0x60) [<c0313520>] (kasan_set_track) from [<c0314cb8>] (kasan_set_free_info+0x20/0x2c) [<c0314cb8>] (kasan_set_free_info) from [<c0313474>] (__kasan_slab_free+0xec/0x120) [<c0313474>] (__kasan_slab_free) from [<c0311e20>] (kmem_cache_free+0x7c/0x334) [<c0311e20>] (kmem_cache_free) from [<c01c35dc>] (rcu_core+0x390/0xccc) [<c01c35dc>] (rcu_core) from [<c01013a8>] (__do_softirq+0x180/0x518) [<c01013a8>] (__do_softirq) from [<c0135214>] (irq_exit+0x9c/0xe0) [<c0135214>] (irq_exit) from [<c01a40e4>] (__handle_domain_irq+0xb0/0x110) [<c01a40e4>] (__handle_domain_irq) from [<c0691248>] (gic_handle_irq+0xa0/0xb8) [<c0691248>] (gic_handle_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x94) Exception stack(0xc36bb928 to 0xc36bb970) b920: c36bb9c0 00000000 c0126919 c0101228 c36bb9c0 b76d7730 b940: c36b8000 c36bb9a0 c3335b00 c01ce0d8 00000003 c36bba3c c36bb940 c36bb978 b960: c010e298 c011373c 60000013 ffffffff [<c0100b0c>] (__irq_svc) from [<c011373c>] (unwind_frame+0x0/0x870) [<c011373c>] (unwind_frame) from [<00000000>] (0x0)
The buggy address belongs to the page: page:(ptrval) refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x636bb flags: 0x0() raw: 00000000 00000000 ef867764 00000000 00000000 00000000 ffffffff 00000000 page dumped because: kasan: bad access detected
addr c36bba90 is located in stack of task cat/163 at offset 48 in frame: stack_trace_save+0x0/0xb4
this frame has 1 object: [32, 48) 'trace'
Memory state around the buggy address: c36bb980: f1 f1 f1 f1 00 04 f2 f2 00 00 f3 f3 00 00 00 00 c36bba00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
c36bba80: 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
^ c36bbb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c36bbb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ==================================================================
There is a same issue on x86 and has been resolved by the commit f7d27c35ddff ("x86/mm, kasan: Silence KASAN warnings in get_wchan()"). The solution could be applied to arm architecture too.
Signed-off-by: Lin Yujun linyujun809@huawei.com Reported-by: He Ying heying24@huawei.com Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/kernel/stacktrace.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c index a452b859f485..d99b45307566 100644 --- a/arch/arm/kernel/stacktrace.c +++ b/arch/arm/kernel/stacktrace.c @@ -52,17 +52,17 @@ int notrace unwind_frame(struct stackframe *frame) return -EINVAL;
frame->sp = frame->fp; - frame->fp = *(unsigned long *)(fp); - frame->pc = *(unsigned long *)(fp + 4); + frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp)); + frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 4)); #else /* check current frame pointer is within bounds */ if (fp < low + 12 || fp > high - 4) return -EINVAL;
/* restore the registers from the stack frame */ - frame->fp = *(unsigned long *)(fp - 12); - frame->sp = *(unsigned long *)(fp - 8); - frame->pc = *(unsigned long *)(fp - 4); + frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp - 12)); + frame->sp = READ_ONCE_NOCHECK(*(unsigned long *)(fp - 8)); + frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp - 4)); #endif
return 0;
From: Ryusuke Konishi konishi.ryusuke@gmail.com
[ Upstream commit e897be17a441fa637cd166fc3de1445131e57692 ]
Patch series "nilfs2 lockdep warning fixes".
The first two are to resolve the lockdep warning issue, and the last one is the accompanying cleanup and low priority.
Based on your comment, this series solves the issue by separating inode object as needed. Since I was worried about the impact of the object composition changes, I tested the series carefully not to cause regressions especially for delicate functions such like disk space reclamation and snapshots.
This patch (of 3):
If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at inode_to_wb() during page/folio operations for btree nodes:
WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline] WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 Modules linked in: ... RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline] RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline] RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509 ... Call Trace: __set_page_dirty include/linux/pagemap.h:834 [inline] mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145 nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline] nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085 nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337 nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625 nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009 nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline] nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036 nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline] nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
This is because nilfs2 uses two page caches for each inode and inode->i_mapping never points to one of them, the btree node cache.
This causes inode_to_wb(inode) to refer to a different page cache than the caller page/folio operations such like __folio_start_writeback(), __folio_end_writeback(), or __folio_mark_dirty() acquired the lock.
This patch resolves the issue by allocating and using an additional inode to hold the page cache of btree nodes. The inode is attached one-to-one to the traditional nilfs2 inode if it requires a block mapping with b-tree. This setup change is in memory only and does not affect the disk format.
Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@... Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@... Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com Reported-by: syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com Reported-by: Hao Sun sunhao.th@gmail.com Reported-by: David Hildenbrand david@redhat.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: Matthew Wilcox willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/btnode.c | 23 ++++++++-- fs/nilfs2/btnode.h | 1 + fs/nilfs2/btree.c | 27 ++++++++---- fs/nilfs2/gcinode.c | 7 +-- fs/nilfs2/inode.c | 104 ++++++++++++++++++++++++++++++++++++++------ fs/nilfs2/mdt.c | 7 +-- fs/nilfs2/nilfs.h | 14 +++--- fs/nilfs2/page.c | 7 ++- fs/nilfs2/segment.c | 9 ++-- fs/nilfs2/super.c | 5 +-- 10 files changed, 154 insertions(+), 50 deletions(-)
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c index ebb24a314f43..138ebbb7a1ee 100644 --- a/fs/nilfs2/btnode.c +++ b/fs/nilfs2/btnode.c @@ -20,6 +20,23 @@ #include "page.h" #include "btnode.h"
+ +/** + * nilfs_init_btnc_inode - initialize B-tree node cache inode + * @btnc_inode: inode to be initialized + * + * nilfs_init_btnc_inode() sets up an inode for B-tree node cache. + */ +void nilfs_init_btnc_inode(struct inode *btnc_inode) +{ + struct nilfs_inode_info *ii = NILFS_I(btnc_inode); + + btnc_inode->i_mode = S_IFREG; + ii->i_flags = 0; + memset(&ii->i_bmap_data, 0, sizeof(struct nilfs_bmap)); + mapping_set_gfp_mask(btnc_inode->i_mapping, GFP_NOFS); +} + void nilfs_btnode_cache_clear(struct address_space *btnc) { invalidate_mapping_pages(btnc, 0, -1); @@ -29,7 +46,7 @@ void nilfs_btnode_cache_clear(struct address_space *btnc) struct buffer_head * nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr) { - struct inode *inode = NILFS_BTNC_I(btnc); + struct inode *inode = btnc->host; struct buffer_head *bh;
bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node)); @@ -57,7 +74,7 @@ int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr, struct buffer_head **pbh, sector_t *submit_ptr) { struct buffer_head *bh; - struct inode *inode = NILFS_BTNC_I(btnc); + struct inode *inode = btnc->host; struct page *page; int err;
@@ -157,7 +174,7 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc, struct nilfs_btnode_chkey_ctxt *ctxt) { struct buffer_head *obh, *nbh; - struct inode *inode = NILFS_BTNC_I(btnc); + struct inode *inode = btnc->host; __u64 oldkey = ctxt->oldkey, newkey = ctxt->newkey; int err;
diff --git a/fs/nilfs2/btnode.h b/fs/nilfs2/btnode.h index 0f88dbc9bcb3..05ab64d354dc 100644 --- a/fs/nilfs2/btnode.h +++ b/fs/nilfs2/btnode.h @@ -30,6 +30,7 @@ struct nilfs_btnode_chkey_ctxt { struct buffer_head *newbh; };
+void nilfs_init_btnc_inode(struct inode *btnc_inode); void nilfs_btnode_cache_clear(struct address_space *); struct buffer_head *nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr); diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c index 23e043eca237..919d1238ce45 100644 --- a/fs/nilfs2/btree.c +++ b/fs/nilfs2/btree.c @@ -58,7 +58,8 @@ static void nilfs_btree_free_path(struct nilfs_btree_path *path) static int nilfs_btree_get_new_block(const struct nilfs_bmap *btree, __u64 ptr, struct buffer_head **bhp) { - struct address_space *btnc = &NILFS_BMAP_I(btree)->i_btnode_cache; + struct inode *btnc_inode = NILFS_BMAP_I(btree)->i_assoc_inode; + struct address_space *btnc = btnc_inode->i_mapping; struct buffer_head *bh;
bh = nilfs_btnode_create_block(btnc, ptr); @@ -470,7 +471,8 @@ static int __nilfs_btree_get_block(const struct nilfs_bmap *btree, __u64 ptr, struct buffer_head **bhp, const struct nilfs_btree_readahead_info *ra) { - struct address_space *btnc = &NILFS_BMAP_I(btree)->i_btnode_cache; + struct inode *btnc_inode = NILFS_BMAP_I(btree)->i_assoc_inode; + struct address_space *btnc = btnc_inode->i_mapping; struct buffer_head *bh, *ra_bh; sector_t submit_ptr = 0; int ret; @@ -1742,6 +1744,10 @@ nilfs_btree_prepare_convert_and_insert(struct nilfs_bmap *btree, __u64 key, dat = nilfs_bmap_get_dat(btree); }
+ ret = nilfs_attach_btree_node_cache(&NILFS_BMAP_I(btree)->vfs_inode); + if (ret < 0) + return ret; + ret = nilfs_bmap_prepare_alloc_ptr(btree, dreq, dat); if (ret < 0) return ret; @@ -1914,7 +1920,7 @@ static int nilfs_btree_prepare_update_v(struct nilfs_bmap *btree, path[level].bp_ctxt.newkey = path[level].bp_newreq.bpr_ptr; path[level].bp_ctxt.bh = path[level].bp_bh; ret = nilfs_btnode_prepare_change_key( - &NILFS_BMAP_I(btree)->i_btnode_cache, + NILFS_BMAP_I(btree)->i_assoc_inode->i_mapping, &path[level].bp_ctxt); if (ret < 0) { nilfs_dat_abort_update(dat, @@ -1940,7 +1946,7 @@ static void nilfs_btree_commit_update_v(struct nilfs_bmap *btree,
if (buffer_nilfs_node(path[level].bp_bh)) { nilfs_btnode_commit_change_key( - &NILFS_BMAP_I(btree)->i_btnode_cache, + NILFS_BMAP_I(btree)->i_assoc_inode->i_mapping, &path[level].bp_ctxt); path[level].bp_bh = path[level].bp_ctxt.bh; } @@ -1959,7 +1965,7 @@ static void nilfs_btree_abort_update_v(struct nilfs_bmap *btree, &path[level].bp_newreq.bpr_req); if (buffer_nilfs_node(path[level].bp_bh)) nilfs_btnode_abort_change_key( - &NILFS_BMAP_I(btree)->i_btnode_cache, + NILFS_BMAP_I(btree)->i_assoc_inode->i_mapping, &path[level].bp_ctxt); }
@@ -2135,7 +2141,8 @@ static void nilfs_btree_add_dirty_buffer(struct nilfs_bmap *btree, static void nilfs_btree_lookup_dirty_buffers(struct nilfs_bmap *btree, struct list_head *listp) { - struct address_space *btcache = &NILFS_BMAP_I(btree)->i_btnode_cache; + struct inode *btnc_inode = NILFS_BMAP_I(btree)->i_assoc_inode; + struct address_space *btcache = btnc_inode->i_mapping; struct list_head lists[NILFS_BTREE_LEVEL_MAX]; struct pagevec pvec; struct buffer_head *bh, *head; @@ -2189,12 +2196,12 @@ static int nilfs_btree_assign_p(struct nilfs_bmap *btree, path[level].bp_ctxt.newkey = blocknr; path[level].bp_ctxt.bh = *bh; ret = nilfs_btnode_prepare_change_key( - &NILFS_BMAP_I(btree)->i_btnode_cache, + NILFS_BMAP_I(btree)->i_assoc_inode->i_mapping, &path[level].bp_ctxt); if (ret < 0) return ret; nilfs_btnode_commit_change_key( - &NILFS_BMAP_I(btree)->i_btnode_cache, + NILFS_BMAP_I(btree)->i_assoc_inode->i_mapping, &path[level].bp_ctxt); *bh = path[level].bp_ctxt.bh; } @@ -2399,6 +2406,10 @@ int nilfs_btree_init(struct nilfs_bmap *bmap)
if (nilfs_btree_root_broken(nilfs_btree_get_root(bmap), bmap->b_inode)) ret = -EIO; + else + ret = nilfs_attach_btree_node_cache( + &NILFS_BMAP_I(bmap)->vfs_inode); + return ret; }
diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c index aa3c328ee189..114774ac2185 100644 --- a/fs/nilfs2/gcinode.c +++ b/fs/nilfs2/gcinode.c @@ -126,9 +126,10 @@ int nilfs_gccache_submit_read_data(struct inode *inode, sector_t blkoff, int nilfs_gccache_submit_read_node(struct inode *inode, sector_t pbn, __u64 vbn, struct buffer_head **out_bh) { + struct inode *btnc_inode = NILFS_I(inode)->i_assoc_inode; int ret;
- ret = nilfs_btnode_submit_block(&NILFS_I(inode)->i_btnode_cache, + ret = nilfs_btnode_submit_block(btnc_inode->i_mapping, vbn ? : pbn, pbn, REQ_OP_READ, 0, out_bh, &pbn); if (ret == -EEXIST) /* internal code (cache hit) */ @@ -170,7 +171,7 @@ int nilfs_init_gcinode(struct inode *inode) ii->i_flags = 0; nilfs_bmap_init_gc(ii->i_bmap);
- return 0; + return nilfs_attach_btree_node_cache(inode); }
/** @@ -185,7 +186,7 @@ void nilfs_remove_all_gcinodes(struct the_nilfs *nilfs) ii = list_first_entry(head, struct nilfs_inode_info, i_dirty); list_del_init(&ii->i_dirty); truncate_inode_pages(&ii->vfs_inode.i_data, 0); - nilfs_btnode_cache_clear(&ii->i_btnode_cache); + nilfs_btnode_cache_clear(ii->i_assoc_inode->i_mapping); iput(&ii->vfs_inode); } } diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c index 671085512e0f..b0a0822e371c 100644 --- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -28,12 +28,14 @@ * @cno: checkpoint number * @root: pointer on NILFS root object (mounted checkpoint) * @for_gc: inode for GC flag + * @for_btnc: inode for B-tree node cache flag */ struct nilfs_iget_args { u64 ino; __u64 cno; struct nilfs_root *root; - int for_gc; + bool for_gc; + bool for_btnc; };
static int nilfs_iget_test(struct inode *inode, void *opaque); @@ -322,7 +324,8 @@ static int nilfs_insert_inode_locked(struct inode *inode, unsigned long ino) { struct nilfs_iget_args args = { - .ino = ino, .root = root, .cno = 0, .for_gc = 0 + .ino = ino, .root = root, .cno = 0, .for_gc = false, + .for_btnc = false };
return insert_inode_locked4(inode, ino, nilfs_iget_test, &args); @@ -534,6 +537,13 @@ static int nilfs_iget_test(struct inode *inode, void *opaque) return 0;
ii = NILFS_I(inode); + if (test_bit(NILFS_I_BTNC, &ii->i_state)) { + if (!args->for_btnc) + return 0; + } else if (args->for_btnc) { + return 0; + } + if (!test_bit(NILFS_I_GCINODE, &ii->i_state)) return !args->for_gc;
@@ -545,15 +555,15 @@ static int nilfs_iget_set(struct inode *inode, void *opaque) struct nilfs_iget_args *args = opaque;
inode->i_ino = args->ino; - if (args->for_gc) { + NILFS_I(inode)->i_cno = args->cno; + NILFS_I(inode)->i_root = args->root; + if (args->root && args->ino == NILFS_ROOT_INO) + nilfs_get_root(args->root); + + if (args->for_gc) NILFS_I(inode)->i_state = BIT(NILFS_I_GCINODE); - NILFS_I(inode)->i_cno = args->cno; - NILFS_I(inode)->i_root = NULL; - } else { - if (args->root && args->ino == NILFS_ROOT_INO) - nilfs_get_root(args->root); - NILFS_I(inode)->i_root = args->root; - } + if (args->for_btnc) + NILFS_I(inode)->i_state |= BIT(NILFS_I_BTNC); return 0; }
@@ -561,7 +571,8 @@ struct inode *nilfs_ilookup(struct super_block *sb, struct nilfs_root *root, unsigned long ino) { struct nilfs_iget_args args = { - .ino = ino, .root = root, .cno = 0, .for_gc = 0 + .ino = ino, .root = root, .cno = 0, .for_gc = false, + .for_btnc = false };
return ilookup5(sb, ino, nilfs_iget_test, &args); @@ -571,7 +582,8 @@ struct inode *nilfs_iget_locked(struct super_block *sb, struct nilfs_root *root, unsigned long ino) { struct nilfs_iget_args args = { - .ino = ino, .root = root, .cno = 0, .for_gc = 0 + .ino = ino, .root = root, .cno = 0, .for_gc = false, + .for_btnc = false };
return iget5_locked(sb, ino, nilfs_iget_test, nilfs_iget_set, &args); @@ -602,7 +614,8 @@ struct inode *nilfs_iget_for_gc(struct super_block *sb, unsigned long ino, __u64 cno) { struct nilfs_iget_args args = { - .ino = ino, .root = NULL, .cno = cno, .for_gc = 1 + .ino = ino, .root = NULL, .cno = cno, .for_gc = true, + .for_btnc = false }; struct inode *inode; int err; @@ -622,6 +635,68 @@ struct inode *nilfs_iget_for_gc(struct super_block *sb, unsigned long ino, return inode; }
+/** + * nilfs_attach_btree_node_cache - attach a B-tree node cache to the inode + * @inode: inode object + * + * nilfs_attach_btree_node_cache() attaches a B-tree node cache to @inode, + * or does nothing if the inode already has it. This function allocates + * an additional inode to maintain page cache of B-tree nodes one-on-one. + * + * Return Value: On success, 0 is returned. On errors, one of the following + * negative error code is returned. + * + * %-ENOMEM - Insufficient memory available. + */ +int nilfs_attach_btree_node_cache(struct inode *inode) +{ + struct nilfs_inode_info *ii = NILFS_I(inode); + struct inode *btnc_inode; + struct nilfs_iget_args args; + + if (ii->i_assoc_inode) + return 0; + + args.ino = inode->i_ino; + args.root = ii->i_root; + args.cno = ii->i_cno; + args.for_gc = test_bit(NILFS_I_GCINODE, &ii->i_state) != 0; + args.for_btnc = true; + + btnc_inode = iget5_locked(inode->i_sb, inode->i_ino, nilfs_iget_test, + nilfs_iget_set, &args); + if (unlikely(!btnc_inode)) + return -ENOMEM; + if (btnc_inode->i_state & I_NEW) { + nilfs_init_btnc_inode(btnc_inode); + unlock_new_inode(btnc_inode); + } + NILFS_I(btnc_inode)->i_assoc_inode = inode; + NILFS_I(btnc_inode)->i_bmap = ii->i_bmap; + ii->i_assoc_inode = btnc_inode; + + return 0; +} + +/** + * nilfs_detach_btree_node_cache - detach the B-tree node cache from the inode + * @inode: inode object + * + * nilfs_detach_btree_node_cache() detaches the B-tree node cache and its + * holder inode bound to @inode, or does nothing if @inode doesn't have it. + */ +void nilfs_detach_btree_node_cache(struct inode *inode) +{ + struct nilfs_inode_info *ii = NILFS_I(inode); + struct inode *btnc_inode = ii->i_assoc_inode; + + if (btnc_inode) { + NILFS_I(btnc_inode)->i_assoc_inode = NULL; + ii->i_assoc_inode = NULL; + iput(btnc_inode); + } +} + void nilfs_write_inode_common(struct inode *inode, struct nilfs_inode *raw_inode, int has_bmap) { @@ -770,7 +845,8 @@ static void nilfs_clear_inode(struct inode *inode) if (test_bit(NILFS_I_BMAP, &ii->i_state)) nilfs_bmap_clear(ii->i_bmap);
- nilfs_btnode_cache_clear(&ii->i_btnode_cache); + if (!test_bit(NILFS_I_BTNC, &ii->i_state)) + nilfs_detach_btree_node_cache(inode);
if (ii->i_root && inode->i_ino == NILFS_ROOT_INO) nilfs_put_root(ii->i_root); diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c index 700870a92bc4..3a1200220b97 100644 --- a/fs/nilfs2/mdt.c +++ b/fs/nilfs2/mdt.c @@ -531,7 +531,7 @@ int nilfs_mdt_save_to_shadow_map(struct inode *inode) goto out;
ret = nilfs_copy_dirty_pages(&shadow->frozen_btnodes, - &ii->i_btnode_cache); + ii->i_assoc_inode->i_mapping); if (ret) goto out;
@@ -622,8 +622,9 @@ void nilfs_mdt_restore_from_shadow_map(struct inode *inode) nilfs_clear_dirty_pages(inode->i_mapping, true); nilfs_copy_back_pages(inode->i_mapping, &shadow->frozen_data);
- nilfs_clear_dirty_pages(&ii->i_btnode_cache, true); - nilfs_copy_back_pages(&ii->i_btnode_cache, &shadow->frozen_btnodes); + nilfs_clear_dirty_pages(ii->i_assoc_inode->i_mapping, true); + nilfs_copy_back_pages(ii->i_assoc_inode->i_mapping, + &shadow->frozen_btnodes);
nilfs_bmap_restore(ii->i_bmap, &shadow->bmap_store);
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index a2f247b6a209..a63aa5b5993c 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -28,7 +28,7 @@ * @i_xattr: <TODO> * @i_dir_start_lookup: page index of last successful search * @i_cno: checkpoint number for GC inode - * @i_btnode_cache: cached pages of b-tree nodes + * @i_assoc_inode: associated inode (B-tree node cache holder or back pointer) * @i_dirty: list for connecting dirty files * @xattr_sem: semaphore for extended attributes processing * @i_bh: buffer contains disk inode @@ -43,7 +43,7 @@ struct nilfs_inode_info { __u64 i_xattr; /* sector_t ??? */ __u32 i_dir_start_lookup; __u64 i_cno; /* check point number for GC inode */ - struct address_space i_btnode_cache; + struct inode *i_assoc_inode; struct list_head i_dirty; /* List for connecting dirty files */
#ifdef CONFIG_NILFS_XATTR @@ -75,13 +75,6 @@ NILFS_BMAP_I(const struct nilfs_bmap *bmap) return container_of(bmap, struct nilfs_inode_info, i_bmap_data); }
-static inline struct inode *NILFS_BTNC_I(struct address_space *btnc) -{ - struct nilfs_inode_info *ii = - container_of(btnc, struct nilfs_inode_info, i_btnode_cache); - return &ii->vfs_inode; -} - /* * Dynamic state flags of NILFS on-memory inode (i_state) */ @@ -98,6 +91,7 @@ enum { NILFS_I_INODE_SYNC, /* dsync is not allowed for inode */ NILFS_I_BMAP, /* has bmap and btnode_cache */ NILFS_I_GCINODE, /* inode for GC, on memory only */ + NILFS_I_BTNC, /* inode for btree node cache */ };
/* @@ -265,6 +259,8 @@ struct inode *nilfs_iget(struct super_block *sb, struct nilfs_root *root, unsigned long ino); extern struct inode *nilfs_iget_for_gc(struct super_block *sb, unsigned long ino, __u64 cno); +int nilfs_attach_btree_node_cache(struct inode *inode); +void nilfs_detach_btree_node_cache(struct inode *inode); extern void nilfs_update_inode(struct inode *, struct buffer_head *, int); extern void nilfs_truncate(struct inode *); extern void nilfs_evict_inode(struct inode *); diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 329a056b73b1..c726b42ca92d 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -452,10 +452,9 @@ void nilfs_mapping_init(struct address_space *mapping, struct inode *inode) /* * NILFS2 needs clear_page_dirty() in the following two cases: * - * 1) For B-tree node pages and data pages of the dat/gcdat, NILFS2 clears - * page dirty flags when it copies back pages from the shadow cache - * (gcdat->{i_mapping,i_btnode_cache}) to its original cache - * (dat->{i_mapping,i_btnode_cache}). + * 1) For B-tree node pages and data pages of DAT file, NILFS2 clears dirty + * flag of pages when it copies back pages from shadow cache to the + * original cache. * * 2) Some B-tree operations like insertion or deletion may dispose buffers * in dirty state, and this needs to cancel the dirty state of their pages. diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 91b58c897f92..eb3ac7619088 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -738,15 +738,18 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode, struct list_head *listp) { struct nilfs_inode_info *ii = NILFS_I(inode); - struct address_space *mapping = &ii->i_btnode_cache; + struct inode *btnc_inode = ii->i_assoc_inode; struct pagevec pvec; struct buffer_head *bh, *head; unsigned int i; pgoff_t index = 0;
+ if (!btnc_inode) + return; + pagevec_init(&pvec);
- while (pagevec_lookup_tag(&pvec, mapping, &index, + while (pagevec_lookup_tag(&pvec, btnc_inode->i_mapping, &index, PAGECACHE_TAG_DIRTY)) { for (i = 0; i < pagevec_count(&pvec); i++) { bh = head = page_buffers(pvec.pages[i]); @@ -2410,7 +2413,7 @@ nilfs_remove_written_gcinodes(struct the_nilfs *nilfs, struct list_head *head) continue; list_del_init(&ii->i_dirty); truncate_inode_pages(&ii->vfs_inode.i_data, 0); - nilfs_btnode_cache_clear(&ii->i_btnode_cache); + nilfs_btnode_cache_clear(ii->i_assoc_inode->i_mapping); iput(&ii->vfs_inode); } } diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 26290aa1023f..2a3ad1270133 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -151,7 +151,8 @@ struct inode *nilfs_alloc_inode(struct super_block *sb) ii->i_bh = NULL; ii->i_state = 0; ii->i_cno = 0; - nilfs_mapping_init(&ii->i_btnode_cache, &ii->vfs_inode); + ii->i_assoc_inode = NULL; + ii->i_bmap = &ii->i_bmap_data; return &ii->vfs_inode; }
@@ -1382,8 +1383,6 @@ static void nilfs_inode_init_once(void *obj) #ifdef CONFIG_NILFS_XATTR init_rwsem(&ii->xattr_sem); #endif - address_space_init_once(&ii->i_btnode_cache); - ii->i_bmap = &ii->i_bmap_data; inode_init_once(&ii->vfs_inode); }
From: Ryusuke Konishi konishi.ryusuke@gmail.com
[ Upstream commit 6e211930f79aa45d422009a5f2e5467d2369ffe5 ]
During disk space reclamation, nilfs2 still emits the following lockdep warning due to page/folio operations on shadowed page caches that nilfs2 uses to get a snapshot of DAT file in memory:
WARNING: CPU: 0 PID: 2643 at include/linux/backing-dev.h:272 __folio_mark_dirty+0x645/0x670 ... RIP: 0010:__folio_mark_dirty+0x645/0x670 ... Call Trace: filemap_dirty_folio+0x74/0xd0 __set_page_dirty_nobuffers+0x85/0xb0 nilfs_copy_dirty_pages+0x288/0x510 [nilfs2] nilfs_mdt_save_to_shadow_map+0x50/0xe0 [nilfs2] nilfs_clean_segments+0xee/0x5d0 [nilfs2] nilfs_ioctl_clean_segments.isra.19+0xb08/0xf40 [nilfs2] nilfs_ioctl+0xc52/0xfb0 [nilfs2] __x64_sys_ioctl+0x11d/0x170
This fixes the remaining warning by using inode objects to hold those page caches.
Link: https://lkml.kernel.org/r/1647867427-30498-3-git-send-email-konishi.ryusuke@... Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: Matthew Wilcox willy@infradead.org Cc: David Hildenbrand david@redhat.com Cc: Hao Sun sunhao.th@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/dat.c | 4 ++- fs/nilfs2/inode.c | 63 ++++++++++++++++++++++++++++++++++++++++++++--- fs/nilfs2/mdt.c | 38 +++++++++++++++++++--------- fs/nilfs2/mdt.h | 6 ++--- fs/nilfs2/nilfs.h | 2 ++ 5 files changed, 92 insertions(+), 21 deletions(-)
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c index 6f4066636be9..a3523a243e11 100644 --- a/fs/nilfs2/dat.c +++ b/fs/nilfs2/dat.c @@ -497,7 +497,9 @@ int nilfs_dat_read(struct super_block *sb, size_t entry_size, di = NILFS_DAT_I(dat); lockdep_set_class(&di->mi.mi_sem, &dat_lock_key); nilfs_palloc_setup_cache(dat, &di->palloc_cache); - nilfs_mdt_setup_shadow_map(dat, &di->shadow); + err = nilfs_mdt_setup_shadow_map(dat, &di->shadow); + if (err) + goto failed;
err = nilfs_read_inode_common(dat, raw_inode); if (err) diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c index b0a0822e371c..35b0bfe9324f 100644 --- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -29,6 +29,7 @@ * @root: pointer on NILFS root object (mounted checkpoint) * @for_gc: inode for GC flag * @for_btnc: inode for B-tree node cache flag + * @for_shadow: inode for shadowed page cache flag */ struct nilfs_iget_args { u64 ino; @@ -36,6 +37,7 @@ struct nilfs_iget_args { struct nilfs_root *root; bool for_gc; bool for_btnc; + bool for_shadow; };
static int nilfs_iget_test(struct inode *inode, void *opaque); @@ -325,7 +327,7 @@ static int nilfs_insert_inode_locked(struct inode *inode, { struct nilfs_iget_args args = { .ino = ino, .root = root, .cno = 0, .for_gc = false, - .for_btnc = false + .for_btnc = false, .for_shadow = false };
return insert_inode_locked4(inode, ino, nilfs_iget_test, &args); @@ -543,6 +545,12 @@ static int nilfs_iget_test(struct inode *inode, void *opaque) } else if (args->for_btnc) { return 0; } + if (test_bit(NILFS_I_SHADOW, &ii->i_state)) { + if (!args->for_shadow) + return 0; + } else if (args->for_shadow) { + return 0; + }
if (!test_bit(NILFS_I_GCINODE, &ii->i_state)) return !args->for_gc; @@ -564,6 +572,8 @@ static int nilfs_iget_set(struct inode *inode, void *opaque) NILFS_I(inode)->i_state = BIT(NILFS_I_GCINODE); if (args->for_btnc) NILFS_I(inode)->i_state |= BIT(NILFS_I_BTNC); + if (args->for_shadow) + NILFS_I(inode)->i_state |= BIT(NILFS_I_SHADOW); return 0; }
@@ -572,7 +582,7 @@ struct inode *nilfs_ilookup(struct super_block *sb, struct nilfs_root *root, { struct nilfs_iget_args args = { .ino = ino, .root = root, .cno = 0, .for_gc = false, - .for_btnc = false + .for_btnc = false, .for_shadow = false };
return ilookup5(sb, ino, nilfs_iget_test, &args); @@ -583,7 +593,7 @@ struct inode *nilfs_iget_locked(struct super_block *sb, struct nilfs_root *root, { struct nilfs_iget_args args = { .ino = ino, .root = root, .cno = 0, .for_gc = false, - .for_btnc = false + .for_btnc = false, .for_shadow = false };
return iget5_locked(sb, ino, nilfs_iget_test, nilfs_iget_set, &args); @@ -615,7 +625,7 @@ struct inode *nilfs_iget_for_gc(struct super_block *sb, unsigned long ino, { struct nilfs_iget_args args = { .ino = ino, .root = NULL, .cno = cno, .for_gc = true, - .for_btnc = false + .for_btnc = false, .for_shadow = false }; struct inode *inode; int err; @@ -662,6 +672,7 @@ int nilfs_attach_btree_node_cache(struct inode *inode) args.cno = ii->i_cno; args.for_gc = test_bit(NILFS_I_GCINODE, &ii->i_state) != 0; args.for_btnc = true; + args.for_shadow = test_bit(NILFS_I_SHADOW, &ii->i_state) != 0;
btnc_inode = iget5_locked(inode->i_sb, inode->i_ino, nilfs_iget_test, nilfs_iget_set, &args); @@ -697,6 +708,50 @@ void nilfs_detach_btree_node_cache(struct inode *inode) } }
+/** + * nilfs_iget_for_shadow - obtain inode for shadow mapping + * @inode: inode object that uses shadow mapping + * + * nilfs_iget_for_shadow() allocates a pair of inodes that holds page + * caches for shadow mapping. The page cache for data pages is set up + * in one inode and the one for b-tree node pages is set up in the + * other inode, which is attached to the former inode. + * + * Return Value: On success, a pointer to the inode for data pages is + * returned. On errors, one of the following negative error code is returned + * in a pointer type. + * + * %-ENOMEM - Insufficient memory available. + */ +struct inode *nilfs_iget_for_shadow(struct inode *inode) +{ + struct nilfs_iget_args args = { + .ino = inode->i_ino, .root = NULL, .cno = 0, .for_gc = false, + .for_btnc = false, .for_shadow = true + }; + struct inode *s_inode; + int err; + + s_inode = iget5_locked(inode->i_sb, inode->i_ino, nilfs_iget_test, + nilfs_iget_set, &args); + if (unlikely(!s_inode)) + return ERR_PTR(-ENOMEM); + if (!(s_inode->i_state & I_NEW)) + return inode; + + NILFS_I(s_inode)->i_flags = 0; + memset(NILFS_I(s_inode)->i_bmap, 0, sizeof(struct nilfs_bmap)); + mapping_set_gfp_mask(s_inode->i_mapping, GFP_NOFS); + + err = nilfs_attach_btree_node_cache(s_inode); + if (unlikely(err)) { + iget_failed(s_inode); + return ERR_PTR(err); + } + unlock_new_inode(s_inode); + return s_inode; +} + void nilfs_write_inode_common(struct inode *inode, struct nilfs_inode *raw_inode, int has_bmap) { diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c index 3a1200220b97..7c9055d767d1 100644 --- a/fs/nilfs2/mdt.c +++ b/fs/nilfs2/mdt.c @@ -469,9 +469,18 @@ int nilfs_mdt_init(struct inode *inode, gfp_t gfp_mask, size_t objsz) void nilfs_mdt_clear(struct inode *inode) { struct nilfs_mdt_info *mdi = NILFS_MDT(inode); + struct nilfs_shadow_map *shadow = mdi->mi_shadow;
if (mdi->mi_palloc_cache) nilfs_palloc_destroy_cache(inode); + + if (shadow) { + struct inode *s_inode = shadow->inode; + + shadow->inode = NULL; + iput(s_inode); + mdi->mi_shadow = NULL; + } }
/** @@ -505,12 +514,15 @@ int nilfs_mdt_setup_shadow_map(struct inode *inode, struct nilfs_shadow_map *shadow) { struct nilfs_mdt_info *mi = NILFS_MDT(inode); + struct inode *s_inode;
INIT_LIST_HEAD(&shadow->frozen_buffers); - address_space_init_once(&shadow->frozen_data); - nilfs_mapping_init(&shadow->frozen_data, inode); - address_space_init_once(&shadow->frozen_btnodes); - nilfs_mapping_init(&shadow->frozen_btnodes, inode); + + s_inode = nilfs_iget_for_shadow(inode); + if (IS_ERR(s_inode)) + return PTR_ERR(s_inode); + + shadow->inode = s_inode; mi->mi_shadow = shadow; return 0; } @@ -524,13 +536,14 @@ int nilfs_mdt_save_to_shadow_map(struct inode *inode) struct nilfs_mdt_info *mi = NILFS_MDT(inode); struct nilfs_inode_info *ii = NILFS_I(inode); struct nilfs_shadow_map *shadow = mi->mi_shadow; + struct inode *s_inode = shadow->inode; int ret;
- ret = nilfs_copy_dirty_pages(&shadow->frozen_data, inode->i_mapping); + ret = nilfs_copy_dirty_pages(s_inode->i_mapping, inode->i_mapping); if (ret) goto out;
- ret = nilfs_copy_dirty_pages(&shadow->frozen_btnodes, + ret = nilfs_copy_dirty_pages(NILFS_I(s_inode)->i_assoc_inode->i_mapping, ii->i_assoc_inode->i_mapping); if (ret) goto out; @@ -547,7 +560,7 @@ int nilfs_mdt_freeze_buffer(struct inode *inode, struct buffer_head *bh) struct page *page; int blkbits = inode->i_blkbits;
- page = grab_cache_page(&shadow->frozen_data, bh->b_page->index); + page = grab_cache_page(shadow->inode->i_mapping, bh->b_page->index); if (!page) return -ENOMEM;
@@ -579,7 +592,7 @@ nilfs_mdt_get_frozen_buffer(struct inode *inode, struct buffer_head *bh) struct page *page; int n;
- page = find_lock_page(&shadow->frozen_data, bh->b_page->index); + page = find_lock_page(shadow->inode->i_mapping, bh->b_page->index); if (page) { if (page_has_buffers(page)) { n = bh_offset(bh) >> inode->i_blkbits; @@ -620,11 +633,11 @@ void nilfs_mdt_restore_from_shadow_map(struct inode *inode) nilfs_palloc_clear_cache(inode);
nilfs_clear_dirty_pages(inode->i_mapping, true); - nilfs_copy_back_pages(inode->i_mapping, &shadow->frozen_data); + nilfs_copy_back_pages(inode->i_mapping, shadow->inode->i_mapping);
nilfs_clear_dirty_pages(ii->i_assoc_inode->i_mapping, true); nilfs_copy_back_pages(ii->i_assoc_inode->i_mapping, - &shadow->frozen_btnodes); + NILFS_I(shadow->inode)->i_assoc_inode->i_mapping);
nilfs_bmap_restore(ii->i_bmap, &shadow->bmap_store);
@@ -639,10 +652,11 @@ void nilfs_mdt_clear_shadow_map(struct inode *inode) { struct nilfs_mdt_info *mi = NILFS_MDT(inode); struct nilfs_shadow_map *shadow = mi->mi_shadow; + struct inode *shadow_btnc_inode = NILFS_I(shadow->inode)->i_assoc_inode;
down_write(&mi->mi_sem); nilfs_release_frozen_buffers(shadow); - truncate_inode_pages(&shadow->frozen_data, 0); - truncate_inode_pages(&shadow->frozen_btnodes, 0); + truncate_inode_pages(shadow->inode->i_mapping, 0); + truncate_inode_pages(shadow_btnc_inode->i_mapping, 0); up_write(&mi->mi_sem); } diff --git a/fs/nilfs2/mdt.h b/fs/nilfs2/mdt.h index e77aea4bb921..9d8ac0d27c16 100644 --- a/fs/nilfs2/mdt.h +++ b/fs/nilfs2/mdt.h @@ -18,14 +18,12 @@ /** * struct nilfs_shadow_map - shadow mapping of meta data file * @bmap_store: shadow copy of bmap state - * @frozen_data: shadowed dirty data pages - * @frozen_btnodes: shadowed dirty b-tree nodes' pages + * @inode: holder of page caches used in shadow mapping * @frozen_buffers: list of frozen buffers */ struct nilfs_shadow_map { struct nilfs_bmap_store bmap_store; - struct address_space frozen_data; - struct address_space frozen_btnodes; + struct inode *inode; struct list_head frozen_buffers; };
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index a63aa5b5993c..8699bdc9e391 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -92,6 +92,7 @@ enum { NILFS_I_BMAP, /* has bmap and btnode_cache */ NILFS_I_GCINODE, /* inode for GC, on memory only */ NILFS_I_BTNC, /* inode for btree node cache */ + NILFS_I_SHADOW, /* inode for shadowed page cache */ };
/* @@ -261,6 +262,7 @@ extern struct inode *nilfs_iget_for_gc(struct super_block *sb, unsigned long ino, __u64 cno); int nilfs_attach_btree_node_cache(struct inode *inode); void nilfs_detach_btree_node_cache(struct inode *inode); +struct inode *nilfs_iget_for_shadow(struct inode *inode); extern void nilfs_update_inode(struct inode *, struct buffer_head *, int); extern void nilfs_truncate(struct inode *); extern void nilfs_evict_inode(struct inode *);
From: Takashi Iwai tiwai@suse.de
commit a34ae6c0660d3b96b0055f68ef74dc9478852245 upstream.
The antient ISA wavefront driver reads its sample patch data (uploaded over an ioctl) via __get_user() with no good reason; likely just for some performance optimizations in the past. Let's change this to the standard get_user() and the error check for handling the fault case properly.
Reported-by: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220510103626.16635-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/isa/wavefront/wavefront_synth.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/isa/wavefront/wavefront_synth.c +++ b/sound/isa/wavefront/wavefront_synth.c @@ -1092,7 +1092,8 @@ wavefront_send_sample (snd_wavefront_t *
if (dataptr < data_end) { - __get_user (sample_short, dataptr); + if (get_user(sample_short, dataptr)) + return -EFAULT; dataptr += skip; if (data_is_unsigned) { /* GUS ? */
From: Peter Zijlstra peterz@infradead.org
commit 3ac6487e584a1eb54071dbe1212e05b884136704 upstream.
Norbert reported that it's possible to race sys_perf_event_open() such that the looser ends up in another context from the group leader, triggering many WARNs.
The move_group case checks for races against itself, but the !move_group case doesn't, seemingly relying on the previous group_leader->ctx == ctx check. However, that check is racy due to not holding any locks at that time.
Therefore, re-check the result after acquiring locks and bailing if they no longer match.
Additionally, clarify the not_move_group case from the move_group-vs-move_group race.
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Reported-by: Norbert Slusarek nslusarek@gmx.net Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/events/core.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -10758,6 +10758,9 @@ SYSCALL_DEFINE5(perf_event_open, * Do not allow to attach to a group in a different task * or CPU context. If we're moving SW events, we'll fix * this up later, so allow that. + * + * Racy, not holding group_leader->ctx->mutex, see comment with + * perf_event_ctx_lock(). */ if (!move_group && group_leader->ctx != ctx) goto err_context; @@ -10807,6 +10810,7 @@ SYSCALL_DEFINE5(perf_event_open, } else { perf_event_ctx_unlock(group_leader, gctx); move_group = 0; + goto not_move_group; } }
@@ -10823,7 +10827,17 @@ SYSCALL_DEFINE5(perf_event_open, } } else { mutex_lock(&ctx->mutex); + + /* + * Now that we hold ctx->lock, (re)validate group_leader->ctx == ctx, + * see the group_leader && !move_group test earlier. + */ + if (group_leader && group_leader->ctx != ctx) { + err = -EINVAL; + goto err_locked; + } } +not_move_group:
if (ctx->task == TASK_TOMBSTONE) { err = -ESRCH;
From: Al Viro viro@zeniv.linux.org.uk
commit fb4554c2232e44d595920f4d5c66cf8f7d13f9bc upstream.
Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file.
Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock.
Cc: stable@kernel.org Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/net.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
--- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1209,13 +1209,9 @@ err: return ERR_PTR(r); }
-static struct ptr_ring *get_tap_ptr_ring(int fd) +static struct ptr_ring *get_tap_ptr_ring(struct file *file) { struct ptr_ring *ring; - struct file *file = fget(fd); - - if (!file) - return NULL; ring = tun_get_tx_ring(file); if (!IS_ERR(ring)) goto out; @@ -1224,7 +1220,6 @@ static struct ptr_ring *get_tap_ptr_ring goto out; ring = NULL; out: - fput(file); return ring; }
@@ -1311,8 +1306,12 @@ static long vhost_net_set_backend(struct r = vhost_net_enable_vq(n, vq); if (r) goto err_used; - if (index == VHOST_NET_VQ_RX) - nvq->rx_ring = get_tap_ptr_ring(fd); + if (index == VHOST_NET_VQ_RX) { + if (sock) + nvq->rx_ring = get_tap_ptr_ring(sock->file); + else + nvq->rx_ring = NULL; + }
oldubufs = nvq->ubufs; nvq->ubufs = ubufs;
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 92597f97a40bf661bebceb92e26ff87c76d562d4 upstream.
If a Root Port on Elo i2 is put into D3cold and then back into D0, the downstream device becomes permanently inaccessible, so add a bridge D3 DMI quirk for that system.
This was exposed by 14858dcc3b35 ("PCI: Use pci_update_current_state() in pci_enable_device_flags()"), but before that commit the Root Port in question had never been put into D3cold for real due to a mismatch between its power state retrieved from the PCI_PM_CTRL register (which was accessible even though the platform firmware indicated that the port was in D3cold) and the state of an ACPI power resource involved in its power management.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215715 Link: https://lore.kernel.org/r/11980172.O9o76ZdvQC@kreacher Reported-by: Stefan Gottwald gottwald@igel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/pci.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2517,6 +2517,16 @@ static const struct dmi_system_id bridge DMI_MATCH(DMI_BOARD_VENDOR, "Gigabyte Technology Co., Ltd."), DMI_MATCH(DMI_BOARD_NAME, "X299 DESIGNARE EX-CF"), }, + /* + * Downstream device is not accessible after putting a root port + * into D3cold and back into D0 on Elo i2. + */ + .ident = "Elo i2", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Elo Touch Solutions"), + DMI_MATCH(DMI_PRODUCT_NAME, "Elo i2"), + DMI_MATCH(DMI_PRODUCT_VERSION, "RevB"), + }, }, #endif { }
From: Ondrej Mosnacek omosnace@redhat.com
commit 16287397ec5c08aa58db6acf7dbc55470d78087d upstream.
The commit referenced in the Fixes tag removed the 'break' from the else branch in qcom_rng_read(), causing an infinite loop whenever 'max' is not a multiple of WORD_SZ. This can be reproduced e.g. by running:
kcapi-rng -b 67 >/dev/null
There are many ways to fix this without adding back the 'break', but they all seem more awkward than simply adding it back, so do just that.
Tested on a machine with Qualcomm Amberwing processor.
Fixes: a680b1832ced ("crypto: qcom-rng - ensure buffer for generate is completely filled") Cc: stable@vger.kernel.org Signed-off-by: Ondrej Mosnacek omosnace@redhat.com Reviewed-by: Brian Masney bmasney@redhat.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/qcom-rng.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/crypto/qcom-rng.c +++ b/drivers/crypto/qcom-rng.c @@ -64,6 +64,7 @@ static int qcom_rng_read(struct qcom_rng } else { /* copy only remaining bytes */ memcpy(data, &val, max - currsize); + break; } } while (currsize < max);
From: Hangyu Hua hbh25y@gmail.com
commit 6e03b13cc7d9427c2c77feed1549191015615202 upstream.
drm_dp_mst_get_edid call kmemdup to create mst_edid. So mst_edid need to be freed after use.
Signed-off-by: Hangyu Hua hbh25y@gmail.com Reviewed-by: Lyude Paul lyude@redhat.com Signed-off-by: Lyude Paul lyude@redhat.com Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20220516032042.13166-1-hbh25y@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_dp_mst_topology.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -2987,6 +2987,7 @@ static void fetch_monitor_name(struct dr
mst_edid = drm_dp_mst_get_edid(port->connector, mgr, port); drm_edid_get_monitor_name(mst_edid, name, namelen); + kfree(mst_edid); }
/**
From: Ulf Hansson ulf.hansson@linaro.org
commit 0c204979c691f05666ecfb74501e7adfdde8fbf9 upstream
It's been ~6 years ago since we introduced the BKOPS support for eMMC cards. The current code is a bit messy and primarily that's because it prepares to support running BKOPS in an asynchronous mode. However, that mode has never been fully implemented/enabled. Instead BKOPS is always executed in synchronously, when the card has reported an urgent BKOPS level.
For these reasons, let's make the code more readable by dropping the unused parts. Let's also rename mmc_start_bkops() to mmc_run_bkops(), as to make it more descriptive.
Cc: Jaehoon Chung jh80.chung@samsung.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/block.c | 2 - drivers/mmc/core/card.h | 6 --- drivers/mmc/core/mmc.c | 6 --- drivers/mmc/core/mmc_ops.c | 87 +++++++++------------------------------------ drivers/mmc/core/mmc_ops.h | 3 - 5 files changed, 21 insertions(+), 83 deletions(-)
--- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1944,7 +1944,7 @@ static void mmc_blk_urgent_bkops(struct struct mmc_queue_req *mqrq) { if (mmc_blk_urgent_bkops_needed(mq, mqrq)) - mmc_start_bkops(mq->card, true); + mmc_run_bkops(mq->card); }
void mmc_blk_mq_complete(struct request *req) --- a/drivers/mmc/core/card.h +++ b/drivers/mmc/core/card.h @@ -23,15 +23,13 @@ #define MMC_STATE_BLOCKADDR (1<<2) /* card uses block-addressing */ #define MMC_CARD_SDXC (1<<3) /* card is SDXC */ #define MMC_CARD_REMOVED (1<<4) /* card has been removed */ -#define MMC_STATE_DOING_BKOPS (1<<5) /* card is doing BKOPS */ -#define MMC_STATE_SUSPENDED (1<<6) /* card is suspended */ +#define MMC_STATE_SUSPENDED (1<<5) /* card is suspended */
#define mmc_card_present(c) ((c)->state & MMC_STATE_PRESENT) #define mmc_card_readonly(c) ((c)->state & MMC_STATE_READONLY) #define mmc_card_blockaddr(c) ((c)->state & MMC_STATE_BLOCKADDR) #define mmc_card_ext_capacity(c) ((c)->state & MMC_CARD_SDXC) #define mmc_card_removed(c) ((c) && ((c)->state & MMC_CARD_REMOVED)) -#define mmc_card_doing_bkops(c) ((c)->state & MMC_STATE_DOING_BKOPS) #define mmc_card_suspended(c) ((c)->state & MMC_STATE_SUSPENDED)
#define mmc_card_set_present(c) ((c)->state |= MMC_STATE_PRESENT) @@ -39,8 +37,6 @@ #define mmc_card_set_blockaddr(c) ((c)->state |= MMC_STATE_BLOCKADDR) #define mmc_card_set_ext_capacity(c) ((c)->state |= MMC_CARD_SDXC) #define mmc_card_set_removed(c) ((c)->state |= MMC_CARD_REMOVED) -#define mmc_card_set_doing_bkops(c) ((c)->state |= MMC_STATE_DOING_BKOPS) -#define mmc_card_clr_doing_bkops(c) ((c)->state &= ~MMC_STATE_DOING_BKOPS) #define mmc_card_set_suspended(c) ((c)->state |= MMC_STATE_SUSPENDED) #define mmc_card_clr_suspended(c) ((c)->state &= ~MMC_STATE_SUSPENDED)
--- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -2026,12 +2026,6 @@ static int _mmc_suspend(struct mmc_host if (mmc_card_suspended(host->card)) goto out;
- if (mmc_card_doing_bkops(host->card)) { - err = mmc_stop_bkops(host->card); - if (err) - goto out; - } - err = mmc_flush_cache(host->card); if (err) goto out; --- a/drivers/mmc/core/mmc_ops.c +++ b/drivers/mmc/core/mmc_ops.c @@ -899,34 +899,6 @@ int mmc_can_ext_csd(struct mmc_card *car return (card && card->csd.mmca_vsn > CSD_SPEC_VER_3); }
-/** - * mmc_stop_bkops - stop ongoing BKOPS - * @card: MMC card to check BKOPS - * - * Send HPI command to stop ongoing background operations to - * allow rapid servicing of foreground operations, e.g. read/ - * writes. Wait until the card comes out of the programming state - * to avoid errors in servicing read/write requests. - */ -int mmc_stop_bkops(struct mmc_card *card) -{ - int err = 0; - - err = mmc_interrupt_hpi(card); - - /* - * If err is EINVAL, we can't issue an HPI. - * It should complete the BKOPS. - */ - if (!err || (err == -EINVAL)) { - mmc_card_clr_doing_bkops(card); - mmc_retune_release(card->host); - err = 0; - } - - return err; -} - static int mmc_read_bkops_status(struct mmc_card *card) { int err; @@ -943,22 +915,17 @@ static int mmc_read_bkops_status(struct }
/** - * mmc_start_bkops - start BKOPS for supported cards - * @card: MMC card to start BKOPS - * @from_exception: A flag to indicate if this function was - * called due to an exception raised by the card + * mmc_run_bkops - Run BKOPS for supported cards + * @card: MMC card to run BKOPS for * - * Start background operations whenever requested. - * When the urgent BKOPS bit is set in a R1 command response - * then background operations should be started immediately. + * Run background operations synchronously for cards having manual BKOPS + * enabled and in case it reports urgent BKOPS level. */ -void mmc_start_bkops(struct mmc_card *card, bool from_exception) +void mmc_run_bkops(struct mmc_card *card) { int err; - int timeout; - bool use_busy_signal;
- if (!card->ext_csd.man_bkops_en || mmc_card_doing_bkops(card)) + if (!card->ext_csd.man_bkops_en) return;
err = mmc_read_bkops_status(card); @@ -968,44 +935,26 @@ void mmc_start_bkops(struct mmc_card *ca return; }
- if (!card->ext_csd.raw_bkops_status) + if (!card->ext_csd.raw_bkops_status || + card->ext_csd.raw_bkops_status < EXT_CSD_BKOPS_LEVEL_2) return;
- if (card->ext_csd.raw_bkops_status < EXT_CSD_BKOPS_LEVEL_2 && - from_exception) - return; - - if (card->ext_csd.raw_bkops_status >= EXT_CSD_BKOPS_LEVEL_2) { - timeout = MMC_OPS_TIMEOUT_MS; - use_busy_signal = true; - } else { - timeout = 0; - use_busy_signal = false; - } - mmc_retune_hold(card->host);
- err = __mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, - EXT_CSD_BKOPS_START, 1, timeout, 0, - use_busy_signal, true, false); - if (err) { + /* + * For urgent BKOPS status, LEVEL_2 and higher, let's execute + * synchronously. Future wise, we may consider to start BKOPS, for less + * urgent levels by using an asynchronous background task, when idle. + */ + err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, + EXT_CSD_BKOPS_START, 1, MMC_OPS_TIMEOUT_MS); + if (err) pr_warn("%s: Error %d starting bkops\n", mmc_hostname(card->host), err); - mmc_retune_release(card->host); - return; - }
- /* - * For urgent bkops status (LEVEL_2 and more) - * bkops executed synchronously, otherwise - * the operation is in progress - */ - if (!use_busy_signal) - mmc_card_set_doing_bkops(card); - else - mmc_retune_release(card->host); + mmc_retune_release(card->host); } -EXPORT_SYMBOL(mmc_start_bkops); +EXPORT_SYMBOL(mmc_run_bkops);
/* * Flush the cache to the non-volatile storage. --- a/drivers/mmc/core/mmc_ops.h +++ b/drivers/mmc/core/mmc_ops.h @@ -40,8 +40,7 @@ int __mmc_switch(struct mmc_card *card, bool use_busy_signal, bool send_status, bool retry_crc_err); int mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, unsigned int timeout_ms); -int mmc_stop_bkops(struct mmc_card *card); -void mmc_start_bkops(struct mmc_card *card, bool from_exception); +void mmc_run_bkops(struct mmc_card *card); int mmc_flush_cache(struct mmc_card *card); int mmc_cmdq_enable(struct mmc_card *card); int mmc_cmdq_disable(struct mmc_card *card);
From: Ulf Hansson ulf.hansson@linaro.org
commit 24ed3bd01d6a844fd5e8a75f48d0a3d10ed71bf9 upstream
The timeout values used while waiting for a CMD6 for BKOPS or a CACHE_FLUSH to complete, are not defined by the eMMC spec. However, a timeout of 10 minutes as is currently being used, is just silly for both of these cases. Instead, let's specify more reasonable timeouts, 120s for BKOPS and 30s for CACHE_FLUSH.
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Link: https://lore.kernel.org/r/20200122142747.5690-2-ulf.hansson@linaro.org Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/mmc_ops.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/mmc/core/mmc_ops.c +++ b/drivers/mmc/core/mmc_ops.c @@ -23,7 +23,9 @@ #include "host.h" #include "mmc_ops.h"
-#define MMC_OPS_TIMEOUT_MS (10 * 60 * 1000) /* 10 minute timeout */ +#define MMC_OPS_TIMEOUT_MS (10 * 60 * 1000) /* 10min*/ +#define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */ +#define MMC_CACHE_FLUSH_TIMEOUT_MS (30 * 1000) /* 30s */
static const u8 tuning_blk_pattern_4bit[] = { 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc, @@ -947,7 +949,7 @@ void mmc_run_bkops(struct mmc_card *card * urgent levels by using an asynchronous background task, when idle. */ err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, - EXT_CSD_BKOPS_START, 1, MMC_OPS_TIMEOUT_MS); + EXT_CSD_BKOPS_START, 1, MMC_BKOPS_TIMEOUT_MS); if (err) pr_warn("%s: Error %d starting bkops\n", mmc_hostname(card->host), err); @@ -965,7 +967,8 @@ int mmc_flush_cache(struct mmc_card *car
if (mmc_cache_enabled(card->host)) { err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, - EXT_CSD_FLUSH_CACHE, 1, 0); + EXT_CSD_FLUSH_CACHE, 1, + MMC_CACHE_FLUSH_TIMEOUT_MS); if (err) pr_err("%s: cache flush error %d\n", mmc_hostname(card->host), err);
From: Ulf Hansson ulf.hansson@linaro.org
commit ad91619aa9d78ab1c6d4a969c3db68bc331ae76c upstream
The INAND_CMD38_ARG_EXT_CSD is a vendor specific EXT_CSD register, which is used to prepare an erase/trim operation. However, it doesn't make sense to use a timeout of 10 minutes while updating the register, which becomes the case when the timeout_ms argument for mmc_switch() is set to zero.
Instead, let's use the generic_cmd6_time, as that seems like a reasonable timeout to use for these cases.
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Link: https://lore.kernel.org/r/20200122142747.5690-3-ulf.hansson@linaro.org Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/block.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1133,7 +1133,7 @@ static void mmc_blk_issue_discard_rq(str arg == MMC_TRIM_ARG ? INAND_CMD38_ARG_TRIM : INAND_CMD38_ARG_ERASE, - 0); + card->ext_csd.generic_cmd6_time); } if (!err) err = mmc_erase(card, from, nr, arg); @@ -1175,7 +1175,7 @@ retry: arg == MMC_SECURE_TRIM1_ARG ? INAND_CMD38_ARG_SECTRIM1 : INAND_CMD38_ARG_SECERASE, - 0); + card->ext_csd.generic_cmd6_time); if (err) goto out_retry; } @@ -1193,7 +1193,7 @@ retry: err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, INAND_CMD38_ARG_EXT_CSD, INAND_CMD38_ARG_SECTRIM2, - 0); + card->ext_csd.generic_cmd6_time); if (err) goto out_retry; }
From: Ulf Hansson ulf.hansson@linaro.org
commit 533a6cfe08f96a7b5c65e06d20916d552c11b256 upstream
All callers of __mmc_switch() should now be specifying a valid timeout for the CMD6 command. However, just to be sure, let's print a warning and default to use the generic_cmd6_time in case the provided timeout_ms argument is zero.
In this context, let's also simplify some of the corresponding code and clarify some related comments.
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Link: https://lore.kernel.org/r/20200122142747.5690-4-ulf.hansson@linaro.org Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/mmc_ops.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-)
--- a/drivers/mmc/core/mmc_ops.c +++ b/drivers/mmc/core/mmc_ops.c @@ -458,10 +458,6 @@ static int mmc_poll_for_busy(struct mmc_ bool expired = false; bool busy = false;
- /* We have an unspecified cmd timeout, use the fallback value. */ - if (!timeout_ms) - timeout_ms = MMC_OPS_TIMEOUT_MS; - /* * In cases when not allowed to poll by using CMD13 or because we aren't * capable of polling by using ->card_busy(), then rely on waiting the @@ -534,6 +530,12 @@ int __mmc_switch(struct mmc_card *card,
mmc_retune_hold(host);
+ if (!timeout_ms) { + pr_warn("%s: unspecified timeout for CMD6 - use generic\n", + mmc_hostname(host)); + timeout_ms = card->ext_csd.generic_cmd6_time; + } + /* * If the cmd timeout and the max_busy_timeout of the host are both * specified, let's validate them. A failure means we need to prevent @@ -542,7 +544,7 @@ int __mmc_switch(struct mmc_card *card, * which also means they are on their own when it comes to deal with the * busy timeout. */ - if (!(host->caps & MMC_CAP_NEED_RSP_BUSY) && timeout_ms && + if (!(host->caps & MMC_CAP_NEED_RSP_BUSY) && host->max_busy_timeout && (timeout_ms > host->max_busy_timeout)) use_r1b_resp = false;
@@ -554,10 +556,6 @@ int __mmc_switch(struct mmc_card *card, cmd.flags = MMC_CMD_AC; if (use_r1b_resp) { cmd.flags |= MMC_RSP_SPI_R1B | MMC_RSP_R1B; - /* - * A busy_timeout of zero means the host can decide to use - * whatever value it finds suitable. - */ cmd.busy_timeout = timeout_ms; } else { cmd.flags |= MMC_RSP_SPI_R1 | MMC_RSP_R1;
From: Harini Katakam harini.katakam@xilinx.com
[ Upstream commit 9500acc631dbb8b73166e25700e656b11f6007b6 ]
In gem_rx_refill rx_prepared_head is incremented at the beginning of the while loop preparing the skb and data buffers. If the skb or data buffer allocation fails, this BD will be unusable BDs until the head loops back to the same BD (and obviously buffer allocation succeeds). In the unlikely event that there's a string of allocation failures, there will be an equal number of unusable BDs and an inconsistent RX BD chain. Hence increment the head at the end of the while loop to be clean.
Fixes: 4df95131ea80 ("net/macb: change RX path for GEM") Signed-off-by: Harini Katakam harini.katakam@xilinx.com Signed-off-by: Michal Simek michal.simek@xilinx.com Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@xilinx.com Reviewed-by: Claudiu Beznea claudiu.beznea@microchip.com Link: https://lore.kernel.org/r/20220512171900.32593-1-harini.katakam@xilinx.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cadence/macb_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index d8e4842af055..50331b202f73 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -915,7 +915,6 @@ static void gem_rx_refill(struct macb_queue *queue) /* Make hw descriptor updates visible to CPU */ rmb();
- queue->rx_prepared_head++; desc = macb_rx_desc(queue, entry);
if (!queue->rx_skbuff[entry]) { @@ -954,6 +953,7 @@ static void gem_rx_refill(struct macb_queue *queue) dma_wmb(); desc->addr &= ~MACB_BIT(RX_USED); } + queue->rx_prepared_head++; }
/* Make descriptor updates visible to hardware */
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 4d42d54a7d6aa6d29221d3fd4f2ae9503e94f011 ]
syzbot was able to trigger an Out-of-Bound on the pedit action:
UBSAN: shift-out-of-bounds in net/sched/act_pedit.c:238:43 shift exponent 1400735974 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 3606 Comm: syz-executor151 Not tainted 5.18.0-rc5-syzkaller-00165-g810c2f0a3f86 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 ubsan_epilogue+0xb/0x50 lib/ubsan.c:151 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x187 lib/ubsan.c:322 tcf_pedit_init.cold+0x1a/0x1f net/sched/act_pedit.c:238 tcf_action_init_1+0x414/0x690 net/sched/act_api.c:1367 tcf_action_init+0x530/0x8d0 net/sched/act_api.c:1432 tcf_action_add+0xf9/0x480 net/sched/act_api.c:1956 tc_ctl_action+0x346/0x470 net/sched/act_api.c:2015 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5993 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e2/0x800 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fe36e9e1b59 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffef796fe88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe36e9e1b59 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003 RBP: 00007fe36e9a5d00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe36e9a5d90 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK>
The 'shift' field is not validated, and any value above 31 will trigger out-of-bounds. The issue predates the git history, but syzbot was able to trigger it only after the commit mentioned in the fixes tag, and this change only applies on top of such commit.
Address the issue bounding the 'shift' value to the maximum allowed by the relevant operator.
Reported-and-tested-by: syzbot+8ed8fc4c57e9dcf23ca6@syzkaller.appspotmail.com Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_pedit.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index fec0f7fdb015..aeb8f84cbd9e 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -225,6 +225,10 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, for (i = 0; i < p->tcfp_nkeys; ++i) { u32 cur = p->tcfp_keys[i].off;
+ /* sanitize the shift value for any later use */ + p->tcfp_keys[i].shift = min_t(size_t, BITS_PER_TYPE(int) - 1, + p->tcfp_keys[i].shift); + /* The AT option can read a single byte, we can bound the actual * value with uchar max. */
From: Zixuan Fu r33s3n6@gmail.com
[ Upstream commit 9e7fef9521e73ca8afd7da9e58c14654b02dfad8 ]
In vmxnet3_rq_alloc_rx_buf(), when dma_map_single() fails, rbi->skb is freed immediately. Similarly, in another branch, when dma_map_page() fails, rbi->page is also freed. In the two cases, vmxnet3_rq_alloc_rx_buf() returns an error to its callers vmxnet3_rq_init() -> vmxnet3_rq_init_all() -> vmxnet3_activate_dev(). Then vmxnet3_activate_dev() calls vmxnet3_rq_cleanup_all() in error handling code, and rbi->skb or rbi->page are freed again in vmxnet3_rq_cleanup_all(), causing use-after-free bugs.
To fix these possible bugs, rbi->skb and rbi->page should be cleared after they are freed.
The error log in our fault-injection testing is shown as follows:
[ 14.319016] BUG: KASAN: use-after-free in consume_skb+0x2f/0x150 ... [ 14.321586] Call Trace: ... [ 14.325357] consume_skb+0x2f/0x150 [ 14.325671] vmxnet3_rq_cleanup_all+0x33a/0x4e0 [vmxnet3] [ 14.326150] vmxnet3_activate_dev+0xb9d/0x2ca0 [vmxnet3] [ 14.326616] vmxnet3_open+0x387/0x470 [vmxnet3] ... [ 14.361675] Allocated by task 351: ... [ 14.362688] __netdev_alloc_skb+0x1b3/0x6f0 [ 14.362960] vmxnet3_rq_alloc_rx_buf+0x1b0/0x8d0 [vmxnet3] [ 14.363317] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3] [ 14.363661] vmxnet3_open+0x387/0x470 [vmxnet3] ... [ 14.367309] [ 14.367412] Freed by task 351: ... [ 14.368932] __dev_kfree_skb_any+0xd2/0xe0 [ 14.369193] vmxnet3_rq_alloc_rx_buf+0x71e/0x8d0 [vmxnet3] [ 14.369544] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3] [ 14.369883] vmxnet3_open+0x387/0x470 [vmxnet3] [ 14.370174] __dev_open+0x28a/0x420 [ 14.370399] __dev_change_flags+0x192/0x590 [ 14.370667] dev_change_flags+0x7a/0x180 [ 14.370919] do_setlink+0xb28/0x3570 [ 14.371150] rtnl_newlink+0x1160/0x1740 [ 14.371399] rtnetlink_rcv_msg+0x5bf/0xa50 [ 14.371661] netlink_rcv_skb+0x1cd/0x3e0 [ 14.371913] netlink_unicast+0x5dc/0x840 [ 14.372169] netlink_sendmsg+0x856/0xc40 [ 14.372420] ____sys_sendmsg+0x8a7/0x8d0 [ 14.372673] __sys_sendmsg+0x1c2/0x270 [ 14.372914] do_syscall_64+0x41/0x90 [ 14.373145] entry_SYSCALL_64_after_hwframe+0x44/0xae ...
Fixes: 5738a09d58d5a ("vmxnet3: fix checks for dma mapping errors") Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Zixuan Fu r33s3n6@gmail.com Link: https://lore.kernel.org/r/20220514050656.2636588-1-r33s3n6@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index c004819bebe3..1df67c899d4f 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -595,6 +595,7 @@ vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx, if (dma_mapping_error(&adapter->pdev->dev, rbi->dma_addr)) { dev_kfree_skb_any(rbi->skb); + rbi->skb = NULL; rq->stats.rx_buf_alloc_failure++; break; } @@ -619,6 +620,7 @@ vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx, if (dma_mapping_error(&adapter->pdev->dev, rbi->dma_addr)) { put_page(rbi->page); + rbi->page = NULL; rq->stats.rx_buf_alloc_failure++; break; }
From: Zixuan Fu r33s3n6@gmail.com
[ Upstream commit edf410cb74dc612fd47ef5be319c5a0bcd6e6ccd ]
In vmxnet3_rq_create(), when dma_alloc_coherent() fails, vmxnet3_rq_destroy() is called. It sets rq->rx_ring[i].base to NULL. Then vmxnet3_rq_create() returns an error to its callers mxnet3_rq_create_all() -> vmxnet3_change_mtu(). Then vmxnet3_change_mtu() calls vmxnet3_force_close() -> dev_close() in error handling code. And the driver calls vmxnet3_close() -> vmxnet3_quiesce_dev() -> vmxnet3_rq_cleanup_all() -> vmxnet3_rq_cleanup(). In vmxnet3_rq_cleanup(), rq->rx_ring[ring_idx].base is accessed, but this variable is NULL, causing a NULL pointer dereference.
To fix this possible bug, an if statement is added to check whether rq->rx_ring[0].base is NULL in vmxnet3_rq_cleanup() and exit early if so.
The error log in our fault-injection testing is shown as follows:
[ 65.220135] BUG: kernel NULL pointer dereference, address: 0000000000000008 ... [ 65.222633] RIP: 0010:vmxnet3_rq_cleanup_all+0x396/0x4e0 [vmxnet3] ... [ 65.227977] Call Trace: ... [ 65.228262] vmxnet3_quiesce_dev+0x80f/0x8a0 [vmxnet3] [ 65.228580] vmxnet3_close+0x2c4/0x3f0 [vmxnet3] [ 65.228866] __dev_close_many+0x288/0x350 [ 65.229607] dev_close_many+0xa4/0x480 [ 65.231124] dev_close+0x138/0x230 [ 65.231933] vmxnet3_force_close+0x1f0/0x240 [vmxnet3] [ 65.232248] vmxnet3_change_mtu+0x75d/0x920 [vmxnet3] ...
Fixes: d1a890fa37f27 ("net: VMware virtual Ethernet NIC driver: vmxnet3") Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Zixuan Fu r33s3n6@gmail.com Link: https://lore.kernel.org/r/20220514050711.2636709-1-r33s3n6@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 1df67c899d4f..a57ea3914968 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -1586,6 +1586,10 @@ vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq, u32 i, ring_idx; struct Vmxnet3_RxDesc *rxd;
+ /* ring has already been cleaned up */ + if (!rq->rx_ring[0].base) + return; + for (ring_idx = 0; ring_idx < 2; ring_idx++) { for (i = 0; i < rq->rx_ring[ring_idx].size; i++) { #ifdef __BIG_ENDIAN_BITFIELD
From: Codrin Ciubotariu codrin.ciubotariu@microchip.com
[ Upstream commit d0031e6fbed955ff8d5f5bbc8fe7382482559cec ]
clk_generated_best_diff() helps in finding the parent and the divisor to compute a rate closest to the required one. However, it doesn't take into account the request's range for the new rate. Make sure the new rate is within the required range.
Fixes: 8a8f4bf0c480 ("clk: at91: clk-generated: create function to find best_diff") Signed-off-by: Codrin Ciubotariu codrin.ciubotariu@microchip.com Link: https://lore.kernel.org/r/20220413071318.244912-1-codrin.ciubotariu@microchi... Reviewed-by: Claudiu Beznea claudiu.beznea@microchip.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/at91/clk-generated.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/clk/at91/clk-generated.c b/drivers/clk/at91/clk-generated.c index ea23002be4de..b397556c34d9 100644 --- a/drivers/clk/at91/clk-generated.c +++ b/drivers/clk/at91/clk-generated.c @@ -119,6 +119,10 @@ static void clk_generated_best_diff(struct clk_rate_request *req, tmp_rate = parent_rate; else tmp_rate = parent_rate / div; + + if (tmp_rate < req->min_rate || tmp_rate > req->max_rate) + return; + tmp_diff = abs(req->rate - tmp_rate);
if (*best_diff < 0 || *best_diff > tmp_diff) {
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 5361448e45fac6fb96738df748229432a62d78b6 ]
test_bit() tests if one bit is set or not. Here the logic seems to check of bit QL_RESET_PER_SCSI (i.e. 4) OR bit QL_RESET_START (i.e. 3) is set.
In fact, it checks if bit 7 (4 | 3 = 7) is set, that is to say QL_ADAPTER_UP.
This looks harmless, because this bit is likely be set, and when the ql_reset_work() delayed work is scheduled in ql3xxx_isr() (the only place that schedule this work), QL_RESET_START or QL_RESET_PER_SCSI is set.
This has been spotted by smatch.
Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Link: https://lore.kernel.org/r/80e73e33f390001d9c0140ffa9baddf6466a41a2.165263733... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qla3xxx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c index f38dda1d92e2..51e17a635d4b 100644 --- a/drivers/net/ethernet/qlogic/qla3xxx.c +++ b/drivers/net/ethernet/qlogic/qla3xxx.c @@ -3630,7 +3630,8 @@ static void ql_reset_work(struct work_struct *work) qdev->mem_map_registers; unsigned long hw_flags;
- if (test_bit((QL_RESET_PER_SCSI | QL_RESET_START), &qdev->flags)) { + if (test_bit(QL_RESET_PER_SCSI, &qdev->flags) || + test_bit(QL_RESET_START, &qdev->flags)) { clear_bit(QL_LINK_MASTER, &qdev->flags);
/*
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit 23dd4581350d4ffa23d58976ec46408f8f4c1e16 ]
There are sleep in atomic context bugs when the request to secure element of st-nci is timeout. The root cause is that nci_skb_alloc with GFP_KERNEL parameter is called in st_nci_se_wt_timeout which is a timer handler. The call paths that could trigger bugs are shown below:
(interrupt context 1) st_nci_se_wt_timeout nci_hci_send_event nci_hci_send_data nci_skb_alloc(..., GFP_KERNEL) //may sleep
(interrupt context 2) st_nci_se_wt_timeout nci_hci_send_event nci_hci_send_data nci_send_data nci_queue_tx_data_frags nci_skb_alloc(..., GFP_KERNEL) //may sleep
This patch changes allocation mode of nci_skb_alloc from GFP_KERNEL to GFP_ATOMIC in order to prevent atomic context sleeping. The GFP_ATOMIC flag makes memory allocation operation could be used in atomic context.
Fixes: ed06aeefdac3 ("nfc: st-nci: Rename st21nfcb to st-nci") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20220517012530.75714-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/nfc/nci/data.c | 2 +- net/nfc/nci/hci.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/nfc/nci/data.c b/net/nfc/nci/data.c index 5405d073804c..9e3f9460f14f 100644 --- a/net/nfc/nci/data.c +++ b/net/nfc/nci/data.c @@ -130,7 +130,7 @@ static int nci_queue_tx_data_frags(struct nci_dev *ndev,
skb_frag = nci_skb_alloc(ndev, (NCI_DATA_HDR_SIZE + frag_len), - GFP_KERNEL); + GFP_ATOMIC); if (skb_frag == NULL) { rc = -ENOMEM; goto free_exit; diff --git a/net/nfc/nci/hci.c b/net/nfc/nci/hci.c index c972c212e7ca..e5c5cff33236 100644 --- a/net/nfc/nci/hci.c +++ b/net/nfc/nci/hci.c @@ -165,7 +165,7 @@ static int nci_hci_send_data(struct nci_dev *ndev, u8 pipe,
i = 0; skb = nci_skb_alloc(ndev, conn_info->max_pkt_payload_len + - NCI_DATA_HDR_SIZE, GFP_KERNEL); + NCI_DATA_HDR_SIZE, GFP_ATOMIC); if (!skb) return -ENOMEM;
@@ -198,7 +198,7 @@ static int nci_hci_send_data(struct nci_dev *ndev, u8 pipe, if (i < data_len) { skb = nci_skb_alloc(ndev, conn_info->max_pkt_payload_len + - NCI_DATA_HDR_SIZE, GFP_KERNEL); + NCI_DATA_HDR_SIZE, GFP_ATOMIC); if (!skb) return -ENOMEM;
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit cf6e34c8c22fba66bd21244b95ea47e235f68974 ]
LRO is incompatible and mutually exclusive with XDP. However, the needed checks are only made when enabling XDP. If LRO is enabled when XDP is already active, the command will succeed, and XDP will be skipped in the data path, although still enabled.
This commit fixes the bug by checking the XDP status in mlx5e_fix_features and disabling LRO if XDP is enabled.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 5979fcf124bb..75872aef44d0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3739,6 +3739,13 @@ static netdev_features_t mlx5e_fix_features(struct net_device *netdev, netdev_warn(netdev, "Disabling LRO, not supported in legacy RQ\n"); }
+ if (params->xdp_prog) { + if (features & NETIF_F_LRO) { + netdev_warn(netdev, "LRO is incompatible with XDP\n"); + features &= ~NETIF_F_LRO; + } + } + if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS)) { features &= ~NETIF_F_RXHASH; if (netdev->features & NETIF_F_RXHASH)
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit 4dc2a5a8f6754492180741facf2a8787f2c415d7 ]
If skb_clone() returns null pointer, pfkey_broadcast() will return error. Therefore, it should be better to check the return value of pfkey_broadcast() and return error if fails.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/key/af_key.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/key/af_key.c b/net/key/af_key.c index a416c0f90056..170960ef7e36 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -2836,8 +2836,10 @@ static int pfkey_process(struct sock *sk, struct sk_buff *skb, const struct sadb void *ext_hdrs[SADB_EXT_MAX]; int err;
- pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_PROMISC_ONLY, NULL, sock_net(sk)); + err = pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, + BROADCAST_PROMISC_ONLY, NULL, sock_net(sk)); + if (err) + return err;
memset(ext_hdrs, 0, sizeof(ext_hdrs)); err = parse_exthdrs(skb, hdr, ext_hdrs);
From: Ard Biesheuvel ardb@kernel.org
[ Upstream commit 0dc14aa94ccd8ba35eb17a0f9b123d1566efd39e ]
The Spectre-BHB mitigations were inadvertently left disabled for Cortex-A15, due to the fact that cpu_v7_bugs_init() is not called in that case. So fix that.
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/mm/proc-v7-bugs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm/mm/proc-v7-bugs.c b/arch/arm/mm/proc-v7-bugs.c index 8394307272d6..0381e1495486 100644 --- a/arch/arm/mm/proc-v7-bugs.c +++ b/arch/arm/mm/proc-v7-bugs.c @@ -302,6 +302,7 @@ void cpu_v7_ca15_ibe(void) { if (check_spectre_auxcr(this_cpu_ptr(&spectre_warned), BIT(0))) cpu_v7_spectre_v2_init(); + cpu_v7_spectre_bhb_init(); }
void cpu_v7_bugs_init(void)
From: Ard Biesheuvel ardb@kernel.org
[ Upstream commit 3cfb3019979666bdf33a1010147363cf05e0f17b ]
In Thumb2, 'b . + 4' produces a branch instruction that uses a narrow encoding, and so it does not jump to the following instruction as expected. So use W(b) instead.
Fixes: 6c7cb60bff7a ("ARM: fix Thumb2 regression with Spectre BHB") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/kernel/entry-armv.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index a929b6acb149..d779cd1a3b0c 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -1067,7 +1067,7 @@ vector_bhb_loop8_\name:
@ bhb workaround mov r0, #8 -3: b . + 4 +3: W(b) . + 4 subs r0, r0, #1 bne 3b dsb
From: Kevin Mitchell kevmitch@arista.com
[ Upstream commit 942d2ad5d2e0df758a645ddfadffde2795322728 ]
igb_read_phy_reg() will silently return, leaving phy_data untouched, if hw->ops.read_reg isn't set. Depending on the uninitialized value of phy_data, this led to the phy status check either succeeding immediately or looping continuously for 2 seconds before emitting a noisy err-level timeout. This message went out to the console even though there was no actual problem.
Instead, first check if there is read_reg function pointer. If not, proceed without trying to check the phy status register.
Fixes: b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status condition") Signed-off-by: Kevin Mitchell kevmitch@arista.com Tested-by: Gurucharan gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 74b50f17832d..a93edd31011f 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -5346,7 +5346,8 @@ static void igb_watchdog_task(struct work_struct *work) break; }
- if (adapter->link_speed != SPEED_1000) + if (adapter->link_speed != SPEED_1000 || + !hw->phy.ops.read_reg) goto no_wait;
/* wait for Remote receiver status OK */
From: Andrew Lunn andrew@lunn.ch
[ Upstream commit fbb3abdf2223cd0dfc07de85fe5a43ba7f435bdf ]
It is possible to stack bridges on top of each other. Consider the following which makes use of an Ethernet switch:
br1 / \ / \ / \ br0.11 wlan0 | br0 / | \ p1 p2 p3
br0 is offloaded to the switch. Above br0 is a vlan interface, for vlan 11. This vlan interface is then a slave of br1. br1 also has a wireless interface as a slave. This setup trunks wireless lan traffic over the copper network inside a VLAN.
A frame received on p1 which is passed up to the bridge has the skb->offload_fwd_mark flag set to true, indicating that the switch has dealt with forwarding the frame out ports p2 and p3 as needed. This flag instructs the software bridge it does not need to pass the frame back down again. However, the flag is not getting reset when the frame is passed upwards. As a result br1 sees the flag, wrongly interprets it, and fails to forward the frame to wlan0.
When passing a frame upwards, clear the flag. This is the Rx equivalent of br_switchdev_frame_unmark() in br_dev_xmit().
Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark") Signed-off-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Ido Schimmel idosch@nvidia.com Tested-by: Ido Schimmel idosch@nvidia.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_input.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 2532c1a19645..14c2fdc268ea 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -47,6 +47,13 @@ static int br_pass_frame_up(struct sk_buff *skb) u64_stats_update_end(&brstats->syncp);
vg = br_vlan_group_rcu(br); + + /* Reset the offload_fwd_mark because there could be a stacked + * bridge above, and it should not think this bridge it doing + * that bridge's work forwarding out its ports. + */ + br_switchdev_frame_unmark(skb); + /* Bridge is just like any other port. Make sure the * packet is allowed except in promisc modue when someone * may be running packet capture.
From: Haibo Chen haibo.chen@nxp.com
[ Upstream commit 9bf3ac466faa83d51a8fe9212131701e58fdef74 ]
For gpio controller contain register PDDR, when set one target bit, current logic will clear all other bits, this is wrong. Use operator '|=' to fix it.
Fixes: 659d8a62311f ("gpio: vf610: add imx7ulp support") Reviewed-by: Peng Fan peng.fan@nxp.com Signed-off-by: Haibo Chen haibo.chen@nxp.com Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-vf610.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpio-vf610.c b/drivers/gpio/gpio-vf610.c index a9cb5571de54..f7692999df47 100644 --- a/drivers/gpio/gpio-vf610.c +++ b/drivers/gpio/gpio-vf610.c @@ -135,9 +135,13 @@ static int vf610_gpio_direction_output(struct gpio_chip *chip, unsigned gpio, { struct vf610_gpio_port *port = gpiochip_get_data(chip); unsigned long mask = BIT(gpio); + u32 val;
- if (port->sdata && port->sdata->have_paddr) - vf610_gpio_writel(mask, port->gpio_base + GPIO_PDDR); + if (port->sdata && port->sdata->have_paddr) { + val = vf610_gpio_readl(port->gpio_base + GPIO_PDDR); + val |= mask; + vf610_gpio_writel(val, port->gpio_base + GPIO_PDDR); + }
vf610_gpio_set(chip, gpio, value);
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 3ecb10175b1f776f076553c24e2689e42953fef5 ]
The driver doesn't take struct pwm_state::polarity into account when configuring the hardware, so refuse requests for inverted polarity.
Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support") Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-mvebu.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c index 874caed72390..199b96967ca2 100644 --- a/drivers/gpio/gpio-mvebu.c +++ b/drivers/gpio/gpio-mvebu.c @@ -690,6 +690,9 @@ static int mvebu_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, unsigned long flags; unsigned int on, off;
+ if (state->polarity != PWM_POLARITY_NORMAL) + return -EINVAL; + val = (unsigned long long) mvpwm->clk_rate * state->duty_cycle; do_div(val, NSEC_PER_SEC); if (val > UINT_MAX)
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit f8ac1c478424a9a14669b8cef7389b1e14e5229d ]
The compilation on s390 results in this error:
# make DEBUG=y bench/numa.o ... bench/numa.c: In function ‘__bench_numa’: bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size between 10 and 20 [-Werror=format-truncation=] 1749 | snprintf(tname, sizeof(tname), "process%d:thread%d", p, t); ^~ ... bench/numa.c:1749:64: note: directive argument in the range [-2147483647, 2147483646] ... #
The maximum length of the %d replacement is 11 characters because of the negative sign. Therefore extend the array by two more characters.
Output after:
# make DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o -rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o #
Fixes: 3aff8ba0a4c9c919 ("perf bench numa: Avoid possible truncation when using snprintf()") Suggested-by: Namhyung Kim namhyung@gmail.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Cc: Heiko Carstens hca@linux.ibm.com Cc: Sumanth Korikkar sumanthk@linux.ibm.com Cc: Sven Schnelle svens@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/bench/numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c index 91c0a4434da2..e7fde88a0845 100644 --- a/tools/perf/bench/numa.c +++ b/tools/perf/bench/numa.c @@ -1631,7 +1631,7 @@ static int __bench_numa(const char *name) "GB/sec,", "total-speed", "GB/sec total speed");
if (g->p.show_details >= 2) { - char tname[14 + 2 * 10 + 1]; + char tname[14 + 2 * 11 + 1]; struct thread_data *td; for (p = 0; p < g->p.nr_proc; p++) { for (t = 0; t < g->p.nr_threads; t++) {
From: Gleb Chesnokov Chesnokov.G@raidix.com
[ Upstream commit 26f9ce53817a8fd84b69a73473a7de852a24c897 ]
Aborting commands that have already been sent to the firmware can cause BUG in qlt_free_cmd(): BUG_ON(cmd->sg_mapped)
For instance:
- Command passes rdx_to_xfer state, maps sgl, sends to the firmware
- Reset occurs, qla2xxx performs ISP error recovery, aborts the command
- Target stack calls qlt_abort_cmd() and then qlt_free_cmd()
- BUG_ON(cmd->sg_mapped) in qlt_free_cmd() occurs because sgl was not unmapped
Thus, unmap sgl in qlt_abort_cmd() for commands with the aborted flag set.
Link: https://lore.kernel.org/r/AS8PR10MB4952D545F84B6B1DFD39EC1E9DEE9@AS8PR10MB49... Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Gleb Chesnokov Chesnokov.G@raidix.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/qla2xxx/qla_target.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 09c52ef66887..27d3293eadf5 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -3753,6 +3753,9 @@ int qlt_abort_cmd(struct qla_tgt_cmd *cmd)
spin_lock_irqsave(&cmd->cmd_lock, flags); if (cmd->aborted) { + if (cmd->sg_mapped) + qlt_unmap_sg(vha, cmd); + spin_unlock_irqrestore(&cmd->cmd_lock, flags); /* * It's normal to see 2 calls in this path:
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 5e469ed9764d4722c59562da13120bd2dc6834c5 ]
When the QoS ack policy was set to non explicit / psmp ack, frames are treated as not being part of a BA session, which causes extra latency on reordering. Fix this by only bypassing reordering for packets with no-ack policy
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20220420105038.36443-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/rx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index f30b732af61d..3598ebe52d08 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -1322,8 +1322,7 @@ static void ieee80211_rx_reorder_ampdu(struct ieee80211_rx_data *rx, goto dont_reorder;
/* not part of a BA session */ - if (ack_policy != IEEE80211_QOS_CTL_ACK_POLICY_BLOCKACK && - ack_policy != IEEE80211_QOS_CTL_ACK_POLICY_NORMAL) + if (ack_policy == IEEE80211_QOS_CTL_ACK_POLICY_NOACK) goto dont_reorder;
/* new, potentially un-ordered, ampdu frame - process it */
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 51ca86b4c9c7c75f5630fa0dbe5f8f0bd98e3c3e ]
Fix the missing pci_disable_device() before return from tulip_init_one() in the error handling case.
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Link: https://lore.kernel.org/r/20220506094250.3630615-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/dec/tulip/tulip_core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c index 3e3e08698876..fea4223ad6f1 100644 --- a/drivers/net/ethernet/dec/tulip/tulip_core.c +++ b/drivers/net/ethernet/dec/tulip/tulip_core.c @@ -1410,8 +1410,10 @@ static int tulip_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
/* alloc_etherdev ensures aligned and zeroed private structures */ dev = alloc_etherdev (sizeof (*tp)); - if (!dev) + if (!dev) { + pci_disable_device(pdev); return -ENOMEM; + }
SET_NETDEV_DEV(dev, &pdev->dev); if (pci_resource_len (pdev, 0) < tulip_tbl[chip_idx].io_size) { @@ -1788,6 +1790,7 @@ static int tulip_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
err_out_free_netdev: free_netdev (dev); + pci_disable_device(pdev); return -ENODEV; }
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 0807ce0b010418a191e0e4009803b2d74c3245d5 ]
Switch to using pcim_enable_device() to avoid missing pci_disable_device().
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Link: https://lore.kernel.org/r/20220510031316.1780409-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c index cc1e887e47b5..3dec109251ad 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c @@ -261,7 +261,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev, return -ENOMEM;
/* Enable pci device */ - ret = pci_enable_device(pdev); + ret = pcim_enable_device(pdev); if (ret) { dev_err(&pdev->dev, "%s: ERROR: failed to enable device\n", __func__); @@ -313,8 +313,6 @@ static void stmmac_pci_remove(struct pci_dev *pdev) pcim_iounmap_regions(pdev, BIT(i)); break; } - - pci_disable_device(pdev); }
static int __maybe_unused stmmac_pci_suspend(struct device *dev)
From: Grant Grundler grundler@chromium.org
[ Upstream commit 2120b7f4d128433ad8c5f503a9584deba0684901 ]
Bounds check hw_head index provided by NIC to verify it lies within the TX buffer ring.
Reported-by: Aashay Shringarpure aashay@google.com Reported-by: Yi Chou yich@google.com Reported-by: Shervin Oloumi enlightened@google.com Signed-off-by: Grant Grundler grundler@chromium.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c index c4f914a29c38..bdb0b37c048a 100644 --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c @@ -637,6 +637,13 @@ static int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self, err = -ENXIO; goto err_exit; } + + /* Validate that the new hw_head_ is reasonable. */ + if (hw_head_ >= ring->size) { + err = -ENXIO; + goto err_exit; + } + ring->hw_head = hw_head_; err = aq_hw_err_from_flags(self);
From: Halil Pasic pasic@linux.ibm.com
commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e upstream.
The problem I'm addressing was discovered by the LTP test covering cve-2018-1000204.
A short description of what happens follows: 1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV and a corresponding dxferp. The peculiar thing about this is that TUR is not reading from the device. 2) In sg_start_req() the invocation of blk_rq_map_user() effectively bounces the user-space buffer. As if the device was to transfer into it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()") we make sure this first bounce buffer is allocated with GFP_ZERO. 3) For the rest of the story we keep ignoring that we have a TUR, so the device won't touch the buffer we prepare as if the we had a DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device and the buffer allocated by SG is mapped by the function virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi generics). This mapping involves bouncing via the swiotlb (we need swiotlb to do virtio in protected guest like s390 Secure Execution, or AMD SEV). 4) When the SCSI TUR is done, we first copy back the content of the second (that is swiotlb) bounce buffer (which most likely contains some previous IO data), to the first bounce buffer, which contains all zeros. Then we copy back the content of the first bounce buffer to the user-space buffer. 5) The test case detects that the buffer, which it zero-initialized, ain't all zeros and fails.
One can argue that this is an swiotlb problem, because without swiotlb we leak all zeros, and the swiotlb should be transparent in a sense that it does not affect the outcome (if all other participants are well behaved).
Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that if in doubt, but allow the driver to tell us that the whole mapped buffer is going to be overwritten, in which case we can preserve the old behavior and avoid the performance impact of the extra bounce.
Signed-off-by: Halil Pasic pasic@linux.ibm.com Signed-off-by: Christoph Hellwig hch@lst.de [OP: backport to 4.19: adjusted context] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/DMA-attributes.txt | 10 ++++++++++ include/linux/dma-mapping.h | 8 ++++++++ kernel/dma/swiotlb.c | 3 ++- 3 files changed, 20 insertions(+), 1 deletion(-)
--- a/Documentation/DMA-attributes.txt +++ b/Documentation/DMA-attributes.txt @@ -156,3 +156,13 @@ accesses to DMA buffers in both privileg subsystem that the buffer is fully accessible at the elevated privilege level (and ideally inaccessible or at least read-only at the lesser-privileged levels). + +DMA_ATTR_PRIVILEGED +------------------- + +Some advanced peripherals such as remote processors and GPUs perform +accesses to DMA buffers in both privileged "supervisor" and unprivileged +"user" modes. This attribute is used to indicate to the DMA-mapping +subsystem that the buffer is fully accessible at the elevated privilege +level (and ideally inaccessible or at least read-only at the +lesser-privileged levels). --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -71,6 +71,14 @@ #define DMA_ATTR_PRIVILEGED (1UL << 9)
/* + * This is a hint to the DMA-mapping subsystem that the device is expected + * to overwrite the entire mapped size, thus the caller does not require any + * of the previous buffer contents to be preserved. This allows + * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers. + */ +#define DMA_ATTR_OVERWRITE (1UL << 10) + +/* * A dma_addr_t can hold any valid DMA or bus address for the platform. * It can be given to a device to use as a DMA source or target. A CPU cannot * reference a dma_addr_t directly because there may be translation between --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -588,7 +588,8 @@ found: for (i = 0; i < nslots; i++) io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT); if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) + (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE || + dir == DMA_BIDIRECTIONAL)) swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
return tlb_addr;
On 5/23/22 11:04 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.245 release. There are 44 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 25 May 2022 16:56:55 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.245-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Mon, 23 May 2022 at 22:35, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.19.245 release. There are 44 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 25 May 2022 16:56:55 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.245-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 4.19.245-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-4.19.y * git commit: e88efc48b31fe3946b6e39d613eeae6d96ada4fb * git describe: v4.19.244-45-ge88efc48b31f * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19....
## Test Regressions (compared to v4.19.244) No test regressions found.
## Metric Regressions (compared to v4.19.244) No metric regressions found.
## Test Fixes (compared to v4.19.244) No test fixes found.
## Metric Fixes (compared to v4.19.244) No metric fixes found.
## Test result summary total: 85385, pass: 70036, fail: 859, skip: 12899, xfail: 1591
## Build Summary * arm: 275 total, 275 passed, 0 failed * arm64: 39 total, 39 passed, 0 failed * i386: 18 total, 18 passed, 0 failed * mips: 27 total, 27 passed, 0 failed * powerpc: 55 total, 54 passed, 1 failed * s390: 12 total, 12 passed, 0 failed * sparc: 12 total, 12 passed, 0 failed * x86_64: 38 total, 38 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kvm-unit-tests * libhugetlbfs * linux-log-parser * log-parser-boot * log-parser-test * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * rcutorture * ssuite * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On Mon, May 23, 2022 at 07:04:44PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.245 release. There are 44 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 25 May 2022 16:56:55 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 12.1.0): 63 configs -> no failure arm (gcc version 12.1.0): 116 configs -> no new failure arm64 (gcc version 12.1.0): 2 configs -> no failure x86_64 (gcc version 12.1.0): 4 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
[1]. https://openqa.qa.codethink.co.uk/tests/1194
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
Hi!
This is the start of the stable review cycle for the 4.19.245 release. There are 44 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
On Mon, May 23, 2022 at 07:04:44PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.245 release. There are 44 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 25 May 2022 16:56:55 +0000. Anything received after that time might be too late.
Build results: total: 156 pass: 156 fail: 0 Qemu test results: total: 425 pass: 425 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
linux-stable-mirror@lists.linaro.org