This is the start of the stable review cycle for the 5.4.249 release. There are 60 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.249-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.249-rc1
Darrick J. Wong djwong@kernel.org xfs: verify buffer contents when we skip log replay
Linus Torvalds torvalds@linux-foundation.org mm: make wait_on_page_writeback() wait for multiple pending writebacks
Hugh Dickins hughd@google.com mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
Clark Wang xiaoning.wang@nxp.com i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
Dheeraj Kumar Srivastava dheerajkumar.srivastava@amd.com x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys
Min Li lm0963hack@gmail.com drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl
Min Li lm0963hack@gmail.com drm/exynos: fix race condition UAF in exynos_g2d_exec_ioctl
Inki Dae inki.dae@samsung.com drm/exynos: vidi: fix a wrong error return
Linus Walleij linus.walleij@linaro.org ARM: dts: Fix erroneous ADS touchscreen polarities
Edson Juliano Drosdeck edson.drosdeck@gmail.com ASoC: nau8824: Add quirk to active-high jack-detect
Vineeth Vijayan vneethv@linux.ibm.com s390/cio: unregister device when the only path is gone
Dan Carpenter dan.carpenter@linaro.org usb: gadget: udc: fix NULL dereference in remove()
Osama Muhammad osmtendev@gmail.com nfcsim.c: Fix error checking for debugfs_create_dir
Hans Verkuil hverkuil-cisco@xs4all.nl media: cec: core: don't set last_initiator if tx in progress
Marc Zyngier maz@kernel.org arm64: Add missing Set/Way CMO encodings
Denis Arefev arefev@swemel.ru HID: wacom: Add error check to wacom_parse_and_register()
Maurizio Lombardi mlombard@redhat.com scsi: target: iscsi: Prevent login threads from racing between each other
Eric Dumazet edumazet@google.com sch_netem: acquire qdisc lock in netem_change()
Francesco Dolcini francesco.dolcini@toradex.com Revert "net: phy: dp83867: perform soft reset and retain established link"
Pablo Neira Ayuso pablo@netfilter.org netfilter: nfnetlink_osf: fix module autoload
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: disallow element updates of bound anonymous sets
Ross Lagerwall ross.lagerwall@citrix.com be2net: Extend xmit workaround to BE3 chip
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch
Terin Stock terin@cloudflare.com ipvs: align inner_mac_header for encapsulation
Sergey Shtylyov s.shtylyov@omp.ru mmc: usdhi60rol0: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: sh_mmcif: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: sdhci-acpi: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: omap_hsmmc: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: omap: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: mvsdio: fix deferred probing
Yangtao Li tiny.windzz@gmail.com mmc: mvsdio: convert to devm_platform_ioremap_resource
Sergey Shtylyov s.shtylyov@omp.ru mmc: mtk-sd: fix deferred probing
Stefan Wahren stefan.wahren@i2se.com net: qca_spi: Avoid high load if QCA7000 is not available
Sebastian Andrzej Siewior bigeasy@linutronix.de xfrm: Linearize the skb after offloading if needed.
Chen Aotian chenaotian2@163.com ieee802154: hwsim: Fix possible memory leaks
Paul E. McKenney paulmck@kernel.org rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
Lee Jones lee@kernel.org x86/mm: Avoid using set_pgd() outside of real PGD pages
Paulo Alcantara (SUSE) pc@cjr.nz cifs: Fix potential deadlock when updating vol in cifs_reconnect()
Paulo Alcantara (SUSE) pc@cjr.nz cifs: Merge is_path_valid() into get_normalized_path()
Paulo Alcantara (SUSE) pc@cjr.nz cifs: Introduce helpers for finding TCP connection
Paulo Alcantara (SUSE) pc@cjr.nz cifs: Get rid of kstrdup_const()'d paths
Paulo Alcantara (SUSE) pc@cjr.nz cifs: Clean up DFS referral cache
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
Rafael Aquini aquini@redhat.com writeback: fix dereferencing NULL mapping->host on writeback_page_template
Matthias May matthias.may@westermo.com ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN
Martin Hundebøll martin@geanix.com mmc: meson-gx: remove redundant mmc_request_done() call from irq context
Xiu Jianfeng xiujianfeng@huawei.com cgroup: Do not corrupt task iteration when rebinding subsystem
Dexuan Cui decui@microsoft.com PCI: hv: Fix a race condition bug in hv_pci_query_relations()
Michael Kelley mikelley@microsoft.com Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix buffer corruption due to concurrent device reads
Hyunwoo Kim imv4bel@gmail.com media: dvb-core: Fix use-after-free due to race at dvb_register_device()
Mauro Carvalho Chehab mchehab+huawei@kernel.org media: dvbdev: fix error logic at dvb_register_device()
Dinghao Liu dinghao.liu@zju.edu.cn media: dvbdev: Fix memleak in dvb_register_device
Thomas Gleixner tglx@linutronix.de tick/common: Align tick period during sched_timer setup
Ricardo Ribalda ribalda@chromium.org x86/purgatory: remove PGO flags
Steven Rostedt (Google) rostedt@goodmis.org tracing: Add tracing_reset_all_online_cpus_unlocked() function
Benjamin Segall bsegall@google.com epoll: ep_autoremove_wake_function should use list_del_init_careful
Linus Torvalds torvalds@linux-foundation.org list: add "list_del_init_careful()" to go with "list_empty_careful()"
Linus Torvalds torvalds@linux-foundation.org mm: rewrite wait_on_page_bit_common() logic
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: reject devices with insufficient block count
-------------
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/am57xx-cl-som-am57x.dts | 2 +- arch/arm/boot/dts/at91sam9261ek.dts | 2 +- arch/arm/boot/dts/imx7d-pico-hobbit.dts | 2 +- arch/arm/boot/dts/imx7d-sdb.dts | 2 +- arch/arm/boot/dts/omap3-cm-t3x.dtsi | 2 +- arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi | 2 +- arch/arm/boot/dts/omap3-lilly-a83x.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi | 2 +- arch/arm/boot/dts/omap3-pandora-common.dtsi | 2 +- arch/arm/boot/dts/omap5-cm-t54.dts | 2 +- arch/arm64/include/asm/sysreg.h | 6 + arch/x86/kernel/apic/x2apic_phys.c | 5 +- arch/x86/mm/kaslr.c | 8 +- arch/x86/purgatory/Makefile | 5 + drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +- drivers/gpu/drm/exynos/exynos_drm_vidi.c | 2 - drivers/gpu/drm/radeon/radeon_gem.c | 4 +- drivers/hid/wacom_sys.c | 7 +- drivers/hv/channel_mgmt.c | 18 +- drivers/i2c/busses/i2c-imx-lpi2c.c | 4 +- drivers/media/cec/cec-adap.c | 3 +- drivers/media/dvb-core/dvbdev.c | 88 ++- drivers/mmc/host/meson-gx-mmc.c | 10 +- drivers/mmc/host/mtk-sd.c | 2 +- drivers/mmc/host/mvsdio.c | 8 +- drivers/mmc/host/omap.c | 2 +- drivers/mmc/host/omap_hsmmc.c | 6 +- drivers/mmc/host/sdhci-acpi.c | 2 +- drivers/mmc/host/sh_mmcif.c | 2 +- drivers/mmc/host/usdhi6rol0.c | 6 +- drivers/net/dsa/mt7530.c | 2 +- drivers/net/ethernet/emulex/benet/be_main.c | 4 +- drivers/net/ethernet/qualcomm/qca_spi.c | 3 +- drivers/net/ieee802154/mac802154_hwsim.c | 6 +- drivers/net/phy/dp83867.c | 2 +- drivers/nfc/nfcsim.c | 4 - drivers/pci/controller/pci-hyperv.c | 18 + drivers/s390/cio/device.c | 5 +- drivers/target/iscsi/iscsi_target_nego.c | 4 +- drivers/usb/gadget/udc/amd5536udc_pci.c | 3 + fs/cifs/dfs_cache.c | 707 +++++++++++---------- fs/eventpoll.c | 6 +- fs/nilfs2/page.c | 10 +- fs/nilfs2/segbuf.c | 6 + fs/nilfs2/segment.c | 7 + fs/nilfs2/super.c | 25 +- fs/nilfs2/the_nilfs.c | 44 +- fs/xfs/xfs_log_recover.c | 10 + include/linux/list.h | 20 +- include/linux/rcupdate.h | 18 + include/media/dvbdev.h | 15 + include/net/ip_tunnels.h | 12 +- include/trace/events/writeback.h | 2 +- kernel/cgroup/cgroup.c | 20 +- kernel/sched/wait.c | 2 +- kernel/time/tick-common.c | 13 +- kernel/time/tick-sched.c | 13 +- kernel/trace/trace.c | 11 +- kernel/trace/trace.h | 1 + kernel/trace/trace_events.c | 2 +- mm/filemap.c | 137 ++-- mm/page-writeback.c | 8 +- net/ipv4/esp4_offload.c | 3 + net/ipv6/esp6_offload.c | 3 + net/netfilter/ipvs/ip_vs_xmit.c | 2 + net/netfilter/nf_tables_api.c | 7 +- net/netfilter/nfnetlink_osf.c | 1 + net/netfilter/xt_osf.c | 1 - net/sched/sch_netem.c | 8 +- sound/soc/codecs/nau8824.c | 24 + 72 files changed, 898 insertions(+), 507 deletions(-)
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 92c5d1b860e9581d64baca76779576c0ab0d943d upstream.
The current sanity check for nilfs2 geometry information lacks checks for the number of segments stored in superblocks, so even for device images that have been destructively truncated or have an unusually high number of segments, the mount operation may succeed.
This causes out-of-bounds block I/O on file system block reads or log writes to the segments, the latter in particular causing "a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to hang.
Fix this issue by checking the number of segments stored in the superblock and avoiding mounting devices that can cause out-of-bounds accesses. To eliminate the possibility of overflow when calculating the number of blocks required for the device from the number of segments, this also adds a helper function to calculate the upper bound on the number of segments and inserts a check using it.
Link: https://lkml.kernel.org/r/20230526021332.3431-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+7d50f1e54a12ba3aeae2@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2 Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/the_nilfs.c | 44 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -375,6 +375,18 @@ unsigned long nilfs_nrsvsegs(struct the_ 100)); }
+/** + * nilfs_max_segment_count - calculate the maximum number of segments + * @nilfs: nilfs object + */ +static u64 nilfs_max_segment_count(struct the_nilfs *nilfs) +{ + u64 max_count = U64_MAX; + + do_div(max_count, nilfs->ns_blocks_per_segment); + return min_t(u64, max_count, ULONG_MAX); +} + void nilfs_set_nsegments(struct the_nilfs *nilfs, unsigned long nsegs) { nilfs->ns_nsegments = nsegs; @@ -384,6 +396,8 @@ void nilfs_set_nsegments(struct the_nilf static int nilfs_store_disk_layout(struct the_nilfs *nilfs, struct nilfs_super_block *sbp) { + u64 nsegments, nblocks; + if (le32_to_cpu(sbp->s_rev_level) < NILFS_MIN_SUPP_REV) { nilfs_msg(nilfs->ns_sb, KERN_ERR, "unsupported revision (superblock rev.=%d.%d, current rev.=%d.%d). Please check the version of mkfs.nilfs(2).", @@ -430,7 +444,35 @@ static int nilfs_store_disk_layout(struc return -EINVAL; }
- nilfs_set_nsegments(nilfs, le64_to_cpu(sbp->s_nsegments)); + nsegments = le64_to_cpu(sbp->s_nsegments); + if (nsegments > nilfs_max_segment_count(nilfs)) { + nilfs_msg(nilfs->ns_sb, KERN_ERR, + "segment count %llu exceeds upper limit (%llu segments)", + (unsigned long long)nsegments, + (unsigned long long)nilfs_max_segment_count(nilfs)); + return -EINVAL; + } + + nblocks = (u64)i_size_read(nilfs->ns_sb->s_bdev->bd_inode) >> + nilfs->ns_sb->s_blocksize_bits; + if (nblocks) { + u64 min_block_count = nsegments * nilfs->ns_blocks_per_segment; + /* + * To avoid failing to mount early device images without a + * second superblock, exclude that block count from the + * "min_block_count" calculation. + */ + + if (nblocks < min_block_count) { + nilfs_msg(nilfs->ns_sb, KERN_ERR, + "total number of segment blocks %llu exceeds device size (%llu blocks)", + (unsigned long long)min_block_count, + (unsigned long long)nblocks); + return -EINVAL; + } + } + + nilfs_set_nsegments(nilfs, nsegments); nilfs->ns_crc_seed = le32_to_cpu(sbp->s_crc_seed); return 0; }
From: Linus Torvalds torvalds@linux-foundation.org
[ Upstream commit 2a9127fcf2296674d58024f83981f40b128fffea ]
It turns out that wait_on_page_bit_common() had several problems, ranging from just unfair behavioe due to re-queueing at the end of the wait queue when re-trying, and an outright bug that could result in missed wakeups (but probably never happened in practice).
This rewrites the whole logic to avoid both issues, by simply moving the logic to check (and possibly take) the bit lock into the wakeup path instead.
That makes everything much more straightforward, and means that we never need to re-queue the wait entry: if we get woken up, we'll be notified through WQ_FLAG_WOKEN, and the wait queue entry will have been removed, and everything will have been done for us.
Link: https://lore.kernel.org/lkml/CAHk-=wjJA2Z3kUFb-5s=6+n0qbTs8ELqKFt9B3pH85a8fG... Link: https://lore.kernel.org/lkml/alpine.LSU.2.11.2007221359450.1017@eggly.anvils... Reported-by: Oleg Nesterov oleg@redhat.com Reported-by: Hugh Dickins hughd@google.com Cc: Michal Hocko mhocko@suse.com Reviewed-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 2192bba03d80 ("epoll: ep_autoremove_wake_function should use list_del_init_careful") Signed-off-by: Sasha Levin sashal@kernel.org --- mm/filemap.c | 132 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 85 insertions(+), 47 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c index c094103191a6e..83b324420046b 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1046,6 +1046,7 @@ struct wait_page_queue {
static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, void *arg) { + int ret; struct wait_page_key *key = arg; struct wait_page_queue *wait_page = container_of(wait, struct wait_page_queue, wait); @@ -1058,17 +1059,40 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, return 0;
/* - * Stop walking if it's locked. - * Is this safe if put_and_wait_on_page_locked() is in use? - * Yes: the waker must hold a reference to this page, and if PG_locked - * has now already been set by another task, that task must also hold - * a reference to the *same usage* of this page; so there is no need - * to walk on to wake even the put_and_wait_on_page_locked() callers. + * If it's an exclusive wait, we get the bit for it, and + * stop walking if we can't. + * + * If it's a non-exclusive wait, then the fact that this + * wake function was called means that the bit already + * was cleared, and we don't care if somebody then + * re-took it. */ - if (test_bit(key->bit_nr, &key->page->flags)) - return -1; + ret = 0; + if (wait->flags & WQ_FLAG_EXCLUSIVE) { + if (test_and_set_bit(key->bit_nr, &key->page->flags)) + return -1; + ret = 1; + } + wait->flags |= WQ_FLAG_WOKEN;
- return autoremove_wake_function(wait, mode, sync, key); + wake_up_state(wait->private, mode); + + /* + * Ok, we have successfully done what we're waiting for, + * and we can unconditionally remove the wait entry. + * + * Note that this has to be the absolute last thing we do, + * since after list_del_init(&wait->entry) the wait entry + * might be de-allocated and the process might even have + * exited. + * + * We _really_ should have a "list_del_init_careful()" to + * properly pair with the unlocked "list_empty_careful()" + * in finish_wait(). + */ + smp_mb(); + list_del_init(&wait->entry); + return ret; }
static void wake_up_page_bit(struct page *page, int bit_nr) @@ -1147,16 +1171,31 @@ enum behavior { */ };
+/* + * Attempt to check (or get) the page bit, and mark the + * waiter woken if successful. + */ +static inline bool trylock_page_bit_common(struct page *page, int bit_nr, + struct wait_queue_entry *wait) +{ + if (wait->flags & WQ_FLAG_EXCLUSIVE) { + if (test_and_set_bit(bit_nr, &page->flags)) + return false; + } else if (test_bit(bit_nr, &page->flags)) + return false; + + wait->flags |= WQ_FLAG_WOKEN; + return true; +} + static inline int wait_on_page_bit_common(wait_queue_head_t *q, struct page *page, int bit_nr, int state, enum behavior behavior) { struct wait_page_queue wait_page; wait_queue_entry_t *wait = &wait_page.wait; - bool bit_is_set; bool thrashing = false; bool delayacct = false; unsigned long pflags; - int ret = 0;
if (bit_nr == PG_locked && !PageUptodate(page) && PageWorkingset(page)) { @@ -1174,48 +1213,47 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q, wait_page.page = page; wait_page.bit_nr = bit_nr;
- for (;;) { - spin_lock_irq(&q->lock); + /* + * Do one last check whether we can get the + * page bit synchronously. + * + * Do the SetPageWaiters() marking before that + * to let any waker we _just_ missed know they + * need to wake us up (otherwise they'll never + * even go to the slow case that looks at the + * page queue), and add ourselves to the wait + * queue if we need to sleep. + * + * This part needs to be done under the queue + * lock to avoid races. + */ + spin_lock_irq(&q->lock); + SetPageWaiters(page); + if (!trylock_page_bit_common(page, bit_nr, wait)) + __add_wait_queue_entry_tail(q, wait); + spin_unlock_irq(&q->lock);
- if (likely(list_empty(&wait->entry))) { - __add_wait_queue_entry_tail(q, wait); - SetPageWaiters(page); - } + /* + * From now on, all the logic will be based on + * the WQ_FLAG_WOKEN flag, and the and the page + * bit testing (and setting) will be - or has + * already been - done by the wake function. + * + * We can drop our reference to the page. + */ + if (behavior == DROP) + put_page(page);
+ for (;;) { set_current_state(state);
- spin_unlock_irq(&q->lock); - - bit_is_set = test_bit(bit_nr, &page->flags); - if (behavior == DROP) - put_page(page); - - if (likely(bit_is_set)) - io_schedule(); - - if (behavior == EXCLUSIVE) { - if (!test_and_set_bit_lock(bit_nr, &page->flags)) - break; - } else if (behavior == SHARED) { - if (!test_bit(bit_nr, &page->flags)) - break; - } - - if (signal_pending_state(state, current)) { - ret = -EINTR; + if (signal_pending_state(state, current)) break; - }
- if (behavior == DROP) { - /* - * We can no longer safely access page->flags: - * even if CONFIG_MEMORY_HOTREMOVE is not enabled, - * there is a risk of waiting forever on a page reused - * for something that keeps it locked indefinitely. - * But best check for -EINTR above before breaking. - */ + if (wait->flags & WQ_FLAG_WOKEN) break; - } + + io_schedule(); }
finish_wait(q, wait); @@ -1234,7 +1272,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q, * bother with signals either. */
- return ret; + return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR; }
void wait_on_page_bit(struct page *page, int bit_nr)
From: Linus Torvalds torvalds@linux-foundation.org
[ Upstream commit c6fe44d96fc1536af5b11cd859686453d1b7bfd1 ]
That gives us ordering guarantees around the pair.
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Stable-dep-of: 2192bba03d80 ("epoll: ep_autoremove_wake_function should use list_del_init_careful") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/list.h | 20 +++++++++++++++++++- kernel/sched/wait.c | 2 +- mm/filemap.c | 7 +------ 3 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/include/linux/list.h b/include/linux/list.h index ce19c6b632a59..231ff089f7d1c 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -268,6 +268,24 @@ static inline int list_empty(const struct list_head *head) return READ_ONCE(head->next) == head; }
+/** + * list_del_init_careful - deletes entry from list and reinitialize it. + * @entry: the element to delete from the list. + * + * This is the same as list_del_init(), except designed to be used + * together with list_empty_careful() in a way to guarantee ordering + * of other memory operations. + * + * Any memory operations done before a list_del_init_careful() are + * guaranteed to be visible after a list_empty_careful() test. + */ +static inline void list_del_init_careful(struct list_head *entry) +{ + __list_del_entry(entry); + entry->prev = entry; + smp_store_release(&entry->next, entry); +} + /** * list_empty_careful - tests whether a list is empty and not being modified * @head: the list to test @@ -283,7 +301,7 @@ static inline int list_empty(const struct list_head *head) */ static inline int list_empty_careful(const struct list_head *head) { - struct list_head *next = head->next; + struct list_head *next = smp_load_acquire(&head->next); return (next == head) && (next == head->prev); }
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index 7d668b31dbc6d..c76fe1d4d91e2 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -384,7 +384,7 @@ int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, i int ret = default_wake_function(wq_entry, mode, sync, key);
if (ret) - list_del_init(&wq_entry->entry); + list_del_init_careful(&wq_entry->entry);
return ret; } diff --git a/mm/filemap.c b/mm/filemap.c index 83b324420046b..a106d63e84679 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1085,13 +1085,8 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, * since after list_del_init(&wait->entry) the wait entry * might be de-allocated and the process might even have * exited. - * - * We _really_ should have a "list_del_init_careful()" to - * properly pair with the unlocked "list_empty_careful()" - * in finish_wait(). */ - smp_mb(); - list_del_init(&wait->entry); + list_del_init_careful(&wait->entry); return ret; }
From: Benjamin Segall bsegall@google.com
[ Upstream commit 2192bba03d80f829233bfa34506b428f71e531e7 ]
autoremove_wake_function uses list_del_init_careful, so should epoll's more aggressive variant. It only doesn't because it was copied from an older wait.c rather than the most recent.
[bsegall@google.com: add comment] Link: https://lkml.kernel.org/r/xm26bki0ulsr.fsf_-_@google.com Link: https://lkml.kernel.org/r/xm26pm6hvfer.fsf@google.com Fixes: a16ceb139610 ("epoll: autoremove wakers even more aggressively") Signed-off-by: Ben Segall bsegall@google.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/eventpoll.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 877f9f61a4e8d..8c0e94183186f 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1814,7 +1814,11 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry, { int ret = default_wake_function(wq_entry, mode, sync, key);
- list_del_init(&wq_entry->entry); + /* + * Pairs with list_empty_careful in ep_poll, and ensures future loop + * iterations see the cause of this wakeup. + */ + list_del_init_careful(&wq_entry->entry); return ret; }
From: Steven Rostedt (Google) rostedt@goodmis.org
commit e18eb8783ec4949adebc7d7b0fdb65f65bfeefd9 upstream.
Currently the tracing_reset_all_online_cpus() requires the trace_types_lock held. But only one caller of this function actually has that lock held before calling it, and the other just takes the lock so that it can call it. More users of this function is needed where the lock is not held.
Add a tracing_reset_all_online_cpus_unlocked() function for the one use case that calls it without being held, and also add a lockdep_assert to make sure it is held when called.
Then have tracing_reset_all_online_cpus() take the lock internally, such that callers do not need to worry about taking it.
Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 11 ++++++++++- kernel/trace/trace.h | 1 + kernel/trace/trace_events.c | 2 +- 3 files changed, 12 insertions(+), 2 deletions(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1931,10 +1931,12 @@ void tracing_reset_online_cpus(struct tr }
/* Must have trace_types_lock held */ -void tracing_reset_all_online_cpus(void) +void tracing_reset_all_online_cpus_unlocked(void) { struct trace_array *tr;
+ lockdep_assert_held(&trace_types_lock); + list_for_each_entry(tr, &ftrace_trace_arrays, list) { if (!tr->clear_trace) continue; @@ -1946,6 +1948,13 @@ void tracing_reset_all_online_cpus(void) } }
+void tracing_reset_all_online_cpus(void) +{ + mutex_lock(&trace_types_lock); + tracing_reset_all_online_cpus_unlocked(); + mutex_unlock(&trace_types_lock); +} + /* * The tgid_map array maps from pid to tgid; i.e. the value stored at index i * is the tgid last observed corresponding to pid=i. --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -677,6 +677,7 @@ int tracing_is_enabled(void); void tracing_reset_online_cpus(struct trace_buffer *buf); void tracing_reset_current(int cpu); void tracing_reset_all_online_cpus(void); +void tracing_reset_all_online_cpus_unlocked(void); int tracing_open_generic(struct inode *inode, struct file *filp); int tracing_open_generic_tr(struct inode *inode, struct file *filp); bool tracing_is_disabled(void); --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -2440,7 +2440,7 @@ static void trace_module_remove_events(s * over from this module may be passed to the new module events and * unexpected results may occur. */ - tracing_reset_all_online_cpus(); + tracing_reset_all_online_cpus_unlocked(); }
static int trace_module_notify(struct notifier_block *self,
From: Ricardo Ribalda ribalda@chromium.org
commit 97b6b9cbba40a21c1d9a344d5c1991f8cfbf136e upstream.
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system.
Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-2-b05c520b7296@chromium.... Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda ribalda@chromium.org Cc: stable@vger.kernel.org Cc: Albert Ou aou@eecs.berkeley.edu Cc: Baoquan He bhe@redhat.com Cc: Borislav Petkov (AMD) bp@alien8.de Cc: Christophe Leroy christophe.leroy@csgroup.eu Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Dave Young dyoung@redhat.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Michael Ellerman mpe@ellerman.id.au Cc: Nathan Chancellor nathan@kernel.org Cc: Nicholas Piggin npiggin@gmail.com Cc: Nick Desaulniers ndesaulniers@google.com Cc: Palmer Dabbelt palmer@dabbelt.com Cc: Palmer Dabbelt palmer@rivosinc.com Cc: Paul Walmsley paul.walmsley@sifive.com Cc: Philipp Rudo prudo@redhat.com Cc: Ross Zwisler zwisler@google.com Cc: Simon Horman horms@kernel.org Cc: Steven Rostedt (Google) rostedt@goodmis.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Tom Rix trix@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Ricardo Ribalda Delgado ribalda@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/purgatory/Makefile | 5 +++++ 1 file changed, 5 insertions(+)
--- a/arch/x86/purgatory/Makefile +++ b/arch/x86/purgatory/Makefile @@ -14,6 +14,11 @@ $(obj)/sha256.o: $(srctree)/lib/crypto/s
CFLAGS_sha256.o := -D__DISABLE_EXPORTS
+# When profile-guided optimization is enabled, llvm emits two different +# overlapping text sections, which is not supported by kexec. Remove profile +# optimization flags. +KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS)) + LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib targets += purgatory.ro
From: Thomas Gleixner tglx@linutronix.de
commit 13bb06f8dd42071cb9a49f6e21099eea05d4b856 upstream.
The tick period is aligned very early while the first clock_event_device is registered. At that point the system runs in periodic mode and switches later to one-shot mode if possible.
The next wake-up event is programmed based on the aligned value (tick_next_period) but the delta value, that is used to program the clock_event_device, is computed based on ktime_get().
With the subtracted offset, the device fires earlier than the exact time frame. With a large enough offset the system programs the timer for the next wake-up and the remaining time left is too small to make any boot progress. The system hangs.
Move the alignment later to the setup of tick_sched timer. At this point the system switches to oneshot mode and a high resolution clocksource is available. At this point it is safe to align tick_next_period because ktime_get() will now return accurate (not jiffies based) time.
[bigeasy: Patch description + testing].
Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.") Reported-by: Mathias Krause minipli@grsecurity.net Reported-by: "Bhatnagar, Rishabh" risbhat@amazon.com Suggested-by: Mathias Krause minipli@grsecurity.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Richard W.M. Jones rjones@redhat.com Tested-by: Mathias Krause minipli@grsecurity.net Acked-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com Link: https://lore.kernel.org/r/20230615091830.RxMV2xf_@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/tick-common.c | 13 +------------ kernel/time/tick-sched.c | 13 ++++++++++++- 2 files changed, 13 insertions(+), 13 deletions(-)
--- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -216,19 +216,8 @@ static void tick_setup_device(struct tic * this cpu: */ if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) { - ktime_t next_p; - u32 rem; - tick_do_timer_cpu = cpu; - - next_p = ktime_get(); - div_u64_rem(next_p, TICK_NSEC, &rem); - if (rem) { - next_p -= rem; - next_p += TICK_NSEC; - } - - tick_next_period = next_p; + tick_next_period = ktime_get(); #ifdef CONFIG_NO_HZ_FULL /* * The boot CPU may be nohz_full, in which case set --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -129,8 +129,19 @@ static ktime_t tick_init_jiffy_update(vo raw_spin_lock(&jiffies_lock); write_seqcount_begin(&jiffies_seq); /* Did we start the jiffies update yet ? */ - if (last_jiffies_update == 0) + if (last_jiffies_update == 0) { + u32 rem; + + /* + * Ensure that the tick is aligned to a multiple of + * TICK_NSEC. + */ + div_u64_rem(tick_next_period, TICK_NSEC, &rem); + if (rem) + tick_next_period += TICK_NSEC - rem; + last_jiffies_update = tick_next_period; + } period = last_jiffies_update; write_seqcount_end(&jiffies_seq); raw_spin_unlock(&jiffies_lock);
From: Dinghao Liu dinghao.liu@zju.edu.cn
commit 167faadfcf9339088910e9e85a1b711fcbbef8e9 upstream.
When device_create() fails, dvbdev and dvbdevfops should be freed just like when dvb_register_media_device() fails.
Signed-off-by: Dinghao Liu dinghao.liu@zju.edu.cn Signed-off-by: Sean Young sean@mess.org Signed-off-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/dvb-core/dvbdev.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/media/dvb-core/dvbdev.c +++ b/drivers/media/dvb-core/dvbdev.c @@ -545,6 +545,9 @@ int dvb_register_device(struct dvb_adapt if (IS_ERR(clsdev)) { pr_err("%s: failed to create device dvb%d.%s%d (%ld)\n", __func__, adap->num, dnames[type], id, PTR_ERR(clsdev)); + dvb_media_device_free(dvbdev); + kfree(dvbdevfops); + kfree(dvbdev); return PTR_ERR(clsdev); } dprintk("DVB: register adapter%d/%s%d @ minor: %i (0x%02x)\n",
From: Mauro Carvalho Chehab mchehab+huawei@kernel.org
commit 1fec2ecc252301110e4149e6183fa70460d29674 upstream.
As reported by smatch:
drivers/media/dvb-core/dvbdev.c: drivers/media/dvb-core/dvbdev.c:510 dvb_register_device() warn: '&dvbdev->list_head' not removed from list drivers/media/dvb-core/dvbdev.c: drivers/media/dvb-core/dvbdev.c:530 dvb_register_device() warn: '&dvbdev->list_head' not removed from list drivers/media/dvb-core/dvbdev.c: drivers/media/dvb-core/dvbdev.c:545 dvb_register_device() warn: '&dvbdev->list_head' not removed from list
The error logic inside dvb_register_device() doesn't remove devices from the dvb_adapter_list in case of errors.
Signed-off-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/dvb-core/dvbdev.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/media/dvb-core/dvbdev.c +++ b/drivers/media/dvb-core/dvbdev.c @@ -511,6 +511,7 @@ int dvb_register_device(struct dvb_adapt break;
if (minor == MAX_DVB_MINORS) { + list_del (&dvbdev->list_head); kfree(dvbdevfops); kfree(dvbdev); up_write(&minor_rwsem); @@ -531,6 +532,7 @@ int dvb_register_device(struct dvb_adapt __func__);
dvb_media_device_free(dvbdev); + list_del (&dvbdev->list_head); kfree(dvbdevfops); kfree(dvbdev); mutex_unlock(&dvbdev_register_lock); @@ -546,6 +548,7 @@ int dvb_register_device(struct dvb_adapt pr_err("%s: failed to create device dvb%d.%s%d (%ld)\n", __func__, adap->num, dnames[type], id, PTR_ERR(clsdev)); dvb_media_device_free(dvbdev); + list_del (&dvbdev->list_head); kfree(dvbdevfops); kfree(dvbdev); return PTR_ERR(clsdev);
From: Hyunwoo Kim imv4bel@gmail.com
commit 627bb528b086b4136315c25d6a447a98ea9448d3 upstream.
dvb_register_device() dynamically allocates fops with kmemdup() to set the fops->owner. And these fops are registered in 'file->f_ops' using replace_fops() in the dvb_device_open() process, and kfree()d in dvb_free_device().
However, it is not common to use dynamically allocated fops instead of 'static const' fops as an argument of replace_fops(), and UAF may occur. These UAFs can occur on any dvb type using dvb_register_device(), such as dvb_dvr, dvb_demux, dvb_frontend, dvb_net, etc.
So, instead of kfree() the fops dynamically allocated in dvb_register_device() in dvb_free_device() called during the .disconnect() process, kfree() it collectively in exit_dvbdev() called when the dvbdev.c module is removed.
Link: https://lore.kernel.org/linux-media/20221117045925.14297-4-imv4bel@gmail.com Signed-off-by: Hyunwoo Kim imv4bel@gmail.com Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter error27@gmail.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/dvb-core/dvbdev.c | 84 ++++++++++++++++++++++++++++++---------- include/media/dvbdev.h | 15 +++++++ 2 files changed, 78 insertions(+), 21 deletions(-)
--- a/drivers/media/dvb-core/dvbdev.c +++ b/drivers/media/dvb-core/dvbdev.c @@ -37,6 +37,7 @@ #include <media/tuner.h>
static DEFINE_MUTEX(dvbdev_mutex); +static LIST_HEAD(dvbdevfops_list); static int dvbdev_debug;
module_param(dvbdev_debug, int, 0644); @@ -462,14 +463,15 @@ int dvb_register_device(struct dvb_adapt enum dvb_device_type type, int demux_sink_pads) { struct dvb_device *dvbdev; - struct file_operations *dvbdevfops; + struct file_operations *dvbdevfops = NULL; + struct dvbdevfops_node *node = NULL, *new_node = NULL; struct device *clsdev; int minor; int id, ret;
mutex_lock(&dvbdev_register_lock);
- if ((id = dvbdev_get_free_id (adap, type)) < 0){ + if ((id = dvbdev_get_free_id (adap, type)) < 0) { mutex_unlock(&dvbdev_register_lock); *pdvbdev = NULL; pr_err("%s: couldn't find free device id\n", __func__); @@ -477,18 +479,45 @@ int dvb_register_device(struct dvb_adapt }
*pdvbdev = dvbdev = kzalloc(sizeof(*dvbdev), GFP_KERNEL); - if (!dvbdev){ mutex_unlock(&dvbdev_register_lock); return -ENOMEM; }
- dvbdevfops = kmemdup(template->fops, sizeof(*dvbdevfops), GFP_KERNEL); + /* + * When a device of the same type is probe()d more than once, + * the first allocated fops are used. This prevents memory leaks + * that can occur when the same device is probe()d repeatedly. + */ + list_for_each_entry(node, &dvbdevfops_list, list_head) { + if (node->fops->owner == adap->module && + node->type == type && + node->template == template) { + dvbdevfops = node->fops; + break; + } + }
- if (!dvbdevfops){ - kfree (dvbdev); - mutex_unlock(&dvbdev_register_lock); - return -ENOMEM; + if (dvbdevfops == NULL) { + dvbdevfops = kmemdup(template->fops, sizeof(*dvbdevfops), GFP_KERNEL); + if (!dvbdevfops) { + kfree(dvbdev); + mutex_unlock(&dvbdev_register_lock); + return -ENOMEM; + } + + new_node = kzalloc(sizeof(struct dvbdevfops_node), GFP_KERNEL); + if (!new_node) { + kfree(dvbdevfops); + kfree(dvbdev); + mutex_unlock(&dvbdev_register_lock); + return -ENOMEM; + } + + new_node->fops = dvbdevfops; + new_node->type = type; + new_node->template = template; + list_add_tail (&new_node->list_head, &dvbdevfops_list); }
memcpy(dvbdev, template, sizeof(struct dvb_device)); @@ -499,20 +528,20 @@ int dvb_register_device(struct dvb_adapt dvbdev->priv = priv; dvbdev->fops = dvbdevfops; init_waitqueue_head (&dvbdev->wait_queue); - dvbdevfops->owner = adap->module; - list_add_tail (&dvbdev->list_head, &adap->device_list); - down_write(&minor_rwsem); #ifdef CONFIG_DVB_DYNAMIC_MINORS for (minor = 0; minor < MAX_DVB_MINORS; minor++) if (dvb_minors[minor] == NULL) break; - if (minor == MAX_DVB_MINORS) { + if (new_node) { + list_del (&new_node->list_head); + kfree(dvbdevfops); + kfree(new_node); + } list_del (&dvbdev->list_head); - kfree(dvbdevfops); kfree(dvbdev); up_write(&minor_rwsem); mutex_unlock(&dvbdev_register_lock); @@ -521,41 +550,47 @@ int dvb_register_device(struct dvb_adapt #else minor = nums2minor(adap->num, type, id); #endif - dvbdev->minor = minor; dvb_minors[minor] = dvb_device_get(dvbdev); up_write(&minor_rwsem); - ret = dvb_register_media_device(dvbdev, type, minor, demux_sink_pads); if (ret) { pr_err("%s: dvb_register_media_device failed to create the mediagraph\n", __func__); - + if (new_node) { + list_del (&new_node->list_head); + kfree(dvbdevfops); + kfree(new_node); + } dvb_media_device_free(dvbdev); list_del (&dvbdev->list_head); - kfree(dvbdevfops); kfree(dvbdev); mutex_unlock(&dvbdev_register_lock); return ret; }
- mutex_unlock(&dvbdev_register_lock); - clsdev = device_create(dvb_class, adap->device, MKDEV(DVB_MAJOR, minor), dvbdev, "dvb%d.%s%d", adap->num, dnames[type], id); if (IS_ERR(clsdev)) { pr_err("%s: failed to create device dvb%d.%s%d (%ld)\n", __func__, adap->num, dnames[type], id, PTR_ERR(clsdev)); + if (new_node) { + list_del (&new_node->list_head); + kfree(dvbdevfops); + kfree(new_node); + } dvb_media_device_free(dvbdev); list_del (&dvbdev->list_head); - kfree(dvbdevfops); kfree(dvbdev); + mutex_unlock(&dvbdev_register_lock); return PTR_ERR(clsdev); } + dprintk("DVB: register adapter%d/%s%d @ minor: %i (0x%02x)\n", adap->num, dnames[type], id, minor, minor);
+ mutex_unlock(&dvbdev_register_lock); return 0; } EXPORT_SYMBOL(dvb_register_device); @@ -584,7 +619,6 @@ static void dvb_free_device(struct kref { struct dvb_device *dvbdev = container_of(ref, struct dvb_device, ref);
- kfree (dvbdev->fops); kfree (dvbdev); }
@@ -1090,9 +1124,17 @@ error:
static void __exit exit_dvbdev(void) { + struct dvbdevfops_node *node, *next; + class_destroy(dvb_class); cdev_del(&dvb_device_cdev); unregister_chrdev_region(MKDEV(DVB_MAJOR, 0), MAX_DVB_MINORS); + + list_for_each_entry_safe(node, next, &dvbdevfops_list, list_head) { + list_del (&node->list_head); + kfree(node->fops); + kfree(node); + } }
subsys_initcall(init_dvbdev); --- a/include/media/dvbdev.h +++ b/include/media/dvbdev.h @@ -190,6 +190,21 @@ struct dvb_device { };
/** + * struct dvbdevfops_node - fops nodes registered in dvbdevfops_list + * + * @fops: Dynamically allocated fops for ->owner registration + * @type: type of dvb_device + * @template: dvb_device used for registration + * @list_head: list_head for dvbdevfops_list + */ +struct dvbdevfops_node { + struct file_operations *fops; + enum dvb_device_type type; + const struct dvb_device *template; + struct list_head list_head; +}; + +/** * dvb_device_get - Increase dvb_device reference * * @dvbdev: pointer to struct dvb_device
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 679bd7ebdd315bf457a4740b306ae99f1d0a403d upstream.
As a result of analysis of a syzbot report, it turned out that in three cases where nilfs2 allocates block device buffers directly via sb_getblk, concurrent reads to the device can corrupt the allocated buffers.
Nilfs2 uses sb_getblk for segment summary blocks, that make up a log header, and the super root block, that is the trailer, and when moving and writing the second super block after fs resize.
In any of these, since the uptodate flag is not set when storing metadata to be written in the allocated buffers, the stored metadata will be overwritten if a device read of the same block occurs concurrently before the write. This causes metadata corruption and misbehavior in the log write itself, causing warnings in nilfs_btree_assign() as reported.
Fix these issues by setting an uptodate flag on the buffer head on the first or before modifying each buffer obtained with sb_getblk, and clearing the flag on failure.
When setting the uptodate flag, the lock_buffer/unlock_buffer pair is used to perform necessary exclusive control, and the buffer is filled to ensure that uninitialized bytes are not mixed into the data read from others. As for buffers for segment summary blocks, they are filled incrementally, so if the uptodate flag was unset on their allocation, set the flag and zero fill the buffer once at that point.
Also, regarding the superblock move routine, the starting point of the memset call to zerofill the block is incorrectly specified, which can cause a buffer overflow on file systems with block sizes greater than 4KiB. In addition, if the superblock is moved within a large block, it is necessary to assume the possibility that the data in the superblock will be destroyed by zero-filling before copying. So fix these potential issues as well.
Link: https://lkml.kernel.org/r/20230609035732.20426-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+31837fe952932efc8fb9@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000030000a05e981f475@google.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/segbuf.c | 6 ++++++ fs/nilfs2/segment.c | 7 +++++++ fs/nilfs2/super.c | 23 ++++++++++++++++++++++- 3 files changed, 35 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/segbuf.c +++ b/fs/nilfs2/segbuf.c @@ -101,6 +101,12 @@ int nilfs_segbuf_extend_segsum(struct ni if (unlikely(!bh)) return -ENOMEM;
+ lock_buffer(bh); + if (!buffer_uptodate(bh)) { + memset(bh->b_data, 0, bh->b_size); + set_buffer_uptodate(bh); + } + unlock_buffer(bh); nilfs_segbuf_add_segsum_buffer(segbuf, bh); return 0; } --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -984,10 +984,13 @@ static void nilfs_segctor_fill_in_super_ unsigned int isz, srsz;
bh_sr = NILFS_LAST_SEGBUF(&sci->sc_segbufs)->sb_super_root; + + lock_buffer(bh_sr); raw_sr = (struct nilfs_super_root *)bh_sr->b_data; isz = nilfs->ns_inode_size; srsz = NILFS_SR_BYTES(isz);
+ raw_sr->sr_sum = 0; /* Ensure initialization within this update */ raw_sr->sr_bytes = cpu_to_le16(srsz); raw_sr->sr_nongc_ctime = cpu_to_le64(nilfs_doing_gc() ? @@ -1001,6 +1004,8 @@ static void nilfs_segctor_fill_in_super_ nilfs_write_inode_common(nilfs->ns_sufile, (void *)raw_sr + NILFS_SR_SUFILE_OFFSET(isz), 1); memset((void *)raw_sr + srsz, 0, nilfs->ns_blocksize - srsz); + set_buffer_uptodate(bh_sr); + unlock_buffer(bh_sr); }
static void nilfs_redirty_inodes(struct list_head *head) @@ -1778,6 +1783,7 @@ static void nilfs_abort_logs(struct list list_for_each_entry(segbuf, logs, sb_list) { list_for_each_entry(bh, &segbuf->sb_segsum_buffers, b_assoc_buffers) { + clear_buffer_uptodate(bh); if (bh->b_page != bd_page) { if (bd_page) end_page_writeback(bd_page); @@ -1789,6 +1795,7 @@ static void nilfs_abort_logs(struct list b_assoc_buffers) { clear_buffer_async_write(bh); if (bh == segbuf->sb_super_root) { + clear_buffer_uptodate(bh); if (bh->b_page != bd_page) { end_page_writeback(bd_page); bd_page = bh->b_page; --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -367,10 +367,31 @@ static int nilfs_move_2nd_super(struct s goto out; } nsbp = (void *)nsbh->b_data + offset; - memset(nsbp, 0, nilfs->ns_blocksize);
+ lock_buffer(nsbh); if (sb2i >= 0) { + /* + * The position of the second superblock only changes by 4KiB, + * which is larger than the maximum superblock data size + * (= 1KiB), so there is no need to use memmove() to allow + * overlap between source and destination. + */ memcpy(nsbp, nilfs->ns_sbp[sb2i], nilfs->ns_sbsize); + + /* + * Zero fill after copy to avoid overwriting in case of move + * within the same block. + */ + memset(nsbh->b_data, 0, offset); + memset((void *)nsbp + nilfs->ns_sbsize, 0, + nsbh->b_size - offset - nilfs->ns_sbsize); + } else { + memset(nsbh->b_data, 0, nsbh->b_size); + } + set_buffer_uptodate(nsbh); + unlock_buffer(nsbh); + + if (sb2i >= 0) { brelse(nilfs->ns_sbh[sb2i]); nilfs->ns_sbh[sb2i] = nsbh; nilfs->ns_sbp[sb2i] = nsbp;
From: Michael Kelley mikelley@microsoft.com
commit 320805ab61e5f1e2a5729ae266e16bec2904050c upstream.
vmbus_wait_for_unload() may be called in the panic path after other CPUs are stopped. vmbus_wait_for_unload() currently loops through online CPUs looking for the UNLOAD response message. But the values of CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used to stop the other CPUs, and in one of the paths the stopped CPUs are removed from cpu_online_mask. This removal happens in both x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload() only checks the panic'ing CPU, and misses the UNLOAD response message except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload() eventually times out, but only after waiting 100 seconds.
Fix this by looping through *present* CPUs in vmbus_wait_for_unload(). The cpu_present_mask is not modified by stopping the other CPUs in the panic path, nor should it be.
Also, in a CoCo VM the synic_message_page is not allocated in hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs() and hv_synic_disable_regs() such that it is set only when the CPU is online. If not all present CPUs are online when vmbus_wait_for_unload() is called, the synic_message_page might be NULL. Add a check for this.
Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios") Cc: stable@vger.kernel.org Reported-by: John Starks jostarks@microsoft.com Signed-off-by: Michael Kelley mikelley@microsoft.com Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@microso... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hv/channel_mgmt.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
--- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -802,11 +802,22 @@ static void vmbus_wait_for_unload(void) if (completion_done(&vmbus_connection.unload_event)) goto completed;
- for_each_online_cpu(cpu) { + for_each_present_cpu(cpu) { struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
+ /* + * In a CoCo VM the synic_message_page is not allocated + * in hv_synic_alloc(). Instead it is set/cleared in + * hv_synic_enable_regs() and hv_synic_disable_regs() + * such that it is set only when the CPU is online. If + * not all present CPUs are online, the message page + * might be NULL, so skip such CPUs. + */ page_addr = hv_cpu->synic_message_page; + if (!page_addr) + continue; + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
@@ -840,11 +851,14 @@ completed: * maybe-pending messages on all CPUs to be able to receive new * messages after we reconnect. */ - for_each_online_cpu(cpu) { + for_each_present_cpu(cpu) { struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
page_addr = hv_cpu->synic_message_page; + if (!page_addr) + continue; + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT; msg->header.message_type = HVMSG_NONE; }
From: Dexuan Cui decui@microsoft.com
commit 440b5e3663271b0ffbd4908115044a6a51fb938b upstream.
Since day 1 of the driver, there has been a race between hv_pci_query_relations() and survey_child_resources(): during fast device hotplug, hv_pci_query_relations() may error out due to device-remove and the stack variable 'comp' is no longer valid; however, pci_devices_present_work() -> survey_child_resources() -> complete() may be running on another CPU and accessing the no-longer-valid 'comp'. Fix the race by flushing the workqueue before we exit from hv_pci_query_relations().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Michael Kelley mikelley@microsoft.com Acked-by: Lorenzo Pieralisi lpieralisi@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-2-decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-hyperv.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -2739,6 +2739,24 @@ static int hv_pci_query_relations(struct if (!ret) ret = wait_for_response(hdev, &comp);
+ /* + * In the case of fast device addition/removal, it's possible that + * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we + * already got a PCI_BUS_RELATIONS* message from the host and the + * channel callback already scheduled a work to hbus->wq, which can be + * running pci_devices_present_work() -> survey_child_resources() -> + * complete(&hbus->survey_event), even after hv_pci_query_relations() + * exits and the stack variable 'comp' is no longer valid; as a result, + * a hang or a page fault may happen when the complete() calls + * raw_spin_lock_irqsave(). Flush hbus->wq before we exit from + * hv_pci_query_relations() to avoid the issues. Note: if 'ret' is + * -ENODEV, there can't be any more work item scheduled to hbus->wq + * after the flush_workqueue(): see vmbus_onoffer_rescind() -> + * vmbus_reset_channel_cb(), vmbus_rescind_cleanup() -> + * channel->rescind = true. + */ + flush_workqueue(hbus->wq); + return ret; }
From: Xiu Jianfeng xiujianfeng@huawei.com
commit 6f363f5aa845561f7ea496d8b1175e3204470486 upstream.
We found a refcount UAF bug as follows:
refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148 Workqueue: events cpuset_hotplug_workfn Call trace: refcount_warn_saturate+0xa0/0x148 __refcount_add.constprop.0+0x5c/0x80 css_task_iter_advance_css_set+0xd8/0x210 css_task_iter_advance+0xa8/0x120 css_task_iter_next+0x94/0x158 update_tasks_root_domain+0x58/0x98 rebuild_root_domains+0xa0/0x1b0 rebuild_sched_domains_locked+0x144/0x188 cpuset_hotplug_workfn+0x138/0x5a0 process_one_work+0x1e8/0x448 worker_thread+0x228/0x3e0 kthread+0xe0/0xf0 ret_from_fork+0x10/0x20
then a kernel panic will be triggered as below:
Unable to handle kernel paging request at virtual address 00000000c0000010 Call trace: cgroup_apply_control_disable+0xa4/0x16c rebind_subsystems+0x224/0x590 cgroup_destroy_root+0x64/0x2e0 css_free_rwork_fn+0x198/0x2a0 process_one_work+0x1d4/0x4bc worker_thread+0x158/0x410 kthread+0x108/0x13c ret_from_fork+0x10/0x18
The race that cause this bug can be shown as below:
(hotplug cpu) | (umount cpuset) mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex) cpuset_hotplug_workfn | rebuild_root_domains | rebind_subsystems update_tasks_root_domain | spin_lock_irq(&css_set_lock) css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id] while(css_task_iter_next) | &dcgrp->e_csets[ss->id]); css_task_iter_end | spin_unlock_irq(&css_set_lock) mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex)
Inside css_task_iter_start/next/end, css_set_lock is hold and then released, so when iterating task(left side), the css_set may be moved to another list(right side), then it->cset_head points to the old list head and it->cset_pos->next points to the head node of new list, which can't be used as struct css_set.
To fix this issue, switch from all css_sets to only scgrp's css_sets to patch in-flight iterators to preserve correct iteration, and then update it->cset_head as well.
Reported-by: Gaosheng Cui cuigaosheng1@huawei.com Link: https://www.spinics.net/lists/cgroups/msg37935.html Suggested-by: Michal Koutný mkoutny@suse.com Link: https://lore.kernel.org/all/20230526114139.70274-1-xiujianfeng@huaweicloud.c... Signed-off-by: Xiu Jianfeng xiujianfeng@huawei.com Fixes: 2d8f243a5e6e ("cgroup: implement cgroup->e_csets[]") Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cgroup.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-)
--- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1723,7 +1723,7 @@ int rebind_subsystems(struct cgroup_root { struct cgroup *dcgrp = &dst_root->cgrp; struct cgroup_subsys *ss; - int ssid, i, ret; + int ssid, ret; u16 dfl_disable_ss_mask = 0;
lockdep_assert_held(&cgroup_mutex); @@ -1767,7 +1767,8 @@ int rebind_subsystems(struct cgroup_root struct cgroup_root *src_root = ss->root; struct cgroup *scgrp = &src_root->cgrp; struct cgroup_subsys_state *css = cgroup_css(scgrp, ss); - struct css_set *cset; + struct css_set *cset, *cset_pos; + struct css_task_iter *it;
WARN_ON(!css || cgroup_css(dcgrp, ss));
@@ -1785,9 +1786,22 @@ int rebind_subsystems(struct cgroup_root css->cgroup = dcgrp;
spin_lock_irq(&css_set_lock); - hash_for_each(css_set_table, i, cset, hlist) + WARN_ON(!list_empty(&dcgrp->e_csets[ss->id])); + list_for_each_entry_safe(cset, cset_pos, &scgrp->e_csets[ss->id], + e_cset_node[ss->id]) { list_move_tail(&cset->e_cset_node[ss->id], &dcgrp->e_csets[ss->id]); + /* + * all css_sets of scgrp together in same order to dcgrp, + * patch in-flight iterators to preserve correct iteration. + * since the iterator is always advanced right away and + * finished when it->cset_pos meets it->cset_head, so only + * update it->cset_head is enough here. + */ + list_for_each_entry(it, &cset->task_iters, iters_node) + if (it->cset_head == &scgrp->e_csets[ss->id]) + it->cset_head = &dcgrp->e_csets[ss->id]; + } spin_unlock_irq(&css_set_lock);
/* default hierarchy doesn't enable controllers by default */
From: Martin Hundebøll martin@geanix.com
commit 3c40eb8145325b0f5b93b8a169146078cb2c49d6 upstream.
The call to mmc_request_done() can schedule, so it must not be called from irq context. Wake the irq thread if it needs to be called, and let its existing logic do its work.
Fixes the following kernel bug, which appears when running an RT patched kernel on the AmLogic Meson AXG A113X SoC: [ 11.111407] BUG: scheduling while atomic: kworker/0:1H/75/0x00010001 [ 11.111438] Modules linked in: [ 11.111451] CPU: 0 PID: 75 Comm: kworker/0:1H Not tainted 6.4.0-rc3-rt2-rtx-00081-gfd07f41ed6b4-dirty #1 [ 11.111461] Hardware name: RTX AXG A113X Linux Platform Board (DT) [ 11.111469] Workqueue: kblockd blk_mq_run_work_fn [ 11.111492] Call trace: [ 11.111497] dump_backtrace+0xac/0xe8 [ 11.111510] show_stack+0x18/0x28 [ 11.111518] dump_stack_lvl+0x48/0x60 [ 11.111530] dump_stack+0x18/0x24 [ 11.111537] __schedule_bug+0x4c/0x68 [ 11.111548] __schedule+0x80/0x574 [ 11.111558] schedule_loop+0x2c/0x50 [ 11.111567] schedule_rtlock+0x14/0x20 [ 11.111576] rtlock_slowlock_locked+0x468/0x730 [ 11.111587] rt_spin_lock+0x40/0x64 [ 11.111596] __wake_up_common_lock+0x5c/0xc4 [ 11.111610] __wake_up+0x18/0x24 [ 11.111620] mmc_blk_mq_req_done+0x68/0x138 [ 11.111633] mmc_request_done+0x104/0x118 [ 11.111644] meson_mmc_request_done+0x38/0x48 [ 11.111654] meson_mmc_irq+0x128/0x1f0 [ 11.111663] __handle_irq_event_percpu+0x70/0x114 [ 11.111674] handle_irq_event_percpu+0x18/0x4c [ 11.111683] handle_irq_event+0x80/0xb8 [ 11.111691] handle_fasteoi_irq+0xa4/0x120 [ 11.111704] handle_irq_desc+0x20/0x38 [ 11.111712] generic_handle_domain_irq+0x1c/0x28 [ 11.111721] gic_handle_irq+0x8c/0xa8 [ 11.111735] call_on_irq_stack+0x24/0x4c [ 11.111746] do_interrupt_handler+0x88/0x94 [ 11.111757] el1_interrupt+0x34/0x64 [ 11.111769] el1h_64_irq_handler+0x18/0x24 [ 11.111779] el1h_64_irq+0x64/0x68 [ 11.111786] __add_wait_queue+0x0/0x4c [ 11.111795] mmc_blk_rw_wait+0x84/0x118 [ 11.111804] mmc_blk_mq_issue_rq+0x5c4/0x654 [ 11.111814] mmc_mq_queue_rq+0x194/0x214 [ 11.111822] blk_mq_dispatch_rq_list+0x3ac/0x528 [ 11.111834] __blk_mq_sched_dispatch_requests+0x340/0x4d0 [ 11.111847] blk_mq_sched_dispatch_requests+0x38/0x70 [ 11.111858] blk_mq_run_work_fn+0x3c/0x70 [ 11.111865] process_one_work+0x17c/0x1f0 [ 11.111876] worker_thread+0x1d4/0x26c [ 11.111885] kthread+0xe4/0xf4 [ 11.111894] ret_from_fork+0x10/0x20
Fixes: 51c5d8447bd7 ("MMC: meson: initial support for GX platforms") Cc: stable@vger.kernel.org Signed-off-by: Martin Hundebøll martin@geanix.com Link: https://lore.kernel.org/r/20230607082713.517157-1-martin@geanix.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/meson-gx-mmc.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
--- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -973,11 +973,8 @@ static irqreturn_t meson_mmc_irq(int irq if (status & (IRQ_END_OF_CHAIN | IRQ_RESP_STATUS)) { if (data && !cmd->error) data->bytes_xfered = data->blksz * data->blocks; - if (meson_mmc_bounce_buf_read(data) || - meson_mmc_get_next_command(cmd)) - ret = IRQ_WAKE_THREAD; - else - ret = IRQ_HANDLED; + + return IRQ_WAKE_THREAD; }
out: @@ -989,9 +986,6 @@ out: writel(start, host->regs + SD_EMMC_START); }
- if (ret == IRQ_HANDLED) - meson_mmc_request_done(host->mmc, cmd->mrq); - return ret; }
From: Matthias May matthias.may@westermo.com
commit 7074732c8faee201a245a6f983008a5789c0be33 upstream.
The current code allows for VXLAN and GENEVE to inherit the TOS respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6. However when the payload is VLAN encapsulated, then this inheriting does not work, because the visible skb-protocol is of type ETH_P_8021Q or ETH_P_8021AD.
Instead of skb->protocol use skb_protocol().
Signed-off-by: Matthias May matthias.may@westermo.com Link: https://lore.kernel.org/r/20220721202718.10092-1-matthias.may@westermo.com Signed-off-by: Jakub Kicinski kuba@kernel.org Cc: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip_tunnels.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -374,9 +374,11 @@ static inline int ip_tunnel_encap(struct static inline u8 ip_tunnel_get_dsfield(const struct iphdr *iph, const struct sk_buff *skb) { - if (skb->protocol == htons(ETH_P_IP)) + __be16 payload_protocol = skb_protocol(skb, true); + + if (payload_protocol == htons(ETH_P_IP)) return iph->tos; - else if (skb->protocol == htons(ETH_P_IPV6)) + else if (payload_protocol == htons(ETH_P_IPV6)) return ipv6_get_dsfield((const struct ipv6hdr *)iph); else return 0; @@ -385,9 +387,11 @@ static inline u8 ip_tunnel_get_dsfield(c static inline u8 ip_tunnel_get_ttl(const struct iphdr *iph, const struct sk_buff *skb) { - if (skb->protocol == htons(ETH_P_IP)) + __be16 payload_protocol = skb_protocol(skb, true); + + if (payload_protocol == htons(ETH_P_IP)) return iph->ttl; - else if (skb->protocol == htons(ETH_P_IPV6)) + else if (payload_protocol == htons(ETH_P_IPV6)) return ((const struct ipv6hdr *)iph)->hop_limit; else return 0;
From: Rafael Aquini aquini@redhat.com
commit 54abe19e00cfcc5a72773d15cd00ed19ab763439 upstream.
When commit 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()") repurposed the writeback_dirty_page trace event as a template to create its new wait_on_page_writeback trace event, it ended up opening a window to NULL pointer dereference crashes due to the (infrequent) occurrence of a race where an access to a page in the swap-cache happens concurrently with the moment this page is being written to disk and the tracepoint is enabled:
BUG: kernel NULL pointer dereference, address: 0000000000000040 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023 RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0 Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044 RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006 R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000 R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200 FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x76/0x170 ? kernelmode_fixup_or_oops+0x84/0x110 ? exc_page_fault+0x65/0x150 ? asm_exc_page_fault+0x22/0x30 ? trace_event_raw_event_writeback_folio_template+0x76/0xf0 folio_wait_writeback+0x6b/0x80 shmem_swapin_folio+0x24a/0x500 ? filemap_get_entry+0xe3/0x140 shmem_get_folio_gfp+0x36e/0x7c0 ? find_busiest_group+0x43/0x1a0 shmem_fault+0x76/0x2a0 ? __update_load_avg_cfs_rq+0x281/0x2f0 __do_fault+0x33/0x130 do_read_fault+0x118/0x160 do_pte_missing+0x1ed/0x2a0 __handle_mm_fault+0x566/0x630 handle_mm_fault+0x91/0x210 do_user_addr_fault+0x22c/0x740 exc_page_fault+0x65/0x150 asm_exc_page_fault+0x22/0x30
This problem arises from the fact that the repurposed writeback_dirty_page trace event code was written assuming that every pointer to mapping (struct address_space) would come from a file-mapped page-cache object, thus mapping->host would always be populated, and that was a valid case before commit 19343b5bdd16. The swap-cache address space (swapper_spaces), however, doesn't populate its ->host (struct inode) pointer, thus leading to the crashes in the corner-case aforementioned.
commit 19343b5bdd16 ended up breaking the assignment of __entry->name and __entry->ino for the wait_on_page_writeback tracepoint -- both dependent on mapping->host carrying a pointer to a valid inode. The assignment of __entry->name was fixed by commit 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears"), and this commit fixes the remaining case, for __entry->ino.
Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com Fixes: 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()") Signed-off-by: Rafael Aquini aquini@redhat.com Reviewed-by: Yafang Shao laoar.shao@gmail.com Cc: Aristeu Rozanski aris@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org ---
--- include/trace/events/writeback.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(writeback_page_templ strscpy_pad(__entry->name, bdi_dev_name(mapping ? inode_to_bdi(mapping->host) : NULL), 32); - __entry->ino = mapping ? mapping->host->i_ino : 0; + __entry->ino = (mapping && mapping->host) ? mapping->host->i_ino : 0; __entry->index = page->index; ),
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 782e53d0c14420858dbf0f8f797973c150d3b6d7 upstream.
In a syzbot stress test that deliberately causes file system errors on nilfs2 with a corrupted disk image, it has been reported that nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page cache and calling nilfs_clear_dirty_page() for each dirty page/folio retrieved, the back reference from the argument page to "mapping" may have been changed to NULL (and possibly others). It is necessary to check this after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio after locking it in nilfs_clear_dirty_pages() if the back reference "mapping" from the page/folio is different from the "mapping" that held the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+53369d11851d8f26735c@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/page.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct addr struct page *page = pvec.pages[i];
lock_page(page); - nilfs_clear_dirty_page(page, silent); + + /* + * This page may have been removed from the address + * space by truncation or invalidation when the lock + * was acquired. Skip processing in that case. + */ + if (likely(page->mapping == mapping)) + nilfs_clear_dirty_page(page, silent); + unlock_page(page); } pagevec_release(&pvec);
From: "Paulo Alcantara (SUSE)" pc@cjr.nz
commit 185352ae6171c845951e21017b2925a6f2795904 upstream.
Do some renaming and code cleanup.
No functional changes.
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/dfs_cache.c | 565 +++++++++++++++++++++++++--------------------------- 1 file changed, 279 insertions(+), 286 deletions(-)
--- a/fs/cifs/dfs_cache.c +++ b/fs/cifs/dfs_cache.c @@ -22,60 +22,59 @@
#include "dfs_cache.h"
-#define DFS_CACHE_HTABLE_SIZE 32 -#define DFS_CACHE_MAX_ENTRIES 64 +#define CACHE_HTABLE_SIZE 32 +#define CACHE_MAX_ENTRIES 64
#define IS_INTERLINK_SET(v) ((v) & (DFSREF_REFERRAL_SERVER | \ DFSREF_STORAGE_SERVER))
-struct dfs_cache_tgt { - char *t_name; - struct list_head t_list; +struct cache_dfs_tgt { + char *name; + struct list_head list; };
-struct dfs_cache_entry { - struct hlist_node ce_hlist; - const char *ce_path; - int ce_ttl; - int ce_srvtype; - int ce_flags; - struct timespec64 ce_etime; - int ce_path_consumed; - int ce_numtgts; - struct list_head ce_tlist; - struct dfs_cache_tgt *ce_tgthint; - struct rcu_head ce_rcu; +struct cache_entry { + struct hlist_node hlist; + const char *path; + int ttl; + int srvtype; + int flags; + struct timespec64 etime; + int path_consumed; + int numtgts; + struct list_head tlist; + struct cache_dfs_tgt *tgthint; + struct rcu_head rcu; };
-static struct kmem_cache *dfs_cache_slab __read_mostly; - -struct dfs_cache_vol_info { - char *vi_fullpath; - struct smb_vol vi_vol; - char *vi_mntdata; - struct list_head vi_list; +struct vol_info { + char *fullpath; + struct smb_vol smb_vol; + char *mntdata; + struct list_head list; };
-struct dfs_cache { - struct mutex dc_lock; - struct nls_table *dc_nlsc; - struct list_head dc_vol_list; - int dc_ttl; - struct delayed_work dc_refresh; -}; +static struct kmem_cache *cache_slab __read_mostly; +static struct workqueue_struct *dfscache_wq __read_mostly;
-static struct dfs_cache dfs_cache; +static int cache_ttl; +static struct nls_table *cache_nlsc;
/* * Number of entries in the cache */ -static size_t dfs_cache_count; +static size_t cache_count; + +static struct hlist_head cache_htable[CACHE_HTABLE_SIZE]; +static DEFINE_MUTEX(list_lock);
-static DEFINE_MUTEX(dfs_cache_list_lock); -static struct hlist_head dfs_cache_htable[DFS_CACHE_HTABLE_SIZE]; +static LIST_HEAD(vol_list); +static DEFINE_MUTEX(vol_lock);
static void refresh_cache_worker(struct work_struct *work);
+static DECLARE_DELAYED_WORK(refresh_task, refresh_cache_worker); + static inline bool is_path_valid(const char *path) { return path && (strchr(path + 1, '\') || strchr(path + 1, '/')); @@ -100,42 +99,42 @@ static inline void free_normalized_path( kfree(npath); }
-static inline bool cache_entry_expired(const struct dfs_cache_entry *ce) +static inline bool cache_entry_expired(const struct cache_entry *ce) { struct timespec64 ts;
ktime_get_coarse_real_ts64(&ts); - return timespec64_compare(&ts, &ce->ce_etime) >= 0; + return timespec64_compare(&ts, &ce->etime) >= 0; }
-static inline void free_tgts(struct dfs_cache_entry *ce) +static inline void free_tgts(struct cache_entry *ce) { - struct dfs_cache_tgt *t, *n; + struct cache_dfs_tgt *t, *n;
- list_for_each_entry_safe(t, n, &ce->ce_tlist, t_list) { - list_del(&t->t_list); - kfree(t->t_name); + list_for_each_entry_safe(t, n, &ce->tlist, list) { + list_del(&t->list); + kfree(t->name); kfree(t); } }
static void free_cache_entry(struct rcu_head *rcu) { - struct dfs_cache_entry *ce = container_of(rcu, struct dfs_cache_entry, - ce_rcu); - kmem_cache_free(dfs_cache_slab, ce); + struct cache_entry *ce = container_of(rcu, struct cache_entry, rcu); + + kmem_cache_free(cache_slab, ce); }
-static inline void flush_cache_ent(struct dfs_cache_entry *ce) +static inline void flush_cache_ent(struct cache_entry *ce) { - if (hlist_unhashed(&ce->ce_hlist)) + if (hlist_unhashed(&ce->hlist)) return;
- hlist_del_init_rcu(&ce->ce_hlist); - kfree_const(ce->ce_path); + hlist_del_init_rcu(&ce->hlist); + kfree_const(ce->path); free_tgts(ce); - dfs_cache_count--; - call_rcu(&ce->ce_rcu, free_cache_entry); + cache_count--; + call_rcu(&ce->rcu, free_cache_entry); }
static void flush_cache_ents(void) @@ -143,11 +142,11 @@ static void flush_cache_ents(void) int i;
rcu_read_lock(); - for (i = 0; i < DFS_CACHE_HTABLE_SIZE; i++) { - struct hlist_head *l = &dfs_cache_htable[i]; - struct dfs_cache_entry *ce; + for (i = 0; i < CACHE_HTABLE_SIZE; i++) { + struct hlist_head *l = &cache_htable[i]; + struct cache_entry *ce;
- hlist_for_each_entry_rcu(ce, l, ce_hlist) + hlist_for_each_entry_rcu(ce, l, hlist) flush_cache_ent(ce); } rcu_read_unlock(); @@ -159,35 +158,35 @@ static void flush_cache_ents(void) static int dfscache_proc_show(struct seq_file *m, void *v) { int bucket; - struct dfs_cache_entry *ce; - struct dfs_cache_tgt *t; + struct cache_entry *ce; + struct cache_dfs_tgt *t;
seq_puts(m, "DFS cache\n---------\n");
- mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock);
rcu_read_lock(); - hash_for_each_rcu(dfs_cache_htable, bucket, ce, ce_hlist) { + hash_for_each_rcu(cache_htable, bucket, ce, hlist) { seq_printf(m, "cache entry: path=%s,type=%s,ttl=%d,etime=%ld," "interlink=%s,path_consumed=%d,expired=%s\n", - ce->ce_path, - ce->ce_srvtype == DFS_TYPE_ROOT ? "root" : "link", - ce->ce_ttl, ce->ce_etime.tv_nsec, - IS_INTERLINK_SET(ce->ce_flags) ? "yes" : "no", - ce->ce_path_consumed, + ce->path, + ce->srvtype == DFS_TYPE_ROOT ? "root" : "link", + ce->ttl, ce->etime.tv_nsec, + IS_INTERLINK_SET(ce->flags) ? "yes" : "no", + ce->path_consumed, cache_entry_expired(ce) ? "yes" : "no");
- list_for_each_entry(t, &ce->ce_tlist, t_list) { + list_for_each_entry(t, &ce->tlist, list) { seq_printf(m, " %s%s\n", - t->t_name, - ce->ce_tgthint == t ? " (target hint)" : ""); + t->name, + ce->tgthint == t ? " (target hint)" : ""); }
} rcu_read_unlock();
- mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock); return 0; }
@@ -205,9 +204,9 @@ static ssize_t dfscache_proc_write(struc return -EINVAL;
cifs_dbg(FYI, "clearing dfs cache"); - mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock); flush_cache_ents(); - mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock);
return count; } @@ -226,25 +225,25 @@ const struct file_operations dfscache_pr };
#ifdef CONFIG_CIFS_DEBUG2 -static inline void dump_tgts(const struct dfs_cache_entry *ce) +static inline void dump_tgts(const struct cache_entry *ce) { - struct dfs_cache_tgt *t; + struct cache_dfs_tgt *t;
cifs_dbg(FYI, "target list:\n"); - list_for_each_entry(t, &ce->ce_tlist, t_list) { - cifs_dbg(FYI, " %s%s\n", t->t_name, - ce->ce_tgthint == t ? " (target hint)" : ""); + list_for_each_entry(t, &ce->tlist, list) { + cifs_dbg(FYI, " %s%s\n", t->name, + ce->tgthint == t ? " (target hint)" : ""); } }
-static inline void dump_ce(const struct dfs_cache_entry *ce) +static inline void dump_ce(const struct cache_entry *ce) { cifs_dbg(FYI, "cache entry: path=%s,type=%s,ttl=%d,etime=%ld," - "interlink=%s,path_consumed=%d,expired=%s\n", ce->ce_path, - ce->ce_srvtype == DFS_TYPE_ROOT ? "root" : "link", ce->ce_ttl, - ce->ce_etime.tv_nsec, - IS_INTERLINK_SET(ce->ce_flags) ? "yes" : "no", - ce->ce_path_consumed, + "interlink=%s,path_consumed=%d,expired=%s\n", ce->path, + ce->srvtype == DFS_TYPE_ROOT ? "root" : "link", ce->ttl, + ce->etime.tv_nsec, + IS_INTERLINK_SET(ce->flags) ? "yes" : "no", + ce->path_consumed, cache_entry_expired(ce) ? "yes" : "no"); dump_tgts(ce); } @@ -284,25 +283,34 @@ static inline void dump_refs(const struc */ int dfs_cache_init(void) { + int rc; int i;
- dfs_cache_slab = kmem_cache_create("cifs_dfs_cache", - sizeof(struct dfs_cache_entry), 0, - SLAB_HWCACHE_ALIGN, NULL); - if (!dfs_cache_slab) + dfscache_wq = alloc_workqueue("cifs-dfscache", + WQ_FREEZABLE | WQ_MEM_RECLAIM, 1); + if (!dfscache_wq) return -ENOMEM;
- for (i = 0; i < DFS_CACHE_HTABLE_SIZE; i++) - INIT_HLIST_HEAD(&dfs_cache_htable[i]); + cache_slab = kmem_cache_create("cifs_dfs_cache", + sizeof(struct cache_entry), 0, + SLAB_HWCACHE_ALIGN, NULL); + if (!cache_slab) { + rc = -ENOMEM; + goto out_destroy_wq; + } + + for (i = 0; i < CACHE_HTABLE_SIZE; i++) + INIT_HLIST_HEAD(&cache_htable[i]);
- INIT_LIST_HEAD(&dfs_cache.dc_vol_list); - mutex_init(&dfs_cache.dc_lock); - INIT_DELAYED_WORK(&dfs_cache.dc_refresh, refresh_cache_worker); - dfs_cache.dc_ttl = -1; - dfs_cache.dc_nlsc = load_nls_default(); + cache_ttl = -1; + cache_nlsc = load_nls_default();
cifs_dbg(FYI, "%s: initialized DFS referral cache\n", __func__); return 0; + +out_destroy_wq: + destroy_workqueue(dfscache_wq); + return rc; }
static inline unsigned int cache_entry_hash(const void *data, int size) @@ -310,7 +318,7 @@ static inline unsigned int cache_entry_h unsigned int h;
h = jhash(data, size, 0); - return h & (DFS_CACHE_HTABLE_SIZE - 1); + return h & (CACHE_HTABLE_SIZE - 1); }
/* Check whether second path component of @path is SYSVOL or NETLOGON */ @@ -325,11 +333,11 @@ static inline bool is_sysvol_or_netlogon }
/* Return target hint of a DFS cache entry */ -static inline char *get_tgt_name(const struct dfs_cache_entry *ce) +static inline char *get_tgt_name(const struct cache_entry *ce) { - struct dfs_cache_tgt *t = ce->ce_tgthint; + struct cache_dfs_tgt *t = ce->tgthint;
- return t ? t->t_name : ERR_PTR(-ENOENT); + return t ? t->name : ERR_PTR(-ENOENT); }
/* Return expire time out of a new entry's TTL */ @@ -346,19 +354,19 @@ static inline struct timespec64 get_expi }
/* Allocate a new DFS target */ -static inline struct dfs_cache_tgt *alloc_tgt(const char *name) +static inline struct cache_dfs_tgt *alloc_tgt(const char *name) { - struct dfs_cache_tgt *t; + struct cache_dfs_tgt *t;
t = kmalloc(sizeof(*t), GFP_KERNEL); if (!t) return ERR_PTR(-ENOMEM); - t->t_name = kstrndup(name, strlen(name), GFP_KERNEL); - if (!t->t_name) { + t->name = kstrndup(name, strlen(name), GFP_KERNEL); + if (!t->name) { kfree(t); return ERR_PTR(-ENOMEM); } - INIT_LIST_HEAD(&t->t_list); + INIT_LIST_HEAD(&t->list); return t; }
@@ -367,63 +375,63 @@ static inline struct dfs_cache_tgt *allo * target hint. */ static int copy_ref_data(const struct dfs_info3_param *refs, int numrefs, - struct dfs_cache_entry *ce, const char *tgthint) + struct cache_entry *ce, const char *tgthint) { int i;
- ce->ce_ttl = refs[0].ttl; - ce->ce_etime = get_expire_time(ce->ce_ttl); - ce->ce_srvtype = refs[0].server_type; - ce->ce_flags = refs[0].ref_flag; - ce->ce_path_consumed = refs[0].path_consumed; + ce->ttl = refs[0].ttl; + ce->etime = get_expire_time(ce->ttl); + ce->srvtype = refs[0].server_type; + ce->flags = refs[0].ref_flag; + ce->path_consumed = refs[0].path_consumed;
for (i = 0; i < numrefs; i++) { - struct dfs_cache_tgt *t; + struct cache_dfs_tgt *t;
t = alloc_tgt(refs[i].node_name); if (IS_ERR(t)) { free_tgts(ce); return PTR_ERR(t); } - if (tgthint && !strcasecmp(t->t_name, tgthint)) { - list_add(&t->t_list, &ce->ce_tlist); + if (tgthint && !strcasecmp(t->name, tgthint)) { + list_add(&t->list, &ce->tlist); tgthint = NULL; } else { - list_add_tail(&t->t_list, &ce->ce_tlist); + list_add_tail(&t->list, &ce->tlist); } - ce->ce_numtgts++; + ce->numtgts++; }
- ce->ce_tgthint = list_first_entry_or_null(&ce->ce_tlist, - struct dfs_cache_tgt, t_list); + ce->tgthint = list_first_entry_or_null(&ce->tlist, + struct cache_dfs_tgt, list);
return 0; }
/* Allocate a new cache entry */ -static struct dfs_cache_entry * -alloc_cache_entry(const char *path, const struct dfs_info3_param *refs, - int numrefs) +static struct cache_entry *alloc_cache_entry(const char *path, + const struct dfs_info3_param *refs, + int numrefs) { - struct dfs_cache_entry *ce; + struct cache_entry *ce; int rc;
- ce = kmem_cache_zalloc(dfs_cache_slab, GFP_KERNEL); + ce = kmem_cache_zalloc(cache_slab, GFP_KERNEL); if (!ce) return ERR_PTR(-ENOMEM);
- ce->ce_path = kstrdup_const(path, GFP_KERNEL); - if (!ce->ce_path) { - kmem_cache_free(dfs_cache_slab, ce); + ce->path = kstrdup_const(path, GFP_KERNEL); + if (!ce->path) { + kmem_cache_free(cache_slab, ce); return ERR_PTR(-ENOMEM); } - INIT_HLIST_NODE(&ce->ce_hlist); - INIT_LIST_HEAD(&ce->ce_tlist); + INIT_HLIST_NODE(&ce->hlist); + INIT_LIST_HEAD(&ce->tlist);
rc = copy_ref_data(refs, numrefs, ce, NULL); if (rc) { - kfree_const(ce->ce_path); - kmem_cache_free(dfs_cache_slab, ce); + kfree_const(ce->path); + kmem_cache_free(cache_slab, ce); ce = ERR_PTR(rc); } return ce; @@ -432,13 +440,13 @@ alloc_cache_entry(const char *path, cons static void remove_oldest_entry(void) { int bucket; - struct dfs_cache_entry *ce; - struct dfs_cache_entry *to_del = NULL; + struct cache_entry *ce; + struct cache_entry *to_del = NULL;
rcu_read_lock(); - hash_for_each_rcu(dfs_cache_htable, bucket, ce, ce_hlist) { - if (!to_del || timespec64_compare(&ce->ce_etime, - &to_del->ce_etime) < 0) + hash_for_each_rcu(cache_htable, bucket, ce, hlist) { + if (!to_del || timespec64_compare(&ce->etime, + &to_del->etime) < 0) to_del = ce; } if (!to_del) { @@ -453,93 +461,84 @@ out: }
/* Add a new DFS cache entry */ -static inline struct dfs_cache_entry * +static inline struct cache_entry * add_cache_entry(unsigned int hash, const char *path, const struct dfs_info3_param *refs, int numrefs) { - struct dfs_cache_entry *ce; + struct cache_entry *ce;
ce = alloc_cache_entry(path, refs, numrefs); if (IS_ERR(ce)) return ce;
- hlist_add_head_rcu(&ce->ce_hlist, &dfs_cache_htable[hash]); + hlist_add_head_rcu(&ce->hlist, &cache_htable[hash]);
- mutex_lock(&dfs_cache.dc_lock); - if (dfs_cache.dc_ttl < 0) { - dfs_cache.dc_ttl = ce->ce_ttl; - queue_delayed_work(cifsiod_wq, &dfs_cache.dc_refresh, - dfs_cache.dc_ttl * HZ); + mutex_lock(&vol_lock); + if (cache_ttl < 0) { + cache_ttl = ce->ttl; + queue_delayed_work(dfscache_wq, &refresh_task, cache_ttl * HZ); } else { - dfs_cache.dc_ttl = min_t(int, dfs_cache.dc_ttl, ce->ce_ttl); - mod_delayed_work(cifsiod_wq, &dfs_cache.dc_refresh, - dfs_cache.dc_ttl * HZ); + cache_ttl = min_t(int, cache_ttl, ce->ttl); + mod_delayed_work(dfscache_wq, &refresh_task, cache_ttl * HZ); } - mutex_unlock(&dfs_cache.dc_lock); + mutex_unlock(&vol_lock);
return ce; }
-static struct dfs_cache_entry *__find_cache_entry(unsigned int hash, - const char *path) +/* + * Find a DFS cache entry in hash table and optionally check prefix path against + * @path. + * Use whole path components in the match. + * Return ERR_PTR(-ENOENT) if the entry is not found. + */ +static struct cache_entry *lookup_cache_entry(const char *path, + unsigned int *hash) { - struct dfs_cache_entry *ce; + struct cache_entry *ce; + unsigned int h; bool found = false;
- rcu_read_lock(); - hlist_for_each_entry_rcu(ce, &dfs_cache_htable[hash], ce_hlist) { - if (!strcasecmp(path, ce->ce_path)) { -#ifdef CONFIG_CIFS_DEBUG2 - char *name = get_tgt_name(ce); + h = cache_entry_hash(path, strlen(path));
- if (IS_ERR(name)) { - rcu_read_unlock(); - return ERR_CAST(name); - } - cifs_dbg(FYI, "%s: cache hit\n", __func__); - cifs_dbg(FYI, "%s: target hint: %s\n", __func__, name); -#endif + rcu_read_lock(); + hlist_for_each_entry_rcu(ce, &cache_htable[h], hlist) { + if (!strcasecmp(path, ce->path)) { found = true; + dump_ce(ce); break; } } rcu_read_unlock(); - return found ? ce : ERR_PTR(-ENOENT); -}
-/* - * Find a DFS cache entry in hash table and optionally check prefix path against - * @path. - * Use whole path components in the match. - * Return ERR_PTR(-ENOENT) if the entry is not found. - */ -static inline struct dfs_cache_entry *find_cache_entry(const char *path, - unsigned int *hash) -{ - *hash = cache_entry_hash(path, strlen(path)); - return __find_cache_entry(*hash, path); + if (!found) + ce = ERR_PTR(-ENOENT); + if (hash) + *hash = h; + + return ce; }
static inline void destroy_slab_cache(void) { rcu_barrier(); - kmem_cache_destroy(dfs_cache_slab); + kmem_cache_destroy(cache_slab); }
-static inline void free_vol(struct dfs_cache_vol_info *vi) +static inline void free_vol(struct vol_info *vi) { - list_del(&vi->vi_list); - kfree(vi->vi_fullpath); - kfree(vi->vi_mntdata); - cifs_cleanup_volume_info_contents(&vi->vi_vol); + list_del(&vi->list); + kfree(vi->fullpath); + kfree(vi->mntdata); + cifs_cleanup_volume_info_contents(&vi->smb_vol); kfree(vi); }
static inline void free_vol_list(void) { - struct dfs_cache_vol_info *vi, *nvi; + struct vol_info *vi, *nvi;
- list_for_each_entry_safe(vi, nvi, &dfs_cache.dc_vol_list, vi_list) + list_for_each_entry_safe(vi, nvi, &vol_list, list) free_vol(vi); }
@@ -548,40 +547,38 @@ static inline void free_vol_list(void) */ void dfs_cache_destroy(void) { - cancel_delayed_work_sync(&dfs_cache.dc_refresh); - unload_nls(dfs_cache.dc_nlsc); + cancel_delayed_work_sync(&refresh_task); + unload_nls(cache_nlsc); free_vol_list(); - mutex_destroy(&dfs_cache.dc_lock); - flush_cache_ents(); destroy_slab_cache(); - mutex_destroy(&dfs_cache_list_lock); + destroy_workqueue(dfscache_wq);
cifs_dbg(FYI, "%s: destroyed DFS referral cache\n", __func__); }
-static inline struct dfs_cache_entry * +static inline struct cache_entry * __update_cache_entry(const char *path, const struct dfs_info3_param *refs, int numrefs) { int rc; unsigned int h; - struct dfs_cache_entry *ce; + struct cache_entry *ce; char *s, *th = NULL;
- ce = find_cache_entry(path, &h); + ce = lookup_cache_entry(path, &h); if (IS_ERR(ce)) return ce;
- if (ce->ce_tgthint) { - s = ce->ce_tgthint->t_name; + if (ce->tgthint) { + s = ce->tgthint->name; th = kstrndup(s, strlen(s), GFP_KERNEL); if (!th) return ERR_PTR(-ENOMEM); }
free_tgts(ce); - ce->ce_numtgts = 0; + ce->numtgts = 0;
rc = copy_ref_data(refs, numrefs, ce, th); kfree(th); @@ -593,10 +590,10 @@ __update_cache_entry(const char *path, c }
/* Update an expired cache entry by getting a new DFS referral from server */ -static struct dfs_cache_entry * +static struct cache_entry * update_cache_entry(const unsigned int xid, struct cifs_ses *ses, const struct nls_table *nls_codepage, int remap, - const char *path, struct dfs_cache_entry *ce) + const char *path, struct cache_entry *ce) { int rc; struct dfs_info3_param *refs = NULL; @@ -636,20 +633,20 @@ update_cache_entry(const unsigned int xi * For interlinks, __cifs_dfs_mount() and expand_dfs_referral() are supposed to * handle them properly. */ -static struct dfs_cache_entry * +static struct cache_entry * do_dfs_cache_find(const unsigned int xid, struct cifs_ses *ses, const struct nls_table *nls_codepage, int remap, const char *path, bool noreq) { int rc; unsigned int h; - struct dfs_cache_entry *ce; + struct cache_entry *ce; struct dfs_info3_param *nrefs; int numnrefs;
cifs_dbg(FYI, "%s: search path: %s\n", __func__, path);
- ce = find_cache_entry(path, &h); + ce = lookup_cache_entry(path, &h); if (IS_ERR(ce)) { cifs_dbg(FYI, "%s: cache miss\n", __func__); /* @@ -690,9 +687,9 @@ do_dfs_cache_find(const unsigned int xid
cifs_dbg(FYI, "%s: new cache entry\n", __func__);
- if (dfs_cache_count >= DFS_CACHE_MAX_ENTRIES) { + if (cache_count >= CACHE_MAX_ENTRIES) { cifs_dbg(FYI, "%s: reached max cache size (%d)", - __func__, DFS_CACHE_MAX_ENTRIES); + __func__, CACHE_MAX_ENTRIES); remove_oldest_entry(); } ce = add_cache_entry(h, path, nrefs, numnrefs); @@ -701,7 +698,7 @@ do_dfs_cache_find(const unsigned int xid if (IS_ERR(ce)) return ce;
- dfs_cache_count++; + cache_count++; }
dump_ce(ce); @@ -723,7 +720,7 @@ do_dfs_cache_find(const unsigned int xid }
/* Set up a new DFS referral from a given cache entry */ -static int setup_ref(const char *path, const struct dfs_cache_entry *ce, +static int setup_ref(const char *path, const struct cache_entry *ce, struct dfs_info3_param *ref, const char *tgt) { int rc; @@ -736,7 +733,7 @@ static int setup_ref(const char *path, c if (!ref->path_name) return -ENOMEM;
- ref->path_consumed = ce->ce_path_consumed; + ref->path_consumed = ce->path_consumed;
ref->node_name = kstrndup(tgt, strlen(tgt), GFP_KERNEL); if (!ref->node_name) { @@ -744,9 +741,9 @@ static int setup_ref(const char *path, c goto err_free_path; }
- ref->ttl = ce->ce_ttl; - ref->server_type = ce->ce_srvtype; - ref->ref_flag = ce->ce_flags; + ref->ttl = ce->ttl; + ref->server_type = ce->srvtype; + ref->ref_flag = ce->flags;
return 0;
@@ -757,25 +754,25 @@ err_free_path: }
/* Return target list of a DFS cache entry */ -static int get_tgt_list(const struct dfs_cache_entry *ce, +static int get_tgt_list(const struct cache_entry *ce, struct dfs_cache_tgt_list *tl) { int rc; struct list_head *head = &tl->tl_list; - struct dfs_cache_tgt *t; + struct cache_dfs_tgt *t; struct dfs_cache_tgt_iterator *it, *nit;
memset(tl, 0, sizeof(*tl)); INIT_LIST_HEAD(head);
- list_for_each_entry(t, &ce->ce_tlist, t_list) { + list_for_each_entry(t, &ce->tlist, list) { it = kzalloc(sizeof(*it), GFP_KERNEL); if (!it) { rc = -ENOMEM; goto err_free_it; }
- it->it_name = kstrndup(t->t_name, strlen(t->t_name), + it->it_name = kstrndup(t->name, strlen(t->name), GFP_KERNEL); if (!it->it_name) { kfree(it); @@ -783,12 +780,12 @@ static int get_tgt_list(const struct dfs goto err_free_it; }
- if (ce->ce_tgthint == t) + if (ce->tgthint == t) list_add(&it->it_list, head); else list_add_tail(&it->it_list, head); } - tl->tl_numtgts = ce->ce_numtgts; + tl->tl_numtgts = ce->numtgts;
return 0;
@@ -829,7 +826,7 @@ int dfs_cache_find(const unsigned int xi { int rc; char *npath; - struct dfs_cache_entry *ce; + struct cache_entry *ce;
if (unlikely(!is_path_valid(path))) return -EINVAL; @@ -838,7 +835,7 @@ int dfs_cache_find(const unsigned int xi if (rc) return rc;
- mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock); ce = do_dfs_cache_find(xid, ses, nls_codepage, remap, npath, false); if (!IS_ERR(ce)) { if (ref) @@ -850,7 +847,7 @@ int dfs_cache_find(const unsigned int xi } else { rc = PTR_ERR(ce); } - mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock); free_normalized_path(path, npath); return rc; } @@ -876,7 +873,7 @@ int dfs_cache_noreq_find(const char *pat { int rc; char *npath; - struct dfs_cache_entry *ce; + struct cache_entry *ce;
if (unlikely(!is_path_valid(path))) return -EINVAL; @@ -885,7 +882,7 @@ int dfs_cache_noreq_find(const char *pat if (rc) return rc;
- mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock); ce = do_dfs_cache_find(0, NULL, NULL, 0, npath, true); if (IS_ERR(ce)) { rc = PTR_ERR(ce); @@ -899,7 +896,7 @@ int dfs_cache_noreq_find(const char *pat if (!rc && tgt_list) rc = get_tgt_list(ce, tgt_list); out: - mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock); free_normalized_path(path, npath); return rc; } @@ -929,8 +926,8 @@ int dfs_cache_update_tgthint(const unsig { int rc; char *npath; - struct dfs_cache_entry *ce; - struct dfs_cache_tgt *t; + struct cache_entry *ce; + struct cache_dfs_tgt *t;
if (unlikely(!is_path_valid(path))) return -EINVAL; @@ -941,7 +938,7 @@ int dfs_cache_update_tgthint(const unsig
cifs_dbg(FYI, "%s: path: %s\n", __func__, npath);
- mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock); ce = do_dfs_cache_find(xid, ses, nls_codepage, remap, npath, false); if (IS_ERR(ce)) { rc = PTR_ERR(ce); @@ -950,14 +947,14 @@ int dfs_cache_update_tgthint(const unsig
rc = 0;
- t = ce->ce_tgthint; + t = ce->tgthint;
- if (likely(!strcasecmp(it->it_name, t->t_name))) + if (likely(!strcasecmp(it->it_name, t->name))) goto out;
- list_for_each_entry(t, &ce->ce_tlist, t_list) { - if (!strcasecmp(t->t_name, it->it_name)) { - ce->ce_tgthint = t; + list_for_each_entry(t, &ce->tlist, list) { + if (!strcasecmp(t->name, it->it_name)) { + ce->tgthint = t; cifs_dbg(FYI, "%s: new target hint: %s\n", __func__, it->it_name); break; @@ -965,7 +962,7 @@ int dfs_cache_update_tgthint(const unsig }
out: - mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock); free_normalized_path(path, npath); return rc; } @@ -989,8 +986,8 @@ int dfs_cache_noreq_update_tgthint(const { int rc; char *npath; - struct dfs_cache_entry *ce; - struct dfs_cache_tgt *t; + struct cache_entry *ce; + struct cache_dfs_tgt *t;
if (unlikely(!is_path_valid(path)) || !it) return -EINVAL; @@ -1001,7 +998,7 @@ int dfs_cache_noreq_update_tgthint(const
cifs_dbg(FYI, "%s: path: %s\n", __func__, npath);
- mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock);
ce = do_dfs_cache_find(0, NULL, NULL, 0, npath, true); if (IS_ERR(ce)) { @@ -1011,14 +1008,14 @@ int dfs_cache_noreq_update_tgthint(const
rc = 0;
- t = ce->ce_tgthint; + t = ce->tgthint;
- if (unlikely(!strcasecmp(it->it_name, t->t_name))) + if (unlikely(!strcasecmp(it->it_name, t->name))) goto out;
- list_for_each_entry(t, &ce->ce_tlist, t_list) { - if (!strcasecmp(t->t_name, it->it_name)) { - ce->ce_tgthint = t; + list_for_each_entry(t, &ce->tlist, list) { + if (!strcasecmp(t->name, it->it_name)) { + ce->tgthint = t; cifs_dbg(FYI, "%s: new target hint: %s\n", __func__, it->it_name); break; @@ -1026,7 +1023,7 @@ int dfs_cache_noreq_update_tgthint(const }
out: - mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock); free_normalized_path(path, npath); return rc; } @@ -1047,7 +1044,7 @@ int dfs_cache_get_tgt_referral(const cha { int rc; char *npath; - struct dfs_cache_entry *ce; + struct cache_entry *ce; unsigned int h;
if (!it || !ref) @@ -1061,9 +1058,9 @@ int dfs_cache_get_tgt_referral(const cha
cifs_dbg(FYI, "%s: path: %s\n", __func__, npath);
- mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock);
- ce = find_cache_entry(npath, &h); + ce = lookup_cache_entry(npath, &h); if (IS_ERR(ce)) { rc = PTR_ERR(ce); goto out; @@ -1074,7 +1071,7 @@ int dfs_cache_get_tgt_referral(const cha rc = setup_ref(path, ce, ref, it->it_name);
out: - mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock); free_normalized_path(path, npath); return rc; } @@ -1085,7 +1082,7 @@ static int dup_vol(struct smb_vol *vol,
if (vol->username) { new->username = kstrndup(vol->username, strlen(vol->username), - GFP_KERNEL); + GFP_KERNEL); if (!new->username) return -ENOMEM; } @@ -1103,7 +1100,7 @@ static int dup_vol(struct smb_vol *vol, } if (vol->domainname) { new->domainname = kstrndup(vol->domainname, - strlen(vol->domainname), GFP_KERNEL); + strlen(vol->domainname), GFP_KERNEL); if (!new->domainname) goto err_free_unc; } @@ -1150,7 +1147,7 @@ err_free_username: int dfs_cache_add_vol(char *mntdata, struct smb_vol *vol, const char *fullpath) { int rc; - struct dfs_cache_vol_info *vi; + struct vol_info *vi;
if (!vol || !fullpath || !mntdata) return -EINVAL; @@ -1161,38 +1158,37 @@ int dfs_cache_add_vol(char *mntdata, str if (!vi) return -ENOMEM;
- vi->vi_fullpath = kstrndup(fullpath, strlen(fullpath), GFP_KERNEL); - if (!vi->vi_fullpath) { + vi->fullpath = kstrndup(fullpath, strlen(fullpath), GFP_KERNEL); + if (!vi->fullpath) { rc = -ENOMEM; goto err_free_vi; }
- rc = dup_vol(vol, &vi->vi_vol); + rc = dup_vol(vol, &vi->smb_vol); if (rc) goto err_free_fullpath;
- vi->vi_mntdata = mntdata; + vi->mntdata = mntdata;
- mutex_lock(&dfs_cache.dc_lock); - list_add_tail(&vi->vi_list, &dfs_cache.dc_vol_list); - mutex_unlock(&dfs_cache.dc_lock); + mutex_lock(&vol_lock); + list_add_tail(&vi->list, &vol_list); + mutex_unlock(&vol_lock); return 0;
err_free_fullpath: - kfree(vi->vi_fullpath); + kfree(vi->fullpath); err_free_vi: kfree(vi); return rc; }
-static inline struct dfs_cache_vol_info *find_vol(const char *fullpath) +static inline struct vol_info *find_vol(const char *fullpath) { - struct dfs_cache_vol_info *vi; + struct vol_info *vi;
- list_for_each_entry(vi, &dfs_cache.dc_vol_list, vi_list) { - cifs_dbg(FYI, "%s: vi->vi_fullpath: %s\n", __func__, - vi->vi_fullpath); - if (!strcasecmp(vi->vi_fullpath, fullpath)) + list_for_each_entry(vi, &vol_list, list) { + cifs_dbg(FYI, "%s: vi->fullpath: %s\n", __func__, vi->fullpath); + if (!strcasecmp(vi->fullpath, fullpath)) return vi; } return ERR_PTR(-ENOENT); @@ -1209,14 +1205,14 @@ static inline struct dfs_cache_vol_info int dfs_cache_update_vol(const char *fullpath, struct TCP_Server_Info *server) { int rc; - struct dfs_cache_vol_info *vi; + struct vol_info *vi;
if (!fullpath || !server) return -EINVAL;
cifs_dbg(FYI, "%s: fullpath: %s\n", __func__, fullpath);
- mutex_lock(&dfs_cache.dc_lock); + mutex_lock(&vol_lock);
vi = find_vol(fullpath); if (IS_ERR(vi)) { @@ -1225,12 +1221,12 @@ int dfs_cache_update_vol(const char *ful }
cifs_dbg(FYI, "%s: updating volume info\n", __func__); - memcpy(&vi->vi_vol.dstaddr, &server->dstaddr, - sizeof(vi->vi_vol.dstaddr)); + memcpy(&vi->smb_vol.dstaddr, &server->dstaddr, + sizeof(vi->smb_vol.dstaddr)); rc = 0;
out: - mutex_unlock(&dfs_cache.dc_lock); + mutex_unlock(&vol_lock); return rc; }
@@ -1241,18 +1237,18 @@ out: */ void dfs_cache_del_vol(const char *fullpath) { - struct dfs_cache_vol_info *vi; + struct vol_info *vi;
if (!fullpath || !*fullpath) return;
cifs_dbg(FYI, "%s: fullpath: %s\n", __func__, fullpath);
- mutex_lock(&dfs_cache.dc_lock); + mutex_lock(&vol_lock); vi = find_vol(fullpath); if (!IS_ERR(vi)) free_vol(vi); - mutex_unlock(&dfs_cache.dc_lock); + mutex_unlock(&vol_lock); }
/* Get all tcons that are within a DFS namespace and can be refreshed */ @@ -1280,7 +1276,7 @@ static void get_tcons(struct TCP_Server_ spin_unlock(&cifs_tcp_ses_lock); }
-static inline bool is_dfs_link(const char *path) +static bool is_dfs_link(const char *path) { char *s;
@@ -1290,7 +1286,7 @@ static inline bool is_dfs_link(const cha return !!strchr(s + 1, '\'); }
-static inline char *get_dfs_root(const char *path) +static char *get_dfs_root(const char *path) { char *s, *npath;
@@ -1310,8 +1306,9 @@ static inline char *get_dfs_root(const c }
/* Find root SMB session out of a DFS link path */ -static struct cifs_ses *find_root_ses(struct dfs_cache_vol_info *vi, - struct cifs_tcon *tcon, const char *path) +static struct cifs_ses *find_root_ses(struct vol_info *vi, + struct cifs_tcon *tcon, + const char *path) { char *rpath; int rc; @@ -1333,8 +1330,7 @@ static struct cifs_ses *find_root_ses(st goto out; }
- mdata = cifs_compose_mount_options(vi->vi_mntdata, rpath, &ref, - &devname); + mdata = cifs_compose_mount_options(vi->mntdata, rpath, &ref, &devname); free_dfs_info_param(&ref);
if (IS_ERR(mdata)) { @@ -1373,14 +1369,13 @@ out: }
/* Refresh DFS cache entry from a given tcon */ -static void do_refresh_tcon(struct dfs_cache *dc, struct dfs_cache_vol_info *vi, - struct cifs_tcon *tcon) +static void refresh_tcon(struct vol_info *vi, struct cifs_tcon *tcon) { int rc = 0; unsigned int xid; char *path, *npath; unsigned int h; - struct dfs_cache_entry *ce; + struct cache_entry *ce; struct dfs_info3_param *refs = NULL; int numrefs = 0; struct cifs_ses *root_ses = NULL, *ses; @@ -1393,9 +1388,9 @@ static void do_refresh_tcon(struct dfs_c if (rc) goto out;
- mutex_lock(&dfs_cache_list_lock); - ce = find_cache_entry(npath, &h); - mutex_unlock(&dfs_cache_list_lock); + mutex_lock(&list_lock); + ce = lookup_cache_entry(npath, &h); + mutex_unlock(&list_lock);
if (IS_ERR(ce)) { rc = PTR_ERR(ce); @@ -1421,12 +1416,12 @@ static void do_refresh_tcon(struct dfs_c rc = -EOPNOTSUPP; } else { rc = ses->server->ops->get_dfs_refer(xid, ses, path, &refs, - &numrefs, dc->dc_nlsc, + &numrefs, cache_nlsc, tcon->remap); if (!rc) { - mutex_lock(&dfs_cache_list_lock); + mutex_lock(&list_lock); ce = __update_cache_entry(npath, refs, numrefs); - mutex_unlock(&dfs_cache_list_lock); + mutex_unlock(&list_lock); dump_refs(refs, numrefs); free_dfs_info_array(refs, numrefs); if (IS_ERR(ce)) @@ -1448,30 +1443,28 @@ out: */ static void refresh_cache_worker(struct work_struct *work) { - struct dfs_cache *dc = container_of(work, struct dfs_cache, - dc_refresh.work); - struct dfs_cache_vol_info *vi; + struct vol_info *vi; struct TCP_Server_Info *server; LIST_HEAD(list); struct cifs_tcon *tcon, *ntcon;
- mutex_lock(&dc->dc_lock); + mutex_lock(&vol_lock);
- list_for_each_entry(vi, &dc->dc_vol_list, vi_list) { - server = cifs_find_tcp_session(&vi->vi_vol); + list_for_each_entry(vi, &vol_list, list) { + server = cifs_find_tcp_session(&vi->smb_vol); if (IS_ERR_OR_NULL(server)) continue; if (server->tcpStatus != CifsGood) goto next; get_tcons(server, &list); list_for_each_entry_safe(tcon, ntcon, &list, ulist) { - do_refresh_tcon(dc, vi, tcon); + refresh_tcon(vi, tcon); list_del_init(&tcon->ulist); cifs_put_tcon(tcon); } next: cifs_put_tcp_session(server, 0); } - queue_delayed_work(cifsiod_wq, &dc->dc_refresh, dc->dc_ttl * HZ); - mutex_unlock(&dc->dc_lock); + queue_delayed_work(dfscache_wq, &refresh_task, cache_ttl * HZ); + mutex_unlock(&vol_lock); }
From: "Paulo Alcantara (SUSE)" pc@cjr.nz
commit 199c6bdfb04b71d88a7765e08285885fbca60df4 upstream.
The DFS cache API is mostly used with heap allocated strings.
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/dfs_cache.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/cifs/dfs_cache.c +++ b/fs/cifs/dfs_cache.c @@ -131,7 +131,7 @@ static inline void flush_cache_ent(struc return;
hlist_del_init_rcu(&ce->hlist); - kfree_const(ce->path); + kfree(ce->path); free_tgts(ce); cache_count--; call_rcu(&ce->rcu, free_cache_entry); @@ -420,7 +420,7 @@ static struct cache_entry *alloc_cache_e if (!ce) return ERR_PTR(-ENOMEM);
- ce->path = kstrdup_const(path, GFP_KERNEL); + ce->path = kstrndup(path, strlen(path), GFP_KERNEL); if (!ce->path) { kmem_cache_free(cache_slab, ce); return ERR_PTR(-ENOMEM); @@ -430,7 +430,7 @@ static struct cache_entry *alloc_cache_e
rc = copy_ref_data(refs, numrefs, ce, NULL); if (rc) { - kfree_const(ce->path); + kfree(ce->path); kmem_cache_free(cache_slab, ce); ce = ERR_PTR(rc); }
From: "Paulo Alcantara (SUSE)" pc@cjr.nz
commit 345c1a4a9e09dc5842b7bbb6728a77910db69c52 upstream.
Add helpers for finding TCP connections that are good candidates for being used by DFS refresh worker.
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/dfs_cache.c | 44 +++++++++++++++++++++++++++++++------------- 1 file changed, 31 insertions(+), 13 deletions(-)
--- a/fs/cifs/dfs_cache.c +++ b/fs/cifs/dfs_cache.c @@ -1305,6 +1305,30 @@ static char *get_dfs_root(const char *pa return npath; }
+static inline void put_tcp_server(struct TCP_Server_Info *server) +{ + cifs_put_tcp_session(server, 0); +} + +static struct TCP_Server_Info *get_tcp_server(struct smb_vol *vol) +{ + struct TCP_Server_Info *server; + + server = cifs_find_tcp_session(vol); + if (IS_ERR_OR_NULL(server)) + return NULL; + + spin_lock(&GlobalMid_Lock); + if (server->tcpStatus != CifsGood) { + spin_unlock(&GlobalMid_Lock); + put_tcp_server(server); + return NULL; + } + spin_unlock(&GlobalMid_Lock); + + return server; +} + /* Find root SMB session out of a DFS link path */ static struct cifs_ses *find_root_ses(struct vol_info *vi, struct cifs_tcon *tcon, @@ -1347,13 +1371,8 @@ static struct cifs_ses *find_root_ses(st goto out; }
- server = cifs_find_tcp_session(&vol); - if (IS_ERR_OR_NULL(server)) { - ses = ERR_PTR(-EHOSTDOWN); - goto out; - } - if (server->tcpStatus != CifsGood) { - cifs_put_tcp_session(server, 0); + server = get_tcp_server(&vol); + if (!server) { ses = ERR_PTR(-EHOSTDOWN); goto out; } @@ -1451,19 +1470,18 @@ static void refresh_cache_worker(struct mutex_lock(&vol_lock);
list_for_each_entry(vi, &vol_list, list) { - server = cifs_find_tcp_session(&vi->smb_vol); - if (IS_ERR_OR_NULL(server)) + server = get_tcp_server(&vi->smb_vol); + if (!server) continue; - if (server->tcpStatus != CifsGood) - goto next; + get_tcons(server, &list); list_for_each_entry_safe(tcon, ntcon, &list, ulist) { refresh_tcon(vi, tcon); list_del_init(&tcon->ulist); cifs_put_tcon(tcon); } -next: - cifs_put_tcp_session(server, 0); + + put_tcp_server(server); } queue_delayed_work(dfscache_wq, &refresh_task, cache_ttl * HZ); mutex_unlock(&vol_lock);
From: "Paulo Alcantara (SUSE)" pc@cjr.nz
commit ff2f7fc08268f266372c30a815349749e8499eb5 upstream.
Just do the trivial path validation in get_normalized_path().
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/dfs_cache.c | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-)
--- a/fs/cifs/dfs_cache.c +++ b/fs/cifs/dfs_cache.c @@ -75,13 +75,11 @@ static void refresh_cache_worker(struct
static DECLARE_DELAYED_WORK(refresh_task, refresh_cache_worker);
-static inline bool is_path_valid(const char *path) +static int get_normalized_path(const char *path, char **npath) { - return path && (strchr(path + 1, '\') || strchr(path + 1, '/')); -} + if (!path || strlen(path) < 3 || (*path != '\' && *path != '/')) + return -EINVAL;
-static inline int get_normalized_path(const char *path, char **npath) -{ if (*path == '\') { *npath = (char *)path; } else { @@ -828,9 +826,6 @@ int dfs_cache_find(const unsigned int xi char *npath; struct cache_entry *ce;
- if (unlikely(!is_path_valid(path))) - return -EINVAL; - rc = get_normalized_path(path, &npath); if (rc) return rc; @@ -875,9 +870,6 @@ int dfs_cache_noreq_find(const char *pat char *npath; struct cache_entry *ce;
- if (unlikely(!is_path_valid(path))) - return -EINVAL; - rc = get_normalized_path(path, &npath); if (rc) return rc; @@ -929,9 +921,6 @@ int dfs_cache_update_tgthint(const unsig struct cache_entry *ce; struct cache_dfs_tgt *t;
- if (unlikely(!is_path_valid(path))) - return -EINVAL; - rc = get_normalized_path(path, &npath); if (rc) return rc; @@ -989,7 +978,7 @@ int dfs_cache_noreq_update_tgthint(const struct cache_entry *ce; struct cache_dfs_tgt *t;
- if (unlikely(!is_path_valid(path)) || !it) + if (!it) return -EINVAL;
rc = get_normalized_path(path, &npath); @@ -1049,8 +1038,6 @@ int dfs_cache_get_tgt_referral(const cha
if (!it || !ref) return -EINVAL; - if (unlikely(!is_path_valid(path))) - return -EINVAL;
rc = get_normalized_path(path, &npath); if (rc)
From: "Paulo Alcantara (SUSE)" pc@cjr.nz
commit 06d57378bcc9b2c33640945174842115593795d1 upstream.
We can't acquire volume lock while refreshing the DFS cache because cifs_reconnect() may call dfs_cache_update_vol() while we are walking through the volume list.
To prevent that, make vol_info refcounted, create a temp list with all volumes eligible for refreshing, and then use it without any locks held.
Besides, replace vol_lock with a spinlock and protect cache_ttl from concurrent accesses or changes.
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/dfs_cache.c | 109 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 77 insertions(+), 32 deletions(-)
--- a/fs/cifs/dfs_cache.c +++ b/fs/cifs/dfs_cache.c @@ -49,15 +49,20 @@ struct cache_entry {
struct vol_info { char *fullpath; + spinlock_t smb_vol_lock; struct smb_vol smb_vol; char *mntdata; struct list_head list; + struct list_head rlist; + struct kref refcnt; };
static struct kmem_cache *cache_slab __read_mostly; static struct workqueue_struct *dfscache_wq __read_mostly;
static int cache_ttl; +static DEFINE_SPINLOCK(cache_ttl_lock); + static struct nls_table *cache_nlsc;
/* @@ -69,7 +74,7 @@ static struct hlist_head cache_htable[CA static DEFINE_MUTEX(list_lock);
static LIST_HEAD(vol_list); -static DEFINE_MUTEX(vol_lock); +static DEFINE_SPINLOCK(vol_list_lock);
static void refresh_cache_worker(struct work_struct *work);
@@ -300,7 +305,6 @@ int dfs_cache_init(void) for (i = 0; i < CACHE_HTABLE_SIZE; i++) INIT_HLIST_HEAD(&cache_htable[i]);
- cache_ttl = -1; cache_nlsc = load_nls_default();
cifs_dbg(FYI, "%s: initialized DFS referral cache\n", __func__); @@ -471,15 +475,15 @@ add_cache_entry(unsigned int hash, const
hlist_add_head_rcu(&ce->hlist, &cache_htable[hash]);
- mutex_lock(&vol_lock); - if (cache_ttl < 0) { + spin_lock(&cache_ttl_lock); + if (!cache_ttl) { cache_ttl = ce->ttl; queue_delayed_work(dfscache_wq, &refresh_task, cache_ttl * HZ); } else { cache_ttl = min_t(int, cache_ttl, ce->ttl); mod_delayed_work(dfscache_wq, &refresh_task, cache_ttl * HZ); } - mutex_unlock(&vol_lock); + spin_unlock(&cache_ttl_lock);
return ce; } @@ -523,21 +527,32 @@ static inline void destroy_slab_cache(vo kmem_cache_destroy(cache_slab); }
-static inline void free_vol(struct vol_info *vi) +static void __vol_release(struct vol_info *vi) { - list_del(&vi->list); kfree(vi->fullpath); kfree(vi->mntdata); cifs_cleanup_volume_info_contents(&vi->smb_vol); kfree(vi); }
+static void vol_release(struct kref *kref) +{ + struct vol_info *vi = container_of(kref, struct vol_info, refcnt); + + spin_lock(&vol_list_lock); + list_del(&vi->list); + spin_unlock(&vol_list_lock); + __vol_release(vi); +} + static inline void free_vol_list(void) { struct vol_info *vi, *nvi;
- list_for_each_entry_safe(vi, nvi, &vol_list, list) - free_vol(vi); + list_for_each_entry_safe(vi, nvi, &vol_list, list) { + list_del_init(&vi->list); + __vol_release(vi); + } }
/** @@ -1156,10 +1171,13 @@ int dfs_cache_add_vol(char *mntdata, str goto err_free_fullpath;
vi->mntdata = mntdata; + spin_lock_init(&vi->smb_vol_lock); + kref_init(&vi->refcnt);
- mutex_lock(&vol_lock); + spin_lock(&vol_list_lock); list_add_tail(&vi->list, &vol_list); - mutex_unlock(&vol_lock); + spin_unlock(&vol_list_lock); + return 0;
err_free_fullpath: @@ -1169,7 +1187,8 @@ err_free_vi: return rc; }
-static inline struct vol_info *find_vol(const char *fullpath) +/* Must be called with vol_list_lock held */ +static struct vol_info *find_vol(const char *fullpath) { struct vol_info *vi;
@@ -1191,7 +1210,6 @@ static inline struct vol_info *find_vol( */ int dfs_cache_update_vol(const char *fullpath, struct TCP_Server_Info *server) { - int rc; struct vol_info *vi;
if (!fullpath || !server) @@ -1199,22 +1217,24 @@ int dfs_cache_update_vol(const char *ful
cifs_dbg(FYI, "%s: fullpath: %s\n", __func__, fullpath);
- mutex_lock(&vol_lock); - + spin_lock(&vol_list_lock); vi = find_vol(fullpath); if (IS_ERR(vi)) { - rc = PTR_ERR(vi); - goto out; + spin_unlock(&vol_list_lock); + return PTR_ERR(vi); } + kref_get(&vi->refcnt); + spin_unlock(&vol_list_lock);
cifs_dbg(FYI, "%s: updating volume info\n", __func__); + spin_lock(&vi->smb_vol_lock); memcpy(&vi->smb_vol.dstaddr, &server->dstaddr, sizeof(vi->smb_vol.dstaddr)); - rc = 0; + spin_unlock(&vi->smb_vol_lock);
-out: - mutex_unlock(&vol_lock); - return rc; + kref_put(&vi->refcnt, vol_release); + + return 0; }
/** @@ -1231,11 +1251,11 @@ void dfs_cache_del_vol(const char *fullp
cifs_dbg(FYI, "%s: fullpath: %s\n", __func__, fullpath);
- mutex_lock(&vol_lock); + spin_lock(&vol_list_lock); vi = find_vol(fullpath); - if (!IS_ERR(vi)) - free_vol(vi); - mutex_unlock(&vol_lock); + spin_unlock(&vol_list_lock); + + kref_put(&vi->refcnt, vol_release); }
/* Get all tcons that are within a DFS namespace and can be refreshed */ @@ -1449,27 +1469,52 @@ out: */ static void refresh_cache_worker(struct work_struct *work) { - struct vol_info *vi; + struct vol_info *vi, *nvi; struct TCP_Server_Info *server; - LIST_HEAD(list); + LIST_HEAD(vols); + LIST_HEAD(tcons); struct cifs_tcon *tcon, *ntcon;
- mutex_lock(&vol_lock); - + /* + * Find SMB volumes that are eligible (server->tcpStatus == CifsGood) + * for refreshing. + */ + spin_lock(&vol_list_lock); list_for_each_entry(vi, &vol_list, list) { server = get_tcp_server(&vi->smb_vol); if (!server) continue;
- get_tcons(server, &list); - list_for_each_entry_safe(tcon, ntcon, &list, ulist) { + kref_get(&vi->refcnt); + list_add_tail(&vi->rlist, &vols); + put_tcp_server(server); + } + spin_unlock(&vol_list_lock); + + /* Walk through all TCONs and refresh any expired cache entry */ + list_for_each_entry_safe(vi, nvi, &vols, rlist) { + spin_lock(&vi->smb_vol_lock); + server = get_tcp_server(&vi->smb_vol); + spin_unlock(&vi->smb_vol_lock); + + if (!server) + goto next_vol; + + get_tcons(server, &tcons); + list_for_each_entry_safe(tcon, ntcon, &tcons, ulist) { refresh_tcon(vi, tcon); list_del_init(&tcon->ulist); cifs_put_tcon(tcon); }
put_tcp_server(server); + +next_vol: + list_del_init(&vi->rlist); + kref_put(&vi->refcnt, vol_release); } + + spin_lock(&cache_ttl_lock); queue_delayed_work(dfscache_wq, &refresh_task, cache_ttl * HZ); - mutex_unlock(&vol_lock); + spin_unlock(&cache_ttl_lock); }
From: Lee Jones lee@kernel.org
commit d082d48737c75d2b3cc1f972b8c8674c25131534 upstream.
KPTI keeps around two PGDs: one for userspace and another for the kernel. Among other things, set_pgd() contains infrastructure to ensure that updates to the kernel PGD are reflected in the user PGD as well.
One side-effect of this is that set_pgd() expects to be passed whole pages. Unfortunately, init_trampoline_kaslr() passes in a single entry: 'trampoline_pgd_entry'.
When KPTI is on, set_pgd() will update 'trampoline_pgd_entry' (an 8-Byte globally stored [.bss] variable) and will then proceed to replicate that value into the non-existent neighboring user page (located +4k away), leading to the corruption of other global [.bss] stored variables.
Fix it by directly assigning 'trampoline_pgd_entry' and avoiding set_pgd().
[ dhansen: tweak subject and changelog ]
Fixes: 0925dda5962e ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline") Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230614163859.924309-1-lee@kernel.org/g Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/kaslr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -182,11 +182,11 @@ static void __meminit init_trampoline_pu set_p4d(p4d_tramp, __p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
- set_pgd(&trampoline_pgd_entry, - __pgd(_KERNPG_TABLE | __pa(p4d_page_tramp))); + trampoline_pgd_entry = + __pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)); } else { - set_pgd(&trampoline_pgd_entry, - __pgd(_KERNPG_TABLE | __pa(pud_page_tramp))); + trampoline_pgd_entry = + __pgd(_KERNPG_TABLE | __pa(pud_page_tramp)); } }
From: Paul E. McKenney paulmck@kernel.org
[ Upstream commit a63fc6b75cca984c71f095282e0227a390ba88f3 ]
Although the rcu_swap_protected() macro follows the example of swap(), the interactions with RCU make its update of its argument somewhat counter-intuitive. This commit therefore introduces an rcu_replace_pointer() that returns the old value of the RCU pointer instead of doing the argument update. Once all the uses of rcu_swap_protected() are updated to instead use rcu_replace_pointer(), rcu_swap_protected() will be removed.
Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4g... Reported-by: Linus Torvalds torvalds@linux-foundation.org [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ] Signed-off-by: Paul E. McKenney paulmck@kernel.org Cc: Bart Van Assche bart.vanassche@wdc.com Cc: Christoph Hellwig hch@lst.de Cc: Hannes Reinecke hare@suse.de Cc: Johannes Thumshirn jthumshirn@suse.de Cc: Shane M Seymour shane.seymour@hpe.com Cc: Martin K. Petersen martin.petersen@oracle.com Stable-dep-of: a61675294735 ("ieee802154: hwsim: Fix possible memory leaks") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/rcupdate.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 09e6ac4b669b2..c75b38ba4a728 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -384,6 +384,24 @@ do { \ smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \ } while (0)
+/** + * rcu_replace_pointer() - replace an RCU pointer, returning its old value + * @rcu_ptr: RCU pointer, whose old value is returned + * @ptr: regular pointer + * @c: the lockdep conditions under which the dereference will take place + * + * Perform a replacement, where @rcu_ptr is an RCU-annotated + * pointer and @c is the lockdep argument that is passed to the + * rcu_dereference_protected() call used to read that pointer. The old + * value of @rcu_ptr is returned, and @rcu_ptr is set to @ptr. + */ +#define rcu_replace_pointer(rcu_ptr, ptr, c) \ +({ \ + typeof(ptr) __tmp = rcu_dereference_protected((rcu_ptr), (c)); \ + rcu_assign_pointer((rcu_ptr), (ptr)); \ + __tmp; \ +}) + /** * rcu_swap_protected() - swap an RCU and a regular pointer * @rcu_ptr: RCU pointer
From: Chen Aotian chenaotian2@163.com
[ Upstream commit a61675294735570daca3779bd1dbb3715f7232bd ]
After replacing e->info, it is necessary to free the old einfo.
Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com Reviewed-by: Alexander Aring aahringo@redhat.com Signed-off-by: Chen Aotian chenaotian2@163.com Link: https://lore.kernel.org/r/20230409022048.61223-1-chenaotian2@163.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/mac802154_hwsim.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ieee802154/mac802154_hwsim.c b/drivers/net/ieee802154/mac802154_hwsim.c index 1d181eff0c299..4028cbe275d67 100644 --- a/drivers/net/ieee802154/mac802154_hwsim.c +++ b/drivers/net/ieee802154/mac802154_hwsim.c @@ -522,7 +522,7 @@ static int hwsim_del_edge_nl(struct sk_buff *msg, struct genl_info *info) static int hwsim_set_edge_lqi(struct sk_buff *msg, struct genl_info *info) { struct nlattr *edge_attrs[MAC802154_HWSIM_EDGE_ATTR_MAX + 1]; - struct hwsim_edge_info *einfo; + struct hwsim_edge_info *einfo, *einfo_old; struct hwsim_phy *phy_v0; struct hwsim_edge *e; u32 v0, v1; @@ -560,8 +560,10 @@ static int hwsim_set_edge_lqi(struct sk_buff *msg, struct genl_info *info) list_for_each_entry_rcu(e, &phy_v0->edges, list) { if (e->endpoint->idx == v1) { einfo->lqi = lqi; - rcu_assign_pointer(e->info, einfo); + einfo_old = rcu_replace_pointer(e->info, einfo, + lockdep_is_held(&hwsim_phys_lock)); rcu_read_unlock(); + kfree_rcu(einfo_old, rcu); mutex_unlock(&hwsim_phys_lock); return 0; }
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit f015b900bc3285322029b4a7d132d6aeb0e51857 ]
With offloading enabled, esp_xmit() gets invoked very late, from within validate_xmit_xfrm() which is after validate_xmit_skb() validates and linearizes the skb if the underlying device does not support fragments.
esp_output_tail() may add a fragment to the skb while adding the auth tag/ IV. Devices without the proper support will then send skb->data points to with the correct length so the packet will have garbage at the end. A pcap sniffer will claim that the proper data has been sent since it parses the skb properly.
It is not affected with INET_ESP_OFFLOAD disabled.
Linearize the skb after offloading if the sending hardware requires it. It was tested on v4, v6 has been adopted.
Fixes: 7785bba299a8d ("esp: Add a software GRO codepath") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/esp4_offload.c | 3 +++ net/ipv6/esp6_offload.c | 3 +++ 2 files changed, 6 insertions(+)
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c index 8c0af30fb0679..5cd219d7e7466 100644 --- a/net/ipv4/esp4_offload.c +++ b/net/ipv4/esp4_offload.c @@ -283,6 +283,9 @@ static int esp_xmit(struct xfrm_state *x, struct sk_buff *skb, netdev_features_
secpath_reset(skb);
+ if (skb_needs_linearize(skb, skb->dev->features) && + __skb_linearize(skb)) + return -ENOMEM; return 0; }
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c index 1c532638b2adf..e19c1844276f8 100644 --- a/net/ipv6/esp6_offload.c +++ b/net/ipv6/esp6_offload.c @@ -314,6 +314,9 @@ static int esp6_xmit(struct xfrm_state *x, struct sk_buff *skb, netdev_features
secpath_reset(skb);
+ if (skb_needs_linearize(skb, skb->dev->features) && + __skb_linearize(skb)) + return -ENOMEM; return 0; }
From: Stefan Wahren stefan.wahren@i2se.com
[ Upstream commit 92717c2356cb62c89e8a3dc37cbbab2502562524 ]
In case the QCA7000 is not available via SPI (e.g. in reset), the driver will cause a high load. The reason for this is that the synchronization is never finished and schedule() is never called. Since the synchronization is not timing critical, it's safe to drop this from the scheduling condition.
Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qualcomm/qca_spi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 15591ad5fe4ea..db6817de24a14 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -574,8 +574,7 @@ qcaspi_spi_thread(void *data) while (!kthread_should_stop()) { set_current_state(TASK_INTERRUPTIBLE); if ((qca->intr_req == qca->intr_svc) && - (qca->txr.skb[qca->txr.head] == NULL) && - (qca->sync == QCASPI_SYNC_READY)) + !qca->txr.skb[qca->txr.head]) schedule();
set_current_state(TASK_RUNNING);
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 0c4dc0f054891a2cbde0426b0c0fdf232d89f47f ]
The driver overrides the error codes returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 208489032bdd ("mmc: mediatek: Add Mediatek MMC driver") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-4-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/mtk-sd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c index 1254a5650cfff..2673890c76900 100644 --- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -2249,7 +2249,7 @@ static int msdc_drv_probe(struct platform_device *pdev)
host->irq = platform_get_irq(pdev, 0); if (host->irq < 0) { - ret = -EINVAL; + ret = host->irq; goto host_free; }
From: Yangtao Li tiny.windzz@gmail.com
[ Upstream commit 0a337eb168d6cbb85f6b4eb56d1be55e24c80452 ]
Use devm_platform_ioremap_resource() to simplify code.
Signed-off-by: Yangtao Li tiny.windzz@gmail.com Link: https://lore.kernel.org/r/20191215175120.3290-11-tiny.windzz@gmail.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Stable-dep-of: 8d84064da0d4 ("mmc: mvsdio: fix deferred probing") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/mvsdio.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/mmc/host/mvsdio.c b/drivers/mmc/host/mvsdio.c index 74a0a7fbbf7fd..203b617126014 100644 --- a/drivers/mmc/host/mvsdio.c +++ b/drivers/mmc/host/mvsdio.c @@ -696,16 +696,14 @@ static int mvsd_probe(struct platform_device *pdev) struct mmc_host *mmc = NULL; struct mvsd_host *host = NULL; const struct mbus_dram_target_info *dram; - struct resource *r; int ret, irq;
if (!np) { dev_err(&pdev->dev, "no DT node\n"); return -ENODEV; } - r = platform_get_resource(pdev, IORESOURCE_MEM, 0); irq = platform_get_irq(pdev, 0); - if (!r || irq < 0) + if (irq < 0) return -ENXIO;
mmc = mmc_alloc_host(sizeof(struct mvsd_host), &pdev->dev); @@ -758,7 +756,7 @@ static int mvsd_probe(struct platform_device *pdev)
spin_lock_init(&host->lock);
- host->base = devm_ioremap_resource(&pdev->dev, r); + host->base = devm_platform_ioremap_resource(pdev, 0); if (IS_ERR(host->base)) { ret = PTR_ERR(host->base); goto out;
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 8d84064da0d4672e74f984e8710f27881137472c ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-5-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/mvsdio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/mvsdio.c b/drivers/mmc/host/mvsdio.c index 203b617126014..0dfcf7bea9ffa 100644 --- a/drivers/mmc/host/mvsdio.c +++ b/drivers/mmc/host/mvsdio.c @@ -704,7 +704,7 @@ static int mvsd_probe(struct platform_device *pdev) } irq = platform_get_irq(pdev, 0); if (irq < 0) - return -ENXIO; + return irq;
mmc = mmc_alloc_host(sizeof(struct mvsd_host), &pdev->dev); if (!mmc) {
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit aedf4ba1ad00aaa94c1b66c73ecaae95e2564b95 ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-6-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/omap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c index d74e73c95fdff..80574040d8fb3 100644 --- a/drivers/mmc/host/omap.c +++ b/drivers/mmc/host/omap.c @@ -1344,7 +1344,7 @@ static int mmc_omap_probe(struct platform_device *pdev)
irq = platform_get_irq(pdev, 0); if (irq < 0) - return -ENXIO; + return irq;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); host->virt_base = devm_ioremap_resource(&pdev->dev, res);
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit fb51b74a57859b707c3e8055ed0c25a7ca4f6a29 ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-7-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/omap_hsmmc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c index ee9edf817a326..aef2253ed5c81 100644 --- a/drivers/mmc/host/omap_hsmmc.c +++ b/drivers/mmc/host/omap_hsmmc.c @@ -1843,9 +1843,11 @@ static int omap_hsmmc_probe(struct platform_device *pdev) }
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - irq = platform_get_irq(pdev, 0); - if (res == NULL || irq < 0) + if (!res) return -ENXIO; + irq = platform_get_irq(pdev, 0); + if (irq < 0) + return irq;
base = devm_ioremap_resource(&pdev->dev, res); if (IS_ERR(base))
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit b465dea5e1540c7d7b5211adaf94926980d3014b ]
The driver overrides the error codes returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 1b7ba57ecc86 ("mmc: sdhci-acpi: Handle return value of platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Acked-by: Adrian Hunter adrian.hunter@intel.com Link: https://lore.kernel.org/r/20230617203622.6812-9-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/sdhci-acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/sdhci-acpi.c b/drivers/mmc/host/sdhci-acpi.c index 0dc8eafdc81d0..a02f3538d5611 100644 --- a/drivers/mmc/host/sdhci-acpi.c +++ b/drivers/mmc/host/sdhci-acpi.c @@ -835,7 +835,7 @@ static int sdhci_acpi_probe(struct platform_device *pdev) host->ops = &sdhci_acpi_ops_dflt; host->irq = platform_get_irq(pdev, 0); if (host->irq < 0) { - err = -EINVAL; + err = host->irq; goto err_free; }
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 5b067d7f855c61df7f8e2e8ccbcee133c282415e ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-11-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/sh_mmcif.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c index 98c575de43c75..ed18c233c7864 100644 --- a/drivers/mmc/host/sh_mmcif.c +++ b/drivers/mmc/host/sh_mmcif.c @@ -1395,7 +1395,7 @@ static int sh_mmcif_probe(struct platform_device *pdev) irq[0] = platform_get_irq(pdev, 0); irq[1] = platform_get_irq_optional(pdev, 1); if (irq[0] < 0) - return -ENXIO; + return irq[0];
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); reg = devm_ioremap_resource(dev, res);
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 413db499730248431c1005b392e8ed82c4fa19bf ]
The driver overrides the error codes returned by platform_get_irq_byname() to -ENODEV, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-13-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/usdhi6rol0.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/host/usdhi6rol0.c b/drivers/mmc/host/usdhi6rol0.c index 96b0f81a20322..56d131183c1e4 100644 --- a/drivers/mmc/host/usdhi6rol0.c +++ b/drivers/mmc/host/usdhi6rol0.c @@ -1743,8 +1743,10 @@ static int usdhi6_probe(struct platform_device *pdev) irq_cd = platform_get_irq_byname(pdev, "card detect"); irq_sd = platform_get_irq_byname(pdev, "data"); irq_sdio = platform_get_irq_byname(pdev, "SDIO"); - if (irq_sd < 0 || irq_sdio < 0) - return -ENODEV; + if (irq_sd < 0) + return irq_sd; + if (irq_sdio < 0) + return irq_sdio;
mmc = mmc_alloc_host(sizeof(struct usdhi6_host), dev); if (!mmc)
From: Terin Stock terin@cloudflare.com
[ Upstream commit d7fce52fdf96663ddc2eb21afecff3775588612a ]
When using encapsulation the original packet's headers are copied to the inner headers. This preserves the space for an inner mac header, which is not used by the inner payloads for the encapsulation types supported by IPVS. If a packet is using GUE or GRE encapsulation and needs to be segmented, flow can be passed to __skb_udp_tunnel_segment() which calculates a negative tunnel header length. A negative tunnel header length causes pskb_may_pull() to fail, dropping the packet.
This can be observed by attaching probes to ip_vs_in_hook(), __dev_queue_xmit(), and __skb_udp_tunnel_segment():
perf probe --add '__dev_queue_xmit skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header' perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen' perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header'
These probes the headers and tunnel header length for packets which traverse the IPVS encapsulation path. A TCP packet can be forced into the segmentation path by being smaller than a calculated clamped MSS, but larger than the advertised MSS.
probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52 probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
When using veth-based encapsulation, the interfaces are set to be mac-less, which does not preserve space for an inner mac header. This prevents this issue from occurring.
In our real-world testing of sending a 32KB file we observed operation time increasing from ~75ms for veth-based encapsulation to over 1.5s using IPVS encapsulation due to retries from dropped packets.
This changeset modifies the packet on the encapsulation path in ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac header offset. This fixes UDP segmentation for both encapsulation types, and corrects the inner headers for any IPIP flows that may use it.
Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation") Signed-off-by: Terin Stock terin@cloudflare.com Acked-by: Julian Anastasov ja@ssi.bg Acked-by: Simon Horman horms@kernel.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_xmit.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index cefc39878b1a4..43ef3e25ea7d9 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -1231,6 +1231,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, skb->transport_header = skb->network_header;
skb_set_inner_ipproto(skb, next_protocol); + skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) { bool check = false; @@ -1379,6 +1380,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, skb->transport_header = skb->network_header;
skb_set_inner_ipproto(skb, next_protocol); + skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) { bool check = false;
From: Arınç ÜNAL arinc.unal@arinc9.com
[ Upstream commit 4ae90f90e4909e3014e2dc6a0627964617a7b824 ]
All MT7530 switch IP variants share the MT7530_MFC register, but the current driver only writes it for the switch variant that is integrated in the MT7621 SoC. Modify the code to include all MT7530 derivatives.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Suggested-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index baa994b7f78b5..935004f5f5fe7 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -644,7 +644,7 @@ mt7530_cpu_port_enable(struct mt7530_priv *priv, mt7530_rmw(priv, MT7530_MFC, UNM_FFP_MASK, UNM_FFP(BIT(port)));
/* Set CPU port number */ - if (priv->id == ID_MT7621) + if (priv->id == ID_MT7530 || priv->id == ID_MT7621) mt7530_rmw(priv, MT7530_MFC, CPU_MASK, CPU_EN | CPU_PORT(port));
/* CPU port gets connected to all user ports of
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit 7580e0a78eb29e7bb1a772eba4088250bbb70d41 ]
We have seen a bug where the NIC incorrectly changes the length in the IP header of a padded packet to include the padding bytes. The driver already has a workaround for this so do the workaround for this NIC too. This resolves the issue.
The NIC in question identifies itself as follows:
[ 8.828494] be2net 0000:02:00.0: FW version is 10.7.110.31 [ 8.834759] be2net 0000:02:00.0: Emulex OneConnect(be3): PF FLEX10 port 1
02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
Fixes: ca34fe38f06d ("be2net: fix wrong usage of adapter->generation") Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Link: https://lore.kernel.org/r/20230616164549.2863037-1-ross.lagerwall@citrix.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/emulex/benet/be_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 552877590a8ab..f1cce7636722e 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -1137,8 +1137,8 @@ static struct sk_buff *be_lancer_xmit_workarounds(struct be_adapter *adapter, eth_hdr_len = ntohs(skb->protocol) == ETH_P_8021Q ? VLAN_ETH_HLEN : ETH_HLEN; if (skb->len <= 60 && - (lancer_chip(adapter) || skb_vlan_tag_present(skb)) && - is_ipv4_pkt(skb)) { + (lancer_chip(adapter) || BE3_chip(adapter) || + skb_vlan_tag_present(skb)) && is_ipv4_pkt(skb)) { ip = (struct iphdr *)ip_hdr(skb); pskb_trim(skb, eth_hdr_len + ntohs(ip->tot_len)); }
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit c88c535b592d3baeee74009f3eceeeaf0fdd5e1b ]
Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API allows to create anonymous sets without NFT_SET_CONSTANT, it makes no sense to allow to add and to delete elements for bound anonymous sets.
Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 909076ef157e8..914fbd9ecef96 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4804,7 +4804,8 @@ static int nf_tables_newsetelem(struct net *net, struct sock *nlsk, if (IS_ERR(set)) return PTR_ERR(set);
- if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT) + if (!list_empty(&set->bindings) && + (set->flags & (NFT_SET_CONSTANT | NFT_SET_ANONYMOUS))) return -EBUSY;
nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) { @@ -4987,7 +4988,9 @@ static int nf_tables_delsetelem(struct net *net, struct sock *nlsk, set = nft_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET], genmask); if (IS_ERR(set)) return PTR_ERR(set); - if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT) + + if (!list_empty(&set->bindings) && + (set->flags & (NFT_SET_CONSTANT | NFT_SET_ANONYMOUS))) return -EBUSY;
if (nla[NFTA_SET_ELEM_LIST_ELEMENTS] == NULL) {
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 62f9a68a36d4441a6c412b81faed102594bc6670 ]
Move the alias from xt_osf to nfnetlink_osf.
Fixes: f9324952088f ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink_osf.c | 1 + net/netfilter/xt_osf.c | 1 - 2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c index 51e3953b414c0..9dbaa5ce24e51 100644 --- a/net/netfilter/nfnetlink_osf.c +++ b/net/netfilter/nfnetlink_osf.c @@ -440,3 +440,4 @@ module_init(nfnl_osf_init); module_exit(nfnl_osf_fini);
MODULE_LICENSE("GPL"); +MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_OSF); diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c index e1990baf3a3b7..dc9485854002a 100644 --- a/net/netfilter/xt_osf.c +++ b/net/netfilter/xt_osf.c @@ -71,4 +71,3 @@ MODULE_AUTHOR("Evgeniy Polyakov zbr@ioremap.net"); MODULE_DESCRIPTION("Passive OS fingerprint matching."); MODULE_ALIAS("ipt_osf"); MODULE_ALIAS("ip6t_osf"); -MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_OSF);
From: Francesco Dolcini francesco.dolcini@toradex.com
[ Upstream commit a129b41fe0a8b4da828c46b10f5244ca07a3fec3 ]
This reverts commit da9ef50f545f86ffe6ff786174d26500c4db737a.
This fixes a regression in which the link would come up, but no communication was possible.
The reverted commit was also removing a comment about DP83867_PHYCR_FORCE_LINK_GOOD, this is not added back in this commits since it seems that this is unrelated to the original code change.
Closes: https://lore.kernel.org/all/ZGuDJos8D7N0J6Z2@francesco-nb.int.toradex.com/ Fixes: da9ef50f545f ("net: phy: dp83867: perform soft reset and retain established link") Signed-off-by: Francesco Dolcini francesco.dolcini@toradex.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Praneeth Bajjuri praneeth@ti.com Link: https://lore.kernel.org/r/20230619154435.355485-1-francesco@dolcini.it Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/dp83867.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c index c7d91415a4369..1375beb3cf83c 100644 --- a/drivers/net/phy/dp83867.c +++ b/drivers/net/phy/dp83867.c @@ -476,7 +476,7 @@ static int dp83867_phy_reset(struct phy_device *phydev) { int err;
- err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESTART); + err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESET); if (err < 0) return err;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 2174a08db80d1efeea382e25ac41c4e7511eb6d6 ]
syzbot managed to trigger a divide error [1] in netem.
It could happen if q->rate changes while netem_enqueue() is running, since q->rate is read twice.
It turns out netem_change() always lacked proper synchronization.
[1] divide error: 0000 [#1] SMP KASAN CPU: 1 PID: 7867 Comm: syz-executor.1 Not tainted 6.1.30-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 RIP: 0010:div64_u64 include/linux/math64.h:69 [inline] RIP: 0010:packet_time_ns net/sched/sch_netem.c:357 [inline] RIP: 0010:netem_enqueue+0x2067/0x36d0 net/sched/sch_netem.c:576 Code: 89 e2 48 69 da 00 ca 9a 3b 42 80 3c 28 00 4c 8b a4 24 88 00 00 00 74 0d 4c 89 e7 e8 c3 4f 3b fd 48 8b 4c 24 18 48 89 d8 31 d2 <49> f7 34 24 49 01 c7 4c 8b 64 24 48 4d 01 f7 4c 89 e3 48 c1 eb 03 RSP: 0018:ffffc9000dccea60 EFLAGS: 00010246 RAX: 000001a442624200 RBX: 000001a442624200 RCX: ffff888108a4f000 RDX: 0000000000000000 RSI: 000000000000070d RDI: 000000000000070d RBP: ffffc9000dcceb90 R08: ffffffff849c5e26 R09: fffffbfff10e1297 R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888108a4f358 R13: dffffc0000000000 R14: 0000001a8cd9a7ec R15: 0000000000000000 FS: 00007fa73fe18700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa73fdf7718 CR3: 000000011d36e000 CR4: 0000000000350ee0 Call Trace: <TASK> [<ffffffff84714385>] __dev_xmit_skb net/core/dev.c:3931 [inline] [<ffffffff84714385>] __dev_queue_xmit+0xcf5/0x3370 net/core/dev.c:4290 [<ffffffff84d22df2>] dev_queue_xmit include/linux/netdevice.h:3030 [inline] [<ffffffff84d22df2>] neigh_hh_output include/net/neighbour.h:531 [inline] [<ffffffff84d22df2>] neigh_output include/net/neighbour.h:545 [inline] [<ffffffff84d22df2>] ip_finish_output2+0xb92/0x10d0 net/ipv4/ip_output.c:235 [<ffffffff84d21e63>] __ip_finish_output+0xc3/0x2b0 [<ffffffff84d10a81>] ip_finish_output+0x31/0x2a0 net/ipv4/ip_output.c:323 [<ffffffff84d10f14>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff84d10f14>] ip_output+0x224/0x2a0 net/ipv4/ip_output.c:437 [<ffffffff84d123b5>] dst_output include/net/dst.h:444 [inline] [<ffffffff84d123b5>] ip_local_out net/ipv4/ip_output.c:127 [inline] [<ffffffff84d123b5>] __ip_queue_xmit+0x1425/0x2000 net/ipv4/ip_output.c:542 [<ffffffff84d12fdc>] ip_queue_xmit+0x4c/0x70 net/ipv4/ip_output.c:556
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Stephen Hemminger stephen@networkplumber.org Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jiri Pirko jiri@resnulli.us Reviewed-by: Jamal Hadi Salim jhs@mojatatu.com Reviewed-by: Simon Horman simon.horman@corigine.com Link: https://lore.kernel.org/r/20230620184425.1179809-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_netem.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index 1802f134aa407..69034c8cc3b86 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -969,6 +969,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, if (ret < 0) return ret;
+ sch_tree_lock(sch); /* backup q->clg and q->loss_model */ old_clg = q->clg; old_loss_model = q->loss_model; @@ -977,7 +978,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, ret = get_loss_clg(q, tb[TCA_NETEM_LOSS]); if (ret) { q->loss_model = old_loss_model; - return ret; + goto unlock; } } else { q->loss_model = CLG_RANDOM; @@ -1044,6 +1045,8 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, /* capping jitter to the range acceptable by tabledist() */ q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
+unlock: + sch_tree_unlock(sch); return ret;
get_table_failure: @@ -1053,7 +1056,8 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, */ q->clg = old_clg; q->loss_model = old_loss_model; - return ret; + + goto unlock; }
static int netem_init(struct Qdisc *sch, struct nlattr *opt,
From: Maurizio Lombardi mlombard@redhat.com
[ Upstream commit 2a737d3b8c792400118d6cf94958f559de9c5e59 ]
The tpg->np_login_sem is a semaphore that is used to serialize the login process when multiple login threads run concurrently against the same target portal group.
The iscsi_target_locate_portal() function finds the tpg, calls iscsit_access_np() against the np_login_sem semaphore and saves the tpg pointer in conn->tpg;
If iscsi_target_locate_portal() fails, the caller will check for the conn->tpg pointer and, if it's not NULL, then it will assume that iscsi_target_locate_portal() called iscsit_access_np() on the semaphore.
Make sure that conn->tpg gets initialized only if iscsit_access_np() was successful, otherwise iscsit_deaccess_np() may end up being called against a semaphore we never took, allowing more than one thread to access the same tpg.
Signed-off-by: Maurizio Lombardi mlombard@redhat.com Link: https://lore.kernel.org/r/20230508162219.1731964-4-mlombard@redhat.com Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/iscsi/iscsi_target_nego.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index e32d93b927428..4017464a5909b 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -1053,6 +1053,7 @@ int iscsi_target_locate_portal( iscsi_target_set_sock_callbacks(conn);
login->np = np; + conn->tpg = NULL;
login_req = (struct iscsi_login_req *) login->req; payload_length = ntoh24(login_req->dlength); @@ -1122,7 +1123,6 @@ int iscsi_target_locate_portal( */ sessiontype = strncmp(s_buf, DISCOVERY, 9); if (!sessiontype) { - conn->tpg = iscsit_global->discovery_tpg; if (!login->leading_connection) goto get_target;
@@ -1139,9 +1139,11 @@ int iscsi_target_locate_portal( * Serialize access across the discovery struct iscsi_portal_group to * process login attempt. */ + conn->tpg = iscsit_global->discovery_tpg; if (iscsit_access_np(np, conn->tpg) < 0) { iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR, ISCSI_LOGIN_STATUS_SVC_UNAVAILABLE); + conn->tpg = NULL; ret = -1; goto out; }
From: Denis Arefev arefev@swemel.ru
[ Upstream commit 16a9c24f24fbe4564284eb575b18cc20586b9270 ]
Added a variable check and transition in case of an error
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Denis Arefev arefev@swemel.ru Reviewed-by: Ping Cheng ping.cheng@wacom.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/wacom_sys.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index a93070f5b214c..36cb456709ed7 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -2419,8 +2419,13 @@ static int wacom_parse_and_register(struct wacom *wacom, bool wireless) goto fail_quirks; }
- if (features->device_type & WACOM_DEVICETYPE_WL_MONITOR) + if (features->device_type & WACOM_DEVICETYPE_WL_MONITOR) { error = hid_hw_open(hdev); + if (error) { + hid_err(hdev, "hw open failed\n"); + goto fail_quirks; + } + }
wacom_set_shared_values(wacom_wac); devres_close_group(&hdev->dev, wacom);
From: Marc Zyngier maz@kernel.org
[ Upstream commit 8d0f019e4c4f2ee2de81efd9bf1c27e9fb3c0460 ]
Add the missing Set/Way CMOs that apply to tagged memory.
Signed-off-by: Marc Zyngier maz@kernel.org Reviewed-by: Cornelia Huck cohuck@redhat.com Reviewed-by: Steven Price steven.price@arm.com Reviewed-by: Oliver Upton oliver.upton@linux.dev Link: https://lore.kernel.org/r/20230515204601.1270428-2-maz@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/include/asm/sysreg.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 5b3bdad66b27e..1ff93878c8132 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -102,8 +102,14 @@ #define SB_BARRIER_INSN __SYS_BARRIER_INSN(0, 7, 31)
#define SYS_DC_ISW sys_insn(1, 0, 7, 6, 2) +#define SYS_DC_IGSW sys_insn(1, 0, 7, 6, 4) +#define SYS_DC_IGDSW sys_insn(1, 0, 7, 6, 6) #define SYS_DC_CSW sys_insn(1, 0, 7, 10, 2) +#define SYS_DC_CGSW sys_insn(1, 0, 7, 10, 4) +#define SYS_DC_CGDSW sys_insn(1, 0, 7, 10, 6) #define SYS_DC_CISW sys_insn(1, 0, 7, 14, 2) +#define SYS_DC_CIGSW sys_insn(1, 0, 7, 14, 4) +#define SYS_DC_CIGDSW sys_insn(1, 0, 7, 14, 6)
#define SYS_OSDTRRX_EL1 sys_reg(2, 0, 0, 0, 2) #define SYS_MDCCINT_EL1 sys_reg(2, 0, 0, 2, 0)
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 73af6c7511038249cad3d5f3b44bf8d78ac0f499 ]
When a message was received the last_initiator is set to 0xff. This will force the signal free time for the next transmit to that for a new initiator. However, if a new transmit is already in progress, then don't set last_initiator, since that's the initiator of the current transmit. Overwriting this would cause the signal free time of a following transmit to be that of the new initiator instead of a next transmit.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/cec-adap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/cec/cec-adap.c b/drivers/media/cec/cec-adap.c index c665f7d20c448..4c1770b8128cb 100644 --- a/drivers/media/cec/cec-adap.c +++ b/drivers/media/cec/cec-adap.c @@ -1077,7 +1077,8 @@ void cec_received_msg_ts(struct cec_adapter *adap, mutex_lock(&adap->lock); dprintk(2, "%s: %*ph\n", __func__, msg->len, msg->msg);
- adap->last_initiator = 0xff; + if (!adap->transmit_in_progress) + adap->last_initiator = 0xff;
/* Check if this message was for us (directed or broadcast). */ if (!cec_msg_is_broadcast(msg))
From: Osama Muhammad osmtendev@gmail.com
[ Upstream commit 9b9e46aa07273ceb96866b2e812b46f1ee0b8d2f ]
This patch fixes the error checking in nfcsim.c. The DebugFS kernel API is developed in a way that the caller can safely ignore the errors that occur during the creation of DebugFS nodes.
Signed-off-by: Osama Muhammad osmtendev@gmail.com Reviewed-by: Simon Horman simon.horman@corigine.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/nfcsim.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/nfc/nfcsim.c b/drivers/nfc/nfcsim.c index dd27c85190d34..b42d386350b72 100644 --- a/drivers/nfc/nfcsim.c +++ b/drivers/nfc/nfcsim.c @@ -336,10 +336,6 @@ static struct dentry *nfcsim_debugfs_root; static void nfcsim_debugfs_init(void) { nfcsim_debugfs_root = debugfs_create_dir("nfcsim", NULL); - - if (!nfcsim_debugfs_root) - pr_err("Could not create debugfs entry\n"); - }
static void nfcsim_debugfs_remove(void)
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 016da9c65fec9f0e78c4909ed9a0f2d567af6775 ]
The "udc" pointer was never set in the probe() function so it will lead to a NULL dereference in udc_pci_remove() when we do:
usb_del_gadget_udc(&udc->gadget);
Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/ZG+A/dNpFWAlCChk@kili Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/amd5536udc_pci.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/usb/gadget/udc/amd5536udc_pci.c b/drivers/usb/gadget/udc/amd5536udc_pci.c index 362284057d307..a3d15c3fb82a9 100644 --- a/drivers/usb/gadget/udc/amd5536udc_pci.c +++ b/drivers/usb/gadget/udc/amd5536udc_pci.c @@ -171,6 +171,9 @@ static int udc_pci_probe( retval = -ENODEV; goto err_probe; } + + udc = dev; + return 0;
err_probe:
From: Vineeth Vijayan vneethv@linux.ibm.com
[ Upstream commit 89c0c62e947a01e7a36b54582fd9c9e346170255 ]
Currently, if the device is offline and all the channel paths are either configured or varied offline, the associated subchannel gets unregistered. Don't unregister the subchannel, instead unregister offline device.
Signed-off-by: Vineeth Vijayan vneethv@linux.ibm.com Reviewed-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/cio/device.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/cio/device.c b/drivers/s390/cio/device.c index 23e9227e60fd7..d7ca75efb49fb 100644 --- a/drivers/s390/cio/device.c +++ b/drivers/s390/cio/device.c @@ -1385,6 +1385,7 @@ void ccw_device_set_notoper(struct ccw_device *cdev) enum io_sch_action { IO_SCH_UNREG, IO_SCH_ORPH_UNREG, + IO_SCH_UNREG_CDEV, IO_SCH_ATTACH, IO_SCH_UNREG_ATTACH, IO_SCH_ORPH_ATTACH, @@ -1417,7 +1418,7 @@ static enum io_sch_action sch_get_action(struct subchannel *sch) } if ((sch->schib.pmcw.pam & sch->opm) == 0) { if (ccw_device_notify(cdev, CIO_NO_PATH) != NOTIFY_OK) - return IO_SCH_UNREG; + return IO_SCH_UNREG_CDEV; return IO_SCH_DISC; } if (device_is_disconnected(cdev)) @@ -1479,6 +1480,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process) case IO_SCH_ORPH_ATTACH: ccw_device_set_disconnected(cdev); break; + case IO_SCH_UNREG_CDEV: case IO_SCH_UNREG_ATTACH: case IO_SCH_UNREG: if (!cdev) @@ -1512,6 +1514,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process) if (rc) goto out; break; + case IO_SCH_UNREG_CDEV: case IO_SCH_UNREG_ATTACH: spin_lock_irqsave(sch->lock, flags); if (cdev->private->flags.resuming) {
From: Edson Juliano Drosdeck edson.drosdeck@gmail.com
[ Upstream commit e384dba03e3294ce7ea69e4da558e9bf8f0e8946 ]
Add entries for Positivo laptops: CW14Q01P, K1424G, N14ZP74G to the DMI table, so that active-high jack-detect will work properly on these laptops.
Signed-off-by: Edson Juliano Drosdeck edson.drosdeck@gmail.com Link: https://lore.kernel.org/r/20230529181911.632851-1-edson.drosdeck@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/nau8824.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/sound/soc/codecs/nau8824.c b/sound/soc/codecs/nau8824.c index a95fe3fff1db8..9b22219a76937 100644 --- a/sound/soc/codecs/nau8824.c +++ b/sound/soc/codecs/nau8824.c @@ -1896,6 +1896,30 @@ static const struct dmi_system_id nau8824_quirk_table[] = { }, .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH), }, + { + /* Positivo CW14Q01P */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"), + DMI_MATCH(DMI_BOARD_NAME, "CW14Q01P"), + }, + .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH), + }, + { + /* Positivo K1424G */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"), + DMI_MATCH(DMI_BOARD_NAME, "K1424G"), + }, + .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH), + }, + { + /* Positivo N14ZP74G */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"), + DMI_MATCH(DMI_BOARD_NAME, "N14ZP74G"), + }, + .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH), + }, {} };
From: Linus Walleij linus.walleij@linaro.org
[ Upstream commit 4a672d500bfd6bb87092c33d5a2572c3d0a1cf83 ]
Several device tree files get the polarity of the pendown-gpios wrong: this signal is active low. Fix up all incorrect flags, so that operating systems can rely on the flag being correctly set.
Signed-off-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20230510105156.1134320-1-linus.walleij@linaro.org Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/am57xx-cl-som-am57x.dts | 2 +- arch/arm/boot/dts/at91sam9261ek.dts | 2 +- arch/arm/boot/dts/imx7d-pico-hobbit.dts | 2 +- arch/arm/boot/dts/imx7d-sdb.dts | 2 +- arch/arm/boot/dts/omap3-cm-t3x.dtsi | 2 +- arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi | 2 +- arch/arm/boot/dts/omap3-lilly-a83x.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi | 2 +- arch/arm/boot/dts/omap3-pandora-common.dtsi | 2 +- arch/arm/boot/dts/omap5-cm-t54.dts | 2 +- 11 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/arm/boot/dts/am57xx-cl-som-am57x.dts b/arch/arm/boot/dts/am57xx-cl-som-am57x.dts index e86d4795e0244..e542a9f59e936 100644 --- a/arch/arm/boot/dts/am57xx-cl-som-am57x.dts +++ b/arch/arm/boot/dts/am57xx-cl-som-am57x.dts @@ -527,7 +527,7 @@
interrupt-parent = <&gpio1>; interrupts = <31 0>; - pendown-gpio = <&gpio1 31 0>; + pendown-gpio = <&gpio1 31 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; diff --git a/arch/arm/boot/dts/at91sam9261ek.dts b/arch/arm/boot/dts/at91sam9261ek.dts index c4ef74fea97c2..ee90ea09e781f 100644 --- a/arch/arm/boot/dts/at91sam9261ek.dts +++ b/arch/arm/boot/dts/at91sam9261ek.dts @@ -156,7 +156,7 @@ compatible = "ti,ads7843"; interrupts-extended = <&pioC 2 IRQ_TYPE_EDGE_BOTH>; spi-max-frequency = <3000000>; - pendown-gpio = <&pioC 2 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&pioC 2 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <150>; ti,x-max = /bits/ 16 <3830>; diff --git a/arch/arm/boot/dts/imx7d-pico-hobbit.dts b/arch/arm/boot/dts/imx7d-pico-hobbit.dts index d917dc4f2f227..6ad39dca70096 100644 --- a/arch/arm/boot/dts/imx7d-pico-hobbit.dts +++ b/arch/arm/boot/dts/imx7d-pico-hobbit.dts @@ -64,7 +64,7 @@ interrupt-parent = <&gpio2>; interrupts = <7 0>; spi-max-frequency = <1000000>; - pendown-gpio = <&gpio2 7 0>; + pendown-gpio = <&gpio2 7 GPIO_ACTIVE_LOW>; vcc-supply = <®_3p3v>; ti,x-min = /bits/ 16 <0>; ti,x-max = /bits/ 16 <4095>; diff --git a/arch/arm/boot/dts/imx7d-sdb.dts b/arch/arm/boot/dts/imx7d-sdb.dts index 363d1f57a608b..88e62801b82e1 100644 --- a/arch/arm/boot/dts/imx7d-sdb.dts +++ b/arch/arm/boot/dts/imx7d-sdb.dts @@ -176,7 +176,7 @@ pinctrl-0 = <&pinctrl_tsc2046_pendown>; interrupt-parent = <&gpio2>; interrupts = <29 0>; - pendown-gpio = <&gpio2 29 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio2 29 GPIO_ACTIVE_LOW>; touchscreen-max-pressure = <255>; wakeup-source; }; diff --git a/arch/arm/boot/dts/omap3-cm-t3x.dtsi b/arch/arm/boot/dts/omap3-cm-t3x.dtsi index cdb632df152a1..11ae189441954 100644 --- a/arch/arm/boot/dts/omap3-cm-t3x.dtsi +++ b/arch/arm/boot/dts/omap3-cm-t3x.dtsi @@ -227,7 +227,7 @@
interrupt-parent = <&gpio2>; interrupts = <25 0>; /* gpio_57 */ - pendown-gpio = <&gpio2 25 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio2 25 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi b/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi index 3decc2d78a6ca..a7f99ae0c1fe9 100644 --- a/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi +++ b/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi @@ -54,7 +54,7 @@
interrupt-parent = <&gpio1>; interrupts = <27 0>; /* gpio_27 */ - pendown-gpio = <&gpio1 27 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio1 27 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-lilly-a83x.dtsi b/arch/arm/boot/dts/omap3-lilly-a83x.dtsi index c22833d4e5685..fba69b3f79c2b 100644 --- a/arch/arm/boot/dts/omap3-lilly-a83x.dtsi +++ b/arch/arm/boot/dts/omap3-lilly-a83x.dtsi @@ -311,7 +311,7 @@ interrupt-parent = <&gpio1>; interrupts = <8 0>; /* boot6 / gpio_8 */ spi-max-frequency = <1000000>; - pendown-gpio = <&gpio1 8 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio1 8 GPIO_ACTIVE_LOW>; vcc-supply = <®_vcc3>; pinctrl-names = "default"; pinctrl-0 = <&tsc2048_pins>; diff --git a/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi b/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi index 185ce53de0ece..0523a369a4d75 100644 --- a/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi +++ b/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi @@ -149,7 +149,7 @@
interrupt-parent = <&gpio4>; interrupts = <18 0>; /* gpio_114 */ - pendown-gpio = <&gpio4 18 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio4 18 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi b/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi index 7fe0f9148232b..d340eb722f129 100644 --- a/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi +++ b/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi @@ -160,7 +160,7 @@
interrupt-parent = <&gpio4>; interrupts = <18 0>; /* gpio_114 */ - pendown-gpio = <&gpio4 18 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio4 18 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-pandora-common.dtsi b/arch/arm/boot/dts/omap3-pandora-common.dtsi index 150d5be42d278..4ea5656825de7 100644 --- a/arch/arm/boot/dts/omap3-pandora-common.dtsi +++ b/arch/arm/boot/dts/omap3-pandora-common.dtsi @@ -651,7 +651,7 @@ pinctrl-0 = <&penirq_pins>; interrupt-parent = <&gpio3>; interrupts = <30 IRQ_TYPE_NONE>; /* GPIO_94 */ - pendown-gpio = <&gpio3 30 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio3 30 GPIO_ACTIVE_LOW>; vcc-supply = <&vaux4>;
ti,x-min = /bits/ 16 <0>; diff --git a/arch/arm/boot/dts/omap5-cm-t54.dts b/arch/arm/boot/dts/omap5-cm-t54.dts index e78d3718f145d..d38781025eef8 100644 --- a/arch/arm/boot/dts/omap5-cm-t54.dts +++ b/arch/arm/boot/dts/omap5-cm-t54.dts @@ -354,7 +354,7 @@
interrupt-parent = <&gpio1>; interrupts = <15 0>; /* gpio1_wk15 */ - pendown-gpio = <&gpio1 15 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio1 15 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>;
From: Inki Dae inki.dae@samsung.com
[ Upstream commit 4a059559809fd1ddbf16f847c4d2237309c08edf ]
Fix a wrong error return by dropping an error return.
When vidi driver is remvoed, if ctx->raw_edid isn't same as fake_edid_info then only what we have to is to free ctx->raw_edid so that driver removing can work correctly - it's not an error case.
Signed-off-by: Inki Dae inki.dae@samsung.com Reviewed-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/exynos/exynos_drm_vidi.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_vidi.c b/drivers/gpu/drm/exynos/exynos_drm_vidi.c index 65b891cb9c50b..d882a22dfd6e6 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_vidi.c +++ b/drivers/gpu/drm/exynos/exynos_drm_vidi.c @@ -483,8 +483,6 @@ static int vidi_remove(struct platform_device *pdev) if (ctx->raw_edid != (struct edid *)fake_edid_info) { kfree(ctx->raw_edid); ctx->raw_edid = NULL; - - return -EINVAL; }
component_del(&pdev->dev, &vidi_component_ops);
From: Min Li lm0963hack@gmail.com
[ Upstream commit 48bfd02569f5db49cc033f259e66d57aa6efc9a3 ]
If it is async, runqueue_node is freed in g2d_runqueue_worker on another worker thread. So in extreme cases, if g2d_runqueue_worker runs first, and then executes the following if statement, there will be use-after-free.
Signed-off-by: Min Li lm0963hack@gmail.com Reviewed-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Inki Dae inki.dae@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c index fcee33a43aca3..2df04de7f4354 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c @@ -1332,7 +1332,7 @@ int exynos_g2d_exec_ioctl(struct drm_device *drm_dev, void *data, /* Let the runqueue know that there is work to do. */ queue_work(g2d->g2d_workq, &g2d->runqueue_work);
- if (runqueue_node->async) + if (req->async) goto out;
wait_for_completion(&runqueue_node->complete);
From: Min Li lm0963hack@gmail.com
[ Upstream commit 982b173a6c6d9472730c3116051977e05d17c8c5 ]
Userspace can race to free the gobj(robj converted from), robj should not be accessed again after drm_gem_object_put, otherwith it will result in use-after-free.
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Min Li lm0963hack@gmail.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/radeon/radeon_gem.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index b2b076606f54b..e164b3c7a234f 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -384,7 +384,6 @@ int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data, struct radeon_device *rdev = dev->dev_private; struct drm_radeon_gem_set_domain *args = data; struct drm_gem_object *gobj; - struct radeon_bo *robj; int r;
/* for now if someone requests domain CPU - @@ -397,13 +396,12 @@ int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data, up_read(&rdev->exclusive_lock); return -ENOENT; } - robj = gem_to_radeon_bo(gobj);
r = radeon_gem_set_domain(gobj, args->read_domains, args->write_domain);
drm_gem_object_put_unlocked(gobj); up_read(&rdev->exclusive_lock); - r = radeon_gem_handle_lockup(robj->rdev, r); + r = radeon_gem_handle_lockup(rdev, r); return r; }
From: Dheeraj Kumar Srivastava dheerajkumar.srivastava@amd.com
[ Upstream commit 85d38d5810e285d5aec7fb5283107d1da70c12a9 ]
When booting with "intremap=off" and "x2apic_phys" on the kernel command line, the physical x2APIC driver ends up being used even when x2APIC mode is disabled ("intremap=off" disables x2APIC mode). This happens because the first compound condition check in x2apic_phys_probe() is false due to x2apic_mode == 0 and so the following one returns true after default_acpi_madt_oem_check() having already selected the physical x2APIC driver.
This results in the following panic:
kernel BUG at arch/x86/kernel/apic/io_apic.c:2409! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc2-ver4.1rc2 #2 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:setup_IO_APIC+0x9c/0xaf0 Call Trace: <TASK> ? native_read_msr apic_intr_mode_init x86_late_time_init start_kernel x86_64_start_reservations x86_64_start_kernel secondary_startup_64_no_verify </TASK>
which is:
setup_IO_APIC: apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n"); for_each_ioapic(ioapic) BUG_ON(mp_irqdomain_create(ioapic));
Return 0 to denote that x2APIC has not been enabled when probing the physical x2APIC driver.
[ bp: Massage commit message heavily. ]
Fixes: 9ebd680bd029 ("x86, apic: Use probe routines to simplify apic selection") Signed-off-by: Dheeraj Kumar Srivastava dheerajkumar.srivastava@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Kishon Vijay Abraham I kvijayab@amd.com Reviewed-by: Vasant Hegde vasant.hegde@amd.com Reviewed-by: Cyrill Gorcunov gorcunov@gmail.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20230616212236.1389-1-dheerajkumar.srivastava@amd.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/apic/x2apic_phys.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index 032a00e5d9fa6..76c80e191a1b1 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -97,7 +97,10 @@ static void init_x2apic_ldr(void)
static int x2apic_phys_probe(void) { - if (x2apic_mode && (x2apic_phys || x2apic_fadt_phys())) + if (!x2apic_mode) + return 0; + + if (x2apic_phys || x2apic_fadt_phys()) return 1;
return apic == &apic_x2apic_phys;
From: Clark Wang xiaoning.wang@nxp.com
[ Upstream commit e69b9bc170c6d93ee375a5cbfd15f74c0fb59bdd ]
Claim clkhi and clklo as integer type to avoid possible calculation errors caused by data overflow.
Fixes: a55fa9d0e42e ("i2c: imx-lpi2c: add low power i2c bus driver") Signed-off-by: Clark Wang xiaoning.wang@nxp.com Signed-off-by: Carlos Song carlos.song@nxp.com Reviewed-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-imx-lpi2c.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c b/drivers/i2c/busses/i2c-imx-lpi2c.c index 4fac2591b6618..89faef6f013b4 100644 --- a/drivers/i2c/busses/i2c-imx-lpi2c.c +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c @@ -206,8 +206,8 @@ static void lpi2c_imx_stop(struct lpi2c_imx_struct *lpi2c_imx) /* CLKLO = I2C_CLK_RATIO * CLKHI, SETHOLD = CLKHI, DATAVD = CLKHI/2 */ static int lpi2c_imx_config(struct lpi2c_imx_struct *lpi2c_imx) { - u8 prescale, filt, sethold, clkhi, clklo, datavd; - unsigned int clk_rate, clk_cycle; + u8 prescale, filt, sethold, datavd; + unsigned int clk_rate, clk_cycle, clkhi, clklo; enum lpi2c_imx_pincfg pincfg; unsigned int temp;
From: Hugh Dickins hughd@google.com
commit 073861ed77b6b957c3c8d54a11dc503f7d986ceb upstream.
Twice now, when exercising ext4 looped on shmem huge pages, I have crashed on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling end_page_writeback() calling wake_up_page() on tail of a shmem huge page, no longer an ext4 page at all.
The problem is that PageWriteback is not accompanied by a page reference (as the NOTE at the end of test_clear_page_writeback() acknowledges): as soon as TestClearPageWriteback has been done, that page could be removed from page cache, freed, and reused for something else by the time that wake_up_page() is reached.
https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org... Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail check; but I'm paranoid about even looking at an unreferenced struct page, lest its memory might itself have already been reused or hotremoved (and wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
Then on crashing a second time, realized there's a stronger reason against that approach. If my testing just occasionally crashes on that check, when the page is reused for part of a compound page, wouldn't it be much more common for the page to get reused as an order-0 page before reaching wake_up_page()? And on rare occasions, might that reused page already be marked PageWriteback by its new user, and already be waited upon? What would that look like?
It would look like BUG_ON(PageWriteback) after wait_on_page_writeback() in write_cache_pages() (though I have never seen that crash myself).
Matthew Wilcox explaining this to himself: "page is allocated, added to page cache, dirtied, writeback starts,
--- thread A --- filesystem calls end_page_writeback() test_clear_page_writeback() --- context switch to thread B --- truncate_inode_pages_range() finds the page, it doesn't have writeback set, we delete it from the page cache. Page gets reallocated, dirtied, writeback starts again. Then we call write_cache_pages(), see PageWriteback() set, call wait_on_page_writeback() --- context switch back to thread A --- wake_up_page(page, PG_writeback); ... thread B is woken, but because the wakeup was for the old use of the page, PageWriteback is still set.
Devious"
And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") this would have been much less likely: before that, wake_page_function()'s non-exclusive case would stop walking and not wake if it found Writeback already set again; whereas now the non-exclusive case proceeds to wake.
I have not thought of a fix that does not add a little overhead: the simplest fix is for end_page_writeback() to get_page() before calling test_clear_page_writeback(), then put_page() after wake_up_page().
Was there a chance of missed wakeups before, since a page freed before reaching wake_up_page() would have PageWaiters cleared? I think not, because each waiter does hold a reference on the page. This bug comes when the old use of the page, the one we do TestClearPageWriteback on, had *no* waiters, so no additional page reference beyond the page cache (and whoever racily freed it). The reuse of the page has a waiter holding a reference, and its own PageWriteback set; but the belated wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com Reported-by: Qian Cai cai@lca.pw Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") Signed-off-by: Hugh Dickins hughd@google.com Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/filemap.c | 8 ++++++++ mm/page-writeback.c | 6 ------ 2 files changed, 8 insertions(+), 6 deletions(-)
--- a/mm/filemap.c +++ b/mm/filemap.c @@ -1388,11 +1388,19 @@ void end_page_writeback(struct page *pag rotate_reclaimable_page(page); }
+ /* + * Writeback does not hold a page reference of its own, relying + * on truncation to wait for the clearing of PG_writeback. + * But here we must make sure that the page is not freed and + * reused before the wake_up_page(). + */ + get_page(page); if (!test_clear_page_writeback(page)) BUG();
smp_mb__after_atomic(); wake_up_page(page, PG_writeback); + put_page(page); } EXPORT_SYMBOL(end_page_writeback);
--- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2746,12 +2746,6 @@ int test_clear_page_writeback(struct pag } else { ret = TestClearPageWriteback(page); } - /* - * NOTE: Page might be free now! Writeback doesn't hold a page - * reference on its own, it relies on truncation to wait for - * the clearing of PG_writeback. The below can only access - * page state that is static across allocation cycles. - */ if (ret) { dec_lruvec_state(lruvec, NR_WRITEBACK); dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
From: Linus Torvalds torvalds@linux-foundation.org
commit c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 upstream.
Ever since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") we've had some very occasional reports of BUG_ON(PageWriteback) in write_cache_pages(), which we thought we already fixed in commit 073861ed77b6 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
But syzbot just reported another one, even with that commit in place.
And it turns out that there's a simpler way to trigger the BUG_ON() than the one Hugh found with page re-use. It all boils down to the fact that the page writeback is ostensibly serialized by the page lock, but that isn't actually really true.
Yes, the people _setting_ writeback all do so under the page lock, but the actual clearing of the bit - and waking up any waiters - happens without any page lock.
This gives us this fairly simple race condition:
CPU1 = end previous writeback CPU2 = start new writeback under page lock CPU3 = write_cache_pages()
CPU1 CPU2 CPU3 ---- ---- ----
end_page_writeback() test_clear_page_writeback(page) ... delayed...
lock_page(); set_page_writeback() unlock_page()
lock_page() wait_on_page_writeback();
wake_up_page(page, PG_writeback); .. wakes up CPU3 ..
BUG_ON(PageWriteback(page));
where the BUG_ON() happens because we woke up the PG_writeback bit becasue of the _previous_ writeback, but a new one had already been started because the clearing of the bit wasn't actually atomic wrt the actual wakeup or serialized by the page lock.
The reason this didn't use to happen was that the old logic in waiting on a page bit would just loop if it ever saw the bit set again.
The nice proper fix would probably be to get rid of the whole "wait for writeback to clear, and then set it" logic in the writeback path, and replace it with an atomic "wait-to-set" (ie the same as we have for page locking: we set the page lock bit with a single "lock_page()", not with "wait for lock bit to clear and then set it").
However, out current model for writeback is that the waiting for the writeback bit is done by the generic VFS code (ie write_cache_pages()), but the actual setting of the writeback bit is done much later by the filesystem ".writepages()" function.
IOW, to make the writeback bit have that same kind of "wait-to-set" behavior as we have for page locking, we'd have to change our roughly ~50 different writeback functions. Painful.
Instead, just make "wait_on_page_writeback()" loop on the very unlikely situation that the PG_writeback bit is still set, basically re-instating the old behavior. This is very non-optimal in case of contention, but since we only ever set the bit under the page lock, that situation is controlled.
Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") Acked-by: Hugh Dickins hughd@google.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Matthew Wilcox willy@infradead.org Cc: stable@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page-writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2811,7 +2811,7 @@ EXPORT_SYMBOL(__test_set_page_writeback) */ void wait_on_page_writeback(struct page *page) { - if (PageWriteback(page)) { + while (PageWriteback(page)) { trace_wait_on_page_writeback(page, page_mapping(page)); wait_on_page_bit(page, PG_writeback); }
From: Darrick J. Wong djwong@kernel.org
commit 22ed903eee23a5b174e240f1cdfa9acf393a5210 upstream.
syzbot detected a crash during log recovery:
XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791 XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200. XFS (loop0): Starting recovery (logdev: internal) ================================================================== BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813 Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074
CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:306 print_report+0x107/0x1f0 mm/kasan/report.c:417 kasan_report+0xcd/0x100 mm/kasan/report.c:517 xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813 xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913 xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713 xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953 xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946 xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930 xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493 xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829 xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933 xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666 get_tree_bdev+0x400/0x620 fs/super.c:1282 vfs_get_tree+0x88/0x270 fs/super.c:1489 do_new_mount+0x289/0xad0 fs/namespace.c:3145 do_mount fs/namespace.c:3488 [inline] __do_sys_mount fs/namespace.c:3697 [inline] __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f89fa3f4aca Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10 RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004 R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50 </TASK>
The fuzzed image contains an AGF with an obviously garbage agf_refcount_level value of 32, and a dirty log with a buffer log item for that AGF. The ondisk AGF has a higher LSN than the recovered log item. xlog_recover_buf_commit_pass2 reads the buffer, compares the LSNs, and decides to skip replay because the ondisk buffer appears to be newer.
Unfortunately, the ondisk buffer is corrupt, but recovery just read the buffer with no buffer ops specified:
error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno, buf_f->blf_len, buf_flags, &bp, NULL);
Skipping the buffer leaves its contents in memory unverified. This sets us up for a kernel crash because xfs_refcount_recover_cow_leftovers reads the buffer (which is still around in XBF_DONE state, so no read verification) and creates a refcountbt cursor of height 32. This is impossible so we run off the end of the cursor object and crash.
Fix this by invoking the verifier on all skipped buffers and aborting log recovery if the ondisk buffer is corrupt. It might be smarter to force replay the log item atop the buffer and then see if it'll pass the write verifier (like ext4 does) but for now let's go with the conservative option where we stop immediately.
Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log_recover.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2783,6 +2783,16 @@ xlog_recover_buffer_pass2( if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) { trace_xfs_log_recover_buf_skip(log, buf_f); xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN); + + /* + * We're skipping replay of this buffer log item due to the log + * item LSN being behind the ondisk buffer. Verify the buffer + * contents since we aren't going to run the write verifier. + */ + if (bp->b_ops) { + bp->b_ops->verify_read(bp); + error = bp->b_error; + } goto out_release; }
On Mon, 26 Jun 2023 20:11:39 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.249 release. There are 60 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.249-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.4: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 59 tests: 59 pass, 0 fail
Linux version: 5.4.249-rc1-g824b023c3cda Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi Greg,
On 26/06/23 11:41 pm, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.249 release. There are 60 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
No problems seen on x86_64 and aarch64.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.249-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Hi Greg,
From: Greg Kroah-Hartman gregkh@linuxfoundation.org Sent: Monday, June 26, 2023 7:12 PM
This is the start of the stable review cycle for the 5.4.249 release. There are 60 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
Thank you for the release!
CIP configurations built and booted okay with Linux 5.4.249-rc1 (824b023c3cda): https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/91... https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/commits/linu...
Tested-by: Chris Paterson (CIP) chris.paterson2@renesas.com
Kind regards, Chris
On Mon, Jun 26, 2023 at 08:11:39PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.249 release. There are 60 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 455 pass: 455 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Tue, 27 Jun 2023 at 00:06, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.249 release. There are 60 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.249-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.4.249-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.4.y * git commit: 824b023c3cda00fe610c1f79d14a6223d68f425f * git describe: v5.4.248-61-g824b023c3cda * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.24...
## Test Regressions (compared to v5.4.248)
## Metric Regressions (compared to v5.4.248)
## Test Fixes (compared to v5.4.248)
## Metric Fixes (compared to v5.4.248)
## Test result summary total: 115217, pass: 90332, fail: 2137, skip: 22692, xfail: 56
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 148 total, 129 passed, 19 failed * arm64: 48 total, 44 passed, 4 failed * i386: 30 total, 22 passed, 8 failed * mips: 30 total, 29 passed, 1 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 33 total, 32 passed, 1 failed * riscv: 15 total, 14 passed, 1 failed * s390: 8 total, 8 passed, 0 failed * sh: 14 total, 12 passed, 2 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 41 total, 39 passed, 2 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-sigaltstack * kselftest-size * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
linux-stable-mirror@lists.linaro.org