On Fri, Dec 14, 2018 at 11:42:01PM +0100, Thomas Schöbel-Theuer wrote:
> Ah, I overlooked that commit e56c92565dfe2 is already providing a different
> solution to the same problem in newer kernels _only_, as a _side_ effect
> (not clear to me from the description, but clear from reading the code).
Damn, I missed the fact that this is not the upstream kernel:
CPU: 0 PID: 1 UID: 0 Comm: swapper/0 Not tainted 4.4.0-ui18344.004-uiabi1-infong-amd64 #1
> So another alternative would be backporting e56c92565dfe2 to the 4.4 LTS
> series. Also fine for me.
That looks like the right fix.
A note for the next time: do not send a fix for a stable kernel which is
not upstream:
From Documentation/process/stable-kernel-rules.rst:
" - It or an equivalent fix must already exist in Linus' tree (upstream)."
The stable kernels track upstream so if a stable kernel has a problem,
the first thing one needs to do is to check whether this has been fixed
upstream and if so, to backport it. This is the case most of the time.
In the rare cases where a separate fix is needed, it should be
handled by asking Greg what to do. :-)
Adding stable@ folks to CC to set me straight if I'm missing something.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 12/15/18 2:10 AM, Michael Lyle wrote:
> Coly--
>
> Apologies for the late reply on this. I just noticed it based on Greg's
> comment about stable.
>
> When I wrote the previous "accelerate writeback" patchset, my first
> attempt was very much like this. I believe it was asked (by you?)
> whether it would impact the latency of front-end I/O because of deep
> backing device queues when a new request comes in.
>
> Won't this cause lots of requests to be pending to the backing device, so
> if there is intermittent front-end I/O they'll have to wait for the device?
> That's why I previously had it set to only complete one writeback at a
> time, to bound the impact on latency -- based on that review feedback.
Hi Mike,
This patch is a much more conservative effort. It sets a high writeback
rate only when all bcache devices attached to the cache set have been
idle for quite a few seconds. In this situation, the cache set is really
quiet and spare.
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
only looks at a single bcache device. If other bcache devices on the
cache set have I/O while a single bcache device is idle, that idle
device gets a faster writeback rate, and the I/O that reads dirty data
from the cache for writeback has a negative impact on the I/O requests
of the other bcache devices.
Therefore I give up the per-device faster writeback, to protect the
latency of front-end I/O in general.
Thanks.
Coly Li
> On Mon, Jul 23, 2018 at 9:03 PM Coly Li <colyli(a)suse.de> wrote:
>
> Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
> allows the writeback rate to be faster if there is no I/O request on a
> bcache device. It works well if there is only one bcache device attached
> to the cache set. If there are many bcache devices attached to a cache
> set, it may introduce a performance regression because multiple faster
> writeback threads of the idle bcache devices will compete for the btree
> level locks with the bcache devices that have I/O requests coming.
>
> This patch fixes the above issue by only permitting fast writeback when
> all bcache devices attached to the cache set are idle. If one of the
> bcache devices has new I/O requests coming, all writeback throughput is
> minimized immediately and the PI controller __update_writeback_rate()
> decides the upcoming writeback rate for each bcache device.
>
> Also, when all bcache devices are idle, limiting the writeback rate to a
> small number wastes throughput, especially when the backing devices are
> slower non-rotational devices (e.g. SATA SSDs). This patch sets a max
> writeback rate for each backing device if the whole cache set is idle. A
> faster writeback rate in idle time means new I/Os may have more available
> space for dirty data, and people may then observe better write performance.
>
> Please note bcache may change its cache mode at run time, and this patch
> still works if the cache mode is switched away from writeback mode while
> there is still dirty data on the cache.
>
> Fixes: b1092c9af9ed ("bcache: allow quick writeback when backing idle")
> Cc: stable(a)vger.kernel.org #4.16+
> Signed-off-by: Coly Li <colyli(a)suse.de>
> Tested-by: Kai Krakow <kai(a)kaishome.de>
> Cc: Michael Lyle <mlyle(a)lyle.org>
> Cc: Stefan Priebe <s.priebe(a)profihost.ag>
> ---
> Changelog:
> v2, Fix a deadlock reported by Stefan Priebe.
> v1, Initial version.
>
> drivers/md/bcache/bcache.h | 11 ++--
> drivers/md/bcache/request.c | 51 ++++++++++++++-
> drivers/md/bcache/super.c | 1 +
> drivers/md/bcache/sysfs.c | 14 +++--
> drivers/md/bcache/util.c | 2 +-
> drivers/md/bcache/util.h | 2 +-
> drivers/md/bcache/writeback.c | 115 ++++++++++++++++++++++++++--------
> 7 files changed, 155 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index d6bf294f3907..469ab1a955e0 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -328,13 +328,6 @@ struct cached_dev {
> */
> atomic_t has_dirty;
>
> - /*
> - * Set to zero by things that touch the backing volume-- except
> - * writeback. Incremented by writeback. Used to determine
> when to
> - * accelerate idle writeback.
> - */
> - atomic_t backing_idle;
> -
> struct bch_ratelimit writeback_rate;
> struct delayed_work writeback_rate_update;
>
> @@ -514,6 +507,8 @@ struct cache_set {
> struct cache_accounting accounting;
>
> unsigned long flags;
> + atomic_t idle_counter;
> + atomic_t at_max_writeback_rate;
>
> struct cache_sb sb;
>
> @@ -523,6 +518,8 @@ struct cache_set {
>
> struct bcache_device **devices;
> unsigned devices_max_used;
> + /* See set_at_max_writeback_rate() for how it is used */
> + unsigned previous_dirty_dc_nr;
> struct list_head cached_devs;
> uint64_t cached_dev_sectors;
> struct closure caching;
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index ae67f5fa8047..1af3d96abfa5 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -1104,6 +1104,43 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio)
>
> /* Cached devices - read & write stuff */
>
> +static void quit_max_writeback_rate(struct cache_set *c,
> + struct cached_dev *this_dc)
> +{
> + int i;
> + struct bcache_device *d;
> + struct cached_dev *dc;
> +
> + /*
> + * If bch_register_lock is acquired by other attach/detach operations,
> + * waiting here will increase I/O request latency for seconds or more.
> + * To avoid such situation, only writeback rate of current cached device
> + * is set to 1, and __update_writeback_rate() will decide writeback rate
> + * of other cached devices (remember c->idle_counter is 0 now).
> + */
> + if (mutex_trylock(&bch_register_lock)) {
> + for (i = 0; i < c->devices_max_used; i++) {
> + if (!c->devices[i])
> + continue;
> +
> + if (UUID_FLASH_ONLY(&c->uuids[i]))
> + continue;
> +
> + d = c->devices[i];
> + dc = container_of(d, struct cached_dev, disk);
> + /*
> + * set writeback rate to default minimum value,
> + * then let update_writeback_rate() decide the
> + * upcoming rate.
> + */
> + atomic64_set(&dc->writeback_rate.rate, 1);
> + }
> +
> + mutex_unlock(&bch_register_lock);
> + } else
> + atomic64_set(&this_dc->writeback_rate.rate, 1);
> +}
> +
> static blk_qc_t cached_dev_make_request(struct request_queue *q,
> struct bio *bio)
> {
> @@ -1119,7 +1156,19 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
> return BLK_QC_T_NONE;
> }
>
> - atomic_set(&dc->backing_idle, 0);
> + if (d->c) {
> + atomic_set(&d->c->idle_counter, 0);
> + /*
> + * If at_max_writeback_rate of cache set is true and new I/O
> + * comes, quit max writeback rate of all cached devices
> + * attached to this cache set, and set at_max_writeback_rate
> + * to false.
> + */
> + if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
> + atomic_set(&d->c->at_max_writeback_rate, 0);
> + quit_max_writeback_rate(d->c, dc);
> + }
> + }
> generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
>
> bio_set_dev(bio, dc->bdev);
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index fa4058e43202..fa532d9f9353 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1687,6 +1687,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
> c->block_bits = ilog2(sb->block_size);
> c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry);
> c->devices_max_used = 0;
> + c->previous_dirty_dc_nr = 0;
> c->btree_pages = bucket_pages(c);
> if (c->btree_pages > BTREE_MAX_PAGES)
> c->btree_pages = max_t(int, c->btree_pages / 4,
> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
> index 225b15aa0340..d719021bff81 100644
> --- a/drivers/md/bcache/sysfs.c
> +++ b/drivers/md/bcache/sysfs.c
> @@ -170,7 +170,8 @@ SHOW(__bch_cached_dev)
> var_printf(writeback_running, "%i");
> var_print(writeback_delay);
> var_print(writeback_percent);
> - sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9);
> + sysfs_hprint(writeback_rate,
> + atomic64_read(&dc->writeback_rate.rate) << 9);
> sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
> sysfs_printf(io_error_limit, "%i", dc->error_limit);
> sysfs_printf(io_disable, "%i", dc->io_disable);
> @@ -188,7 +189,8 @@ SHOW(__bch_cached_dev)
> char change[20];
> s64 next_io;
>
> - bch_hprint(rate, dc->writeback_rate.rate << 9);
> + bch_hprint(rate,
> + atomic64_read(&dc->writeback_rate.rate) << 9);
> bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9);
> bch_hprint(target, dc->writeback_rate_target << 9);
> bch_hprint(proportional,dc->writeback_rate_proportional << 9);
> @@ -255,8 +257,12 @@ STORE(__cached_dev)
>
> sysfs_strtoul_clamp(writeback_percent,
> dc->writeback_percent, 0, 40);
>
> - sysfs_strtoul_clamp(writeback_rate,
> - dc->writeback_rate.rate, 1, INT_MAX);
> + if (attr == &sysfs_writeback_rate) {
> + int v;
> +
> + sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
> + atomic64_set(&dc->writeback_rate.rate, v);
> + }
>
> sysfs_strtoul_clamp(writeback_rate_update_seconds,
> dc->writeback_rate_update_seconds,
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index fc479b026d6d..84f90c3d996d 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -200,7 +200,7 @@ uint64_t bch_next_delay(struct bch_ratelimit *d,
> uint64_t done)
> {
> uint64_t now = local_clock();
>
> - d->next += div_u64(done * NSEC_PER_SEC, d->rate);
> + d->next += div_u64(done * NSEC_PER_SEC, atomic64_read(&d->rate));
>
> /* Bound the time. Don't let us fall further than 2 seconds behind
> * (this prevents unnecessary backlog that would make it impossible
> diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
> index cced87f8eb27..7e17f32ab563 100644
> --- a/drivers/md/bcache/util.h
> +++ b/drivers/md/bcache/util.h
> @@ -442,7 +442,7 @@ struct bch_ratelimit {
> * Rate at which we want to do work, in units per second
> * The units here correspond to the units passed to bch_next_delay()
> */
> - uint32_t rate;
> + atomic64_t rate;
> };
>
> static inline void bch_ratelimit_reset(struct bch_ratelimit *d)
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index ad45ebe1a74b..11ffadc3cf8f 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -49,6 +49,80 @@ static uint64_t __calc_target_rate(struct cached_dev *dc)
> return (cache_dirty_target * bdev_share) >> WRITEBACK_SHARE_SHIFT;
> }
>
> +static bool set_at_max_writeback_rate(struct cache_set *c,
> + struct cached_dev *dc)
> +{
> + int i, dirty_dc_nr = 0;
> + struct bcache_device *d;
> +
> + /*
> + * bch_register_lock is acquired in cached_dev_detach_finish() before
> + * calling cancel_writeback_rate_update_dwork() to stop the delayed
> + * kworker writeback_rate_update (which is the context we are in now).
> + * Therefore calling mutex_lock() here may introduce a deadlock when
> + * shutting down the bcache device.
> + * c->previous_dirty_dc_nr is used to record the previously calculated
> + * dirty_dc_nr when mutex_trylock() last succeeded. Then if
> + * mutex_trylock() fails here, use c->previous_dirty_dc_nr as the dirty
> + * cached device number. Of course it might be inaccurate, but a few
> + * more or fewer loops before setting c->at_max_writeback_rate is much
> + * better than a deadlock here.
> + */
> + if (mutex_trylock(&bch_register_lock)) {
> + for (i = 0; i < c->devices_max_used; i++) {
> + if (!c->devices[i])
> + continue;
> + if (UUID_FLASH_ONLY(&c->uuids[i]))
> + continue;
> + d = c->devices[i];
> + dc = container_of(d, struct cached_dev, disk);
> + if (atomic_read(&dc->has_dirty))
> + dirty_dc_nr++;
> + }
> + c->previous_dirty_dc_nr = dirty_dc_nr;
> +
> + mutex_unlock(&bch_register_lock);
> + } else
> + dirty_dc_nr = c->previous_dirty_dc_nr;
> +
> + /*
> + * Idle_counter is increased every time update_writeback_rate()
> + * is rescheduled. If all backing devices attached to the same
> + * cache set have the same dc->writeback_rate_update_seconds
> + * value, then after about 10 rounds of update_writeback_rate()
> + * on each backing device, the code will fall through and set 1 to
> + * c->at_max_writeback_rate, and a max writeback rate to each
> + * dc->writeback_rate.rate. This is not very accurate but works well
> + * to make sure the whole cache set has no new I/O coming before
> + * writeback rate is set to a max number.
> + */
> + if (atomic_inc_return(&c->idle_counter) < dirty_dc_nr * 10)
> + return false;
> +
> + if (atomic_read(&c->at_max_writeback_rate) != 1)
> + atomic_set(&c->at_max_writeback_rate, 1);
> +
> + atomic64_set(&dc->writeback_rate.rate, INT_MAX);
> +
> + /* keep writeback_rate_target as existing value */
> + dc->writeback_rate_proportional = 0;
> + dc->writeback_rate_integral_scaled = 0;
> + dc->writeback_rate_change = 0;
> +
> + /*
> + * Check c->idle_counter and c->at_max_writeback_rate again in case
> + * new I/O arrives before set_at_max_writeback_rate() returns.
> + * Then the writeback rate is set to 1, and its new value should be
> + * decided via __update_writeback_rate().
> + */
> + if (atomic_read(&c->idle_counter) < dirty_dc_nr * 10 ||
> + !atomic_read(&c->at_max_writeback_rate))
> + return false;
> +
> + return true;
> +}
> +
> static void __update_writeback_rate(struct cached_dev *dc)
> {
> /*
> @@ -104,8 +178,9 @@ static void __update_writeback_rate(struct cached_dev *dc)
>
> dc->writeback_rate_proportional = proportional_scaled;
> dc->writeback_rate_integral_scaled = integral_scaled;
> - dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
> - dc->writeback_rate.rate = new_rate;
> + dc->writeback_rate_change = new_rate -
> + atomic64_read(&dc->writeback_rate.rate);
> + atomic64_set(&dc->writeback_rate.rate, new_rate);
> dc->writeback_rate_target = target;
> }
>
> @@ -138,9 +213,16 @@ static void update_writeback_rate(struct work_struct *work)
>
> down_read(&dc->writeback_lock);
>
> - if (atomic_read(&dc->has_dirty) &&
> - dc->writeback_percent)
> - __update_writeback_rate(dc);
> + if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
> + /*
> + * If the whole cache set is idle, set_at_max_writeback_rate()
> + * will set writeback rate to a max number. Then it is
> + * unnecessary to update writeback rate for an idle cache set
> + * in maximum writeback rate number(s).
> + */
> + if (!set_at_max_writeback_rate(c, dc))
> + __update_writeback_rate(dc);
> + }
>
> up_read(&dc->writeback_lock);
>
> @@ -422,27 +504,6 @@ static void read_dirty(struct cached_dev *dc)
>
> delay = writeback_delay(dc, size);
>
> - /* If the control system would wait for at least half a
> - * second, and there's been no reqs hitting the backing disk
> - * for awhile: use an alternate mode where we have at most
> - * one contiguous set of writebacks in flight at a time. If
> - * someone wants to do IO it will be quick, as it will only
> - * have to contend with one operation in flight, and we'll
> - * be round-tripping data to the backing disk as quickly as
> - * it can accept it.
> - */
> - if (delay >= HZ / 2) {
> - /* 3 means at least 1.5 seconds, up to 7.5 if we
> - * have slowed way down.
> - */
> - if (atomic_inc_return(&dc->backing_idle) >= 3) {
> - /* Wait for current I/Os to finish */
> - closure_sync(&cl);
> - /* And immediately launch a new set. */
> - delay = 0;
> - }
> - }
> -
> while (!kthread_should_stop() &&
> !test_bit(CACHE_SET_IO_DISABLE, &dc->disk.c->flags) &&
> delay) {
> @@ -715,7 +776,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
> dc->writeback_running = true;
> dc->writeback_percent = 10;
> dc->writeback_delay = 30;
> - dc->writeback_rate.rate = 1024;
> + atomic64_set(&dc->writeback_rate.rate, 1024);
> dc->writeback_rate_minimum = 8;
>
> dc->writeback_rate_update_seconds = WRITEBACK_RATE_UPDATE_SECS_DEFAULT;
> --
> 2.17.1
>
Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.
Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler, to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)
Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201741
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Reported-by: Laura Abbott <labbott(a)redhat.com>
Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Cc: stable(a)vger.kernel.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: X86 ML <x86(a)kernel.org>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Carlos O'Donell <carlos(a)redhat.com>
Cc: "H. J. Lu" <hjl.tools(a)gmail.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: kernel-team(a)android.com
Signed-off-by: Alistair Strachan <astrachan(a)google.com>
---
arch/x86/entry/vdso/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 141d415a8c80..c3d7ccd25381 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -171,7 +171,8 @@ quiet_cmd_vdso = VDSO $@
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
- $(call ld-option, --build-id) -Bsymbolic
+ $(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \
+ -Bsymbolic
GCOV_PROFILE := n
#
--
2.20.0.405.gbc1bbc6f85-goog
From: Thierry Reding <treding(a)nvidia.com>
Subject: scripts/spdxcheck.py: always open files in binary mode
The spdxcheck script currently falls over when confronted with a binary
file (such as Documentation/logo.gif). To avoid that, always open files
in binary mode and decode line-by-line, ignoring encoding errors.
One tricky case is when piping data into the script and reading it from
standard input. By default, standard input will be opened in text mode,
so we need to reopen it in binary mode.
The breakage only happens with python3 and results in a
UnicodeDecodeError (according to Uwe).
Link: http://lkml.kernel.org/r/20181212131210.28024-1-thierry.reding@gmail.com
Fixes: 6f4d29df66ac ("scripts/spdxcheck.py: make python3 compliant")
Signed-off-by: Thierry Reding <treding(a)nvidia.com>
Reviewed-by: Jeremy Cline <jcline(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Joe Perches <joe(a)perches.com>
Cc: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/spdxcheck.py | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/scripts/spdxcheck.py~scripts-spdxcheckpy-always-open-files-in-binary-mode
+++ a/scripts/spdxcheck.py
@@ -168,6 +168,7 @@ class id_parser(object):
self.curline = 0
try:
for line in fd:
+ line = line.decode(locale.getpreferredencoding(False), errors='ignore')
self.curline += 1
if self.curline > maxlines:
break
@@ -249,12 +250,13 @@ if __name__ == '__main__':
try:
if len(args.path) and args.path[0] == '-':
- parser.parse_lines(sys.stdin, args.maxlines, '-')
+ stdin = os.fdopen(sys.stdin.fileno(), 'rb')
+ parser.parse_lines(stdin, args.maxlines, '-')
else:
if args.path:
for p in args.path:
if os.path.isfile(p):
- parser.parse_lines(open(p), args.maxlines, p)
+ parser.parse_lines(open(p, 'rb'), args.maxlines, p)
elif os.path.isdir(p):
scan_git_subtree(repo.head.reference.commit.tree, p)
else:
_
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger a harmless false positive WARN_ON. Check that the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.
Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
Cc: <stable(a)vger.kernel.org>
Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: syzbot+06c7092e7d71218a2c16(a)syzkaller.appspotmail.com
Acked-by: Mike Rapoport <rppt(a)linux.ibm.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/userfaultfd.c~userfaultfd-check-vm_maywrite-was-set-after-verifying-the-uffd-is-registered
+++ a/fs/userfaultfd.c
@@ -1566,7 +1566,6 @@ static int userfaultfd_unregister(struct
cond_resched();
BUG_ON(!vma_can_userfault(vma));
- WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
/*
* Nothing to do: this vma is already registered into this
@@ -1575,6 +1574,8 @@ static int userfaultfd_unregister(struct
if (!vma->vm_userfaultfd_ctx.ctx)
goto skip;
+ WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
+
if (vma->vm_start > start)
start = vma->vm_start;
vma_end = min(end, vma->vm_end);
_
From: Piotr Jaroszynski <pjaroszynski(a)nvidia.com>
Subject: fs/iomap.c: get/put the page in iomap_page_create/release()
migrate_page_move_mapping() expects pages with private data set to have a
page_count elevated by 1. This is what used to happen for xfs through the
buffer_heads code before the switch to iomap in commit 82cb14175e7d ("xfs:
add support for sub-pagesize writeback without buffer_heads"). Not having
the count elevated causes move_pages() to fail on memory mapped files
coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it in
iomap_page_release().
It causes the move_pages() syscall to misbehave on memory mapped files
from xfs. It does not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is
out of spec for the syscall. Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though
(https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz).
I only hit this in tests that verify that move_pages() actually moved
the pages. The test also got confused by the positive return from
move_pages() (it got treated as a success as positive numbers were not
expected and not handled) making it a bit harder to track down what's
going on.
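
For reference, the expectation can be sketched as below. This is a
simplified illustration with a hypothetical helper name, not the actual
mm code:

	/* Sketch: migration expects one extra page reference when a page
	 * carries private data -- the reference that iomap_page_create()
	 * now takes via get_page() in the patch below.
	 */
	static int expected_page_refs_sketch(int mapping_refs, int has_private)
	{
		return mapping_refs + (has_private ? 1 : 0);
	}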
Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com
Fixes: 82cb14175e7d ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Darrick J. Wong <darrick.wong(a)oracle.com>
Cc: Brian Foster <bfoster(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/iomap.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/fs/iomap.c~iomap-get-put-the-page-in-iomap_page_create-release
+++ a/fs/iomap.c
@@ -116,6 +116,12 @@ iomap_page_create(struct inode *inode, s
atomic_set(&iop->read_count, 0);
atomic_set(&iop->write_count, 0);
bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+
+ /*
+ * migrate_page_move_mapping() assumes that pages with private data have
+ * their count elevated by 1.
+ */
+ get_page(page);
set_page_private(page, (unsigned long)iop);
SetPagePrivate(page);
return iop;
@@ -132,6 +138,7 @@ iomap_page_release(struct page *page)
WARN_ON_ONCE(atomic_read(&iop->write_count));
ClearPagePrivate(page);
set_page_private(page, 0);
+ put_page(page);
kfree(iop);
}
_
The nr_dentry_unused per-cpu counter tracks dentries in both the
LRU lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del(). To fix this double decrement,
the decrement in the shrink_dcache_sb() function is removed.
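
As a rough sketch (user-space stand-ins with hypothetical names, not
the kernel code or its per-cpu counter API), the double decrement
amounted to:

	static long nr_dentry_unused_sketch;	/* stands in for the per-cpu counter */

	static void d_shrink_del_sketch(void)
	{
		/* decrement #1: taken for each dentry on the shrink list */
		nr_dentry_unused_sketch--;
	}

	static void shrink_dcache_sb_sketch(long freed)
	{
		/* decrement #2: the extra subtraction removed below */
		nr_dentry_unused_sketch -= freed;
	}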
Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Waiman Long <longman(a)redhat.com>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
---
fs/dcache.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index 2593153..44e5652 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1188,15 +1188,11 @@ static enum lru_status dentry_lru_isolate_shrink(struct list_head *item,
*/
void shrink_dcache_sb(struct super_block *sb)
{
- long freed;
-
do {
LIST_HEAD(dispose);
- freed = list_lru_walk(&sb->s_dentry_lru,
+ list_lru_walk(&sb->s_dentry_lru,
dentry_lru_isolate_shrink, &dispose, 1024);
-
- this_cpu_sub(nr_dentry_unused, freed);
shrink_dentry_list(&dispose);
} while (list_lru_count(&sb->s_dentry_lru) > 0);
}
--
1.8.3.1
From: Shuah Khan <shuah(a)kernel.org>
Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk") added
khdr target to run headers_install target from the main Makefile. The
logic uses KSFT_KHDR_INSTALL and top_srcdir as controls to initialize
variables and include files to run headers_install from the top level
Makefile. There are a few problems with this logic.
1. Exposes top_srcdir to all tests
2. Common logic impacts all tests
3. Uses KSFT_KHDR_INSTALL, top_srcdir, and khdr in an ad hoc way. Tests
add "khdr" dependency in their Makefiles to TEST_PROGS_EXTENDED in
some cases, and STATIC_LIBS in other cases. This makes this framework
confusing to use.
The common logic runs for all tests even when KSFT_KHDR_INSTALL
isn't defined by the test. top_srcdir is initialized to a default value
when a test doesn't initialize it. It works for tests without a sub-dir
structure, but tests with a sub-dir structure fail to build.
e.g: make -C sparc64/drivers/ or make -C drivers/dma-buf
../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory
make: *** No rule to make target '../../../../scripts/subarch.include'. Stop.
There is no reason to require all tests to define top_srcdir and there is
no need to require tests to add khdr dependency using adhoc changes to
TEST_* and other variables.
Fix it with a consistent use of KSFT_KHDR_INSTALL and top_srcdir from tests
that have the dependency on headers_install.
Change the common logic to define the khdr target, and an "all" target
with a dependency on khdr, when KSFT_KHDR_INSTALL is defined.
Only tests that depend on headers_install have to define the
KSFT_KHDR_INSTALL and top_srcdir variables, and there is no need to
specify the khdr dependency in the test Makefiles.
Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk")
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <shuah(a)kernel.org>
---
tools/testing/selftests/android/Makefile | 2 +-
tools/testing/selftests/futex/functional/Makefile | 1 +
tools/testing/selftests/gpio/Makefile | 6 +++---
tools/testing/selftests/kvm/Makefile | 2 +-
tools/testing/selftests/lib.mk | 8 ++++----
tools/testing/selftests/networking/timestamping/Makefile | 1 +
tools/testing/selftests/tc-testing/bpf/Makefile | 1 +
tools/testing/selftests/vm/Makefile | 1 +
8 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/android/Makefile b/tools/testing/selftests/android/Makefile
index d9a725478375..72c25a3cb658 100644
--- a/tools/testing/selftests/android/Makefile
+++ b/tools/testing/selftests/android/Makefile
@@ -6,7 +6,7 @@ TEST_PROGS := run.sh
include ../lib.mk
-all: khdr
+all:
@for DIR in $(SUBDIRS); do \
BUILD_TARGET=$(OUTPUT)/$$DIR; \
mkdir $$BUILD_TARGET -p; \
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index ad1eeb14fda7..30996306cabc 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -19,6 +19,7 @@ TEST_GEN_FILES := \
TEST_PROGS := run.sh
top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
include ../../lib.mk
$(TEST_GEN_FILES): $(HEADERS)
diff --git a/tools/testing/selftests/gpio/Makefile b/tools/testing/selftests/gpio/Makefile
index 46648427d537..07f572a1bd3f 100644
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -10,8 +10,6 @@ TEST_PROGS_EXTENDED := gpio-mockup-chardev
GPIODIR := $(realpath ../../../gpio)
GPIOOBJ := gpio-utils.o
-include ../lib.mk
-
all: $(TEST_PROGS_EXTENDED)
override define CLEAN
@@ -19,7 +17,9 @@ override define CLEAN
$(MAKE) -C $(GPIODIR) OUTPUT=$(GPIODIR)/ clean
endef
-$(TEST_PROGS_EXTENDED):| khdr
+KSFT_KHDR_INSTALL := 1
+include ../lib.mk
+
$(TEST_PROGS_EXTENDED): $(GPIODIR)/$(GPIOOBJ)
$(GPIODIR)/$(GPIOOBJ):
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 01a219229238..52bfe5e76907 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -1,6 +1,7 @@
all:
top_srcdir = ../../../..
+KSFT_KHDR_INSTALL := 1
UNAME_M := $(shell uname -m)
LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/ucall.c lib/sparsebit.c
@@ -44,7 +45,6 @@ $(OUTPUT)/libkvm.a: $(LIBKVM_OBJ)
all: $(STATIC_LIBS)
$(TEST_GEN_PROGS): $(STATIC_LIBS)
-$(STATIC_LIBS):| khdr
cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
cscope:
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 0a8e75886224..8b0f16409ed7 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -16,18 +16,18 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
+ifdef KSFT_KHDR_INSTALL
top_srcdir ?= ../../../..
include $(top_srcdir)/scripts/subarch.include
ARCH ?= $(SUBARCH)
-all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
-
.PHONY: khdr
khdr:
make ARCH=$(ARCH) -C $(top_srcdir) headers_install
-ifdef KSFT_KHDR_INSTALL
-$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES):| khdr
+all: khdr $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
+else
+all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
endif
.ONESHELL:
diff --git a/tools/testing/selftests/networking/timestamping/Makefile b/tools/testing/selftests/networking/timestamping/Makefile
index 14cfcf006936..c46c0eefab9e 100644
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -6,6 +6,7 @@ TEST_PROGS := hwtstamp_config rxtimestamp timestamping txtimestamp
all: $(TEST_PROGS)
top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
include ../../lib.mk
clean:
diff --git a/tools/testing/selftests/tc-testing/bpf/Makefile b/tools/testing/selftests/tc-testing/bpf/Makefile
index dc92eb271d9a..be5a5e542804 100644
--- a/tools/testing/selftests/tc-testing/bpf/Makefile
+++ b/tools/testing/selftests/tc-testing/bpf/Makefile
@@ -4,6 +4,7 @@ APIDIR := ../../../../include/uapi
TEST_GEN_FILES = action.o
top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
include ../../lib.mk
CLANG ?= clang
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 6e67e726e5a5..e13eb6cc8901 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -25,6 +25,7 @@ TEST_GEN_FILES += virtual_address_range
TEST_PROGS := run_vmtests
+KSFT_KHDR_INSTALL := 1
include ../lib.mk
$(OUTPUT)/userfaultfd: LDLIBS += -lpthread
--
2.17.1
On Fri, Dec 14, 2018 at 06:29:54AM -0800, Eric Dumazet wrote:
> On Fri, Dec 14, 2018 at 6:26 AM Greg Kroah-Hartman
> <gregkh(a)linuxfoundation.org> wrote:
>
> > On Fri, Dec 14, 2018 at 02:03:55PM +0000, Sudip Mukherjee wrote:
> > > Hi Greg,
> > >
> > > On Fri, Dec 14, 2018 at 12:08 PM Greg Kroah-Hartman
> > > <gregkh(a)linuxfoundation.org> wrote:
> > > >
> > > > 4.14-stable review patch. If anyone has any objections, please let me know.
> > > >
> > > > ------------------
> > > >
> > > > From: Eric Dumazet <edumazet(a)google.com>
> > > >
> > > > [ Upstream commit 41727549de3e7281feb174d568c6e46823db8684 ]
> > >
> > > There is one upstream commit which fixes this one.
> > > f9bfe4e6a9d0 ("tcp: lack of available data can also cause TSO defer")
> >
> > I can take this if Eric and/or David ack it :)
> >
> >
> I ack it, I guess David had this queued already.
>
> No big deal, that is only for tcp_info instrumentation accuracy.
If this is already queued for the next round of stable patches, and it's
not a major issue, I'll just wait and take it when it shows up in a week
or so.
thanks,
greg k-h
ASIDs have always been stored as unsigned longs, i.e. 32 bits on MIPS32
kernels. This is problematic because it is feasible for the ASID version
to overflow & wrap around to zero.
We currently attempt to handle this overflow by simply setting the ASID
version to 1, using asid_first_version(), but we make no attempt to
account for the fact that there may be mm_structs with stale ASIDs that
have versions which we now reuse due to the overflow & wrap around.
Encountering this requires that:
1) A struct mm_struct X is active on CPU A using ASID (V,n).
2) That mm is not used on CPU A for the length of time that it takes
for CPU A's asid_cache to overflow & wrap around to the same
version V that the mm had in step 1. During this time tasks using
the mm could either be sleeping or only scheduled on other CPUs.
3) Some other mm Y becomes active on CPU A and is allocated the same
ASID (V,n).
4) mm X now becomes active on CPU A again, and now incorrectly has the
same ASID as mm Y.
Where struct mm_struct ASIDs are represented above in the format
(version, EntryHi.ASID), and on a typical MIPS32 system version will be
24 bits wide & EntryHi.ASID will be 8 bits wide.
The length of time required in step 2 is highly dependent upon the CPU &
workload, but for a hypothetical 2GHz CPU running a workload which
generates a new ASID every 10000 cycles this period is around 248 days.
Due to this long period of time & the fact that tasks need to be
scheduled in just the right (or wrong, depending upon your inclination)
way, this is obviously a difficult bug to encounter but it's entirely
possible as evidenced by reports.
In order to fix this, simply extend ASIDs to 64 bits even on MIPS32
builds. This will extend the period of time required for the
hypothetical system above to encounter the problem from 248 days to
around 3 trillion years, which feels safely outside of the realms of
possibility.
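
To make the wrap-around concrete, below is a minimal user-space sketch.
It assumes an 8-bit hardware ASID in the low bits with the version in
the bits above; it is an illustration only, not the kernel's
implementation:

	#include <stdint.h>
	#include <stdio.h>

	#define ASID_MASK 0xff	/* assumed 8-bit EntryHi.ASID field */

	int main(void)
	{
		uint32_t cache32 = UINT32_MAX;	/* 24-bit version about to wrap */
		uint64_t cache64 = UINT32_MAX;	/* same point on a 64-bit counter */

		cache32++;	/* version wraps to 0: stale mms may now collide */
		cache64++;	/* version simply becomes 2^24: no collision */

		printf("32-bit: version=%u asid=%u\n",
		       cache32 >> 8, cache32 & ASID_MASK);
		printf("64-bit: version=%llu asid=%u\n",
		       (unsigned long long)(cache64 >> 8),
		       (unsigned)(cache64 & ASID_MASK));
		return 0;
	}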
The cost of this is slightly more generated code in some commonly
executed paths, but this is pretty minimal:
                      | Code Size Gain | Percentage
 ---------------------|----------------|-----------
 decstation_defconfig |           +270 |     +0.00%
 32r2el_defconfig     |           +652 |     +0.01%
 32r6el_defconfig     |          +1000 |     +0.01%
I have been unable to measure any change in performance of the LMbench
lat_ctx or lat_proc tests resulting from the 64b ASIDs on either
32r2el_defconfig+interAptiv or 32r6el_defconfig+I6500 systems.
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Suggested-by: James Hogan <jhogan(a)kernel.org>
References: https://lore.kernel.org/linux-mips/80B78A8B8FEE6145A87579E8435D78C30205D5F3…
References: https://lore.kernel.org/linux-mips/1488684260-18867-1-git-send-email-jiwei.…
Cc: Jiwei Sun <jiwei.sun(a)windriver.com>
Cc: Yu Huabing <yhb(a)ruijie.com.cn>
Cc: stable(a)vger.kernel.org # 2.6.12+
---
arch/mips/include/asm/cpu-info.h | 2 +-
arch/mips/include/asm/mmu.h | 2 +-
arch/mips/include/asm/mmu_context.h | 8 ++++----
arch/mips/mm/c-r3k.c | 2 +-
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
index a41059d47d31..ed7ffe4e63a3 100644
--- a/arch/mips/include/asm/cpu-info.h
+++ b/arch/mips/include/asm/cpu-info.h
@@ -50,7 +50,7 @@ struct guest_info {
#define MIPS_CACHE_PINDEX 0x00000020 /* Physically indexed cache */
struct cpuinfo_mips {
- unsigned long asid_cache;
+ u64 asid_cache;
#ifdef CONFIG_MIPS_ASID_BITS_VARIABLE
unsigned long asid_mask;
#endif
diff --git a/arch/mips/include/asm/mmu.h b/arch/mips/include/asm/mmu.h
index 0740be7d5d4a..24d6b42345fb 100644
--- a/arch/mips/include/asm/mmu.h
+++ b/arch/mips/include/asm/mmu.h
@@ -7,7 +7,7 @@
#include <linux/wait.h>
typedef struct {
- unsigned long asid[NR_CPUS];
+ u64 asid[NR_CPUS];
void *vdso;
atomic_t fp_mode_switching;
diff --git a/arch/mips/include/asm/mmu_context.h b/arch/mips/include/asm/mmu_context.h
index 94414561de0e..fd869d538a3c 100644
--- a/arch/mips/include/asm/mmu_context.h
+++ b/arch/mips/include/asm/mmu_context.h
@@ -76,14 +76,14 @@ extern unsigned long pgd_current[];
* All unused by hardware upper bits will be considered
* as a software asid extension.
*/
-static unsigned long asid_version_mask(unsigned int cpu)
+static u64 asid_version_mask(unsigned int cpu)
{
unsigned long asid_mask = cpu_asid_mask(&cpu_data[cpu]);
- return ~(asid_mask | (asid_mask - 1));
+ return ~(u64)(asid_mask | (asid_mask - 1));
}
-static unsigned long asid_first_version(unsigned int cpu)
+static u64 asid_first_version(unsigned int cpu)
{
return ~asid_version_mask(cpu) + 1;
}
@@ -102,7 +102,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
static inline void
get_new_mmu_context(struct mm_struct *mm, unsigned long cpu)
{
- unsigned long asid = asid_cache(cpu);
+ u64 asid = asid_cache(cpu);
if (!((asid += cpu_asid_inc()) & cpu_asid_mask(&cpu_data[cpu]))) {
if (cpu_has_vtag_icache)
diff --git a/arch/mips/mm/c-r3k.c b/arch/mips/mm/c-r3k.c
index 3466fcdae0ca..01848cdf2074 100644
--- a/arch/mips/mm/c-r3k.c
+++ b/arch/mips/mm/c-r3k.c
@@ -245,7 +245,7 @@ static void r3k_flush_cache_page(struct vm_area_struct *vma,
pmd_t *pmdp;
pte_t *ptep;
- pr_debug("cpage[%08lx,%08lx]\n",
+ pr_debug("cpage[%08llx,%08lx]\n",
cpu_context(smp_processor_id(), mm), addr);
/* No ASID => no such page in the cache. */
--
2.19.1