On Mon, Sep 14, 2020 at 01:59:15PM -0700, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner <tglx(a)linutronix.de> wrote:
> >
> > Recently merged code does:
> >
> > gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
> >
> > Looks obviously correct, except for the fact that preemptible() is
> > unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in
> > that code use GFP_ATOMIC on such kernels.
>
> I don't think this is a good reason to entirely get rid of the no-preempt thing.
>
> The above is just garbage. It's bogus. You can't do it.
>
> Blaming the no-preempt code for this bug is extremely unfair, imho.
>
> And the no-preempt code does help make for much better code generation
> for simple spinlocks.
>
> Where is that horribly buggy recent code? It's not in that exact
> format, certainly, since 'grep' doesn't find it.
It would be convenient for that "gfp =" code to work, as this would
allow better cache locality while invoking RCU callbacks, and would
further provide better robustness to callback floods. The full story
is quite long, but here are alternatives have not yet been proven to be
abject failures:
1. Use workqueues to do the allocations in a clean context.
While waiting for the allocations, the callbacks are queued
in the old cache-busting manner. This functions correctly,
but in the meantime (which on busy systems can be some time)
the cache locality and robustness are lost.
2. Provide the ability to allocate memory in raw atomic context.
This is extremely effective, especially when used in combination
with #1 above, but as you might suspect, the MM guys don't like
it much.
In contrast, with Thomas's patch series, call_rcu() and kvfree_rcu()
could just look at preemptible() to see whether or not it was safe to
allocate memory, even in !PREEMPT kernels -- and in the common case,
it almost always would be safe. It is quite possible that this approach
would work in isolation, or failing that, that adding #1 above would do
the trick.
I understand that this is all very hand-wavy, and I do apologize for that.
If you really want the full sad story with performance numbers and the
works, let me know!
Thanx, Paul
Hi,
This fixes a couple of minor aggravating factors that I ran across while
trying to do some changes in selftests/vm. These are simple things, but
like most things with GNU Make, it's rarely obvious what's wrong until
you understand *the entire Makefile and all of its includes*.
So while there is, of course, joy in learning those details, I thought I'd
fix these little things, so as to allow others to skip out on the Joy if
they so choose. :)
First of all, if you have an item (let's choose userfaultfd for an
example) that fails to build, you might do this:
$ make -j32
# ...you observe a failed item in the threaded output
# OK, let's get a closer look
$ make
# ...but now the build quietly "succeeds".
That's what Patch 0001 fixes.
Second, if you instead attempt this approach for your closer look (a casual
mistake, as it's not supported):
$ make userfaultfd
# ...userfaultfd fails to link, due to incomplete LDLIBS
That's what Patch 0002 fixes.
John Hubbard (2):
selftests/vm: fix false build success on the second and later attempts
selftests/vm: fix incorrect gcc invocation in some cases
tools/testing/selftests/vm/Makefile | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
--
2.28.0
On Tue, 15 Sep 2020 at 01:43, Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
>
> On Mon, Sep 14, 2020 at 3:24 PM Linus Torvalds
> <torvalds(a)linux-foundation.org> wrote:
> >
> > Ard and Herbert added to participants: see
> > chacha20poly1305_crypt_sg_inplace(), which does
> >
> > flags = SG_MITER_TO_SG;
> > if (!preemptible())
> > flags |= SG_MITER_ATOMIC;
> >
> > introduced in commit d95312a3ccc0 ("crypto: lib/chacha20poly1305 -
> > reimplement crypt_from_sg() routine").
>
> As far as I can tell, the only reason for this all is to try to use
> "kmap()" rather than "kmap_atomic()".
>
> And kmap() actually has the much more complex "might_sleep()" tests,
> and apparently the "preemptible()" check wasn't even the proper full
> debug check, it was just a complete hack to catch the one that
> triggered.
>
This was not driven by a failing check.
The documentation of kmap_atomic() states the following:
* The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
* gives a more generic (and caching) interface. But kmap_atomic can
* be used in IRQ contexts, so in some (very limited) cases we need
* it.
so if this is no longer accurate, perhaps we should fix it?
But another reason I tried to avoid kmap_atomic() is that it disables
preemption unconditionally, even on 64-bit architectures where HIGHMEM
is irrelevant. So using kmap_atomic() here means that the bulk of
WireGuard packet encryption runs with preemption disabled, essentially
for legacy reasons.
> From a quick look, that code should probably just get rid of
> SG_MITER_ATOMIC entirely, and alwayse use kmap_atomic().
>
> kmap_atomic() is actually the faster and proper interface to use
> anyway (never mind that any of this matters on any sane hardware). The
> old kmap() and kunmap() interfaces should generally be avoided like
> the plague - yes, they allow sleeping in the middle and that is
> sometimes required, but if you don't need that, you should never ever
> use them.
>
> We used to have a very nasty kmap_atomic() that required people to be
> very careful and know exactly which atomic entry to use, and that was
> admitedly quite nasty.
>
> So it _looks_ like this code started using kmap() - probably back when
> kmap_atomic() was so cumbersome to use - and was then converted
> (conditionally) to kmap_atomic() rather than just changed whole-sale.
> Is there actually something that wants to use those sg_miter functions
> and sleep?
>
> Because if there is, that choice should come from the outside, not
> from inside lib/scatterlist.c trying to make some bad guess based on
> the wrong thing entirely.
>
> Linus