- Linux-stable-mirror - lists.linaro.org

[PATCH][v2] x86/microcode/intel: check cpu stepping and processor flag before saving microcode

by Chen Yu

Currently scan_microcode() leverages microcode_matches() to check whether the
microcode matches the CPU by comparing the family and model. However, before
saving the microcode in scan_microcode(), the processor stepping and flag of
the microcode signature should also be considered, in order to avoid saving an
incompatible copy that causes the microcode update to fail.

For example, on one platform the microcode failed to be updated to the latest
revision on APs during resume from S3 due to incompatible CPU stepping and
signature->pf. This is because scan_microcode() has saved an incompatible copy
of intel_ucode_patch in save_microcode_in_initrd_intel() after bootup. This
intel_ucode_patch is used by the APs during early resume from S3, which results
in an unchecked MSR access error during resume from S3:

[   95.519390] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xffffffffb7676208 (native_read_msr+0x8/0x40)
[   95.519391] Call Trace:
[   95.519395]  update_srbds_msr+0x38/0x80
[   95.519396]  identify_secondary_cpu+0x7a/0x90
[   95.519397]  smp_store_cpu_info+0x4e/0x60
[   95.519398]  start_secondary+0x49/0x150
[   95.519399]  secondary_startup_64_no_verify+0xa6/0xab

The system keeps running on old microcode during resume:

[  210.366757] microcode: load_ucode_intel_ap: CPU1, enter, intel_ucode_patch: 0xffff9bf2816e0000
[  210.366757] microcode: load_ucode_intel_ap: CPU1, p: 0xffff9bf2816e0000, rev: 0xd6
[  210.366759] microcode: apply_microcode_early: rev: 0x84
[  210.367826] microcode: apply_microcode_early: rev after upgrade: 0x84

until mc_cpu_starting() is invoked on each AP during resume and the correct
microcode is updated via apply_microcode_intel().

To fix this issue, scan_microcode() uses find_matching_signature() instead of
microcode_matches() to compare the (family, model, stepping, processor flag),
and only saves the microcode that matches. As there is no other place invoking
microcode_matches(), remove it accordingly.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208535
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Cc: stable(a)vger.kernel.org # v4.10+
Reviewed-by: Ashok Raj <ashok.raj(a)intel.com>
Signed-off-by: Chen Yu <yu.c.chen(a)intel.com>
---
v2: Remove RFC tag and Cc the stable mailing list.
---
 arch/x86/kernel/cpu/microcode/intel.c | 50 ++-------------------------
 1 file changed, 2 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 6a99535d7f37..923853f79099 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -100,53 +100,6 @@ static int has_newer_microcode(void *mc, unsigned int csig, int cpf, int new_rev
 	return find_matching_signature(mc, csig, cpf);
 }
 
-/*
- * Given CPU signature and a microcode patch, this function finds if the
- * microcode patch has matching family and model with the CPU.
- *
- * %true - if there's a match
- * %false - otherwise
- */
-static bool microcode_matches(struct microcode_header_intel *mc_header,
-			      unsigned long sig)
-{
-	unsigned long total_size = get_totalsize(mc_header);
-	unsigned long data_size = get_datasize(mc_header);
-	struct extended_sigtable *ext_header;
-	unsigned int fam_ucode, model_ucode;
-	struct extended_signature *ext_sig;
-	unsigned int fam, model;
-	int ext_sigcount, i;
-
-	fam   = x86_family(sig);
-	model = x86_model(sig);
-
-	fam_ucode   = x86_family(mc_header->sig);
-	model_ucode = x86_model(mc_header->sig);
-
-	if (fam == fam_ucode && model == model_ucode)
-		return true;
-
-	/* Look for ext. headers: */
-	if (total_size <= data_size + MC_HEADER_SIZE)
-		return false;
-
-	ext_header   = (void *) mc_header + data_size + MC_HEADER_SIZE;
-	ext_sig      = (void *)ext_header + EXT_HEADER_SIZE;
-	ext_sigcount = ext_header->count;
-
-	for (i = 0; i < ext_sigcount; i++) {
-		fam_ucode   = x86_family(ext_sig->sig);
-		model_ucode = x86_model(ext_sig->sig);
-
-		if (fam == fam_ucode && model == model_ucode)
-			return true;
-
-		ext_sig++;
-	}
-	return false;
-}
-
 static struct ucode_patch *memdup_patch(void *data, unsigned int size)
 {
 	struct ucode_patch *p;
@@ -344,7 +297,8 @@ scan_microcode(void *data, size_t size, struct ucode_cpu_info *uci, bool save)
 
 		size -= mc_size;
 
-		if (!microcode_matches(mc_header, uci->cpu_sig.sig)) {
+		if (!find_matching_signature(data, uci->cpu_sig.sig,
+					     uci->cpu_sig.pf)) {
 			data += mc_size;
 			continue;
 		}
-- 
2.17.1
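
The difference between the two checks described above can be illustrated with a small standalone sketch. It is not the kernel code: struct ucode_sig, family_model_match() and full_sig_match() are hypothetical names, the family/model comparison is simplified to "every bit except the stepping nibble", and the processor-flag rule (flags must overlap, or both be zero) is an assumption about how the signature match behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one microcode signature entry. */
struct ucode_sig {
	uint32_t sig;	/* CPUID signature: family, model and stepping */
	uint32_t pf;	/* processor flags (platform ID bitmask) */
};

/*
 * Loose check in the spirit of the removed helper: the stepping (bits 3:0)
 * is ignored. Simplification: all remaining bits are compared as-is.
 */
static bool family_model_match(uint32_t ucode_sig, uint32_t cpu_sig)
{
	return (ucode_sig & ~0xfu) == (cpu_sig & ~0xfu);
}

/* Strict check: identical signature and compatible processor flags. */
static bool full_sig_match(const struct ucode_sig *u, uint32_t cpu_sig,
			   uint32_t cpu_pf)
{
	if (u->sig != cpu_sig)
		return false;
	/* Assumption: entries without flags match only CPUs without flags. */
	if (!u->pf && !cpu_pf)
		return true;
	return (u->pf & cpu_pf) != 0;
}

/* Usage: a patch for stepping 0xa is offered to a CPU with stepping 0x9. */
static void sig_match_demo(void)
{
	const struct ucode_sig u = { 0x806ea, 0x80 };

	assert(family_model_match(u.sig, 0x806e9));	/* loose check accepts */
	assert(!full_sig_match(&u, 0x806e9, 0x80));	/* strict check rejects */
	assert(full_sig_match(&u, 0x806ea, 0xc0));	/* same sig, flags overlap */
}
```

With only the loose check, the mismatched patch would be saved and later handed to the APs, which is the failure mode the commit message describes.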

4 years, 11 months


[PATCH] ubifs: wbuf: Don't leak kernel memory to flash

by Richard Weinberger

Write buffers use a kmalloc()'ed buffer, so they can leak up to seven bytes
of kernel memory to flash if writes are not aligned.
So use ubifs_pad() to fill these gaps with padding bytes.
This was never a problem while scanning because the scanner logic
manually aligns node lengths and skips over these gaps.

Cc: <stable(a)vger.kernel.org>
Fixes: 1e51764a3c2ac05a2 ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard(a)nod.at>
---
 fs/ubifs/io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index 7e4bfaf2871f..eae9cf5a57b0 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -319,7 +319,7 @@ void ubifs_pad(const struct ubifs_info *c, void *buf, int pad)
 {
 	uint32_t crc;
 
-	ubifs_assert(c, pad >= 0 && !(pad & 7));
+	ubifs_assert(c, pad >= 0);
 
 	if (pad >= UBIFS_PAD_NODE_SZ) {
 		struct ubifs_ch *ch = buf;
@@ -764,6 +764,10 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
 		 * write-buffer.
 		 */
 		memcpy(wbuf->buf + wbuf->used, buf, len);
+		if (aligned_len > len) {
+			ubifs_assert(c, aligned_len - len < 8);
+			ubifs_pad(c, wbuf->buf + wbuf->used + len, aligned_len - len);
+		}
 
 		if (aligned_len == wbuf->avail) {
 			dbg_io("flush jhead %s wbuf to LEB %d:%d",
@@ -856,13 +860,18 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
 	}
 
 	spin_lock(&wbuf->lock);
-	if (aligned_len)
+	if (aligned_len) {
 		/*
 		 * And now we have what's left and what does not take whole
 		 * max. write unit, so write it to the write-buffer and we are
 		 * done.
 		 */
 		memcpy(wbuf->buf, buf + written, len);
+		if (aligned_len > len) {
+			ubifs_assert(c, aligned_len - len < 8);
+			ubifs_pad(c, wbuf->buf + len, aligned_len - len);
+		}
+	}
 
 	if (c->leb_size - wbuf->offs >= c->max_write_size)
 		wbuf->size = c->max_write_size;
-- 
2.26.2
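
The gap that the patch fills can be shown with a small userspace sketch. This is an illustration only, not the UBIFS code: wbuf_append(), ALIGN8() and PAD_BYTE are hypothetical names, the padding value is an arbitrary choice for the example, and the caller is assumed to provide a buffer with room for the aligned length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAD_BYTE 0xCE				/* illustrative padding value */
#define ALIGN8(x) (((x) + 7u) & ~(size_t)7)	/* round up to a multiple of 8 */

/*
 * Copy len bytes into the write buffer at offset used, then overwrite the
 * gap up to the next 8-byte boundary so that stale buffer contents never
 * reach the medium. Returns the aligned length that was consumed.
 */
static size_t wbuf_append(unsigned char *wbuf, size_t used,
			  const void *data, size_t len)
{
	size_t aligned_len = ALIGN8(len);

	memcpy(wbuf + used, data, len);
	if (aligned_len > len) {
		assert(aligned_len - len < 8);	/* the gap is at most 7 bytes */
		memset(wbuf + used + len, PAD_BYTE, aligned_len - len);
	}
	return aligned_len;
}
```

For a 5-byte write, bytes 5 to 7 of the slot are set to the padding value; without the memset() they would keep whatever the allocator left in the buffer.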

4 years, 11 months


[PATCH] unlz4: Handle 0-size chunks, discard trailing padding/garbage

by siarhei.liakh＠concurrent-rt.com

From: Siarhei Liakh <siarhei.liakh(a)concurrent-rt.com>

TL;DR:

There are two places in the unlz4() function where reads beyond the end of a
buffer might happen under certain conditions. These have been observed in real
life on stock Ubuntu 20.04 x86_64 with several vanilla mainline kernels,
including 5.10. As a result of this issue, the kernel fails to decompress an
LZ4-compressed initramfs, with the following message showing up in the logs:

initramfs unpacking failed: Decoding failed

Note that in most cases the affected system is still able to proceed with the
boot process to completion.

LONG STORY:

Background.

Not so long ago we noticed that some of our Ubuntu 20.04 x86_64 test systems
often fail to boot a newly generated initramfs image. After extensive
investigation we determined that a failure required the following combination
for our 5.4.66-rt38 kernel with some additional custom patches:

- Real x86_64 hardware or QEMU
- UEFI boot
- Ubuntu 20.04 (or 20.04.1) x86_64
- CONFIG_BLK_DEV_RAM=y in .config
- COMPRESS=lz4 in initramfs.conf
- Freshly compiled and installed kernel
- Freshly generated and installed initramfs image

In our testing, such a combination would often produce a non-bootable system.
It is important to note that [un]bootability of the system was later tracked
down to particular instances of initramfs images, and would follow them if they
were to be switched around/transferred to other systems. What is even more
important is that consecutive re-generations of initramfs images from the same
source and binary materials would yield about 75% of "bad" images. Further,
once an image is identified as "bad", it always stays "bad"; once one is "good"
it always stays "good". Reverting CONFIG_BLK_DEV_RAM to "m" (default in
Ubuntu), or changing COMPRESS to "gzip" yields a 100% bootable system.
Decompressing a "bad" initramfs image with "unmkinitramfs" yields *exactly* the
same set of binaries, as verified by matching MD5 sums to those from a "good"
image.

Speculation.

Based on general observations, it appears that Ubuntu's userland toolchain
cannot consistently generate exactly the same compressed initramfs image,
likely due to some variations in timestamps between the runs. This causes
variations in the compressed lz4 data stream. Further, either initramfs tools
or lz4 libraries appear to pad compressed lz4 output to the closest 4-byte
boundary. lz4 v1.9.2 that ships with Ubuntu 20.04 appears to be able to handle
such padding just fine, while lz4 (supposedly v1.8.3) within the Linux kernel
cannot.

Several reports of somewhat similar behavior have recently been circulating
through different bug tracking systems and discussion forums [1-4].

I also suspect that only systems which can mount the permanent root directly
(or with the help of modules contained in the first, supposedly uncompressed,
part of initramfs, or the ones with statically linked modules) can actually
complete the boot when LZ4 decompression fails. This would certainly explain
why most Ubuntu systems still manage to boot even after failing to decompress
the image.

The facts.

Regardless of whether the Ubuntu 20.04 toolchain produces a valid
lz4-compressed initramfs image or not, the current version of the unlz4()
function in the kernel has two code paths which have been observed attempting
to read beyond the buffer end when presented with one of the "padded"/"bad"
initramfs images generated by the stock Ubuntu 20.04 toolchain. Some
configurations of some 5.4 kernels are known to fail to boot in such cases.

This behavior also becomes evident on vanilla 5.10.0-rc3 and 5.10.0-rc4
kernels with the addition of two logging statements for the corresponding edge
cases, even though it does not prevent the system from booting in most generic
configurations.

Further investigation is likely warranted to confirm whether the userland
toolchain contains any bugs and/or whether any of these cases constitute a
violation of the LZ4 and/or initramfs specification.

References

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1835660
[2] https://github.com/linuxmint/mint20-beta/issues/90
[3] https://askubuntu.com/questions/1245458/getting-the-message-0-283078-initra…
[4] https://forums.linuxmint.com/viewtopic.php?t=323152

Signed-off-by: Siarhei Liakh <siarhei.liakh(a)concurrent-rt.com>
---

Please CC: me directly on all replies.

 lib/decompress_unlz4.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
index c0cfcfd486be..a016643a6dc5 100644
--- a/lib/decompress_unlz4.c
+++ b/lib/decompress_unlz4.c
@@ -125,6 +125,21 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
 			continue;
 		}
 
+		if (chunksize == 0) {
+			/*
+			 * Nothing to decode...
+			 * FIXME: this could be an error condition due
+			 * to invalid or corrupt data. However, some
+			 * userspace tools had been observed producing
+			 * otherwise valid initramfs images which happen
+			 * to hit this condition.
+			 * TODO: need to figure out whether the latest
+			 * LZ4 and initramfs specifications allows for
+			 * zero-sized chunks.
+			 * See similar message below.
+			 */
+			break;
+		}
 
 		if (posp)
 			*posp += 4;
@@ -179,6 +194,20 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
 			else if (size < 0) {
 				error("data corrupted");
 				goto exit_2;
+			} else if (size < 4) {
+				/*
+				 * Ignore any undesized junk/padding...
+				 * FIXME: this could be an error condition due
+				 * to invalid or corrupt data. However, some
+				 * userspace tools had been observed producing
+				 * otherwise valid initramfs images which happen
+				 * to hit this condition.
+				 * TODO: need to figure out whether the latest
+				 * LZ4 and initramfs specifications allows for
+				 * small padding at the end of the chunk.
+				 * See similar message above.
+				 */
+				break;
 			}
 			inp += chunksize;
 		}
-- 
2.17.1
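
The two edge cases handled by the patch can be sketched outside the kernel as a walk over size-prefixed chunks. This is a simplified illustration under stated assumptions, not the unlz4() implementation: count_chunks() and read_le32() are hypothetical helpers, each chunk is assumed to be preceded by a 32-bit little-endian size, and the decompression step itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a possibly unaligned buffer. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk size-prefixed chunks. Returns the number of chunks seen, or -1 if a
 * chunk claims more bytes than remain in the buffer.
 */
static int count_chunks(const uint8_t *in, size_t size)
{
	int chunks = 0;

	while (size >= 4) {	/* fewer than 4 bytes left: trailing padding */
		uint32_t chunksize = read_le32(in);

		if (chunksize == 0)	/* zero-sized chunk: nothing to decode */
			break;
		in += 4;
		size -= 4;
		if (chunksize > size)
			return -1;	/* truncated or corrupt data */
		in += chunksize;	/* a real decoder would decompress here */
		size -= chunksize;
		chunks++;
	}
	return chunks;
}

/* Usage: one 2-byte chunk followed by two bytes of padding. */
static void count_chunks_demo(void)
{
	const uint8_t padded[] = { 2, 0, 0, 0, 0xaa, 0xbb, 0, 0 };

	assert(count_chunks(padded, sizeof(padded)) == 1);
}
```

Both stop conditions end the walk without reading a size field or payload past the end of the input, which is the property the two added branches aim for.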

4 years, 11 months


stable-rc/queue/4.14 baseline: 58 runs, 1 regressions (v4.14.206-57-g106ef0d11ee4)

by kernelci.org bot

stable-rc/queue/4.14 baseline: 58 runs, 1 regressions (v4.14.206-57-g106ef0d11ee4)

Regressions Summary
-------------------

platform   | arch | lab           | compiler | defconfig          | regressions
-----------+------+---------------+----------+--------------------+------------
odroid-xu3 | arm  | lab-collabora | gcc-8    | multi_v7_defconfig | 1

  Details:  https://kernelci.org/test/job/stable-rc/branch/queue%2F4.14/kernel/v4.14.20…

  Test:     baseline
  Tree:     stable-rc
  Branch:   queue/4.14
  Describe: v4.14.206-57-g106ef0d11ee4
  URL:      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
  SHA:      106ef0d11ee4695a42a4544818b1b6dbf22379e6


Test Regressions
----------------

platform   | arch | lab           | compiler | defconfig          | regressions
-----------+------+---------------+----------+--------------------+------------
odroid-xu3 | arm  | lab-collabora | gcc-8    | multi_v7_defconfig | 1

  Details:     https://kernelci.org/test/plan/id/5fb333f07e16472dcdd22e62

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: multi_v7_defconfig
  Compiler:    gcc-8 (arm-linux-gnueabihf-gcc (Debian 8.3.0-2) 8.3.0)
  Plain log:   https://storage.kernelci.org//stable-rc/queue-4.14/v4.14.206-57-g106ef0d11e…
  HTML log:    https://storage.kernelci.org//stable-rc/queue-4.14/v4.14.206-57-g106ef0d11e…
  Rootfs:      http://storage.kernelci.org/images/rootfs/buildroot/kci-2020.05-4-g97706c5d…

  * baseline.login: https://kernelci.org/test/case/id/5fb333f07e16472dcdd22e63
        new failure (last pass: v4.14.206-22-g2ec7a9bf443b0)

4 years, 11 months


stable-rc/queue/4.14 build: 4 builds: 0 failed, 4 passed, 1 warning (v4.14.206-57-g106ef0d11ee4)

by kernelci.org bot

stable-rc/queue/4.14 build: 4 builds: 0 failed, 4 passed, 1 warning (v4.14.206-57-g106ef0d11ee4)

Full Build Summary: https://kernelci.org/build/stable-rc/branch/queue%2F4.14/kernel/v4.14.206-5…

Tree: stable-rc
Branch: queue/4.14
Git Describe: v4.14.206-57-g106ef0d11ee4
Git Commit: 106ef0d11ee4695a42a4544818b1b6dbf22379e6
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Built: 2 unique architectures

Warnings Detected:

arm:
    palmz72_defconfig (gcc-8): 1 warning

mips:


Warnings summary:

    1    /scratch/linux/drivers/clk/clk.c:48:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable]

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
dove_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
e55_defconfig (mips, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
multi_v7_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
palmz72_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    /scratch/linux/drivers/clk/clk.c:48:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable]

---
For more info write to <info(a)kernelci.org>

4 years, 11 months


stable-rc/queue/4.4 build: 4 builds: 0 failed, 4 passed (v4.4.243-41-g75498c12fce0)

by kernelci.org bot

stable-rc/queue/4.4 build: 4 builds: 0 failed, 4 passed (v4.4.243-41-g75498c12fce0)

Full Build Summary: https://kernelci.org/build/stable-rc/branch/queue%2F4.4/kernel/v4.4.243-41-…

Tree: stable-rc
Branch: queue/4.4
Git Describe: v4.4.243-41-g75498c12fce0
Git Commit: 75498c12fce0d400d89cf310841ed46844936e6f
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Built: 2 unique architectures

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
cns3420vb_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
footbridge_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
msp71xx_defconfig (mips, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
s3c6400_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

---
For more info write to <info(a)kernelci.org>

4 years, 11 months


stable-rc/queue/4.9 build: 2 builds: 0 failed, 2 passed (v4.9.243-42-gaedb439106403)

by kernelci.org bot

stable-rc/queue/4.9 build: 2 builds: 0 failed, 2 passed (v4.9.243-42-gaedb439106403)

Full Build Summary: https://kernelci.org/build/stable-rc/branch/queue%2F4.9/kernel/v4.9.243-42-…

Tree: stable-rc
Branch: queue/4.9
Git Describe: v4.9.243-42-gaedb439106403
Git Commit: aedb439106403bf8967b240e3026d45894489852
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Built: 1 unique architecture

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
exynos_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
imx_v4_v5_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

---
For more info write to <info(a)kernelci.org>

4 years, 11 months


[PATCH v2 2/4] Kbuild: do not emit debug info for assembly with LLVM_IAS=1

by Nick Desaulniers

Clang's integrated assembler produces the warning for assembly files:

warning: DWARF2 only supports one section per compilation unit

If -Wa,-gdwarf-* is unspecified, then debug info is not emitted.  This
will be re-enabled for new DWARF versions in a follow up patch.

Enables defconfig+CONFIG_DEBUG_INFO to build cleanly with
LLVM=1 LLVM_IAS=1 for x86_64 and arm64.

Cc: <stable(a)vger.kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/716
Reported-by: Nathan Chancellor <natechancellor(a)gmail.com>
Suggested-by: Dmitry Golovin <dima(a)golovin.in>
Suggested-by: Sedat Dilek <sedat.dilek(a)gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
 Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index f353886dbf44..75b1a3dcbf30 100644
--- a/Makefile
+++ b/Makefile
@@ -826,7 +826,9 @@ else
 DEBUG_CFLAGS	+= -g
 endif
 
+ifndef LLVM_IAS
 KBUILD_AFLAGS	+= -Wa,-gdwarf-2
+endif
 
 ifdef CONFIG_DEBUG_INFO_DWARF4
 DEBUG_CFLAGS	+= -gdwarf-4
-- 
2.29.1.341.ge80a0c044ae-goog

4 years, 11 months


stable-rc/queue/4.19 build: 10 builds: 0 failed, 10 passed, 4 warnings (v4.19.157-62-gdee36feaf4bf)

by kernelci.org bot

stable-rc/queue/4.19 build: 10 builds: 0 failed, 10 passed, 4 warnings (v4.19.157-62-gdee36feaf4bf)

Full Build Summary: https://kernelci.org/build/stable-rc/branch/queue%2F4.19/kernel/v4.19.157-6…

Tree: stable-rc
Branch: queue/4.19
Git Describe: v4.19.157-62-gdee36feaf4bf
Git Commit: dee36feaf4bf77a52f6cb2cda611492f7b5049b3
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Built: 2 unique architectures

Warnings Detected:

arm:
    colibri_pxa300_defconfig (gcc-8): 1 warning
    pxa910_defconfig (gcc-8): 1 warning
    tct_hammer_defconfig (gcc-8): 1 warning

mips:
    gcw0_defconfig (gcc-8): 1 warning


Warnings summary:

    4    /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable]

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
acs5k_tiny_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
colibri_pxa270_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
colibri_pxa300_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable]

--------------------------------------------------------------------------------
gcw0_defconfig (mips, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable]

--------------------------------------------------------------------------------
mini2440_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
neponset_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
nuc960_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
pxa910_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable]

--------------------------------------------------------------------------------
realview_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
tct_hammer_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable]

---
For more info write to <info(a)kernelci.org>

4 years, 11 months


+ page_frag-recover-from-memory-pressure.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm, page_frag: recover from memory pressure
has been added to the -mm tree.  Its filename is
     page_frag-recover-from-memory-pressure.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/page_frag-recover-from-memory-pre…
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/page_frag-recover-from-memory-pre…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Dongli Zhang <dongli.zhang(a)oracle.com>
Subject: mm, page_frag: recover from memory pressure

The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
This ends up in page_frag_alloc(), which allocates skb->data from
page_frag_cache->va.

Under memory pressure, page_frag_cache->va may be allocated as a
pfmemalloc page.  As a result, skb->pfmemalloc is always true because
skb->data comes from page_frag_cache->va.  The skb will be dropped if the
sock (receiver) does not have SOCK_MEMALLOC.  This is expected behaviour
under memory pressure.

However, once the kernel is not under memory pressure any longer (suppose
a large amount of memory pages has just been reclaimed), page_frag_alloc()
may still re-use the prior pfmemalloc page_frag_cache->va to allocate
skb->data.  As a result, skb->pfmemalloc is always true unless
page_frag_cache->va is re-allocated, even if the kernel is not under
memory pressure any longer.

Here is how the kernel runs into the issue.

1. The kernel is under memory pressure and the allocation of
PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() fails.
Instead, a pfmemalloc page is allocated for page_frag_cache->va.

2. All skb->data from page_frag_cache->va (pfmemalloc) will have
skb->pfmemalloc=true.  The skb will always be dropped by a sock without
SOCK_MEMALLOC.  This is expected behaviour.

3. Suppose a large amount of pages is reclaimed and the kernel is not
under memory pressure any longer.  We expect that skb->pfmemalloc drops
will no longer happen.

4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_cache->va and will always re-use the prior pfmemalloc page.
skb->pfmemalloc is always true even if the kernel is not under memory
pressure any longer.

Fix this by freeing and re-allocating the page instead of recycling it.

Link: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
Link: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Link: https://lkml.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com
Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Dongli Zhang <dongli.zhang(a)oracle.com>
Suggested-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Bert Barbe <bert.barbe(a)oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu(a)oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra(a)oracle.com>
Cc: Manjunath Patil <manjunath.b.patil(a)oracle.com>
Cc: Joe Jin <joe.jin(a)oracle.com>
Cc: SRINIVAS <srinivas.eeda(a)oracle.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/page_alloc.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/mm/page_alloc.c~page_frag-recover-from-memory-pressure
+++ a/mm/page_alloc.c
@@ -5103,6 +5103,11 @@ refill:
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
 			goto refill;
 
+		if (unlikely(nc->pfmemalloc)) {
+			free_the_page(page, compound_order(page));
+			goto refill;
+		}
+
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
 		/* if size can vary use size else just use PAGE_SIZE */
 		size = nc->size;
_

Patches currently in -mm which might be from dongli.zhang(a)oracle.com are

page_frag-recover-from-memory-pressure.patch
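
The four steps in the changelog can be modelled with a toy state machine. This is only a sketch of the reasoning, not the mm code: struct frag_cache, refill() and recycle_or_refill() are hypothetical names, the page allocator is reduced to a boolean "under pressure" input, and reference counting is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a fragment cache backed by a single page. */
struct frag_cache {
	bool have_page;		/* a backing page is currently cached */
	bool pfmemalloc;	/* that page came from the emergency reserves */
};

/* Stand-in for the page allocator: under pressure only reserve pages exist. */
static void refill(struct frag_cache *nc, bool under_pressure)
{
	nc->have_page = true;
	nc->pfmemalloc = under_pressure;
}

/*
 * Called when the cached page has been fully consumed. Returns whether the
 * fragments handed out next are marked pfmemalloc.
 */
static bool recycle_or_refill(struct frag_cache *nc, bool under_pressure,
			      bool apply_fix)
{
	if (!nc->have_page || (apply_fix && nc->pfmemalloc))
		refill(nc, under_pressure);	/* drop old page, allocate anew */
	/* otherwise the old page is recycled together with its flag */
	return nc->pfmemalloc;
}

/* Usage: pressure first, then recovery, without and with the fix. */
static void frag_cache_demo(void)
{
	struct frag_cache old = { false, false }, fixed = { false, false };

	assert(recycle_or_refill(&old, true, false));	/* steps 1 and 2 */
	assert(recycle_or_refill(&old, false, false));	/* step 4: flag sticks */
	assert(recycle_or_refill(&fixed, true, true));
	assert(!recycle_or_refill(&fixed, false, true));	/* flag clears */
}
```

In the model without the fix, the flag set during pressure survives every later recycle; with the fix, the first recycle after pressure ends obtains a fresh page and the flag clears.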

4 years, 11 months

