Currently scan_microcode() leverages microcode_matches() to check if the
microcode matches the CPU by comparing the family and model. However before
saving the microcode in scan_microcode(), the processor stepping and flag
of the microcode signature should also be considered in order to avoid
incompatible update and caused the failure of microcode update.
For example on one platform the microcode failed to be updated to the
latest revison on APs during resume from S3 due to incompatible cpu
stepping and signature->pf. This is because the scan_microcode() has
saved an incompatible copy of intel_ucode_patch in
save_microcode_in_initrd_intel() after bootup. And this intel_ucode_patch
is used by APs during early resume from S3 which results in unchecked MSR
access error during resume from S3:
[ 95.519390] unchecked MSR access error: RDMSR from 0x123 at
rIP: 0xffffffffb7676208 (native_read_msr+0x8/0x40)
[ 95.519391] Call Trace:
[ 95.519395] update_srbds_msr+0x38/0x80
[ 95.519396] identify_secondary_cpu+0x7a/0x90
[ 95.519397] smp_store_cpu_info+0x4e/0x60
[ 95.519398] start_secondary+0x49/0x150
[ 95.519399] secondary_startup_64_no_verify+0xa6/0xab
The system keeps running on old microcode during resume:
[ 210.366757] microcode: load_ucode_intel_ap: CPU1, enter, intel_ucode_patch: 0xffff9bf2816e0000
[ 210.366757] microcode: load_ucode_intel_ap: CPU1, p: 0xffff9bf2816e0000, rev: 0xd6
[ 210.366759] microcode: apply_microcode_early: rev: 0x84
[ 210.367826] microcode: apply_microcode_early: rev after upgrade: 0x84
until mc_cpu_starting() is invoked on each AP during resume and the
correct microcode is updated via apply_microcode_intel().
To fix this issue, the scan_microcode() uses find_matching_signature()
instead of microcode_matches() to compare the (family, model, stepping,
processor flag), and only save the microcode that matches. As there is
no other place invoking microcode_matches(), remove it accordingly.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208535
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Cc: stable(a)vger.kernel.org#v4.10+
Reviewed-by: Ashok Raj <ashok.raj(a)intel.com>
Signed-off-by: Chen Yu <yu.c.chen(a)intel.com>
---
v2: Remove RFC tag and Cc the stable mailing list.
---
arch/x86/kernel/cpu/microcode/intel.c | 50 ++-------------------------
1 file changed, 2 insertions(+), 48 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 6a99535d7f37..923853f79099 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -100,53 +100,6 @@ static int has_newer_microcode(void *mc, unsigned int csig, int cpf, int new_rev
return find_matching_signature(mc, csig, cpf);
}
-/*
- * Given CPU signature and a microcode patch, this function finds if the
- * microcode patch has matching family and model with the CPU.
- *
- * %true - if there's a match
- * %false - otherwise
- */
-static bool microcode_matches(struct microcode_header_intel *mc_header,
- unsigned long sig)
-{
- unsigned long total_size = get_totalsize(mc_header);
- unsigned long data_size = get_datasize(mc_header);
- struct extended_sigtable *ext_header;
- unsigned int fam_ucode, model_ucode;
- struct extended_signature *ext_sig;
- unsigned int fam, model;
- int ext_sigcount, i;
-
- fam = x86_family(sig);
- model = x86_model(sig);
-
- fam_ucode = x86_family(mc_header->sig);
- model_ucode = x86_model(mc_header->sig);
-
- if (fam == fam_ucode && model == model_ucode)
- return true;
-
- /* Look for ext. headers: */
- if (total_size <= data_size + MC_HEADER_SIZE)
- return false;
-
- ext_header = (void *) mc_header + data_size + MC_HEADER_SIZE;
- ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
- ext_sigcount = ext_header->count;
-
- for (i = 0; i < ext_sigcount; i++) {
- fam_ucode = x86_family(ext_sig->sig);
- model_ucode = x86_model(ext_sig->sig);
-
- if (fam == fam_ucode && model == model_ucode)
- return true;
-
- ext_sig++;
- }
- return false;
-}
-
static struct ucode_patch *memdup_patch(void *data, unsigned int size)
{
struct ucode_patch *p;
@@ -344,7 +297,8 @@ scan_microcode(void *data, size_t size, struct ucode_cpu_info *uci, bool save)
size -= mc_size;
- if (!microcode_matches(mc_header, uci->cpu_sig.sig)) {
+ if (!find_matching_signature(data, uci->cpu_sig.sig,
+ uci->cpu_sig.pf)) {
data += mc_size;
continue;
}
--
2.17.1
Write buffers use a kmalloc()'ed buffer, they can leak
up to seven bytes of kernel memory to flash if writes are not
aligned.
So use ubifs_pad() to fill these gaps with padding bytes.
This was never a problem while scanning because the scanner logic
manually aligns node lengths and skips over these gaps.
Cc: <stable(a)vger.kernel.org>
Fixes: 1e51764a3c2ac05a2 ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard(a)nod.at>
---
fs/ubifs/io.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index 7e4bfaf2871f..eae9cf5a57b0 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -319,7 +319,7 @@ void ubifs_pad(const struct ubifs_info *c, void *buf, int pad)
{
uint32_t crc;
- ubifs_assert(c, pad >= 0 && !(pad & 7));
+ ubifs_assert(c, pad >= 0);
if (pad >= UBIFS_PAD_NODE_SZ) {
struct ubifs_ch *ch = buf;
@@ -764,6 +764,10 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
* write-buffer.
*/
memcpy(wbuf->buf + wbuf->used, buf, len);
+ if (aligned_len > len) {
+ ubifs_assert(c, aligned_len - len < 8);
+ ubifs_pad(c, wbuf->buf + wbuf->used + len, aligned_len - len);
+ }
if (aligned_len == wbuf->avail) {
dbg_io("flush jhead %s wbuf to LEB %d:%d",
@@ -856,13 +860,18 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
}
spin_lock(&wbuf->lock);
- if (aligned_len)
+ if (aligned_len) {
/*
* And now we have what's left and what does not take whole
* max. write unit, so write it to the write-buffer and we are
* done.
*/
memcpy(wbuf->buf, buf + written, len);
+ if (aligned_len > len) {
+ ubifs_assert(c, aligned_len - len < 8);
+ ubifs_pad(c, wbuf->buf + len, aligned_len - len);
+ }
+ }
if (c->leb_size - wbuf->offs >= c->max_write_size)
wbuf->size = c->max_write_size;
--
2.26.2
From: Siarhei Liakh <siarhei.liakh(a)concurrent-rt.com>
TL;DR:
There are two places in unlz4() function where reads beyond the end of a buffer
might happen under certain conditions which had been observed in real life on
stock Ubuntu 20.04 x86_64 with several vanilla mainline kernels, including 5.10.
As a result of this issue, the kernel fails to decompress LZ4-compressed
initramfs with following message showing up in the logs:
initramfs unpacking failed: Decoding failed
Note that in most cases the affected system is still able to proceed with the
boot process to completion.
LONG STORY:
Background.
Not so long ago we've noticed that some of our Ubuntu 20.04 x86_64 test systems
often fail to boot newly generated initramfs image. After extensive
investigation we determined that a failure required the following combination
for our 5.4.66-rt38 kernel with some additional custom patches:
Real x86_64 hardware or QEMU
UEFI boot
Ubunutu 20.04 (or 20.04.1) x86_64
CONFIG_BLK_DEV_RAM=y in .config
COMPRESS=lz4 in initramfs.conf
Freshly compiled and installed kernel
Freshly generated and installed initramfs image
In our testing, such a combination would often produce a non-bootable system. It
is important to note that [un]bootability of the system was later tracked down
to particular instances of initramfs images, and would follow them if they were
to be switched around/transferred to other systems. What is even more important
is that consecutive re-generations of initramfs images from the same source and
binary materials would yield about 75% of "bad" images. Further, once the image
is identified as "bad",it always stays "bad"; once one is "good" it always stays
"good". Reverting CONFIG_BLK_DEV_RAM to "m" (default in Ubuntu), or changing
COMPRESS to "gzip" yields a 100% bootable system. Decompressing "bad" initramfs
image with "unmkinitramfs" yields *exactly* the same set of binaries, as
verified by matching MD5 sums to those from "good" image.
Speculation.
Based on general observations, it appears that Ubuntu's userland toolchain
cannot consistently generate exactly the same compressed initramfs image, likely
due to some variations in timestamps between the runs. This causes variations in
compressed lz4 data stream. Further, either initramfs tools or lz4 libraries
appear to pad compressed lz4 output to closest 4-byte boundary. lz4 v1.9.2 that
ships with Ubuntu 20.04 appears to be able to handle such padding just fine,
while lz4 (supposedly v1.8.3) within Linux kernel cannot.
Several reports of somewhat similar behavior had been recently circulation
through different bug tracking systems and discussion forums [1-4].
I also suspect only that systems which can mount permanent root directly (or
with help of modules contained in first, supposedly uncompressed, part of
initramfs, or the ones with statically linked modules) can actually complete the
boot when LZ4 decompression fails. This would certainly explain why most of
Ubuntu systems still manage to boot even after failing to decompress the image.
The facts.
Regardless of whether Ubuntu 20.04 toolchain produces a valid lz4-compressed
initramfs image or not, current version of unlz4() function in kernel has two
code paths which had been observed attempting to read beyond the buffer end when
presented with one of the "padded"/"bad" initramfs images generated by stock
Ubuntu 20.04 toolchain. Some configurations of some 5.4 kernels are known to
fail to boot in such cases. This behavior also becomes evident on vanilla
5.10.0-rc3 and 5.10.0-rc4 kernels with addition of two logging statements for
corresponding edge cases, even though it does not prevent system from booting in
most generic configurations.
Further investigation is likely warranted to confirm whether userland toolchain
contains any bugs and/or whether any of these cases constitute violation of LZ4
and/or initramfs specification.
References
[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1835660
[2] https://github.com/linuxmint/mint20-beta/issues/90
[3] https://askubuntu.com/questions/1245458/getting-the-message-0-283078-initra…
[4] https://forums.linuxmint.com/viewtopic.php?t=323152
Signed-off-by: Siarhei Liakh <siarhei.liakh(a)concurrent-rt.com>
---
Please CC: me directly on all replies.
lib/decompress_unlz4.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
index c0cfcfd486be..a016643a6dc5 100644
--- a/lib/decompress_unlz4.c
+++ b/lib/decompress_unlz4.c
@@ -125,6 +125,21 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
continue;
}
+ if (chunksize == 0) {
+ /*
+ * Nothing to decode...
+ * FIXME: this could be an error condition due
+ * to invalid or corrupt data. However, some
+ * userspace tools had been observed producing
+ * otherwise valid initramfs images which happen
+ * to hit this condition.
+ * TODO: need to figure out whether the latest
+ * LZ4 and initramfs specifications allows for
+ * zero-sized chunks.
+ * See similar message below.
+ */
+ break;
+ }
if (posp)
*posp += 4;
@@ -179,6 +194,20 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
else if (size < 0) {
error("data corrupted");
goto exit_2;
+ } else if (size < 4) {
+ /*
+ * Ignore any undesized junk/padding...
+ * FIXME: this could be an error condition due
+ * to invalid or corrupt data. However, some
+ * userspace tools had been observed producing
+ * otherwise valid initramfs images which happen
+ * to hit this condition.
+ * TODO: need to figure out whether the latest
+ * LZ4 and initramfs specifications allows for
+ * small padding at the end of the chunk.
+ * See similar message above.
+ */
+ break;
}
inp += chunksize;
}
--
2.17.1
Clang's integrated assembler produces the warning for assembly files:
warning: DWARF2 only supports one section per compilation unit
If -Wa,-gdwarf-* is unspecified, then debug info is not emitted. This
will be re-enabled for new DWARF versions in a follow up patch.
Enables defconfig+CONFIG_DEBUG_INFO to build cleanly with
LLVM=1 LLVM_IAS=1 for x86_64 and arm64.
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/716
Reported-by: Nathan Chancellor <natechancellor(a)gmail.com>
Suggested-by: Dmitry Golovin <dima(a)golovin.in>
Suggested-by: Sedat Dilek <sedat.dilek(a)gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
Makefile | 2 ++
1 file changed, 2 insertions(+)
diff --git a/Makefile b/Makefile
index f353886dbf44..75b1a3dcbf30 100644
--- a/Makefile
+++ b/Makefile
@@ -826,7 +826,9 @@ else
DEBUG_CFLAGS += -g
endif
+ifndef LLVM_IAS
KBUILD_AFLAGS += -Wa,-gdwarf-2
+endif
ifdef CONFIG_DEBUG_INFO_DWARF4
DEBUG_CFLAGS += -gdwarf-4
--
2.29.1.341.ge80a0c044ae-goog
The patch titled
Subject: mm, page_frag: recover from memory pressure
has been added to the -mm tree. Its filename is
page_frag-recover-from-memory-pressure.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/page_frag-recover-from-memory-pre…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/page_frag-recover-from-memory-pre…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dongli Zhang <dongli.zhang(a)oracle.com>
Subject: mm, page_frag: recover from memory pressure
The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
This ends up to page_frag_alloc() to allocate skb->data from
page_frag_cache->va.
During the memory pressure, page_frag_cache->va may be allocated as
pfmemalloc page. As a result, the skb->pfmemalloc is always true as
skb->data is from page_frag_cache->va. The skb will be dropped if the
sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
under memory pressure.
However, once kernel is not under memory pressure any longer (suppose
large amount of memory pages are just reclaimed), the page_frag_alloc()
may still re-use the prior pfmemalloc page_frag_cache->va to allocate
skb->data. As a result, the skb->pfmemalloc is always true unless
page_frag_cache->va is re-allocated, even if the kernel is not under
memory pressure any longer.
Here is how kernel runs into issue.
1. The kernel is under memory pressure and allocation of
PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail.
Instead, the pfmemalloc page is allocated for page_frag_cache->va.
2. All skb->data from page_frag_cache->va (pfmemalloc) will have
skb->pfmemalloc=true. The skb will always be dropped by sock without
SOCK_MEMALLOC. This is an expected behaviour.
3. Suppose a large amount of pages are reclaimed and kernel is not
under memory pressure any longer. We expect skb->pfmemalloc drop will
not happen.
4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_alloc->va and will always re-use the prior pfmemalloc page.
The skb->pfmemalloc is always true even kernel is not under memory
pressure any longer.
Fix this by freeing and re-allocating the page instead of recycling it.
Link: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
Link: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Link: https://lkml.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com
Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Dongli Zhang <dongli.zhang(a)oracle.com>
Suggested-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Bert Barbe <bert.barbe(a)oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu(a)oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra(a)oracle.com>
Cc: Manjunath Patil <manjunath.b.patil(a)oracle.com>
Cc: Joe Jin <joe.jin(a)oracle.com>
Cc: SRINIVAS <srinivas.eeda(a)oracle.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/page_alloc.c~page_frag-recover-from-memory-pressure
+++ a/mm/page_alloc.c
@@ -5103,6 +5103,11 @@ refill:
if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
goto refill;
+ if (unlikely(nc->pfmemalloc)) {
+ free_the_page(page, compound_order(page));
+ goto refill;
+ }
+
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
/* if size can vary use size else just use PAGE_SIZE */
size = nc->size;
_
Patches currently in -mm which might be from dongli.zhang(a)oracle.com are
page_frag-recover-from-memory-pressure.patch