April 2019 - Linux-stable-mirror

Re: [PATCH] virt: vbox: Sanity-check parameter types for hgcm-calls coming from userspace

by Hans de Goede

Hi,

On 05-04-19 16:15, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
>
> v5.0.6: Build OK!
> v4.19.33: Build OK!
> v4.14.110: Failed to apply! Possible dependencies:
>     0ba002bc4393 ("virt: Add vboxguest driver for Virtual Box Guest integration")
>
> v4.9.167: Failed to apply! Possible dependencies:
>     0ba002bc4393 ("virt: Add vboxguest driver for Virtual Box Guest integration")
>
> v4.4.178: Failed to apply! Possible dependencies:
>     0ba002bc4393 ("virt: Add vboxguest driver for Virtual Box Guest integration")
>
> v3.18.138: Failed to apply! Possible dependencies:
>     0ba002bc4393 ("virt: Add vboxguest driver for Virtual Box Guest integration")
>
>
> How should we proceed with this patch?

4.14 and older do not have the vboxguest driver, so just applying this
to 4.19+ is fine.

Regards,

Hans

6 years, 8 months


[tip:x86/urgent] x86/asm: Use stricter assembly constraints in bitops

by tip-bot for Alexander Potapenko

Commit-ID:  5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03
Gitweb:     https://git.kernel.org/tip/5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03
Author:     Alexander Potapenko <glider(a)google.com>
AuthorDate: Tue, 2 Apr 2019 13:28:13 +0200
Committer:  Ingo Molnar <mingo(a)kernel.org>
CommitDate: Sat, 6 Apr 2019 09:52:02 +0200

x86/asm: Use stricter assembly constraints in bitops

There's a number of problems with how arch/x86/include/asm/bitops.h
is currently using assembly constraints for the memory region
bitops are modifying:

1) Use memory clobber in bitops that touch arbitrary memory

Certain bit operations that read/write bits take a base pointer and an
arbitrarily large offset to address the bit relative to that base.
Inline assembly constraints aren't expressive enough to tell the
compiler that the assembly directive is going to touch a specific memory
location of unknown size, therefore we have to use the "memory" clobber
to indicate that the assembly is going to access memory locations other
than those listed in the inputs/outputs.

To indicate that BTR/BTS instructions don't necessarily touch the first
sizeof(long) bytes of the argument, we also move the address to assembly
inputs.

This particular change leads to size increase of 124 kernel functions in
a defconfig build. For some of them the diff is in NOP operations, other
end up re-reading values from memory and may potentially slow down the
execution. But without these clobbers the compiler is free to cache
the contents of the bitmaps and use them as if they weren't changed by
the inline assembly.

2) Use byte-sized arguments for operations touching single bytes.

Passing a long value to ANDB/ORB/XORB instructions makes the compiler
treat sizeof(long) bytes as being clobbered, which isn't the case. This
may theoretically lead to worse code in the case of heavy optimization.

Practical impact:

I've built a defconfig kernel and looked through some of the functions
generated by GCC 7.3.0 with and without this clobber, and didn't spot
any miscompilations.

However there is a (trivial) theoretical case where this code leads to
miscompilation:

  https://lkml.org/lkml/2019/3/28/393

using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes
such a function in the kernel someday.

So the primary motivation is to fix an existing misuse of the asm
directive, which happens to work in certain configurations now, but
isn't guaranteed to work under different circumstances.

[ --mingo: Added -stable tag because defconfig only builds a fraction
  of the kernel and the trivial testcase looks normal enough to
  be used in existing or in-development code. ]

Signed-off-by: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: James Y Knight <jyknight(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Paul E. McKenney <paulmck(a)linux.ibm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
[ Edited the changelog, tidied up one of the defines. ]
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
 arch/x86/include/asm/bitops.h | 41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index d153d570bb04..8e790ec219a5 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -36,16 +36,17 @@
  * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
  */
 
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define RLONG_ADDR(x) "m" (*(volatile long *) (x))
+#define WBYTE_ADDR(x) "+m" (*(volatile char *) (x))
 
-#define ADDR BITOP_ADDR(addr)
+#define ADDR RLONG_ADDR(addr)
 
 /*
  * We do the locked ops that don't return the old value as
  * a mask operation on a byte.
  */
 #define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))
-#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3))
+#define CONST_MASK_ADDR(nr, addr) WBYTE_ADDR((void *)(addr) + ((nr)>>3))
 #define CONST_MASK(nr) (1 << ((nr) & 7))
 
 /**
@@ -73,7 +74,7 @@ set_bit(long nr, volatile unsigned long *addr)
 			: "memory");
 	} else {
 		asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
-			: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
+			: : RLONG_ADDR(addr), "Ir" (nr) : "memory");
 	}
 }
 
@@ -88,7 +89,7 @@ set_bit(long nr, volatile unsigned long *addr)
  */
 static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
 {
-	asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory");
+	asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
 
 /**
@@ -110,8 +111,7 @@ clear_bit(long nr, volatile unsigned long *addr)
 			: "iq" ((u8)~CONST_MASK(nr)));
 	} else {
 		asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
-			: BITOP_ADDR(addr)
-			: "Ir" (nr));
+			: : RLONG_ADDR(addr), "Ir" (nr) : "memory");
 	}
 }
 
@@ -131,7 +131,7 @@ static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *ad
 
 static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
 {
-	asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr));
+	asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
 
 static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
@@ -139,7 +139,7 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
 	bool negative;
 	asm volatile(LOCK_PREFIX "andb %2,%1"
 		CC_SET(s)
-		: CC_OUT(s) (negative), ADDR
+		: CC_OUT(s) (negative), WBYTE_ADDR(addr)
 		: "ir" ((char) ~(1 << nr)) : "memory");
 	return negative;
 }
@@ -155,13 +155,9 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
  * __clear_bit() is non-atomic and implies release semantics before the memory
  * operation. It can be used for an unlock if no other CPUs can concurrently
  * modify other bits in the word.
- *
- * No memory barrier is required here, because x86 cannot reorder stores past
- * older loads. Same principle as spin_unlock.
  */
 static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
 {
-	barrier();
 	__clear_bit(nr, addr);
 }
 
@@ -176,7 +172,7 @@ static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *
  */
 static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
 {
-	asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr));
+	asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
 }
 
 /**
@@ -196,8 +192,7 @@ static __always_inline void change_bit(long nr, volatile unsigned long *addr)
 			: "iq" ((u8)CONST_MASK(nr)));
 	} else {
 		asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0"
-			: BITOP_ADDR(addr)
-			: "Ir" (nr));
+			: : RLONG_ADDR(addr), "Ir" (nr) : "memory");
 	}
 }
 
@@ -242,8 +237,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
 
 	asm(__ASM_SIZE(bts) " %2,%1"
 	    CC_SET(c)
-	    : CC_OUT(c) (oldbit), ADDR
-	    : "Ir" (nr));
+	    : CC_OUT(c) (oldbit)
+	    : ADDR, "Ir" (nr) : "memory");
 	return oldbit;
 }
 
@@ -282,8 +277,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
 
 	asm volatile(__ASM_SIZE(btr) " %2,%1"
 		     CC_SET(c)
-		     : CC_OUT(c) (oldbit), ADDR
-		     : "Ir" (nr));
+		     : CC_OUT(c) (oldbit)
+		     : ADDR, "Ir" (nr) : "memory");
 	return oldbit;
 }
 
@@ -294,8 +289,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
 
 	asm volatile(__ASM_SIZE(btc) " %2,%1"
 		     CC_SET(c)
-		     : CC_OUT(c) (oldbit), ADDR
-		     : "Ir" (nr) : "memory");
+		     : CC_OUT(c) (oldbit)
+		     : ADDR, "Ir" (nr) : "memory");
 
 	return oldbit;
 }
@@ -326,7 +321,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
 	asm volatile(__ASM_SIZE(bt) " %2,%1"
 		     CC_SET(c)
 		     : CC_OUT(c) (oldbit)
-		     : "m" (*(unsigned long *)addr), "Ir" (nr));
+		     : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
 
 	return oldbit;
 }

6 years, 8 months


stable-rc/linux-4.4.y boot: 90 boots: 2 failed, 72 passed with 16 offline (v4.4.178)

by kernelci.org bot

stable-rc/linux-4.4.y boot: 90 boots: 2 failed, 72 passed with 16 offline (v4.4.178)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.…
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.178/

Tree: stable-rc
Branch: linux-4.4.y
Git Describe: v4.4.178
Git Commit: 12ae58ca7ec42fe23df5d0b0d01bce2ccb728fd5
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 43 unique boards, 21 SoC families, 14 builds out of 190

Boot Failures Detected:

arm:
    multi_v7_defconfig:
        gcc-7:
            stih410-b2120: 1 failed lab

arm64:
    defconfig:
        gcc-7:
            qcom-qdf2400: 1 failed lab

Offline Platforms:

arm:

    bcm2835_defconfig:
        gcc-7
            bcm2835-rpi-b: 1 offline lab

    multi_v7_defconfig:
        gcc-7
            alpine-db: 1 offline lab
            at91-sama5d4_xplained: 1 offline lab
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab
            socfpga_cyclone5_de0_sockit: 1 offline lab
            sun5i-r8-chip: 1 offline lab
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    tegra_defconfig:
        gcc-7
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    sunxi_defconfig:
        gcc-7
            sun5i-r8-chip: 1 offline lab

    sama5_defconfig:
        gcc-7
            at91-sama5d4_xplained: 1 offline lab

    qcom_defconfig:
        gcc-7
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab

arm64:

    defconfig:
        gcc-7
            apq8016-sbc: 1 offline lab

---
For more info write to <info(a)kernelci.org>

6 years, 8 months


stable-rc/linux-4.9.y boot: 101 boots: 0 failed, 83 passed with 18 offline (v4.9.168)

by kernelci.org bot

stable-rc/linux-4.9.y boot: 101 boots: 0 failed, 83 passed with 18 offline (v4.9.168)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.9.y/kernel/v4.9.…
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.9.y/kernel/v4.9.168/

Tree: stable-rc
Branch: linux-4.9.y
Git Describe: v4.9.168
Git Commit: e93d4749118fbb0ae8fc889706daa56d4dafecd4
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 50 unique boards, 22 SoC families, 15 builds out of 197

Offline Platforms:

arm:

    bcm2835_defconfig:
        gcc-7
            bcm2835-rpi-b: 1 offline lab

    multi_v7_defconfig:
        gcc-7
            alpine-db: 1 offline lab
            at91-sama5d4_xplained: 1 offline lab
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab
            socfpga_cyclone5_de0_sockit: 1 offline lab
            sun5i-r8-chip: 1 offline lab
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    sunxi_defconfig:
        gcc-7
            sun5i-r8-chip: 1 offline lab

    socfpga_defconfig:
        gcc-7
            socfpga_cyclone5_de0_sockit: 1 offline lab

    tegra_defconfig:
        gcc-7
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    sama5_defconfig:
        gcc-7
            at91-sama5d4_xplained: 1 offline lab

    qcom_defconfig:
        gcc-7
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab

arm64:

    defconfig:
        gcc-7
            apq8016-sbc: 1 offline lab
            juno-r2: 1 offline lab

---
For more info write to <info(a)kernelci.org>

6 years, 8 months


stable/linux-4.9.y boot: 41 boots: 0 failed, 41 passed (v4.9.168)

by kernelci.org bot

stable/linux-4.9.y boot: 41 boots: 0 failed, 41 passed (v4.9.168)

Full Boot Summary: https://kernelci.org/boot/all/job/stable/branch/linux-4.9.y/kernel/v4.9.168/
Full Build Summary: https://kernelci.org/build/stable/branch/linux-4.9.y/kernel/v4.9.168/

Tree: stable
Branch: linux-4.9.y
Git Describe: v4.9.168
Git Commit: e93d4749118fbb0ae8fc889706daa56d4dafecd4
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tested: 20 unique boards, 12 SoC families, 8 builds out of 197

---
For more info write to <info(a)kernelci.org>

6 years, 8 months


stable/linux-4.14.y boot: 50 boots: 0 failed, 49 passed with 1 untried/unknown (v4.14.111)

by kernelci.org bot

stable/linux-4.14.y boot: 50 boots: 0 failed, 49 passed with 1 untried/unknown (v4.14.111)

Full Boot Summary: https://kernelci.org/boot/all/job/stable/branch/linux-4.14.y/kernel/v4.14.1…
Full Build Summary: https://kernelci.org/build/stable/branch/linux-4.14.y/kernel/v4.14.111/

Tree: stable
Branch: linux-4.14.y
Git Describe: v4.14.111
Git Commit: 1ec8f1f0bffe34ebdf95dbe0fd4a6635a84612a8
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tested: 27 unique boards, 13 SoC families, 8 builds out of 201

---
For more info write to <info(a)kernelci.org>

6 years, 8 months


stable-rc/linux-3.18.y boot: 51 boots: 1 failed, 44 passed with 6 offline (v3.18.138)

by kernelci.org bot

stable-rc/linux-3.18.y boot: 51 boots: 1 failed, 44 passed with 6 offline (v3.18.138)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-3.18.y/kernel/v3.1…
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-3.18.y/kernel/v3.18.138/

Tree: stable-rc
Branch: linux-3.18.y
Git Describe: v3.18.138
Git Commit: a1a43d6522bc1da70f210d46485fac7a71c13ca8
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 22 unique boards, 12 SoC families, 13 builds out of 189

Boot Failure Detected:

x86_64:
    x86_64_defconfig:
        gcc-7:
            minnowboard-turbot-E3826: 1 failed lab

Offline Platforms:

arm:

    bcm2835_defconfig:
        gcc-7
            bcm2835-rpi-b: 1 offline lab

    tegra_defconfig:
        gcc-7
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    multi_v7_defconfig:
        gcc-7
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    sama5_defconfig:
        gcc-7
            at91-sama5d4ek: 1 offline lab

---
For more info write to <info(a)kernelci.org>

6 years, 8 months


stable/linux-5.0.y boot: 53 boots: 5 failed, 48 passed (v5.0.7)

by kernelci.org bot

stable/linux-5.0.y boot: 53 boots: 5 failed, 48 passed (v5.0.7)

Full Boot Summary: https://kernelci.org/boot/all/job/stable/branch/linux-5.0.y/kernel/v5.0.7/
Full Build Summary: https://kernelci.org/build/stable/branch/linux-5.0.y/kernel/v5.0.7/

Tree: stable
Branch: linux-5.0.y
Git Describe: v5.0.7
Git Commit: 8b298d3a0bd5feeb47129c4889356b38b78ab231
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tested: 30 unique boards, 14 SoC families, 9 builds out of 208

Boot Regressions Detected:

arm64:

    defconfig:
        gcc-7:
            meson-gxbb-nanopi-k2:
                lab-baylibre: new failure (last pass: v5.0.6)
            meson-gxbb-p200:
                lab-baylibre: new failure (last pass: v5.0.5)
            meson-gxl-s805x-p241:
                lab-baylibre: new failure (last pass: v5.0.6)
            meson-gxl-s905x-khadas-vim:
                lab-baylibre: new failure (last pass: v5.0.5)
            meson-gxl-s905x-libretech-cc:
                lab-baylibre: new failure (last pass: v5.0.6)

Boot Failures Detected:

arm64:
    defconfig:
        gcc-7:
            meson-gxbb-nanopi-k2: 1 failed lab
            meson-gxbb-p200: 1 failed lab
            meson-gxl-s805x-p241: 1 failed lab
            meson-gxl-s905x-khadas-vim: 1 failed lab
            meson-gxl-s905x-libretech-cc: 1 failed lab

---
For more info write to <info(a)kernelci.org>

6 years, 8 months


stable-rc/linux-4.14.y boot: 112 boots: 1 failed, 93 passed with 18 offline (v4.14.111)

by kernelci.org bot

stable-rc/linux-4.14.y boot: 112 boots: 1 failed, 93 passed with 18 offline (v4.14.111)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.1…
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.111/

Tree: stable-rc
Branch: linux-4.14.y
Git Describe: v4.14.111
Git Commit: 1ec8f1f0bffe34ebdf95dbe0fd4a6635a84612a8
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 61 unique boards, 23 SoC families, 14 builds out of 201

Boot Failure Detected:

arm64:
    defconfig:
        gcc-7:
            rk3399-firefly: 1 failed lab

Offline Platforms:

arm:

    bcm2835_defconfig:
        gcc-7
            bcm2835-rpi-b: 1 offline lab

    multi_v7_defconfig:
        gcc-7
            alpine-db: 1 offline lab
            at91-sama5d4_xplained: 1 offline lab
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab
            socfpga_cyclone5_de0_sockit: 1 offline lab
            sun5i-r8-chip: 1 offline lab
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    tegra_defconfig:
        gcc-7
            tegra124-jetson-tk1: 1 offline lab
            tegra20-iris-512: 1 offline lab

    sunxi_defconfig:
        gcc-7
            sun5i-r8-chip: 1 offline lab

    sama5_defconfig:
        gcc-7
            at91-sama5d4_xplained: 1 offline lab

    qcom_defconfig:
        gcc-7
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab

arm64:

    defconfig:
        gcc-7
            apq8016-sbc: 1 offline lab
            juno-r2: 1 offline lab
            mt7622-rfb1: 1 offline lab

---
For more info write to <info(a)kernelci.org>

6 years, 8 months


[patch 09/14] mm: writeback: use exact memcg dirty counts

by akpm＠linux-foundation.org

From: Greg Thelen <gthelen(a)google.com>
Subject: mm: writeback: use exact memcg dirty counts

Since a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed as:

1) per-memcg per-cpu values in range of [-32..32]

2) per-memcg atomic counter

When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic. Stat readers only check the atomic. Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per cpu.

Assuming 100 cpus:
   4k x86 page_size:  13 MiB error per memcg
  64k ppc page_size: 200 MiB error per memcg

Considering that dirty+writeback are used together for some decisions the
errors double.

This inaccuracy can lead to undeserved oom kills. One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages. If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom kill.

It could be argued that tiny containers are not supported, but it's more
subtle. It's the amount the space available for file lru that matters.
If a container has memory.max-200MiB of non reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.

The following test reliably ooms without this patch. This patch avoids
oom kills.

  $ cat test
  mount -t cgroup2 none /dev/cgroup
  cd /dev/cgroup
  echo +io +memory > cgroup.subtree_control
  mkdir test
  cd test
  echo 10M > memory.max
  (echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
  (echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)

  $ cat memcg-writeback-stress.c
  /*
   * Dirty pages from all but one cpu.
   * Clean pages from the non dirtying cpu.
   * This is to stress per cpu counter imbalance.
   * On a 100 cpu machine:
   * - per memcg per cpu dirty count is 32 pages for each of 99 cpus
   * - per memcg atomic is -99*32 pages
   * - thus the complete dirty limit: sum of all counters 0
   * - balance_dirty_pages() only sees atomic count -99*32 pages, which
   *   it max()s to 0.
   * - So a workload can dirty -99*32 pages before balance_dirty_pages()
   *   cares.
   */
  #define _GNU_SOURCE
  #include <err.h>
  #include <fcntl.h>
  #include <sched.h>
  #include <stdlib.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/sysinfo.h>
  #include <sys/types.h>
  #include <unistd.h>

  static char *buf;
  static int bufSize;

  static void set_affinity(int cpu)
  {
  	cpu_set_t affinity;

  	CPU_ZERO(&affinity);
  	CPU_SET(cpu, &affinity);
  	if (sched_setaffinity(0, sizeof(affinity), &affinity))
  		err(1, "sched_setaffinity");
  }

  static void dirty_on(int output_fd, int cpu)
  {
  	int i, wrote;

  	set_affinity(cpu);
  	for (i = 0; i < 32; i++) {
  		for (wrote = 0; wrote < bufSize; ) {
  			int ret = write(output_fd, buf+wrote, bufSize-wrote);
  			if (ret == -1)
  				err(1, "write");
  			wrote += ret;
  		}
  	}
  }

  int main(int argc, char **argv)
  {
  	int cpu, flush_cpu = 1, output_fd;
  	const char *output;

  	if (argc != 2)
  		errx(1, "usage: output_file");

  	output = argv[1];
  	bufSize = getpagesize();
  	buf = malloc(getpagesize());
  	if (buf == NULL)
  		errx(1, "malloc failed");

  	output_fd = open(output, O_CREAT|O_RDWR);
  	if (output_fd == -1)
  		err(1, "open(%s)", output);

  	for (cpu = 0; cpu < get_nprocs(); cpu++) {
  		if (cpu != flush_cpu)
  			dirty_on(output_fd, cpu);
  	}

  	set_affinity(flush_cpu);
  	if (fsync(output_fd))
  		err(1, "fsync(%s)", output);
  	if (close(output_fd))
  		err(1, "close(%s)", output);
  	free(buf);
  }

Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters. This avoids the aforementioned oom
kills.

This does not affect the overhead of memory.stat, which still reads the
single atomic counter.

Why not use percpu_counter? memcg already handles cpus going offline, so
no need for that overhead from percpu_counter. And the percpu_counter
spinlocks are more heavyweight than is required.

It probably also makes sense to use exact dirty and writeback counters in
memcg oom reports. But that is saved for later.

Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>	[4.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/linux/memcontrol.h |    5 ++++-
 mm/memcontrol.c            |   20 ++++++++++++++++++--
 2 files changed, 22 insertions(+), 3 deletions(-)

--- a/include/linux/memcontrol.h~writeback-use-exact-memcg-dirty-counts
+++ a/include/linux/memcontrol.h
@@ -566,7 +566,10 @@ struct mem_cgroup *lock_page_memcg(struc
 void __unlock_page_memcg(struct mem_cgroup *memcg);
 void unlock_page_memcg(struct page *page);
 
-/* idx can be of type enum memcg_stat_item or node_stat_item */
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page_state().
+ */
 static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 					     int idx)
 {
--- a/mm/memcontrol.c~writeback-use-exact-memcg-dirty-counts
+++ a/mm/memcontrol.c
@@ -3882,6 +3882,22 @@ struct wb_domain *mem_cgroup_wb_domain(s
 	return &memcg->cgwb_domain;
 }
 
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page().
+ */
+static unsigned long memcg_exact_page_state(struct mem_cgroup *memcg, int idx)
+{
+	long x = atomic_long_read(&memcg->stat[idx]);
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		x += per_cpu_ptr(memcg->stat_cpu, cpu)->count[idx];
+	if (x < 0)
+		x = 0;
+	return x;
+}
+
 /**
  * mem_cgroup_wb_stats - retrieve writeback related stats from its memcg
  * @wb: bdi_writeback in question
@@ -3907,10 +3923,10 @@ void mem_cgroup_wb_stats(struct bdi_writ
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;
 
-	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
+	*pdirty = memcg_exact_page_state(memcg, NR_FILE_DIRTY);
 
 	/* this should eventually include NR_UNSTABLE_NFS */
-	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
+	*pwriteback = memcg_exact_page_state(memcg, NR_WRITEBACK);
 	*pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) |
 						     (1 << LRU_ACTIVE_FILE));
 	*pheadroom = PAGE_COUNTER_MAX;
_
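
The error margin and the two kinds of read described in the changelog above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct sketch_memcg_stat`, `SKETCH_NR_CPUS`, `SKETCH_BATCH` and the three functions are hypothetical names, the per-cpu array stands in for the kernel's per-cpu data, and no atomics or locking are modelled.

```c
#define SKETCH_NR_CPUS 4	/* assumption: tiny machine for illustration */
#define SKETCH_BATCH   32	/* per-cpu range [-32..32] from the changelog */

/* Hypothetical user-space model of one memcg counter. */
struct sketch_memcg_stat {
	long atomic_count;		/* the part an atomic-only reader sees */
	long percpu[SKETCH_NR_CPUS];	/* per-cpu deltas not yet flushed */
};

/* Approximate read: atomic part only, clamped at zero. */
static inline unsigned long sketch_page_state(const struct sketch_memcg_stat *s)
{
	long x = s->atomic_count;

	return x < 0 ? 0 : (unsigned long)x;
}

/* Exact read: atomic part plus every per-cpu delta, clamped at zero. */
static inline unsigned long
sketch_exact_page_state(const struct sketch_memcg_stat *s)
{
	long x = s->atomic_count;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		x += s->percpu[cpu];
	return x < 0 ? 0 : (unsigned long)x;
}

/* Worst-case number of bytes hidden from an atomic-only reader. */
static inline unsigned long long
sketch_max_error_bytes(unsigned int ncpus, unsigned long page_size)
{
	return (unsigned long long)SKETCH_BATCH * ncpus * page_size;
}
```

For 100 cpus the bound works out to 32 * 100 * 4096 = 13,107,200 bytes (12.5 MiB, which the changelog rounds to 13 MiB) with 4k pages and 32 * 100 * 65536 = 209,715,200 bytes (200 MiB) with 64k pages. With `atomic_count = -100` and four per-cpu deltas of 32, the approximate read returns 0 while the exact read returns 28.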

6 years, 8 months

