Any write with either dd or flashcp to a device driven by the
spear_smi.c driver will pass through the spear_smi_cpy_toio()
function. This function will get called for chunks of up to 256 bytes.
If the amount of data is smaller, we may have a problem if the data
length is not 4-byte aligned. In this situation, the kernel panics
during the memcpy:
# dd if=/dev/urandom bs=1001 count=1 of=/dev/mtd6
spear_smi_cpy_toio [620] dest c9070000, src c7be8800, len 256
spear_smi_cpy_toio [620] dest c9070100, src c7be8900, len 256
spear_smi_cpy_toio [620] dest c9070200, src c7be8a00, len 256
spear_smi_cpy_toio [620] dest c9070300, src c7be8b00, len 233
Unhandled fault: external abort on non-linefetch (0x808) at 0xc90703e8
[...]
PC is at memcpy+0xcc/0x330
The above error occurs because the implementation of memcpy_toio()
tries to optimize the number of I/O accesses by writing 4 bytes at a
time as much as possible; when fewer than 4 bytes remain, it switches
to half-word or byte writes.
Unfortunately, the specification states about the Write Burst mode:
"the next AHB Write request should point to the next
incremented address and should have the same size (byte,
half-word or word)"
This means the ARM implementation of memcpy_toio() cannot reliably be
used blindly here. Work around this situation by updating the write
path to stick to byte accesses when the burst length is not a multiple
of 4.
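For review convenience, here is a condensed sketch of that approach (the
helper name and the destination cast are simplified here, so this is
illustrative only; the actual change is in the diff below):

/* Byte-only variant of memcpy_toio(): every AHB access is 1 byte wide */
static void byte_only_copy_toio(volatile void __iomem *dest,
				const void *src, size_t len)
{
	const unsigned char *from = src;

	while (len--)
		writeb(*from++, dest++);
}

/*
 * In the write path, burst word accesses only when both the length and
 * the destination are word aligned; otherwise fall back to the byte-only
 * helper so that every access in the burst has the same size.
 */
if (IS_ALIGNED(len, sizeof(u32)) &&
    IS_ALIGNED((unsigned long)dest, sizeof(u32)))
	memcpy_toio(dest, src, len);
else
	byte_only_copy_toio(dest, src, len);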
Fixes: f18dbbb1bfe0 ("mtd: ST SPEAr: Add SMI driver for serial NOR flash")
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changes in v3:
==============
* Prevent writes to non-4-byte-aligned addresses from failing.
* Use the IS_ALIGNED() macro.
* Add a comment directly in the code explaining why the 'memcpy_toio_b'
helper is needed.
Changes in v2:
==============
* This time I think the patch really fixes the problem: we use a
memcpy_toio_b() function to force byte accesses only when needed. We
no longer use the _memcpy_toio() helper, as the fact that it does byte
accesses is purely an implementation detail and not part of the API;
the function is also flagged as "should be optimized".
* One could argue that memcpy_toio() does not by design guarantee
4-byte accesses only, but I think it is good enough to use it in this
case, as the ARM implementation of this function is already
extensively optimized. I also find it clearer to use it than to add my
own spear_smi_memcpy_toio_l(). Please tell me if you disagree with
this.
* The volatile keyword has been deliberately borrowed from the
_memcpy_toio() implementation I was about to use previously.
drivers/mtd/devices/spear_smi.c | 38 ++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/devices/spear_smi.c b/drivers/mtd/devices/spear_smi.c
index 986f81d2f93e..348961663cf4 100644
--- a/drivers/mtd/devices/spear_smi.c
+++ b/drivers/mtd/devices/spear_smi.c
@@ -592,6 +592,26 @@ static int spear_mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
return 0;
}
+/*
+ * The purpose of this function is to ensure a memcpy_toio() with byte writes
+ * only. Its structure is inspired by the ARM implementation of _memcpy_toio(),
+ * which also does single byte writes but cannot be used here as this is just an
+ * implementation detail and not part of the API. Not to mention the comment
+ * stating that _memcpy_toio() should be optimized.
+ */
+static void spear_smi_memcpy_toio_b(volatile void __iomem *dest,
+ const void *src, size_t len)
+{
+ const unsigned char *from = src;
+
+ while (len) {
+ len--;
+ writeb(*from, dest);
+ from++;
+ dest++;
+ }
+}
+
static inline int spear_smi_cpy_toio(struct spear_smi *dev, u32 bank,
void __iomem *dest, const void *src, size_t len)
{
@@ -614,7 +634,23 @@ static inline int spear_smi_cpy_toio(struct spear_smi *dev, u32 bank,
ctrlreg1 = readl(dev->io_base + SMI_CR1);
writel((ctrlreg1 | WB_MODE) & ~SW_MODE, dev->io_base + SMI_CR1);
- memcpy_toio(dest, src, len);
+ /*
+ * In Write Burst mode (WB_MODE), the spec states that writes must be:
+ * - incremental
+ * - of the same size
+ * The ARM implementation of memcpy_toio() will optimize the number of
+ * I/O accesses by using as many 4-byte writes as possible, surrounded
+ * by 2-byte/1-byte accesses if:
+ * - the destination is not 4-byte aligned
+ * - the length is not a multiple of 4 bytes.
+ * Avoid this alternation of write access sizes by using our own 'byte
+ * access' helper if at least one of the two conditions above is true.
+ */
+ if (IS_ALIGNED(len, sizeof(u32)) &&
+ IS_ALIGNED((unsigned int)dest, sizeof(u32)))
+ memcpy_toio(dest, src, len);
+ else
+ spear_smi_memcpy_toio_b(dest, src, len);
writel(ctrlreg1, dev->io_base + SMI_CR1);
--
2.20.1
8-letter strings representing ARC perf events are stored in two
32-bit registers as ASCII characters, like "IJMP", "IALL", "IJMPTAK" etc.
The same order of bytes in the word is used regardless of CPU endianness,
which means that on a big-endian CPU core we need to swap bytes to get
the same order as on a little-endian CPU.
Otherwise we see the following error message on boot:
------------------------->8----------------------
ARC perf : 8 counters (32 bits), 40 conditions, [overflow IRQ support]
sysfs: cannot create duplicate filename '/devices/arc_pct/events/pmji'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
sysfs_warn_dup+0x46/0x58
sysfs_add_file_mode_ns+0xb2/0x168
create_files+0x70/0x2a0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/events/core.c:12144 perf_event_sysfs_init+0x70/0xa0
Failed to register pmu: arc_pct, reason -17
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
__warn+0x9c/0xd4
warn_slowpath_fmt+0x22/0x2c
perf_event_sysfs_init+0x70/0xa0
---[ end trace a75fb9a9837bd1ec ]---
------------------------->8----------------------
What happens here is that we're trying to register more than one raw perf
event with the same name "PMJI". Why? Because ARC perf event names are 4 to
8 letters long and encoded into two 32-bit words. In this particular case we
deal with 2 events:
* "IJMP____", which counts all jump & branch instructions
* "IJMPC___", which counts only conditional jumps & branches
Those strings are split into two 32-bit words as "IJMP" + "____" and
"IJMP" + "C___" respectively. Now, if we read them swapped because the CPU
core is big-endian, we read "PMJI" + "____" and "PMJI" + "___C".
And since we interpret the array of ASCII letters we read as a
null-terminated string, on a big-endian CPU we end up with 2 events of the
same name, "PMJI".
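To make the byte-order issue concrete, here is a small illustrative sketch
(hypothetical names, not the driver's actual structures): the two 32-bit
name registers are overlaid with a character view, and le32_to_cpu()
restores the intended character order on a big-endian core while being a
no-op on little-endian, which is exactly what the one-line change below does:

#include <linux/types.h>
#include <asm/byteorder.h>

/* Two 32-bit counter-name registers overlaid with their ASCII view */
union cc_name_sketch {
	u32 word[2];	/* raw AUX register contents */
	char str[9];	/* 8 characters + NUL sentinel */
};

static void decode_cc_name(union cc_name_sketch *name, u32 reg0, u32 reg1)
{
	/*
	 * The characters are packed in little-endian word order, so storing
	 * the raw value on a big-endian core would read back as "PMJI"
	 * instead of "IJMP". Swap on BE, keep as-is on LE.
	 */
	name->word[0] = le32_to_cpu(reg0);
	name->word[1] = le32_to_cpu(reg1);
	name->str[8] = '\0';
}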
Signed-off-by: Alexey Brodkin <abrodkin(a)synopsys.com>
Cc: stable(a)vger.kernel.org
---
arch/arc/kernel/perf_event.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 861a8aea51f9..661fd842ea97 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -614,8 +614,8 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
/* loop thru all available h/w condition indexes */
for (i = 0; i < cc_bcr.c; i++) {
write_aux_reg(ARC_REG_CC_INDEX, i);
- cc_name.indiv.word0 = read_aux_reg(ARC_REG_CC_NAME0);
- cc_name.indiv.word1 = read_aux_reg(ARC_REG_CC_NAME1);
+ cc_name.indiv.word0 = le32_to_cpu(read_aux_reg(ARC_REG_CC_NAME0));
+ cc_name.indiv.word1 = le32_to_cpu(read_aux_reg(ARC_REG_CC_NAME1));
arc_pmu_map_hw_event(i, cc_name.str);
arc_pmu_add_raw_event_attr(i, cc_name.str);
--
2.16.2
Any write with either dd or flashcp to a device driven by the
spear_smi.c driver will pass through the spear_smi_cpy_toio()
function. This function will get called for chunks of up to 256 bytes.
If the amount of data is smaller, we may have a problem if the data
length is not 4-byte aligned. In this situation, the kernel panics
during the memcpy:
# dd if=/dev/urandom bs=1001 count=1 of=/dev/mtd6
spear_smi_cpy_toio [620] dest c9070000, src c7be8800, len 256
spear_smi_cpy_toio [620] dest c9070100, src c7be8900, len 256
spear_smi_cpy_toio [620] dest c9070200, src c7be8a00, len 256
spear_smi_cpy_toio [620] dest c9070300, src c7be8b00, len 233
Unhandled fault: external abort on non-linefetch (0x808) at 0xc90703e8
[...]
PC is at memcpy+0xcc/0x330
The above error occurs because the implementation of memcpy_toio()
tries to optimize the number of I/O accesses by writing 4 bytes at a
time as much as possible; when fewer than 4 bytes remain, it switches
to half-word or byte writes.
Unfortunately, the specification states about the Write Burst mode:
"the next AHB Write request should point to the next
incremented address and should have the same size (byte,
half-word or word)"
This means the ARM implementation of memcpy_toio() cannot reliably be
used blindly here. Work around this situation by updating the write
path to stick to byte accesses when the burst length is not a multiple
of 4.
Fixes: f18dbbb1bfe0 ("mtd: ST SPEAr: Add SMI driver for serial NOR flash")
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changes in v2:
==============
* This time I think the patch really fixes the problem: we use a
memcpy_toio_b() function to force byte accesses only when needed. We
no longer use the _memcpy_toio() helper, as the fact that it does byte
accesses is purely an implementation detail and not part of the API;
the function is also flagged as "should be optimized".
* One could argue that memcpy_toio() does not by design guarantee
4-byte accesses only, but I think it is good enough to use it in this
case, as the ARM implementation of this function is already
extensively optimized. I also find it clearer to use it than to add my
own spear_smi_memcpy_toio_l(). Please tell me if you disagree with
this.
* The volatile keyword has been deliberately borrowed from the
_memcpy_toio() implementation I was about to use previously.
drivers/mtd/devices/spear_smi.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/devices/spear_smi.c b/drivers/mtd/devices/spear_smi.c
index 986f81d2f93e..84b7487d781d 100644
--- a/drivers/mtd/devices/spear_smi.c
+++ b/drivers/mtd/devices/spear_smi.c
@@ -592,6 +592,19 @@ static int spear_mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
return 0;
}
+static void spear_smi_memcpy_toio_b(volatile void __iomem *dest,
+ const void *src, size_t len)
+{
+ const unsigned char *from = src;
+
+ while (len) {
+ len--;
+ writeb(*from, dest);
+ from++;
+ dest++;
+ }
+}
+
static inline int spear_smi_cpy_toio(struct spear_smi *dev, u32 bank,
void __iomem *dest, const void *src, size_t len)
{
@@ -614,7 +627,17 @@ static inline int spear_smi_cpy_toio(struct spear_smi *dev, u32 bank,
ctrlreg1 = readl(dev->io_base + SMI_CR1);
writel((ctrlreg1 | WB_MODE) & ~SW_MODE, dev->io_base + SMI_CR1);
- memcpy_toio(dest, src, len);
+ /*
+ * In Write Burst mode (WB_MODE), the spec states that writes must be
+ * incremental and of the same size, so we cannot use memcpy_toio() if
+ * the length is not 4-byte aligned: in order to increase performance,
+ * it would proceed as much as possible with 4-byte accesses and
+ * potentially finish with smaller access sizes.
+ */
+ if (len % sizeof(u32))
+ spear_smi_memcpy_toio_b(dest, src, len);
+ else
+ memcpy_toio(dest, src, len);
writel(ctrlreg1, dev->io_base + SMI_CR1);
--
2.20.1
Commit
8a58ddae2379 ("perf/core: Fix exclusive events' grouping")
allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:
> WARNING: CPU: 2 PID: 318 at kernel/events/core.c:5615 perf_mmap_close+0x839/0x850
> Modules linked in:
> CPU: 2 PID: 318 Comm: exclusive-group Tainted: G W 5.4.0-rc3-00070-g39b656ee9f2c #846
> RIP: 0010:perf_mmap_close+0x839/0x850
Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.
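For clarity, a condensed view of the fix (it mirrors the one-line change in
the diff below; the rest of the function body and its return value are
elided):

static int __perf_pmu_output_stop(void *info)
{
	struct perf_event *event = info;
	/*
	 * An exclusive AUX event grouped under a HW leader lives on the
	 * leader's context, so look the PMU up through the context instead
	 * of through the event itself.
	 */
	struct pmu *pmu = event->ctx->pmu;	/* was: event->pmu */
	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

	/* ... iterate cpuctx's contexts and stop the matching AUX output ... */
	return 0;	/* elided: the real function returns the iteration result */
}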
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bbaabdad068..77793ef0d8bc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6965,7 +6965,7 @@ static void __perf_event_output_stop(struct perf_event *event, void *data)
static int __perf_pmu_output_stop(void *info)
{
struct perf_event *event = info;
- struct pmu *pmu = event->pmu;
+ struct pmu *pmu = event->ctx->pmu;
struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
struct remote_output ro = {
.rb = event->rb,
--
2.23.0
Hello Sasha,
I can do a specific patch for backporting to kernel 4.19 and older ones if needed.
This is really simple.
Tell me if this is OK for you and how to proceed.
Thanks.
Best regards,
JB
From: Sasha Levin <sashal(a)kernel.org>
Sent: Thursday, October 17, 2019 16:31
To: Sasha Levin <sashal(a)kernel.org>; Jean-Baptiste Maneyrol <JManeyrol(a)invensense.com>; jic23(a)kernel.org <jic23(a)kernel.org>
Cc: linux-iio(a)vger.kernel.org <linux-iio(a)vger.kernel.org>; stable(a)vger.kernel.org <stable(a)vger.kernel.org>; stable(a)vger.kernel.org <stable(a)vger.kernel.org>
Subject: Re: [PATCH] iio: imu: inv_mpu6050: fix no data on MPU6050
CAUTION: This email originated from outside of the organization. Please make sure the sender is who they say they are and do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: f5057e7b2dba4 iio: imu: inv_mpu6050: better fifo overflow handling.
The bot has tested the following trees: v5.3.6, v4.19.79.
v5.3.6: Build OK!
v4.19.79: Failed to apply! Possible dependencies:
22904bdff9783 ("iio: imu: mpu6050: Add support for the ICM 20602 IMU")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
Any write with either dd or flashcp to a device driven by the
spear_smi.c driver will pass through the spear_smi_cpy_toio()
function. This function will get called for chunks of up to 256 bytes.
If the amount of data is smaller, we may have a problem if the data
length is not 4-byte aligned. In this situation, the kernel panics
during the memcpy:
# dd if=/dev/urandom bs=1001 count=1 of=/dev/mtd6
spear_smi_cpy_toio [620] dest c9070000, src c7be8800, len 256
spear_smi_cpy_toio [620] dest c9070100, src c7be8900, len 256
spear_smi_cpy_toio [620] dest c9070200, src c7be8a00, len 256
spear_smi_cpy_toio [620] dest c9070300, src c7be8b00, len 233
Unhandled fault: external abort on non-linefetch (0x808) at 0xc90703e8
[...]
PC is at memcpy+0xcc/0x330
Work around this issue by using the alternate _memcpy_toio() method,
which at least does not exhibit the same problem.
Fixes: f18dbbb1bfe0 ("mtd: ST SPEAr: Add SMI driver for serial NOR flash")
Cc: stable(a)vger.kernel.org
Suggested-by: Boris Brezillon <boris.brezillon(a)collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Hello,
This patch could not be tested with a mainline kernel (only compiled)
but was tested with a stable 4.14.x kernel. I really have no idea why
memcpy fails in this situation, which is why I propose this workaround,
but I bet there is something deeper not working.
Thanks,
Miquèl
drivers/mtd/devices/spear_smi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/devices/spear_smi.c b/drivers/mtd/devices/spear_smi.c
index 986f81d2f93e..d888625a3244 100644
--- a/drivers/mtd/devices/spear_smi.c
+++ b/drivers/mtd/devices/spear_smi.c
@@ -614,7 +614,7 @@ static inline int spear_smi_cpy_toio(struct spear_smi *dev, u32 bank,
ctrlreg1 = readl(dev->io_base + SMI_CR1);
writel((ctrlreg1 | WB_MODE) & ~SW_MODE, dev->io_base + SMI_CR1);
- memcpy_toio(dest, src, len);
+ _memcpy_toio(dest, src, len);
writel(ctrlreg1, dev->io_base + SMI_CR1);
--
2.20.1
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: 365dab61f74e - Linux 5.3.7
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: OK
Tests: FAILED
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/239816
One or more kernel tests failed:
ppc64le:
❌ LTP lite
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out the following commit:
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: 365dab61f74e - Linux 5.3.7
We grabbed the 84dd7a6f55e6 commit of the stable queue repository.
We then merged the patchset with `git am`:
drm-free-the-writeback_job-when-it-with-an-empty-fb.patch
drm-clear-the-fence-pointer-when-writeback-job-signa.patch
clk-ti-dra7-fix-mcasp8-clock-bits.patch
arm-dts-fix-wrong-clocks-for-dra7-mcasp.patch
nvme-pci-fix-a-race-in-controller-removal.patch
scsi-ufs-skip-shutdown-if-hba-is-not-powered.patch
scsi-megaraid-disable-device-when-probe-failed-after.patch
scsi-qla2xxx-silence-fwdump-template-message.patch
scsi-qla2xxx-fix-unbound-sleep-in-fcport-delete-path.patch
scsi-qla2xxx-fix-stale-mem-access-on-driver-unload.patch
scsi-qla2xxx-fix-n2n-link-reset.patch
scsi-qla2xxx-fix-n2n-link-up-fail.patch
arm-dts-fix-gpio0-flags-for-am335x-icev2.patch
arm-omap2-fix-missing-reset-done-flag-for-am3-and-am.patch
arm-omap2-add-missing-lcdc-midlemode-for-am335x.patch
arm-omap2-fix-warnings-with-broken-omap2_set_init_vo.patch
nvme-tcp-fix-wrong-stop-condition-in-io_work.patch
nvme-pci-save-pci-state-before-putting-drive-into-de.patch
nvme-fix-an-error-code-in-nvme_init_subsystem.patch
nvme-rdma-fix-max_hw_sectors-calculation.patch
added-quirks-for-adata-xpg-sx8200-pro-512gb.patch
nvme-add-quirk-for-kingston-nvme-ssd-running-fw-e8fk.patch
nvme-allow-64-bit-results-in-passthru-commands.patch
drm-komeda-prevent-memory-leak-in-komeda_wb_connecto.patch
nvme-rdma-fix-possible-use-after-free-in-connect-tim.patch
blk-mq-honor-io-scheduler-for-multiqueue-devices.patch
ieee802154-ca8210-prevent-memory-leak.patch
arm-dts-am4372-set-memory-bandwidth-limit-for-dispc.patch
net-dsa-qca8k-use-up-to-7-ports-for-all-operations.patch
mips-dts-ar9331-fix-interrupt-controller-size.patch
xen-efi-set-nonblocking-callbacks.patch
loop-change-queue-block-size-to-match-when-using-dio.patch
nl80211-fix-null-pointer-dereference.patch
mac80211-fix-txq-null-pointer-dereference.patch
netfilter-nft_connlimit-disable-bh-on-garbage-collec.patch
net-mscc-ocelot-add-missing-of_node_put-after-callin.patch
net-dsa-rtl8366rb-add-missing-of_node_put-after-call.patch
net-stmmac-xgmac-not-all-unicast-addresses-may-be-av.patch
net-stmmac-dwmac4-always-update-the-mac-hash-filter.patch
net-stmmac-correctly-take-timestamp-for-ptpv2.patch
net-stmmac-do-not-stop-phy-if-wol-is-enabled.patch
net-ag71xx-fix-mdio-subnode-support.patch
risc-v-clear-load-reservations-while-restoring-hart-.patch
riscv-fix-memblock-reservation-for-device-tree-blob.patch
drm-amdgpu-fix-multiple-memory-leaks-in-acp_hw_init.patch
drm-amd-display-memory-leak.patch
mips-loongson-fix-the-link-time-qualifier-of-serial_.patch
net-hisilicon-fix-usage-of-uninitialized-variable-in.patch
net-stmmac-avoid-deadlock-on-suspend-resume.patch
selftests-kvm-fix-libkvm-build-error.patch
lib-textsearch-fix-escapes-in-example-code.patch
s390-mm-fix-wunused-but-set-variable-warnings.patch
r8152-set-macpassthru-in-reset_resume-callback.patch
net-phy-allow-for-reset-line-to-be-tied-to-a-sleepy-.patch
net-phy-fix-write-to-mii-ctrl1000-register.patch
namespace-fix-namespace.pl-script-to-support-relativ.patch
convert-filldir-64-from-__put_user-to-unsafe_put_use.patch
elf-don-t-use-map_fixed_noreplace-for-elf-executable.patch
make-filldir-64-verify-the-directory-entry-filename-.patch
uaccess-implement-a-proper-unsafe_copy_to_user-and-s.patch
filldir-64-remove-warn_on_once-for-bad-directory-ent.patch
net_sched-fix-backward-compatibility-for-tca_kind.patch
net_sched-fix-backward-compatibility-for-tca_act_kin.patch
libata-ahci-fix-pcs-quirk-application.patch
md-raid0-fix-warning-message-for-parameter-default_l.patch
revert-drm-radeon-fix-eeh-during-kexec.patch
ocfs2-fix-panic-due-to-ocfs2_wq-is-null.patch
nvme-pci-set-the-prp2-correctly-when-using-more-than-4k-page.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP lite
✅ Loopdev Sanity
✅ jvm test suite
✅ AMTU (Abstract Machine Test Utility)
✅ LTP: openposix test suite
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ iotop: sanity
✅ tuned: tune-processes-through-perf
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ Storage blktests
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
❌ LTP lite
✅ Loopdev Sanity
✅ jvm test suite
✅ AMTU (Abstract Machine Test Utility)
✅ LTP: openposix test suite
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ iotop: sanity
✅ tuned: tune-processes-through-perf
✅ Usex - version 1.9-29
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ Storage blktests
x86_64:
Host 1:
✅ Boot test
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ Storage blktests
Host 2:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP lite
✅ Loopdev Sanity
✅ jvm test suite
✅ AMTU (Abstract Machine Test Utility)
✅ LTP: openposix test suite
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ iotop: sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
✅ stress: stress-ng
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ storage: dm/common
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running are marked with ⏱. Reports for non-upstream kernels have
a Beaker recipe linked to next to each host.
The patch titled
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
has been added to the -mm tree. Its filename is
mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-recalculate-pcpu-batch-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-meminit-recalculate-pcpu-batch-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page allocator
(pcpu) calculates the number of pages allocated/freed in batches as well
as the maximum number of pages allowed on a per-cpu list. As
zone->managed_pages is not up to date yet, the pcpu initialisation
calculates inappropriately low batch and high values.
This increases zone lock contention quite severely in some cases with the
degree of severity depending on how many CPUs share a local zone and the
size of the zone. A private report indicated that kernel build times were
excessive with extremely high system CPU usage. A perf profile indicated
that a large chunk of time was lost on zone->lock contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system. It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.
mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.
kernbench
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%*
Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%*
Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%*
Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%)
Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%)
Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%)
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Duration User 39766.24 49221.79
Duration System 44298.10 13361.67
Duration Elapsed 519.11 388.87
The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.
Before the patch, the breakdown of batch and high values over all zones was:
256 batch: 1
256 batch: 63
512 batch: 7
256 high: 0
256 high: 378
512 high: 42
512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
the patch:
256 batch: 1
768 batch: 63
256 high: 0
768 high: 378
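Condensed, the change amounts to the loop below (taken from the diff further
down, with the unchanged parts of the function elided; zone_pcp_update()
re-derives each zone's per-cpu batch and high limits from the now-final
managed page count):

void __init page_alloc_init_late(void)
{
	struct zone *zone;

	/* Wait for all deferred-init kthreads to finish. */
	wait_for_completion(&pgdat_init_all_done_comp);

	/*
	 * zone->managed_pages has grown during deferred initialisation, so
	 * recompute the per-cpu pageset batch/high limits for every
	 * populated zone; otherwise they stay artificially small.
	 */
	for_each_populated_zone(zone)
		zone_pcp_update(zone);

	/* ... rest of page_alloc_init_late() unchanged ... */
}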
Link: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/page_alloc.c~mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes
+++ a/mm/page_alloc.c
@@ -1948,6 +1948,14 @@ void __init page_alloc_init_late(void)
wait_for_completion(&pgdat_init_all_done_comp);
/*
+ * The number of managed pages has changed due to the initialisation
+ * so the pcpu batch and high limits needs to be updated or the limits
+ * will be artificially small.
+ */
+ for_each_populated_zone(zone)
+ zone_pcp_update(zone);
+
+ /*
* We initialized the rest of the deferred pages. Permanently disable
* on-demand struct page initialization.
*/
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
mm-pcp-share-common-code-between-memory-hotplug-and-percpu-sysctl-handler.patch
mm-pcpu-make-zone-pcp-updates-and-reset-internal-to-the-mm.patch
The patch titled
Subject: mm, pcp: share common code between memory hotplug and percpu sysctl handler
has been removed from the -mm tree. Its filename was
mm-pcp-share-common-code-between-memory-hotplug-and-percpu-sysctl-handler.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm, pcp: share common code between memory hotplug and percpu sysctl handler
Both the percpu_pagelist_fraction sysctl handler and memory hotplug have a
common requirement of updating the pcpu page allocation batch and high
values. Split the relevant helper to share common code.
No functional change.
Link: http://lkml.kernel.org/r/20191018105606.3249-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Tested-by: Matt Fleming <matt(a)codeblueprint.co.uk>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: <stable(a)vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
--- a/mm/page_alloc.c~mm-pcp-share-common-code-between-memory-hotplug-and-percpu-sysctl-handler
+++ a/mm/page_alloc.c
@@ -7993,6 +7993,15 @@ int lowmem_reserve_ratio_sysctl_handler(
return 0;
}
+static void __zone_pcp_update(struct zone *zone)
+{
+ unsigned int cpu;
+
+ for_each_possible_cpu(cpu)
+ pageset_set_high_and_batch(zone,
+ per_cpu_ptr(zone->pageset, cpu));
+}
+
/*
* percpu_pagelist_fraction - changes the pcp->high for each zone on each
* cpu. It is the fraction of total pages in each zone that a hot per cpu
@@ -8024,13 +8033,8 @@ int percpu_pagelist_fraction_sysctl_hand
if (percpu_pagelist_fraction == old_percpu_pagelist_fraction)
goto out;
- for_each_populated_zone(zone) {
- unsigned int cpu;
-
- for_each_possible_cpu(cpu)
- pageset_set_high_and_batch(zone,
- per_cpu_ptr(zone->pageset, cpu));
- }
+ for_each_populated_zone(zone)
+ __zone_pcp_update(zone);
out:
mutex_unlock(&pcp_batch_high_lock);
return ret;
@@ -8528,11 +8532,8 @@ void free_contig_range(unsigned long pfn
*/
void __meminit zone_pcp_update(struct zone *zone)
{
- unsigned cpu;
mutex_lock(&pcp_batch_high_lock);
- for_each_possible_cpu(cpu)
- pageset_set_high_and_batch(zone,
- per_cpu_ptr(zone->pageset, cpu));
+ __zone_pcp_update(zone);
mutex_unlock(&pcp_batch_high_lock);
}
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
mm-pcpu-make-zone-pcp-updates-and-reset-internal-to-the-mm.patch
The patch titled
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
has been removed from the -mm tree. Its filename was
mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page allocator
(pcpu) calculates the number of pages allocated/freed in batches as well
as the maximum number of pages allowed on a per-cpu list. As
zone->managed_pages is not up to date yet, the pcpu initialisation
calculates inappropriately low batch and high values.
This increases zone lock contention quite severely in some cases with the
degree of severity depending on how many CPUs share a local zone and the
size of the zone. A private report indicated that kernel build times were
excessive with extremely high system CPU usage. A perf profile indicated
that a large chunk of time was lost on zone->lock contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes on each node. It was tested on a 2-socket AMD
EPYC 2 machine using a kernel compilation workload -- allmodconfig and all
available CPUs.
mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.
kernbench
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v1r1
Amean user-256 13249.50 ( 0.00%) 15928.40 * -20.22%*
Amean syst-256 14760.30 ( 0.00%) 4551.77 * 69.16%*
Amean elsp-256 162.42 ( 0.00%) 118.46 * 27.06%*
Stddev user-256 42.97 ( 0.00%) 50.83 ( -18.30%)
Stddev syst-256 336.87 ( 0.00%) 33.70 ( 90.00%)
Stddev elsp-256 2.46 ( 0.00%) 0.81 ( 67.01%)
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v1r1
Duration User 39766.24 47802.92
Duration System 44298.10 13671.93
Duration Elapsed 519.11 387.65
The patch reduces system CPU usage by 69.16% and total build time by
27.06%. The variance of system CPU usage is also much reduced.
Link: http://lkml.kernel.org/r/20191018105606.3249-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Tested-by: Matt Fleming <matt(a)codeblueprint.co.uk>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: <stable(a)vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes
+++ a/mm/page_alloc.c
@@ -1818,6 +1818,14 @@ static int __init deferred_init_memmap(v
*/
while (spfn < epfn)
nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+
+ /*
+ * The number of managed pages has changed due to the initialisation
+ * so the pcpu batch and high limits needs to be updated or the limits
+ * will be artificially small.
+ */
+ zone_pcp_update(zone);
+
zone_empty:
pgdat_resize_unlock(pgdat, &flags);
@@ -8514,7 +8522,6 @@ void free_contig_range(unsigned long pfn
WARN(count != 0, "%d pages are still in use!\n", count);
}
-#ifdef CONFIG_MEMORY_HOTPLUG
/*
* The zone indicated has a new number of managed_pages; batch sizes and percpu
* page high values need to be recalulated.
@@ -8528,7 +8535,6 @@ void __meminit zone_pcp_update(struct zo
per_cpu_ptr(zone->pageset, cpu));
mutex_unlock(&pcp_batch_high_lock);
}
-#endif
void zone_pcp_reset(struct zone *zone)
{
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
mm-pcp-share-common-code-between-memory-hotplug-and-percpu-sysctl-handler.patch
mm-pcpu-make-zone-pcp-updates-and-reset-internal-to-the-mm.patch