March 2019 - Linux-stable-mirror

FAILED: patch "[PATCH] tpm/tpm_crb: Avoid unaligned reads in crb_recv()" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 3d7a850fdc1a2e4d2adbc95cc0fc962974725e88 Mon Sep 17 00:00:00 2001 From: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Date: Mon, 4 Feb 2019 15:59:43 +0200 Subject: [PATCH] tpm/tpm_crb: Avoid unaligned reads in crb_recv() The current approach to read first 6 bytes from the response and then tail of the response, can cause the 2nd memcpy_fromio() to do an unaligned read (e.g. read 32-bit word from address aligned to a 16-bits), depending on how memcpy_fromio() is implemented. If this happens, the read will fail and the memory controller will fill the read with 1's. This was triggered by 170d13ca3a2f, which should be probably refined to check and react to the address alignment. Before that commit, on x86 memcpy_fromio() turned out to be memcpy(). By a luck GCC has done the right thing (from tpm_crb's perspective) for us so far, but we should not rely on that. Thus, it makes sense to fix this also in tpm_crb, not least because the fix can be then backported to stable kernels and make them more robust when compiled in differing environments. Cc: stable(a)vger.kernel.org Cc: James Morris <jmorris(a)namei.org> Cc: Tomas Winkler <tomas.winkler(a)intel.com> Cc: Jerry Snitselaar <jsnitsel(a)redhat.com> Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com> Acked-by: Tomas Winkler <tomas.winkler(a)intel.com> diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c index 36952ef98f90..763fc7e6c005 100644 --- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -287,19 +287,29 @@ static int crb_recv(struct tpm_chip *chip, u8 *buf, size_t count) struct crb_priv *priv = dev_get_drvdata(&chip->dev); unsigned int expected; - /* sanity check */ - if (count < 6) + /* A sanity check that the upper layer wants to get at least the header + * as that is the minimum size for any TPM response. + */ + if (count < TPM_HEADER_SIZE) return -EIO; + /* If this bit is set, according to the spec, the TPM is in + * unrecoverable condition. + */ if (ioread32(&priv->regs_t->ctrl_sts) & CRB_CTRL_STS_ERROR) return -EIO; - memcpy_fromio(buf, priv->rsp, 6); - expected = be32_to_cpup((__be32 *) &buf[2]); - if (expected > count || expected < 6) + /* Read the first 8 bytes in order to get the length of the response. + * We read exactly a quad word in order to make sure that the remaining + * reads will be aligned. + */ + memcpy_fromio(buf, priv->rsp, 8); + + expected = be32_to_cpup((__be32 *)&buf[2]); + if (expected > count || expected < TPM_HEADER_SIZE) return -EIO; - memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6); + memcpy_fromio(&buf[8], &priv->rsp[8], expected - 8); return expected; }

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] x86/kvmclock: set offset for kvm unstable clock" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From b5179ec4187251a751832193693d6e474d3445ac Mon Sep 17 00:00:00 2001 From: Pavel Tatashin <pasha.tatashin(a)soleen.com> Date: Sat, 26 Jan 2019 12:49:56 -0500 Subject: [PATCH] x86/kvmclock: set offset for kvm unstable clock VMs may show incorrect uptime and dmesg printk offsets on hypervisors with unstable clock. The problem is produced when VM is rebooted without exiting from qemu. The fix is to calculate clock offset not only for stable clock but for unstable clock as well, and use kvm_sched_clock_read() which substracts the offset for both clocks. This is safe, because pvclock_clocksource_read() does the right thing and makes sure that clock always goes forward, so once offset is calculated with unstable clock, we won't get new reads that are smaller than offset, and thus won't get negative results. Thank you Jon DeVree for helping to reproduce this issue. Fixes: 857baa87b642 ("sched/clock: Enable sched clock early") Cc: stable(a)vger.kernel.org Reported-by: Dominique Martinet <asmadeus(a)codewreck.org> Signed-off-by: Pavel Tatashin <pasha.tatashin(a)soleen.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index e811d4d1c824..d908a37bf3f3 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -104,12 +104,8 @@ static u64 kvm_sched_clock_read(void) static inline void kvm_sched_clock_init(bool stable) { - if (!stable) { - pv_ops.time.sched_clock = kvm_clock_read; + if (!stable) clear_sched_clock_stable(); - return; - } - kvm_sched_clock_offset = kvm_clock_read(); pv_ops.time.sched_clock = kvm_sched_clock_read;

6 years, 9 months

1
0
0 0

stable-rc/linux-4.14.y boot: 55 boots: 0 failed, 54 passed with 1 untried/unknown (v4.14.107-130-g4f24d4cd99b6)

by kernelci.org bot

stable-rc/linux-4.14.y boot: 55 boots: 0 failed, 54 passed with 1 untried/unknown (v4.14.107-130-g4f24d4cd99b6) Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.1… Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.107-1… Tree: stable-rc Branch: linux-4.14.y Git Describe: v4.14.107-130-g4f24d4cd99b6 Git Commit: 4f24d4cd99b600a54dea32405ee415b6682a4247 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 29 unique boards, 15 SoC families, 9 builds out of 201 --- For more info write to <info(a)kernelci.org>

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] bcache: never writeback a discard operation" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 9951379b0ca88c95876ad9778b9099e19a95d566 Mon Sep 17 00:00:00 2001 From: Daniel Axtens <dja(a)axtens.net> Date: Sat, 9 Feb 2019 12:52:53 +0800 Subject: [PATCH] bcache: never writeback a discard operation Some users see panics like the following when performing fstrim on a bcached volume: [ 529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 530.183928] #PF error: [normal kernel read fault] [ 530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0 [ 530.750887] Oops: 0000 [#1] SMP PTI [ 530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3 [ 531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015 [ 531.693137] RIP: 0010:blk_queue_split+0x148/0x620 [ 531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48 8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3 [ 532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246 [ 533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000 [ 533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000 [ 533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000 [ 534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000 [ 534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020 [ 534.833319] FS: 00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000 [ 535.224098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0 [ 535.851759] Call Trace: [ 535.970308] ? mempool_alloc_slab+0x15/0x20 [ 536.174152] ? bch_data_insert+0x42/0xd0 [bcache] [ 536.403399] blk_mq_make_request+0x97/0x4f0 [ 536.607036] generic_make_request+0x1e2/0x410 [ 536.819164] submit_bio+0x73/0x150 [ 536.980168] ? submit_bio+0x73/0x150 [ 537.149731] ? bio_associate_blkg_from_css+0x3b/0x60 [ 537.391595] ? _cond_resched+0x1a/0x50 [ 537.573774] submit_bio_wait+0x59/0x90 [ 537.756105] blkdev_issue_discard+0x80/0xd0 [ 537.959590] ext4_trim_fs+0x4a9/0x9e0 [ 538.137636] ? ext4_trim_fs+0x4a9/0x9e0 [ 538.324087] ext4_ioctl+0xea4/0x1530 [ 538.497712] ? _copy_to_user+0x2a/0x40 [ 538.679632] do_vfs_ioctl+0xa6/0x600 [ 538.853127] ? __do_sys_newfstat+0x44/0x70 [ 539.051951] ksys_ioctl+0x6d/0x80 [ 539.212785] __x64_sys_ioctl+0x1a/0x20 [ 539.394918] do_syscall_64+0x5a/0x110 [ 539.568674] entry_SYSCALL_64_after_hwframe+0x44/0xa9 We have observed it where both: 1) LVM/devmapper is involved (bcache backing device is LVM volume) and 2) writeback cache is involved (bcache cache_mode is writeback) On one machine, we can reliably reproduce it with: # echo writeback > /sys/block/bcache0/bcache/cache_mode (not sure whether above line is required) # mount /dev/bcache0 /test # for i in {0..10}; do file="$(mktemp /test/zero.XXX)" dd if=/dev/zero of="$file" bs=1M count=256 sync rm $file done # fstrim -v /test Observing this with tracepoints on, we see the following writes: fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4260112 + 196352 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4456464 + 262144 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4718608 + 81920 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5324816 + 180224 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5505040 + 262144 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5767184 + 81920 hit 0 bypass 1 fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 6373392 + 180224 hit 1 bypass 0 <crash> Note the final one has different hit/bypass flags. This is because in should_writeback(), we were hitting a case where the partial stripe condition was returning true and so should_writeback() was returning true early. If that hadn't been the case, it would have hit the would_skip test, and as would_skip == s->iop.bypass == true, should_writeback() would have returned false. Looking at the git history from 'commit 72c270612bd3 ("bcache: Write out full stripes")', it looks like the idea was to optimise for raid5/6: * If a stripe is already dirty, force writes to that stripe to writeback mode - to help build up full stripes of dirty data To fix this issue, make sure that should_writeback() on a discard op never returns true. More details of debugging: https://www.spinics.net/lists/linux-bcache/msg06996.html Previous reports: - https://bugzilla.kernel.org/show_bug.cgi?id=201051 - https://bugzilla.kernel.org/show_bug.cgi?id=196103 - https://www.spinics.net/lists/linux-bcache/msg06885.html (Coly Li: minor modification to follow maximum 75 chars per line rule) Cc: Kent Overstreet <koverstreet(a)google.com> Cc: stable(a)vger.kernel.org Fixes: 72c270612bd3 ("bcache: Write out full stripes") Signed-off-by: Daniel Axtens <dja(a)axtens.net> Signed-off-by: Coly Li <colyli(a)suse.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h index 6a743d3bb338..4e4c6810dc3c 100644 --- a/drivers/md/bcache/writeback.h +++ b/drivers/md/bcache/writeback.h @@ -71,6 +71,9 @@ static inline bool should_writeback(struct cached_dev *dc, struct bio *bio, in_use > bch_cutoff_writeback_sync) return false; + if (bio_op(bio) == REQ_OP_DISCARD) + return false; + if (dc->partial_stripes_expensive && bcache_dev_stripe_dirty(dc, bio->bi_iter.bi_sector, bio_sectors(bio)))

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] bcache: never writeback a discard operation" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 9951379b0ca88c95876ad9778b9099e19a95d566 Mon Sep 17 00:00:00 2001 From: Daniel Axtens <dja(a)axtens.net> Date: Sat, 9 Feb 2019 12:52:53 +0800 Subject: [PATCH] bcache: never writeback a discard operation Some users see panics like the following when performing fstrim on a bcached volume: [ 529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 530.183928] #PF error: [normal kernel read fault] [ 530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0 [ 530.750887] Oops: 0000 [#1] SMP PTI [ 530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3 [ 531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015 [ 531.693137] RIP: 0010:blk_queue_split+0x148/0x620 [ 531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48 8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3 [ 532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246 [ 533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000 [ 533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000 [ 533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000 [ 534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000 [ 534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020 [ 534.833319] FS: 00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000 [ 535.224098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0 [ 535.851759] Call Trace: [ 535.970308] ? mempool_alloc_slab+0x15/0x20 [ 536.174152] ? bch_data_insert+0x42/0xd0 [bcache] [ 536.403399] blk_mq_make_request+0x97/0x4f0 [ 536.607036] generic_make_request+0x1e2/0x410 [ 536.819164] submit_bio+0x73/0x150 [ 536.980168] ? submit_bio+0x73/0x150 [ 537.149731] ? bio_associate_blkg_from_css+0x3b/0x60 [ 537.391595] ? _cond_resched+0x1a/0x50 [ 537.573774] submit_bio_wait+0x59/0x90 [ 537.756105] blkdev_issue_discard+0x80/0xd0 [ 537.959590] ext4_trim_fs+0x4a9/0x9e0 [ 538.137636] ? ext4_trim_fs+0x4a9/0x9e0 [ 538.324087] ext4_ioctl+0xea4/0x1530 [ 538.497712] ? _copy_to_user+0x2a/0x40 [ 538.679632] do_vfs_ioctl+0xa6/0x600 [ 538.853127] ? __do_sys_newfstat+0x44/0x70 [ 539.051951] ksys_ioctl+0x6d/0x80 [ 539.212785] __x64_sys_ioctl+0x1a/0x20 [ 539.394918] do_syscall_64+0x5a/0x110 [ 539.568674] entry_SYSCALL_64_after_hwframe+0x44/0xa9 We have observed it where both: 1) LVM/devmapper is involved (bcache backing device is LVM volume) and 2) writeback cache is involved (bcache cache_mode is writeback) On one machine, we can reliably reproduce it with: # echo writeback > /sys/block/bcache0/bcache/cache_mode (not sure whether above line is required) # mount /dev/bcache0 /test # for i in {0..10}; do file="$(mktemp /test/zero.XXX)" dd if=/dev/zero of="$file" bs=1M count=256 sync rm $file done # fstrim -v /test Observing this with tracepoints on, we see the following writes: fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4260112 + 196352 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4456464 + 262144 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4718608 + 81920 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5324816 + 180224 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5505040 + 262144 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5767184 + 81920 hit 0 bypass 1 fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 6373392 + 180224 hit 1 bypass 0 <crash> Note the final one has different hit/bypass flags. This is because in should_writeback(), we were hitting a case where the partial stripe condition was returning true and so should_writeback() was returning true early. If that hadn't been the case, it would have hit the would_skip test, and as would_skip == s->iop.bypass == true, should_writeback() would have returned false. Looking at the git history from 'commit 72c270612bd3 ("bcache: Write out full stripes")', it looks like the idea was to optimise for raid5/6: * If a stripe is already dirty, force writes to that stripe to writeback mode - to help build up full stripes of dirty data To fix this issue, make sure that should_writeback() on a discard op never returns true. More details of debugging: https://www.spinics.net/lists/linux-bcache/msg06996.html Previous reports: - https://bugzilla.kernel.org/show_bug.cgi?id=201051 - https://bugzilla.kernel.org/show_bug.cgi?id=196103 - https://www.spinics.net/lists/linux-bcache/msg06885.html (Coly Li: minor modification to follow maximum 75 chars per line rule) Cc: Kent Overstreet <koverstreet(a)google.com> Cc: stable(a)vger.kernel.org Fixes: 72c270612bd3 ("bcache: Write out full stripes") Signed-off-by: Daniel Axtens <dja(a)axtens.net> Signed-off-by: Coly Li <colyli(a)suse.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h index 6a743d3bb338..4e4c6810dc3c 100644 --- a/drivers/md/bcache/writeback.h +++ b/drivers/md/bcache/writeback.h @@ -71,6 +71,9 @@ static inline bool should_writeback(struct cached_dev *dc, struct bio *bio, in_use > bch_cutoff_writeback_sync) return false; + if (bio_op(bio) == REQ_OP_DISCARD) + return false; + if (dc->partial_stripes_expensive && bcache_dev_stripe_dirty(dc, bio->bi_iter.bi_sector, bio_sectors(bio)))

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] ipmi_si: Fix crash when using hard-coded device" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 41b766d661bf94a364960862cfc248a78313dbd3 Mon Sep 17 00:00:00 2001 From: Corey Minyard <cminyard(a)mvista.com> Date: Thu, 21 Feb 2019 12:10:07 -0600 Subject: [PATCH] ipmi_si: Fix crash when using hard-coded device When excuting a command like: modprobe ipmi_si ports=0xffc0e3 type=bt The system would get an oops. The trouble here is that ipmi_si_hardcode_find_bmc() is called before ipmi_si_platform_init(), but initialization of the hard-coded device creates an IPMI platform device, which won't be initialized yet. The real trouble is that hard-coded devices aren't created with any device, and the fixup is done later. So do it right, create the hard-coded devices as normal platform devices. This required adding some new resource types to the IPMI platform code for passing information required by the hard-coded device and adding some code to remove the hard-coded platform devices on module removal. To enforce the "hard-coded devices passed by the user take priority over firmware devices" rule, some special code was added to check and see if a hard-coded device already exists. Reported-by: Yang Yingliang <yangyingliang(a)huawei.com> Cc: stable(a)vger.kernel.org # v4.15+ Signed-off-by: Corey Minyard <cminyard(a)mvista.com> Tested-by: Yang Yingliang <yangyingliang(a)huawei.com> diff --git a/drivers/char/ipmi/ipmi_si.h b/drivers/char/ipmi/ipmi_si.h index 52f6152d1fcb..7ae52c17618e 100644 --- a/drivers/char/ipmi/ipmi_si.h +++ b/drivers/char/ipmi/ipmi_si.h @@ -25,7 +25,9 @@ void ipmi_irq_finish_setup(struct si_sm_io *io); int ipmi_si_remove_by_dev(struct device *dev); void ipmi_si_remove_by_data(int addr_space, enum si_type si_type, unsigned long addr); -int ipmi_si_hardcode_find_bmc(void); +void ipmi_hardcode_init(void); +void ipmi_si_hardcode_exit(void); +int ipmi_si_hardcode_match(int addr_type, unsigned long addr); void ipmi_si_platform_init(void); void ipmi_si_platform_shutdown(void); diff --git a/drivers/char/ipmi/ipmi_si_hardcode.c b/drivers/char/ipmi/ipmi_si_hardcode.c index 487642809c58..1e5783961b0d 100644 --- a/drivers/char/ipmi/ipmi_si_hardcode.c +++ b/drivers/char/ipmi/ipmi_si_hardcode.c @@ -3,6 +3,7 @@ #define pr_fmt(fmt) "ipmi_hardcode: " fmt #include <linux/moduleparam.h> +#include <linux/platform_device.h> #include "ipmi_si.h" /* @@ -12,23 +13,22 @@ #define SI_MAX_PARMS 4 -static char *si_type[SI_MAX_PARMS]; #define MAX_SI_TYPE_STR 30 -static char si_type_str[MAX_SI_TYPE_STR]; +static char si_type_str[MAX_SI_TYPE_STR] __initdata; static unsigned long addrs[SI_MAX_PARMS]; static unsigned int num_addrs; static unsigned int ports[SI_MAX_PARMS]; static unsigned int num_ports; -static int irqs[SI_MAX_PARMS]; -static unsigned int num_irqs; -static int regspacings[SI_MAX_PARMS]; -static unsigned int num_regspacings; -static int regsizes[SI_MAX_PARMS]; -static unsigned int num_regsizes; -static int regshifts[SI_MAX_PARMS]; -static unsigned int num_regshifts; -static int slave_addrs[SI_MAX_PARMS]; /* Leaving 0 chooses the default value */ -static unsigned int num_slave_addrs; +static int irqs[SI_MAX_PARMS] __initdata; +static unsigned int num_irqs __initdata; +static int regspacings[SI_MAX_PARMS] __initdata; +static unsigned int num_regspacings __initdata; +static int regsizes[SI_MAX_PARMS] __initdata; +static unsigned int num_regsizes __initdata; +static int regshifts[SI_MAX_PARMS] __initdata; +static unsigned int num_regshifts __initdata; +static int slave_addrs[SI_MAX_PARMS] __initdata; +static unsigned int num_slave_addrs __initdata; module_param_string(type, si_type_str, MAX_SI_TYPE_STR, 0); MODULE_PARM_DESC(type, "Defines the type of each interface, each" @@ -73,12 +73,133 @@ MODULE_PARM_DESC(slave_addrs, "Set the default IPMB slave address for" " overridden by this parm. This is an array indexed" " by interface number."); -int ipmi_si_hardcode_find_bmc(void) +static struct platform_device *ipmi_hc_pdevs[SI_MAX_PARMS]; + +static void __init ipmi_hardcode_init_one(const char *si_type_str, + unsigned int i, + unsigned long addr, + unsigned int flags) { - int ret = -ENODEV; - int i; - struct si_sm_io io; + struct platform_device *pdev; + unsigned int num_r = 1, size; + struct resource r[4]; + struct property_entry p[6]; + enum si_type si_type; + unsigned int regspacing, regsize; + int rv; + + memset(p, 0, sizeof(p)); + memset(r, 0, sizeof(r)); + + if (!si_type_str || !*si_type_str || strcmp(si_type_str, "kcs") == 0) { + size = 2; + si_type = SI_KCS; + } else if (strcmp(si_type_str, "smic") == 0) { + size = 2; + si_type = SI_SMIC; + } else if (strcmp(si_type_str, "bt") == 0) { + size = 3; + si_type = SI_BT; + } else if (strcmp(si_type_str, "invalid") == 0) { + /* + * Allow a firmware-specified interface to be + * disabled. + */ + size = 1; + si_type = SI_TYPE_INVALID; + } else { + pr_warn("Interface type specified for interface %d, was invalid: %s\n", + i, si_type_str); + return; + } + + regsize = regsizes[i]; + if (regsize == 0) + regsize = DEFAULT_REGSIZE; + + p[0] = PROPERTY_ENTRY_U8("ipmi-type", si_type); + p[1] = PROPERTY_ENTRY_U8("slave-addr", slave_addrs[i]); + p[2] = PROPERTY_ENTRY_U8("addr-source", SI_HARDCODED); + p[3] = PROPERTY_ENTRY_U8("reg-shift", regshifts[i]); + p[4] = PROPERTY_ENTRY_U8("reg-size", regsize); + /* Last entry must be left NULL to terminate it. */ + + /* + * Register spacing is derived from the resources in + * the IPMI platform code. + */ + regspacing = regspacings[i]; + if (regspacing == 0) + regspacing = regsize; + + r[0].start = addr; + r[0].end = r[0].start + regsize - 1; + r[0].name = "IPMI Address 1"; + r[0].flags = flags; + + if (size > 1) { + r[1].start = r[0].start + regspacing; + r[1].end = r[1].start + regsize - 1; + r[1].name = "IPMI Address 2"; + r[1].flags = flags; + num_r++; + } + + if (size > 2) { + r[2].start = r[1].start + regspacing; + r[2].end = r[2].start + regsize - 1; + r[2].name = "IPMI Address 3"; + r[2].flags = flags; + num_r++; + } + + if (irqs[i]) { + r[num_r].start = irqs[i]; + r[num_r].end = irqs[i]; + r[num_r].name = "IPMI IRQ"; + r[num_r].flags = IORESOURCE_IRQ; + num_r++; + } + + pdev = platform_device_alloc("hardcode-ipmi-si", i); + if (!pdev) { + pr_err("Error allocating IPMI platform device %d\n", i); + return; + } + + rv = platform_device_add_resources(pdev, r, num_r); + if (rv) { + dev_err(&pdev->dev, + "Unable to add hard-code resources: %d\n", rv); + goto err; + } + + rv = platform_device_add_properties(pdev, p); + if (rv) { + dev_err(&pdev->dev, + "Unable to add hard-code properties: %d\n", rv); + goto err; + } + + rv = platform_device_add(pdev); + if (rv) { + dev_err(&pdev->dev, + "Unable to add hard-code device: %d\n", rv); + goto err; + } + + ipmi_hc_pdevs[i] = pdev; + return; + +err: + platform_device_put(pdev); +} + +void __init ipmi_hardcode_init(void) +{ + unsigned int i; char *str; + char *si_type[SI_MAX_PARMS]; /* Parse out the si_type string into its components. */ str = si_type_str; @@ -95,54 +216,45 @@ int ipmi_si_hardcode_find_bmc(void) } } - memset(&io, 0, sizeof(io)); for (i = 0; i < SI_MAX_PARMS; i++) { - if (!ports[i] && !addrs[i]) - continue; - - io.addr_source = SI_HARDCODED; - pr_info("probing via hardcoded address\n"); - - if (!si_type[i] || strcmp(si_type[i], "kcs") == 0) { - io.si_type = SI_KCS; - } else if (strcmp(si_type[i], "smic") == 0) { - io.si_type = SI_SMIC; - } else if (strcmp(si_type[i], "bt") == 0) { - io.si_type = SI_BT; - } else { - pr_warn("Interface type specified for interface %d, was invalid: %s\n", - i, si_type[i]); - continue; - } + if (i < num_ports && ports[i]) + ipmi_hardcode_init_one(si_type[i], i, ports[i], + IORESOURCE_IO); + if (i < num_addrs && addrs[i]) + ipmi_hardcode_init_one(si_type[i], i, addrs[i], + IORESOURCE_MEM); + } +} - if (ports[i]) { - /* An I/O port */ - io.addr_data = ports[i]; - io.addr_type = IPMI_IO_ADDR_SPACE; - } else if (addrs[i]) { - /* A memory port */ - io.addr_data = addrs[i]; - io.addr_type = IPMI_MEM_ADDR_SPACE; - } else { - pr_warn("Interface type specified for interface %d, but port and address were not set or set to zero\n", - i); - continue; - } +void ipmi_si_hardcode_exit(void) +{ + unsigned int i; - io.addr = NULL; - io.regspacing = regspacings[i]; - if (!io.regspacing) - io.regspacing = DEFAULT_REGSPACING; - io.regsize = regsizes[i]; - if (!io.regsize) - io.regsize = DEFAULT_REGSIZE; - io.regshift = regshifts[i]; - io.irq = irqs[i]; - if (io.irq) - io.irq_setup = ipmi_std_irq_setup; - io.slave_addr = slave_addrs[i]; - - ret = ipmi_si_add_smi(&io); + for (i = 0; i < SI_MAX_PARMS; i++) { + if (ipmi_hc_pdevs[i]) + platform_device_unregister(ipmi_hc_pdevs[i]); } - return ret; +} + +/* + * Returns true of the given address exists as a hardcoded address, + * false if not. + */ +int ipmi_si_hardcode_match(int addr_type, unsigned long addr) +{ + unsigned int i; + + if (addr_type == IPMI_IO_ADDR_SPACE) { + for (i = 0; i < num_ports; i++) { + if (ports[i] == addr) + return 1; + } + } else { + for (i = 0; i < num_addrs; i++) { + if (addrs[i] == addr) + return 1; + } + } + + return 0; } diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index ae99d6a14789..abbd526626d5 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1865,6 +1865,18 @@ int ipmi_si_add_smi(struct si_sm_io *io) int rv = 0; struct smi_info *new_smi, *dup; + /* + * If the user gave us a hard-coded device at the same + * address, they presumably want us to use it and not what is + * in the firmware. + */ + if (io->addr_source != SI_HARDCODED && + ipmi_si_hardcode_match(io->addr_type, io->addr_data)) { + dev_info(io->dev, + "Hard-coded device at this address already exists"); + return -ENODEV; + } + if (!io->io_setup) { if (io->addr_type == IPMI_IO_ADDR_SPACE) { io->io_setup = ipmi_si_port_setup; @@ -2097,7 +2109,7 @@ static int try_smi_init(struct smi_info *new_smi) return rv; } -static int init_ipmi_si(void) +static int __init init_ipmi_si(void) { struct smi_info *e; enum ipmi_addr_src type = SI_INVALID; @@ -2105,11 +2117,9 @@ static int init_ipmi_si(void) if (initialized) return 0; - pr_info("IPMI System Interface driver\n"); + ipmi_hardcode_init(); - /* If the user gave us a device, they presumably want us to use it */ - if (!ipmi_si_hardcode_find_bmc()) - goto do_scan; + pr_info("IPMI System Interface driver\n"); ipmi_si_platform_init(); @@ -2121,7 +2131,6 @@ static int init_ipmi_si(void) with multiple BMCs we assume that there will be several instances of a given type so if we succeed in registering a type then also try to register everything else of the same type */ -do_scan: mutex_lock(&smi_infos_lock); list_for_each_entry(e, &smi_infos, link) { /* Try to register a device if it has an IRQ and we either @@ -2307,6 +2316,8 @@ static void cleanup_ipmi_si(void) list_for_each_entry_safe(e, tmp_e, &smi_infos, link) cleanup_one_si(e); mutex_unlock(&smi_infos_lock); + + ipmi_si_hardcode_exit(); } module_exit(cleanup_ipmi_si); diff --git a/drivers/char/ipmi/ipmi_si_platform.c b/drivers/char/ipmi/ipmi_si_platform.c index 15cf819f884f..8158d03542f4 100644 --- a/drivers/char/ipmi/ipmi_si_platform.c +++ b/drivers/char/ipmi/ipmi_si_platform.c @@ -128,8 +128,6 @@ ipmi_get_info_from_resources(struct platform_device *pdev, if (res_second->start > io->addr_data) io->regspacing = res_second->start - io->addr_data; } - io->regsize = DEFAULT_REGSIZE; - io->regshift = 0; return res; } @@ -137,7 +135,7 @@ ipmi_get_info_from_resources(struct platform_device *pdev, static int platform_ipmi_probe(struct platform_device *pdev) { struct si_sm_io io; - u8 type, slave_addr, addr_source; + u8 type, slave_addr, addr_source, regsize, regshift; int rv; rv = device_property_read_u8(&pdev->dev, "addr-source", &addr_source); @@ -149,7 +147,7 @@ static int platform_ipmi_probe(struct platform_device *pdev) if (addr_source == SI_SMBIOS) { if (!si_trydmi) return -ENODEV; - } else { + } else if (addr_source != SI_HARDCODED) { if (!si_tryplatform) return -ENODEV; } @@ -169,11 +167,23 @@ static int platform_ipmi_probe(struct platform_device *pdev) case SI_BT: io.si_type = type; break; + case SI_TYPE_INVALID: /* User disabled this in hardcode. */ + return -ENODEV; default: dev_err(&pdev->dev, "ipmi-type property is invalid\n"); return -EINVAL; } + io.regsize = DEFAULT_REGSIZE; + rv = device_property_read_u8(&pdev->dev, "reg-size", &regsize); + if (!rv) + io.regsize = regsize; + + io.regshift = 0; + rv = device_property_read_u8(&pdev->dev, "reg-shift", &regshift); + if (!rv) + io.regshift = regshift; + if (!ipmi_get_info_from_resources(pdev, &io)) return -EINVAL; @@ -193,7 +203,8 @@ static int platform_ipmi_probe(struct platform_device *pdev) io.dev = &pdev->dev; - pr_info("ipmi_si: SMBIOS: %s %#lx regsize %d spacing %d irq %d\n", + pr_info("ipmi_si: %s: %s %#lx regsize %d spacing %d irq %d\n", + ipmi_addr_src_to_str(addr_source), (io.addr_type == IPMI_IO_ADDR_SPACE) ? "io" : "mem", io.addr_data, io.regsize, io.regspacing, io.irq); @@ -358,6 +369,9 @@ static int acpi_ipmi_probe(struct platform_device *pdev) goto err_free; } + io.regsize = DEFAULT_REGSIZE; + io.regshift = 0; + res = ipmi_get_info_from_resources(pdev, &io); if (!res) { rv = -EINVAL; @@ -420,8 +434,9 @@ static int ipmi_remove(struct platform_device *pdev) } static const struct platform_device_id si_plat_ids[] = { - { "dmi-ipmi-si", 0 }, - { } + { "dmi-ipmi-si", 0 }, + { "hardcode-ipmi-si", 0 }, + { } }; struct platform_driver ipmi_platform_driver = {

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] powerpc/smp: Fix NMI IPI xmon timeout" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 88b9a3d1425a436e95c41f09986fdae2daee437a Mon Sep 17 00:00:00 2001 From: Nicholas Piggin <npiggin(a)gmail.com> Date: Mon, 26 Nov 2018 12:01:06 +1000 Subject: [PATCH] powerpc/smp: Fix NMI IPI xmon timeout The xmon debugger IPI handler waits in the callback function while xmon is still active. This means they don't complete the IPI, and the initiator always times out waiting for them. Things manage to work after the timeout because there is some fallback logic to keep NMI IPI state sane in case of the timeout, but this is a bit ugly. This patch changes NMI IPI back to half-asynchronous (i.e., wait for everyone to call in, do not wait for IPI function to complete), but the complexity is avoided by going one step further and allowing new IPIs to be issued before the IPI functions to all complete. If synchronization against that is required, it is left up to the caller, but current callers don't require that. In fact with the timeout handling, callers must be able to cope with this already. Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous") Cc: stable(a)vger.kernel.org # v4.19+ Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com> Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 137196a4248b..6e521a3f67ca 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -358,13 +358,12 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask) * NMI IPIs may not be recoverable, so should not be used as ongoing part of * a running system. They can be used for crash, debug, halt/reboot, etc. * - * NMI IPIs are globally single threaded. No more than one in progress at - * any time. - * * The IPI call waits with interrupts disabled until all targets enter the - * NMI handler, then the call returns. + * NMI handler, then returns. Subsequent IPIs can be issued before targets + * have returned from their handlers, so there is no guarantee about + * concurrency or re-entrancy. * - * No new NMI can be initiated until targets exit the handler. + * A new NMI can be issued before all targets exit the handler. * * The IPI call may time out without all targets entering the NMI handler. * In that case, there is some logic to recover (and ignore subsequent @@ -375,7 +374,7 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask) static atomic_t __nmi_ipi_lock = ATOMIC_INIT(0); static struct cpumask nmi_ipi_pending_mask; -static int nmi_ipi_busy_count = 0; +static bool nmi_ipi_busy = false; static void (*nmi_ipi_function)(struct pt_regs *) = NULL; static void nmi_ipi_lock_start(unsigned long *flags) @@ -414,7 +413,7 @@ static void nmi_ipi_unlock_end(unsigned long *flags) */ int smp_handle_nmi_ipi(struct pt_regs *regs) { - void (*fn)(struct pt_regs *); + void (*fn)(struct pt_regs *) = NULL; unsigned long flags; int me = raw_smp_processor_id(); int ret = 0; @@ -425,29 +424,17 @@ int smp_handle_nmi_ipi(struct pt_regs *regs) * because the caller may have timed out. */ nmi_ipi_lock_start(&flags); - if (!nmi_ipi_busy_count) - goto out; - if (!cpumask_test_cpu(me, &nmi_ipi_pending_mask)) - goto out; - - fn = nmi_ipi_function; - if (!fn) - goto out; - - cpumask_clear_cpu(me, &nmi_ipi_pending_mask); - nmi_ipi_busy_count++; - nmi_ipi_unlock(); - - ret = 1; - - fn(regs); - - nmi_ipi_lock(); - if (nmi_ipi_busy_count > 1) /* Can race with caller time-out */ - nmi_ipi_busy_count--; -out: + if (cpumask_test_cpu(me, &nmi_ipi_pending_mask)) { + cpumask_clear_cpu(me, &nmi_ipi_pending_mask); + fn = READ_ONCE(nmi_ipi_function); + WARN_ON_ONCE(!fn); + ret = 1; + } nmi_ipi_unlock_end(&flags); + if (fn) + fn(regs); + return ret; } @@ -473,7 +460,7 @@ static void do_smp_send_nmi_ipi(int cpu, bool safe) * - cpu is the target CPU (must not be this CPU), or NMI_IPI_ALL_OTHERS. * - fn is the target callback function. * - delay_us > 0 is the delay before giving up waiting for targets to - * complete executing the handler, == 0 specifies indefinite delay. + * begin executing the handler, == 0 specifies indefinite delay. */ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool safe) { @@ -487,31 +474,33 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool if (unlikely(!smp_ops)) return 0; - /* Take the nmi_ipi_busy count/lock with interrupts hard disabled */ nmi_ipi_lock_start(&flags); - while (nmi_ipi_busy_count) { + while (nmi_ipi_busy) { nmi_ipi_unlock_end(&flags); - spin_until_cond(nmi_ipi_busy_count == 0); + spin_until_cond(!nmi_ipi_busy); nmi_ipi_lock_start(&flags); } - + nmi_ipi_busy = true; nmi_ipi_function = fn; + WARN_ON_ONCE(!cpumask_empty(&nmi_ipi_pending_mask)); + if (cpu < 0) { /* ALL_OTHERS */ cpumask_copy(&nmi_ipi_pending_mask, cpu_online_mask); cpumask_clear_cpu(me, &nmi_ipi_pending_mask); } else { - /* cpumask starts clear */ cpumask_set_cpu(cpu, &nmi_ipi_pending_mask); } - nmi_ipi_busy_count++; + nmi_ipi_unlock(); + /* Interrupts remain hard disabled */ + do_smp_send_nmi_ipi(cpu, safe); nmi_ipi_lock(); - /* nmi_ipi_busy_count is held here, so unlock/lock is okay */ + /* nmi_ipi_busy is set here, so unlock/lock is okay */ while (!cpumask_empty(&nmi_ipi_pending_mask)) { nmi_ipi_unlock(); udelay(1); @@ -519,34 +508,19 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool if (delay_us) { delay_us--; if (!delay_us) - goto timeout; + break; } } - while (nmi_ipi_busy_count > 1) { - nmi_ipi_unlock(); - udelay(1); - nmi_ipi_lock(); - if (delay_us) { - delay_us--; - if (!delay_us) - goto timeout; - } - } - -timeout: if (!cpumask_empty(&nmi_ipi_pending_mask)) { /* Timeout waiting for CPUs to call smp_handle_nmi_ipi */ ret = 0; cpumask_clear(&nmi_ipi_pending_mask); } - if (nmi_ipi_busy_count > 1) { - /* Timeout waiting for CPUs to execute fn */ - ret = 0; - nmi_ipi_busy_count = 1; - } - nmi_ipi_busy_count--; + nmi_ipi_function = NULL; + nmi_ipi_busy = false; + nmi_ipi_unlock_end(&flags); return ret; @@ -614,17 +588,8 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) static void nmi_stop_this_cpu(struct pt_regs *regs) { /* - * This is a special case because it never returns, so the NMI IPI - * handling would never mark it as done, which makes any later - * smp_send_nmi_ipi() call spin forever. Mark it done now. - * * IRQs are already hard disabled by the smp_handle_nmi_ipi. */ - nmi_ipi_lock(); - if (nmi_ipi_busy_count > 1) - nmi_ipi_busy_count--; - nmi_ipi_unlock(); - spin_begin(); while (1) spin_cpu_relax();

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] powerpc/smp: Fix NMI IPI timeout" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 1b5fc84aba170bdfe3533396ca9662ceea1609b7 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin <npiggin(a)gmail.com> Date: Mon, 26 Nov 2018 12:01:05 +1000 Subject: [PATCH] powerpc/smp: Fix NMI IPI timeout The NMI IPI timeout logic is broken, if __smp_send_nmi_ipi() times out on the first condition, delay_us will be zero which will send it into the second spin loop with no timeout so it will spin forever. Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous") Cc: stable(a)vger.kernel.org # v4.19+ Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com> Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 3f15edf25a0d..137196a4248b 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -519,7 +519,7 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool if (delay_us) { delay_us--; if (!delay_us) - break; + goto timeout; } } @@ -530,10 +530,11 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool if (delay_us) { delay_us--; if (!delay_us) - break; + goto timeout; } } +timeout: if (!cpumask_empty(&nmi_ipi_pending_mask)) { /* Timeout waiting for CPUs to call smp_handle_nmi_ipi */ ret = 0;

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] powerpc/powernv: Don't reprogram SLW image on every KVM guest" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 19f8a5b5be2898573a5e1dc1db93e8d40117606a Mon Sep 17 00:00:00 2001 From: Paul Mackerras <paulus(a)ozlabs.org> Date: Tue, 12 Feb 2019 11:58:29 +1100 Subject: [PATCH] powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg() inside pnv_cpu_offline(), with the aim of changing the LPCR value in the SLW image to disable wakeups from the decrementer while a CPU is offline. However, pnv_cpu_offline() gets called each time a secondary CPU thread is woken up to participate in running a KVM guest, that is, not just when a CPU is offlined. Since opal_slw_set_reg() is a very slow operation (with observed execution times around 20 milliseconds), this means that an offline secondary CPU can often be busy doing the opal_slw_set_reg() call when the primary CPU wants to grab all the secondary threads so that it can run a KVM guest. This leads to messages like "KVM: couldn't grab CPU n" being printed and guest execution failing. There is no need to reprogram the SLW image on every KVM guest entry and exit. So that we do it only when a CPU is really transitioning between online and offline, this moves the calls to pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self(). Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug") Cc: stable(a)vger.kernel.org # v4.14+ Signed-off-by: Paul Mackerras <paulus(a)ozlabs.org> Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au> diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h index 362ea12a4501..05b552418519 100644 --- a/arch/powerpc/include/asm/powernv.h +++ b/arch/powerpc/include/asm/powernv.h @@ -23,6 +23,8 @@ extern int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea, unsigned long *flags, unsigned long *status, int count); +void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val); + void pnv_tm_init(void); #else static inline void powernv_set_nmmu_ptcr(unsigned long ptcr) { } diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 35f699ebb662..e52f9b06dd9c 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -458,7 +458,8 @@ EXPORT_SYMBOL_GPL(pnv_power9_force_smt4_release); #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #ifdef CONFIG_HOTPLUG_CPU -static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val) + +void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val) { u64 pir = get_hard_smp_processor_id(cpu); @@ -481,20 +482,6 @@ unsigned long pnv_cpu_offline(unsigned int cpu) { unsigned long srr1; u32 idle_states = pnv_get_supported_cpuidle_states(); - u64 lpcr_val; - - /* - * We don't want to take decrementer interrupts while we are - * offline, so clear LPCR:PECE1. We keep PECE2 (and - * LPCR_PECE_HVEE on P9) enabled as to let IPIs in. - * - * If the CPU gets woken up by a special wakeup, ensure that - * the SLW engine sets LPCR with decrementer bit cleared, else - * the CPU will come back to the kernel due to a spurious - * wakeup. - */ - lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1; - pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val); __ppc64_runlatch_off(); @@ -526,16 +513,6 @@ unsigned long pnv_cpu_offline(unsigned int cpu) __ppc64_runlatch_on(); - /* - * Re-enable decrementer interrupts in LPCR. - * - * Further, we want stop states to be woken up by decrementer - * for non-hotplug cases. So program the LPCR via stop api as - * well. - */ - lpcr_val = mfspr(SPRN_LPCR) | (u64)LPCR_PECE1; - pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val); - return srr1; } #endif diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 0d354e19ef92..db09c7022635 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -39,6 +39,7 @@ #include <asm/cpuidle.h> #include <asm/kexec.h> #include <asm/reg.h> +#include <asm/powernv.h> #include "powernv.h" @@ -153,6 +154,7 @@ static void pnv_smp_cpu_kill_self(void) { unsigned int cpu; unsigned long srr1, wmask; + u64 lpcr_val; /* Standard hot unplug procedure */ /* @@ -174,6 +176,19 @@ static void pnv_smp_cpu_kill_self(void) if (cpu_has_feature(CPU_FTR_ARCH_207S)) wmask = SRR1_WAKEMASK_P8; + /* + * We don't want to take decrementer interrupts while we are + * offline, so clear LPCR:PECE1. We keep PECE2 (and + * LPCR_PECE_HVEE on P9) enabled so as to let IPIs in. + * + * If the CPU gets woken up by a special wakeup, ensure that + * the SLW engine sets LPCR with decrementer bit cleared, else + * the CPU will come back to the kernel due to a spurious + * wakeup. + */ + lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1; + pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val); + while (!generic_check_cpu_restart(cpu)) { /* * Clear IPI flag, since we don't handle IPIs while @@ -246,6 +261,16 @@ static void pnv_smp_cpu_kill_self(void) } + /* + * Re-enable decrementer interrupts in LPCR. + * + * Further, we want stop states to be woken up by decrementer + * for non-hotplug cases. So program the LPCR via stop api as + * well. + */ + lpcr_val = mfspr(SPRN_LPCR) | (u64)LPCR_PECE1; + pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val); + DBG("CPU%d coming online...\n", cpu); }

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] powerpc/kvm: Save and restore host AMR/IAMR/UAMOR" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From c3c7470c75566a077c8dc71dcf8f1948b8ddfab4 Mon Sep 17 00:00:00 2001 From: Michael Ellerman <mpe(a)ellerman.id.au> Date: Fri, 22 Feb 2019 13:22:08 +1100 Subject: [PATCH] powerpc/kvm: Save and restore host AMR/IAMR/UAMOR When the hash MMU is active the AMR, IAMR and UAMOR are used for pkeys. The AMR is directly writable by user space, and the UAMOR masks those writes, meaning both registers are effectively user register state. The IAMR is used to create an execute only key. Also we must maintain the value of at least the AMR when running in process context, so that any memory accesses done by the kernel on behalf of the process are correctly controlled by the AMR. Although we are correctly switching all registers when going into a guest, on returning to the host we just write 0 into all regs, except on Power9 where we restore the IAMR correctly. This could be observed by a user process if it writes the AMR, then runs a guest and we then return immediately to it without rescheduling. Because we have written 0 to the AMR that would have the effect of granting read/write permission to pages that the process was trying to protect. In addition, when using the Radix MMU, the AMR can prevent inadvertent kernel access to userspace data, writing 0 to the AMR disables that protection. So save and restore AMR, IAMR and UAMOR. Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable(a)vger.kernel.org # v4.16+ Signed-off-by: Russell Currey <ruscur(a)russell.cc> Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au> Acked-by: Paul Mackerras <paulus(a)ozlabs.org> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index f24f6a2f8eb5..25043b50cb30 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -58,6 +58,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) #define STACK_SLOT_DAWR (SFS-56) #define STACK_SLOT_DAWRX (SFS-64) #define STACK_SLOT_HFSCR (SFS-72) +#define STACK_SLOT_AMR (SFS-80) +#define STACK_SLOT_UAMOR (SFS-88) /* the following is used by the P9 short path */ #define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */ @@ -726,11 +728,9 @@ BEGIN_FTR_SECTION mfspr r5, SPRN_TIDR mfspr r6, SPRN_PSSCR mfspr r7, SPRN_PID - mfspr r8, SPRN_IAMR std r5, STACK_SLOT_TID(r1) std r6, STACK_SLOT_PSSCR(r1) std r7, STACK_SLOT_PID(r1) - std r8, STACK_SLOT_IAMR(r1) mfspr r5, SPRN_HFSCR std r5, STACK_SLOT_HFSCR(r1) END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) @@ -738,11 +738,18 @@ BEGIN_FTR_SECTION mfspr r5, SPRN_CIABR mfspr r6, SPRN_DAWR mfspr r7, SPRN_DAWRX + mfspr r8, SPRN_IAMR std r5, STACK_SLOT_CIABR(r1) std r6, STACK_SLOT_DAWR(r1) std r7, STACK_SLOT_DAWRX(r1) + std r8, STACK_SLOT_IAMR(r1) END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) + mfspr r5, SPRN_AMR + std r5, STACK_SLOT_AMR(r1) + mfspr r6, SPRN_UAMOR + std r6, STACK_SLOT_UAMOR(r1) + BEGIN_FTR_SECTION /* Set partition DABR */ /* Do this before re-enabling PMU to avoid P7 DABR corruption bug */ @@ -1631,22 +1638,25 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) mtspr SPRN_PSPB, r0 mtspr SPRN_WORT, r0 BEGIN_FTR_SECTION - mtspr SPRN_IAMR, r0 mtspr SPRN_TCSCR, r0 /* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */ li r0, 1 sldi r0, r0, 31 mtspr SPRN_MMCRS, r0 END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) -8: - /* Save and reset AMR and UAMOR before turning on the MMU */ + /* Save and restore AMR, IAMR and UAMOR before turning on the MMU */ + ld r8, STACK_SLOT_IAMR(r1) + mtspr SPRN_IAMR, r8 + +8: /* Power7 jumps back in here */ mfspr r5,SPRN_AMR mfspr r6,SPRN_UAMOR std r5,VCPU_AMR(r9) std r6,VCPU_UAMOR(r9) - li r6,0 - mtspr SPRN_AMR,r6 + ld r5,STACK_SLOT_AMR(r1) + ld r6,STACK_SLOT_UAMOR(r1) + mtspr SPRN_AMR, r5 mtspr SPRN_UAMOR, r6 /* Switch DSCR back to host value */ @@ -1746,11 +1756,9 @@ BEGIN_FTR_SECTION ld r5, STACK_SLOT_TID(r1) ld r6, STACK_SLOT_PSSCR(r1) ld r7, STACK_SLOT_PID(r1) - ld r8, STACK_SLOT_IAMR(r1) mtspr SPRN_TIDR, r5 mtspr SPRN_PSSCR, r6 mtspr SPRN_PID, r7 - mtspr SPRN_IAMR, r8 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) #ifdef CONFIG_PPC_RADIX_MMU

6 years, 9 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror March 2019