Hi folks,
A few Fedora users have reported[0] a regression starting in v4.16.8
where the boot will hang ~1/3 of the time with the following RCU stall
warning:
INFO: rcu_sched detected stalls on CPUs/tasks:
o1-...!: (0 ticks this GP) idle=688/0/0 softirq=171/171 fqs=0
o(detected by 0, t=60002 jiffies, g=-142, c=-143, q=9)
Sending NMI from CPU 0 to CPU 1:
NMI backtrace for cpu 1 skipped: idling at
acpi_processor_ffh_cstate_enter+0x65/0xb0
rcu_sched kthread starved for 60002 jiffies! g18446744073709551474
c1844674407370955143 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 -> cpu=1
RCU grace-period kthread stack dump:
rcu_sched I 0 9 2 0x80000000
Call Trace:
? __schedule+0x234/0x850
schedule+0x28/0x80
schedule_timeout+0x166/0x380
? __next_timer_interrupt+0xc0/0xc0
rcu_gp_kthread+0x368/0x830
? rcu_process_callbacks+0x4f0/0x4f0
kthread+0x112/0x130
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
A user has bisected the problem to the v4.16 commit 1ab4ca7c59d4
("x86/tsc: Fix mark_tsc_unstable()"). According to the reporter,
explicitly setting "tsc=" on the kernel command line causes the boot to
always succeed. All the users have Thinkpad T500s or T400s (Core 2 Duos)
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1579925
Thanks,
Jeremy
This is the start of the stable review cycle for the 4.17.1 release.
There are 15 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon Jun 11 14:59:48 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.17.1-rc1
Dexuan Cui <decui(a)microsoft.com>
PCI: hv: Do not wait forever on a device that has disappeared
Sabrina Dubroca <sd(a)queasysnail.net>
ipmr: fix error path when ipmr_new_table fails
Arun Parameswaran <arun.parameswaran(a)broadcom.com>
net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
Stephen Suryaputra <ssuryaextr(a)gmail.com>
vrf: check the original netdevice for generating redirect
Dan Carpenter <dan.carpenter(a)oracle.com>
team: use netdev_features_t instead of u32
Xin Long <lucien.xin(a)gmail.com>
sctp: not allow transport timeout value less than HZ/5 for hb_timer
Eric Dumazet <edumazet(a)google.com>
rtnetlink: validate attributes in do_setlink()
Eric Dumazet <edumazet(a)google.com>
net/packet: refine check for priv area size
Eric Dumazet <edumazet(a)google.com>
net: metrics: add proper netlink validation
Cong Wang <xiyou.wangcong(a)gmail.com>
netdev-FAQ: clarify DaveM's position for stable backports
Guillaume Nault <g.nault(a)alphalink.fr>
l2tp: fix refcount leakage on PPPoL2TP sockets
Michal Kubecek <mkubecek(a)suse.cz>
ipv6: omit traffic class when calculating flow hash
Sabrina Dubroca <sd(a)queasysnail.net>
ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
Julia Lawall <Julia.Lawall(a)lip6.fr>
bnx2x: use the right constant
Jason A. Donenfeld <Jason(a)zx2c4.com>
netfilter: nf_flow_table: attach dst to skbs
-------------
Diffstat:
Documentation/networking/netdev-FAQ.txt | 9 +++++
Makefile | 4 +--
drivers/net/dsa/b53/b53_common.c | 15 +++++++-
drivers/net/dsa/b53/b53_priv.h | 2 ++
drivers/net/dsa/b53/b53_srab.c | 4 +--
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 2 +-
drivers/net/team/team.c | 3 +-
drivers/pci/host/pci-hyperv.c | 46 +++++++++++++++++-------
include/linux/mroute_base.h | 10 ------
include/net/ipv6.h | 5 +++
net/core/flow_dissector.c | 2 +-
net/core/rtnetlink.c | 8 ++---
net/ipv4/fib_semantics.c | 4 +++
net/ipv4/ipmr_base.c | 8 +++--
net/ipv4/netfilter/nf_flow_table_ipv4.c | 5 +--
net/ipv6/ip6_output.c | 3 +-
net/ipv6/ip6mr.c | 21 +++++++----
net/ipv6/ndisc.c | 6 ++++
net/ipv6/netfilter/nf_flow_table_ipv6.c | 1 +
net/ipv6/route.c | 4 +--
net/l2tp/l2tp_ppp.c | 35 +++++++++---------
net/packet/af_packet.c | 2 +-
net/sctp/transport.c | 2 +-
23 files changed, 132 insertions(+), 69 deletions(-)
This is the start of the stable review cycle for the 4.14.49 release.
There are 41 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon Jun 11 15:29:07 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.49-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.49-rc1
Dexuan Cui <decui(a)microsoft.com>
PCI: hv: Do not wait forever on a device that has disappeared
Paul Blakey <paulb(a)mellanox.com>
cls_flower: Fix incorrect idr release when failing to modify rule
Eric Dumazet <edumazet(a)google.com>
rtnetlink: validate attributes in do_setlink()
Jason Wang <jasowang(a)redhat.com>
virtio-net: fix leaking page for gso packet during mergeable XDP
Eran Ben Elisha <eranbe(a)mellanox.com>
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
Jason Wang <jasowang(a)redhat.com>
virtio-net: correctly check num_buf during err path
Toshiaki Makita <makita.toshiaki(a)lab.ntt.co.jp>
tun: Fix NULL pointer dereference in XDP redirect
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
net/mlx4: Fix irq-unsafe spinlock usage
Jason Wang <jasowang(a)redhat.com>
virtio-net: correctly transmit XDP buff after linearizing
Alexander Duyck <alexander.h.duyck(a)intel.com>
net-sysfs: Fix memory leak in XPS configuration
Florian Fainelli <f.fainelli(a)gmail.com>
net: phy: broadcom: Fix auxiliary control register reads
Mathieu Xhonneux <m.xhonneux(a)gmail.com>
ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline
Stephen Suryaputra <ssuryaextr(a)gmail.com>
vrf: check the original netdevice for generating redirect
Jason Wang <jasowang(a)redhat.com>
vhost: synchronize IOTLB message with dev cleanup
Dan Carpenter <dan.carpenter(a)oracle.com>
team: use netdev_features_t instead of u32
Xin Long <lucien.xin(a)gmail.com>
sctp: not allow transport timeout value less than HZ/5 for hb_timer
Shahed Shaikh <shahed.shaikh(a)cavium.com>
qed: Fix mask for physical address in ILT entry
Willem de Bruijn <willemb(a)google.com>
packet: fix reserve calculation
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
Florian Fainelli <f.fainelli(a)gmail.com>
net: phy: broadcom: Fix bcm_write_exp()
Eric Dumazet <edumazet(a)google.com>
net/packet: refine check for priv area size
Eric Dumazet <edumazet(a)google.com>
net: metrics: add proper netlink validation
Roopa Prabhu <roopa(a)cumulusnetworks.com>
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
Cong Wang <xiyou.wangcong(a)gmail.com>
netdev-FAQ: clarify DaveM's position for stable backports
Kirill Tkhai <ktkhai(a)virtuozzo.com>
kcm: Fix use-after-free caused by clonned sockets
Wenwen Wang <wang6495(a)umn.edu>
isdn: eicon: fix a missing-check bug
Michal Kubecek <mkubecek(a)suse.cz>
ipv6: omit traffic class when calculating flow hash
Willem de Bruijn <willemb(a)google.com>
ipv4: remove warning in ip_recv_error
Eric Dumazet <edumazet(a)google.com>
ipmr: properly check rhltable_init() return value
Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
ip6_tunnel: remove magic mtu value 0xFFF8
Sabrina Dubroca <sd(a)queasysnail.net>
ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
Govindarajulu Varadarajan <gvaradar(a)cisco.com>
enic: set DMA mask to 47 bit
Alexey Kodanev <alexey.kodanev(a)oracle.com>
dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()
Julia Lawall <Julia.Lawall(a)lip6.fr>
bnx2x: use the right constant
Suresh Reddy <suresh.reddy(a)broadcom.com>
be2net: Fix error detection logic for BE3
Nathan Chancellor <natechancellor(a)gmail.com>
kconfig: Avoid format overflow warning from GCC 8.1
Anand Jain <Anand.Jain(a)oracle.com>
btrfs: define SUPER_FLAG_METADUMP_V2
Linus Torvalds <torvalds(a)linux-foundation.org>
mmap: relax file size limit for regular files
Linus Torvalds <torvalds(a)linux-foundation.org>
mmap: introduce sane default mmap limits
Bart Van Assche <bart.vanassche(a)wdc.com>
scsi: sd_zbc: Avoid that resetting a zone fails sporadically
Damien Le Moal <damien.lemoal(a)wdc.com>
scsi: sd_zbc: Fix potential memory leak
-------------
Diffstat:
Documentation/networking/netdev-FAQ.txt | 9 ++
Makefile | 4 +-
drivers/isdn/hardware/eicon/diva.c | 22 ++--
drivers/isdn/hardware/eicon/diva.h | 5 +-
drivers/isdn/hardware/eicon/divasmain.c | 18 ++--
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 2 +-
drivers/net/ethernet/cisco/enic/enic_main.c | 8 +-
drivers/net/ethernet/emulex/benet/be_main.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/qp.c | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 42 ++++++++
drivers/net/ethernet/qlogic/qed/qed_cxt.c | 2 +-
drivers/net/phy/bcm-cygnus.c | 6 +-
drivers/net/phy/bcm-phy-lib.c | 2 +-
drivers/net/phy/bcm-phy-lib.h | 7 ++
drivers/net/phy/bcm7xxx.c | 4 +-
drivers/net/team/team.c | 3 +-
drivers/net/tun.c | 15 +--
drivers/net/usb/cdc_mbim.c | 2 +-
drivers/net/virtio_net.c | 19 ++--
drivers/pci/host/pci-hyperv.c | 46 +++++---
drivers/scsi/sd_zbc.c | 128 ++++++++++++++---------
drivers/vhost/vhost.c | 3 +
fs/btrfs/disk-io.c | 3 +-
include/net/ipv6.h | 5 +
include/uapi/linux/btrfs_tree.h | 1 +
mm/mmap.c | 32 ++++++
net/core/flow_dissector.c | 2 +-
net/core/net-sysfs.c | 6 +-
net/core/rtnetlink.c | 8 +-
net/dccp/proto.c | 2 -
net/ipv4/fib_frontend.c | 1 +
net/ipv4/fib_semantics.c | 4 +
net/ipv4/ip_sockglue.c | 2 -
net/ipv4/ipmr.c | 7 +-
net/ipv6/ip6_output.c | 3 +-
net/ipv6/ip6_tunnel.c | 11 +-
net/ipv6/ip6mr.c | 3 +-
net/ipv6/ndisc.c | 6 ++
net/ipv6/route.c | 2 +-
net/ipv6/seg6_iptunnel.c | 4 +-
net/ipv6/sit.c | 5 +-
net/kcm/kcmsock.c | 2 +-
net/packet/af_packet.c | 4 +-
net/sched/cls_flower.c | 2 +-
net/sctp/transport.c | 2 +-
scripts/kconfig/confdata.c | 2 +-
46 files changed, 329 insertions(+), 145 deletions(-)
This is the start of the stable review cycle for the 4.16.15 release.
There are 48 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon Jun 11 14:59:28 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.15-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.16.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.16.15-rc1
Dexuan Cui <decui(a)microsoft.com>
PCI: hv: Do not wait forever on a device that has disappeared
Jason Wang <jasowang(a)redhat.com>
vhost_net: flush batched heads before trying to busy polling
Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
net: netsec: reduce DMA mask to 40 bits
Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
ip_tunnel: restore binding to ifaces with a large mtu
Jason Wang <jasowang(a)redhat.com>
virtio-net: correctly redirect linearized packet
Or Gerlitz <ogerlitz(a)mellanox.com>
net : sched: cls_api: deal with egdev path only if needed
Arun Parameswaran <arun.parameswaran(a)broadcom.com>
net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
Jason Wang <jasowang(a)redhat.com>
virtio-net: correctly check num_buf during err path
Toshiaki Makita <makita.toshiaki(a)lab.ntt.co.jp>
tun: Fix NULL pointer dereference in XDP redirect
Eran Ben Elisha <eranbe(a)mellanox.com>
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
net/mlx4: Fix irq-unsafe spinlock usage
Jason Wang <jasowang(a)redhat.com>
virtio-net: fix leaking page for gso packet during mergeable XDP
Jason Wang <jasowang(a)redhat.com>
virtio-net: correctly transmit XDP buff after linearizing
Alexander Duyck <alexander.h.duyck(a)intel.com>
net-sysfs: Fix memory leak in XPS configuration
Florian Fainelli <f.fainelli(a)gmail.com>
net: phy: broadcom: Fix auxiliary control register reads
Mathieu Xhonneux <m.xhonneux(a)gmail.com>
ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline
Stephen Suryaputra <ssuryaextr(a)gmail.com>
vrf: check the original netdevice for generating redirect
Jason Wang <jasowang(a)redhat.com>
vhost: synchronize IOTLB message with dev cleanup
Dan Carpenter <dan.carpenter(a)oracle.com>
team: use netdev_features_t instead of u32
Xin Long <lucien.xin(a)gmail.com>
sctp: not allow transport timeout value less than HZ/5 for hb_timer
Eric Dumazet <edumazet(a)google.com>
rtnetlink: validate attributes in do_setlink()
Shahed Shaikh <shahed.shaikh(a)cavium.com>
qed: Fix mask for physical address in ILT entry
Willem de Bruijn <willemb(a)google.com>
packet: fix reserve calculation
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
Florian Fainelli <f.fainelli(a)gmail.com>
net: phy: broadcom: Fix bcm_write_exp()
Eric Dumazet <edumazet(a)google.com>
net/packet: refine check for priv area size
Eric Dumazet <edumazet(a)google.com>
net: metrics: add proper netlink validation
Roopa Prabhu <roopa(a)cumulusnetworks.com>
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
Dan Carpenter <dan.carpenter(a)oracle.com>
net: ethernet: davinci_emac: fix error handling in probe()
Cong Wang <xiyou.wangcong(a)gmail.com>
netdev-FAQ: clarify DaveM's position for stable backports
Petr Machata <petrm(a)mellanox.com>
mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
Guillaume Nault <g.nault(a)alphalink.fr>
l2tp: fix refcount leakage on PPPoL2TP sockets
Kirill Tkhai <ktkhai(a)virtuozzo.com>
kcm: Fix use-after-free caused by clonned sockets
Wenwen Wang <wang6495(a)umn.edu>
isdn: eicon: fix a missing-check bug
Michal Kubecek <mkubecek(a)suse.cz>
ipv6: omit traffic class when calculating flow hash
Willem de Bruijn <willemb(a)google.com>
ipv4: remove warning in ip_recv_error
Eric Dumazet <edumazet(a)google.com>
ipmr: properly check rhltable_init() return value
Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
ip6_tunnel: remove magic mtu value 0xFFF8
Sabrina Dubroca <sd(a)queasysnail.net>
ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
Govindarajulu Varadarajan <gvaradar(a)cisco.com>
enic: set DMA mask to 47 bit
Alexey Kodanev <alexey.kodanev(a)oracle.com>
dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()
Paul Blakey <paulb(a)mellanox.com>
cls_flower: Fix incorrect idr release when failing to modify rule
Julia Lawall <Julia.Lawall(a)lip6.fr>
bnx2x: use the right constant
Suresh Reddy <suresh.reddy(a)broadcom.com>
be2net: Fix error detection logic for BE3
Nathan Chancellor <natechancellor(a)gmail.com>
kconfig: Avoid format overflow warning from GCC 8.1
Jason A. Donenfeld <Jason(a)zx2c4.com>
netfilter: nf_flow_table: attach dst to skbs
Linus Torvalds <torvalds(a)linux-foundation.org>
mmap: relax file size limit for regular files
Linus Torvalds <torvalds(a)linux-foundation.org>
mmap: introduce sane default mmap limits
-------------
Diffstat:
Documentation/networking/netdev-FAQ.txt | 9 +++++
Makefile | 4 +--
drivers/isdn/hardware/eicon/diva.c | 22 ++++++++----
drivers/isdn/hardware/eicon/diva.h | 5 +--
drivers/isdn/hardware/eicon/divasmain.c | 18 ++++++----
drivers/net/dsa/b53/b53_common.c | 15 +++++++-
drivers/net/dsa/b53/b53_priv.h | 2 ++
drivers/net/dsa/b53/b53_srab.c | 4 +--
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 2 +-
drivers/net/ethernet/cisco/enic/enic_main.c | 8 ++---
drivers/net/ethernet/emulex/benet/be_main.c | 4 ++-
drivers/net/ethernet/mellanox/mlx4/qp.c | 4 +--
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 42 ++++++++++++++++++++++
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 5 +++
drivers/net/ethernet/qlogic/qed/qed_cxt.c | 2 +-
drivers/net/ethernet/socionext/netsec.c | 4 +--
drivers/net/ethernet/ti/davinci_emac.c | 22 ++++++------
drivers/net/phy/bcm-cygnus.c | 6 ++--
drivers/net/phy/bcm-phy-lib.c | 2 +-
drivers/net/phy/bcm-phy-lib.h | 7 ++++
drivers/net/phy/bcm7xxx.c | 4 +--
drivers/net/team/team.c | 3 +-
drivers/net/tun.c | 15 ++++----
drivers/net/usb/cdc_mbim.c | 2 +-
drivers/net/virtio_net.c | 21 ++++++-----
drivers/pci/host/pci-hyperv.c | 46 +++++++++++++++++-------
drivers/vhost/net.c | 37 ++++++++++++-------
drivers/vhost/vhost.c | 3 ++
include/net/ipv6.h | 5 +++
mm/mmap.c | 32 +++++++++++++++++
net/core/flow_dissector.c | 2 +-
net/core/net-sysfs.c | 6 ++--
net/core/rtnetlink.c | 8 ++---
net/dccp/proto.c | 2 --
net/ipv4/fib_frontend.c | 1 +
net/ipv4/fib_semantics.c | 4 +++
net/ipv4/ip_sockglue.c | 2 --
net/ipv4/ip_tunnel.c | 8 ++---
net/ipv4/ipmr.c | 7 +++-
net/ipv4/netfilter/nf_flow_table_ipv4.c | 5 +--
net/ipv6/ip6_output.c | 3 +-
net/ipv6/ip6_tunnel.c | 11 ++++--
net/ipv6/ip6mr.c | 3 +-
net/ipv6/ndisc.c | 6 ++++
net/ipv6/netfilter/nf_flow_table_ipv6.c | 1 +
net/ipv6/route.c | 2 +-
net/ipv6/seg6_iptunnel.c | 4 +--
net/ipv6/sit.c | 5 +--
net/kcm/kcmsock.c | 2 +-
net/l2tp/l2tp_ppp.c | 35 +++++++++---------
net/packet/af_packet.c | 4 +--
net/sched/cls_api.c | 2 +-
net/sched/cls_flower.c | 2 +-
net/sctp/transport.c | 2 +-
scripts/kconfig/confdata.c | 2 +-
55 files changed, 338 insertions(+), 146 deletions(-)
From: Alexander Duyck <alexander.h.duyck(a)intel.com>
This patch addresses two issues. First it adds the correct bit definitions
for the SECTXSTAT and SECRXSTAT registers. Then it makes use of those
definitions to test for if IPsec has been disabled on the part and if so we
do not enable it.
CC: stable <stable(a)vger.kernel.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)intel.com>
Reported-by: Andre Tomt <andre(a)tomt.net>
Acked-by: Shannon Nelson <shannon.nelson(a)oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 14 +++++++++++++-
drivers/net/ethernet/intel/ixgbe/ixgbe_type.h | 6 ++++--
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 7b23fb0c2d07..c116f459945d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -975,10 +975,22 @@ void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
**/
void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter)
{
+ struct ixgbe_hw *hw = &adapter->hw;
struct ixgbe_ipsec *ipsec;
+ u32 t_dis, r_dis;
size_t size;
- if (adapter->hw.mac.type == ixgbe_mac_82598EB)
+ if (hw->mac.type == ixgbe_mac_82598EB)
+ return;
+
+ /* If there is no support for either Tx or Rx offload
+ * we should not be advertising support for IPsec.
+ */
+ t_dis = IXGBE_READ_REG(hw, IXGBE_SECTXSTAT) &
+ IXGBE_SECTXSTAT_SECTX_OFF_DIS;
+ r_dis = IXGBE_READ_REG(hw, IXGBE_SECRXSTAT) &
+ IXGBE_SECRXSTAT_SECRX_OFF_DIS;
+ if (t_dis || r_dis)
return;
ipsec = kzalloc(sizeof(*ipsec), GFP_KERNEL);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index e8ed37749ab1..44cfb2021145 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -599,13 +599,15 @@ struct ixgbe_nvm_version {
#define IXGBE_SECTXCTRL_STORE_FORWARD 0x00000004
#define IXGBE_SECTXSTAT_SECTX_RDY 0x00000001
-#define IXGBE_SECTXSTAT_ECC_TXERR 0x00000002
+#define IXGBE_SECTXSTAT_SECTX_OFF_DIS 0x00000002
+#define IXGBE_SECTXSTAT_ECC_TXERR 0x00000004
#define IXGBE_SECRXCTRL_SECRX_DIS 0x00000001
#define IXGBE_SECRXCTRL_RX_DIS 0x00000002
#define IXGBE_SECRXSTAT_SECRX_RDY 0x00000001
-#define IXGBE_SECRXSTAT_ECC_RXERR 0x00000002
+#define IXGBE_SECRXSTAT_SECRX_OFF_DIS 0x00000002
+#define IXGBE_SECRXSTAT_ECC_RXERR 0x00000004
/* LinkSec (MacSec) Registers */
#define IXGBE_LSECTXCAP 0x08A00
--
2.17.1
From: Alexander Duyck <alexander.h.duyck(a)intel.com>
This patch fixes two issues. First we add an early test for the Tx and Rx
security block ready bits. By doing this we can avoid the need for waits or
loopback in the event that the security block is already flushed out.
Secondly we fix the boolean logic that was testing for the Tx OR Rx ready
bits being set and change it so that we only exit if the Tx AND Rx ready
bits are both set.
CC: stable <stable(a)vger.kernel.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)intel.com>
Acked-by: Shannon Nelson <shannon.nelson(a)oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 38d8cf75e9ad..7b23fb0c2d07 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -158,7 +158,16 @@ static void ixgbe_ipsec_stop_data(struct ixgbe_adapter *adapter)
reg |= IXGBE_SECRXCTRL_RX_DIS;
IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, reg);
- IXGBE_WRITE_FLUSH(hw);
+ /* If both Tx and Rx are ready there are no packets
+ * that we need to flush so the loopback configuration
+ * below is not necessary.
+ */
+ t_rdy = IXGBE_READ_REG(hw, IXGBE_SECTXSTAT) &
+ IXGBE_SECTXSTAT_SECTX_RDY;
+ r_rdy = IXGBE_READ_REG(hw, IXGBE_SECRXSTAT) &
+ IXGBE_SECRXSTAT_SECRX_RDY;
+ if (t_rdy && r_rdy)
+ return;
/* If the tx fifo doesn't have link, but still has data,
* we can't clear the tx sec block. Set the MAC loopback
@@ -185,7 +194,7 @@ static void ixgbe_ipsec_stop_data(struct ixgbe_adapter *adapter)
IXGBE_SECTXSTAT_SECTX_RDY;
r_rdy = IXGBE_READ_REG(hw, IXGBE_SECRXSTAT) &
IXGBE_SECRXSTAT_SECRX_RDY;
- } while (!t_rdy && !r_rdy && limit--);
+ } while (!(t_rdy && r_rdy) && limit--);
/* undo loopback if we played with it earlier */
if (!link) {
--
2.17.1
From: Alexander Duyck <alexander.h.duyck(a)intel.com>
This patch moves the IPsec init function in ixgbe_sw_init. This way it is a
bit more consistent with the placement of similar initialization functions
and is placed before the reset_hw call which should allow us to clean up
any link issues that may be introduced by the fact that we force the link
up if somehow the device had IPsec still enabled before the driver was
loaded.
In addition to the function move it is necessary to change the assignment
of netdev->features. The easiest way to do this is to just test for the
existence of adapter->ipsec and if it is present we set the feature bits.
CC: stable <stable(a)vger.kernel.org>
Fixes: 49a94d74d948 ("ixgbe: add ipsec engine start and stop routines")
Reported-by: Andre Tomt <andre(a)tomt.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)intel.com>
Acked-by: Shannon Nelson <shannon.nelson(a)oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 7 -------
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 11 +++++++++--
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 344a1f213a5f..38d8cf75e9ad 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -1001,13 +1001,6 @@ void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter)
adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
-#define IXGBE_ESP_FEATURES (NETIF_F_HW_ESP | \
- NETIF_F_HW_ESP_TX_CSUM | \
- NETIF_F_GSO_ESP)
-
- adapter->netdev->features |= IXGBE_ESP_FEATURES;
- adapter->netdev->hw_enc_features |= IXGBE_ESP_FEATURES;
-
return;
err2:
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a925f05ec342..8d061af276d3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6117,6 +6117,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
#ifdef CONFIG_IXGBE_DCB
ixgbe_init_dcb(adapter);
#endif
+ ixgbe_init_ipsec_offload(adapter);
/* default flow control settings */
hw->fc.requested_mode = ixgbe_fc_full;
@@ -10429,6 +10430,14 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (hw->mac.type >= ixgbe_mac_82599EB)
netdev->features |= NETIF_F_SCTP_CRC;
+#ifdef CONFIG_XFRM_OFFLOAD
+#define IXGBE_ESP_FEATURES (NETIF_F_HW_ESP | \
+ NETIF_F_HW_ESP_TX_CSUM | \
+ NETIF_F_GSO_ESP)
+
+ if (adapter->ipsec)
+ netdev->features |= IXGBE_ESP_FEATURES;
+#endif
/* copy netdev features into list of user selectable features */
netdev->hw_features |= netdev->features |
NETIF_F_HW_VLAN_CTAG_FILTER |
@@ -10491,8 +10500,6 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_FCOE_MTU;
}
#endif /* IXGBE_FCOE */
- ixgbe_init_ipsec_offload(adapter);
-
if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE)
netdev->hw_features |= NETIF_F_LRO;
if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
--
2.17.1
From: Alexander Duyck <alexander.h.duyck(a)intel.com>
There is no point in adding code if CONFIG_XFRM is defined that we won't
use unless CONFIG_XFRM_OFFLOAD is defined. So instead of leaving this code
floating around I am replacing the ifdef with what I believe is the correct
one so that we only include the code and variables if they will actually be
used.
CC: stable <stable(a)vger.kernel.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)intel.com>
Acked-by: Shannon Nelson <shannon.nelson(a)oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 4 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index fc534e91c6b2..144d5fe6b944 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -760,9 +760,9 @@ struct ixgbe_adapter {
#define IXGBE_RSS_KEY_SIZE 40 /* size of RSS Hash Key in bytes */
u32 *rss_key;
-#ifdef CONFIG_XFRM
+#ifdef CONFIG_XFRM_OFFLOAD
struct ixgbe_ipsec *ipsec;
-#endif /* CONFIG_XFRM */
+#endif /* CONFIG_XFRM_OFFLOAD */
};
static inline u8 ixgbe_max_rss_indices(struct ixgbe_adapter *adapter)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f9e0dc041cfb..a925f05ec342 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9896,7 +9896,7 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev,
* the TSO, so it's the exception.
*/
if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID)) {
-#ifdef CONFIG_XFRM
+#ifdef CONFIG_XFRM_OFFLOAD
if (!skb->sp)
#endif
features &= ~NETIF_F_TSO;
--
2.17.1
From: Alexander Duyck <alexander.h.duyck(a)intel.com>
When we were enabling macvlan interfaces we weren't correctly configuring
things until ixgbe_setup_tc was called a second time either by tweaking the
number of queues or increasing the macvlan count past 15.
The issue came down to the fact that num_rx_pools is not populated until
after the queues and interrupts are reinitialized.
Instead of trying to set it sooner we can just move the call to setup at
least 1 traffic class to the SR-IOV/VMDq setup function so that we just set
it for this one case. We already had a spot that was configuring the queues
for TC 0 in the code here anyway so it makes sense to also set the number
of TCs here as well.
CC: stable <stable(a)vger.kernel.org>
Fixes: 49cfbeb7a95c ("ixgbe: Fix handling of macvlan Tx offload")
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)intel.com>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 8 ++++++++
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 --------
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 893a9206e718..d361f570ca37 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -593,6 +593,14 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
}
#endif
+ /* To support macvlan offload we have to use num_tc to
+ * restrict the queues that can be used by the device.
+ * By doing this we can avoid reporting a false number of
+ * queues.
+ */
+ if (vmdq_i > 1)
+ netdev_set_num_tc(adapter->netdev, 1);
+
/* populate TC0 for use by pool 0 */
netdev_set_tc_queue(adapter->netdev, 0,
adapter->num_rx_queues_per_pool, 0);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4929f7265598..f9e0dc041cfb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8822,14 +8822,6 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
} else {
netdev_reset_tc(dev);
- /* To support macvlan offload we have to use num_tc to
- * restrict the queues that can be used by the device.
- * By doing this we can avoid reporting a false number of
- * queues.
- */
- if (!tc && adapter->num_rx_pools > 1)
- netdev_set_num_tc(dev, 1);
-
if (adapter->hw.mac.type == ixgbe_mac_82598EB)
adapter->hw.fc.requested_mode = adapter->last_lfc_mode;
--
2.17.1
On Mon, Jun 11, 2018 at 08:12:45AM +1000, David Airlie wrote:
> Can you make sure you pull in
>
> 76ef6b28ea4f81c3d511866a9b31392caa833126 (tag:
> drm-fixes-for-v4.17-rc6-urgent)
> Author: Dave Airlie <airlied(a)redhat.com>
> Date: Tue May 15 13:38:15 2018 +1000
>
> drm: set FMODE_UNSIGNED_OFFSET for drm files
>
> Into anywhere this first patch goes?
Thanks for pointing this out, now queued up.
greg k-h
Hussam reports:
I was poking around and for no real reason, I did cat /dev/mem and
strings /dev/mem. Then I saw the following warning in dmesg. I saved it
and rebooted immediately.
memremap attempted on mixed range 0x000000000009c000 size: 0x1000
------------[ cut here ]------------
WARNING: CPU: 0 PID: 11810 at kernel/memremap.c:98 memremap+0x104/0x170
[..]
Call Trace:
xlate_dev_mem_ptr+0x25/0x40
read_mem+0x89/0x1a0
__vfs_read+0x36/0x170
The memremap() implementation checks for attempts to remap System RAM
with MEMREMAP_WB and instead redirects those mapping attempts to the
linear map. However, that only works if the physical address range being
remapped is page aligned. In low memory we have situations like the
following:
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : Reserved
...where System RAM intersects Reserved ranges on a sub-page page
granularity.
Given that devmem_is_allowed() special cases any attempt to map System
RAM in the first 1MB of memory, replace page_is_ram() with the more
precise region_intersects() to trap attempts to map disallowed ranges.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199999
Fixes: 92281dee825f ("arch: introduce memremap()")
Cc: <stable(a)vger.kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Reported-by: Hussam Al-Tayeb <me(a)hussam.eu.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
arch/x86/mm/init.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index fec82b577c18..cee58a972cb2 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -706,7 +706,9 @@ void __init init_mem_mapping(void)
*/
int devmem_is_allowed(unsigned long pagenr)
{
- if (page_is_ram(pagenr)) {
+ if (region_intersects(PFN_PHYS(pagenr), PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
+ != REGION_DISJOINT) {
/*
* For disallowed memory regions in the low 1MB range,
* request that the page be shown as all zeros.
Hi,
The patch has been in the mainline, and I have verified the commit can be cherry-picked
cleanly to these 3 stable branches.
The issue fixed by the patch also exists in 4.14, 4.16 and 4.17.
It looks I forgot to add a "Cc: stable(a)vger.kernel.org" tag. Sorry.
Thanks,
-- Dexuan
On Tue, Sep 19, 2017 at 9:32 AM, Greg KH <greg(a)kroah.com> wrote:
> On Mon, Sep 18, 2017 at 10:29:25PM +0300, Amir Goldstein wrote:
>> On Mon, Sep 18, 2017 at 9:35 PM, Greg KH <greg(a)kroah.com> wrote:
>> > On Mon, Sep 18, 2017 at 09:00:30PM +0300, Amir Goldstein wrote:
>> >> On Mon, Sep 18, 2017 at 8:11 PM, Darrick J. Wong
>> >> <darrick.wong(a)oracle.com> wrote:
>> >> > On Fri, Sep 15, 2017 at 03:40:24PM +0300, Amir Goldstein wrote:
>> >> >> On Wed, Aug 30, 2017 at 4:38 PM, Amir Goldstein <amir73il(a)gmail.com> wrote:
>> >> >> > When calling into _xfs_log_force{,_lsn}() with a pointer
>> >> >> > to log_flushed variable, log_flushed will be set to 1 if:
>> >> >> > 1. xlog_sync() is called to flush the active log buffer
>> >> >> > AND/OR
>> >> >> > 2. xlog_wait() is called to wait on a syncing log buffers
>> >> >> >
>> >> >> > xfs_file_fsync() checks the value of log_flushed after
>> >> >> > _xfs_log_force_lsn() call to optimize away an explicit
>> >> >> > PREFLUSH request to the data block device after writing
>> >> >> > out all the file's pages to disk.
>> >> >> >
>> >> >> > This optimization is incorrect in the following sequence of events:
>> >> >> >
>> >> >> > Task A Task B
>> >> >> > -------------------------------------------------------
>> >> >> > xfs_file_fsync()
>> >> >> > _xfs_log_force_lsn()
>> >> >> > xlog_sync()
>> >> >> > [submit PREFLUSH]
>> >> >> > xfs_file_fsync()
>> >> >> > file_write_and_wait_range()
>> >> >> > [submit WRITE X]
>> >> >> > [endio WRITE X]
>> >> >> > _xfs_log_force_lsn()
>> >> >> > xlog_wait()
>> >> >> > [endio PREFLUSH]
>> >> >> >
>> >> >> > The write X is not guarantied to be on persistent storage
>> >> >> > when PREFLUSH request in completed, because write A was submitted
>> >> >> > after the PREFLUSH request, but xfs_file_fsync() of task A will
>> >> >> > be notified of log_flushed=1 and will skip explicit flush.
>> >> >> >
>> >> >> > If the system crashes after fsync of task A, write X may not be
>> >> >> > present on disk after reboot.
>> >> >> >
>> >> >> > This bug was discovered and demonstrated using Josef Bacik's
>> >> >> > dm-log-writes target, which can be used to record block io operations
>> >> >> > and then replay a subset of these operations onto the target device.
>> >> >> > The test goes something like this:
>> >> >> > - Use fsx to execute ops of a file and record ops on log device
>> >> >> > - Every now and then fsync the file, store md5 of file and mark
>> >> >> > the location in the log
>> >> >> > - Then replay log onto device for each mark, mount fs and compare
>> >> >> > md5 of file to stored value
>> >> >> >
>> >> >> > Cc: Christoph Hellwig <hch(a)lst.de>
>> >> >> > Cc: Josef Bacik <jbacik(a)fb.com>
>> >> >> > Cc: <stable(a)vger.kernel.org>
>> >> >> > Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
>> >> >> > ---
>> >> >> >
>> >> >> > Christoph, Dave,
>> >> >> >
>> >> >> > It's hard to believe, but I think the reported bug has been around
>> >> >> > since 2005 f538d4da8d52 ("[XFS] write barrier support"), but I did
>> >> >> > not try to test old kernels.
>> >> >>
>> >> >> Forgot to tag commit message with:
>> >> >> Fixes: f538d4da8d52 ("[XFS] write barrier support")
>> >> >>
>> >> >> Maybe the tag could be added when applying to recent stables,
>> >> >> so distros and older downstream stables can see the tag.
>> >> >>
>> >> >> The disclosure of the security bug fix (commit b31ff3cdf5) made me wonder
>> >> >> if possible data loss bug should also be disclosed in some distros forum?
>> >> >> I bet some users would care more about the latter than the former.
>> >> >> Coincidentally, both data loss and security bugs fix the same commit..
>> >> >
>> >> > Yes the the patch ought to get sent on to stable w/ fixes tag. One
>> >> > would hope that the distros will pick up the stable fixes from there.
>> >>
>> >>
>> >> Greg, for your consideration, please add
>> >> Fixes: f538d4da8d52 ("[XFS] write barrier support")
>> >> If not pushed yet.
>> >
>> > Add it to what?
>>
>> Sorry, add that tag when applying commit 47c7d0b1950258312
>> to stable trees, since I missed adding the tag before it was merged
>> to master.
>
> Nah, as the tag is just needed to let me know where to backport stuff
> to, I don't think it matters when I add it to the stable tree itself, so
> I'll leave it as-is.
>
Greg,
Related or not to above Fixes discussion, I now noticed that you never picked
the patch for kernel 4.4.
Ben did take it to 3.2 and 3.16 BTW.
This is a very critical bug fix IMO.
Were you waiting for an ACK from xfs maintainers or just an oversight?
Or was it me who had to check up on that?
Thanks,
Amir.
Hi,
Linux 4.9.107 was released on Jun 7th and www.kernel.org still points to 4.9.106 as latest version
for 4.9 tree. I have seen delays before, but not for more than an hour.
Is something broken?
Cheers,
Pavlos
The patch titled
Subject: fs/binfmt_misc.c: do not allow offset overflow
has been removed from the -mm tree. Its filename was
fs-binfmt_miscc-do-not-allow-offset-overflow.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Subject: fs/binfmt_misc.c: do not allow offset overflow
WHen registering a new binfmt_misc handler, it is possible to overflow the
offset to get a negative value, which might crash the system, or possibly
leak kernel data.
Here is a crash log when 2500000000 was used as an offset:
[ 6050.251552] BUG: unable to handle kernel paging request at ffff989cfd6edca0
[ 6050.252053] IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
[ 6050.252053] PGD 1ef3e067 P4D 1ef3e067 PUD 0
[ 6050.252053] Oops: 0000 [#1] SMP NOPTI
[ 6050.252053] Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
[ 6050.252053] CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu
[ 6050.252053] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[ 6050.252053] RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
[ 6050.252053] RSP: 0018:ffffb6e383017e18 EFLAGS: 00010202
[ 6050.252053] RAX: 0000000000000003 RBX: ffff989d74a47100 RCX: ffff989cfd6edca0
[ 6050.252053] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff989d7d2e95e5
[ 6050.252053] RBP: ffffb6e383017e48 R08: 0000000000000001 R09: 0000000000000000
[ 6050.252053] R10: 0000000000000000 R11: fefefefefefefeff R12: 0000000000000001
[ 6050.252053] R13: ffff989d7d2e9580 R14: 0000000000000000 R15: ffffffffc0592160
[ 6050.252053] FS: 00007fa424c89740(0000) GS:ffff989d7fc00000(0000) knlGS:0000000000000000
[ 6050.252053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6050.252053] CR2: ffff989cfd6edca0 CR3: 000000003db08000 CR4: 00000000000006f0
[ 6050.252053] Call Trace:
[ 6050.252053] search_binary_handler+0x97/0x1d0
[ 6050.252053] do_execveat_common.isra.34+0x667/0x810
[ 6050.252053] SyS_execve+0x31/0x40
[ 6050.252053] do_syscall_64+0x73/0x130
[ 6050.252053] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Use kstrtoint instead of simple_strtoul. It will work as the code already
set the delimiter byte to '\0' and we only do it when the field is not
empty.
Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX. Also tested
with examples documented at Documentation/admin-guide/binfmt-misc.rst and
other registrations from packages on Ubuntu.
Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_misc.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff -puN fs/binfmt_misc.c~fs-binfmt_miscc-do-not-allow-offset-overflow fs/binfmt_misc.c
--- a/fs/binfmt_misc.c~fs-binfmt_miscc-do-not-allow-offset-overflow
+++ a/fs/binfmt_misc.c
@@ -387,8 +387,13 @@ static Node *create_entry(const char __u
s = strchr(p, del);
if (!s)
goto einval;
- *s++ = '\0';
- e->offset = simple_strtoul(p, &p, 10);
+ *s = '\0';
+ if (p != s) {
+ int r = kstrtoint(p, 10, &e->offset);
+ if (r != 0 || e->offset < 0)
+ goto einval;
+ }
+ p = s;
if (*p++)
goto einval;
pr_debug("register: offset: %#x\n", e->offset);
@@ -428,7 +433,8 @@ static Node *create_entry(const char __u
if (e->mask &&
string_unescape_inplace(e->mask, UNESCAPE_HEX) != e->size)
goto einval;
- if (e->size + e->offset > BINPRM_BUF_SIZE)
+ if (e->size > BINPRM_BUF_SIZE ||
+ BINPRM_BUF_SIZE - e->size < e->offset)
goto einval;
pr_debug("register: magic/mask length: %i\n", e->size);
if (USE_DEBUG) {
_
Patches currently in -mm which might be from cascardo(a)canonical.com are
The patch titled
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
has been removed from the -mm tree. Its filename was
mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE allocations
that want to allocate on a different node, e.g. because the allocating
thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page is
put on wrong node's list, then further list manipulations may corrupt the
list because page_to_nid() is used to determine which node's list_lock
should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There is
actually no reason to reset it, because memory policies and cpusets don't
affect the zonelist choice in the first place. This was different when
commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's anything
currently broken.
SLAB is currently not affected, but in kernels older than 4.7 that
don't yet have 511e3a058812 ("mm/slab: make cache_grow() handle the
page allocated on arbitrary node") it is. That's at least 4.4 LTS.
Older ones I'll have to check.
So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes. BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 1 -
1 file changed, 1 deletion(-)
diff -puN mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset
+++ a/mm/page_alloc.c
@@ -4169,7 +4169,6 @@ retry:
* orientated.
*/
if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
ac->high_zoneidx, ac->nodemask);
}
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
Commit 17c2895 ("arm64: Abstract syscallno manipulation") abstracts
out the pt_regs.syscallno value for a syscall cancelled by a tracer
as NO_SYSCALL, and provides helpers to set and check for this
condition. However, the way this was implemented has the
unintended side-effect of disabling part of the syscall restart
logic.
This comes about because the second in_syscall() check in
do_signal() re-evaluates the "in a syscall" condition based on the
updated pt_regs instead of the original pt_regs. forget_syscall()
is explicitly called prior to the second check in order to prevent
restart logic in the ret_to_user path being spuriously triggered,
which means that the second in_syscall() check always yields false.
This triggers a failure in
tools/testing/selftests/seccomp/seccomp_bpf.c, when using ptrace to
suppress a signal that interrups a nanosleep() syscall.
Misbehaviour of this type is only expected in the case where a
tracer suppresses a signal and the target process is either being
single-stepped or the interrupted syscall attempts to restart via
-ERESTARTBLOCK.
This patch restores the old behaviour by performing the
in_syscall() check only once at the start of the function.
Fixes: 17c289586009 ("arm64: Abstract syscallno manipulation")
Signed-off-by: Dave Martin <Dave.Martin(a)arm.com>
Reported-by: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org> # 4.14.x-
---
arch/arm64/kernel/signal.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 154b7d3..f212090 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -830,11 +830,12 @@ static void do_signal(struct pt_regs *regs)
unsigned long continue_addr = 0, restart_addr = 0;
int retval = 0;
struct ksignal ksig;
+ bool syscall = in_syscall(regs);
/*
* If we were from a system call, check for system call restarting...
*/
- if (in_syscall(regs)) {
+ if (syscall) {
continue_addr = regs->pc;
restart_addr = continue_addr - (compat_thumb_mode(regs) ? 2 : 4);
retval = regs->regs[0];
@@ -886,7 +887,7 @@ static void do_signal(struct pt_regs *regs)
* Handle restarting a different system call. As above, if a debugger
* has chosen to restart at a different PC, ignore the restart.
*/
- if (in_syscall(regs) && regs->pc == restart_addr) {
+ if (syscall && regs->pc == restart_addr) {
if (retval == -ERESTART_RESTARTBLOCK)
setup_restart_syscall(regs);
user_rewind_single_step(current);
--
2.1.4
This patch series includes some improvement to Machine check handler
for pseries. Patch 1 fixes an issue where machine check handler crashes
kernel while accessing vmalloc-ed buffer while in nmi context.
Patch 2 fixes endain bug while restoring of r3 in MCE handler.
Patch 4 dumps the SLB contents on SLB MCE errors to improve the debugability.
Patch 5 display's the MCE error details on console.
CHange in V3:
- Moved patch 5 to patch 2
Change in V2:
- patch 3: Display additional info (NIP and task info) in MCE error details.
- patch 5: Fix endain bug while restoring of r3 in MCE handler.
---
Mahesh Salgaonkar (5):
powerpc/pseries: convert rtas_log_buf to linear allocation.
powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
powerpc/pseries: Define MCE error event section.
powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
powerpc/pseries: Display machine check error details.
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1
arch/powerpc/include/asm/rtas.h | 109 ++++++++++++++++++
arch/powerpc/kernel/rtasd.c | 2
arch/powerpc/mm/slb.c | 35 ++++++
arch/powerpc/platforms/pseries/ras.c | 155 +++++++++++++++++++++++++
5 files changed, 299 insertions(+), 3 deletions(-)
--
Signature
From: Arnd Bergmann <arnd(a)arndb.de>
commit 590347e4000356f55eb10b03ced2686bd74dab40 upstream.
gcc-6.3 and earlier show a new warning after a seemingly unrelated
change to the arm64 PAGE_KERNEL definition:
In file included from drivers/md/dm-bufio.c:14:0:
drivers/md/dm-bufio.c: In function 'alloc_buffer':
include/linux/sched/mm.h:182:56: warning: 'noio_flag' may be used uninitialized in this function [-Wmaybe-uninitialized]
current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
^
The same warning happened earlier on linux-3.18 for MIPS and I did a
workaround for that, but now it's come back.
gcc-7 and newer are apparently smart enough to figure this out, and
other architectures don't show it, so the best I could come up with is
to rework the caller slightly in a way that makes it obvious enough to
all arm64 compilers what is happening here.
Fixes: 41acec624087 ("arm64: kpti: Make use of nG dependent on arm64_kernel_unmapped_at_el0()")
Link: https://patchwork.kernel.org/patch/9692829/
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
[snitzer: moved declarations inside conditional, altered vmalloc return]
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
[nc: Backport to 4.9, adjust context for lack of 19809c2da28a]
Signed-off-by: Nathan Chancellor <natechancellor(a)gmail.com>
---
drivers/md/dm-bufio.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 3ec647e8b9c6..35fd57fdeba9 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -373,9 +373,6 @@ static void __cache_size_refresh(void)
static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
enum data_mode *data_mode)
{
- unsigned noio_flag;
- void *ptr;
-
if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
*data_mode = DATA_MODE_SLAB;
return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
@@ -399,16 +396,16 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
* all allocations done by this process (including pagetables) are done
* as if GFP_NOIO was specified.
*/
+ if (gfp_mask & __GFP_NORETRY) {
+ unsigned noio_flag = memalloc_noio_save();
+ void *ptr = __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM,
+ PAGE_KERNEL);
- if (gfp_mask & __GFP_NORETRY)
- noio_flag = memalloc_noio_save();
-
- ptr = __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM, PAGE_KERNEL);
-
- if (gfp_mask & __GFP_NORETRY)
memalloc_noio_restore(noio_flag);
+ return ptr;
+ }
- return ptr;
+ return __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM, PAGE_KERNEL);
}
/*
--
2.17.1
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE allocations
that want to allocate on a different node, e.g. because the allocating
thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page is
put on wrong node's list, then further list manipulations may corrupt the
list because page_to_nid() is used to determine which node's list_lock
should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There is
actually no reason to reset it, because memory policies and cpusets don't
affect the zonelist choice in the first place. This was different when
commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's anything
currently broken.
SLAB is currently not affected, but in kernels older than 4.7 that
don't yet have 511e3a058812 ("mm/slab: make cache_grow() handle the
page allocated on arbitrary node") it is. That's at least 4.4 LTS.
Older ones I'll have to check.
So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes. BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 1 -
1 file changed, 1 deletion(-)
diff -puN mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset
+++ a/mm/page_alloc.c
@@ -4169,7 +4169,6 @@ retry:
* orientated.
*/
if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
ac->high_zoneidx, ac->nodemask);
}
_