Hi Jason and Doug,
Here are most of our updates for 4.17. I will follow this up with a small 16B series then I still have a few more patches that are waiting on some more thorough testing. Should be able to get them on the list tomrrow or Friday at the latest, wanted to get these out now. I don't think anything is really that scary in here.
These are mostly driver fixes. Patches 4,7,8,11, and 14 are marked stable. They didn't get sent for the -rc because they fix really old issues.
Patch 5 is a core fix. I should have sent it a bit sooner, sorry about that but it's pretty trivial so I decided to include it as well rather than wait for 4.18.
---
Alex Estrin (2): IB/hfi1: Complete check for locally terminated smp IB/{hfi1,qib}: Add handling of kernel restart
Ashutosh Dixit (1): IB/core: Fix rkey invalidation from user space into the kernel
Michael J. Ruhl (5): IB/hfi1: Return actual error value from program_rcvarray() IB/hfi1: Use after free race condition in send context error path IB/hfi1 Use correct type for num_user_context IB/hfi1: Return correct value for device state IB/hfi1: Reorder incorrect send context disable
Mike Marciniszyn (3): IB/hfi1: Fix handling of FECN marked multicast packet IB/hfi1: Fix fault injection init/exit issues IB/hfi1: Fix loss of BECN with AHG
Sebastian Sanchez (2): IB/hfi1: Prevent LNI hang when LCB can't obtain lanes IB/{hfi1,rdmavt,qib}: Fit kernel completions into single aligned cache-line
drivers/infiniband/core/uverbs_cmd.c | 4 + drivers/infiniband/hw/hfi1/chip.c | 59 ++++++++--- drivers/infiniband/hw/hfi1/chip.h | 15 ++- drivers/infiniband/hw/hfi1/chip_registers.h | 7 + drivers/infiniband/hw/hfi1/debugfs.c | 8 + drivers/infiniband/hw/hfi1/driver.c | 19 +++- drivers/infiniband/hw/hfi1/file_ops.c | 2 drivers/infiniband/hw/hfi1/hfi.h | 9 +- drivers/infiniband/hw/hfi1/init.c | 9 +- drivers/infiniband/hw/hfi1/mad.c | 36 ++++--- drivers/infiniband/hw/hfi1/pio.c | 44 ++++++-- drivers/infiniband/hw/hfi1/rc.c | 2 drivers/infiniband/hw/hfi1/ruc.c | 54 ++++++++-- drivers/infiniband/hw/hfi1/uc.c | 2 drivers/infiniband/hw/hfi1/ud.c | 10 +- drivers/infiniband/hw/hfi1/user_exp_rcv.c | 1 drivers/infiniband/hw/qib/qib.h | 1 drivers/infiniband/hw/qib/qib_init.c | 5 + drivers/infiniband/hw/qib/qib_rc.c | 2 drivers/infiniband/hw/qib/qib_ruc.c | 4 - drivers/infiniband/hw/qib/qib_uc.c | 2 drivers/infiniband/hw/qib/qib_ud.c | 4 - drivers/infiniband/sw/rdmavt/cq.c | 146 ++++++++++++++++++--------- drivers/infiniband/sw/rdmavt/qp.c | 4 - drivers/infiniband/sw/rdmavt/trace_cq.h | 6 + include/rdma/ib_verbs.h | 5 + include/rdma/rdmavt_cq.h | 35 +++++- include/rdma/rdmavt_qp.h | 2 28 files changed, 344 insertions(+), 153 deletions(-)
-- -Denny
From: Mike Marciniszyn mike.marciniszyn@intel.com
The code for handling a marked UD packet unconditionally returns the dlid in the header of the FECN marked packet. This is not correct for multicast packets where the DLID is in the multicast range.
The subsequent attempt to send the CNP with the multicast lid will cause the chip to halt the ack send context because the source lid doesn't match the chip programming. The send context will be halted and flush any other pending packets in the pio ring causing the CNP to not be sent.
A part of investigating the fix, it was determinted that the 16B work broke the FECN routine badly with inconsitent use of 16 bit and 32 bits types for lids and pkeys. Since the port's source lid was correctly 32 bits the type mixmatches need to be dealt with at the same time as fixing the CNP header issue.
Fix these issues by: - Using the ports lid for as the SLID for responding to FECN marked UD packets - Insure pkey is always 16 bit in this and subordinate routines - Insure lids are 32 bits in this and subordinate routines
Cc: stable@vger.kernel.org # 4.14.x Fixes: 88733e3b8450 ("IB/hfi1: Add 16B UD support") Reviewed-by: Don Hiatt don.hiatt@intel.com Reviewed-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com --- drivers/infiniband/hw/hfi1/driver.c | 19 +++++++++++++++---- drivers/infiniband/hw/hfi1/hfi.h | 8 ++++---- drivers/infiniband/hw/hfi1/ud.c | 6 +++--- 3 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/driver.c b/drivers/infiniband/hw/hfi1/driver.c index addc68e..78f7c4e 100644 --- a/drivers/infiniband/hw/hfi1/driver.c +++ b/drivers/infiniband/hw/hfi1/driver.c @@ -432,31 +432,43 @@ void hfi1_process_ecn_slowpath(struct rvt_qp *qp, struct hfi1_packet *pkt, bool do_cnp) { struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num); + struct hfi1_pportdata *ppd = ppd_from_ibp(ibp); struct ib_other_headers *ohdr = pkt->ohdr; struct ib_grh *grh = pkt->grh; u32 rqpn = 0, bth1; - u16 pkey, rlid, dlid = ib_get_dlid(pkt->hdr); + u16 pkey; + u32 rlid, slid, dlid = 0; u8 hdr_type, sc, svc_type; bool is_mcast = false;
+ /* can be called from prescan */ if (pkt->etype == RHF_RCV_TYPE_BYPASS) { is_mcast = hfi1_is_16B_mcast(dlid); pkey = hfi1_16B_get_pkey(pkt->hdr); sc = hfi1_16B_get_sc(pkt->hdr); + dlid = hfi1_16B_get_dlid(pkt->hdr); + slid = hfi1_16B_get_slid(pkt->hdr); hdr_type = HFI1_PKT_TYPE_16B; } else { is_mcast = (dlid > be16_to_cpu(IB_MULTICAST_LID_BASE)) && (dlid != be16_to_cpu(IB_LID_PERMISSIVE)); pkey = ib_bth_get_pkey(ohdr); sc = hfi1_9B_get_sc5(pkt->hdr, pkt->rhf); + dlid = ib_get_dlid(pkt->hdr); + slid = ib_get_slid(pkt->hdr); hdr_type = HFI1_PKT_TYPE_9B; }
switch (qp->ibqp.qp_type) { + case IB_QPT_UD: + dlid = ppd->lid; + rlid = slid; + rqpn = ib_get_sqpn(pkt->ohdr); + svc_type = IB_CC_SVCTYPE_UD; + break; case IB_QPT_SMI: case IB_QPT_GSI: - case IB_QPT_UD: - rlid = ib_get_slid(pkt->hdr); + rlid = slid; rqpn = ib_get_sqpn(pkt->ohdr); svc_type = IB_CC_SVCTYPE_UD; break; @@ -481,7 +493,6 @@ void hfi1_process_ecn_slowpath(struct rvt_qp *qp, struct hfi1_packet *pkt, dlid, rlid, sc, grh);
if (!is_mcast && (bth1 & IB_BECN_SMASK)) { - struct hfi1_pportdata *ppd = ppd_from_ibp(ibp); u32 lqpn = bth1 & RVT_QPN_MASK; u8 sl = ibp->sc_to_sl[sc];
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index 90bc8c7..4305000 100644 --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1538,13 +1538,13 @@ static inline u32 egress_cycles(u32 len, u32 rate) void process_becn(struct hfi1_pportdata *ppd, u8 sl, u32 rlid, u32 lqpn, u32 rqpn, u8 svc_type); void return_cnp(struct hfi1_ibport *ibp, struct rvt_qp *qp, u32 remote_qpn, - u32 pkey, u32 slid, u32 dlid, u8 sc5, + u16 pkey, u32 slid, u32 dlid, u8 sc5, const struct ib_grh *old_grh); void return_cnp_16B(struct hfi1_ibport *ibp, struct rvt_qp *qp, - u32 remote_qpn, u32 pkey, u32 slid, u32 dlid, + u32 remote_qpn, u16 pkey, u32 slid, u32 dlid, u8 sc5, const struct ib_grh *old_grh); typedef void (*hfi1_handle_cnp)(struct hfi1_ibport *ibp, struct rvt_qp *qp, - u32 remote_qpn, u32 pkey, u32 slid, u32 dlid, + u32 remote_qpn, u16 pkey, u32 slid, u32 dlid, u8 sc5, const struct ib_grh *old_grh);
#define PKEY_CHECK_INVALID -1 @@ -2438,7 +2438,7 @@ static inline void hfi1_make_16b_hdr(struct hfi1_16b_header *hdr, ((slid >> OPA_16B_SLID_SHIFT) << OPA_16B_SLID_HIGH_SHIFT); lrh2 = (lrh2 & ~OPA_16B_DLID_MASK) | ((dlid >> OPA_16B_DLID_SHIFT) << OPA_16B_DLID_HIGH_SHIFT); - lrh2 = (lrh2 & ~OPA_16B_PKEY_MASK) | (pkey << OPA_16B_PKEY_SHIFT); + lrh2 = (lrh2 & ~OPA_16B_PKEY_MASK) | ((u32)pkey << OPA_16B_PKEY_SHIFT); lrh2 = (lrh2 & ~OPA_16B_L4_MASK) | l4;
hdr->lrh[0] = lrh0; diff --git a/drivers/infiniband/hw/hfi1/ud.c b/drivers/infiniband/hw/hfi1/ud.c index bcf3b0b..49a38a6 100644 --- a/drivers/infiniband/hw/hfi1/ud.c +++ b/drivers/infiniband/hw/hfi1/ud.c @@ -628,7 +628,7 @@ int hfi1_lookup_pkey_idx(struct hfi1_ibport *ibp, u16 pkey) }
void return_cnp_16B(struct hfi1_ibport *ibp, struct rvt_qp *qp, - u32 remote_qpn, u32 pkey, u32 slid, u32 dlid, + u32 remote_qpn, u16 pkey, u32 slid, u32 dlid, u8 sc5, const struct ib_grh *old_grh) { u64 pbc, pbc_flags = 0; @@ -687,7 +687,7 @@ void return_cnp_16B(struct hfi1_ibport *ibp, struct rvt_qp *qp, }
void return_cnp(struct hfi1_ibport *ibp, struct rvt_qp *qp, u32 remote_qpn, - u32 pkey, u32 slid, u32 dlid, u8 sc5, + u16 pkey, u32 slid, u32 dlid, u8 sc5, const struct ib_grh *old_grh) { u64 pbc, pbc_flags = 0; @@ -719,7 +719,7 @@ void return_cnp(struct hfi1_ibport *ibp, struct rvt_qp *qp, u32 remote_qpn,
lrh0 |= (sc5 & 0xf) << 12 | sl << 4;
- bth0 = pkey | (IB_OPCODE_CNP << 24); + bth0 = (u32)pkey | (IB_OPCODE_CNP << 24); ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(remote_qpn | (1 << IB_BECN_SHIFT));
From: Mike Marciniszyn mike.marciniszyn@intel.com
There are config dependent code paths that expose panics in unload paths both in this file and in debugfs_remove_recursive() because CONFIG_FAULT_INJECTION and CONFIG_FAULT_INJECTION_DEBUG_FS can be set independently.
Having CONFIG_FAULT_INJECTION set and CONFIG_FAULT_INJECTION_DEBUG_FS reset causes fault_create_debugfs_attr() to return an error.
The debugfs.c routines tolerate failures, but the module unload panics dereferencing a NULL in the two exit routines. If that is fixed, the dir passed to debugfs_remove_recursive comes from a memory location that was freed and potentially reused causing a segfault or corrupting memory.
Here is an example of the NULL deref panic:
[66866.286829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 [66866.295602] IP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1] [66866.301138] PGD 858496067 P4D 858496067 PUD 8433a7067 PMD 0 [66866.307452] Oops: 0000 [#1] SMP [66866.310953] Modules linked in: hfi1(-) rdmavt rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm ib_cm ib_core rpcsec_gss_krb5 nfsv4 dns_resolver nfsv3 nfs fscache sb_edac x86_pkg_temp_thermal intel_powerclamp vfat fat coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel iTCO_wdt iTCO_vendor_support crypto_simd mei_me glue_helper cryptd mxm_wmi ipmi_si pcspkr lpc_ich sg mei ioatdma ipmi_devintf i2c_i801 mfd_core shpchp ipmi_msghandler wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops ttm ahci ptp crc32c_intel libahci pps_core drm dca libata i2c_algo_bit i2c_core [last unloaded: opa_vnic] [66866.385551] CPU: 8 PID: 7470 Comm: rmmod Not tainted 4.14.0-mam-tid-rdma #2 [66866.393317] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016 [66866.405252] task: ffff88084f28c380 task.stack: ffffc90008454000 [66866.411866] RIP: 0010:hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1] [66866.417984] RSP: 0018:ffffc90008457da0 EFLAGS: 00010202 [66866.423812] RAX: 0000000000000000 RBX: ffff880857de0000 RCX: 0000000180040001 [66866.431773] RDX: 0000000180040002 RSI: ffffea0021088200 RDI: 0000000040000000 [66866.439734] RBP: ffffc90008457da8 R08: ffff88084220e000 R09: 0000000180040001 [66866.447696] R10: 000000004220e001 R11: ffff88084220e000 R12: ffff88085a31c000 [66866.455657] R13: ffffffffa07c9820 R14: ffffffffa07c9890 R15: ffff881059d78100 [66866.463618] FS: 00007f6876047740(0000) GS:ffff88085f800000(0000) knlGS:0000000000000000 [66866.472644] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66866.479053] CR2: 0000000000000088 CR3: 0000000856357006 CR4: 00000000001606e0 [66866.487013] Call Trace: [66866.489747] remove_one+0x1f/0x220 [hfi1] [66866.494221] pci_device_remove+0x39/0xc0 [66866.498596] device_release_driver_internal+0x141/0x210 [66866.504424] driver_detach+0x3f/0x80 [66866.508409] bus_remove_driver+0x55/0xd0 [66866.512784] driver_unregister+0x2c/0x50 [66866.517164] pci_unregister_driver+0x2a/0xa0 [66866.521934] hfi1_mod_cleanup+0x10/0xaa2 [hfi1] [66866.526988] SyS_delete_module+0x171/0x250 [66866.531558] do_syscall_64+0x67/0x1b0 [66866.535644] entry_SYSCALL64_slow_path+0x25/0x25 [66866.540792] RIP: 0033:0x7f6875525c27 [66866.544777] RSP: 002b:00007ffd48528e78 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [66866.553224] RAX: ffffffffffffffda RBX: 0000000001cc01d0 RCX: 00007f6875525c27 [66866.561185] RDX: 00007f6875596000 RSI: 0000000000000800 RDI: 0000000001cc0238 [66866.569146] RBP: 0000000000000000 R08: 00007f68757e9060 R09: 00007f6875596000 [66866.577120] R10: 00007ffd48528c00 R11: 0000000000000206 R12: 00007ffd48529db4 [66866.585080] R13: 0000000000000000 R14: 0000000001cc01d0 R15: 0000000001cc0010 [66866.593040] Code: 90 0f 1f 44 00 00 48 83 3d a3 8b 03 00 00 55 48 89 e5 53 48 89 fb 74 4e 48 8d bf 18 0c 00 00 e8 9d f2 ff ff 48 8b 83 20 0c 00 00 <48> 8b b8 88 00 00 00 e8 2a 21 b3 e0 48 8b bb 20 0c 00 00 e8 0e [66866.614127] RIP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1] RSP: ffffc90008457da0 [66866.621885] CR2: 0000000000000088 [66866.625618] ---[ end trace c4817425783fb092 ]---
Fix by insuring that upon failure from fault_create_debugfs_attr() the parent pointer for the routines is always set to NULL and guards added in the exit routines to insure that debugfs_remove_recursive() is not called when when the parent pointer is NULL.
Fixes: 0181ce31b260 ("IB/hfi1: Add receive fault injection feature") Cc: stable@vger.kernel.org # 4.14.x Reviewed-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com --- drivers/infiniband/hw/hfi1/debugfs.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/debugfs.c b/drivers/infiniband/hw/hfi1/debugfs.c index 852173b..5343960 100644 --- a/drivers/infiniband/hw/hfi1/debugfs.c +++ b/drivers/infiniband/hw/hfi1/debugfs.c @@ -1227,7 +1227,8 @@ static int _fault_stats_seq_show(struct seq_file *s, void *v)
static void fault_exit_opcode_debugfs(struct hfi1_ibdev *ibd) { - debugfs_remove_recursive(ibd->fault_opcode->dir); + if (ibd->fault_opcode) + debugfs_remove_recursive(ibd->fault_opcode->dir); kfree(ibd->fault_opcode); ibd->fault_opcode = NULL; } @@ -1255,6 +1256,7 @@ static int fault_init_opcode_debugfs(struct hfi1_ibdev *ibd) &ibd->fault_opcode->attr); if (IS_ERR(ibd->fault_opcode->dir)) { kfree(ibd->fault_opcode); + ibd->fault_opcode = NULL; return -ENOENT; }
@@ -1278,7 +1280,8 @@ static int fault_init_opcode_debugfs(struct hfi1_ibdev *ibd)
static void fault_exit_packet_debugfs(struct hfi1_ibdev *ibd) { - debugfs_remove_recursive(ibd->fault_packet->dir); + if (ibd->fault_packet) + debugfs_remove_recursive(ibd->fault_packet->dir); kfree(ibd->fault_packet); ibd->fault_packet = NULL; } @@ -1304,6 +1307,7 @@ static int fault_init_packet_debugfs(struct hfi1_ibdev *ibd) &ibd->fault_opcode->attr); if (IS_ERR(ibd->fault_packet->dir)) { kfree(ibd->fault_packet); + ibd->fault_packet = NULL; return -ENOENT; }
From: Michael J. Ruhl michael.j.ruhl@intel.com
A pio send egress error can occur when the PSM library attempts to to send a bad packet. That issue is still being investigated.
The pio error interrupt handler then attempts to progress the recovery of the errored pio send context.
Code inspection reveals that the handling lacks the necessary locking if that recovery interleaves with a PSM close of the "context" object contains the pio send context.
The lack of the locking can cause the recovery to access the already freed pio send context object and incorrectly deduce that the pio send context is actually a kernel pio send context as shown by the NULL deref stack below:
[<ffffffff8143d78c>] _dev_info+0x6c/0x90 [<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1] [<ffffffff816ab124>] ? __schedule+0x424/0x9b0 [<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1] [<ffffffff810aa3ba>] process_one_work+0x17a/0x440 [<ffffffff810ab086>] worker_thread+0x126/0x3c0 [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0 [<ffffffff810b252f>] kthread+0xcf/0xe0 [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
This is the best case scenario and other scenarios can corrupt the already freed memory.
Fix by adding the necessary locking in the pio send context error handler.
Cc: stable@vger.kernel.org # 4.9.x Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com --- drivers/infiniband/hw/hfi1/chip.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c index cb9095d..e5254f8 100644 --- a/drivers/infiniband/hw/hfi1/chip.c +++ b/drivers/infiniband/hw/hfi1/chip.c @@ -5944,6 +5944,7 @@ static void is_sendctxt_err_int(struct hfi1_devdata *dd, u64 status; u32 sw_index; int i = 0; + unsigned long irq_flags;
sw_index = dd->hw_to_sw[hw_context]; if (sw_index >= dd->num_send_contexts) { @@ -5953,10 +5954,12 @@ static void is_sendctxt_err_int(struct hfi1_devdata *dd, return; } sci = &dd->send_contexts[sw_index]; + spin_lock_irqsave(&dd->sc_lock, irq_flags); sc = sci->sc; if (!sc) { dd_dev_err(dd, "%s: context %u(%u): no sc?\n", __func__, sw_index, hw_context); + spin_unlock_irqrestore(&dd->sc_lock, irq_flags); return; }
@@ -5978,6 +5981,7 @@ static void is_sendctxt_err_int(struct hfi1_devdata *dd, */ if (sc->type != SC_USER) queue_work(dd->pport->hfi1_wq, &sc->halt_work); + spin_unlock_irqrestore(&dd->sc_lock, irq_flags);
/* * Update the counters for the corresponding status bits.
From: Michael J. Ruhl michael.j.ruhl@intel.com
User send context integrity bits are cleared before the context is disabled. If the send context is still processing data, any packets that need those integrity bits will cause an error and halt the send context.
During the disable handling, the driver waits for the context to drain. If the context is halted, the driver will eventually timeout because the context won't drain and then incorrectly bounce the link.
Reorder the bit clearing and the context disable.
Examine the software state and send context status as well as the egress status to determine if a send context is in the halted state.
Promote the check macros to static functions for consistency with the new check and to follow kernel style.
Remove an unused define that refers to the egress timeout.
Cc: stable@vger.kernel.org # 4.9.x Reviewed-by: Mitko Haralanov mitko.haralanov@intel.com Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com --- drivers/infiniband/hw/hfi1/file_ops.c | 2 +- drivers/infiniband/hw/hfi1/pio.c | 44 ++++++++++++++++++++++++++------- 2 files changed, 35 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c index 41fafeb..b042184 100644 --- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -689,8 +689,8 @@ static int hfi1_file_close(struct inode *inode, struct file *fp) * checks to default and disable the send context. */ if (uctxt->sc) { - set_pio_integrity(uctxt->sc); sc_disable(uctxt->sc); + set_pio_integrity(uctxt->sc); }
hfi1_free_ctxt_rcv_groups(uctxt); diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c index 40dac4d..9cac15d 100644 --- a/drivers/infiniband/hw/hfi1/pio.c +++ b/drivers/infiniband/hw/hfi1/pio.c @@ -50,8 +50,6 @@ #include "qp.h" #include "trace.h"
-#define SC_CTXT_PACKET_EGRESS_TIMEOUT 350 /* in chip cycles */ - #define SC(name) SEND_CTXT_##name /* * Send Context functions @@ -961,15 +959,40 @@ void sc_disable(struct send_context *sc) }
/* return SendEgressCtxtStatus.PacketOccupancy */ -#define packet_occupancy(r) \ - (((r) & SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SMASK)\ - >> SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SHIFT) +static u64 packet_occupancy(u64 reg) +{ + return (reg & + SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SMASK) + >> SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SHIFT; +}
/* is egress halted on the context? */ -#define egress_halted(r) \ - ((r) & SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_HALT_STATUS_SMASK) +static bool egress_halted(u64 reg) +{ + return !!(reg & SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_HALT_STATUS_SMASK); +}
-/* wait for packet egress, optionally pause for credit return */ +/* is the send context halted? */ +static bool is_sc_halted(struct hfi1_devdata *dd, u32 hw_context) +{ + return !!(read_kctxt_csr(dd, hw_context, SC(STATUS)) & + SC(STATUS_CTXT_HALTED_SMASK)); +} + +/** + * sc_wait_for_packet_egress + * @sc: valid send context + * @pause: wait for credit return + * + * Wait for packet egress, optionally pause for credit return + * + * Egress halt and Context halt are not necessarily the same thing, so + * check for both. + * + * NOTE: The context halt bit may not be set immediately. Because of this, + * it is necessary to check the SW SFC_HALTED bit (set in the IRQ) and the HW + * context bit to determine if the context is halted. + */ static void sc_wait_for_packet_egress(struct send_context *sc, int pause) { struct hfi1_devdata *dd = sc->dd; @@ -981,8 +1004,9 @@ static void sc_wait_for_packet_egress(struct send_context *sc, int pause) reg_prev = reg; reg = read_csr(dd, sc->hw_context * 8 + SEND_EGRESS_CTXT_STATUS); - /* done if egress is stopped */ - if (egress_halted(reg)) + /* done if any halt bits, SW or HW are set */ + if (sc->flags & SCF_HALTED || + is_sc_halted(dd, sc->hw_context) || egress_halted(reg)) break; reg = packet_occupancy(reg); if (reg == 0)
From: Alex Estrin alex.estrin@intel.com
A warm restart will fail to unload the driver, leaving link state potentially flapping up to the point the BIOS resets the adapter. Correct the issue by hooking the shutdown pci method, which will bring link down and remove the driver.
Cc: stable@vger.kernel.org # 4.9.x Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com --- drivers/infiniband/hw/hfi1/hfi.h | 1 + drivers/infiniband/hw/hfi1/init.c | 5 +++++ drivers/infiniband/hw/qib/qib.h | 1 + drivers/infiniband/hw/qib/qib_init.c | 5 +++++ 4 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index 4305000..777abb8 100644 --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1857,6 +1857,7 @@ struct cc_state *get_cc_state_protected(struct hfi1_pportdata *ppd) #define HFI1_HAS_SDMA_TIMEOUT 0x8 #define HFI1_HAS_SEND_DMA 0x10 /* Supports Send DMA */ #define HFI1_FORCED_FREEZE 0x80 /* driver forced freeze mode */ +#define HFI1_REMOVE 0x100 /* unloading device */
/* IB dword length mask in PBC (lower 11 bits); same for all chips */ #define HFI1_PBC_LENGTH_MASK ((1 << 11) - 1) diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index c45cca5..20fa898 100644 --- a/drivers/infiniband/hw/hfi1/init.c +++ b/drivers/infiniband/hw/hfi1/init.c @@ -1388,6 +1388,7 @@ void hfi1_disable_after_error(struct hfi1_devdata *dd) .name = DRIVER_NAME, .probe = init_one, .remove = remove_one, + .shutdown = remove_one, .id_table = hfi1_pci_tbl, .err_handler = &hfi1_pci_err_handler, }; @@ -1768,6 +1769,10 @@ static void remove_one(struct pci_dev *pdev) { struct hfi1_devdata *dd = pci_get_drvdata(pdev);
+ if (dd->flags & HFI1_REMOVE) + return; + dd->flags |= HFI1_REMOVE; + /* close debugfs files before ib unregister */ hfi1_dbg_ibdev_exit(&dd->verbs_dev);
diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h index 4607245..677b757 100644 --- a/drivers/infiniband/hw/qib/qib.h +++ b/drivers/infiniband/hw/qib/qib.h @@ -1228,6 +1228,7 @@ int qib_cdev_init(int minor, const char *name, #define QIB_BADINTR 0x8000 /* severe interrupt problems */ #define QIB_DCA_ENABLED 0x10000 /* Direct Cache Access enabled */ #define QIB_HAS_QSFP 0x20000 /* device (card instance) has QSFP */ +#define QIB_REMOVE 0x40000 /* unloading device */
/* * values for ppd->lflags (_ib_port_ related flags) diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c index 3990f38..796dea4 100644 --- a/drivers/infiniband/hw/qib/qib_init.c +++ b/drivers/infiniband/hw/qib/qib_init.c @@ -1201,6 +1201,7 @@ void qib_disable_after_error(struct qib_devdata *dd) .name = QIB_DRV_NAME, .probe = qib_init_one, .remove = qib_remove_one, + .shutdown = qib_remove_one, .id_table = qib_pci_tbl, .err_handler = &qib_pci_err_handler, }; @@ -1526,6 +1527,10 @@ static void qib_remove_one(struct pci_dev *pdev) struct qib_devdata *dd = pci_get_drvdata(pdev); int ret;
+ if (dd->flags & QIB_REMOVE) + return; + dd->flags |= QIB_REMOVE; + /* unregister from IB core */ qib_unregister_ib_device(dd);
On Wed, Apr 04, 2018 at 08:03:12PM -0700, Dennis Dalessandro wrote:
From: Alex Estrin alex.estrin@intel.com
A warm restart will fail to unload the driver, leaving link state potentially flapping up to the point the BIOS resets the adapter. Correct the issue by hooking the shutdown pci method, which will bring link down and remove the driver.
Cc: stable@vger.kernel.org # 4.9.x Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com drivers/infiniband/hw/hfi1/hfi.h | 1 + drivers/infiniband/hw/hfi1/init.c | 5 +++++ drivers/infiniband/hw/qib/qib.h | 1 + drivers/infiniband/hw/qib/qib_init.c | 5 +++++ 4 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index 4305000..777abb8 100644 +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1857,6 +1857,7 @@ struct cc_state *get_cc_state_protected(struct hfi1_pportdata *ppd) #define HFI1_HAS_SDMA_TIMEOUT 0x8 #define HFI1_HAS_SEND_DMA 0x10 /* Supports Send DMA */ #define HFI1_FORCED_FREEZE 0x80 /* driver forced freeze mode */ +#define HFI1_REMOVE 0x100 /* unloading device */ /* IB dword length mask in PBC (lower 11 bits); same for all chips */ #define HFI1_PBC_LENGTH_MASK ((1 << 11) - 1) diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index c45cca5..20fa898 100644 +++ b/drivers/infiniband/hw/hfi1/init.c @@ -1388,6 +1388,7 @@ void hfi1_disable_after_error(struct hfi1_devdata *dd) .name = DRIVER_NAME, .probe = init_one, .remove = remove_one,
- .shutdown = remove_one, .id_table = hfi1_pci_tbl, .err_handler = &hfi1_pci_err_handler,
}; @@ -1768,6 +1769,10 @@ static void remove_one(struct pci_dev *pdev) { struct hfi1_devdata *dd = pci_get_drvdata(pdev);
- if (dd->flags & HFI1_REMOVE)
return;
- dd->flags |= HFI1_REMOVE;
- /* close debugfs files before ib unregister */ hfi1_dbg_ibdev_exit(&dd->verbs_dev);
diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h index 4607245..677b757 100644 +++ b/drivers/infiniband/hw/qib/qib.h @@ -1228,6 +1228,7 @@ int qib_cdev_init(int minor, const char *name, #define QIB_BADINTR 0x8000 /* severe interrupt problems */ #define QIB_DCA_ENABLED 0x10000 /* Direct Cache Access enabled */ #define QIB_HAS_QSFP 0x20000 /* device (card instance) has QSFP */ +#define QIB_REMOVE 0x40000 /* unloading device */ /*
- values for ppd->lflags (_ib_port_ related flags)
diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c index 3990f38..796dea4 100644 +++ b/drivers/infiniband/hw/qib/qib_init.c @@ -1201,6 +1201,7 @@ void qib_disable_after_error(struct qib_devdata *dd) .name = QIB_DRV_NAME, .probe = qib_init_one, .remove = qib_remove_one,
- .shutdown = qib_remove_one,
No way, qib_remove_one() ultimately calls ib_unregister_device() which is not approprite for a shutdown callback, especially since these drivers do not support RDMA hot removal.
I think you need something lighter weight that just turns off the physical port.
Jason
On Wed, Apr 04, 2018 at 08:03:12PM -0700, Dennis Dalessandro wrote:
From: Alex Estrin alex.estrin@intel.com
A warm restart will fail to unload the driver, leaving link state potentially flapping up to the point the BIOS resets the adapter. Correct the issue by hooking the shutdown pci method, which will bring link down and remove the driver.
Cc: stable@vger.kernel.org # 4.9.x Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com drivers/infiniband/hw/hfi1/hfi.h | 1 + drivers/infiniband/hw/hfi1/init.c | 5 +++++ drivers/infiniband/hw/qib/qib.h | 1 + drivers/infiniband/hw/qib/qib_init.c | 5 +++++ 4 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index 4305000..777abb8 100644 +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1857,6 +1857,7 @@ struct cc_state *get_cc_state_protected(struct
hfi1_pportdata *ppd)
#define HFI1_HAS_SDMA_TIMEOUT 0x8 #define HFI1_HAS_SEND_DMA 0x10 /* Supports Send DMA */ #define HFI1_FORCED_FREEZE 0x80 /* driver forced freeze mode */ +#define HFI1_REMOVE 0x100 /* unloading device */
/* IB dword length mask in PBC (lower 11 bits); same for all chips */ #define HFI1_PBC_LENGTH_MASK ((1 << 11) - 1) diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index c45cca5..20fa898 100644 +++ b/drivers/infiniband/hw/hfi1/init.c @@ -1388,6 +1388,7 @@ void hfi1_disable_after_error(struct hfi1_devdata
*dd)
.name = DRIVER_NAME, .probe = init_one, .remove = remove_one,
- .shutdown = remove_one, .id_table = hfi1_pci_tbl, .err_handler = &hfi1_pci_err_handler,
}; @@ -1768,6 +1769,10 @@ static void remove_one(struct pci_dev *pdev) { struct hfi1_devdata *dd = pci_get_drvdata(pdev);
- if (dd->flags & HFI1_REMOVE)
return;
- dd->flags |= HFI1_REMOVE;
- /* close debugfs files before ib unregister */ hfi1_dbg_ibdev_exit(&dd->verbs_dev);
diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h index 4607245..677b757 100644 +++ b/drivers/infiniband/hw/qib/qib.h @@ -1228,6 +1228,7 @@ int qib_cdev_init(int minor, const char *name, #define QIB_BADINTR 0x8000 /* severe interrupt problems */ #define QIB_DCA_ENABLED 0x10000 /* Direct Cache Access enabled */ #define QIB_HAS_QSFP 0x20000 /* device (card instance) has QSFP */ +#define QIB_REMOVE 0x40000 /* unloading device */
/*
- values for ppd->lflags (_ib_port_ related flags)
diff --git a/drivers/infiniband/hw/qib/qib_init.c
b/drivers/infiniband/hw/qib/qib_init.c
index 3990f38..796dea4 100644 +++ b/drivers/infiniband/hw/qib/qib_init.c @@ -1201,6 +1201,7 @@ void qib_disable_after_error(struct qib_devdata *dd) .name = QIB_DRV_NAME, .probe = qib_init_one, .remove = qib_remove_one,
- .shutdown = qib_remove_one,
No way, qib_remove_one() ultimately calls ib_unregister_device() which is not approprite for a shutdown callback, especially since these drivers do not support RDMA hot removal.
I think you need something lighter weight that just turns off the physical port.
Agreed. Non-blocking hw shutdown is what we need here. V2 will be posted shortly. Thanks for the feedback, Alex.
Jason
From: Mike Marciniszyn mike.marciniszyn@intel.com
AHG may be armed to use the stored header, which by design is limited to edits in the PSN/A 32 bit word (bth2).
When the code is trying to send a BECN, the use of the stored header will lose the BECN bit.
Fix by avoiding AHG when getting ready to send a BECN. This is accomplished by always claiming the packet is not a middle packet which is an AHG precursor. BECNs are not a normal case and this should not hurt AHG optimizations.
Cc: stable@vger.kernel.org # 4.14.x Reviewed-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com --- drivers/infiniband/hw/hfi1/ruc.c | 50 ++++++++++++++++++++++++++++++-------- 1 files changed, 40 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/ruc.c b/drivers/infiniband/hw/hfi1/ruc.c index bab720c..e2fa689 100644 --- a/drivers/infiniband/hw/hfi1/ruc.c +++ b/drivers/infiniband/hw/hfi1/ruc.c @@ -733,6 +733,20 @@ static inline void hfi1_make_ruc_bth(struct rvt_qp *qp, ohdr->bth[2] = cpu_to_be32(bth2); }
+/** + * hfi1_make_ruc_header_16B - build a 16B header + * @qp: the queue pair + * @ohdr: a pointer to the destination header memory + * @bth0: bth0 passed in from the RC/UC builder + * @bth2: bth2 passed in from the RC/UC builder + * @middle: non zero implies indicates ahg "could" be used + * @ps: the current packet state + * + * This routine may disarm ahg under these situations: + * - packet needs a GRH + * - BECN needed + * - migration state not IB_MIG_MIGRATED + */ static inline void hfi1_make_ruc_header_16B(struct rvt_qp *qp, struct ib_other_headers *ohdr, u32 bth0, u32 bth2, int middle, @@ -777,6 +791,12 @@ static inline void hfi1_make_ruc_header_16B(struct rvt_qp *qp, else middle = 0;
+ if (qp->s_flags & RVT_S_ECN) { + qp->s_flags &= ~RVT_S_ECN; + /* we recently received a FECN, so return a BECN */ + becn = true; + middle = 0; + } if (middle) build_ahg(qp, bth2); else @@ -784,11 +804,6 @@ static inline void hfi1_make_ruc_header_16B(struct rvt_qp *qp,
bth0 |= pkey; bth0 |= extra_bytes << 20; - if (qp->s_flags & RVT_S_ECN) { - qp->s_flags &= ~RVT_S_ECN; - /* we recently received a FECN, so return a BECN */ - becn = true; - } hfi1_make_ruc_bth(qp, ohdr, bth0, bth1, bth2);
if (!ppd->lid) @@ -806,6 +821,20 @@ static inline void hfi1_make_ruc_header_16B(struct rvt_qp *qp, pkey, becn, 0, l4, priv->s_sc); }
+/** + * hfi1_make_ruc_header_9B - build a 9B header + * @qp: the queue pair + * @ohdr: a pointer to the destination header memory + * @bth0: bth0 passed in from the RC/UC builder + * @bth2: bth2 passed in from the RC/UC builder + * @middle: non zero implies indicates ahg "could" be used + * @ps: the current packet state + * + * This routine may disarm ahg under these situations: + * - packet needs a GRH + * - BECN needed + * - migration state not IB_MIG_MIGRATED + */ static inline void hfi1_make_ruc_header_9B(struct rvt_qp *qp, struct ib_other_headers *ohdr, u32 bth0, u32 bth2, int middle, @@ -839,6 +868,12 @@ static inline void hfi1_make_ruc_header_9B(struct rvt_qp *qp, else middle = 0;
+ if (qp->s_flags & RVT_S_ECN) { + qp->s_flags &= ~RVT_S_ECN; + /* we recently received a FECN, so return a BECN */ + bth1 |= (IB_BECN_MASK << IB_BECN_SHIFT); + middle = 0; + } if (middle) build_ahg(qp, bth2); else @@ -846,11 +881,6 @@ static inline void hfi1_make_ruc_header_9B(struct rvt_qp *qp,
bth0 |= pkey; bth0 |= extra_bytes << 20; - if (qp->s_flags & RVT_S_ECN) { - qp->s_flags &= ~RVT_S_ECN; - /* we recently received a FECN, so return a BECN */ - bth1 |= (IB_BECN_MASK << IB_BECN_SHIFT); - } hfi1_make_ruc_bth(qp, ohdr, bth0, bth1, bth2); hfi1_make_ib_hdr(&ps->s_txreq->phdr.hdr.ibh, lrh0,
On Wed, Apr 04, 2018 at 08:01:11PM -0700, Dennis Dalessandro wrote:
Hi Jason and Doug,
Here are most of our updates for 4.17. I will follow this up with a small 16B series then I still have a few more patches that are waiting on some more thorough testing. Should be able to get them on the list tomrrow or Friday at the latest, wanted to get these out now. I don't think anything is really that scary in here.
Probably not going to happen for 4.17.
As a general rule series that are not on the list before the merge window opens are not going to be make it. Linus discourages doing things at the last minute as it encourages bad behavior that reduces release quality.
In this specific case, Doug and I are both going to be at OFA next week so I'm not anticipating a 2nd pull request. I expect the only PR will go out on Friday.
If any of these are super important for 4.17 (eg would go to -rc) then let me now and I can look at only those, and they need really good rc worthy commit messages..
Jason
On 4/4/2018 11:09 PM, Jason Gunthorpe wrote:
On Wed, Apr 04, 2018 at 08:01:11PM -0700, Dennis Dalessandro wrote:
Hi Jason and Doug,
Here are most of our updates for 4.17. I will follow this up with a small 16B series then I still have a few more patches that are waiting on some more thorough testing. Should be able to get them on the list tomrrow or Friday at the latest, wanted to get these out now. I don't think anything is really that scary in here.
Probably not going to happen for 4.17.
As a general rule series that are not on the list before the merge window opens are not going to be make it. Linus discourages doing things at the last minute as it encourages bad behavior that reduces release quality.
I think that's a fine policy, and usually try to stick to that myself, but got behind this time by a couple days. To be fair though, I wouldn't call this last minute. There is still a week and a half of merge window. I do realize there is unfortunate timing with OFA falling in the middle of the merge window though.
In this specific case, Doug and I are both going to be at OFA next week so I'm not anticipating a 2nd pull request. I expect the only PR will go out on Friday.
I'll be there in Boulder as well so I'll see you guys then.
If any of these are super important for 4.17 (eg would go to -rc) then let me now and I can look at only those, and they need really good rc worthy commit messages.
That's fair, I can live with that. I'll sync up with my folks tomorrow but at this point I don't think any of this merit the -rc. The patches that are marked stable aren't really -rc material since we've lived with those issues for a while now. The core fix is in the same boat.
The 16B series can wait till 4.18. I do have one other series that fixes a pretty severe performance issue, I will post it when it's ready, but it's a bit more scary and I imagine will have to push off till 4.18 as well.
-Denny
On Wed, 2018-04-04 at 23:33 -0400, Dennis Dalessandro wrote:
I think that's a fine policy, and usually try to stick to that myself, but got behind this time by a couple days. To be fair though, I wouldn't call this last minute. There is still a week and a half of merge window. I do realize there is unfortunate timing with OFA falling in the middle of the merge window though.
Usually kernel maintainers are busy during the merge window with addressing merge conflicts and with analyzing and fixing regressions. I don't know any maintainer who accepts new patches during the merge window. Most maintainers get irritated if patches that are posted during the merge window and ignore such patches.
Bart.
On Wed, Apr 04, 2018 at 11:33:02PM -0400, Dennis Dalessandro wrote:
On 4/4/2018 11:09 PM, Jason Gunthorpe wrote:
On Wed, Apr 04, 2018 at 08:01:11PM -0700, Dennis Dalessandro wrote:
Hi Jason and Doug,
Here are most of our updates for 4.17. I will follow this up with a small 16B series then I still have a few more patches that are waiting on some more thorough testing. Should be able to get them on the list tomrrow or Friday at the latest, wanted to get these out now. I don't think anything is really that scary in here.
Probably not going to happen for 4.17.
As a general rule series that are not on the list before the merge window opens are not going to be make it. Linus discourages doing things at the last minute as it encourages bad behavior that reduces release quality.
I think that's a fine policy, and usually try to stick to that myself, but got behind this time by a couple days. To be fair though, I wouldn't call this last minute.
'last minute' was sending series during the rc6 time period.
After release is 'too late'
Jason
On Wed, Apr 04, 2018 at 08:01:11PM -0700, Dennis Dalessandro wrote:
Hi Jason and Doug,
Here are most of our updates for 4.17. I will follow this up with a small 16B series then I still have a few more patches that are waiting on some more thorough testing. Should be able to get them on the list tomrrow or Friday at the latest, wanted to get these out now. I don't think anything is really that scary in here.
These are mostly driver fixes. Patches 4,7,8,11, and 14 are marked stable. They didn't get sent for the -rc because they fix really old issues.
Patch 5 is a core fix. I should have sent it a bit sooner, sorry about that but it's pretty trivial so I decided to include it as well rather than wait for 4.18.
Alex Estrin (2): IB/hfi1: Complete check for locally terminated smp IB/{hfi1,qib}: Add handling of kernel restart
Ashutosh Dixit (1): IB/core: Fix rkey invalidation from user space into the kernel
Michael J. Ruhl (5): IB/hfi1: Return actual error value from program_rcvarray() IB/hfi1: Use after free race condition in send context error path IB/hfi1 Use correct type for num_user_context IB/hfi1: Return correct value for device state IB/hfi1: Reorder incorrect send context disable
Mike Marciniszyn (3): IB/hfi1: Fix handling of FECN marked multicast packet IB/hfi1: Fix fault injection init/exit issues IB/hfi1: Fix loss of BECN with AHG
Sebastian Sanchez (2): IB/hfi1: Prevent LNI hang when LCB can't obtain lanes IB/{hfi1,rdmavt,qib}: Fit kernel completions into single aligned cache-line
Many of these got comments so I'm just going to drop the series. Please rebase and resend anything you think should go to -rc or -next that doesn't interact with the patches being revised.
Also let me know if the next '16B' series is independent of this, if yes I'll drop it to..
Jason
On 4/17/2018 9:40 PM, Jason Gunthorpe wrote:
Many of these got comments so I'm just going to drop the series. Please rebase and resend anything you think should go to -rc or -next that doesn't interact with the patches being revised.
Already in the works. I am going to send a v2 with a few patches dropped and a couple reworked. So yes please just drop this in favor of a v2.
Also let me know if the next '16B' series is independent of this, if yes I'll drop it to..
You can drop this for now as well. We will rework and resubmit a v2 soon-ish.
-Denny
linux-stable-mirror@lists.linaro.org