October 2019 - Linux-stable-mirror

[PATCH for-rc 3/4] IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA

by Dennis Dalessandro

From: Kaike Wan <kaike.wan(a)intel.com> For a TID RDMA WRITE request, a QP on the responder side could be put into a queue when a hardware flow is not available. A RNR NAK will be returned to the requester with a RNR timeout value based on the position of the QP in the queue. The tid_rdma_flow_wt variable is used to calculate the timeout value and is determined by using a MTU of 4096 at the module loading time. This could reduce the timeout value by half from the desired value, leading to excessive RNR retries. This patch fixes the issue by calculating the flow weight with the real MTU assigned to the QP. Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request") Cc: <stable(a)vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com> Signed-off-by: Kaike Wan <kaike.wan(a)intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com> --- drivers/infiniband/hw/hfi1/init.c | 1 - drivers/infiniband/hw/hfi1/tid_rdma.c | 13 +++++-------- drivers/infiniband/hw/hfi1/tid_rdma.h | 3 +-- 3 files changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index 71cb952..26b792b 100644 --- a/drivers/infiniband/hw/hfi1/init.c +++ b/drivers/infiniband/hw/hfi1/init.c @@ -1489,7 +1489,6 @@ static int __init hfi1_mod_init(void) goto bail_dev; } - hfi1_compute_tid_rdma_flow_wt(); /* * These must be called before the driver is registered with * the PCI subsystem. diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c index 73bc78b..e53f542 100644 --- a/drivers/infiniband/hw/hfi1/tid_rdma.c +++ b/drivers/infiniband/hw/hfi1/tid_rdma.c @@ -107,8 +107,6 @@ static u32 mask_generation(u32 a) * C - Capcode */ -static u32 tid_rdma_flow_wt; - static void tid_rdma_trigger_resume(struct work_struct *work); static void hfi1_kern_exp_rcv_free_flows(struct tid_rdma_request *req); static int hfi1_kern_exp_rcv_alloc_flows(struct tid_rdma_request *req, @@ -3388,18 +3386,17 @@ u32 hfi1_build_tid_rdma_write_req(struct rvt_qp *qp, struct rvt_swqe *wqe, return sizeof(ohdr->u.tid_rdma.w_req) / sizeof(u32); } -void hfi1_compute_tid_rdma_flow_wt(void) +static u32 hfi1_compute_tid_rdma_flow_wt(struct rvt_qp *qp) { /* * Heuristic for computing the RNR timeout when waiting on the flow * queue. Rather than a computationaly expensive exact estimate of when * a flow will be available, we assume that if a QP is at position N in * the flow queue it has to wait approximately (N + 1) * (number of - * segments between two sync points), assuming PMTU of 4K. The rationale - * for this is that flows are released and recycled at each sync point. + * segments between two sync points). The rationale for this is that + * flows are released and recycled at each sync point. */ - tid_rdma_flow_wt = MAX_TID_FLOW_PSN * enum_to_mtu(OPA_MTU_4096) / - TID_RDMA_MAX_SEGMENT_SIZE; + return (MAX_TID_FLOW_PSN * qp->pmtu) >> TID_RDMA_SEGMENT_SHIFT; } static u32 position_in_queue(struct hfi1_qp_priv *qpriv, @@ -3522,7 +3519,7 @@ static void hfi1_tid_write_alloc_resources(struct rvt_qp *qp, bool intr_ctx) if (qpriv->flow_state.index >= RXE_NUM_TID_FLOWS) { ret = hfi1_kern_setup_hw_flow(qpriv->rcd, qp); if (ret) { - to_seg = tid_rdma_flow_wt * + to_seg = hfi1_compute_tid_rdma_flow_wt(qp) * position_in_queue(qpriv, &rcd->flow_queue); break; diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.h b/drivers/infiniband/hw/hfi1/tid_rdma.h index 1c53618..6e82df2 100644 --- a/drivers/infiniband/hw/hfi1/tid_rdma.h +++ b/drivers/infiniband/hw/hfi1/tid_rdma.h @@ -17,6 +17,7 @@ #define TID_RDMA_MIN_SEGMENT_SIZE BIT(18) /* 256 KiB (for now) */ #define TID_RDMA_MAX_SEGMENT_SIZE BIT(18) /* 256 KiB (for now) */ #define TID_RDMA_MAX_PAGES (BIT(18) >> PAGE_SHIFT) +#define TID_RDMA_SEGMENT_SHIFT 18 /* * Bit definitions for priv->s_flags. @@ -274,8 +275,6 @@ u32 hfi1_build_tid_rdma_write_req(struct rvt_qp *qp, struct rvt_swqe *wqe, struct ib_other_headers *ohdr, u32 *bth1, u32 *bth2, u32 *len); -void hfi1_compute_tid_rdma_flow_wt(void); - void hfi1_rc_rcv_tid_rdma_write_req(struct hfi1_packet *packet); u32 hfi1_build_tid_rdma_write_resp(struct rvt_qp *qp, struct rvt_ack_entry *e,

6 years, 1 month

1
0
0 0

[PATCH for-rc 2/4] IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet

by Dennis Dalessandro

From: Kaike Wan <kaike.wan(a)intel.com> The index r_tid_ack is used to indicate the next TID RDMA WRITE request to acknowledge in the ring s_ack_queue[] on the responder side and should be set to a valid index other than its initial value before r_tid_tail is advanced to the next TID RDMA WRITE request and particularly before a TID RDMA ACK is built. Otherwise, a NULL pointer dereference may result: [1372301.686240] BUG: unable to handle kernel paging request at ffff9a32d27abff8 [1372301.686270] IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1] [1372301.686272] PGD 2749032067 PUD 0 [1372301.686273] Oops: 0000 1 SMP [1372301.686306] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me [1372301.686322] sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt] [1372301.686326] CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1 [1372301.686327] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 [1372301.686341] Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1] [1372301.686342] task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000 [1372301.686355] RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1] [1372301.686356] RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002 [1372301.686357] RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff [1372301.686358] RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300 [1372301.686359] RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4 [1372301.686359] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00 [1372301.686360] R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8 [1372301.686361] FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000 [1372301.686362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1372301.686363] CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0 [1372301.686363] Call Trace: [1372301.686378] [<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1] [1372301.686383] [<ffffffffaf0b2dff>] process_one_work+0x17f/0x440 [1372301.686385] [<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0 [1372301.686386] [<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0 [1372301.686389] [<ffffffffaf0bae31>] kthread+0xd1/0xe0 [1372301.686391] [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40 [1372301.686394] [<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21 [1372301.686396] [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40 [1372301.686398] hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1 [1372301.686414] Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72 [1372301.686426] RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1] [1372301.686427] RSP <ffff9a11776abd08> [1372301.686427] CR2: ffff9a32d27abff8 This problem can happen if a RESYNC request is received before r_tid_ack is modified. This patch fixes the issue by making sure that r_tid_ack is set to a valid value before a TID RDMA ACK is built. Functions are defined to simplify the code. Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request") Fixes: 7cf0ad679de4 ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet") Cc: <stable(a)vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com> Signed-off-by: Kaike Wan <kaike.wan(a)intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com> --- drivers/infiniband/hw/hfi1/tid_rdma.c | 44 ++++++++++++++++++++------------- 1 file changed, 27 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c index f21fca3..73bc78b 100644 --- a/drivers/infiniband/hw/hfi1/tid_rdma.c +++ b/drivers/infiniband/hw/hfi1/tid_rdma.c @@ -136,6 +136,26 @@ static void update_r_next_psn_fecn(struct hfi1_packet *packet, struct tid_rdma_flow *flow, bool fecn); +static void validate_r_tid_ack(struct hfi1_qp_priv *priv) +{ + if (priv->r_tid_ack == HFI1_QP_WQE_INVALID) + priv->r_tid_ack = priv->r_tid_tail; +} + +static void tid_rdma_schedule_ack(struct rvt_qp *qp) +{ + struct hfi1_qp_priv *priv = qp->priv; + + priv->s_flags |= RVT_S_ACK_PENDING; + hfi1_schedule_tid_send(qp); +} + +static void tid_rdma_trigger_ack(struct rvt_qp *qp) +{ + validate_r_tid_ack(qp->priv); + tid_rdma_schedule_ack(qp); +} + static u64 tid_rdma_opfn_encode(struct tid_rdma_params *p) { return @@ -3005,10 +3025,7 @@ bool hfi1_handle_kdeth_eflags(struct hfi1_ctxtdata *rcd, qpriv->s_nak_state = IB_NAK_PSN_ERROR; /* We are NAK'ing the next expected PSN */ qpriv->s_nak_psn = mask_psn(flow->flow_state.r_next_psn); - qpriv->s_flags |= RVT_S_ACK_PENDING; - if (qpriv->r_tid_ack == HFI1_QP_WQE_INVALID) - qpriv->r_tid_ack = qpriv->r_tid_tail; - hfi1_schedule_tid_send(qp); + tid_rdma_trigger_ack(qp); } goto unlock; } @@ -3526,7 +3543,7 @@ static void hfi1_tid_write_alloc_resources(struct rvt_qp *qp, bool intr_ctx) /* * If overtaking req->acked_tail, send an RNR NAK. Because the * QP is not queued in this case, and the issue can only be - * caused due a delay in scheduling the second leg which we + * caused by a delay in scheduling the second leg which we * cannot estimate, we use a rather arbitrary RNR timeout of * (MAX_FLOWS / 2) segments */ @@ -3534,8 +3551,7 @@ static void hfi1_tid_write_alloc_resources(struct rvt_qp *qp, bool intr_ctx) MAX_FLOWS)) { ret = -EAGAIN; to_seg = MAX_FLOWS >> 1; - qpriv->s_flags |= RVT_S_ACK_PENDING; - hfi1_schedule_tid_send(qp); + tid_rdma_trigger_ack(qp); break; } @@ -4335,8 +4351,7 @@ void hfi1_rc_rcv_tid_rdma_write_data(struct hfi1_packet *packet) trace_hfi1_tid_req_rcv_write_data(qp, 0, e->opcode, e->psn, e->lpsn, req); trace_hfi1_tid_write_rsp_rcv_data(qp); - if (priv->r_tid_ack == HFI1_QP_WQE_INVALID) - priv->r_tid_ack = priv->r_tid_tail; + validate_r_tid_ack(priv); if (opcode == TID_OP(WRITE_DATA_LAST)) { release_rdma_sge_mr(e); @@ -4375,8 +4390,7 @@ void hfi1_rc_rcv_tid_rdma_write_data(struct hfi1_packet *packet) } done: - priv->s_flags |= RVT_S_ACK_PENDING; - hfi1_schedule_tid_send(qp); + tid_rdma_schedule_ack(qp); exit: priv->r_next_psn_kdeth = flow->flow_state.r_next_psn; if (fecn) @@ -4388,10 +4402,7 @@ void hfi1_rc_rcv_tid_rdma_write_data(struct hfi1_packet *packet) if (!priv->s_nak_state) { priv->s_nak_state = IB_NAK_PSN_ERROR; priv->s_nak_psn = flow->flow_state.r_next_psn; - priv->s_flags |= RVT_S_ACK_PENDING; - if (priv->r_tid_ack == HFI1_QP_WQE_INVALID) - priv->r_tid_ack = priv->r_tid_tail; - hfi1_schedule_tid_send(qp); + tid_rdma_trigger_ack(qp); } goto done; } @@ -4939,8 +4950,7 @@ void hfi1_rc_rcv_tid_rdma_resync(struct hfi1_packet *packet) qpriv->resync = true; /* RESYNC request always gets a TID RDMA ACK. */ qpriv->s_nak_state = 0; - qpriv->s_flags |= RVT_S_ACK_PENDING; - hfi1_schedule_tid_send(qp); + tid_rdma_trigger_ack(qp); bail: if (fecn) qp->s_flags |= RVT_S_ECN;

6 years, 1 month

1
0
0 0

[PATCH] drm/i915: Fix PCH reference clock for FDI on HSW/BDW

by Ville Syrjala

From: Ville Syrjälä <ville.syrjala(a)linux.intel.com> The change to skip the PCH reference initialization during fastboot did end up breaking FDI. To fix that let's try to do the PCH reference init whenever we're disabling a DPLL that was using said reference previously. Cc: stable(a)vger.kernel.org Tested-by: Andrija <akijo97(a)gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112084 Fixes: b16c7ed95caf ("drm/i915: Do not touch the PCH SSC reference if a PLL is using it") Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> --- drivers/gpu/drm/i915/display/intel_display.c | 11 ++++++----- drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 15 +++++++++++++++ drivers/gpu/drm/i915/i915_drv.h | 2 ++ 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 236fdf122e47..da76f794a965 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -9359,7 +9359,6 @@ static bool wrpll_uses_pch_ssc(struct drm_i915_private *dev_priv, static void lpt_init_pch_refclk(struct drm_i915_private *dev_priv) { struct intel_encoder *encoder; - bool pch_ssc_in_use = false; bool has_fdi = false; for_each_intel_encoder(&dev_priv->drm, encoder) { @@ -9387,22 +9386,24 @@ static void lpt_init_pch_refclk(struct drm_i915_private *dev_priv) * clock hierarchy. That would also allow us to do * clock bending finally. */ + dev_priv->pch_ssc_use = 0; + if (spll_uses_pch_ssc(dev_priv)) { DRM_DEBUG_KMS("SPLL using PCH SSC\n"); - pch_ssc_in_use = true; + dev_priv->pch_ssc_use |= BIT(DPLL_ID_SPLL); } if (wrpll_uses_pch_ssc(dev_priv, DPLL_ID_WRPLL1)) { DRM_DEBUG_KMS("WRPLL1 using PCH SSC\n"); - pch_ssc_in_use = true; + dev_priv->pch_ssc_use |= BIT(DPLL_ID_WRPLL1); } if (wrpll_uses_pch_ssc(dev_priv, DPLL_ID_WRPLL2)) { DRM_DEBUG_KMS("WRPLL2 using PCH SSC\n"); - pch_ssc_in_use = true; + dev_priv->pch_ssc_use |= BIT(DPLL_ID_WRPLL2); } - if (pch_ssc_in_use) + if (dev_priv->pch_ssc_use) return; if (has_fdi) { diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c index ec10fa7d3c69..3ce0a023eee0 100644 --- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c +++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c @@ -526,16 +526,31 @@ static void hsw_ddi_wrpll_disable(struct drm_i915_private *dev_priv, val = I915_READ(WRPLL_CTL(id)); I915_WRITE(WRPLL_CTL(id), val & ~WRPLL_PLL_ENABLE); POSTING_READ(WRPLL_CTL(id)); + + /* + * Try to set up the PCH reference clock once all DPLLs + * that depend on it have been shut down. + */ + if (dev_priv->pch_ssc_use & BIT(id)) + intel_init_pch_refclk(dev_priv); } static void hsw_ddi_spll_disable(struct drm_i915_private *dev_priv, struct intel_shared_dpll *pll) { + enum intel_dpll_id id = pll->info->id; u32 val; val = I915_READ(SPLL_CTL); I915_WRITE(SPLL_CTL, val & ~SPLL_PLL_ENABLE); POSTING_READ(SPLL_CTL); + + /* + * Try to set up the PCH reference clock once all DPLLs + * that depend on it have been shut down. + */ + if (dev_priv->pch_ssc_use & BIT(id)) + intel_init_pch_refclk(dev_priv); } static bool hsw_ddi_wrpll_get_hw_state(struct drm_i915_private *dev_priv, diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8882c0908c3b..5332825e0ce4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1348,6 +1348,8 @@ struct drm_i915_private { } contexts; } gem; + u8 pch_ssc_use; + /* For i915gm/i945gm vblank irq workaround */ u8 vblank_enabled; -- 2.21.0

6 years, 1 month

3
2
0 0

[PATCH v2 11/11] zfcp: trace channel log even for FCP command responses

by Benjamin Block

From: Steffen Maier <maier(a)linux.ibm.com> While v2.6.26 commit b75db73159cc ("[SCSI] zfcp: Add qtcb dump to hba debug trace") is right that we don't want to flood the (payload) trace ring buffer, we don't trace successful FCP command responses by default. So we can include the channel log for problem determination with failed responses of any FSF request type. Signed-off-by: Steffen Maier <maier(a)linux.ibm.com> Reviewed-by: Benjamin Block <bblock(a)linux.ibm.com> Fixes: b75db73159cc ("[SCSI] zfcp: Add qtcb dump to hba debug trace") Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <stable(a)vger.kernel.org> #2.6.38+ Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com> --- drivers/s390/scsi/zfcp_dbf.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index dccdb41bed8c..1234294700c4 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -95,11 +95,9 @@ void zfcp_dbf_hba_fsf_res(char *tag, int level, struct zfcp_fsf_req *req) memcpy(rec->u.res.fsf_status_qual, &q_head->fsf_status_qual, FSF_STATUS_QUALIFIER_SIZE); - if (q_head->fsf_command != FSF_QTCB_FCP_CMND) { - rec->pl_len = q_head->log_length; - zfcp_dbf_pl_write(dbf, (char *)q_pref + q_head->log_start, - rec->pl_len, "fsf_res", req->req_id); - } + rec->pl_len = q_head->log_length; + zfcp_dbf_pl_write(dbf, (char *)q_pref + q_head->log_start, + rec->pl_len, "fsf_res", req->req_id); debug_event(dbf->hba, level, rec, sizeof(*rec)); spin_unlock_irqrestore(&dbf->hba_lock, flags); -- 2.21.0

6 years, 1 month

1
0
0 0

[PATCH AUTOSEL 5.3 01/33] net: ipv6: fix listify ip6_rcv_finish in case of forwarding

by Sasha Levin

From: Xin Long <lucien.xin(a)gmail.com> [ Upstream commit c7a42eb49212f93a800560662d17d5293960d3c3 ] We need a similar fix for ipv6 as Commit 0761680d5215 ("net: ipv4: fix listify ip_rcv_finish in case of forwarding") does for ipv4. This issue can be reprocuded by syzbot since Commit 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs") on net-next. The call trace was: kernel BUG at include/linux/skbuff.h:2225! RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline] RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902 Call Trace: sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202 sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385 sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80 sctp_rcv+0x2807/0x3590 net/sctp/input.c:256 sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049 ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447 dst_input include/net/dst.h:442 [inline] ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84 ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline] ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282 ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316 __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline] __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097 __netif_receive_skb_list net/core/dev.c:5149 [inline] netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244 gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757 gro_normal_list net/core/dev.c:5755 [inline] gro_normal_one net/core/dev.c:5769 [inline] napi_frags_finish net/core/dev.c:5782 [inline] napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855 tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020 Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()") Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs") Reported-by: syzbot+eb349eeee854e389c36d(a)syzkaller.appspotmail.com Reported-by: syzbot+4a0643a653ac375612d1(a)syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin(a)gmail.com> Acked-by: Edward Cree <ecree(a)solarflare.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- net/ipv6/ip6_input.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index a593aaf257483..2bb0b66181a74 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -80,8 +80,10 @@ static void ip6_sublist_rcv_finish(struct list_head *head) { struct sk_buff *skb, *next; - list_for_each_entry_safe(skb, next, head, list) + list_for_each_entry_safe(skb, next, head, list) { + skb_list_del_init(skb); dst_input(skb); + } } static void ip6_list_rcv_finish(struct net *net, struct sock *sk, -- 2.20.1

6 years, 1 month

1
33
0 0

[v4 PATCH] mm: thp: handle page cache THP correctly in PageTransCompoundMap

by Yang Shi

We have usecase to use tmpfs as QEMU memory backend and we would like to take the advantage of THP as well. But, our test shows the EPT is not PMD mapped even though the underlying THP are PMD mapped on host. The number showed by /sys/kernel/debug/kvm/largepage is much less than the number of PMD mapped shmem pages as the below: 7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 579584 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 12 And some benchmarks do worse than with anonymous THPs. By digging into the code we figured out that commit 127393fbe597 ("mm: thp: kvm: fix memory corruption in KVM with THP enabled") checks if there is a single PTE mapping on the page for anonymous THP when setting up EPT map. But, the _mapcount < 0 check doesn't fit to page cache THP since every subpage of page cache THP would get _mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns false for page cache THP. This would prevent KVM from setting up PMD mapped EPT entry. So we need handle page cache THP correctly. However, when page cache THP's PMD gets split, kernel just remove the map instead of setting up PTE map like what anonymous THP does. Before KVM calls get_user_pages() the subpages may get PTE mapped even though it is still a THP since the page cache THP may be mapped by other processes at the mean time. Checking its _mapcount and whether the THP has PTE mapped or not. Although this may report some false negative cases (PTE mapped by other processes), it looks not trivial to make this accurate. With this fix /sys/kernel/debug/kvm/largepage would show reasonable pages are PMD mapped by EPT as the below: 7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 557056 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 271 And the benchmarks are as same as anonymous THPs. Fixes: dd78fedde4b9 ("rmap: support file thp") Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com> Reported-by: Gang Deng <gavin.dg(a)linux.alibaba.com> Tested-by: Gang Deng <gavin.dg(a)linux.alibaba.com> Suggested-by: Hugh Dickins <hughd(a)google.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: <stable(a)vger.kernel.org> 4.8+ --- include/linux/mm.h | 5 ----- include/linux/mm_types.h | 5 +++++ include/linux/page-flags.h | 20 ++++++++++++++++++-- 3 files changed, 23 insertions(+), 7 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index cc29227..a2adf95 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -695,11 +695,6 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags) extern void kvfree(const void *addr); -static inline atomic_t *compound_mapcount_ptr(struct page *page) -{ - return &page[1].compound_mapcount; -} - static inline int compound_mapcount(struct page *page) { VM_BUG_ON_PAGE(!PageCompound(page), page); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 2222fa7..270aa8f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -221,6 +221,11 @@ struct page { #endif } _struct_page_alignment; +static inline atomic_t *compound_mapcount_ptr(struct page *page) +{ + return &page[1].compound_mapcount; +} + /* * Used for sizing the vmemmap region on some architectures */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f91cb88..1bf83c8 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -622,12 +622,28 @@ static inline int PageTransCompound(struct page *page) * * Unlike PageTransCompound, this is safe to be called only while * split_huge_pmd() cannot run from under us, like if protected by the - * MMU notifier, otherwise it may result in page->_mapcount < 0 false + * MMU notifier, otherwise it may result in page->_mapcount check false * positives. + * + * We have to treat page cache THP differently since every subpage of it + * would get _mapcount inc'ed once it is PMD mapped. But, it may be PTE + * mapped in the current process so comparing subpage's _mapcount to + * compound_mapcount to filter out PTE mapped case. */ static inline int PageTransCompoundMap(struct page *page) { - return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0; + struct page *head; + + if (!PageTransCompound(page)) + return 0; + + if (PageAnon(page)) + return atomic_read(&page->_mapcount) < 0; + + head = compound_head(page); + /* File THP is PMD mapped and not PTE mapped */ + return atomic_read(&page->_mapcount) == + atomic_read(compound_mapcount_ptr(head)); } /* -- 1.8.3.1

6 years, 1 month

3
5
0 0

[PATCH AUTOSEL 4.9 01/20] PCI/ASPM: Do not initialize link state when aspm_disabled is set

by Sasha Levin

From: Patrick Talbert <ptalbert(a)redhat.com> [ Upstream commit 17c91487364fb33797ed84022564ee7544ac4945 ] Now that ASPM is configured for *all* PCIe devices at boot, a problem is seen with systems that set the FADT NO_ASPM bit. This bit indicates that the OS should not alter the ASPM state, but when pcie_aspm_init_link_state() runs it only checks for !aspm_support_enabled. This misses the ACPI_FADT_NO_ASPM case because that is setting aspm_disabled. The result is systems may hang at boot after 1302fcf; avoidable if they boot with pcie_aspm=off (sets !aspm_support_enabled). Fix this by having aspm_init_link_state() check for either !aspm_support_enabled or acpm_disabled. Link: https://bugzilla.kernel.org/show_bug.cgi?id=201001 Fixes: 1302fcf0d03e ("PCI: Configure *all* devices, not just hot-added ones") Signed-off-by: Patrick Talbert <ptalbert(a)redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/pci/pcie/aspm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index b12fe65d07f82..f69a9180c5ffd 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -581,7 +581,7 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev) struct pcie_link_state *link; int blacklist = !!pcie_aspm_sanity_check(pdev); - if (!aspm_support_enabled) + if (!aspm_support_enabled || aspm_disabled) return; if (pdev->link_state) -- 2.20.1

6 years, 1 month

2
21
0 0

[PATCH] hwrng: omap - Fix RNG wait loop timeout

by Sumit Garg

Existing RNG data read timeout is 200us but it doesn't cover EIP76 RNG data rate which takes approx. 700us to produce 16 bytes of output data as per testing results. So configure the timeout as 1000us to also take account of lack of udelay()'s reliability. Fixes: 383212425c92 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K") Cc: <stable(a)vger.kernel.org> Signed-off-by: Sumit Garg <sumit.garg(a)linaro.org> --- drivers/char/hw_random/omap-rng.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c index b27f396..e329f82 100644 --- a/drivers/char/hw_random/omap-rng.c +++ b/drivers/char/hw_random/omap-rng.c @@ -66,6 +66,13 @@ #define OMAP4_RNG_OUTPUT_SIZE 0x8 #define EIP76_RNG_OUTPUT_SIZE 0x10 +/* + * EIP76 RNG takes approx. 700us to produce 16 bytes of output data + * as per testing results. And to account for the lack of udelay()'s + * reliability, we keep the timeout as 1000us. + */ +#define RNG_DATA_FILL_TIMEOUT 100 + enum { RNG_OUTPUT_0_REG = 0, RNG_OUTPUT_1_REG, @@ -176,7 +183,7 @@ static int omap_rng_do_read(struct hwrng *rng, void *data, size_t max, if (max < priv->pdata->data_size) return 0; - for (i = 0; i < 20; i++) { + for (i = 0; i < RNG_DATA_FILL_TIMEOUT; i++) { present = priv->pdata->data_present(priv); if (present || !wait) break; -- 2.7.4

6 years, 1 month

3
3
0 0

[PATCH 3/3] usb: xhci: fix __le32/__le64 accessors in debugfs code

by Mathias Nyman

From: "Ben Dooks (Codethink)" <ben.dooks(a)codethink.co.uk> It looks like some of the xhci debug code is passing u32 to functions directly from __le32/__le64 fields. Fix this by using le{32,64}_to_cpu() on these to fix the following sparse warnings; xhci-debugfs.c:205:62: warning: incorrect type in argument 1 (different base types) xhci-debugfs.c:205:62: expected unsigned int [usertype] field0 xhci-debugfs.c:205:62: got restricted __le32 xhci-debugfs.c:206:62: warning: incorrect type in argument 2 (different base types) xhci-debugfs.c:206:62: expected unsigned int [usertype] field1 xhci-debugfs.c:206:62: got restricted __le32 ... [Trim down commit message, sparse warnings were similar -Mathias] Cc: <stable(a)vger.kernel.org> # 4.15+ Signed-off-by: Ben Dooks <ben.dooks(a)codethink.co.uk> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-debugfs.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c index 7ba6afc7ef23..76c3f29562d2 100644 --- a/drivers/usb/host/xhci-debugfs.c +++ b/drivers/usb/host/xhci-debugfs.c @@ -202,10 +202,10 @@ static void xhci_ring_dump_segment(struct seq_file *s, trb = &seg->trbs[i]; dma = seg->dma + i * sizeof(*trb); seq_printf(s, "%pad: %s\n", &dma, - xhci_decode_trb(trb->generic.field[0], - trb->generic.field[1], - trb->generic.field[2], - trb->generic.field[3])); + xhci_decode_trb(le32_to_cpu(trb->generic.field[0]), + le32_to_cpu(trb->generic.field[1]), + le32_to_cpu(trb->generic.field[2]), + le32_to_cpu(trb->generic.field[3]))); } } @@ -263,10 +263,10 @@ static int xhci_slot_context_show(struct seq_file *s, void *unused) xhci = hcd_to_xhci(bus_to_hcd(dev->udev->bus)); slot_ctx = xhci_get_slot_ctx(xhci, dev->out_ctx); seq_printf(s, "%pad: %s\n", &dev->out_ctx->dma, - xhci_decode_slot_context(slot_ctx->dev_info, - slot_ctx->dev_info2, - slot_ctx->tt_info, - slot_ctx->dev_state)); + xhci_decode_slot_context(le32_to_cpu(slot_ctx->dev_info), + le32_to_cpu(slot_ctx->dev_info2), + le32_to_cpu(slot_ctx->tt_info), + le32_to_cpu(slot_ctx->dev_state))); return 0; } @@ -286,10 +286,10 @@ static int xhci_endpoint_context_show(struct seq_file *s, void *unused) ep_ctx = xhci_get_ep_ctx(xhci, dev->out_ctx, dci); dma = dev->out_ctx->dma + dci * CTX_SIZE(xhci->hcc_params); seq_printf(s, "%pad: %s\n", &dma, - xhci_decode_ep_context(ep_ctx->ep_info, - ep_ctx->ep_info2, - ep_ctx->deq, - ep_ctx->tx_info)); + xhci_decode_ep_context(le32_to_cpu(ep_ctx->ep_info), + le32_to_cpu(ep_ctx->ep_info2), + le64_to_cpu(ep_ctx->deq), + le32_to_cpu(ep_ctx->tx_info))); } return 0; -- 2.7.4

6 years, 1 month

1
0
0 0

[PATCH 2/3] usb: xhci: fix Immediate Data Transfer endianness

by Mathias Nyman

From: Samuel Holland <samuel(a)sholland.org> The arguments to queue_trb are always byteswapped to LE for placement in the ring, but this should not happen in the case of immediate data; the bytes copied out of transfer_buffer are already in the correct order. Add a complementary byteswap so the bytes end up in the ring correctly. This was observed on BE ppc64 with a "Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller [104c:8241]" as a ch341 usb-serial adapter ("1a86:7523 QinHeng Electronics HL-340 USB-Serial adapter") always transmitting the same character (generally NUL) over the serial link regardless of the key pressed. Cc: <stable(a)vger.kernel.org> # 5.2+ Fixes: 33e39350ebd2 ("usb: xhci: add Immediate Data Transfer support") Signed-off-by: Samuel Holland <samuel(a)sholland.org> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-ring.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 85ceb43e3405..e7aab31fd9a5 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -3330,6 +3330,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags, if (xhci_urb_suitable_for_idt(urb)) { memcpy(&send_addr, urb->transfer_buffer, trb_buff_len); + le64_to_cpus(&send_addr); field |= TRB_IDT; } } @@ -3475,6 +3476,7 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags, if (xhci_urb_suitable_for_idt(urb)) { memcpy(&addr, urb->transfer_buffer, urb->transfer_buffer_length); + le64_to_cpus(&addr); field |= TRB_IDT; } else { addr = (u64) urb->transfer_dma; -- 2.7.4

6 years, 1 month

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror October 2019