Hi Doug and Jason,
Here are some patches to go to for-next. One is a code cleanup. The rest are bug fixes that are probably not serious enough for an -rc6. The one that may be on the fence is the isert patch. Since it only affects debug kernels it can probably even wait till for-next. It has been marked stable though.
---
Alex Estrin (1): IB/isert: Fix for lib/dma_debug check_sync warning
Kamenee Arumugam (1): IB/Hfi1: Mask Unsupported Request error bit in PCIe AER
Michael J. Ruhl (1): IB/hfi1: Set port number for errorinfo MAD response
Mike Marciniszyn (2): IB/hfi1: Cleanup of exp_rcv IB/{rdmavt,hfi1}; Change hrtimer add to use the pinned variation
drivers/infiniband/hw/hfi1/exp_rcv.c | 39 +++++++++++++++++++------------ drivers/infiniband/hw/hfi1/exp_rcv.h | 24 ++++++++++++++++++- drivers/infiniband/hw/hfi1/hfi.h | 14 ++++++----- drivers/infiniband/hw/hfi1/init.c | 4 +-- drivers/infiniband/hw/hfi1/mad.c | 1 + drivers/infiniband/hw/hfi1/pcie.c | 15 ++++++++++++ drivers/infiniband/hw/hfi1/rc.c | 2 +- drivers/infiniband/sw/rdmavt/qp.c | 2 +- drivers/infiniband/ulp/isert/ib_isert.c | 26 ++++++++++++++------- 9 files changed, 91 insertions(+), 36 deletions(-)
-- -Denny
From: Alex Estrin alex.estrin@intel.com
The following error message occurs on a target host in a debug build during session login:
[ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0 [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes] ......snip .....
[ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015 [ 3524.555740] Call Trace: [ 3524.559102] [<ffffffffa5fe915b>] dump_stack+0x19/0x1b [ 3524.565477] [<ffffffffa58a2f58>] __warn+0xd8/0x100 [ 3524.571557] [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80 [ 3524.578610] [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0 [ 3524.585177] [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0 [ 3524.592812] [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90 [ 3524.601029] [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20 [ 3524.608574] [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80 [ 3524.616699] [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140 [ 3524.623567] [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert] [ 3524.633060] [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert] [ 3524.641383] [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430 [ 3524.648561] [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod] [ 3524.658557] [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod] [ 3524.668084] [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60 [ 3524.674420] [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod] [ 3524.684656] [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod] [ 3524.694901] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.705446] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.715976] [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod] [ 3524.725739] [<ffffffffa58d60ff>] kthread+0xef/0x100 [ 3524.732007] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.739540] [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21 [ 3524.747558] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]--- [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it with dma-mapped address. login_tx_desc is a part of iser_conn structure and is used only once during login negotiation, so the issue is fixed by eliminating dma_sync call for this buffer using a special case routine.
Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Don Dutile ddutile@redhat.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com --- drivers/infiniband/ulp/isert/ib_isert.c | 26 +++++++++++++++++--------- 1 files changed, 17 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fff40b0..6a55b87 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -886,15 +886,9 @@ }
static void -isert_create_send_desc(struct isert_conn *isert_conn, - struct isert_cmd *isert_cmd, - struct iser_tx_desc *tx_desc) +__isert_create_send_desc(struct isert_device *device, + struct iser_tx_desc *tx_desc) { - struct isert_device *device = isert_conn->device; - struct ib_device *ib_dev = device->ib_device; - - ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr, - ISER_HEADERS_LEN, DMA_TO_DEVICE);
memset(&tx_desc->iser_header, 0, sizeof(struct iser_ctrl)); tx_desc->iser_header.flags = ISCSI_CTRL; @@ -907,6 +901,20 @@ } }
+static void +isert_create_send_desc(struct isert_conn *isert_conn, + struct isert_cmd *isert_cmd, + struct iser_tx_desc *tx_desc) +{ + struct isert_device *device = isert_conn->device; + struct ib_device *ib_dev = device->ib_device; + + ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr, + ISER_HEADERS_LEN, DMA_TO_DEVICE); + + __isert_create_send_desc(device, tx_desc); +} + static int isert_init_tx_hdrs(struct isert_conn *isert_conn, struct iser_tx_desc *tx_desc) @@ -994,7 +1002,7 @@ struct iser_tx_desc *tx_desc = &isert_conn->login_tx_desc; int ret;
- isert_create_send_desc(isert_conn, NULL, tx_desc); + __isert_create_send_desc(device, tx_desc);
memcpy(&tx_desc->iscsi_header, &login->rsp[0], sizeof(struct iscsi_hdr));
On Tue, May 15, 2018 at 06:31:39PM -0700, Dennis Dalessandro wrote:
From: Alex Estrin alex.estrin@intel.com
The following error message occurs on a target host in a debug build during session login:
[ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0 [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes] ......snip .....
[ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015 [ 3524.555740] Call Trace: [ 3524.559102] [<ffffffffa5fe915b>] dump_stack+0x19/0x1b [ 3524.565477] [<ffffffffa58a2f58>] __warn+0xd8/0x100 [ 3524.571557] [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80 [ 3524.578610] [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0 [ 3524.585177] [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0 [ 3524.592812] [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90 [ 3524.601029] [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20 [ 3524.608574] [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80 [ 3524.616699] [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140 [ 3524.623567] [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert] [ 3524.633060] [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert] [ 3524.641383] [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430 [ 3524.648561] [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod] [ 3524.658557] [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod] [ 3524.668084] [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60 [ 3524.674420] [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod] [ 3524.684656] [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod] [ 3524.694901] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.705446] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.715976] [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod] [ 3524.725739] [<ffffffffa58d60ff>] kthread+0xef/0x100 [ 3524.732007] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.739540] [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21 [ 3524.747558] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]--- [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it with dma-mapped address. login_tx_desc is a part of iser_conn structure and is used only once during login negotiation, so the issue is fixed by eliminating dma_sync call for this buffer using a special case routine.
Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Don Dutile ddutile@redhat.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com
drivers/infiniband/ulp/isert/ib_isert.c | 26 +++++++++++++++++--------- 1 files changed, 17 insertions(+), 9 deletions(-)
This sounds like -rc material? Why is it for-next?
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fff40b0..6a55b87 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -886,15 +886,9 @@ } static void -isert_create_send_desc(struct isert_conn *isert_conn,
struct isert_cmd *isert_cmd,
struct iser_tx_desc *tx_desc)
+__isert_create_send_desc(struct isert_device *device,
struct iser_tx_desc *tx_desc)
{
- struct isert_device *device = isert_conn->device;
- struct ib_device *ib_dev = device->ib_device;
- ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr,
ISER_HEADERS_LEN, DMA_TO_DEVICE);
memset(&tx_desc->iser_header, 0, sizeof(struct iser_ctrl)); tx_desc->iser_header.flags = ISCSI_CTRL; @@ -907,6 +901,20 @@ } } +static void +isert_create_send_desc(struct isert_conn *isert_conn,
struct isert_cmd *isert_cmd,
struct iser_tx_desc *tx_desc)
+{
- struct isert_device *device = isert_conn->device;
- struct ib_device *ib_dev = device->ib_device;
- ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr,
ISER_HEADERS_LEN, DMA_TO_DEVICE);
- __isert_create_send_desc(device, tx_desc);
+}
static int isert_init_tx_hdrs(struct isert_conn *isert_conn, struct iser_tx_desc *tx_desc) @@ -994,7 +1002,7 @@ struct iser_tx_desc *tx_desc = &isert_conn->login_tx_desc; int ret;
- isert_create_send_desc(isert_conn, NULL, tx_desc);
- __isert_create_send_desc(device, tx_desc);
memcpy(&tx_desc->iscsi_header, &login->rsp[0], sizeof(struct iscsi_hdr));
On 05/16/2018 04:04 PM, Jason Gunthorpe wrote:
On Tue, May 15, 2018 at 06:31:39PM -0700, Dennis Dalessandro wrote:
From: Alex Estrin alex.estrin@intel.com
The following error message occurs on a target host in a debug build during session login:
[ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0 [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes] ......snip .....
[ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015 [ 3524.555740] Call Trace: [ 3524.559102] [<ffffffffa5fe915b>] dump_stack+0x19/0x1b [ 3524.565477] [<ffffffffa58a2f58>] __warn+0xd8/0x100 [ 3524.571557] [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80 [ 3524.578610] [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0 [ 3524.585177] [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0 [ 3524.592812] [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90 [ 3524.601029] [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20 [ 3524.608574] [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80 [ 3524.616699] [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140 [ 3524.623567] [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert] [ 3524.633060] [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert] [ 3524.641383] [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430 [ 3524.648561] [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod] [ 3524.658557] [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod] [ 3524.668084] [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60 [ 3524.674420] [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod] [ 3524.684656] [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod] [ 3524.694901] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.705446] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.715976] [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod] [ 3524.725739] [<ffffffffa58d60ff>] kthread+0xef/0x100 [ 3524.732007] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.739540] [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21 [ 3524.747558] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]--- [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it with dma-mapped address. login_tx_desc is a part of iser_conn structure and is used only once during login negotiation, so the issue is fixed by eliminating dma_sync call for this buffer using a special case routine.
Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Don Dutile ddutile@redhat.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com
drivers/infiniband/ulp/isert/ib_isert.c | 26 +++++++++++++++++--------- 1 files changed, 17 insertions(+), 9 deletions(-)
This sounds like -rc material? Why is it for-next?
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fff40b0..6a55b87 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -886,15 +886,9 @@ } static void -isert_create_send_desc(struct isert_conn *isert_conn,
struct isert_cmd *isert_cmd,
struct iser_tx_desc *tx_desc)
+__isert_create_send_desc(struct isert_device *device,
{struct iser_tx_desc *tx_desc)
- struct isert_device *device = isert_conn->device;
- struct ib_device *ib_dev = device->ib_device;
- ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr,
ISER_HEADERS_LEN, DMA_TO_DEVICE);
memset(&tx_desc->iser_header, 0, sizeof(struct iser_ctrl)); tx_desc->iser_header.flags = ISCSI_CTRL; @@ -907,6 +901,20 @@ } } +static void +isert_create_send_desc(struct isert_conn *isert_conn,
struct isert_cmd *isert_cmd,
struct iser_tx_desc *tx_desc)
+{
- struct isert_device *device = isert_conn->device;
- struct ib_device *ib_dev = device->ib_device;
- ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr,
ISER_HEADERS_LEN, DMA_TO_DEVICE);
- __isert_create_send_desc(device, tx_desc);
+}
- static int isert_init_tx_hdrs(struct isert_conn *isert_conn, struct iser_tx_desc *tx_desc)
@@ -994,7 +1002,7 @@ struct iser_tx_desc *tx_desc = &isert_conn->login_tx_desc; int ret;
- isert_create_send_desc(isert_conn, NULL, tx_desc);
- __isert_create_send_desc(device, tx_desc);
memcpy(&tx_desc->iscsi_header, &login->rsp[0], sizeof(struct iscsi_hdr));
It is an unnecessary synch at login, harmless if done, but causes the lib/dma-debug dma-synch checker to complain, which only occurs in the debug kernel.
e.g., RHEL is including it in it's next release, but not backporting to existing release (not doing a z-stream update to 7.5 release which took the multiple code chunks in to see this warning pop). Move along, nothing to see here... ;-)
-dd
On 5/16/2018 4:04 PM, Jason Gunthorpe wrote:
On Tue, May 15, 2018 at 06:31:39PM -0700, Dennis Dalessandro wrote:
From: Alex Estrin alex.estrin@intel.com
The following error message occurs on a target host in a debug build during session login:
[ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0 [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes] ......snip .....
[ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015 [ 3524.555740] Call Trace: [ 3524.559102] [<ffffffffa5fe915b>] dump_stack+0x19/0x1b [ 3524.565477] [<ffffffffa58a2f58>] __warn+0xd8/0x100 [ 3524.571557] [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80 [ 3524.578610] [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0 [ 3524.585177] [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0 [ 3524.592812] [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90 [ 3524.601029] [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20 [ 3524.608574] [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80 [ 3524.616699] [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140 [ 3524.623567] [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert] [ 3524.633060] [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert] [ 3524.641383] [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430 [ 3524.648561] [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod] [ 3524.658557] [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod] [ 3524.668084] [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60 [ 3524.674420] [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod] [ 3524.684656] [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod] [ 3524.694901] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.705446] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.715976] [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod] [ 3524.725739] [<ffffffffa58d60ff>] kthread+0xef/0x100 [ 3524.732007] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.739540] [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21 [ 3524.747558] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]--- [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it with dma-mapped address. login_tx_desc is a part of iser_conn structure and is used only once during login negotiation, so the issue is fixed by eliminating dma_sync call for this buffer using a special case routine.
Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Don Dutile ddutile@redhat.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com
drivers/infiniband/ulp/isert/ib_isert.c | 26 +++++++++++++++++--------- 1 files changed, 17 insertions(+), 9 deletions(-)
This sounds like -rc material? Why is it for-next?
Should something that only occurs in a debug build be considered -rc material?
-Denny
On Thu, May 17, 2018 at 11:03:49AM -0400, Dennis Dalessandro wrote:
On 5/16/2018 4:04 PM, Jason Gunthorpe wrote:
On Tue, May 15, 2018 at 06:31:39PM -0700, Dennis Dalessandro wrote:
From: Alex Estrin alex.estrin@intel.com
The following error message occurs on a target host in a debug build during session login:
[ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0 [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes] ......snip .....
[ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015 [ 3524.555740] Call Trace: [ 3524.559102] [<ffffffffa5fe915b>] dump_stack+0x19/0x1b [ 3524.565477] [<ffffffffa58a2f58>] __warn+0xd8/0x100 [ 3524.571557] [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80 [ 3524.578610] [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0 [ 3524.585177] [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0 [ 3524.592812] [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90 [ 3524.601029] [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20 [ 3524.608574] [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80 [ 3524.616699] [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140 [ 3524.623567] [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert] [ 3524.633060] [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert] [ 3524.641383] [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430 [ 3524.648561] [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod] [ 3524.658557] [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod] [ 3524.668084] [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60 [ 3524.674420] [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod] [ 3524.684656] [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod] [ 3524.694901] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.705446] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.715976] [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod] [ 3524.725739] [<ffffffffa58d60ff>] kthread+0xef/0x100 [ 3524.732007] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.739540] [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21 [ 3524.747558] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]--- [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it with dma-mapped address. login_tx_desc is a part of iser_conn structure and is used only once during login negotiation, so the issue is fixed by eliminating dma_sync call for this buffer using a special case routine.
Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Don Dutile ddutile@redhat.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com drivers/infiniband/ulp/isert/ib_isert.c | 26 +++++++++++++++++--------- 1 files changed, 17 insertions(+), 9 deletions(-)
This sounds like -rc material? Why is it for-next?
Should something that only occurs in a debug build be considered -rc material?
Usually errors produced in debug builds reflect some kind of real-world error as well.
Calling dma_sync on random garbage seems like the sort of thing that might crash on some archs?? But I guess HFI1 don't care about that..
Jason
On 05/17/2018 11:10 AM, Jason Gunthorpe wrote:
On Thu, May 17, 2018 at 11:03:49AM -0400, Dennis Dalessandro wrote:
On 5/16/2018 4:04 PM, Jason Gunthorpe wrote:
On Tue, May 15, 2018 at 06:31:39PM -0700, Dennis Dalessandro wrote:
From: Alex Estrin alex.estrin@intel.com
The following error message occurs on a target host in a debug build during session login:
[ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0 [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes] ......snip .....
[ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015 [ 3524.555740] Call Trace: [ 3524.559102] [<ffffffffa5fe915b>] dump_stack+0x19/0x1b [ 3524.565477] [<ffffffffa58a2f58>] __warn+0xd8/0x100 [ 3524.571557] [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80 [ 3524.578610] [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0 [ 3524.585177] [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0 [ 3524.592812] [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90 [ 3524.601029] [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20 [ 3524.608574] [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80 [ 3524.616699] [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140 [ 3524.623567] [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert] [ 3524.633060] [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert] [ 3524.641383] [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430 [ 3524.648561] [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod] [ 3524.658557] [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod] [ 3524.668084] [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60 [ 3524.674420] [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod] [ 3524.684656] [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod] [ 3524.694901] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.705446] [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.715976] [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod] [ 3524.725739] [<ffffffffa58d60ff>] kthread+0xef/0x100 [ 3524.732007] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.739540] [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21 [ 3524.747558] [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80 [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]--- [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it with dma-mapped address. login_tx_desc is a part of iser_conn structure and is used only once during login negotiation, so the issue is fixed by eliminating dma_sync call for this buffer using a special case routine.
Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Don Dutile ddutile@redhat.com Signed-off-by: Alex Estrin alex.estrin@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com drivers/infiniband/ulp/isert/ib_isert.c | 26 +++++++++++++++++--------- 1 files changed, 17 insertions(+), 9 deletions(-)
This sounds like -rc material? Why is it for-next?
Should something that only occurs in a debug build be considered -rc material?
Usually errors produced in debug builds reflect some kind of real-world error as well.
Calling dma_sync on random garbage seems like the sort of thing that might crash on some archs?? But I guess HFI1 don't care about that..
Jason
That's not the case here. What is actually happening is a blind, dma-sync call before any DMA is being done, and the call is not needed.
Additionally, I believe dma-debug has a bug: -- doing a check on an op when there is no op in the dma-ops struct is not correct: static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t size, enum dma_data_direction dir) { struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir)); if (ops->sync_single_for_cpu) ops->sync_single_for_cpu(dev, addr, size, dir); debug_dma_sync_single_for_cpu(dev, addr, size, dir); }
Note: above: debug_dma_sync is called indep of whether an op function exists. For hfi1 & qib & rxe -- which use dma-virt-ops for non-IOMMU-enabled configs, the sync_syngle_for_cpu ops does not exist, yet debug_dma_sync is still called.
Christoph: care to comment, since you've done a fair amount in this space?
- Don
On Thu, May 17, 2018 at 07:01:30PM -0400, Don Dutile wrote:
Additionally, I believe dma-debug has a bug: -- doing a check on an op when there is no op in the dma-ops struct is not correct:
No, that is a feature.
Note: above: debug_dma_sync is called indep of whether an op function exists. For hfi1 & qib & rxe -- which use dma-virt-ops for non-IOMMU-enabled configs, the sync_syngle_for_cpu ops does not exist, yet debug_dma_sync is still called.
Yes, if you call dma ops you better make sure you respect the invariants.
This is the only way to get the code right for the 90% case where people develop on x86 with cache coherent DMA but do calls that only exist on non-coherent implementations.
On 05/18/2018 05:00 AM, Christoph Hellwig wrote:
On Thu, May 17, 2018 at 07:01:30PM -0400, Don Dutile wrote:
Additionally, I believe dma-debug has a bug: -- doing a check on an op when there is no op in the dma-ops struct is not correct:
No, that is a feature.
Note: above: debug_dma_sync is called indep of whether an op function exists. For hfi1 & qib & rxe -- which use dma-virt-ops for non-IOMMU-enabled configs, the sync_syngle_for_cpu ops does not exist, yet debug_dma_sync is still called.
Yes, if you call dma ops you better make sure you respect the invariants.
This is the only way to get the code right for the 90% case where people develop on x86 with cache coherent DMA but do calls that only exist on non-coherent implementations.
So, if I re-state the above correctly, it's not an issue on x86 (a cache-coherent arch, where we saw this message occurred), but it would be on a non-cache-coherent arch. The 'feature' warns the caller even when run on an arch that its not a true issue/warning.
So, to Jason's question as for-rc, no, unless someone has a case of running RDMA on a non-cache-coherent arch... which seems counter-intuitive -- high speed RDMA on low-speed/cache-incorehent arch.
Christoph: Would you object to a patch that more clearly states the above, i.e., still generates the warning but delineates that it's not an error on the given arch, but potentially other arch's?
I'm getting a fairly decent set of lib/dma-debug bz's atm (since I backported latest dma-map refactoring & all of lib/dma-debug to RHEL-7), and I'd like to easily discern what needs to be fixed asap, e.g., not checking return value of dma_map(), vs warnings (that will still be bz'd) I don't have to prioritize over more pressing bz's. Having the warning provide a bit more info in that area, as suggested above, would help.
Thanks for review & feedback.
- Don
On Tue, 2018-05-15 at 18:31 -0700, Dennis Dalessandro wrote:
Hi Doug and Jason,
Here are some patches to go to for-next. One is a code cleanup. The rest are bug fixes that are probably not serious enough for an -rc6. The one that may be on the fence is the isert patch. Since it only affects debug kernels it can probably even wait till for-next. It has been marked stable though.
Alex Estrin (1): IB/isert: Fix for lib/dma_debug check_sync warning
Kamenee Arumugam (1): IB/Hfi1: Mask Unsupported Request error bit in PCIe AER
Michael J. Ruhl (1): IB/hfi1: Set port number for errorinfo MAD response
Mike Marciniszyn (2): IB/hfi1: Cleanup of exp_rcv IB/{rdmavt,hfi1}; Change hrtimer add to use the pinned variation
Hi Denny,
I dropped the PCI bit fiddling patch as per comments. Please provide a more detailed commit message if you want this patch to go in.
I provided my own guestimate of a commit message touchup to the hrtimers patch. Please let me know if I was wrong in my assumptions there.
The rest applied as-is. Thanks.
On 5/22/2018 3:33 PM, Doug Ledford wrote:
On Tue, 2018-05-15 at 18:31 -0700, Dennis Dalessandro wrote:
Hi Doug and Jason,
Here are some patches to go to for-next. One is a code cleanup. The rest are bug fixes that are probably not serious enough for an -rc6. The one that may be on the fence is the isert patch. Since it only affects debug kernels it can probably even wait till for-next. It has been marked stable though.
Alex Estrin (1): IB/isert: Fix for lib/dma_debug check_sync warning
Kamenee Arumugam (1): IB/Hfi1: Mask Unsupported Request error bit in PCIe AER
Michael J. Ruhl (1): IB/hfi1: Set port number for errorinfo MAD response
Mike Marciniszyn (2): IB/hfi1: Cleanup of exp_rcv IB/{rdmavt,hfi1}; Change hrtimer add to use the pinned variation
Hi Denny,
I dropped the PCI bit fiddling patch as per comments. Please provide a more detailed commit message if you want this patch to go in.
Kamenee is looking into this some more. Yeah, OK to drop for now, we'll resubmit with at least a better commit message or a different approach depending on her investigation.
I provided my own guestimate of a commit message touchup to the hrtimers patch. Please let me know if I was wrong in my assumptions there.
Sounds good to me, thanks!
-Denny
linux-stable-mirror@lists.linaro.org