From: Pavel Tatashin <pasha.tatashin(a)soleen.com>
commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e upstream.
A number of our uaccess routines ('__arch_clear_user()' and
'__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
encounter an unhandled fault whilst accessing userspace.
For CPUs implementing both hardware PAN and UAO, this bug has no effect
when both extensions are in use by the kernel.
For CPUs implementing hardware PAN but not UAO, this means that a kernel
using hardware PAN may execute portions of code with PAN inadvertently
disabled, opening us up to potential security vulnerabilities that rely
on userspace access from within the kernel which would usually be
prevented by this mechanism. In other words, parts of the kernel run the
same way as they would on a CPU without PAN implemented/emulated at all.
For CPUs not implementing hardware PAN and instead relying on software
emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
much worse. Calling 'schedule()' with software PAN disabled means that
the next task will execute in the kernel using the page-table and ASID
of the previous process even after 'switch_mm()', since the actual
hardware switch is deferred until return to userspace. At this point, or
if there is a intermediate call to 'uaccess_enable()', the page-table
and ASID of the new process are installed. Sadly, due to the changes
introduced by KPTI, this is not an atomic operation and there is a very
small window (two instructions) where the CPU is configured with the
page-table of the old task and the ASID of the new task; a speculative
access in this state is disastrous because it would corrupt the TLB
entries for the new task with mappings from the previous address space.
As Pavel explains:
| I was able to reproduce memory corruption problem on Broadcom's SoC
| ARMv8-A like this:
|
| Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
| stack is accessed and copied.
|
| The test program performed the following on every CPU and forking
| many processes:
|
| unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
| MAP_SHARED | MAP_ANONYMOUS, -1, 0);
| map[0] = getpid();
| sched_yield();
| if (map[0] != getpid()) {
| fprintf(stderr, "Corruption detected!");
| }
| munmap(map, PAGE_SIZE);
|
| From time to time I was getting map[0] to contain pid for a
| different process.
Ensure that PAN is re-enabled when returning after an unhandled user
fault from our uaccess routines.
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Mark Rutland <mark.rutland(a)arm.com>
Tested-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
[will: rewrote commit message]
[will: backport for 4.9.y stable kernels]
Signed-off-by: Will Deacon <will(a)kernel.org>
---
arch/arm64/lib/clear_user.S | 2 ++
arch/arm64/lib/copy_from_user.S | 2 ++
arch/arm64/lib/copy_in_user.S | 2 ++
arch/arm64/lib/copy_to_user.S | 2 ++
4 files changed, 8 insertions(+)
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index efbf610eaf4e..a814f32033b0 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -62,5 +62,7 @@ ENDPROC(__arch_clear_user)
.section .fixup,"ax"
.align 2
9: mov x0, x2 // return the original size
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 4fd67ea03bb0..580aca96c53c 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -80,5 +80,7 @@ ENDPROC(__arch_copy_from_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
index 841bf8f7fab7..d9ca6a4f33b3 100644
--- a/arch/arm64/lib/copy_in_user.S
+++ b/arch/arm64/lib/copy_in_user.S
@@ -81,5 +81,7 @@ ENDPROC(__arch_copy_in_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 7a7efe255034..e8bd40dc00cd 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -79,5 +79,7 @@ ENDPROC(__arch_copy_to_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
--
2.24.0.432.g9d3f5f5b63-goog
From: Pavel Tatashin <pasha.tatashin(a)soleen.com>
commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e upstream.
A number of our uaccess routines ('__arch_clear_user()' and
'__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
encounter an unhandled fault whilst accessing userspace.
For CPUs implementing both hardware PAN and UAO, this bug has no effect
when both extensions are in use by the kernel.
For CPUs implementing hardware PAN but not UAO, this means that a kernel
using hardware PAN may execute portions of code with PAN inadvertently
disabled, opening us up to potential security vulnerabilities that rely
on userspace access from within the kernel which would usually be
prevented by this mechanism. In other words, parts of the kernel run the
same way as they would on a CPU without PAN implemented/emulated at all.
For CPUs not implementing hardware PAN and instead relying on software
emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
much worse. Calling 'schedule()' with software PAN disabled means that
the next task will execute in the kernel using the page-table and ASID
of the previous process even after 'switch_mm()', since the actual
hardware switch is deferred until return to userspace. At this point, or
if there is a intermediate call to 'uaccess_enable()', the page-table
and ASID of the new process are installed. Sadly, due to the changes
introduced by KPTI, this is not an atomic operation and there is a very
small window (two instructions) where the CPU is configured with the
page-table of the old task and the ASID of the new task; a speculative
access in this state is disastrous because it would corrupt the TLB
entries for the new task with mappings from the previous address space.
As Pavel explains:
| I was able to reproduce memory corruption problem on Broadcom's SoC
| ARMv8-A like this:
|
| Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
| stack is accessed and copied.
|
| The test program performed the following on every CPU and forking
| many processes:
|
| unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
| MAP_SHARED | MAP_ANONYMOUS, -1, 0);
| map[0] = getpid();
| sched_yield();
| if (map[0] != getpid()) {
| fprintf(stderr, "Corruption detected!");
| }
| munmap(map, PAGE_SIZE);
|
| From time to time I was getting map[0] to contain pid for a
| different process.
Ensure that PAN is re-enabled when returning after an unhandled user
fault from our uaccess routines.
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Mark Rutland <mark.rutland(a)arm.com>
Tested-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
[will: rewrote commit message]
[will: backport for 4.4.y stable kernels]
Signed-off-by: Will Deacon <will(a)kernel.org>
---
arch/arm64/lib/clear_user.S | 2 ++
arch/arm64/lib/copy_from_user.S | 2 ++
arch/arm64/lib/copy_in_user.S | 2 ++
arch/arm64/lib/copy_to_user.S | 2 ++
4 files changed, 8 insertions(+)
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index a9723c71c52b..8d330c30a6f9 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -62,5 +62,7 @@ ENDPROC(__clear_user)
.section .fixup,"ax"
.align 2
9: mov x0, x2 // return the original size
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 4699cd74f87e..b8c95ef13229 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -85,5 +85,7 @@ ENDPROC(__copy_from_user)
strb wzr, [dst], #1 // zero remaining buffer space
cmp dst, end
b.lo 9999b
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
index 81c8fc93c100..233703c84bcd 100644
--- a/arch/arm64/lib/copy_in_user.S
+++ b/arch/arm64/lib/copy_in_user.S
@@ -81,5 +81,7 @@ ENDPROC(__copy_in_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 7512bbbc07ac..62b179408b23 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -79,5 +79,7 @@ ENDPROC(__copy_to_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
--
2.24.0.432.g9d3f5f5b63-goog
We've encountered a rcu stall in get_mem_cgroup_from_mm():
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
(t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
<...>
RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
<...>
__memcg_kmem_charge+0x55/0x140
__alloc_pages_nodemask+0x267/0x320
pipe_write+0x1ad/0x400
new_sync_write+0x127/0x1c0
__kernel_write+0x4f/0xf0
dump_emit+0x91/0xc0
writenote+0xa0/0xc0
elf_core_dump+0x11af/0x1430
do_coredump+0xc65/0xee0
? unix_stream_sendmsg+0x37d/0x3b0
get_signal+0x132/0x7c0
do_signal+0x36/0x640
? recalc_sigpending+0x17/0x50
exit_to_usermode_loop+0x61/0xd0
do_syscall_64+0xd4/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The problem is caused by an exiting task which is associated with
an offline memcg. We're iterating over and over in the
do {} while (!css_tryget_online()) loop, but obviously the memcg won't
become online and the exiting task won't be migrated to a live memcg.
Let's fix it by switching from css_tryget_online() to css_tryget().
As css_tryget_online() cannot guarantee that the memcg won't go
offline, the check is usually useless, except some rare cases
when for example it determines if something should be presented
to a user.
A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Cc: stable(a)vger.kernel.org
Cc: Tejun Heo <tj(a)kernel.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50f5bc55fcec..c5b5f74cfd4d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -939,7 +939,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
if (unlikely(!memcg))
memcg = root_mem_cgroup;
}
- } while (!css_tryget_online(&memcg->css));
+ } while (!css_tryget(&memcg->css));
rcu_read_unlock();
return memcg;
}
--
2.17.1
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 7c4042a4d0b7532cfbc90478fd3084b2dab5849e ]
When dif and first burst is used in a write command wqe, the driver was not
properly setting fields in the io command request. This resulted in no dif
bytes being sent and invalid xfer_rdy's, resulting in the io being aborted
by the hardware.
Correct the wqe initializaton when both dif and first burst are used.
Signed-off-by: Dick Kennedy <dick.kennedy(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc_scsi.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index bae36cc3740b6..ab6bff60478f9 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -2707,6 +2707,7 @@ lpfc_bg_scsi_prep_dma_buf_s3(struct lpfc_hba *phba,
int datasegcnt, protsegcnt, datadir = scsi_cmnd->sc_data_direction;
int prot_group_type = 0;
int fcpdl;
+ struct lpfc_vport *vport = phba->pport;
/*
* Start the lpfc command prep by bumping the bpl beyond fcp_cmnd
@@ -2812,6 +2813,14 @@ lpfc_bg_scsi_prep_dma_buf_s3(struct lpfc_hba *phba,
*/
iocb_cmd->un.fcpi.fcpi_parm = fcpdl;
+ /*
+ * For First burst, we may need to adjust the initial transfer
+ * length for DIF
+ */
+ if (iocb_cmd->un.fcpi.fcpi_XRdy &&
+ (fcpdl < vport->cfg_first_burst_size))
+ iocb_cmd->un.fcpi.fcpi_XRdy = fcpdl;
+
return 0;
err:
if (lpfc_cmd->seg_cnt)
@@ -3361,6 +3370,7 @@ lpfc_bg_scsi_prep_dma_buf_s4(struct lpfc_hba *phba,
int datasegcnt, protsegcnt, datadir = scsi_cmnd->sc_data_direction;
int prot_group_type = 0;
int fcpdl;
+ struct lpfc_vport *vport = phba->pport;
/*
* Start the lpfc command prep by bumping the sgl beyond fcp_cmnd
@@ -3476,6 +3486,14 @@ lpfc_bg_scsi_prep_dma_buf_s4(struct lpfc_hba *phba,
*/
iocb_cmd->un.fcpi.fcpi_parm = fcpdl;
+ /*
+ * For First burst, we may need to adjust the initial transfer
+ * length for DIF
+ */
+ if (iocb_cmd->un.fcpi.fcpi_XRdy &&
+ (fcpdl < vport->cfg_first_burst_size))
+ iocb_cmd->un.fcpi.fcpi_XRdy = fcpdl;
+
/*
* If the OAS driver feature is enabled and the lun is enabled for
* OAS, set the oas iocb related flags.
--
2.20.1
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 7c4042a4d0b7532cfbc90478fd3084b2dab5849e ]
When dif and first burst is used in a write command wqe, the driver was not
properly setting fields in the io command request. This resulted in no dif
bytes being sent and invalid xfer_rdy's, resulting in the io being aborted
by the hardware.
Correct the wqe initializaton when both dif and first burst are used.
Signed-off-by: Dick Kennedy <dick.kennedy(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc_scsi.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index d197aa176dee3..d489cc1018b5c 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -2707,6 +2707,7 @@ lpfc_bg_scsi_prep_dma_buf_s3(struct lpfc_hba *phba,
int datasegcnt, protsegcnt, datadir = scsi_cmnd->sc_data_direction;
int prot_group_type = 0;
int fcpdl;
+ struct lpfc_vport *vport = phba->pport;
/*
* Start the lpfc command prep by bumping the bpl beyond fcp_cmnd
@@ -2812,6 +2813,14 @@ lpfc_bg_scsi_prep_dma_buf_s3(struct lpfc_hba *phba,
*/
iocb_cmd->un.fcpi.fcpi_parm = fcpdl;
+ /*
+ * For First burst, we may need to adjust the initial transfer
+ * length for DIF
+ */
+ if (iocb_cmd->un.fcpi.fcpi_XRdy &&
+ (fcpdl < vport->cfg_first_burst_size))
+ iocb_cmd->un.fcpi.fcpi_XRdy = fcpdl;
+
return 0;
err:
if (lpfc_cmd->seg_cnt)
@@ -3364,6 +3373,7 @@ lpfc_bg_scsi_prep_dma_buf_s4(struct lpfc_hba *phba,
int datasegcnt, protsegcnt, datadir = scsi_cmnd->sc_data_direction;
int prot_group_type = 0;
int fcpdl;
+ struct lpfc_vport *vport = phba->pport;
/*
* Start the lpfc command prep by bumping the sgl beyond fcp_cmnd
@@ -3479,6 +3489,14 @@ lpfc_bg_scsi_prep_dma_buf_s4(struct lpfc_hba *phba,
*/
iocb_cmd->un.fcpi.fcpi_parm = fcpdl;
+ /*
+ * For First burst, we may need to adjust the initial transfer
+ * length for DIF
+ */
+ if (iocb_cmd->un.fcpi.fcpi_XRdy &&
+ (fcpdl < vport->cfg_first_burst_size))
+ iocb_cmd->un.fcpi.fcpi_XRdy = fcpdl;
+
/*
* If the OAS driver feature is enabled and the lun is enabled for
* OAS, set the oas iocb related flags.
--
2.20.1
From: Bart Van Assche <bvanassche(a)acm.org>
[ Upstream commit e7f411049f5164ee6db6c3434c07302846f09990 ]
This patch does not change any functionality but avoids that sparse
complains about the queue_cmd_ring() function and its callers.
Fixes: 6fd0ce79724d ("tcmu: prep queue_cmd_ring to be used by unmap wq")
Reviewed-by: David Disseldorp <ddiss(a)suse.de>
Cc: Nicholas Bellinger <nab(a)linux-iscsi.org>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/target/target_core_user.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 7159e8363b83b..7ee0a75ce4526 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -962,7 +962,7 @@ static int add_to_qfull_queue(struct tcmu_cmd *tcmu_cmd)
* 0 success
* 1 internally queued to wait for ring memory to free.
*/
-static sense_reason_t queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, int *scsi_err)
+static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
{
struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
--
2.20.1
From: Bart Van Assche <bvanassche(a)acm.org>
[ Upstream commit e7f411049f5164ee6db6c3434c07302846f09990 ]
This patch does not change any functionality but avoids that sparse
complains about the queue_cmd_ring() function and its callers.
Fixes: 6fd0ce79724d ("tcmu: prep queue_cmd_ring to be used by unmap wq")
Reviewed-by: David Disseldorp <ddiss(a)suse.de>
Cc: Nicholas Bellinger <nab(a)linux-iscsi.org>
Cc: Mike Christie <mchristi(a)redhat.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/target/target_core_user.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 7159e8363b83b..7ee0a75ce4526 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -962,7 +962,7 @@ static int add_to_qfull_queue(struct tcmu_cmd *tcmu_cmd)
* 0 success
* 1 internally queued to wait for ring memory to free.
*/
-static sense_reason_t queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, int *scsi_err)
+static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
{
struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
--
2.20.1