Two enclave threads may try to access the same non-present enclave page simultaneously (e.g., if the SGX runtime supports lazy allocation). The threads will end up in sgx_encl_eaug_page(), racing to acquire the enclave lock. The winning thread will perform EAUG, set up the page table entry, and insert the page into encl->page_array. The losing thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed to error handling path.
This race condition can be illustrated as follows:
/* /* * Fault on CPU1 * Fault on CPU2 * on enclave page X * on enclave page X */ */ sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array) == NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /* * alloc encl_page * alloc encl_page */ */ mutex_lock(&encl->lock); /* * alloc EPC page */ epc_page = sgx_alloc_epc_page(...); /* * add page to enclave's xarray */ xa_insert(&encl->page_array, ...); /* * add page to enclave via EAUG * (page is in pending state) */ /* * add PTE entry */ vmf_insert_pfn(...);
mutex_unlock(&encl->lock); return VM_FAULT_NOPAGE; } } /* * All good up to here: enclave page * successfully added to enclave, * ready for EACCEPT from user space */ mutex_lock(&encl->lock); /* * alloc EPC page */ epc_page = sgx_alloc_epc_page(...); /* * add page to enclave's xarray, * this fails with -EBUSY as this * page was already added by CPU2 */ xa_insert(&encl->page_array, ...);
err_out_shrink: sgx_encl_free_epc_page(epc_page) { /* * remove page via EREMOVE * * *BUG*: page added by CPU2 is * yanked from enclave while it * remains accessible from OS * perspective (PTE installed) */ /* * free EPC page */ sgx_free_epc_page(epc_page); }
mutex_unlock(&encl->lock); /* * *BUG*: SIGBUS is returned * for a valid enclave page */ return VM_FAULT_SIGBUS; } }
The err_out_shrink error handling path contains two bugs: (1) function sgx_encl_free_epc_page() is called that performs EREMOVE even though the enclave page was never intended to be removed, and (2) SIGBUS is sent to userspace even though the enclave page is correctly installed by another thread.
The first bug renders the enclave page perpetually inaccessible (until another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is marked accessible in the PTE entry but is not EAUGed, and any subsequent access to this page raises a fault: with the kernel believing there to be a valid VMA, the unlikely error code X86_PF_SGX encountered by code path do_user_addr_fault() -> access_error() causes the SGX driver's sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead. The userspace SIGSEGV handler cannot perform EACCEPT because the page was not EAUGed. Thus, the user space is stuck with the inaccessible page. The second bug is less severe: a spurious SIGBUS signal is unnecessarily sent to user space.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux fault handler so that no signal is sent to userspace, and (2) by replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no EREMOVE is performed.
Note that sgx_encl_free_epc_page() performs an additional WARN_ON_ONCE check in comparison to sgx_free_epc_page(): whether the EPC page is being reclaimer tracked. However, the EPC page is allocated in sgx_encl_eaug_page() and has zeroed-out flags in all error handling paths. In other words, the page is marked as reclaimable only in the happy path of sgx_encl_eaug_page(). Therefore, in the particular code path affected in this commit, the "page reclaimer tracked" condition is always false and the warning is never printed. Thus, it is safe to replace sgx_encl_free_epc_page() with sgx_free_epc_page().
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") Cc: stable@vger.kernel.org Reported-by: Marcelina Kościelnicka mwk@invisiblethingslab.com Suggested-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dmitrii Kuvaiskii dmitrii.kuvaiskii@intel.com --- arch/x86/kernel/cpu/sgx/encl.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 279148e72459..41f14b1a3025 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, * If ret == -EBUSY then page was created in another flow while * running without encl->lock */ - if (ret) + if (ret) { + if (ret == -EBUSY) + vmret = VM_FAULT_NOPAGE; goto err_out_shrink; + }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page); pginfo.addr = encl_page->desc & PAGE_MASK; @@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, err_out_shrink: sgx_encl_shrink(encl, va_page); err_out_epc: - sgx_encl_free_epc_page(epc_page); + sgx_free_epc_page(epc_page); err_out_unlock: mutex_unlock(&encl->lock); kfree(encl_page);
On Wed May 15, 2024 at 4:12 PM EEST, Dmitrii Kuvaiskii wrote:
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 279148e72459..41f14b1a3025 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, * If ret == -EBUSY then page was created in another flow while * running without encl->lock */
- if (ret)
- if (ret) {
if (ret == -EBUSY)
goto err_out_shrink;vmret = VM_FAULT_NOPAGE;
- }
I agree that there is a bug but it does not categorize as race condition.
The bug is simply that for a valid page SIGBUS might be returned. The fix is correct but the claim is not.
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page); pginfo.addr = encl_page->desc & PAGE_MASK; @@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, err_out_shrink: sgx_encl_shrink(encl, va_page); err_out_epc:
- sgx_encl_free_epc_page(epc_page);
- sgx_free_epc_page(epc_page);
err_out_unlock: mutex_unlock(&encl->lock); kfree(encl_page);
Agree with code change 100% but not with the description.
I'd cut out 90% of the description out and just make the argument of the wrong error code, and done. The sequence is great for showing how this could happen. The prose makes my head hurt tbh.
BR, Jarkko
On Wed May 15, 2024 at 4:54 PM EEST, Jarkko Sakkinen wrote:
On Wed May 15, 2024 at 4:12 PM EEST, Dmitrii Kuvaiskii wrote:
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 279148e72459..41f14b1a3025 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, * If ret == -EBUSY then page was created in another flow while * running without encl->lock */
- if (ret)
- if (ret) {
if (ret == -EBUSY)
goto err_out_shrink;vmret = VM_FAULT_NOPAGE;
- }
I agree that there is a bug but it does not categorize as race condition.
The bug is simply that for a valid page SIGBUS might be returned. The fix is correct but the claim is not.
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page); pginfo.addr = encl_page->desc & PAGE_MASK; @@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, err_out_shrink: sgx_encl_shrink(encl, va_page); err_out_epc:
- sgx_encl_free_epc_page(epc_page);
- sgx_free_epc_page(epc_page);
err_out_unlock: mutex_unlock(&encl->lock); kfree(encl_page);
Agree with code change 100% but not with the description.
I'd cut out 90% of the description out and just make the argument of the wrong error code, and done. The sequence is great for showing how this could happen. The prose makes my head hurt tbh.
Also please remember that stable maintainers need to read all of that if this is a bug fix (it is a bug fix!) :-) So shorted possible legit argument, no prose and the sequence was awesome :-)
BR, Jarkko
On 5/15/24 06:54, Jarkko Sakkinen wrote:
I'd cut out 90% of the description out and just make the argument of the wrong error code, and done. The sequence is great for showing how this could happen. The prose makes my head hurt tbh.
The changelog is too long, but not fatally so. I'd much rather have a super verbose description than something super sparse.
Would something like this make more sense to folks?
Imagine an mmap()'d file. Two threads touch the same address at the same time and fault. Both allocate a physical page and race to install a PTE for that page. Only one will win the race. The loser frees its page, but still continues handling the fault as a success and returns VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the loser in the SGX steers into a failure path. The loser EREMOVE's the winner's EPC page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Change the return code to VM_FAULT_NOPAGE to avoid SIGBUS and call sgx_free_epc_page() which avoids EREMOVE'ing the winner's page and only frees the page that the loser allocated.
On Wed May 15, 2024 at 5:15 PM EEST, Dave Hansen wrote:
On 5/15/24 06:54, Jarkko Sakkinen wrote:
I'd cut out 90% of the description out and just make the argument of the wrong error code, and done. The sequence is great for showing how this could happen. The prose makes my head hurt tbh.
The changelog is too long, but not fatally so. I'd much rather have a super verbose description than something super sparse.
Would something like this make more sense to folks?
Imagine an mmap()'d file. Two threads touch the same address at the same time and fault. Both allocate a physical page and race to install a PTE for that page. Only one will win the race. The loser frees its page, but still continues handling the fault as a success and returns VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the loser in the SGX steers into a failure path. The loser EREMOVE's the winner's EPC page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Change the return code to VM_FAULT_NOPAGE to avoid SIGBUS and call sgx_free_epc_page() which avoids EREMOVE'ing the winner's page and only frees the page that the loser allocated.
Yes!
I did read the whole thing. My comment was only related to the chain of maintainers who also have to deal with this patch eventually.
BR, Jarkko
Hi Dmitrii,
On 5/15/2024 6:12 AM, Dmitrii Kuvaiskii wrote:
Two enclave threads may try to access the same non-present enclave page simultaneously (e.g., if the SGX runtime supports lazy allocation). The threads will end up in sgx_encl_eaug_page(), racing to acquire the enclave lock. The winning thread will perform EAUG, set up the page table entry, and insert the page into encl->page_array. The losing thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed to error handling path.
This race condition can be illustrated as follows:
/* /*
- Fault on CPU1 * Fault on CPU2
- on enclave page X * on enclave page X
*/ */ sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array) == NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ... /* /* * alloc encl_page * alloc encl_page */ */ mutex_lock(&encl->lock); /* * alloc EPC page */ epc_page = sgx_alloc_epc_page(...); /* * add page to enclave's xarray */ xa_insert(&encl->page_array, ...); /* * add page to enclave via EAUG * (page is in pending state) */ /* * add PTE entry */ vmf_insert_pfn(...); mutex_unlock(&encl->lock); return VM_FAULT_NOPAGE; } } /* * All good up to here: enclave page * successfully added to enclave, * ready for EACCEPT from user space */ mutex_lock(&encl->lock); /* * alloc EPC page */ epc_page = sgx_alloc_epc_page(...); /* * add page to enclave's xarray, * this fails with -EBUSY as this * page was already added by CPU2 */ xa_insert(&encl->page_array, ...);
err_out_shrink: sgx_encl_free_epc_page(epc_page) { /* * remove page via EREMOVE * * *BUG*: page added by CPU2 is * yanked from enclave while it * remains accessible from OS * perspective (PTE installed) */ /* * free EPC page */ sgx_free_epc_page(epc_page); }
mutex_unlock(&encl->lock); /* * *BUG*: SIGBUS is returned * for a valid enclave page */ return VM_FAULT_SIGBUS;
} }
The err_out_shrink error handling path contains two bugs: (1) function sgx_encl_free_epc_page() is called that performs EREMOVE even though the enclave page was never intended to be removed, and (2) SIGBUS is sent to userspace even though the enclave page is correctly installed by another thread.
The first bug renders the enclave page perpetually inaccessible (until another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is marked accessible in the PTE entry but is not EAUGed, and any subsequent access to this page raises a fault: with the kernel believing there to be a valid VMA, the unlikely error code X86_PF_SGX encountered by code path do_user_addr_fault() -> access_error() causes the SGX driver's sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead. The userspace SIGSEGV handler cannot perform EACCEPT because the page was not EAUGed. Thus, the user space is stuck with the inaccessible page. The second bug is less severe: a spurious SIGBUS signal is unnecessarily sent to user space.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux fault handler so that no signal is sent to userspace, and (2) by replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no EREMOVE is performed.
Note that sgx_encl_free_epc_page() performs an additional WARN_ON_ONCE check in comparison to sgx_free_epc_page(): whether the EPC page is being reclaimer tracked. However, the EPC page is allocated in sgx_encl_eaug_page() and has zeroed-out flags in all error handling paths. In other words, the page is marked as reclaimable only in the happy path of sgx_encl_eaug_page(). Therefore, in the particular code path affected in this commit, the "page reclaimer tracked" condition is always false and the warning is never printed. Thus, it is safe to replace sgx_encl_free_epc_page() with sgx_free_epc_page().
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") Cc: stable@vger.kernel.org Reported-by: Marcelina Kościelnicka mwk@invisiblethingslab.com Suggested-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dmitrii Kuvaiskii dmitrii.kuvaiskii@intel.com
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 279148e72459..41f14b1a3025 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, * If ret == -EBUSY then page was created in another flow while * running without encl->lock */
- if (ret)
- if (ret) {
if (ret == -EBUSY)
goto err_out_shrink;vmret = VM_FAULT_NOPAGE;
- }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page); pginfo.addr = encl_page->desc & PAGE_MASK; @@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, err_out_shrink: sgx_encl_shrink(encl, va_page); err_out_epc:
- sgx_encl_free_epc_page(epc_page);
- sgx_free_epc_page(epc_page);
err_out_unlock: mutex_unlock(&encl->lock); kfree(encl_page);
Thank you very much. I understand the changelog is still being discussed and those changes look good to me, to which you can add:
Reviewed-by: Reinette Chatre reinette.chatre@intel.com
Reinette
Thank you very much. I understand the changelog is still being discussed and those changes look good to me, to which you can add:
Reviewed-by: Reinette Chatre reinette.chatre@intel.com
also for this (with changelog tweak Dave suggested) so that we don't need a new round:
Reviewed-by: Jarkko Sakkinen jarkko@kernel.org
BR, Jarkko
On Wed, 15 May 2024 08:12:39 -0500, Dmitrii Kuvaiskii dmitrii.kuvaiskii@intel.com wrote:
Two enclave threads may try to access the same non-present enclave page simultaneously (e.g., if the SGX runtime supports lazy allocation). The threads will end up in sgx_encl_eaug_page(), racing to acquire the enclave lock. The winning thread will perform EAUG, set up the page table entry, and insert the page into encl->page_array. The losing thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed to error handling path.
This race condition can be illustrated as follows:
/* /*
- Fault on CPU1 * Fault on CPU2
- on enclave page X * on enclave page X
*/ */ sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array) == NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ... /* /* * alloc encl_page * alloc encl_page */ */ mutex_lock(&encl->lock); /* * alloc EPC page */ epc_page = sgx_alloc_epc_page(...); /* * add page to enclave's xarray */ xa_insert(&encl->page_array, ...); /* * add page to enclave via EAUG * (page is in pending state) */ /* * add PTE entry */ vmf_insert_pfn(...); mutex_unlock(&encl->lock); return VM_FAULT_NOPAGE; } } /* * All good up to here: enclave page * successfully added to enclave, * ready for EACCEPT from user space */ mutex_lock(&encl->lock); /* * alloc EPC page */ epc_page = sgx_alloc_epc_page(...); /* * add page to enclave's xarray, * this fails with -EBUSY as this * page was already added by CPU2 */ xa_insert(&encl->page_array, ...);
err_out_shrink: sgx_encl_free_epc_page(epc_page) { /* * remove page via EREMOVE * * *BUG*: page added by CPU2 is * yanked from enclave while it * remains accessible from OS * perspective (PTE installed) */ /* * free EPC page */ sgx_free_epc_page(epc_page); }
mutex_unlock(&encl->lock); /* * *BUG*: SIGBUS is returned * for a valid enclave page */ return VM_FAULT_SIGBUS;
} }
The err_out_shrink error handling path contains two bugs: (1) function sgx_encl_free_epc_page() is called that performs EREMOVE even though the enclave page was never intended to be removed, and (2) SIGBUS is sent to userspace even though the enclave page is correctly installed by another thread.
The first bug renders the enclave page perpetually inaccessible (until another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is marked accessible in the PTE entry but is not EAUGed, and any subsequent access to this page raises a fault: with the kernel believing there to be a valid VMA, the unlikely error code X86_PF_SGX encountered by code path do_user_addr_fault() -> access_error() causes the SGX driver's sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead. The userspace SIGSEGV handler cannot perform EACCEPT because the page was not EAUGed. Thus, the user space is stuck with the inaccessible page. The second bug is less severe: a spurious SIGBUS signal is unnecessarily sent to user space.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux fault handler so that no signal is sent to userspace, and (2) by replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no EREMOVE is performed.
Note that sgx_encl_free_epc_page() performs an additional WARN_ON_ONCE check in comparison to sgx_free_epc_page(): whether the EPC page is being reclaimer tracked. However, the EPC page is allocated in sgx_encl_eaug_page() and has zeroed-out flags in all error handling paths. In other words, the page is marked as reclaimable only in the happy path of sgx_encl_eaug_page(). Therefore, in the particular code path affected in this commit, the "page reclaimer tracked" condition is always false and the warning is never printed. Thus, it is safe to replace sgx_encl_free_epc_page() with sgx_free_epc_page().
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") Cc: stable@vger.kernel.org Reported-by: Marcelina Kościelnicka mwk@invisiblethingslab.com Suggested-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dmitrii Kuvaiskii dmitrii.kuvaiskii@intel.com
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 279148e72459..41f14b1a3025 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, * If ret == -EBUSY then page was created in another flow while * running without encl->lock */
- if (ret)
- if (ret) {
if (ret == -EBUSY)
goto err_out_shrink;vmret = VM_FAULT_NOPAGE;
- } pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page); pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, err_out_shrink: sgx_encl_shrink(encl, va_page); err_out_epc:
- sgx_encl_free_epc_page(epc_page);
- sgx_free_epc_page(epc_page);
err_out_unlock: mutex_unlock(&encl->lock); kfree(encl_page);
Reviewed-by: Haitao Huang haitao.huang@linux.intel.com
linux-stable-mirror@lists.linaro.org