Re: [PATCH 1/2] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS

10 May 2024

Hi Dmitrii,
Thank you so much for finding as well as fixing this issue.
On 4/30/2024 7:37 AM, Dmitrii Kuvaiskii wrote:
...
On Mon, Apr 29, 2024 at 04:04:24PM +0300, Jarkko Sakkinen wrote:
...
On Mon Apr 29, 2024 at 1:43 PM EEST, Dmitrii Kuvaiskii wrote:
...
Two enclave threads may try to access the same non-present enclave page
simultaneously (e.g., if the SGX runtime supports lazy allocation). The
threads will end up in sgx_encl_eaug_page(), racing to acquire the
enclave lock. The winning thread will perform EAUG, set up the page
table entry, and insert the page into encl->page_array. The losing
thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed
to error handling path.
And that path removes the page. Not sure I got the gist of this, tbh.
Well, this is not about a redundant EREMOVE performed. This is about the
enclave page becoming inaccessible due to a bug triggered with a data race.
Consider some enclave page not yet added to the enclave. The enclave
performs a memory access to it at the same time on CPU1 and CPU2. Since the
page does not yet have a corresponding PTE, the #PF handler on both CPUs
calls sgx_vma_fault(). Scenario proceeds as follows:
/*
 * Fault on CPU1
 */
sgx_vma_fault() {
  xa_load(&encl->page_array) == NULL ->
  sgx_encl_eaug_page() {
    ...                        /*
                                * Fault on CPU2
                                */
                               sgx_vma_fault() {
                                 xa_load(&encl->page_array) == NULL ->
                                 sgx_encl_eaug_page() {
                                   ...


Up to here it may be helpful to have the CPU1 and CPU2 code run concurrently
to highlight the race. First one to get the mutex "wins".
...
                                   mutex_lock(&encl->lock);
                                   /*
                                    * alloc encl_page
                                    */

Please note that encl_page is allocated before the mutex is obtained.
...
                                   /*
                                    * alloc EPC page
                                    */
                                   epc_page = sgx_alloc_epc_page(...);
                                   /*
                                    * add page_to enclave's xarray

"page_to" -> "page to" ?
...
                                    */
                                   xa_insert(&encl->page_array, ...);
                                   /*
                                    * add page to enclave via EAUG
                                    * (page is in pending state)
                                    */
                                   /*
                                    * add PTE entry
                                    */
                                   vmf_insert_pfn(...);

                                   mutex_unlock(&encl->lock);
                                   return VM_FAULT_NOPAGE;
                                 }
                               }

A brief comment under CPU2 essentially stating that this is a "good"
flow may help. Something like: "All good up to here. Enclave page successfully
added to enclave, ready for EACCEPT from user space". (please feel free to
improve)
...
 mutex_lock(&encl->lock);
 /*
  * alloc encl_page
  */

This should be outside mutex_lock(). It can even be shown earlier how
CPU1 and CPU2 can allocate encl_page concurrently (which is fine to do).
...
 /*
  * alloc EPC page
  */
 epc_page = sgx_alloc_epc_page(...);
 /*
  * add page_to enclave's xarray,

hmmm ... is page_to actually intended?
...
  * this fails with -EBUSY

It may help to highlight that this failure occurs because CPU1 and CPU2 are both
attempting to access the same page, so the page was already added in the CPU2 flow.
...
  */
 xa_insert(&encl->page_array, ...);


err_out_shrink:
     sgx_encl_free_epc_page(epc_page) {
       /*
        * remove page via EREMOVE
        */
This needs emphasis that this is *BAD*. Something like:
"BUG: Enclave page added from CPU2 is yanked (via EREMOVE)
from enclave while it remains "accessible" from OS perspective
(PTE installed with entry in OS's page_array)."
(please feel free to improve)
...
   /*
    * free EPC page
    */
   sgx_free_epc_page(epc_page);
 }

  mutex_unlock(&encl->lock);
  return VM_FAULT_SIGBUS;

This needs emphasis that this is *BAD*. "BUG: SIGBUS is
returned for a valid enclave page."  (please feel free to
improve)
...
}

}
CPU2 added the enclave page (in pending state) to the enclave and installed
the PTE. The kernel gives control back to the user space, without raising a
signal. The user space on CPU2 retries the memory access and induces a page
fault, but now with the SGX bit set in the #PF error code. The #PF handler
calls do_user_addr_fault(), which calls access_error() and ultimately
raises a SIGSEGV. The userspace SIGSEGV handler is supposed to perform
EACCEPT, after which point the enclave page becomes accessible.
CPU1 however jumps to the error handling path because the page was already
inserted into the enclave's xarray. This error handling path EREMOVEs the
page and also raises a SIGBUS signal to user space. The PTE entry is not
removed.
After CPU1 performs EREMOVE, this enclave page becomes perpetually
inaccessible (until an SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because
the page is marked accessible in the PTE entry but is not EAUGed. Because
of this combination, the #PF handler sees the SGX bit set in the #PF error
Which #PF handler are you referring to here?
...
code and does not call sgx_vma_fault() but instead raises a SIGSEGV. The
userspace SIGSEGV handler cannot perform EACCEPT because the page was not
EAUGed. Thus, the user space is stuck with the inaccessible page.
Also note that in the scenario, CPU1 raises a SIGBUS signal to user space
unnecessarily. This signal is spurious because a page-access retry on CPU2
will also raise the SIGBUS signal. That said, this side effect is less
severe because it affects only user space. Therefore, it could be
circumvented in user space alone, but it seems reasonable to fix it in this
patch.
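The end state described above can be replayed in a small user-space C model. This is only a sketch of the race as this thread describes it, not kernel code and not verified hardware behaviour: struct model_page, its fields, enum model_fault and all model_* helpers are hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for VM_FAULT_NOPAGE / VM_FAULT_SIGBUS. */
enum model_fault { MODEL_FAULT_NOPAGE, MODEL_FAULT_SIGBUS };

/* Hypothetical model of one enclave page; not the kernel's structures. */
struct model_page {
	bool in_page_array;	/* entry present in encl->page_array */
	bool eauged;		/* EAUG done, page pending EACCEPT */
	bool pte_installed;	/* PTE maps the page */
};

/* Stand-in for xa_insert(): fails with -EBUSY if the slot is taken. */
int model_xa_insert(struct model_page *pg)
{
	if (pg->in_page_array)
		return -EBUSY;
	pg->in_page_array = true;
	return 0;
}

/* Pre-fix flow as described in this thread, run with the "lock" held. */
enum model_fault model_eaug_buggy(struct model_page *pg)
{
	if (model_xa_insert(pg)) {
		/* Error path: per the thread, EREMOVE undoes the winner's EAUG. */
		pg->eauged = false;
		/* The PTE is left in place and a spurious signal is raised. */
		return MODEL_FAULT_SIGBUS;
	}
	pg->eauged = true;		/* EAUG */
	pg->pte_installed = true;	/* vmf_insert_pfn() */
	return MODEL_FAULT_NOPAGE;
}

/* Mapped but not EAUGed: user space can never EACCEPT this page. */
bool model_page_stuck(const struct model_page *pg)
{
	return pg->pte_installed && !pg->eauged;
}
```

Calling model_eaug_buggy() twice on a zero-initialized struct model_page replays "CPU2 wins, CPU1 loses": the first call returns MODEL_FAULT_NOPAGE, the second returns MODEL_FAULT_SIGBUS and leaves model_page_stuck() true.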
The variety of signals and how they could/should be handled by userspace
are not completely clear to me, but the bugs are clear to me and need to be
fixed.
...
...
...
This error handling path contains two bugs: (1) SIGBUS is sent to
userspace even though the enclave page is correctly installed by another
thread, and (2) sgx_encl_free_epc_page() is called that performs EREMOVE
even though the enclave page was never intended to be removed. The first
bug is less severe because it impacts only the user space; the second
bug is more severe because it also impacts the OS state by ripping the
page (added by the winning thread) from the enclave.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux
fault handler so that no signal is sent to userspace, and (2) by
replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no
EREMOVE is performed.
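The two-part fix can be sketched in a user-space C model as well. Again, this is an illustration under stated assumptions rather than the kernel implementation: struct model_encl, its fields, enum model_fault and the model_* helpers are hypothetical; only the vmret handling mirrors the shape of the diff in this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for VM_FAULT_NOPAGE / VM_FAULT_SIGBUS. */
enum model_fault { MODEL_FAULT_NOPAGE, MODEL_FAULT_SIGBUS };

/* Hypothetical model of an enclave with a single dynamically added page. */
struct model_encl {
	bool in_page_array;	/* entry present in encl->page_array */
	bool eauged;		/* EAUG done, page pending EACCEPT */
	bool pte_installed;	/* PTE maps the page */
	int epc_pages_in_use;	/* EPC pages currently allocated */
};

/* Stand-in for xa_insert(): fails with -EBUSY if the slot is taken. */
int model_xa_insert(struct model_encl *e)
{
	if (e->in_page_array)
		return -EBUSY;
	e->in_page_array = true;
	return 0;
}

/* Post-fix flow, run with the "lock" held. */
enum model_fault model_eaug_fixed(struct model_encl *e)
{
	enum model_fault vmret = MODEL_FAULT_SIGBUS;
	int ret;

	e->epc_pages_in_use++;		/* stand-in for sgx_alloc_epc_page() */

	ret = model_xa_insert(e);
	if (ret) {
		/* Fix (1): another thread already added the page, no signal. */
		if (ret == -EBUSY)
			vmret = MODEL_FAULT_NOPAGE;
		goto err_out_epc;
	}

	e->eauged = true;		/* EAUG */
	e->pte_installed = true;	/* vmf_insert_pfn() */
	return MODEL_FAULT_NOPAGE;

err_out_epc:
	/* Fix (2): only release the EPC page; the enclave state is untouched. */
	e->epc_pages_in_use--;		/* stand-in for sgx_free_epc_page() */
	return vmret;
}
```

In this model, two back-to-back calls on a zero-initialized struct model_encl both return MODEL_FAULT_NOPAGE, the page stays EAUGed and mapped, and only the winner's EPC page remains in use.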
What is the collateral damage caused by ENCLS[EREMOVE]?
As explained above, the damage is that the SGX driver leaves the enclave
page metadata in an inconsistent state: on the one hand, the PTE entry is
installed which forces the generic Linux fault handler to raise SIGSEGV,
and on the other hand, the page is not in a correct state to be EACCEPTed
(i.e., EAUG was not performed on this page).
...
...
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable@vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk@invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>

arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..41f14b1a3025 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	 * If ret == -EBUSY then page was created in another flow while
 	 * running without encl->lock
 	 */
-	if (ret)
+	if (ret) {
+		if (ret == -EBUSY)
+			vmret = VM_FAULT_NOPAGE;
 		goto err_out_shrink;
+	}
 
 	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
 	pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 err_out_shrink:
 	sgx_encl_shrink(encl, va_page);
 err_out_epc:
-	sgx_encl_free_epc_page(epc_page);
+	sgx_free_epc_page(epc_page);

This ignores the check for the page being reclaimer tracked, i.e. it makes
changes that are not mentioned in the commit message.
Indeed, sgx_encl_free_epc_page() performs the following check:
WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
However, the EPC page is allocated in sgx_encl_eaug_page() and has
zeroed-out flags in all error-handling paths. In other words, the page is
marked as reclaimable only in the happy path of sgx_encl_eaug_page().
Therefore, in the particular code path that I changed this "page reclaimer
tracked" condition is always false, and the warning is never printed.
Do you want me to explain this in the commit message?
Since the original commit message did prompt this question, I do think it would
be helpful to add a snippet about this, yes.
The fix looks good to me. I assume that you will add the "CPU1 vs CPU2"
race description in the next version; that will help a lot to make the
bugs easier to spot.
Thanks again for this. Great catch.
Reinette

    
