On Thu, 2023-10-26 at 09:01 -0700, Chatre, Reinette wrote:
On 10/25/2023 4:58 PM, Huang, Kai wrote:
On Wed, 2023-10-25 at 07:31 -0700, Hansen, Dave wrote:
On 10/19/23 19:53, Haitao Huang wrote:
In the EAUG on page fault path, VM_FAULT_OOM is returned when the Enclave Page Cache (EPC) runs out. This may trigger unneeded OOM kill that will not free any EPCs. Return VM_FAULT_SIGBUS instead.
This commit message does not seem accurate to me. From what I can tell VM_FAULT_SIGBUS is indeed returned when EPC runs out. What is addressed with this patch is the error returned when kernel (not EPC) memory runs out.
So, when picking an error code and we look the documentation for the bits, we see:
- @VM_FAULT_OOM: Out Of Memory
- @VM_FAULT_SIGBUS: Bad access
So if anything we'll need a bit more changelog where you explain how running out of enclave memory is more "Bad access" than "Out Of Memory". Because on the surface this patch looks wrong.
But that's just a naming thing. What *behavior* is bad here? With the old code, what happens? With the new code, what happens? Why is the old better than the new?
I think Haitao meant if we return OOM, the core-MM fault handler will believe the fault couldn't be handled because of running out of memory, and then it could invoke the OOM killer which might select an unrelated victim who might have no EPC at all.
Since the issue is that system is out of kernel memory the resolution may need to look further than owners with EPC memory.
Oh right, I didn't look into the sgx_encl_page_alloc():
encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL); if (!encl_page) return ERR_PTR(-ENOMEM);
...
(Also, currently the non-EAUG code path (ELDU) in sgx_vma_fault() also returns SIGBUS if it fails to allocate EPC, so making EAUG code path return SIGBUS also matches the ELDU path.)
These errors all seem related to EPC memory to me, not kernel memory.
Right.