Two enclave threads may try to access the same non-present enclave page
simultaneously (e.g., if the SGX runtime supports lazy allocation). The
threads will end up in sgx_encl_eaug_page(), racing to acquire the
enclave lock. The winning thread will perform EAUG, set up the page
table entry, and insert the page into encl->page_array. The losing
thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed
to error handling path.
This race condition can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
The err_out_shrink error handling path contains two bugs: (1) function
sgx_encl_free_epc_page() is called that performs EREMOVE even though the
enclave page was never intended to be removed, and (2) SIGBUS is sent to
userspace even though the enclave page is correctly installed by another
thread.
The first bug renders the enclave page perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page. The second bug is less severe: a spurious SIGBUS signal is
unnecessarily sent to user space.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux
fault handler so that no signal is sent to userspace, and (2) by
replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no
EREMOVE is performed.
Note that sgx_encl_free_epc_page() performs an additional WARN_ON_ONCE
check in comparison to sgx_free_epc_page(): whether the EPC page is
being reclaimer tracked. However, the EPC page is allocated in
sgx_encl_eaug_page() and has zeroed-out flags in all error handling
paths. In other words, the page is marked as reclaimable only in the
happy path of sgx_encl_eaug_page(). Therefore, in the particular code
path affected in this commit, the "page reclaimer tracked" condition is
always false and the warning is never printed. Thus, it is safe to
replace sgx_encl_free_epc_page() with sgx_free_epc_page().
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..41f14b1a3025 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.34.1
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
introducing a new flag SGX_ENCL_PAGE_BEING_REMOVED, which is unset by
default and set only right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is being removed, and if yes then CPU2 backs off and waits until
the page is completely removed. After that, any memory access to this
page results in a normal "allocate and EAUG a page on #PF" flow.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 3 ++-
arch/x86/kernel/cpu/sgx/encl.h | 3 +++
arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 41f14b1a3025..7ccd8b2fce5f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
+ SGX_ENCL_PAGE_BEING_REMOVED))
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..fff5f2293ae7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -25,6 +25,9 @@
/* 'desc' bit marking that the page is being reclaimed. */
#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit marking that the page is being removed. */
+#define SGX_ENCL_PAGE_BEING_REMOVED BIT(2)
+
struct sgx_encl_page {
unsigned long desc;
unsigned long vm_max_prot_bits:8;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c542d4dd3e64 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
*/
+ entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.34.1
Hi,
There are just two minor fixes from Hector that we've been carrying downstream
for a while now. One increases the timeout while waiting for the firmware to
boot which is optional for the controller already supported upstream but
required for a newer 4388 board for which we'll also submit support soon.
It also fixes the units for the timeouts which is why I've already included it
here. The other one fixes a call to bitmap_release_region where we only wanted
to release a single bit but are actually releasing much more.
Best,
Sven
To: Hector Martin <marcan(a)marcan.st>
To: Alyssa Rosenzweig <alyssa(a)rosenzweig.io>
To: Marcel Holtmann <marcel(a)holtmann.org>
To: Luiz Augusto von Dentz <luiz.dentz(a)gmail.com>
Cc: asahi(a)lists.linux.dev
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-bluetooth(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Sven Peter <sven(a)svenpeter.dev>
Changes in v2:
- split the timeout commit into two
- collect Neal's tag
- Link to v1: https://lore.kernel.org/r/20240512-btfix-msgid-v1-0-ab1bd938a7f4@svenpeter.…
---
Hector Martin (2):
Bluetooth: hci_bcm4377: Increase boot timeout
Bluetooth: hci_bcm4377: Fix msgid release
Sven Peter (1):
Bluetooth: hci_bcm4377: Use correct unit for timeouts
drivers/bluetooth/hci_bcm4377.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
---
base-commit: cf87f46fd34d6c19283d9625a7822f20d90b64a4
change-id: 20240512-btfix-msgid-d76029a7d917
Best regards,
--
Sven Peter <sven(a)svenpeter.dev>
From: Sven Peter <sven(a)svenpeter.dev>
Some Broadcom controllers found on Apple Silicon machines abuse the
reserved bits inside the PHY fields of LE Extended Advertising Report
events for additional flags. Add a quirk to drop these and correctly
extract the Primary/Secondary_PHY field.
The following excerpt from a btmon trace shows a report received with
"Reserved" for "Primary PHY" on a 4388 controller:
> HCI Event: LE Meta Event (0x3e) plen 26
LE Extended Advertising Report (0x0d)
Num reports: 1
Entry 0
Event type: 0x2515
Props: 0x0015
Connectable
Directed
Use legacy advertising PDUs
Data status: Complete
Reserved (0x2500)
Legacy PDU Type: Reserved (0x2515)
Address type: Random (0x01)
Address: 00:00:00:00:00:00 (Static)
Primary PHY: Reserved
Secondary PHY: No packets
SID: no ADI field (0xff)
TX power: 127 dBm
RSSI: -60 dBm (0xc4)
Periodic advertising interval: 0.00 msec (0x0000)
Direct address type: Public (0x00)
Direct address: 00:00:00:00:00:00 (Apple, Inc.)
Data length: 0x00
Cc: stable(a)vger.kernel.org
Fixes: 2e7ed5f5e69b ("Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync")
Reported-by: Janne Grunau <j(a)jannau.net>
Closes: https://lore.kernel.org/all/Zjz0atzRhFykROM9@robin
Tested-by: Janne Grunau <j(a)jannau.net>
Signed-off-by: Sven Peter <sven(a)svenpeter.dev>
---
drivers/bluetooth/hci_bcm4377.c | 8 ++++++++
include/net/bluetooth/hci.h | 11 +++++++++++
net/bluetooth/hci_event.c | 7 +++++++
3 files changed, 26 insertions(+)
diff --git a/drivers/bluetooth/hci_bcm4377.c b/drivers/bluetooth/hci_bcm4377.c
index 9a7243d5db71..55109ad45115 100644
--- a/drivers/bluetooth/hci_bcm4377.c
+++ b/drivers/bluetooth/hci_bcm4377.c
@@ -495,6 +495,10 @@ struct bcm4377_data;
* extended scanning
* broken_mws_transport_config: Set to true if the chip erroneously claims to
* support MWS Transport Configuration
+ * broken_le_ext_adv_report_phy: Set to true if this chip stuffs flags inside
+ * reserved bits of Primary/Secondary_PHY inside
+ * LE Extended Advertising Report events which
+ * have to be ignored
* send_calibration: Optional callback to send calibration data
* send_ptb: Callback to send "PTB" regulatory/calibration data
*/
@@ -513,6 +517,7 @@ struct bcm4377_hw {
unsigned long broken_ext_scan : 1;
unsigned long broken_mws_transport_config : 1;
unsigned long broken_le_coded : 1;
+ unsigned long broken_le_ext_adv_report_phy : 1;
int (*send_calibration)(struct bcm4377_data *bcm4377);
int (*send_ptb)(struct bcm4377_data *bcm4377,
@@ -2374,6 +2379,8 @@ static int bcm4377_probe(struct pci_dev *pdev, const struct pci_device_id *id)
set_bit(HCI_QUIRK_BROKEN_EXT_SCAN, &hdev->quirks);
if (bcm4377->hw->broken_le_coded)
set_bit(HCI_QUIRK_BROKEN_LE_CODED, &hdev->quirks);
+ if (bcm4377->hw->broken_le_ext_adv_report_phy)
+ set_bit(HCI_QUIRK_FIXUP_LE_EXT_ADV_REPORT_PHY, &hdev->quirks);
pci_set_drvdata(pdev, bcm4377);
hci_set_drvdata(hdev, bcm4377);
@@ -2478,6 +2485,7 @@ static const struct bcm4377_hw bcm4377_hw_variants[] = {
.clear_pciecfg_subsystem_ctrl_bit19 = true,
.broken_mws_transport_config = true,
.broken_le_coded = true,
+ .broken_le_ext_adv_report_phy = true,
.send_calibration = bcm4387_send_calibration,
.send_ptb = bcm4378_send_ptb,
},
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 5c12761cbc0e..342aab86dad6 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -347,6 +347,17 @@ enum {
* claim to support it.
*/
HCI_QUIRK_BROKEN_READ_ENC_KEY_SIZE,
+
+ /*
+ * When this quirk is set, the reserved bits of Primary/Secondary_PHY
+ * inside the LE Extended Advertising Report events are discarded.
+ * This is required for some Apple/Broadcom controllers which
+ * abuse these reserved bits for unrelated flags.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_FIXUP_LE_EXT_ADV_REPORT_PHY,
};
/* HCI device flags */
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index d72d238c1656..01d3a8d343f1 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -6446,6 +6446,13 @@ static void hci_le_ext_adv_report_evt(struct hci_dev *hdev, void *data,
evt_type = __le16_to_cpu(info->type) & LE_EXT_ADV_EVT_TYPE_MASK;
legacy_evt_type = ext_evt_type_to_legacy(hdev, evt_type);
+
+ if (test_bit(HCI_QUIRK_FIXUP_LE_EXT_ADV_REPORT_PHY,
+ &hdev->quirks)) {
+ info->primary_phy &= 0x1f;
+ info->secondary_phy &= 0x1f;
+ }
+
if (legacy_evt_type != LE_ADV_INVALID) {
process_adv_report(hdev, legacy_evt_type, &info->bdaddr,
info->bdaddr_type, NULL, 0,
---
base-commit: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
change-id: 20240512-btfix-95c10c456f17
Best regards,
--
Sven Peter <sven(a)svenpeter.dev>
Commit 35039eb6b199 ("x86: Show symbol name if insn decoder test failed")
included symbol lines in the post-processed objdump output consumed by
the insn decoder test. This broke the `instuction lines == total lines`
property that `insn_decoder_test.c` relied upon to print the offending
line's number in error messages. This has the consequence that the line
number reported on a test failure is unreated to, and much smaller than,
the line that actually caused the problem.
Add a new variable that counts the combined (insn+symbol) line count and
report this in the error message.
Fixes: 35039eb6b199 ("x86: Show symbol name if insn decoder test failed")
Cc: stable(a)vger.kernel.org
Reviewed-by: Miguel Ojeda <ojeda(a)kernel.org>
Tested-by: Miguel Ojeda <ojeda(a)kernel.org>
Reported-by: John Baublitz <john.m.baublitz(a)gmail.com>
Debugged-by: John Baublitz <john.m.baublitz(a)gmail.com>
Signed-off-by: Valentin Obst <kernel(a)valentinobst.de>
---
See v2's commit message and [1] for context why this bug made debugging a
test failure harder than necessary.
[1]: https://rust-for-linux.zulipchat.com/#narrow/stream/291565-Help/topic/insn_…
Changes in v3:
- Add Cc stable tag in sign-off area.
- Make commit message less verbose.
- Link to v2: https://lore.kernel.org/r/20240223-x86-insn-decoder-line-fix-v2-1-cde49c69f…
Changes in v2:
- Added tags 'Reviewed-by', 'Tested-by', 'Reported-by', 'Debugged-by',
'Link', and 'Fixes'.
- Explain why this patch fixes the commit mentioned in the 'Fixes' tag.
- CCed the stable list and sent to all x86 maintainers.
- Link to v1: https://lore.kernel.org/r/20240221-x86-insn-decoder-line-fix-v1-1-47cd5a171…
---
arch/x86/tools/insn_decoder_test.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/tools/insn_decoder_test.c b/arch/x86/tools/insn_decoder_test.c
index 472540aeabc2..727017a3c3c7 100644
--- a/arch/x86/tools/insn_decoder_test.c
+++ b/arch/x86/tools/insn_decoder_test.c
@@ -114,6 +114,7 @@ int main(int argc, char **argv)
unsigned char insn_buff[16];
struct insn insn;
int insns = 0;
+ int lines = 0;
int warnings = 0;
parse_args(argc, argv);
@@ -123,6 +124,8 @@ int main(int argc, char **argv)
int nb = 0, ret;
unsigned int b;
+ lines++;
+
if (line[0] == '<') {
/* Symbol line */
strcpy(sym, line);
@@ -134,12 +137,12 @@ int main(int argc, char **argv)
strcpy(copy, line);
tab1 = strchr(copy, '\t');
if (!tab1)
- malformed_line(line, insns);
+ malformed_line(line, lines);
s = tab1 + 1;
s += strspn(s, " ");
tab2 = strchr(s, '\t');
if (!tab2)
- malformed_line(line, insns);
+ malformed_line(line, lines);
*tab2 = '\0'; /* Characters beyond tab2 aren't examined */
while (s < tab2) {
if (sscanf(s, "%x", &b) == 1) {
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240221-x86-insn-decoder-line-fix-7b1f2e1732ff
Best regards,
--
Valentin Obst <kernel(a)valentinobst.de>