On Thu, May 16, 2024 at 04:50:34AM -0700, Alexander Sergeev wrote:
> Good afternoon!
>
> Thank you for your advice! Unfortunately, I couldn't find any way or
> form for reporting issues in netlink, so I decided to use a bug
> reporting guide for kernel in general. Could you please let me know
> where could I forward this information? How could I contact netlink
> development team?
The NETWORKING section of the MAINTAINERS file lists:
netdev(a)vger.kernel.org
as the proper place to go.
greg k-h
Good morning!
Recently, I have found an issue with `rtnetlink` library that seems to be not intended and/or documented. I have asked about it several times, including here and it was also reported here. Neither in related RFC document nor in rtnetlink manuals the issue is described or mentioned.
In a few words, the issue is: for some reason, rtnetlink GETADDR request works only with NLM_F_ROOT or NLM_F_DUMP flags. Consequently, for instance, filtering by ifa_index field, that would *theoretically* according to docs, allow us to receive address info for 1 specific interface only (by ID) , is not possible. For comparison, similar functionality to GETLINK request (getting information about exactly 1 link by index) is indeed very possible. Instead of the expected output (information about 1 interface only), what is returned is an error message with errno -95, which, as I suppose, can be read as "operation not permitted".
To this email, I attach a small C file illustrating my issue. Feel free to change IFACE_ID define for ID of any network interface present in your system. It illustrates the error.
Please, let me know what you think about this issue. Am I missing something or does it really needs fixing or at least documenting?
Best regards,
Aleksandr Sergeev.
P.S. I have been told that this issue probably won't be addressed, because it "can easily be fixed in user space", however I believe in Linux kernel maintainers team dedication for clear, well-documented and bug-free code; so if this issue is not hard to fix, I would suggest addressing it at least in docs. If it can be useful, I would be more than grateful to provide my help with it.
A debugfs directory entry is create early during probe(). This entry is
not removed on error path leading to some "already present" issues in
case of EPROBE_DEFER.
Create this entry later in the probe() code to avoid the need to change
many 'return' in 'goto' and add the removal in the already present error
path.
Fixes: 942814840127 ("net: lan966x: Add VCAP debugFS support")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
---
drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 2635ef8958c8..61d88207eed4 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -1087,8 +1087,6 @@ static int lan966x_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, lan966x);
lan966x->dev = &pdev->dev;
- lan966x->debugfs_root = debugfs_create_dir("lan966x", NULL);
-
if (!device_get_mac_address(&pdev->dev, mac_addr)) {
ether_addr_copy(lan966x->base_mac, mac_addr);
} else {
@@ -1179,6 +1177,8 @@ static int lan966x_probe(struct platform_device *pdev)
return dev_err_probe(&pdev->dev, -ENODEV,
"no ethernet-ports child found\n");
+ lan966x->debugfs_root = debugfs_create_dir("lan966x", NULL);
+
/* init switch */
lan966x_init(lan966x);
lan966x_stats_init(lan966x);
@@ -1257,6 +1257,8 @@ static int lan966x_probe(struct platform_device *pdev)
destroy_workqueue(lan966x->stats_queue);
mutex_destroy(&lan966x->stats_lock);
+ debugfs_remove_recursive(lan966x->debugfs_root);
+
return err;
}
--
This patch was previously sent as part of a bigger series:
https://lore.kernel.org/lkml/20240430083730.134918-9-herve.codina@bootlin.c…
As it is a simple fix, this v2 is the patch extracted from the series
and sent alone to net.
Changes v1 -> v2
Add 'Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>'
2.44.0
Early during NAND identification, mtd_info fields have not yet been
initialized (namely, writesize and oobsize) and thus cannot be used for
sanity checks yet. Of course if there is a misuse of
nand_change_read_column_op() so early we won't be warned, but there is
anyway no actual check to perform at this stage as we do not yet know
the NAND geometry.
So, if the fields are empty, especially mtd->writesize which is *always*
set quite rapidly after identification, let's skip the sanity checks.
nand_change_read_column_op() is subject to be used early for ONFI/JEDEC
identification in the very unlikely case of:
- bitflips appearing in the parameter page,
- the controller driver not supporting simple DATA_IN cycles.
Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers")
Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers")
Cc: stable(a)vger.kernel.org
Reported-by: Alexander Dahl <ada(a)thorsis.com>
Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsi…
Reported-by: Steven Seeger <steven.seeger(a)flightsystems.net>
Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM…
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/nand_base.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 248e654ecefd..a66e73cd68cb 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip,
if (len && !buf)
return -EINVAL;
- if (offset_in_page + len > mtd->writesize + mtd->oobsize)
- return -EINVAL;
+ if (mtd->writesize) {
+ if ((offset_in_page + len > mtd->writesize + mtd->oobsize))
+ return -EINVAL;
- /* Small page NANDs do not support column change. */
- if (mtd->writesize <= 512)
- return -ENOTSUPP;
+ /* Small page NANDs do not support column change. */
+ if (mtd->writesize <= 512)
+ return -ENOTSUPP;
+ }
if (nand_has_exec_op(chip)) {
const struct nand_interface_config *conf =
--
2.40.1
Two enclave threads may try to access the same non-present enclave page
simultaneously (e.g., if the SGX runtime supports lazy allocation). The
threads will end up in sgx_encl_eaug_page(), racing to acquire the
enclave lock. The winning thread will perform EAUG, set up the page
table entry, and insert the page into encl->page_array. The losing
thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed
to error handling path.
This race condition can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
The err_out_shrink error handling path contains two bugs: (1) function
sgx_encl_free_epc_page() is called that performs EREMOVE even though the
enclave page was never intended to be removed, and (2) SIGBUS is sent to
userspace even though the enclave page is correctly installed by another
thread.
The first bug renders the enclave page perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page. The second bug is less severe: a spurious SIGBUS signal is
unnecessarily sent to user space.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux
fault handler so that no signal is sent to userspace, and (2) by
replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no
EREMOVE is performed.
Note that sgx_encl_free_epc_page() performs an additional WARN_ON_ONCE
check in comparison to sgx_free_epc_page(): whether the EPC page is
being reclaimer tracked. However, the EPC page is allocated in
sgx_encl_eaug_page() and has zeroed-out flags in all error handling
paths. In other words, the page is marked as reclaimable only in the
happy path of sgx_encl_eaug_page(). Therefore, in the particular code
path affected in this commit, the "page reclaimer tracked" condition is
always false and the warning is never printed. Thus, it is safe to
replace sgx_encl_free_epc_page() with sgx_free_epc_page().
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..41f14b1a3025 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.34.1
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
introducing a new flag SGX_ENCL_PAGE_BEING_REMOVED, which is unset by
default and set only right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is being removed, and if yes then CPU2 backs off and waits until
the page is completely removed. After that, any memory access to this
page results in a normal "allocate and EAUG a page on #PF" flow.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 3 ++-
arch/x86/kernel/cpu/sgx/encl.h | 3 +++
arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 41f14b1a3025..7ccd8b2fce5f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
+ SGX_ENCL_PAGE_BEING_REMOVED))
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..fff5f2293ae7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -25,6 +25,9 @@
/* 'desc' bit marking that the page is being reclaimed. */
#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit marking that the page is being removed. */
+#define SGX_ENCL_PAGE_BEING_REMOVED BIT(2)
+
struct sgx_encl_page {
unsigned long desc;
unsigned long vm_max_prot_bits:8;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c542d4dd3e64 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
*/
+ entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.34.1