The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x f47eba045e6cb97f9ee154c68dbf7c3c756919aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011357-emboss-unclaimed-572f@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f47eba045e6cb97f9ee154c68dbf7c3c756919aa Mon Sep 17 00:00:00 2001
From: Abel Vesa <abel.vesa(a)linaro.org>
Date: Thu, 12 Dec 2024 19:37:43 +0200
Subject: [PATCH] usb: typec: ucsi: Set orientation as none when connector is
unplugged
The current implementation of the ucsi glink client connector_status()
callback is only relying on the state of the gpio. This means that even
when the cable is unplugged, the orientation propagated to the switches
along the graph is "orientation normal", instead of "orientation none",
which would be the correct one in this case.
One of the Qualcomm DP-USB PHY combo drivers, which needs to be aware of
the orientation change, is relying on the "orientation none" to skip
the reinitialization of the entire PHY. Since the ucsi glink client
advertises "orientation normal" even when the cable is unplugged, the
mentioned PHY is taken down and reinitialized when in fact it should be
left as-is. This triggers a crash within the displayport controller driver
in turn, which brings the whole system down on some Qualcomm platforms.
Propagating "orientation none" from the ucsi glink client on the
connector_status() callback hides the problem of the mentioned PHY driver
away for now. But the "orientation none" is nonetheless the correct one
to be used in this case.
So propagate the "orientation none" instead when the connector status
flags says cable is disconnected.
Fixes: 76716fd5bf09 ("usb: typec: ucsi: glink: move GPIO reading into connector_status callback")
Cc: stable <stable(a)kernel.org> # 6.10
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20241212-usb-typec-ucsi-glink-add-orientation-non…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c
index 90948cd6d297..fed39d458090 100644
--- a/drivers/usb/typec/ucsi/ucsi_glink.c
+++ b/drivers/usb/typec/ucsi/ucsi_glink.c
@@ -185,6 +185,11 @@ static void pmic_glink_ucsi_connector_status(struct ucsi_connector *con)
struct pmic_glink_ucsi *ucsi = ucsi_get_drvdata(con->ucsi);
int orientation;
+ if (!UCSI_CONSTAT(con, CONNECTED)) {
+ typec_set_orientation(con->port, TYPEC_ORIENTATION_NONE);
+ return;
+ }
+
if (con->num > PMIC_GLINK_MAX_PORTS ||
!ucsi->port_orientation[con->num - 1])
return;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 74adad500346fb07d69af2c79acbff4adb061134
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011317-dallying-crock-6557@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74adad500346fb07d69af2c79acbff4adb061134 Mon Sep 17 00:00:00 2001
From: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Date: Mon, 16 Dec 2024 10:55:39 +0900
Subject: [PATCH] usb: chipidea: ci_hdrc_imx: decrement device's refcount in
.remove() and in the error path of .probe()
Current implementation of ci_hdrc_imx_driver does not decrement the
refcount of the device obtained in usbmisc_get_init_data(). Add a
put_device() call in .remove() and in .probe() before returning an
error.
This bug was found by an experimental static analysis tool that I am
developing.
Cc: stable <stable(a)kernel.org>
Fixes: f40017e0f332 ("chipidea: usbmisc_imx: Add USB support for VF610 SoCs")
Signed-off-by: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20241216015539.352579-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index f2801700be8e..1a7fc638213e 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -370,25 +370,29 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
data->pinctrl = devm_pinctrl_get(dev);
if (PTR_ERR(data->pinctrl) == -ENODEV)
data->pinctrl = NULL;
- else if (IS_ERR(data->pinctrl))
- return dev_err_probe(dev, PTR_ERR(data->pinctrl),
+ else if (IS_ERR(data->pinctrl)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->pinctrl),
"pinctrl get failed\n");
+ goto err_put;
+ }
data->hsic_pad_regulator =
devm_regulator_get_optional(dev, "hsic");
if (PTR_ERR(data->hsic_pad_regulator) == -ENODEV) {
/* no pad regulator is needed */
data->hsic_pad_regulator = NULL;
- } else if (IS_ERR(data->hsic_pad_regulator))
- return dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
+ } else if (IS_ERR(data->hsic_pad_regulator)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
"Get HSIC pad regulator error\n");
+ goto err_put;
+ }
if (data->hsic_pad_regulator) {
ret = regulator_enable(data->hsic_pad_regulator);
if (ret) {
dev_err(dev,
"Failed to enable HSIC pad regulator\n");
- return ret;
+ goto err_put;
}
}
}
@@ -402,13 +406,14 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_idle lookup failed, err=%ld\n",
PTR_ERR(pinctrl_hsic_idle));
- return PTR_ERR(pinctrl_hsic_idle);
+ ret = PTR_ERR(pinctrl_hsic_idle);
+ goto err_put;
}
ret = pinctrl_select_state(data->pinctrl, pinctrl_hsic_idle);
if (ret) {
dev_err(dev, "hsic_idle select failed, err=%d\n", ret);
- return ret;
+ goto err_put;
}
data->pinctrl_hsic_active = pinctrl_lookup_state(data->pinctrl,
@@ -417,7 +422,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_active lookup failed, err=%ld\n",
PTR_ERR(data->pinctrl_hsic_active));
- return PTR_ERR(data->pinctrl_hsic_active);
+ ret = PTR_ERR(data->pinctrl_hsic_active);
+ goto err_put;
}
}
@@ -527,6 +533,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
if (pdata.flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
+err_put:
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -551,6 +559,7 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 74adad500346fb07d69af2c79acbff4adb061134
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011316-rocket-frantic-221b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74adad500346fb07d69af2c79acbff4adb061134 Mon Sep 17 00:00:00 2001
From: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Date: Mon, 16 Dec 2024 10:55:39 +0900
Subject: [PATCH] usb: chipidea: ci_hdrc_imx: decrement device's refcount in
.remove() and in the error path of .probe()
Current implementation of ci_hdrc_imx_driver does not decrement the
refcount of the device obtained in usbmisc_get_init_data(). Add a
put_device() call in .remove() and in .probe() before returning an
error.
This bug was found by an experimental static analysis tool that I am
developing.
Cc: stable <stable(a)kernel.org>
Fixes: f40017e0f332 ("chipidea: usbmisc_imx: Add USB support for VF610 SoCs")
Signed-off-by: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20241216015539.352579-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index f2801700be8e..1a7fc638213e 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -370,25 +370,29 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
data->pinctrl = devm_pinctrl_get(dev);
if (PTR_ERR(data->pinctrl) == -ENODEV)
data->pinctrl = NULL;
- else if (IS_ERR(data->pinctrl))
- return dev_err_probe(dev, PTR_ERR(data->pinctrl),
+ else if (IS_ERR(data->pinctrl)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->pinctrl),
"pinctrl get failed\n");
+ goto err_put;
+ }
data->hsic_pad_regulator =
devm_regulator_get_optional(dev, "hsic");
if (PTR_ERR(data->hsic_pad_regulator) == -ENODEV) {
/* no pad regulator is needed */
data->hsic_pad_regulator = NULL;
- } else if (IS_ERR(data->hsic_pad_regulator))
- return dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
+ } else if (IS_ERR(data->hsic_pad_regulator)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
"Get HSIC pad regulator error\n");
+ goto err_put;
+ }
if (data->hsic_pad_regulator) {
ret = regulator_enable(data->hsic_pad_regulator);
if (ret) {
dev_err(dev,
"Failed to enable HSIC pad regulator\n");
- return ret;
+ goto err_put;
}
}
}
@@ -402,13 +406,14 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_idle lookup failed, err=%ld\n",
PTR_ERR(pinctrl_hsic_idle));
- return PTR_ERR(pinctrl_hsic_idle);
+ ret = PTR_ERR(pinctrl_hsic_idle);
+ goto err_put;
}
ret = pinctrl_select_state(data->pinctrl, pinctrl_hsic_idle);
if (ret) {
dev_err(dev, "hsic_idle select failed, err=%d\n", ret);
- return ret;
+ goto err_put;
}
data->pinctrl_hsic_active = pinctrl_lookup_state(data->pinctrl,
@@ -417,7 +422,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_active lookup failed, err=%ld\n",
PTR_ERR(data->pinctrl_hsic_active));
- return PTR_ERR(data->pinctrl_hsic_active);
+ ret = PTR_ERR(data->pinctrl_hsic_active);
+ goto err_put;
}
}
@@ -527,6 +533,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
if (pdata.flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
+err_put:
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -551,6 +559,7 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 74adad500346fb07d69af2c79acbff4adb061134
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011316-turbulent-jawed-ce2c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74adad500346fb07d69af2c79acbff4adb061134 Mon Sep 17 00:00:00 2001
From: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Date: Mon, 16 Dec 2024 10:55:39 +0900
Subject: [PATCH] usb: chipidea: ci_hdrc_imx: decrement device's refcount in
.remove() and in the error path of .probe()
Current implementation of ci_hdrc_imx_driver does not decrement the
refcount of the device obtained in usbmisc_get_init_data(). Add a
put_device() call in .remove() and in .probe() before returning an
error.
This bug was found by an experimental static analysis tool that I am
developing.
Cc: stable <stable(a)kernel.org>
Fixes: f40017e0f332 ("chipidea: usbmisc_imx: Add USB support for VF610 SoCs")
Signed-off-by: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20241216015539.352579-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index f2801700be8e..1a7fc638213e 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -370,25 +370,29 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
data->pinctrl = devm_pinctrl_get(dev);
if (PTR_ERR(data->pinctrl) == -ENODEV)
data->pinctrl = NULL;
- else if (IS_ERR(data->pinctrl))
- return dev_err_probe(dev, PTR_ERR(data->pinctrl),
+ else if (IS_ERR(data->pinctrl)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->pinctrl),
"pinctrl get failed\n");
+ goto err_put;
+ }
data->hsic_pad_regulator =
devm_regulator_get_optional(dev, "hsic");
if (PTR_ERR(data->hsic_pad_regulator) == -ENODEV) {
/* no pad regulator is needed */
data->hsic_pad_regulator = NULL;
- } else if (IS_ERR(data->hsic_pad_regulator))
- return dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
+ } else if (IS_ERR(data->hsic_pad_regulator)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
"Get HSIC pad regulator error\n");
+ goto err_put;
+ }
if (data->hsic_pad_regulator) {
ret = regulator_enable(data->hsic_pad_regulator);
if (ret) {
dev_err(dev,
"Failed to enable HSIC pad regulator\n");
- return ret;
+ goto err_put;
}
}
}
@@ -402,13 +406,14 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_idle lookup failed, err=%ld\n",
PTR_ERR(pinctrl_hsic_idle));
- return PTR_ERR(pinctrl_hsic_idle);
+ ret = PTR_ERR(pinctrl_hsic_idle);
+ goto err_put;
}
ret = pinctrl_select_state(data->pinctrl, pinctrl_hsic_idle);
if (ret) {
dev_err(dev, "hsic_idle select failed, err=%d\n", ret);
- return ret;
+ goto err_put;
}
data->pinctrl_hsic_active = pinctrl_lookup_state(data->pinctrl,
@@ -417,7 +422,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_active lookup failed, err=%ld\n",
PTR_ERR(data->pinctrl_hsic_active));
- return PTR_ERR(data->pinctrl_hsic_active);
+ ret = PTR_ERR(data->pinctrl_hsic_active);
+ goto err_put;
}
}
@@ -527,6 +533,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
if (pdata.flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
+err_put:
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -551,6 +559,7 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 74adad500346fb07d69af2c79acbff4adb061134
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011315-purebred-anatomy-53fd@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74adad500346fb07d69af2c79acbff4adb061134 Mon Sep 17 00:00:00 2001
From: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Date: Mon, 16 Dec 2024 10:55:39 +0900
Subject: [PATCH] usb: chipidea: ci_hdrc_imx: decrement device's refcount in
.remove() and in the error path of .probe()
Current implementation of ci_hdrc_imx_driver does not decrement the
refcount of the device obtained in usbmisc_get_init_data(). Add a
put_device() call in .remove() and in .probe() before returning an
error.
This bug was found by an experimental static analysis tool that I am
developing.
Cc: stable <stable(a)kernel.org>
Fixes: f40017e0f332 ("chipidea: usbmisc_imx: Add USB support for VF610 SoCs")
Signed-off-by: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20241216015539.352579-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index f2801700be8e..1a7fc638213e 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -370,25 +370,29 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
data->pinctrl = devm_pinctrl_get(dev);
if (PTR_ERR(data->pinctrl) == -ENODEV)
data->pinctrl = NULL;
- else if (IS_ERR(data->pinctrl))
- return dev_err_probe(dev, PTR_ERR(data->pinctrl),
+ else if (IS_ERR(data->pinctrl)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->pinctrl),
"pinctrl get failed\n");
+ goto err_put;
+ }
data->hsic_pad_regulator =
devm_regulator_get_optional(dev, "hsic");
if (PTR_ERR(data->hsic_pad_regulator) == -ENODEV) {
/* no pad regulator is needed */
data->hsic_pad_regulator = NULL;
- } else if (IS_ERR(data->hsic_pad_regulator))
- return dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
+ } else if (IS_ERR(data->hsic_pad_regulator)) {
+ ret = dev_err_probe(dev, PTR_ERR(data->hsic_pad_regulator),
"Get HSIC pad regulator error\n");
+ goto err_put;
+ }
if (data->hsic_pad_regulator) {
ret = regulator_enable(data->hsic_pad_regulator);
if (ret) {
dev_err(dev,
"Failed to enable HSIC pad regulator\n");
- return ret;
+ goto err_put;
}
}
}
@@ -402,13 +406,14 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_idle lookup failed, err=%ld\n",
PTR_ERR(pinctrl_hsic_idle));
- return PTR_ERR(pinctrl_hsic_idle);
+ ret = PTR_ERR(pinctrl_hsic_idle);
+ goto err_put;
}
ret = pinctrl_select_state(data->pinctrl, pinctrl_hsic_idle);
if (ret) {
dev_err(dev, "hsic_idle select failed, err=%d\n", ret);
- return ret;
+ goto err_put;
}
data->pinctrl_hsic_active = pinctrl_lookup_state(data->pinctrl,
@@ -417,7 +422,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
dev_err(dev,
"pinctrl_hsic_active lookup failed, err=%ld\n",
PTR_ERR(data->pinctrl_hsic_active));
- return PTR_ERR(data->pinctrl_hsic_active);
+ ret = PTR_ERR(data->pinctrl_hsic_active);
+ goto err_put;
}
}
@@ -527,6 +533,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
if (pdata.flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
+err_put:
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -551,6 +559,7 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 87ecfdbc699cc95fac73291b52650283ddcf929d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011354-trend-playmate-36dc@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87ecfdbc699cc95fac73291b52650283ddcf929d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Sun, 12 Jan 2025 10:34:44 +0100
Subject: [PATCH] KVM: e500: always restore irqs
If find_linux_pte fails, IRQs will not be restored. This is unlikely
to happen in practice since it would have been reported as hanging
hosts, but it should of course be fixed anyway.
Cc: stable(a)vger.kernel.org
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index e5a145b578a4..6824e8139801 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -479,7 +479,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (pte_present(pte)) {
wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
MAS2_WIMGE_MASK;
- local_irq_restore(flags);
} else {
local_irq_restore(flags);
pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
@@ -488,8 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ local_irq_restore(flags);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 87ecfdbc699cc95fac73291b52650283ddcf929d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011353-wooing-stitch-0ab6@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87ecfdbc699cc95fac73291b52650283ddcf929d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Sun, 12 Jan 2025 10:34:44 +0100
Subject: [PATCH] KVM: e500: always restore irqs
If find_linux_pte fails, IRQs will not be restored. This is unlikely
to happen in practice since it would have been reported as hanging
hosts, but it should of course be fixed anyway.
Cc: stable(a)vger.kernel.org
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index e5a145b578a4..6824e8139801 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -479,7 +479,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (pte_present(pte)) {
wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
MAS2_WIMGE_MASK;
- local_irq_restore(flags);
} else {
local_irq_restore(flags);
pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
@@ -488,8 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ local_irq_restore(flags);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 87ecfdbc699cc95fac73291b52650283ddcf929d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011347-reps-matching-f85e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87ecfdbc699cc95fac73291b52650283ddcf929d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Sun, 12 Jan 2025 10:34:44 +0100
Subject: [PATCH] KVM: e500: always restore irqs
If find_linux_pte fails, IRQs will not be restored. This is unlikely
to happen in practice since it would have been reported as hanging
hosts, but it should of course be fixed anyway.
Cc: stable(a)vger.kernel.org
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index e5a145b578a4..6824e8139801 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -479,7 +479,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (pte_present(pte)) {
wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
MAS2_WIMGE_MASK;
- local_irq_restore(flags);
} else {
local_irq_restore(flags);
pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
@@ -488,8 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ local_irq_restore(flags);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 87ecfdbc699cc95fac73291b52650283ddcf929d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011347-agile-appendix-65e2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87ecfdbc699cc95fac73291b52650283ddcf929d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Sun, 12 Jan 2025 10:34:44 +0100
Subject: [PATCH] KVM: e500: always restore irqs
If find_linux_pte fails, IRQs will not be restored. This is unlikely
to happen in practice since it would have been reported as hanging
hosts, but it should of course be fixed anyway.
Cc: stable(a)vger.kernel.org
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index e5a145b578a4..6824e8139801 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -479,7 +479,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (pte_present(pte)) {
wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
MAS2_WIMGE_MASK;
- local_irq_restore(flags);
} else {
local_irq_restore(flags);
pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
@@ -488,8 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ local_irq_restore(flags);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 87ecfdbc699cc95fac73291b52650283ddcf929d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011346-singer-curry-4142@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87ecfdbc699cc95fac73291b52650283ddcf929d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Sun, 12 Jan 2025 10:34:44 +0100
Subject: [PATCH] KVM: e500: always restore irqs
If find_linux_pte fails, IRQs will not be restored. This is unlikely
to happen in practice since it would have been reported as hanging
hosts, but it should of course be fixed anyway.
Cc: stable(a)vger.kernel.org
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index e5a145b578a4..6824e8139801 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -479,7 +479,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (pte_present(pte)) {
wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
MAS2_WIMGE_MASK;
- local_irq_restore(flags);
} else {
local_irq_restore(flags);
pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
@@ -488,8 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ local_irq_restore(flags);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 87ecfdbc699cc95fac73291b52650283ddcf929d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011346-pesticide-silenced-822e@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87ecfdbc699cc95fac73291b52650283ddcf929d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Sun, 12 Jan 2025 10:34:44 +0100
Subject: [PATCH] KVM: e500: always restore irqs
If find_linux_pte fails, IRQs will not be restored. This is unlikely
to happen in practice since it would have been reported as hanging
hosts, but it should of course be fixed anyway.
Cc: stable(a)vger.kernel.org
Reported-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index e5a145b578a4..6824e8139801 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -479,7 +479,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (pte_present(pte)) {
wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
MAS2_WIMGE_MASK;
- local_irq_restore(flags);
} else {
local_irq_restore(flags);
pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
@@ -488,8 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ local_irq_restore(flags);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
The macro mt9t112_reg_read returns from mt9t112_camera_probe() in case of
an error. However, a call to the shutdown function mt9t112_s_power() is
required at this point. Failure to execute the shutdown function will
result in the priv->clk not being properly disabled.
Found by Linux Verification Center (linuxtesting.org) with Klever.
Fixes: 858424b998ae ("V4L/DVB (13670): soc-camera: Add mt9t112 camera driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vitalii Mordan <mordan(a)ispras.ru>
---
drivers/media/i2c/mt9t112.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/media/i2c/mt9t112.c b/drivers/media/i2c/mt9t112.c
index 878dff9b7577..82e2c42f4c7b 100644
--- a/drivers/media/i2c/mt9t112.c
+++ b/drivers/media/i2c/mt9t112.c
@@ -1034,7 +1034,11 @@ static int mt9t112_camera_probe(struct i2c_client *client)
return ret;
/* Check and show chip ID. */
- mt9t112_reg_read(chipid, client, 0x0000);
+ chipid = __mt9t112_reg_read(client, 0x0000);
+ if (chipid < 0) {
+ ret = chipid;
+ goto done;
+ }
switch (chipid) {
case 0x2680:
--
2.25.1
If the clock csi2rx->p_clk was not enabled in csi2rx_stop(), it should not
be disabled in any path.
Conversely, if it was enabled in csi2rx_stop(), it must be disabled in all
error paths to ensure proper cleanup.
Found by Linux Verification Center (linuxtesting.org) with Klever.
Fixes: 1fc3b37f34f6 ("media: v4l: cadence: Add Cadence MIPI-CSI2 RX driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vitalii Mordan <mordan(a)ispras.ru>
---
drivers/media/platform/cadence/cdns-csi2rx.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/cadence/cdns-csi2rx.c b/drivers/media/platform/cadence/cdns-csi2rx.c
index 4d64df829e75..e7e8decf9a02 100644
--- a/drivers/media/platform/cadence/cdns-csi2rx.c
+++ b/drivers/media/platform/cadence/cdns-csi2rx.c
@@ -322,11 +322,14 @@ static int csi2rx_start(struct csi2rx_priv *csi2rx)
static void csi2rx_stop(struct csi2rx_priv *csi2rx)
{
+ int ret, ret_clk;
unsigned int i;
u32 val;
- int ret;
- clk_prepare_enable(csi2rx->p_clk);
+ ret_clk = clk_prepare_enable(csi2rx->p_clk);
+ if (ret_clk)
+ dev_warn(csi2rx->dev,
+ "Couldn't prepare and enable P clock\n");
reset_control_assert(csi2rx->sys_rst);
clk_disable_unprepare(csi2rx->sys_clk);
@@ -348,7 +351,8 @@ static void csi2rx_stop(struct csi2rx_priv *csi2rx)
}
reset_control_assert(csi2rx->p_rst);
- clk_disable_unprepare(csi2rx->p_clk);
+ if (!ret_clk)
+ clk_disable_unprepare(csi2rx->p_clk);
if (v4l2_subdev_call(csi2rx->source_subdev, video, s_stream, false))
dev_warn(csi2rx->dev, "Couldn't disable our subdev\n");
--
2.25.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 01ea6bf5cb58b20cc1bd159f0cf74a76cf04bb69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011356-unwary-enrich-4187@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01ea6bf5cb58b20cc1bd159f0cf74a76cf04bb69 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Andr=C3=A9=20Draszik?= <andre.draszik(a)linaro.org>
Date: Mon, 9 Dec 2024 11:49:53 +0000
Subject: [PATCH] usb: dwc3: gadget: fix writing NYET threshold
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Before writing a new value to the register, the old value needs to be
masked out for the new value to be programmed as intended, because at
least in some cases the reset value of that field is 0xf (max value).
At the moment, the dwc3 core initialises the threshold to the maximum
value (0xf), with the option to override it via a DT. No upstream DTs
seem to override it, therefore this commit doesn't change behaviour for
any upstream platform. Nevertheless, the code should be fixed to have
the desired outcome.
Do so.
Fixes: 80caf7d21adc ("usb: dwc3: add lpm erratum support")
Cc: stable(a)vger.kernel.org # 5.10+ (needs adjustment for 5.4)
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20241209-dwc3-nyet-fix-v2-1-02755683345b@linaro.o…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index ee73789326bc..f11570c8ffd0 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -464,6 +464,7 @@
#define DWC3_DCTL_TRGTULST_SS_INACT (DWC3_DCTL_TRGTULST(6))
/* These apply for core versions 1.94a and later */
+#define DWC3_DCTL_NYET_THRES_MASK (0xf << 20)
#define DWC3_DCTL_NYET_THRES(n) (((n) & 0xf) << 20)
#define DWC3_DCTL_KEEP_CONNECT BIT(19)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 83dc7304d701..31a654c6f15b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4195,8 +4195,10 @@ static void dwc3_gadget_conndone_interrupt(struct dwc3 *dwc)
WARN_ONCE(DWC3_VER_IS_PRIOR(DWC3, 240A) && dwc->has_lpm_erratum,
"LPM Erratum not available on dwc3 revisions < 2.40a\n");
- if (dwc->has_lpm_erratum && !DWC3_VER_IS_PRIOR(DWC3, 240A))
+ if (dwc->has_lpm_erratum && !DWC3_VER_IS_PRIOR(DWC3, 240A)) {
+ reg &= ~DWC3_DCTL_NYET_THRES_MASK;
reg |= DWC3_DCTL_NYET_THRES(dwc->lpm_nyet_threshold);
+ }
dwc3_gadget_dctl_write_safe(dwc, reg);
} else {
drm_wait_one_vblank() uses drm_WARN() to check for a time-dependent
condition. Since syzkaller runs the kernel with the panic_on_warn set, this
causes the entire kernel to panic with a "vblank wait timed out on crtc %i"
message.
In this case it does not mean that there is something wrong with the kernel
but is caused by time delays in vblanks handling that the fuzzer introduces
as a side effect when fail_alloc_pages, failslab, fail_usercopy faults are
injected with maximum verbosity. With lower verbosity this issue disappears.
drm_WARN() was introduced here by e8450f51a4b3 ("drm/irq: Implement a
generic vblank_wait function") and it is intended to indicate a failure with
vblank irqs handling by the underlying driver. The issue is raised during
testing of the vkms driver, but it may be potentially reproduced with other
drivers.
Fix this by using drm_warn() instead which does not cause the kernel to
panic with panic_on_warn set, but still provides a way to tell users about
this unexpected condition.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: e8450f51a4b3 ("drm/irq: Implement a generic vblank_wait function")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+9a8f87865d5e2e8ef57f(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9a8f87865d5e2e8ef57f
Signed-off-by: Vitaliy Shevtsov <v.shevtsov(a)maxima.ru>
---
drivers/gpu/drm/drm_vblank.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 94e45ed6869d..fa09ff5b1d48 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -1304,7 +1304,8 @@ void drm_wait_one_vblank(struct drm_device *dev, unsigned int pipe)
last != drm_vblank_count(dev, pipe),
msecs_to_jiffies(100));
- drm_WARN(dev, ret == 0, "vblank wait timed out on crtc %i\n", pipe);
+ if (!ret)
+ drm_warn(dev, "vblank wait timed out on crtc %i\n", pipe);
drm_vblank_put(dev, pipe);
}
--
2.47.1
From: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
The PFC_MASK value for the PFC_mx register was previously hardcoded as
`0x07`, which is correct for SoCs in the RZ/G2L family but insufficient
for RZ/V2H and RZ/G3E, where the mask value should be `0x0f`. This
discrepancy caused incorrect PFC register configurations on RZ/V2H and
RZ/G3E SoCs.
On the RZ/G2L, the PFC_mx bitfields are also 4 bits wide, with bit 4
marked as reserved. The reserved bits are documented to read as zero and
be ignored when written. Updating the PFC_MASK definition from `0x07` to
`0x0f` ensures compatibility with both SoC families while maintaining
correct behavior on RZ/G2L.
Fixes: 9bd95ac86e70 ("pinctrl: renesas: rzg2l: Add support for RZ/V2H SoC")
Cc: stable(a)vger.kernel.org
Reported-by: Hien Huynh <hien.huynh.px(a)renesas.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
---
v1->v2
- Dropped SoC specific configuration
---
drivers/pinctrl/renesas/pinctrl-rzg2l.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/renesas/pinctrl-rzg2l.c b/drivers/pinctrl/renesas/pinctrl-rzg2l.c
index ffcc5255724d..e33efd65670f 100644
--- a/drivers/pinctrl/renesas/pinctrl-rzg2l.c
+++ b/drivers/pinctrl/renesas/pinctrl-rzg2l.c
@@ -159,7 +159,7 @@
#define PWPR_REGWE_B BIT(5) /* OEN Register Write Enable, known only in RZ/V2H(P) */
#define PM_MASK 0x03
-#define PFC_MASK 0x07
+#define PFC_MASK 0x0f
#define IEN_MASK 0x01
#define IOLH_MASK 0x03
#define SR_MASK 0x01
--
2.43.0
The quilt patch titled
Subject: ocfs2: handle a symlink read error correctly
has been removed from the -mm tree. Its filename was
ocfs2-handle-a-symlink-read-error-correctly.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: ocfs2: handle a symlink read error correctly
Date: Thu, 5 Dec 2024 17:16:29 +0000
Patch series "Convert ocfs2 to use folios".
Mark did a conversion of ocfs2 to use folios and sent it to me as a
giant patch for review ;-)
So I've redone it as individual patches, and credited Mark for the patches
where his code is substantially the same. It's not a bad way to do it;
his patch had some bugs and my patches had some bugs. Hopefully all our
bugs were different from each other. And hopefully Mark likes all the
changes I made to his code!
This patch (of 23):
If we can't read the buffer, be sure to unlock the page before returning.
Link: https://lkml.kernel.org/r/20241205171653.3179945-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20241205171653.3179945-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Mark Tinguely <mark.tinguely(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/symlink.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/symlink.c~ocfs2-handle-a-symlink-read-error-correctly
+++ a/fs/ocfs2/symlink.c
@@ -65,7 +65,7 @@ static int ocfs2_fast_symlink_read_folio
if (status < 0) {
mlog_errno(status);
- return status;
+ goto out;
}
fe = (struct ocfs2_dinode *) bh->b_data;
@@ -76,9 +76,10 @@ static int ocfs2_fast_symlink_read_folio
memcpy(kaddr, link, len + 1);
kunmap_atomic(kaddr);
SetPageUptodate(page);
+out:
unlock_page(page);
brelse(bh);
- return 0;
+ return status;
}
const struct address_space_operations ocfs2_fast_symlink_aops = {
_
Patches currently in -mm which might be from willy(a)infradead.org are
mm-page_alloc-cache-page_zone-result-in-free_unref_page.patch
mm-make-alloc_pages_mpol-static.patch
mm-page_alloc-export-free_frozen_pages-instead-of-free_unref_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-post_alloc_hook.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-prep_new_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-get_page_from_freelist.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_cpuset_fallback.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_may_oom.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_compact.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_reclaim.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_slowpath.patch
mm-page_alloc-move-set_page_refcounted-to-end-of-__alloc_pages.patch
mm-page_alloc-add-__alloc_frozen_pages.patch
mm-mempolicy-add-alloc_frozen_pages.patch
slab-allocate-frozen-pages.patch
mm-remove-pagetranstail.patch
The quilt patch titled
Subject: fs/proc: fix softlockup in __read_vmcore (part 2)
has been removed from the -mm tree. Its filename was
fs-proc-fix-softlockup-in-__read_vmcore-part-2.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: fs/proc: fix softlockup in __read_vmcore (part 2)
Date: Fri, 10 Jan 2025 10:28:21 -0500
Since commit 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") the
number of softlockups in __read_vmcore at kdump time have gone down, but
they still happen sometimes.
In a memory constrained environment like the kdump image, a softlockup is
not just a harmless message, but it can interfere with things like RCU
freeing memory, causing the crashdump to get stuck.
The second loop in __read_vmcore has a lot more opportunities for natural
sleep points, like scheduling out while waiting for a data write to
happen, but apparently that is not always enough.
Add a cond_resched() to the second loop in __read_vmcore to (hopefully)
get rid of the softlockups.
Link: https://lkml.kernel.org/r/20250110102821.2a37581b@fangorn
Fixes: 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Reported-by: Breno Leitao <leitao(a)debian.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/vmcore.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/proc/vmcore.c~fs-proc-fix-softlockup-in-__read_vmcore-part-2
+++ a/fs/proc/vmcore.c
@@ -404,6 +404,8 @@ static ssize_t __read_vmcore(struct iov_
if (!iov_iter_count(iter))
return acc;
}
+
+ cond_resched();
}
return acc;
_
Patches currently in -mm which might be from riel(a)surriel.com are
mm-remove-unnecessary-calls-to-lru_add_drain.patch
The quilt patch titled
Subject: mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.
has been removed from the -mm tree. Its filename was
mm-vmscan-pgdemote-vmstat-is-not-getting-updated-when-mglru-is-enabled.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Donet Tom <donettom(a)linux.ibm.com>
Subject: mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.
Date: Thu, 9 Jan 2025 00:05:39 -0600
When MGLRU is enabled, the pgdemote_kswapd, pgdemote_direct, and
pgdemote_khugepaged stats in vmstat are not being updated.
Commit f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA
balancing operations") moved the pgdemote vmstat update from
demote_folio_list() to shrink_inactive_list(), which is in the normal LRU
path. As a result, the pgdemote stats are updated correctly for the
normal LRU but not for MGLRU.
To address this, we have added the pgdemote stat update in the
evict_folios() function, which is in the MGLRU path. With this patch, the
pgdemote stats will now be updated correctly when MGLRU is enabled.
Without this patch vmstat output when MGLRU is enabled
======================================================
pgdemote_kswapd 0
pgdemote_direct 0
pgdemote_khugepaged 0
With this patch vmstat output when MGLRU is enabled
===================================================
pgdemote_kswapd 43234
pgdemote_direct 4691
pgdemote_khugepaged 0
Link: https://lkml.kernel.org/r/20250109060540.451261-1-donettom@linux.ibm.com
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Acked-by: Yu Zhao <yuzhao(a)google.com>
Tested-by: Li Zhijian <lizhijian(a)fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian(a)fujitsu.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kaiyang Zhao <kaiyang2(a)cs.cmu.edu>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/vmscan.c~mm-vmscan-pgdemote-vmstat-is-not-getting-updated-when-mglru-is-enabled
+++ a/mm/vmscan.c
@@ -4642,6 +4642,9 @@ retry:
reset_batch_size(walk);
}
+ __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
+ stat.nr_demoted);
+
item = PGSTEAL_KSWAPD + reclaimer_offset();
if (!cgroup_reclaim(sc))
__count_vm_events(item, reclaimed);
_
Patches currently in -mm which might be from donettom(a)linux.ibm.com are
mm-migrate-removed-unused-argument-vma-from-migrate_misplaced_folio.patch
selftests-mm-added-new-test-cases-to-the-migration-test.patch
The quilt patch titled
Subject: zram: fix potential UAF of zram table
has been removed from the -mm tree. Its filename was
zram-fix-potential-uaf-of-zram-table.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: zram: fix potential UAF of zram table
Date: Tue, 7 Jan 2025 14:54:46 +0800
If zram_meta_alloc failed early, it frees allocated zram->table without
setting it NULL. Which will potentially cause zram_meta_free to access
the table if user reset an failed and uninitialized device.
Link: https://lkml.kernel.org/r/20250107065446.86928-1-ryncsn@gmail.com
Fixes: 74363ec674cb ("zram: fix uninitialized ZRAM not releasing backing device")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/block/zram/zram_drv.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/block/zram/zram_drv.c~zram-fix-potential-uaf-of-zram-table
+++ a/drivers/block/zram/zram_drv.c
@@ -1468,6 +1468,7 @@ static bool zram_meta_alloc(struct zram
zram->mem_pool = zs_create_pool(zram->disk->disk_name);
if (!zram->mem_pool) {
vfree(zram->table);
+ zram->table = NULL;
return false;
}
_
Patches currently in -mm which might be from kasong(a)tencent.com are
mm-memcontrol-avoid-duplicated-memcg-enable-check.patch
mm-swap_cgroup-remove-swap_cgroup_cmpxchg.patch
mm-swap_cgroup-remove-global-swap-cgroup-lock.patch
mm-swap_cgroup-decouple-swap-cgroup-recording-and-clearing.patch
mm-swap-minor-clean-up-for-swap-entry-allocation.patch
mm-swap-fold-swap_info_get_cont-in-the-only-caller.patch
mm-swap-remove-old-allocation-path-for-hdd.patch
mm-swap-use-cluster-lock-for-hdd.patch
mm-swap-clean-up-device-availability-check.patch
mm-swap-clean-up-plist-removal-and-adding.patch
mm-swap-hold-a-reference-during-scan-and-cleanup-flag-usage.patch
mm-swap-use-an-enum-to-define-all-cluster-flags-and-wrap-flags-changes.patch
mm-swap-reduce-contention-on-device-lock.patch
mm-swap-simplify-percpu-cluster-updating.patch
mm-swap-introduce-a-helper-for-retrieving-cluster-from-offset.patch
mm-swap-use-a-global-swap-cluster-for-non-rotation-devices.patch
mm-swap_slots-remove-slot-cache-for-freeing-path.patch
The quilt patch titled
Subject: selftests/mm: set allocated memory to non-zero content in cow test
has been removed from the -mm tree. Its filename was
selftests-mm-set-allocated-memory-to-non-zero-content-in-cow-test.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: selftests/mm: set allocated memory to non-zero content in cow test
Date: Tue, 7 Jan 2025 14:25:53 +0000
After commit b1f202060afe ("mm: remap unused subpages to shared zeropage
when splitting isolated thp"), cow test cases involving swapping out THPs
via madvise(MADV_PAGEOUT) started to be skipped due to the subsequent
check via pagemap determining that the memory was not actually swapped
out. Logs similar to this were emitted:
...
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (16 kB)
ok 2 # SKIP MADV_PAGEOUT did not work, is swap enabled?
# [RUN] Basic COW after fork() ... with single PTE of swapped-out THP (16 kB)
ok 3 # SKIP MADV_PAGEOUT did not work, is swap enabled?
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (32 kB)
ok 4 # SKIP MADV_PAGEOUT did not work, is swap enabled?
...
The commit in question introduces the behaviour of scanning THPs and if
their content is predominantly zero, it splits them and replaces the pages
which are wholly zero with the zero page. These cow test cases were
getting caught up in this.
So let's avoid that by filling the contents of all allocated memory with
a non-zero value. With this in place, the tests are passing again.
Link: https://lkml.kernel.org/r/20250107142555.1870101-1-ryan.roberts@arm.com
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Usama Arif <usamaarif642(a)gmail.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/cow.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/mm/cow.c~selftests-mm-set-allocated-memory-to-non-zero-content-in-cow-test
+++ a/tools/testing/selftests/mm/cow.c
@@ -758,7 +758,7 @@ static void do_run_with_base_page(test_f
}
/* Populate a base page. */
- memset(mem, 0, pagesize);
+ memset(mem, 1, pagesize);
if (swapout) {
madvise(mem, pagesize, MADV_PAGEOUT);
@@ -824,12 +824,12 @@ static void do_run_with_thp(test_fn fn,
* Try to populate a THP. Touch the first sub-page and test if
* we get the last sub-page populated automatically.
*/
- mem[0] = 0;
+ mem[0] = 1;
if (!pagemap_is_populated(pagemap_fd, mem + thpsize - pagesize)) {
ksft_test_result_skip("Did not get a THP populated\n");
goto munmap;
}
- memset(mem, 0, thpsize);
+ memset(mem, 1, thpsize);
size = thpsize;
switch (thp_run) {
@@ -1012,7 +1012,7 @@ static void run_with_hugetlb(test_fn fn,
}
/* Populate an huge page. */
- memset(mem, 0, hugetlbsize);
+ memset(mem, 1, hugetlbsize);
/*
* We need a total of two hugetlb pages to handle COW/unsharing
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
selftests-mm-add-fork-cow-guard-page-test-fix.patch
selftests-mm-introduce-uffd-wp-mremap-regression-test.patch
The quilt patch titled
Subject: mm: clear uffd-wp PTE/PMD state on mremap()
has been removed from the -mm tree. Its filename was
mm-clear-uffd-wp-pte-pmd-state-on-mremap.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: clear uffd-wp PTE/PMD state on mremap()
Date: Tue, 7 Jan 2025 14:47:52 +0000
When mremap()ing a memory region previously registered with userfaultfd as
write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in
flag clearing leads to a mismatch between the vma flags (which have
uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp
cleared). This mismatch causes a subsequent mprotect(PROT_WRITE) to
trigger a warning in page_table_check_pte_flags() due to setting the pte
to writable while uffd-wp is still set.
Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any
such mremap() so that the values are consistent with the existing clearing
of VM_UFFD_WP. Be careful to clear the logical flag regardless of its
physical form; a PTE bit, a swap PTE bit, or a PTE marker. Cover PTE,
huge PMD and hugetlb paths.
Link: https://lkml.kernel.org/r/20250107144755.1871363-2-ryan.roberts@arm.com
Co-developed-by: Miko��aj Lenczewski <miko.lenczewski(a)arm.com>
Signed-off-by: Miko��aj Lenczewski <miko.lenczewski(a)arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Closes: https://lore.kernel.org/linux-mm/810b44a8-d2ae-4107-b665-5a42eae2d948@arm.c…
Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/userfaultfd_k.h | 12 ++++++++++++
mm/huge_memory.c | 12 ++++++++++++
mm/hugetlb.c | 14 +++++++++++++-
mm/mremap.c | 32 +++++++++++++++++++++++++++++++-
4 files changed, 68 insertions(+), 2 deletions(-)
--- a/include/linux/userfaultfd_k.h~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/include/linux/userfaultfd_k.h
@@ -247,6 +247,13 @@ static inline bool vma_can_userfault(str
vma_is_shmem(vma);
}
+static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
+{
+ struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx;
+
+ return uffd_ctx && (uffd_ctx->features & UFFD_FEATURE_EVENT_REMAP) == 0;
+}
+
extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
extern void dup_userfaultfd_complete(struct list_head *);
void dup_userfaultfd_fail(struct list_head *);
@@ -401,6 +408,11 @@ static inline bool userfaultfd_wp_async(
{
return false;
}
+
+static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
+{
+ return false;
+}
#endif /* CONFIG_USERFAULTFD */
--- a/mm/huge_memory.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/huge_memory.c
@@ -2206,6 +2206,16 @@ static pmd_t move_soft_dirty_pmd(pmd_t p
return pmd;
}
+static pmd_t clear_uffd_wp_pmd(pmd_t pmd)
+{
+ if (pmd_present(pmd))
+ pmd = pmd_clear_uffd_wp(pmd);
+ else if (is_swap_pmd(pmd))
+ pmd = pmd_swp_clear_uffd_wp(pmd);
+
+ return pmd;
+}
+
bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
{
@@ -2244,6 +2254,8 @@ bool move_huge_pmd(struct vm_area_struct
pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
}
pmd = move_soft_dirty_pmd(pmd);
+ if (vma_has_uffd_without_event_remap(vma))
+ pmd = clear_uffd_wp_pmd(pmd);
set_pmd_at(mm, new_addr, new_pmd, pmd);
if (force_flush)
flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
--- a/mm/hugetlb.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/hugetlb.c
@@ -5402,6 +5402,7 @@ static void move_huge_pte(struct vm_area
unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte,
unsigned long sz)
{
+ bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct hstate *h = hstate_vma(vma);
struct mm_struct *mm = vma->vm_mm;
spinlock_t *src_ptl, *dst_ptl;
@@ -5418,7 +5419,18 @@ static void move_huge_pte(struct vm_area
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
- set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
+
+ if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
+ huge_pte_clear(mm, new_addr, dst_pte, sz);
+ else {
+ if (need_clear_uffd_wp) {
+ if (pte_present(pte))
+ pte = huge_pte_clear_uffd_wp(pte);
+ else if (is_swap_pte(pte))
+ pte = pte_swp_clear_uffd_wp(pte);
+ }
+ set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
+ }
if (src_ptl != dst_ptl)
spin_unlock(src_ptl);
--- a/mm/mremap.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/mremap.c
@@ -138,6 +138,7 @@ static int move_ptes(struct vm_area_stru
struct vm_area_struct *new_vma, pmd_t *new_pmd,
unsigned long new_addr, bool need_rmap_locks)
{
+ bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct mm_struct *mm = vma->vm_mm;
pte_t *old_pte, *new_pte, pte;
pmd_t dummy_pmdval;
@@ -216,7 +217,18 @@ static int move_ptes(struct vm_area_stru
force_flush = true;
pte = move_pte(pte, old_addr, new_addr);
pte = move_soft_dirty_pte(pte);
- set_pte_at(mm, new_addr, new_pte, pte);
+
+ if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
+ pte_clear(mm, new_addr, new_pte);
+ else {
+ if (need_clear_uffd_wp) {
+ if (pte_present(pte))
+ pte = pte_clear_uffd_wp(pte);
+ else if (is_swap_pte(pte))
+ pte = pte_swp_clear_uffd_wp(pte);
+ }
+ set_pte_at(mm, new_addr, new_pte, pte);
+ }
}
arch_leave_lazy_mmu_mode();
@@ -278,6 +290,15 @@ static bool move_normal_pmd(struct vm_ar
if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
return false;
+ /* If this pmd belongs to a uffd vma with remap events disabled, we need
+ * to ensure that the uffd-wp state is cleared from all pgtables. This
+ * means recursing into lower page tables in move_page_tables(), and we
+ * can reuse the existing code if we simply treat the entry as "not
+ * moved".
+ */
+ if (vma_has_uffd_without_event_remap(vma))
+ return false;
+
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
@@ -333,6 +354,15 @@ static bool move_normal_pud(struct vm_ar
if (WARN_ON_ONCE(!pud_none(*new_pud)))
return false;
+ /* If this pud belongs to a uffd vma with remap events disabled, we need
+ * to ensure that the uffd-wp state is cleared from all pgtables. This
+ * means recursing into lower page tables in move_page_tables(), and we
+ * can reuse the existing code if we simply treat the entry as "not
+ * moved".
+ */
+ if (vma_has_uffd_without_event_remap(vma))
+ return false;
+
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
selftests-mm-add-fork-cow-guard-page-test-fix.patch
selftests-mm-introduce-uffd-wp-mremap-regression-test.patch
The quilt patch titled
Subject: mm: zswap: properly synchronize freeing resources during CPU hotunplug
has been removed from the -mm tree. Its filename was
mm-zswap-properly-synchronize-freeing-resources-during-cpu-hotunplug.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yosry Ahmed <yosryahmed(a)google.com>
Subject: mm: zswap: properly synchronize freeing resources during CPU hotunplug
Date: Wed, 8 Jan 2025 22:24:41 +0000
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU at the beginning of the operation is retrieved and used
throughout. However, since neither preemption nor migration are disabled,
it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use,
we run into a UAF bug as some of the resources attached to the acomp_ctx
are freed during hotunplug in zswap_cpu_comp_dead() (i.e.
acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp).
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr() which disables preemption and makes sure the
CPU cannot go away from under us. Preemption cannot be disabled with the
crypto_acomp API as a sleepable context is needed.
Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating
and freeing resources with compression/decompression paths. Make sure
that acomp_ctx.req is NULL when the resources are freed. In the
compression/decompression paths, check if acomp_ctx.req is NULL after
acquiring the mutex (meaning the CPU was offlined) and retry on the new
CPU.
The initialization of acomp_ctx.mutex is moved from the CPU hotplug
callback to the pool initialization where it belongs (where the mutex is
allocated). In addition to adding clarity, this makes sure that CPU
hotplug cannot reinitialize a mutex that is already locked by
compression/decompression.
Previously a fix was attempted by holding cpus_read_lock() [1]. This
would have caused a potential deadlock as it is possible for code already
holding the lock to fall into reclaim and enter zswap (causing a
deadlock). A fix was also attempted using SRCU for synchronization, but
Johannes pointed out that synchronize_srcu() cannot be used in CPU hotplug
notifiers [2].
Alternative fixes that were considered/attempted and could have worked:
- Refcounting the per-CPU acomp_ctx. This involves complexity in
handling the race between the refcount dropping to zero in
zswap_[de]compress() and the refcount being re-initialized when the
CPU is onlined.
- Disabling migration before getting the per-CPU acomp_ctx [3], but
that's discouraged and is a much bigger hammer than needed, and could
result in subtle performance issues.
[1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@google.com/
[2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@google.com/
[3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@google.com/
[yosryahmed(a)google.com: remove comment]
Link: https://lkml.kernel.org/r/CAJD7tkaxS1wjn+swugt8QCvQ-rVF5RZnjxwPGX17k8x9zSMa…
Link: https://lkml.kernel.org/r/20250108222441.3622031-1-yosryahmed@google.com
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Reported-by: Johannes Weiner <hannes(a)cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tP…
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 58 ++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 44 insertions(+), 14 deletions(-)
--- a/mm/zswap.c~mm-zswap-properly-synchronize-freeing-resources-during-cpu-hotunplug
+++ a/mm/zswap.c
@@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_cre
struct zswap_pool *pool;
char name[38]; /* 'zswap' + 32 char (max) num + \0 */
gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
- int ret;
+ int ret, cpu;
if (!zswap_has_pool) {
/* if either are unset, pool initialization failed, and we
@@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_cre
goto error;
}
+ for_each_possible_cpu(cpu)
+ mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
+
ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
&pool->node);
if (ret)
@@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsign
struct acomp_req *req;
int ret;
- mutex_init(&acomp_ctx->mutex);
-
+ mutex_lock(&acomp_ctx->mutex);
acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
- if (!acomp_ctx->buffer)
- return -ENOMEM;
+ if (!acomp_ctx->buffer) {
+ ret = -ENOMEM;
+ goto buffer_fail;
+ }
acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
if (IS_ERR(acomp)) {
@@ -855,12 +859,15 @@ static int zswap_cpu_comp_prepare(unsign
acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
crypto_req_done, &acomp_ctx->wait);
+ mutex_unlock(&acomp_ctx->mutex);
return 0;
req_fail:
crypto_free_acomp(acomp_ctx->acomp);
acomp_fail:
kfree(acomp_ctx->buffer);
+buffer_fail:
+ mutex_unlock(&acomp_ctx->mutex);
return ret;
}
@@ -869,17 +876,45 @@ static int zswap_cpu_comp_dead(unsigned
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
+ mutex_lock(&acomp_ctx->mutex);
if (!IS_ERR_OR_NULL(acomp_ctx)) {
if (!IS_ERR_OR_NULL(acomp_ctx->req))
acomp_request_free(acomp_ctx->req);
+ acomp_ctx->req = NULL;
if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
crypto_free_acomp(acomp_ctx->acomp);
kfree(acomp_ctx->buffer);
}
+ mutex_unlock(&acomp_ctx->mutex);
return 0;
}
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(struct zswap_pool *pool)
+{
+ struct crypto_acomp_ctx *acomp_ctx;
+
+ for (;;) {
+ acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+ mutex_lock(&acomp_ctx->mutex);
+ if (likely(acomp_ctx->req))
+ return acomp_ctx;
+ /*
+ * It is possible that we were migrated to a different CPU after
+ * getting the per-CPU ctx but before the mutex was acquired. If
+ * the old CPU got offlined, zswap_cpu_comp_dead() could have
+ * already freed ctx->req (among other things) and set it to
+ * NULL. Just try again on the new CPU that we ended up on.
+ */
+ mutex_unlock(&acomp_ctx->mutex);
+ }
+}
+
+static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
+{
+ mutex_unlock(&acomp_ctx->mutex);
+}
+
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
struct zswap_pool *pool)
{
@@ -893,10 +928,7 @@ static bool zswap_compress(struct page *
gfp_t gfp;
u8 *dst;
- acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
- mutex_lock(&acomp_ctx->mutex);
-
+ acomp_ctx = acomp_ctx_get_cpu_lock(pool);
dst = acomp_ctx->buffer;
sg_init_table(&input, 1);
sg_set_page(&input, page, PAGE_SIZE, 0);
@@ -949,7 +981,7 @@ unlock:
else if (alloc_ret)
zswap_reject_alloc_fail++;
- mutex_unlock(&acomp_ctx->mutex);
+ acomp_ctx_put_unlock(acomp_ctx);
return comp_ret == 0 && alloc_ret == 0;
}
@@ -960,9 +992,7 @@ static void zswap_decompress(struct zswa
struct crypto_acomp_ctx *acomp_ctx;
u8 *src;
- acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
- mutex_lock(&acomp_ctx->mutex);
-
+ acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
/*
* If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
@@ -986,10 +1016,10 @@ static void zswap_decompress(struct zswa
acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
- mutex_unlock(&acomp_ctx->mutex);
if (src != acomp_ctx->buffer)
zpool_unmap_handle(zpool, entry->handle);
+ acomp_ctx_put_unlock(acomp_ctx);
}
/*********************************
_
Patches currently in -mm which might be from yosryahmed(a)google.com are
The quilt patch titled
Subject: hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
has been removed from the -mm tree. Its filename was
hugetlb-fix-null-pointer-dereference-in-trace_hugetlbfs_alloc_inode.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
Date: Mon, 6 Jan 2025 11:31:17 +0800
hugetlb_file_setup() will pass a NULL @dir to hugetlbfs_get_inode(), so we
will access a NULL pointer for @dir. Fix it and set __entry->dr to 0 if
@dir is NULL. Because ->i_ino cannot be 0 (see get_next_ino()), there is
no confusing if user sees a 0 inode number.
Link: https://lkml.kernel.org/r/20250106033118.4640-1-songmuchun@bytedance.com
Fixes: 318580ad7f28 ("hugetlbfs: support tracepoint")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reported-by: Cheung Wall <zzqq0103.hey(a)gmail.com>
Closes: https://lore.kernel.org/linux-mm/02858D60-43C1-4863-A84F-3C76A8AF1F15@linux…
Reviewed-by: Hongbo Li <lihongbo22(a)huawei.com>
Cc: cheung wall <zzqq0103.hey(a)gmail.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/trace/events/hugetlbfs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/trace/events/hugetlbfs.h~hugetlb-fix-null-pointer-dereference-in-trace_hugetlbfs_alloc_inode
+++ a/include/trace/events/hugetlbfs.h
@@ -23,7 +23,7 @@ TRACE_EVENT(hugetlbfs_alloc_inode,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->dir = dir->i_ino;
+ __entry->dir = dir ? dir->i_ino : 0;
__entry->mode = mode;
),
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
The quilt patch titled
Subject: filemap: avoid truncating 64-bit offset to 32 bits
has been removed from the -mm tree. Its filename was
filemap-avoid-truncating-64-bit-offset-to-32-bits.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Marco Nelissen <marco.nelissen(a)gmail.com>
Subject: filemap: avoid truncating 64-bit offset to 32 bits
Date: Thu, 2 Jan 2025 11:04:11 -0800
On 32-bit kernels, folio_seek_hole_data() was inadvertently truncating a
64-bit value to 32 bits, leading to a possible infinite loop when writing
to an xfs filesystem.
Link: https://lkml.kernel.org/r/20250102190540.1356838-1-marco.nelissen@gmail.com
Fixes: 54fa39ac2e00 ("iomap: use mapping_seek_hole_data")
Signed-off-by: Marco Nelissen <marco.nelissen(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/filemap.c~filemap-avoid-truncating-64-bit-offset-to-32-bits
+++ a/mm/filemap.c
@@ -2996,7 +2996,7 @@ static inline loff_t folio_seek_hole_dat
if (ops->is_partially_uptodate(folio, offset, bsz) ==
seek_data)
break;
- start = (start + bsz) & ~(bsz - 1);
+ start = (start + bsz) & ~((u64)bsz - 1);
offset += bsz;
} while (offset < folio_size(folio));
unlock:
_
Patches currently in -mm which might be from marco.nelissen(a)gmail.com are
The quilt patch titled
Subject: tools: fix atomic_set() definition to set the value correctly
has been removed from the -mm tree. Its filename was
tools-fix-atomic_set-definition-to-set-the-value-correctly.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: tools: fix atomic_set() definition to set the value correctly
Date: Fri, 27 Dec 2024 14:22:20 -0800
Currently vma test is failing because of the new vma_assert_attached()
assertion. The check is failing because previous refcount_set() inside
vma_mark_attached() is a NoOp. Fix the definition of atomic_set() to
correctly set the value of the atomic.
Link: https://lkml.kernel.org/r/20241227222220.1726384-1-surenb@google.com
Fixes: 9325b8b5a1cb ("tools: add skeleton code for userland testing of VMA logic")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/shared/linux/maple_tree.h | 2 +-
tools/testing/vma/linux/atomic.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/tools/testing/shared/linux/maple_tree.h~tools-fix-atomic_set-definition-to-set-the-value-correctly
+++ a/tools/testing/shared/linux/maple_tree.h
@@ -2,6 +2,6 @@
#define atomic_t int32_t
#define atomic_inc(x) uatomic_inc(x)
#define atomic_read(x) uatomic_read(x)
-#define atomic_set(x, y) do {} while (0)
+#define atomic_set(x, y) uatomic_set(x, y)
#define U8_MAX UCHAR_MAX
#include "../../../../include/linux/maple_tree.h"
--- a/tools/testing/vma/linux/atomic.h~tools-fix-atomic_set-definition-to-set-the-value-correctly
+++ a/tools/testing/vma/linux/atomic.h
@@ -6,7 +6,7 @@
#define atomic_t int32_t
#define atomic_inc(x) uatomic_inc(x)
#define atomic_read(x) uatomic_read(x)
-#define atomic_set(x, y) do {} while (0)
+#define atomic_set(x, y) uatomic_set(x, y)
#define U8_MAX UCHAR_MAX
#endif /* _LINUX_ATOMIC_H */
_
Patches currently in -mm which might be from surenb(a)google.com are
alloc_tag-skip-pgalloc_tag_swap-if-profiling-is-disabled.patch
seqlock-add-raw_seqcount_try_begin.patch
mm-convert-mm_lock_seq-to-a-proper-seqcount.patch
mm-introduce-mmap_lock_speculate_try_beginretry.patch
mm-introduce-vma_start_read_locked_nested-helpers.patch
mm-move-per-vma-lock-into-vm_area_struct.patch
mm-mark-vma-as-detached-until-its-added-into-vma-tree.patch
mm-introduce-vma_iter_store_attached-to-use-with-attached-vmas.patch
mm-mark-vmas-detached-upon-exit.patch
types-move-struct-rcuwait-into-typesh.patch
mm-allow-vma_start_read_locked-vma_start_read_locked_nested-to-fail.patch
mm-move-mmap_init_lock-out-of-the-header-file.patch
mm-uninline-the-main-body-of-vma_start_write.patch
refcount-introduce-__refcount_addinc_not_zero_limited.patch
mm-replace-vm_lock-and-detached-flag-with-a-reference-count.patch
mm-move-lesser-used-vma_area_struct-members-into-the-last-cacheline.patch
mm-debug-print-vm_refcnt-state-when-dumping-the-vma.patch
mm-remove-extra-vma_numab_state_init-call.patch
mm-prepare-lock_vma_under_rcu-for-vma-reuse-possibility.patch
mm-make-vma-cache-slab_typesafe_by_rcu.patch
docs-mm-document-latest-changes-to-vm_lock.patch
alloc_tag-avoid-current-alloc_tag-manipulations-when-profiling-is-disabled.patch
The quilt patch titled
Subject: hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
has been removed from the -mm tree. Its filename was
hugetlb-fix-null-pointer-dereference-in-trace_hugetlbfs_alloc_inode.patch
This patch was dropped because it was folded into mm-fix-div-by-zero-in-bdi_ratio_from_pages-v2.patch
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
Date: Mon, 6 Jan 2025 11:31:17 +0800
hugetlb_file_setup() will pass a NULL @dir to hugetlbfs_get_inode(), so we
will access a NULL pointer for @dir. Fix it and set __entry->dr to 0 if
@dir is NULL. Because ->i_ino cannot be 0 (see get_next_ino()), there is
no confusing if user sees a 0 inode number.
Link: https://lkml.kernel.org/r/20250106033118.4640-1-songmuchun@bytedance.com
Fixes: 318580ad7f28 ("hugetlbfs: support tracepoint")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reported-by: Cheung Wall <zzqq0103.hey(a)gmail.com>
Closes: https://lore.kernel.org/linux-mm/02858D60-43C1-4863-A84F-3C76A8AF1F15@linux…
Reviewed-by: Hongbo Li <lihongbo22(a)huawei.com>
Cc: cheung wall <zzqq0103.hey(a)gmail.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/trace/events/hugetlbfs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/trace/events/hugetlbfs.h~hugetlb-fix-null-pointer-dereference-in-trace_hugetlbfs_alloc_inode
+++ a/include/trace/events/hugetlbfs.h
@@ -23,7 +23,7 @@ TRACE_EVENT(hugetlbfs_alloc_inode,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->dir = dir->i_ino;
+ __entry->dir = dir ? dir->i_ino : 0;
__entry->mode = mode;
),
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-fix-div-by-zero-in-bdi_ratio_from_pages-v2.patch
We are seeing kernel panic when enabling two NCSI interfaces at same
time. It looks like mutex lock is being used in softirq caused the
issue.
This patch series try to fix oem gma command handling issue by adding a
new state, also fix a potential state handling issue.
Signed-off-by: Potin Lai <potin.lai.pt(a)gmail.com>
---
Changes in v3:
- Fix compile error by removing non-exist variable.
- Link to v2: https://lore.kernel.org/r/20250111-fix-ncsi-mac-v2-0-838e0a1a233a@gmail.com
Changes in v2:
- Add second patch for fixing state handling issue.
- Link to v1: https://lore.kernel.org/all/20250109145054.30925-1-fercerpav@gmail.com/
---
Cosmo Chou (1):
net/ncsi: fix state race during channel probe completion
Paul Fertser (1):
net/ncsi: fix locking in Get MAC Address handling
net/ncsi/internal.h | 2 ++
net/ncsi/ncsi-manage.c | 21 ++++++++++++++++++---
net/ncsi/ncsi-rsp.c | 19 ++++++-------------
3 files changed, 26 insertions(+), 16 deletions(-)
---
base-commit: fc033cf25e612e840e545f8d5ad2edd6ba613ed5
change-id: 20250111-fix-ncsi-mac-1e4b3df431f1
Best regards,
--
Potin Lai <potin.lai.pt(a)gmail.com>
Commit 73dae652dcac ("drm/amdgpu: rework resume handling for display (v2)")
missed a small code change when it was backported resulting in an automatic
backlight control breakage. Fix the backport.
Note that this patch is not in Linus' tree as it is not required there;
the bug was introduced in the backport.
Fixes: 99a02eab8251 ("drm/amdgpu: rework resume handling for display (v2)")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3853
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.11.x
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 51904906545e..45e28726e148 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3721,8 +3721,12 @@ static int amdgpu_device_ip_resume_phase3(struct amdgpu_device *adev)
continue;
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) {
r = adev->ip_blocks[i].version->funcs->resume(adev);
- if (r)
+ if (r) {
+ DRM_ERROR("resume of IP block <%s> failed %d\n",
+ adev->ip_blocks[i].version->funcs->name, r);
return r;
+ }
+ adev->ip_blocks[i].status.hw = true;
}
}
--
2.47.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c9a40292a44e78f71258b8522655bffaf5753bdb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011245-hardy-tiptop-229e@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c9a40292a44e78f71258b8522655bffaf5753bdb Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Wed, 8 Jan 2025 10:28:05 -0700
Subject: [PATCH] io_uring/eventfd: ensure io_eventfd_signal() defers another
RCU period
io_eventfd_do_signal() is invoked from an RCU callback, but when
dropping the reference to the io_ev_fd, it calls io_eventfd_free()
directly if the refcount drops to zero. This isn't correct, as any
potential freeing of the io_ev_fd should be deferred another RCU grace
period.
Just call io_eventfd_put() rather than open-code the dec-and-test and
free, which will correctly defer it another RCU grace period.
Fixes: 21a091b970cd ("io_uring: signal registered eventfd to process deferred task work")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: stable(a)vger.kernel.org
Tested-by: Li Zetao <lizetao1(a)huawei.com>
Reviewed-by: Li Zetao<lizetao1(a)huawei.com>
Reviewed-by: Prasanna Kumar T S M <ptsm(a)linux.microsoft.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/eventfd.c b/io_uring/eventfd.c
index fab936d31ba8..100d5da94cb9 100644
--- a/io_uring/eventfd.c
+++ b/io_uring/eventfd.c
@@ -33,20 +33,18 @@ static void io_eventfd_free(struct rcu_head *rcu)
kfree(ev_fd);
}
+static void io_eventfd_put(struct io_ev_fd *ev_fd)
+{
+ if (refcount_dec_and_test(&ev_fd->refs))
+ call_rcu(&ev_fd->rcu, io_eventfd_free);
+}
+
static void io_eventfd_do_signal(struct rcu_head *rcu)
{
struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu);
eventfd_signal_mask(ev_fd->cq_ev_fd, EPOLL_URING_WAKE);
-
- if (refcount_dec_and_test(&ev_fd->refs))
- io_eventfd_free(rcu);
-}
-
-static void io_eventfd_put(struct io_ev_fd *ev_fd)
-{
- if (refcount_dec_and_test(&ev_fd->refs))
- call_rcu(&ev_fd->rcu, io_eventfd_free);
+ io_eventfd_put(ev_fd);
}
static void io_eventfd_release(struct io_ev_fd *ev_fd, bool put_ref)
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x c9a40292a44e78f71258b8522655bffaf5753bdb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011244-expedited-clip-fd53@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c9a40292a44e78f71258b8522655bffaf5753bdb Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Wed, 8 Jan 2025 10:28:05 -0700
Subject: [PATCH] io_uring/eventfd: ensure io_eventfd_signal() defers another
RCU period
io_eventfd_do_signal() is invoked from an RCU callback, but when
dropping the reference to the io_ev_fd, it calls io_eventfd_free()
directly if the refcount drops to zero. This isn't correct, as any
potential freeing of the io_ev_fd should be deferred another RCU grace
period.
Just call io_eventfd_put() rather than open-code the dec-and-test and
free, which will correctly defer it another RCU grace period.
Fixes: 21a091b970cd ("io_uring: signal registered eventfd to process deferred task work")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: stable(a)vger.kernel.org
Tested-by: Li Zetao <lizetao1(a)huawei.com>
Reviewed-by: Li Zetao<lizetao1(a)huawei.com>
Reviewed-by: Prasanna Kumar T S M <ptsm(a)linux.microsoft.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/eventfd.c b/io_uring/eventfd.c
index fab936d31ba8..100d5da94cb9 100644
--- a/io_uring/eventfd.c
+++ b/io_uring/eventfd.c
@@ -33,20 +33,18 @@ static void io_eventfd_free(struct rcu_head *rcu)
kfree(ev_fd);
}
+static void io_eventfd_put(struct io_ev_fd *ev_fd)
+{
+ if (refcount_dec_and_test(&ev_fd->refs))
+ call_rcu(&ev_fd->rcu, io_eventfd_free);
+}
+
static void io_eventfd_do_signal(struct rcu_head *rcu)
{
struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu);
eventfd_signal_mask(ev_fd->cq_ev_fd, EPOLL_URING_WAKE);
-
- if (refcount_dec_and_test(&ev_fd->refs))
- io_eventfd_free(rcu);
-}
-
-static void io_eventfd_put(struct io_ev_fd *ev_fd)
-{
- if (refcount_dec_and_test(&ev_fd->refs))
- call_rcu(&ev_fd->rcu, io_eventfd_free);
+ io_eventfd_put(ev_fd);
}
static void io_eventfd_release(struct io_ev_fd *ev_fd, bool put_ref)
One fix on which rest of the patches are based on (context changes).
Not tested on hardware.
Best regards,
Krzysztof
---
Krzysztof Kozlowski (5):
can: c_can: Fix unbalanced runtime PM disable in error path
can: c_can: Drop useless final probe failure message
can: c_can: Simplify handling syscon error path
can: c_can: Use of_property_present() to test existence of DT property
can: c_can: Use syscon_regmap_lookup_by_phandle_args
drivers/net/can/c_can/c_can_platform.c | 56 +++++++++++-----------------------
1 file changed, 18 insertions(+), 38 deletions(-)
---
base-commit: df04532eb4413cfaf33a8a902d208f596ba7c1b0
change-id: 20250112-syscon-phandle-args-can-bd0b2067c4ad
Best regards,
--
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
One fix on which rest of the patches are based on (context changes).
Not tested on hardware.
Best regards,
Krzysztof
---
Krzysztof Kozlowski (5):
can: c_can: Fix unbalanced runtime PM disable in error path
can: c_can: Drop useless final probe failure message
can: c_can: Simplify handling syscon error path
can: c_can: Use of_property_present() to test existence of DT property
can: c_can: Use syscon_regmap_lookup_by_phandle_args
drivers/net/can/c_can/c_can_platform.c | 56 +++++++++++-----------------------
1 file changed, 18 insertions(+), 38 deletions(-)
---
base-commit: df04532eb4413cfaf33a8a902d208f596ba7c1b0
change-id: 20250112-syscon-phandle-args-can-bd0b2067c4ad
Best regards,
--
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x bd2703b42decebdcddf76e277ba76b4c4a142d73
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011256-extinct-expanse-d059@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bd2703b42decebdcddf76e277ba76b4c4a142d73 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Fri, 10 Jan 2025 20:36:45 +0000
Subject: [PATCH] io_uring: don't touch sqd->thread off tw add
With IORING_SETUP_SQPOLL all requests are created by the SQPOLL task,
which means that req->task should always match sqd->thread. Since
accesses to sqd->thread should be separately protected, use req->task
in io_req_normal_work_add() instead.
Note, in the eyes of io_req_normal_work_add(), the SQPOLL task struct
is always pinned and alive, and sqd->thread can either be the task or
NULL. It's only problematic if the compiler decides to reload the value
after the null check, which is not so likely.
Cc: stable(a)vger.kernel.org
Cc: Bui Quang Minh <minhquangbui99(a)gmail.com>
Reported-by: lizetao <lizetao1(a)huawei.com>
Fixes: 78f9b61bd8e54 ("io_uring: wake SQPOLL task when task_work is added to an empty queue")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/1cbbe72cf32c45a8fee96026463024cd8564a7d7.17365413…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index d3403c8216db..5eb119002099 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1226,10 +1226,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
/* SQPOLL doesn't need the task_work added, it'll run it itself */
if (ctx->flags & IORING_SETUP_SQPOLL) {
- struct io_sq_data *sqd = ctx->sq_data;
-
- if (sqd->thread)
- __set_notify_signal(sqd->thread);
+ __set_notify_signal(tctx->task);
return;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6a97f4118ac07cfdc316433f385dbdc12af5025e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011219-fondling-tweezers-0486@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6a97f4118ac07cfdc316433f385dbdc12af5025e Mon Sep 17 00:00:00 2001
From: Nam Cao <namcao(a)linutronix.de>
Date: Mon, 18 Nov 2024 10:13:33 +0100
Subject: [PATCH] riscv: Fix sleeping in invalid context in die()
die() can be called in exception handler, and therefore cannot sleep.
However, die() takes spinlock_t which can sleep with PREEMPT_RT enabled.
That causes the following warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 285, name: mutex
preempt_count: 110001, expected: 0
RCU nest depth: 0, expected: 0
CPU: 0 UID: 0 PID: 285 Comm: mutex Not tainted 6.12.0-rc7-00022-ge19049cf7d56-dirty #234
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
dump_backtrace+0x1c/0x24
show_stack+0x2c/0x38
dump_stack_lvl+0x5a/0x72
dump_stack+0x14/0x1c
__might_resched+0x130/0x13a
rt_spin_lock+0x2a/0x5c
die+0x24/0x112
do_trap_insn_illegal+0xa0/0xea
_new_vmalloc_restore_context_a0+0xcc/0xd8
Oops - illegal instruction [#1]
Switch to use raw_spinlock_t, which does not sleep even with PREEMPT_RT
enabled.
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Link: https://lore.kernel.org/r/20241118091333.1185288-1-namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 51ebfd23e007..8ff8e8b36524 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -35,7 +35,7 @@
int show_unhandled_signals = 1;
-static DEFINE_SPINLOCK(die_lock);
+static DEFINE_RAW_SPINLOCK(die_lock);
static int copy_code(struct pt_regs *regs, u16 *val, const u16 *insns)
{
@@ -81,7 +81,7 @@ void die(struct pt_regs *regs, const char *str)
oops_enter();
- spin_lock_irqsave(&die_lock, flags);
+ raw_spin_lock_irqsave(&die_lock, flags);
console_verbose();
bust_spinlocks(1);
@@ -100,7 +100,7 @@ void die(struct pt_regs *regs, const char *str)
bust_spinlocks(0);
add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
- spin_unlock_irqrestore(&die_lock, flags);
+ raw_spin_unlock_irqrestore(&die_lock, flags);
oops_exit();
if (in_interrupt())
From: "Yiren.Xie" <1534428646(a)qq.com>
It is obvious a conflict between the code and the comment.
And verified that with this modification it can read the DAIF status.
Signed-off-by: Yiren.Xie <1534428646(a)qq.com>
---
arch/arm64/kernel/probes/decode-insn.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c
index 6438bf62e753..22383eb1c22c 100644
--- a/arch/arm64/kernel/probes/decode-insn.c
+++ b/arch/arm64/kernel/probes/decode-insn.c
@@ -40,7 +40,7 @@ static bool __kprobes aarch64_insn_is_steppable(u32 insn)
*/
if (aarch64_insn_is_mrs(insn))
return aarch64_insn_extract_system_reg(insn)
- != AARCH64_INSN_SPCLREG_DAIF;
+ == AARCH64_INSN_SPCLREG_DAIF;
/*
* The HINT instruction is steppable only if it is in whitelist
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6df90c02bae468a3a6110bafbc659884d0c4966c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011142-bladder-elevator-f97f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6df90c02bae468a3a6110bafbc659884d0c4966c Mon Sep 17 00:00:00 2001
From: Milan Broz <gmazyland(a)gmail.com>
Date: Wed, 18 Dec 2024 13:56:58 +0100
Subject: [PATCH] dm-verity FEC: Fix RS FEC repair for roots unaligned to block
size (take 2)
This patch fixes an issue that was fixed in the commit
df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
but later broken again in the commit
8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
If the Reed-Solomon roots setting spans multiple blocks, the code does not
use proper parity bytes and randomly fails to repair even trivial errors.
This bug cannot happen if the sector size is multiple of RS roots
setting (Android case with roots 2).
The previous solution was to find a dm-bufio block size that is multiple
of the device sector size and roots size. Unfortunately, the optimization
in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
is incorrect and uses data block size for some roots (for example, it uses
4096 block size for roots = 20).
This patch uses a different approach:
- It always uses a configured data block size for dm-bufio to avoid
possible misaligned IOs.
- and it caches the processed parity bytes, so it can join it
if it spans two blocks.
As the RS calculation is called only if an error is detected and
the process is computationally intensive, copying a few more bytes
should not introduce performance issues.
The issue was reported to cryptsetup with trivial reproducer
https://gitlab.com/cryptsetup/cryptsetup/-/issues/923
Reproducer (with roots=20):
# create verity device with RS FEC
dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \
awk '/^Root hash/{ print $3 }' >roothash
# create an erasure that should always be repairable with this roots setting
dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none
# try to read it through dm-verity
veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash)
dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
Even now the log says it cannot repair it:
: verity-fec: 7:1: FEC 0: failed to correct: -74
: device-mapper: verity: 7:1: data block 0 is corrupted
...
With this fix, errors are properly repaired.
: verity-fec: 7:1: FEC 0: corrected 4 errors
Signed-off-by: Milan Broz <gmazyland(a)gmail.com>
Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 62b1a44b8dd2..6bd9848518d4 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -60,15 +60,19 @@ static int fec_decode_rs8(struct dm_verity *v, struct dm_verity_fec_io *fio,
* to the data block. Caller is responsible for releasing buf.
*/
static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int index,
- unsigned int *offset, struct dm_buffer **buf,
- unsigned short ioprio)
+ unsigned int *offset, unsigned int par_buf_offset,
+ struct dm_buffer **buf, unsigned short ioprio)
{
u64 position, block, rem;
u8 *res;
+ /* We have already part of parity bytes read, skip to the next block */
+ if (par_buf_offset)
+ index++;
+
position = (index + rsb) * v->fec->roots;
block = div64_u64_rem(position, v->fec->io_size, &rem);
- *offset = (unsigned int)rem;
+ *offset = par_buf_offset ? 0 : (unsigned int)rem;
res = dm_bufio_read_with_ioprio(v->fec->bufio, block, buf, ioprio);
if (IS_ERR(res)) {
@@ -128,11 +132,12 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
{
int r, corrected = 0, res;
struct dm_buffer *buf;
- unsigned int n, i, offset;
- u8 *par, *block;
+ unsigned int n, i, offset, par_buf_offset = 0;
+ u8 *par, *block, par_buf[DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN];
struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
@@ -142,7 +147,8 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
*/
fec_for_each_buffer_rs_block(fio, n, i) {
block = fec_buffer_rs_block(v, fio, n, i);
- res = fec_decode_rs8(v, fio, block, &par[offset], neras);
+ memcpy(&par_buf[par_buf_offset], &par[offset], v->fec->roots - par_buf_offset);
+ res = fec_decode_rs8(v, fio, block, par_buf, neras);
if (res < 0) {
r = res;
goto error;
@@ -155,12 +161,21 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
if (block_offset >= 1 << v->data_dev_block_bits)
goto done;
- /* read the next block when we run out of parity bytes */
- offset += v->fec->roots;
+ /* Read the next block when we run out of parity bytes */
+ offset += (v->fec->roots - par_buf_offset);
+ /* Check if parity bytes are split between blocks */
+ if (offset < v->fec->io_size && (offset + v->fec->roots) > v->fec->io_size) {
+ par_buf_offset = v->fec->io_size - offset;
+ memcpy(par_buf, &par[offset], par_buf_offset);
+ offset += par_buf_offset;
+ } else
+ par_buf_offset = 0;
+
if (offset >= v->fec->io_size) {
dm_bufio_release(buf);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
}
@@ -724,10 +739,7 @@ int verity_fec_ctr(struct dm_verity *v)
return -E2BIG;
}
- if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
- f->io_size = 1 << v->data_dev_block_bits;
- else
- f->io_size = v->fec->roots << SECTOR_SHIFT;
+ f->io_size = 1 << v->data_dev_block_bits;
f->bufio = dm_bufio_client_create(f->dev->bdev,
f->io_size,
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6df90c02bae468a3a6110bafbc659884d0c4966c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011140-scope-tasting-d3ad@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6df90c02bae468a3a6110bafbc659884d0c4966c Mon Sep 17 00:00:00 2001
From: Milan Broz <gmazyland(a)gmail.com>
Date: Wed, 18 Dec 2024 13:56:58 +0100
Subject: [PATCH] dm-verity FEC: Fix RS FEC repair for roots unaligned to block
size (take 2)
This patch fixes an issue that was fixed in the commit
df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
but later broken again in the commit
8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
If the Reed-Solomon roots setting spans multiple blocks, the code does not
use proper parity bytes and randomly fails to repair even trivial errors.
This bug cannot happen if the sector size is multiple of RS roots
setting (Android case with roots 2).
The previous solution was to find a dm-bufio block size that is multiple
of the device sector size and roots size. Unfortunately, the optimization
in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
is incorrect and uses data block size for some roots (for example, it uses
4096 block size for roots = 20).
This patch uses a different approach:
- It always uses a configured data block size for dm-bufio to avoid
possible misaligned IOs.
- and it caches the processed parity bytes, so it can join it
if it spans two blocks.
As the RS calculation is called only if an error is detected and
the process is computationally intensive, copying a few more bytes
should not introduce performance issues.
The issue was reported to cryptsetup with trivial reproducer
https://gitlab.com/cryptsetup/cryptsetup/-/issues/923
Reproducer (with roots=20):
# create verity device with RS FEC
dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \
awk '/^Root hash/{ print $3 }' >roothash
# create an erasure that should always be repairable with this roots setting
dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none
# try to read it through dm-verity
veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash)
dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
Even now the log says it cannot repair it:
: verity-fec: 7:1: FEC 0: failed to correct: -74
: device-mapper: verity: 7:1: data block 0 is corrupted
...
With this fix, errors are properly repaired.
: verity-fec: 7:1: FEC 0: corrected 4 errors
Signed-off-by: Milan Broz <gmazyland(a)gmail.com>
Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 62b1a44b8dd2..6bd9848518d4 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -60,15 +60,19 @@ static int fec_decode_rs8(struct dm_verity *v, struct dm_verity_fec_io *fio,
* to the data block. Caller is responsible for releasing buf.
*/
static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int index,
- unsigned int *offset, struct dm_buffer **buf,
- unsigned short ioprio)
+ unsigned int *offset, unsigned int par_buf_offset,
+ struct dm_buffer **buf, unsigned short ioprio)
{
u64 position, block, rem;
u8 *res;
+ /* We have already part of parity bytes read, skip to the next block */
+ if (par_buf_offset)
+ index++;
+
position = (index + rsb) * v->fec->roots;
block = div64_u64_rem(position, v->fec->io_size, &rem);
- *offset = (unsigned int)rem;
+ *offset = par_buf_offset ? 0 : (unsigned int)rem;
res = dm_bufio_read_with_ioprio(v->fec->bufio, block, buf, ioprio);
if (IS_ERR(res)) {
@@ -128,11 +132,12 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
{
int r, corrected = 0, res;
struct dm_buffer *buf;
- unsigned int n, i, offset;
- u8 *par, *block;
+ unsigned int n, i, offset, par_buf_offset = 0;
+ u8 *par, *block, par_buf[DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN];
struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
@@ -142,7 +147,8 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
*/
fec_for_each_buffer_rs_block(fio, n, i) {
block = fec_buffer_rs_block(v, fio, n, i);
- res = fec_decode_rs8(v, fio, block, &par[offset], neras);
+ memcpy(&par_buf[par_buf_offset], &par[offset], v->fec->roots - par_buf_offset);
+ res = fec_decode_rs8(v, fio, block, par_buf, neras);
if (res < 0) {
r = res;
goto error;
@@ -155,12 +161,21 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
if (block_offset >= 1 << v->data_dev_block_bits)
goto done;
- /* read the next block when we run out of parity bytes */
- offset += v->fec->roots;
+ /* Read the next block when we run out of parity bytes */
+ offset += (v->fec->roots - par_buf_offset);
+ /* Check if parity bytes are split between blocks */
+ if (offset < v->fec->io_size && (offset + v->fec->roots) > v->fec->io_size) {
+ par_buf_offset = v->fec->io_size - offset;
+ memcpy(par_buf, &par[offset], par_buf_offset);
+ offset += par_buf_offset;
+ } else
+ par_buf_offset = 0;
+
if (offset >= v->fec->io_size) {
dm_bufio_release(buf);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
}
@@ -724,10 +739,7 @@ int verity_fec_ctr(struct dm_verity *v)
return -E2BIG;
}
- if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
- f->io_size = 1 << v->data_dev_block_bits;
- else
- f->io_size = v->fec->roots << SECTOR_SHIFT;
+ f->io_size = 1 << v->data_dev_block_bits;
f->bufio = dm_bufio_client_create(f->dev->bdev,
f->io_size,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011125-nutlike-regally-c21e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 8 Jan 2025 16:34:37 +0100
Subject: [PATCH] rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using
current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b208…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351ac1747224..0581c53e6517 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -61,8 +61,10 @@ static atomic_t rds_tcp_unloading = ATOMIC_INIT(0);
static struct kmem_cache *rds_tcp_conn_slab;
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
- void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF;
static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF;
@@ -74,7 +76,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_sndbuf_handler,
.extra1 = &rds_tcp_min_sndbuf,
},
#define RDS_TCP_RCVBUF 1
@@ -83,7 +85,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_rcvbuf_handler,
.extra1 = &rds_tcp_min_rcvbuf,
},
};
@@ -682,10 +684,10 @@ static void rds_tcp_sysctl_reset(struct net *net)
spin_unlock_irq(&rds_tcp_conn_lock);
}
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
+static int rds_tcp_skbuf_handler(struct rds_tcp_net *rtn,
+ const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *fpos)
{
- struct net *net = current->nsproxy->net_ns;
int err;
err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos);
@@ -694,11 +696,34 @@ static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
*(int *)(ctl->extra1));
return err;
}
- if (write)
+
+ if (write && rtn->rds_tcp_listen_sock && rtn->rds_tcp_listen_sock->sk) {
+ struct net *net = sock_net(rtn->rds_tcp_listen_sock->sk);
+
rds_tcp_sysctl_reset(net);
+ }
+
return 0;
}
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ sndbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ rcvbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
static void rds_tcp_exit(void)
{
rds_tcp_set_unloading();
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011125-sly-unworthy-41c0@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 8 Jan 2025 16:34:37 +0100
Subject: [PATCH] rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using
current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b208…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351ac1747224..0581c53e6517 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -61,8 +61,10 @@ static atomic_t rds_tcp_unloading = ATOMIC_INIT(0);
static struct kmem_cache *rds_tcp_conn_slab;
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
- void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF;
static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF;
@@ -74,7 +76,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_sndbuf_handler,
.extra1 = &rds_tcp_min_sndbuf,
},
#define RDS_TCP_RCVBUF 1
@@ -83,7 +85,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_rcvbuf_handler,
.extra1 = &rds_tcp_min_rcvbuf,
},
};
@@ -682,10 +684,10 @@ static void rds_tcp_sysctl_reset(struct net *net)
spin_unlock_irq(&rds_tcp_conn_lock);
}
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
+static int rds_tcp_skbuf_handler(struct rds_tcp_net *rtn,
+ const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *fpos)
{
- struct net *net = current->nsproxy->net_ns;
int err;
err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos);
@@ -694,11 +696,34 @@ static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
*(int *)(ctl->extra1));
return err;
}
- if (write)
+
+ if (write && rtn->rds_tcp_listen_sock && rtn->rds_tcp_listen_sock->sk) {
+ struct net *net = sock_net(rtn->rds_tcp_listen_sock->sk);
+
rds_tcp_sysctl_reset(net);
+ }
+
return 0;
}
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ sndbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ rcvbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
static void rds_tcp_exit(void)
{
rds_tcp_set_unloading();
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011124-ability-carrousel-5947@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 8 Jan 2025 16:34:37 +0100
Subject: [PATCH] rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using
current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b208…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351ac1747224..0581c53e6517 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -61,8 +61,10 @@ static atomic_t rds_tcp_unloading = ATOMIC_INIT(0);
static struct kmem_cache *rds_tcp_conn_slab;
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
- void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF;
static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF;
@@ -74,7 +76,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_sndbuf_handler,
.extra1 = &rds_tcp_min_sndbuf,
},
#define RDS_TCP_RCVBUF 1
@@ -83,7 +85,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_rcvbuf_handler,
.extra1 = &rds_tcp_min_rcvbuf,
},
};
@@ -682,10 +684,10 @@ static void rds_tcp_sysctl_reset(struct net *net)
spin_unlock_irq(&rds_tcp_conn_lock);
}
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
+static int rds_tcp_skbuf_handler(struct rds_tcp_net *rtn,
+ const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *fpos)
{
- struct net *net = current->nsproxy->net_ns;
int err;
err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos);
@@ -694,11 +696,34 @@ static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
*(int *)(ctl->extra1));
return err;
}
- if (write)
+
+ if (write && rtn->rds_tcp_listen_sock && rtn->rds_tcp_listen_sock->sk) {
+ struct net *net = sock_net(rtn->rds_tcp_listen_sock->sk);
+
rds_tcp_sysctl_reset(net);
+ }
+
return 0;
}
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ sndbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ rcvbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
static void rds_tcp_exit(void)
{
rds_tcp_set_unloading();
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011122-collie-dimmer-4313@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 8 Jan 2025 16:34:37 +0100
Subject: [PATCH] rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using
current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b208…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351ac1747224..0581c53e6517 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -61,8 +61,10 @@ static atomic_t rds_tcp_unloading = ATOMIC_INIT(0);
static struct kmem_cache *rds_tcp_conn_slab;
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
- void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF;
static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF;
@@ -74,7 +76,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_sndbuf_handler,
.extra1 = &rds_tcp_min_sndbuf,
},
#define RDS_TCP_RCVBUF 1
@@ -83,7 +85,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_rcvbuf_handler,
.extra1 = &rds_tcp_min_rcvbuf,
},
};
@@ -682,10 +684,10 @@ static void rds_tcp_sysctl_reset(struct net *net)
spin_unlock_irq(&rds_tcp_conn_lock);
}
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
+static int rds_tcp_skbuf_handler(struct rds_tcp_net *rtn,
+ const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *fpos)
{
- struct net *net = current->nsproxy->net_ns;
int err;
err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos);
@@ -694,11 +696,34 @@ static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
*(int *)(ctl->extra1));
return err;
}
- if (write)
+
+ if (write && rtn->rds_tcp_listen_sock && rtn->rds_tcp_listen_sock->sk) {
+ struct net *net = sock_net(rtn->rds_tcp_listen_sock->sk);
+
rds_tcp_sysctl_reset(net);
+ }
+
return 0;
}
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ sndbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ rcvbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
static void rds_tcp_exit(void)
{
rds_tcp_set_unloading();
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011123-tumble-stopper-38ec@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f5611cbc4871c7fb1ad36c2e5a9edad63dca95c Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 8 Jan 2025 16:34:37 +0100
Subject: [PATCH] rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using
current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b208…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351ac1747224..0581c53e6517 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -61,8 +61,10 @@ static atomic_t rds_tcp_unloading = ATOMIC_INIT(0);
static struct kmem_cache *rds_tcp_conn_slab;
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
- void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos);
static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF;
static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF;
@@ -74,7 +76,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_sndbuf_handler,
.extra1 = &rds_tcp_min_sndbuf,
},
#define RDS_TCP_RCVBUF 1
@@ -83,7 +85,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
/* data is per-net pointer */
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = rds_tcp_skbuf_handler,
+ .proc_handler = rds_tcp_rcvbuf_handler,
.extra1 = &rds_tcp_min_rcvbuf,
},
};
@@ -682,10 +684,10 @@ static void rds_tcp_sysctl_reset(struct net *net)
spin_unlock_irq(&rds_tcp_conn_lock);
}
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
+static int rds_tcp_skbuf_handler(struct rds_tcp_net *rtn,
+ const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *fpos)
{
- struct net *net = current->nsproxy->net_ns;
int err;
err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos);
@@ -694,11 +696,34 @@ static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write,
*(int *)(ctl->extra1));
return err;
}
- if (write)
+
+ if (write && rtn->rds_tcp_listen_sock && rtn->rds_tcp_listen_sock->sk) {
+ struct net *net = sock_net(rtn->rds_tcp_listen_sock->sk);
+
rds_tcp_sysctl_reset(net);
+ }
+
return 0;
}
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ sndbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
+static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write,
+ void *buffer, size_t *lenp, loff_t *fpos)
+{
+ struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net,
+ rcvbuf_size);
+
+ return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos);
+}
+
static void rds_tcp_exit(void)
{
rds_tcp_set_unloading();
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9fc17b76fc70763780aa78b38fcf4742384044a5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011139-spoilage-camping-62c6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9fc17b76fc70763780aa78b38fcf4742384044a5 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 8 Jan 2025 16:34:33 +0100
Subject: [PATCH] sctp: sysctl: rto_min/max: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-5-5df34b208…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 9848d19630a4..a5285815264d 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -433,7 +433,7 @@ static int proc_sctp_do_hmac_alg(const struct ctl_table *ctl, int write,
static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- struct net *net = current->nsproxy->net_ns;
+ struct net *net = container_of(ctl->data, struct net, sctp.rto_min);
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
@@ -461,7 +461,7 @@ static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write,
static int proc_sctp_do_rto_max(const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- struct net *net = current->nsproxy->net_ns;
+ struct net *net = container_of(ctl->data, struct net, sctp.rto_max);
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9fc17b76fc70763780aa78b38fcf4742384044a5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011138-sherry-tinker-8a0c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9fc17b76fc70763780aa78b38fcf4742384044a5 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 8 Jan 2025 16:34:33 +0100
Subject: [PATCH] sctp: sysctl: rto_min/max: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-5-5df34b208…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 9848d19630a4..a5285815264d 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -433,7 +433,7 @@ static int proc_sctp_do_hmac_alg(const struct ctl_table *ctl, int write,
static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- struct net *net = current->nsproxy->net_ns;
+ struct net *net = container_of(ctl->data, struct net, sctp.rto_min);
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
@@ -461,7 +461,7 @@ static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write,
static int proc_sctp_do_rto_max(const struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- struct net *net = current->nsproxy->net_ns;
+ struct net *net = container_of(ctl->data, struct net, sctp.rto_max);
unsigned int min = *(unsigned int *) ctl->extra1;
unsigned int max = *(unsigned int *) ctl->extra2;
struct ctl_table tbl;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6df90c02bae468a3a6110bafbc659884d0c4966c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011146-perkiness-camper-9011@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6df90c02bae468a3a6110bafbc659884d0c4966c Mon Sep 17 00:00:00 2001
From: Milan Broz <gmazyland(a)gmail.com>
Date: Wed, 18 Dec 2024 13:56:58 +0100
Subject: [PATCH] dm-verity FEC: Fix RS FEC repair for roots unaligned to block
size (take 2)
This patch fixes an issue that was fixed in the commit
df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
but later broken again in the commit
8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
If the Reed-Solomon roots setting spans multiple blocks, the code does not
use proper parity bytes and randomly fails to repair even trivial errors.
This bug cannot happen if the sector size is multiple of RS roots
setting (Android case with roots 2).
The previous solution was to find a dm-bufio block size that is multiple
of the device sector size and roots size. Unfortunately, the optimization
in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
is incorrect and uses data block size for some roots (for example, it uses
4096 block size for roots = 20).
This patch uses a different approach:
- It always uses a configured data block size for dm-bufio to avoid
possible misaligned IOs.
- and it caches the processed parity bytes, so it can join it
if it spans two blocks.
As the RS calculation is called only if an error is detected and
the process is computationally intensive, copying a few more bytes
should not introduce performance issues.
The issue was reported to cryptsetup with trivial reproducer
https://gitlab.com/cryptsetup/cryptsetup/-/issues/923
Reproducer (with roots=20):
# create verity device with RS FEC
dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \
awk '/^Root hash/{ print $3 }' >roothash
# create an erasure that should always be repairable with this roots setting
dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none
# try to read it through dm-verity
veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash)
dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
Even now the log says it cannot repair it:
: verity-fec: 7:1: FEC 0: failed to correct: -74
: device-mapper: verity: 7:1: data block 0 is corrupted
...
With this fix, errors are properly repaired.
: verity-fec: 7:1: FEC 0: corrected 4 errors
Signed-off-by: Milan Broz <gmazyland(a)gmail.com>
Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 62b1a44b8dd2..6bd9848518d4 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -60,15 +60,19 @@ static int fec_decode_rs8(struct dm_verity *v, struct dm_verity_fec_io *fio,
* to the data block. Caller is responsible for releasing buf.
*/
static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int index,
- unsigned int *offset, struct dm_buffer **buf,
- unsigned short ioprio)
+ unsigned int *offset, unsigned int par_buf_offset,
+ struct dm_buffer **buf, unsigned short ioprio)
{
u64 position, block, rem;
u8 *res;
+ /* We have already part of parity bytes read, skip to the next block */
+ if (par_buf_offset)
+ index++;
+
position = (index + rsb) * v->fec->roots;
block = div64_u64_rem(position, v->fec->io_size, &rem);
- *offset = (unsigned int)rem;
+ *offset = par_buf_offset ? 0 : (unsigned int)rem;
res = dm_bufio_read_with_ioprio(v->fec->bufio, block, buf, ioprio);
if (IS_ERR(res)) {
@@ -128,11 +132,12 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
{
int r, corrected = 0, res;
struct dm_buffer *buf;
- unsigned int n, i, offset;
- u8 *par, *block;
+ unsigned int n, i, offset, par_buf_offset = 0;
+ u8 *par, *block, par_buf[DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN];
struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
@@ -142,7 +147,8 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
*/
fec_for_each_buffer_rs_block(fio, n, i) {
block = fec_buffer_rs_block(v, fio, n, i);
- res = fec_decode_rs8(v, fio, block, &par[offset], neras);
+ memcpy(&par_buf[par_buf_offset], &par[offset], v->fec->roots - par_buf_offset);
+ res = fec_decode_rs8(v, fio, block, par_buf, neras);
if (res < 0) {
r = res;
goto error;
@@ -155,12 +161,21 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
if (block_offset >= 1 << v->data_dev_block_bits)
goto done;
- /* read the next block when we run out of parity bytes */
- offset += v->fec->roots;
+ /* Read the next block when we run out of parity bytes */
+ offset += (v->fec->roots - par_buf_offset);
+ /* Check if parity bytes are split between blocks */
+ if (offset < v->fec->io_size && (offset + v->fec->roots) > v->fec->io_size) {
+ par_buf_offset = v->fec->io_size - offset;
+ memcpy(par_buf, &par[offset], par_buf_offset);
+ offset += par_buf_offset;
+ } else
+ par_buf_offset = 0;
+
if (offset >= v->fec->io_size) {
dm_bufio_release(buf);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
}
@@ -724,10 +739,7 @@ int verity_fec_ctr(struct dm_verity *v)
return -E2BIG;
}
- if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
- f->io_size = 1 << v->data_dev_block_bits;
- else
- f->io_size = v->fec->roots << SECTOR_SHIFT;
+ f->io_size = 1 << v->data_dev_block_bits;
f->bufio = dm_bufio_client_create(f->dev->bdev,
f->io_size,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6df90c02bae468a3a6110bafbc659884d0c4966c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011144-slab-putt-25f3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6df90c02bae468a3a6110bafbc659884d0c4966c Mon Sep 17 00:00:00 2001
From: Milan Broz <gmazyland(a)gmail.com>
Date: Wed, 18 Dec 2024 13:56:58 +0100
Subject: [PATCH] dm-verity FEC: Fix RS FEC repair for roots unaligned to block
size (take 2)
This patch fixes an issue that was fixed in the commit
df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
but later broken again in the commit
8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
If the Reed-Solomon roots setting spans multiple blocks, the code does not
use proper parity bytes and randomly fails to repair even trivial errors.
This bug cannot happen if the sector size is multiple of RS roots
setting (Android case with roots 2).
The previous solution was to find a dm-bufio block size that is multiple
of the device sector size and roots size. Unfortunately, the optimization
in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
is incorrect and uses data block size for some roots (for example, it uses
4096 block size for roots = 20).
This patch uses a different approach:
- It always uses a configured data block size for dm-bufio to avoid
possible misaligned IOs.
- and it caches the processed parity bytes, so it can join it
if it spans two blocks.
As the RS calculation is called only if an error is detected and
the process is computationally intensive, copying a few more bytes
should not introduce performance issues.
The issue was reported to cryptsetup with trivial reproducer
https://gitlab.com/cryptsetup/cryptsetup/-/issues/923
Reproducer (with roots=20):
# create verity device with RS FEC
dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \
awk '/^Root hash/{ print $3 }' >roothash
# create an erasure that should always be repairable with this roots setting
dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none
# try to read it through dm-verity
veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash)
dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
Even now the log says it cannot repair it:
: verity-fec: 7:1: FEC 0: failed to correct: -74
: device-mapper: verity: 7:1: data block 0 is corrupted
...
With this fix, errors are properly repaired.
: verity-fec: 7:1: FEC 0: corrected 4 errors
Signed-off-by: Milan Broz <gmazyland(a)gmail.com>
Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 62b1a44b8dd2..6bd9848518d4 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -60,15 +60,19 @@ static int fec_decode_rs8(struct dm_verity *v, struct dm_verity_fec_io *fio,
* to the data block. Caller is responsible for releasing buf.
*/
static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int index,
- unsigned int *offset, struct dm_buffer **buf,
- unsigned short ioprio)
+ unsigned int *offset, unsigned int par_buf_offset,
+ struct dm_buffer **buf, unsigned short ioprio)
{
u64 position, block, rem;
u8 *res;
+ /* We have already part of parity bytes read, skip to the next block */
+ if (par_buf_offset)
+ index++;
+
position = (index + rsb) * v->fec->roots;
block = div64_u64_rem(position, v->fec->io_size, &rem);
- *offset = (unsigned int)rem;
+ *offset = par_buf_offset ? 0 : (unsigned int)rem;
res = dm_bufio_read_with_ioprio(v->fec->bufio, block, buf, ioprio);
if (IS_ERR(res)) {
@@ -128,11 +132,12 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
{
int r, corrected = 0, res;
struct dm_buffer *buf;
- unsigned int n, i, offset;
- u8 *par, *block;
+ unsigned int n, i, offset, par_buf_offset = 0;
+ u8 *par, *block, par_buf[DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN];
struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
@@ -142,7 +147,8 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
*/
fec_for_each_buffer_rs_block(fio, n, i) {
block = fec_buffer_rs_block(v, fio, n, i);
- res = fec_decode_rs8(v, fio, block, &par[offset], neras);
+ memcpy(&par_buf[par_buf_offset], &par[offset], v->fec->roots - par_buf_offset);
+ res = fec_decode_rs8(v, fio, block, par_buf, neras);
if (res < 0) {
r = res;
goto error;
@@ -155,12 +161,21 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
if (block_offset >= 1 << v->data_dev_block_bits)
goto done;
- /* read the next block when we run out of parity bytes */
- offset += v->fec->roots;
+ /* Read the next block when we run out of parity bytes */
+ offset += (v->fec->roots - par_buf_offset);
+ /* Check if parity bytes are split between blocks */
+ if (offset < v->fec->io_size && (offset + v->fec->roots) > v->fec->io_size) {
+ par_buf_offset = v->fec->io_size - offset;
+ memcpy(par_buf, &par[offset], par_buf_offset);
+ offset += par_buf_offset;
+ } else
+ par_buf_offset = 0;
+
if (offset >= v->fec->io_size) {
dm_bufio_release(buf);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
}
@@ -724,10 +739,7 @@ int verity_fec_ctr(struct dm_verity *v)
return -E2BIG;
}
- if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
- f->io_size = 1 << v->data_dev_block_bits;
- else
- f->io_size = v->fec->roots << SECTOR_SHIFT;
+ f->io_size = 1 << v->data_dev_block_bits;
f->bufio = dm_bufio_client_create(f->dev->bdev,
f->io_size,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6df90c02bae468a3a6110bafbc659884d0c4966c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011143-boned-discount-f55c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6df90c02bae468a3a6110bafbc659884d0c4966c Mon Sep 17 00:00:00 2001
From: Milan Broz <gmazyland(a)gmail.com>
Date: Wed, 18 Dec 2024 13:56:58 +0100
Subject: [PATCH] dm-verity FEC: Fix RS FEC repair for roots unaligned to block
size (take 2)
This patch fixes an issue that was fixed in the commit
df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
but later broken again in the commit
8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
If the Reed-Solomon roots setting spans multiple blocks, the code does not
use proper parity bytes and randomly fails to repair even trivial errors.
This bug cannot happen if the sector size is multiple of RS roots
setting (Android case with roots 2).
The previous solution was to find a dm-bufio block size that is multiple
of the device sector size and roots size. Unfortunately, the optimization
in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
is incorrect and uses data block size for some roots (for example, it uses
4096 block size for roots = 20).
This patch uses a different approach:
- It always uses a configured data block size for dm-bufio to avoid
possible misaligned IOs.
- and it caches the processed parity bytes, so it can join it
if it spans two blocks.
As the RS calculation is called only if an error is detected and
the process is computationally intensive, copying a few more bytes
should not introduce performance issues.
The issue was reported to cryptsetup with trivial reproducer
https://gitlab.com/cryptsetup/cryptsetup/-/issues/923
Reproducer (with roots=20):
# create verity device with RS FEC
dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \
awk '/^Root hash/{ print $3 }' >roothash
# create an erasure that should always be repairable with this roots setting
dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none
# try to read it through dm-verity
veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash)
dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
Even now the log says it cannot repair it:
: verity-fec: 7:1: FEC 0: failed to correct: -74
: device-mapper: verity: 7:1: data block 0 is corrupted
...
With this fix, errors are properly repaired.
: verity-fec: 7:1: FEC 0: corrected 4 errors
Signed-off-by: Milan Broz <gmazyland(a)gmail.com>
Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 62b1a44b8dd2..6bd9848518d4 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -60,15 +60,19 @@ static int fec_decode_rs8(struct dm_verity *v, struct dm_verity_fec_io *fio,
* to the data block. Caller is responsible for releasing buf.
*/
static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int index,
- unsigned int *offset, struct dm_buffer **buf,
- unsigned short ioprio)
+ unsigned int *offset, unsigned int par_buf_offset,
+ struct dm_buffer **buf, unsigned short ioprio)
{
u64 position, block, rem;
u8 *res;
+ /* We have already part of parity bytes read, skip to the next block */
+ if (par_buf_offset)
+ index++;
+
position = (index + rsb) * v->fec->roots;
block = div64_u64_rem(position, v->fec->io_size, &rem);
- *offset = (unsigned int)rem;
+ *offset = par_buf_offset ? 0 : (unsigned int)rem;
res = dm_bufio_read_with_ioprio(v->fec->bufio, block, buf, ioprio);
if (IS_ERR(res)) {
@@ -128,11 +132,12 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
{
int r, corrected = 0, res;
struct dm_buffer *buf;
- unsigned int n, i, offset;
- u8 *par, *block;
+ unsigned int n, i, offset, par_buf_offset = 0;
+ u8 *par, *block, par_buf[DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN];
struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
@@ -142,7 +147,8 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
*/
fec_for_each_buffer_rs_block(fio, n, i) {
block = fec_buffer_rs_block(v, fio, n, i);
- res = fec_decode_rs8(v, fio, block, &par[offset], neras);
+ memcpy(&par_buf[par_buf_offset], &par[offset], v->fec->roots - par_buf_offset);
+ res = fec_decode_rs8(v, fio, block, par_buf, neras);
if (res < 0) {
r = res;
goto error;
@@ -155,12 +161,21 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
if (block_offset >= 1 << v->data_dev_block_bits)
goto done;
- /* read the next block when we run out of parity bytes */
- offset += v->fec->roots;
+ /* Read the next block when we run out of parity bytes */
+ offset += (v->fec->roots - par_buf_offset);
+ /* Check if parity bytes are split between blocks */
+ if (offset < v->fec->io_size && (offset + v->fec->roots) > v->fec->io_size) {
+ par_buf_offset = v->fec->io_size - offset;
+ memcpy(par_buf, &par[offset], par_buf_offset);
+ offset += par_buf_offset;
+ } else
+ par_buf_offset = 0;
+
if (offset >= v->fec->io_size) {
dm_bufio_release(buf);
- par = fec_read_parity(v, rsb, block_offset, &offset, &buf, bio_prio(bio));
+ par = fec_read_parity(v, rsb, block_offset, &offset,
+ par_buf_offset, &buf, bio_prio(bio));
if (IS_ERR(par))
return PTR_ERR(par);
}
@@ -724,10 +739,7 @@ int verity_fec_ctr(struct dm_verity *v)
return -E2BIG;
}
- if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
- f->io_size = 1 << v->data_dev_block_bits;
- else
- f->io_size = v->fec->roots << SECTOR_SHIFT;
+ f->io_size = 1 << v->data_dev_block_bits;
f->bufio = dm_bufio_client_create(f->dev->bdev,
f->io_size,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 974e3fe0ac61de85015bbe5a4990cf4127b304b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011115-carmaker-staleness-c55f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 974e3fe0ac61de85015bbe5a4990cf4127b304b2 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il(a)gmail.com>
Date: Thu, 19 Dec 2024 12:53:01 +0100
Subject: [PATCH] fs: relax assertions on failure to encode file handles
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index dec553034027..e933f9c65d90 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -47,10 +47,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f->handle_bytes >> 2;
ret = exportfs_encode_fid(inode, (struct fid *)f->f_handle, &size);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f->handle_type = ret;
f->handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 3601ddfeddc2..56eee9f23ea9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -442,9 +442,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 974e3fe0ac61de85015bbe5a4990cf4127b304b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011114-splashed-obtrusive-a908@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 974e3fe0ac61de85015bbe5a4990cf4127b304b2 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il(a)gmail.com>
Date: Thu, 19 Dec 2024 12:53:01 +0100
Subject: [PATCH] fs: relax assertions on failure to encode file handles
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index dec553034027..e933f9c65d90 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -47,10 +47,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f->handle_bytes >> 2;
ret = exportfs_encode_fid(inode, (struct fid *)f->f_handle, &size);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f->handle_type = ret;
f->handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 3601ddfeddc2..56eee9f23ea9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -442,9 +442,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 974e3fe0ac61de85015bbe5a4990cf4127b304b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011113-overbite-backroom-b95c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 974e3fe0ac61de85015bbe5a4990cf4127b304b2 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il(a)gmail.com>
Date: Thu, 19 Dec 2024 12:53:01 +0100
Subject: [PATCH] fs: relax assertions on failure to encode file handles
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index dec553034027..e933f9c65d90 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -47,10 +47,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f->handle_bytes >> 2;
ret = exportfs_encode_fid(inode, (struct fid *)f->f_handle, &size);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f->handle_type = ret;
f->handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 3601ddfeddc2..56eee9f23ea9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -442,9 +442,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 974e3fe0ac61de85015bbe5a4990cf4127b304b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011111-freehand-oblivion-a4c5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 974e3fe0ac61de85015bbe5a4990cf4127b304b2 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il(a)gmail.com>
Date: Thu, 19 Dec 2024 12:53:01 +0100
Subject: [PATCH] fs: relax assertions on failure to encode file handles
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index dec553034027..e933f9c65d90 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -47,10 +47,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f->handle_bytes >> 2;
ret = exportfs_encode_fid(inode, (struct fid *)f->f_handle, &size);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f->handle_type = ret;
f->handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 3601ddfeddc2..56eee9f23ea9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -442,9 +442,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 3b2f56860b05bf0cea86af786fd9b7faa8fe3ef3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011124-swoosh-staple-7fae@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b2f56860b05bf0cea86af786fd9b7faa8fe3ef3 Mon Sep 17 00:00:00 2001
From: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Date: Thu, 19 Dec 2024 22:20:44 +0530
Subject: [PATCH] scsi: ufs: qcom: Power down the controller/device during
system suspend for SM8550/SM8650 SoCs
SM8550 and SM8650 SoCs doesn't support UFS PHY retention. So once these SoCs
reaches the low power state (CX power collapse) during system suspend, all
the PHY hardware state gets lost. This leads to the UFS resume failure:
ufshcd-qcom 1d84000.ufs: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = 5
ufshcd-qcom 1d84000.ufs: __ufshcd_wl_resume: hibern8 exit failed 5
ufs_device_wlun 0:0:0:49488: ufshcd_wl_resume failed: 5
ufs_device_wlun 0:0:0:49488: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x84 returns 5
ufs_device_wlun 0:0:0:49488: PM: failed to resume async: error 5
With the default system suspend level of UFS_PM_LVL_3, the power domain for
UFS PHY needs to be kept always ON to retain the state. But this would
prevent these SoCs from reaching the CX power collapse state, leading to
poor power saving during system suspend.
So to fix this issue without affecting the power saving, set
'ufs_qcom_drvdata::no_phy_retention' to true which sets 'hba->spm_lvl' to
UFS_PM_LVL_5 to allow both the controller and device (in turn the PHY) to be
powered down during system suspend for these SoCs by default.
Cc: stable(a)vger.kernel.org # 6.3
Fixes: 35cf1aaab169 ("arm64: dts: qcom: sm8550: Add UFS host controller and phy nodes")
Fixes: 10e024671295 ("arm64: dts: qcom: sm8650: add interconnect dependent device nodes")
Reported-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Tested-by: Amit Pundir <amit.pundir(a)linaro.org> # on SM8550-HDK
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Tested-by: Neil Armstrong <neil.armstrong(a)linaro.org> # on SM8550-QRD
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20241219-ufs-qcom-suspend-fix-v3-4-63c4b95a70b9@l…
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/ufs/host/ufs-qcom.c b/drivers/ufs/host/ufs-qcom.c
index 7042322d55e9..91e94fe990b4 100644
--- a/drivers/ufs/host/ufs-qcom.c
+++ b/drivers/ufs/host/ufs-qcom.c
@@ -1069,6 +1069,7 @@ static int ufs_qcom_init(struct ufs_hba *hba)
struct device *dev = hba->dev;
struct ufs_qcom_host *host;
struct ufs_clk_info *clki;
+ const struct ufs_qcom_drvdata *drvdata = of_device_get_match_data(hba->dev);
host = devm_kzalloc(dev, sizeof(*host), GFP_KERNEL);
if (!host)
@@ -1148,6 +1149,9 @@ static int ufs_qcom_init(struct ufs_hba *hba)
dev_warn(dev, "%s: failed to configure the testbus %d\n",
__func__, err);
+ if (drvdata && drvdata->no_phy_retention)
+ hba->spm_lvl = UFS_PM_LVL_5;
+
return 0;
out_variant_clear:
@@ -1867,6 +1871,7 @@ static void ufs_qcom_remove(struct platform_device *pdev)
static const struct ufs_qcom_drvdata ufs_qcom_sm8550_drvdata = {
.quirks = UFSHCD_QUIRK_BROKEN_LSDBS_CAP,
+ .no_phy_retention = true,
};
static const struct of_device_id ufs_qcom_of_match[] __maybe_unused = {
diff --git a/drivers/ufs/host/ufs-qcom.h b/drivers/ufs/host/ufs-qcom.h
index 15f6dad8b27f..919f53682beb 100644
--- a/drivers/ufs/host/ufs-qcom.h
+++ b/drivers/ufs/host/ufs-qcom.h
@@ -219,6 +219,7 @@ struct ufs_qcom_host {
struct ufs_qcom_drvdata {
enum ufshcd_quirks quirks;
+ bool no_phy_retention;
};
static inline u32
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 3b2f56860b05bf0cea86af786fd9b7faa8fe3ef3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025011123-parameter-envy-591e@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b2f56860b05bf0cea86af786fd9b7faa8fe3ef3 Mon Sep 17 00:00:00 2001
From: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Date: Thu, 19 Dec 2024 22:20:44 +0530
Subject: [PATCH] scsi: ufs: qcom: Power down the controller/device during
system suspend for SM8550/SM8650 SoCs
SM8550 and SM8650 SoCs doesn't support UFS PHY retention. So once these SoCs
reaches the low power state (CX power collapse) during system suspend, all
the PHY hardware state gets lost. This leads to the UFS resume failure:
ufshcd-qcom 1d84000.ufs: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = 5
ufshcd-qcom 1d84000.ufs: __ufshcd_wl_resume: hibern8 exit failed 5
ufs_device_wlun 0:0:0:49488: ufshcd_wl_resume failed: 5
ufs_device_wlun 0:0:0:49488: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x84 returns 5
ufs_device_wlun 0:0:0:49488: PM: failed to resume async: error 5
With the default system suspend level of UFS_PM_LVL_3, the power domain for
UFS PHY needs to be kept always ON to retain the state. But this would
prevent these SoCs from reaching the CX power collapse state, leading to
poor power saving during system suspend.
So to fix this issue without affecting the power saving, set
'ufs_qcom_drvdata::no_phy_retention' to true which sets 'hba->spm_lvl' to
UFS_PM_LVL_5 to allow both the controller and device (in turn the PHY) to be
powered down during system suspend for these SoCs by default.
Cc: stable(a)vger.kernel.org # 6.3
Fixes: 35cf1aaab169 ("arm64: dts: qcom: sm8550: Add UFS host controller and phy nodes")
Fixes: 10e024671295 ("arm64: dts: qcom: sm8650: add interconnect dependent device nodes")
Reported-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Tested-by: Amit Pundir <amit.pundir(a)linaro.org> # on SM8550-HDK
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Tested-by: Neil Armstrong <neil.armstrong(a)linaro.org> # on SM8550-QRD
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20241219-ufs-qcom-suspend-fix-v3-4-63c4b95a70b9@l…
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/ufs/host/ufs-qcom.c b/drivers/ufs/host/ufs-qcom.c
index 7042322d55e9..91e94fe990b4 100644
--- a/drivers/ufs/host/ufs-qcom.c
+++ b/drivers/ufs/host/ufs-qcom.c
@@ -1069,6 +1069,7 @@ static int ufs_qcom_init(struct ufs_hba *hba)
struct device *dev = hba->dev;
struct ufs_qcom_host *host;
struct ufs_clk_info *clki;
+ const struct ufs_qcom_drvdata *drvdata = of_device_get_match_data(hba->dev);
host = devm_kzalloc(dev, sizeof(*host), GFP_KERNEL);
if (!host)
@@ -1148,6 +1149,9 @@ static int ufs_qcom_init(struct ufs_hba *hba)
dev_warn(dev, "%s: failed to configure the testbus %d\n",
__func__, err);
+ if (drvdata && drvdata->no_phy_retention)
+ hba->spm_lvl = UFS_PM_LVL_5;
+
return 0;
out_variant_clear:
@@ -1867,6 +1871,7 @@ static void ufs_qcom_remove(struct platform_device *pdev)
static const struct ufs_qcom_drvdata ufs_qcom_sm8550_drvdata = {
.quirks = UFSHCD_QUIRK_BROKEN_LSDBS_CAP,
+ .no_phy_retention = true,
};
static const struct of_device_id ufs_qcom_of_match[] __maybe_unused = {
diff --git a/drivers/ufs/host/ufs-qcom.h b/drivers/ufs/host/ufs-qcom.h
index 15f6dad8b27f..919f53682beb 100644
--- a/drivers/ufs/host/ufs-qcom.h
+++ b/drivers/ufs/host/ufs-qcom.h
@@ -219,6 +219,7 @@ struct ufs_qcom_host {
struct ufs_qcom_drvdata {
enum ufshcd_quirks quirks;
+ bool no_phy_retention;
};
static inline u32
Currently, installation of Debian 12.8 for mipsel fails on machines
without an FPU [1]. This is caused by the fact that zstd (which is used
for initramfs compression) executes the prefx instruction, which is not
emulated properly by the kernel.
The prefx (Prefetch Indexed) instruction fetches data from memory into
the cache without any side effects. Though functionally unrelated, it
requires an FPU [2].
Bytecode format of this instruction ends on "001111" binary:
(prefx instruction format) & 0x0000003f = 0x0000000f
The code in fpux_emu() runs like so:
#define MIPSInst(x) x
#define MIPSInst_FMA_FFMT(x) (MIPSInst(x) & 0x00000007)
#define MIPSInst_FUNC(x) (MIPSInst(x) & 0x0000003f)
enum cop1x_func { ..., pfetch_op = 0x0f, ... };
...
switch (MIPSInst_FMA_FFMT(ir)) {
...
case 0x3:
if (MIPSInst_FUNC(ir) != pfetch_op)
return SIGILL;
/* ignore prefx operation */
break;
default:
return SIGILL;
}
That snippet above contains a logic error and the
if (MIPSInst_FUNC(ir) != pfetch_op)
comparison always fires.
When MIPSInst_FUNC(ir) is equal to pfetch_op, ir must end on 001111
binary. In this case, MIPSInst_FMA_FFMT(ir) must be equal to 0x7, which
does not match that case label.
This causes emulation failure for the prefx instruction. Fix it.
This has been broken by
commit 919af8b96c89 ("MIPS: Make definitions of MIPSInst_FMA_{FUNC,FMTM} consistent with MIPS64 manual")
which modified the MIPSInst_FMA_FFMT macro without updating the users.
Signed-off-by: Mateusz Jończyk <mat.jonczyk(a)o2.pl>
Cc: stable(a)vger.kernel.org # after 3 weeks
Cc: Dengcheng Zhu <dzhu(a)wavecomp.com>
Cc: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
Cc: Ming Wang <wangming01(a)loongson.cn>
Cc: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Fixes: 919af8b96c89 ("MIPS: Make definitions of MIPSInst_FMA_{FUNC,FMTM} consistent with MIPS64 manual")
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1091858
[2] MIPS Architecture For Programmers Volume II-A: The MIPS32 Instruction Set
---
Tested in QEMU for mipsel and mips64el.
---
arch/mips/math-emu/cp1emu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/math-emu/cp1emu.c b/arch/mips/math-emu/cp1emu.c
index 265bc57819df..c89e70df43d8 100644
--- a/arch/mips/math-emu/cp1emu.c
+++ b/arch/mips/math-emu/cp1emu.c
@@ -1660,7 +1660,7 @@ static int fpux_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx,
break;
}
- case 0x3:
+ case 0x7:
if (MIPSInst_FUNC(ir) != pfetch_op)
return SIGILL;
--
2.25.1
This reverts commit 426fa8e4fe7bb914b5977cbce453a9926bf5b2e6.
The commit has caused two issues on Loongson 3A4000:
1. The timestamp in dmesg become erratic, like:
[3.736957] amdgpu 0000:04:00.0: ... ...
[3.748895] [drm] Initialized amdgpu ... ...
[18446744073.381141] amdgpu 0000:04:00:0: ... ...
[1.613326] igb 0000:03:00.0 enp3s0: ... ...
2. More seriously, some workloads (for example, the test
stdlib/test-cxa_atexit2 in the Glibc test suite) triggers an RCU
stall and hang the system with a high probably (4 hangs out of 5
tests).
Revert this commit to use jiffie on Loongson MIPS systems and fix these
issues for now. The root cause may need more investigation.
Cc: stable(a)vger.kernel.org # 6.11+
Cc: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Cc: Icenowy Zheng <icenowy(a)aosc.io>
Signed-off-by: Xi Ruoyao <xry111(a)xry111.site>
---
arch/mips/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 467b10f4361a..5078ebf071ec 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1084,7 +1084,6 @@ config CSRC_IOASIC
config CSRC_R4K
select CLOCKSOURCE_WATCHDOG if CPU_FREQ
- select HAVE_UNSTABLE_SCHED_CLOCK if SMP && 64BIT
bool
config CSRC_SB1250
--
2.47.1
In "MIPS: Loongson64: Add /sys/firmware/lefi/boardinfo", "ROM Size" was
added under section "BIOS Info". However, that is incorrect per Appendix
A.7 in Q/LS 0013-2014 (龙芯CPU开发系统固件与内核接口规范 V2.2,
lit. Loongson DevSys Firmware Kernel Interface Specification V2.2),
which defines the size to be the size of this interface itself.
Removing "ROM Size" from boardinfo, and also adding comments to the
struct interface_info to clarify meanings of its members.
Signed-off-by: Kexy Biscuit <kexybiscuit(a)aosc.io>
---
Kexy Biscuit (2):
MIPS: Loongson64: remove ROM Size unit in boardinfo
MIPS: Loongson: Add comments for interface_info
arch/mips/include/asm/mach-loongson64/boot_param.h | 8 ++++----
arch/mips/loongson64/boardinfo.c | 2 --
2 files changed, 4 insertions(+), 6 deletions(-)
---
base-commit: 2144da25584eb10b84252230319b5783f6a83041
change-id: 20250111-loongson64-boardinfo-no-size-a25dd2a03a8b
Best regards,
--
Kexy Biscuit <kexybiscuit(a)aosc.io>
We are seeing kernel panic when enabling two NCSI interfaces at same
time. It looks like mutex lock is being used in softirq caused the
issue.
This patch series try to fix oem gma command handling issue by adding a
new state, also fix a potential state handling issue.
v1: https://lore.kernel.org/all/20250109145054.30925-1-fercerpav@gmail.com/
Signed-off-by: Potin Lai <potin.lai.pt(a)gmail.com>
---
Cosmo Chou (1):
net/ncsi: fix state race during channel probe completion
Paul Fertser (1):
net/ncsi: fix locking in Get MAC Address handling
net/ncsi/internal.h | 2 ++
net/ncsi/ncsi-manage.c | 21 ++++++++++++++++++---
net/ncsi/ncsi-rsp.c | 19 ++++++-------------
3 files changed, 26 insertions(+), 16 deletions(-)
---
base-commit: fc033cf25e612e840e545f8d5ad2edd6ba613ed5
change-id: 20250111-fix-ncsi-mac-1e4b3df431f1
Best regards,
--
Potin Lai <potin.lai.pt(a)gmail.com>
[BUG]
With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash
with the following VM_BUG_ON_FOLIO():
BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28
BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28
page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664
aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774"
flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff)
page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2992!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
pc : folio_clear_dirty_for_io+0x128/0x258
lr : folio_clear_dirty_for_io+0x128/0x258
Call trace:
folio_clear_dirty_for_io+0x128/0x258
btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs]
__process_folios_contig+0x154/0x268 [btrfs]
extent_clear_unlock_delalloc+0x5c/0x80 [btrfs]
run_delalloc_nocow+0x5f8/0x760 [btrfs]
btrfs_run_delalloc_range+0xa8/0x220 [btrfs]
writepage_delalloc+0x230/0x4c8 [btrfs]
extent_writepage+0xb8/0x358 [btrfs]
extent_write_cache_pages+0x21c/0x4e8 [btrfs]
btrfs_writepages+0x94/0x150 [btrfs]
do_writepages+0x74/0x190
filemap_fdatawrite_wbc+0x88/0xc8
start_delalloc_inodes+0x178/0x3a8 [btrfs]
btrfs_start_delalloc_roots+0x174/0x280 [btrfs]
shrink_delalloc+0x114/0x280 [btrfs]
flush_space+0x250/0x2f8 [btrfs]
btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
process_one_work+0x164/0x408
worker_thread+0x25c/0x388
kthread+0x100/0x118
ret_from_fork+0x10/0x20
Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000)
---[ end trace 0000000000000000 ]---
[CAUSE]
The first two lines of extra debug messages show the problem is caused
by the error handling of run_delalloc_nocow().
E.g. we have the following dirtied range (4K blocksize 4K page size):
0 16K 32K
|//////////////////////////////////////|
| Pre-allocated |
And the range [0, 16K) has a preallocated extent.
- Enter run_delalloc_nocow() for range [0, 16K)
Which found range [0, 16K) is preallocated, can do the proper NOCOW
write.
- Enter fallback_to_fow() for range [16K, 32K)
Since the range [16K, 32K) is not backed by preallocated extent, we
have to go COW.
- cow_file_range() failed for range [16K, 32K)
So cow_file_range() will do the clean up by clearing folio dirty,
unlock the folios.
Now the folios in range [16K, 32K) is unlocked.
- Enter extent_clear_unlock_delalloc() from run_delalloc_nocow()
Which is called with PAGE_START_WRITEBACK to start page writeback.
But folios can only be marked writeback when it's properly locked,
thus this triggered the VM_BUG_ON_FOLIO().
Furthermore there is another hidden but common bug that
run_delalloc_nocow() is not clearing the folio dirty flags in its error
handling path.
This is the common bug shared between run_delalloc_nocow() and
cow_file_range().
[FIX]
- Clear folio dirty for range [@start, @cur_offset)
Introduce a helper, cleanup_dirty_folios(), which
will find and lock the folio in the range, clear the dirty flag and
start/end the writeback, with the extra handling for the
@locked_folio.
- Introduce a helper to clear folio dirty, start and end writeback
- Introduce a helper to record the last failed COW range end
This is to trace which range we should skip, to avoid double
unlocking.
- Skip the failed COW range for the error handling
Cc: stable(a)vger.kernel.org
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/inode.c | 98 ++++++++++++++++++++++++++++++++++++++++++----
fs/btrfs/subpage.h | 13 ++++++
2 files changed, 104 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b94c4abcda3a..73a6b88c6511 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1957,6 +1957,54 @@ static int can_nocow_file_extent(struct btrfs_path *path,
return ret < 0 ? ret : can_nocow;
}
+/*
+ * To cleanup the dirty folios which will never be submitted due to
+ * error.
+ *
+ * When running a delalloc range, we may need to split the ranges (due to
+ * fragmentation or NOCOW limit). If we hit an error in the later part,
+ * we will error out and previously successfully executed range will never
+ * be submitted, thus we have to cleanup those folios by clear their
+ * dirty flag, start and finish the writeback.
+ */
+static void cleanup_dirty_folios(struct btrfs_inode *inode,
+ struct folio *locked_folio,
+ u64 start, u64 end, int error)
+{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
+ struct address_space *mapping = inode->vfs_inode.i_mapping;
+ pgoff_t start_index = start >> PAGE_SHIFT;
+ pgoff_t end_index = end >> PAGE_SHIFT;
+ u32 len;
+
+ ASSERT(end + 1 - start < U32_MAX);
+ ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+ IS_ALIGNED(end + 1, fs_info->sectorsize));
+ len = end + 1 - start;
+
+ /*
+ * Handle the locked folio first.
+ * btrfs_folio_clamp_*() helpers can handle range out of the folio case.
+ */
+ btrfs_folio_clamp_finish_io(fs_info, locked_folio, start, len);
+
+ for (pgoff_t index = start_index; index <= end_index; index++) {
+ struct folio *folio;
+
+ /* Already handled at the beginning. */
+ if (index == locked_folio->index)
+ continue;
+ folio = __filemap_get_folio(mapping, index, FGP_LOCK, GFP_NOFS);
+ /* Cache already dropped, no need to do any cleanup. */
+ if (IS_ERR(folio))
+ continue;
+ btrfs_folio_clamp_finish_io(fs_info, locked_folio, start, len);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+ mapping_set_error(mapping, error);
+}
+
/*
* when nowcow writeback call back. This checks for snapshots or COW copies
* of the extents that exist in the file, and COWs the file as required.
@@ -1972,6 +2020,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
struct btrfs_root *root = inode->root;
struct btrfs_path *path;
u64 cow_start = (u64)-1;
+ /*
+ * If not 0, represents the inclusive end of the last fallback_to_cow()
+ * range. Only for error handling.
+ */
+ u64 cow_end = 0;
u64 cur_offset = start;
int ret;
bool check_prev = true;
@@ -2132,6 +2185,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
found_key.offset - 1);
cow_start = (u64)-1;
if (ret) {
+ cow_end = found_key.offset - 1;
btrfs_dec_nocow_writers(nocow_bg);
goto error;
}
@@ -2205,11 +2259,12 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
cow_start = cur_offset;
if (cow_start != (u64)-1) {
- cur_offset = end;
ret = fallback_to_cow(inode, locked_folio, cow_start, end);
cow_start = (u64)-1;
- if (ret)
+ if (ret) {
+ cow_end = end;
goto error;
+ }
}
btrfs_free_path(path);
@@ -2217,12 +2272,41 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
error:
/*
- * If an error happened while a COW region is outstanding, cur_offset
- * needs to be reset to cow_start to ensure the COW region is unlocked
- * as well.
+ * There are several error cases:
+ *
+ * 1) Failed without falling back to COW
+ * start cur_offset end
+ * |/////////////| |
+ *
+ * For range [start, cur_offset) the folios are already unlocked (except
+ * @locked_folio), EXTENT_DELALLOC already removed.
+ * Only need to clear the dirty flag as they will never be submitted.
+ * Ordered extent and extent maps are handled by
+ * btrfs_mark_ordered_io_finished() inside run_delalloc_range().
+ *
+ * 2) Failed with error from fallback_to_cow()
+ * start cur_offset cow_end end
+ * |/////////////|-----------| |
+ *
+ * For range [start, cur_offset) it's the same as case 1).
+ * But for range [cur_offset, cow_end), the folios have dirty flag
+ * cleared and unlocked, EXTENT_DEALLLOC cleared by cow_file_range().
+ *
+ * Thus we should not call extent_clear_unlock_delalloc() on range
+ * [cur_offset, cow_end), as the folios are already unlocked.
+ *
+ * So clear the folio dirty flags for [start, cur_offset) first.
*/
- if (cow_start != (u64)-1)
- cur_offset = cow_start;
+ if (cur_offset > start)
+ cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
+
+ /*
+ * If an error happened while a COW region is outstanding, cur_offset
+ * needs to be reset to @cow_end + 1 to skip the COW range, as
+ * cow_file_range() will do the proper cleanup at error.
+ */
+ if (cow_end)
+ cur_offset = cow_end + 1;
/*
* We need to lock the extent here because we're clearing DELALLOC and
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 428fa9389fd4..44fff1f4eac4 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -137,6 +137,19 @@ DECLARE_BTRFS_SUBPAGE_OPS(writeback);
DECLARE_BTRFS_SUBPAGE_OPS(ordered);
DECLARE_BTRFS_SUBPAGE_OPS(checked);
+/*
+ * Helper for error cleanup, where a folio will have its dirty flag cleared,
+ * with writeback started and finished.
+ */
+static inline void btrfs_folio_clamp_finish_io(struct btrfs_fs_info *fs_info,
+ struct folio *locked_folio,
+ u64 start, u32 len)
+{
+ btrfs_folio_clamp_clear_dirty(fs_info, locked_folio, start, len);
+ btrfs_folio_clamp_set_writeback(fs_info, locked_folio, start, len);
+ btrfs_folio_clamp_clear_writeback(fs_info, locked_folio, start, len);
+}
+
bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
--
2.47.1
[BUG]
When testing with COW fixup marked as BUG_ON() (this is involved with the
new pin_user_pages*() change, which should not result new out-of-band
dirty pages), I hit a crash triggered by the BUG_ON() from hitting COW
fixup path.
This BUG_ON() happens just after a failed btrfs_run_delalloc_range():
BTRFS error (device dm-2): failed to run delalloc range, root 348 ino 405 folio 65536 submit_bitmap 6-15 start 90112 len 106496: -28
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:1444!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
CPU: 0 UID: 0 PID: 434621 Comm: kworker/u24:8 Tainted: G OE 6.12.0-rc7-custom+ #86
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
pc : extent_writepage_io+0x2d4/0x308 [btrfs]
lr : extent_writepage_io+0x2d4/0x308 [btrfs]
Call trace:
extent_writepage_io+0x2d4/0x308 [btrfs]
extent_writepage+0x218/0x330 [btrfs]
extent_write_cache_pages+0x1d4/0x4b0 [btrfs]
btrfs_writepages+0x94/0x150 [btrfs]
do_writepages+0x74/0x190
filemap_fdatawrite_wbc+0x88/0xc8
start_delalloc_inodes+0x180/0x3b0 [btrfs]
btrfs_start_delalloc_roots+0x174/0x280 [btrfs]
shrink_delalloc+0x114/0x280 [btrfs]
flush_space+0x250/0x2f8 [btrfs]
btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
process_one_work+0x164/0x408
worker_thread+0x25c/0x388
kthread+0x100/0x118
ret_from_fork+0x10/0x20
Code: aa1403e1 9402f3ef aa1403e0 9402f36f (d4210000)
---[ end trace 0000000000000000 ]---
[CAUSE]
That failure is mostly from cow_file_range(), where we can hit -ENOSPC.
Although the -ENOSPC is already a bug related to our space reservation
code, let's just focus on the error handling.
For example, we have the following dirty range [0, 64K) of an inode,
with 4K sector size and 4K page size:
0 16K 32K 48K 64K
|///////////////////////////////////////|
|#######################################|
Where |///| means page are still dirty, and |###| means the extent io
tree has EXTENT_DELALLOC flag.
- Enter extent_writepage() for page 0
- Enter btrfs_run_delalloc_range() for range [0, 64K)
- Enter cow_file_range() for range [0, 64K)
- Function btrfs_reserve_extent() only reserved one 16K extent
So we created extent map and ordered extent for range [0, 16K)
0 16K 32K 48K 64K
|////////|//////////////////////////////|
|<- OE ->|##############################|
And range [0, 16K) has its delalloc flag cleared.
But since we haven't yet submit any bio, involved 4 pages are still
dirty.
- Function btrfs_reserve_extent() return with -ENOSPC
Now we have to run error cleanup, which will clear all
EXTENT_DELALLOC* flags and clear the dirty flags for the remaining
ranges:
0 16K 32K 48K 64K
|////////| |
| | |
Note that range [0, 16K) still has their pages dirty.
- Some time later, writeback are triggered again for the range [0, 16K)
since the page range still have dirty flags.
- btrfs_run_delalloc_range() will do nothing because there is no
EXTENT_DELALLOC flag.
- extent_writepage_io() find page 0 has no ordered flag
Which falls into the COW fixup path, triggering the BUG_ON().
Unfortunately this error handling bug dates back to the introduction of btrfs.
Thankfully with the abuse of cow fixup, at least it won't crash the
kernel.
[FIX]
Instead of immediately unlock the extent and folios, we keep the extent
and folios locked until either erroring out or the whole delalloc range
finished.
When the whole delalloc range finished without error, we just unlock the
whole range with PAGE_SET_ORDERED (and PAGE_UNLOCK for !keep_locked
cases), with EXTENT_DELALLOC and EXTENT_LOCKED cleared.
And those involved folios will be properly submitted, with their dirty
flags cleared during submission.
For the error path, it will be a little more complex:
- The range with ordered extent allocated (range (1))
We only clear the EXTENT_DELALLOC and EXTENT_LOCKED, as the remaining
flags are cleaned up by
btrfs_mark_ordered_io_finished()->btrfs_finish_one_ordered().
For folios we finish the IO (clear dirty, start writeback and
immediately finish the writeback) and unlock the folios.
- The range with reserved extent but no ordered extent (range(2))
- The range we never touched (range(3))
For both range (2) and range(3) the behavior is not changed.
Now even if cow_file_range() failed halfway with some successfully
reserved extents/ordered extents, we will keep all folios clean, so
there will be no future writeback triggered on them.
Cc: stable(a)vger.kernel.org
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/inode.c | 65 ++++++++++++++++++++++++------------------------
1 file changed, 32 insertions(+), 33 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c23993cb7bdc..b94c4abcda3a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1363,6 +1363,17 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
alloc_hint = btrfs_get_extent_allocation_hint(inode, start, num_bytes);
+ /*
+ * We're not doing compressed IO, don't unlock the first page
+ * (which the caller expects to stay locked), don't clear any
+ * dirty bits and don't set any writeback bits
+ *
+ * Do set the Ordered (Private2) bit so we know this page was
+ * properly setup for writepage.
+ */
+ page_ops = (keep_locked ? 0 : PAGE_UNLOCK);
+ page_ops |= PAGE_SET_ORDERED;
+
/*
* Relocation relies on the relocated extents to have exactly the same
* size as the original extents. Normally writeback for relocation data
@@ -1422,6 +1433,10 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
file_extent.offset = 0;
file_extent.compression = BTRFS_COMPRESS_NONE;
+ /*
+ * Locked range will be released either during error clean up or
+ * after the whole range is finished.
+ */
lock_extent(&inode->io_tree, start, start + cur_alloc_size - 1,
&cached);
@@ -1467,21 +1482,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
btrfs_dec_block_group_reservations(fs_info, ins.objectid);
- /*
- * We're not doing compressed IO, don't unlock the first page
- * (which the caller expects to stay locked), don't clear any
- * dirty bits and don't set any writeback bits
- *
- * Do set the Ordered flag so we know this page was
- * properly setup for writepage.
- */
- page_ops = (keep_locked ? 0 : PAGE_UNLOCK);
- page_ops |= PAGE_SET_ORDERED;
-
- extent_clear_unlock_delalloc(inode, start, start + cur_alloc_size - 1,
- locked_folio, &cached,
- EXTENT_LOCKED | EXTENT_DELALLOC,
- page_ops);
if (num_bytes < cur_alloc_size)
num_bytes = 0;
else
@@ -1498,6 +1498,9 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
if (ret)
goto out_unlock;
}
+ extent_clear_unlock_delalloc(inode, orig_start, end, locked_folio, &cached,
+ EXTENT_LOCKED | EXTENT_DELALLOC,
+ page_ops);
done:
if (done_offset)
*done_offset = end;
@@ -1518,35 +1521,31 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
* We process each region below.
*/
- clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
- EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV;
- page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
-
/*
* For the range (1). We have already instantiated the ordered extents
* for this region. They are cleaned up by
* btrfs_cleanup_ordered_extents() in e.g,
- * btrfs_run_delalloc_range(). EXTENT_LOCKED | EXTENT_DELALLOC are
- * already cleared in the above loop. And, EXTENT_DELALLOC_NEW |
- * EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV are handled by the cleanup
- * function.
+ * btrfs_run_delalloc_range().
+ * EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV
+ * are also handled by the cleanup function.
*
- * However, in case of @keep_locked, we still need to unlock the pages
- * (except @locked_folio) to ensure all the pages are unlocked.
+ * So here we only clear EXTENT_LOCKED and EXTENT_DELALLOC flag,
+ * and finish the writeback of the involved folios, which will be
+ * never submitted.
*/
- if (keep_locked && orig_start < start) {
+ if (orig_start < start) {
+ clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC;
+ page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
+
if (!locked_folio)
mapping_set_error(inode->vfs_inode.i_mapping, ret);
extent_clear_unlock_delalloc(inode, orig_start, start - 1,
- locked_folio, NULL, 0, page_ops);
+ locked_folio, NULL, clear_bits, page_ops);
}
- /*
- * At this point we're unlocked, we want to make sure we're only
- * clearing these flags under the extent lock, so lock the rest of the
- * range and clear everything up.
- */
- lock_extent(&inode->io_tree, start, end, NULL);
+ clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV;
+ page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
/*
* For the range (2). If we reserved an extent for our delalloc range
--
2.47.1
[BUG]
If submit_one_sector() failed inside extent_writepage_io() for sector
size < page size cases (e.g. 4K sector size and 64K page size), then
we can hit double ordered extent accounting error.
This should be very rare, as submit_one_sector() only fails when we
failed to grab the extent map, and such extent map should exist inside
the memory and have been pinned.
[CAUSE]
For example we have the following folio layout:
0 4K 32K 48K 60K 64K
|//| |//////| |///|
Where |///| is the dirty range we need to writeback. The 3 different
dirty ranges are submitted for regular COW.
Now we hit the following sequence:
- submit_one_sector() returned 0 for [0, 4K)
- submit_one_sector() returned 0 for [32K, 48K)
- submit_one_sector() returned error for [60K, 64K)
- btrfs_mark_ordered_io_finished() called for the whole folio
This will mark the following ranges as finished:
* [0, 4K)
* [32K, 48K)
Both ranges have their IO already submitted, this cleanup will
lead to double accounting.
* [60K, 64K)
That's the correct cleanup.
The only good news is, this error is only theoretical, as the target
extent map is always pinned, thus we should directly grab it from
memory, other than reading it from the disk.
[FIX]
Instead of calling btrfs_mark_ordered_io_finished() for the whole folio
range, which can touch ranges we should not touch, instead
move the error handling inside extent_writepage_io().
So that we can cleanup exact sectors that are ought to be submitted but
failed.
This provide much more accurate cleanup, avoiding the double accounting.
Cc: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 37 ++++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7aa3d8c2b0c3..f50e4fccd909 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1420,6 +1420,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_fs_info *fs_info = inode->root->fs_info;
unsigned long range_bitmap = 0;
bool submitted_io = false;
+ bool error = false;
const u64 folio_start = folio_pos(folio);
u64 cur;
int bit;
@@ -1462,11 +1463,26 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
break;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
- if (ret < 0)
- goto out;
+ if (unlikely(ret < 0)) {
+ /*
+ * bio_ctrl may contain a bio crossing several folios.
+ * Submit it immediately so that the bio has a chance
+ * to finish normally, other than marked as error.
+ */
+ submit_one_bio(bio_ctrl);
+ /*
+ * Failed to grab the extent map which should be very rare.
+ * Since there is no bio submitted to finish the ordered
+ * extent, we have to manually finish this sector.
+ */
+ btrfs_mark_ordered_io_finished(inode, folio, cur,
+ fs_info->sectorsize, false);
+ error = true;
+ continue;
+ }
submitted_io = true;
}
-out:
+
/*
* If we didn't submitted any sector (>= i_size), folio dirty get
* cleared but PAGECACHE_TAG_DIRTY is not cleared (only cleared
@@ -1474,8 +1490,11 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
*
* Here we set writeback and clear for the range. If the full folio
* is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
+ *
+ * If we hit any error, the corresponding sector will still be dirty
+ * thus no need to clear PAGECACHE_TAG_DIRTY.
*/
- if (!submitted_io) {
+ if (!submitted_io && !error) {
btrfs_folio_set_writeback(fs_info, folio, start, len);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
}
@@ -1495,7 +1514,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
{
struct btrfs_inode *inode = BTRFS_I(folio->mapping->host);
struct btrfs_fs_info *fs_info = inode->root->fs_info;
- const u64 page_start = folio_pos(folio);
int ret;
size_t pg_offset;
loff_t i_size = i_size_read(&inode->vfs_inode);
@@ -1538,10 +1556,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
- if (ret)
- btrfs_mark_ordered_io_finished(inode, folio,
- page_start, PAGE_SIZE, !ret);
-
done:
if (ret < 0)
mapping_set_error(folio->mapping, ret);
@@ -2314,11 +2328,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
if (ret == 1)
goto next_page;
- if (ret) {
- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
- cur, cur_len, !ret);
+ if (ret)
mapping_set_error(mapping, ret);
- }
btrfs_folio_end_lock(fs_info, folio, cur, cur_len);
if (ret < 0)
found_error = true;
--
2.47.1
[BUG]
When running btrfs with block size (4K) smaller than page size (64K,
aarch64), there is a very high chance to crash the kernel at
generic/750, with the following messages:
(before the call traces, there are 3 extra debug messages added)
BTRFS warning (device dm-3): read-write for sector size 4096 with page size 65536 is experimental
BTRFS info (device dm-3): checking UUID tree
hrtimer: interrupt took 5451385 ns
BTRFS error (device dm-3): cow_file_range failed, root=4957 inode=257 start=1605632 len=69632: -28
BTRFS error (device dm-3): run_delalloc_nocow failed, root=4957 inode=257 start=1605632 len=69632: -28
BTRFS error (device dm-3): failed to run delalloc range, root=4957 ino=257 folio=1572864 submit_bitmap=8-15 start=1605632 len=69632: -28
------------[ cut here ]------------
WARNING: CPU: 2 PID: 3020984 at ordered-data.c:360 can_finish_ordered_extent+0x370/0x3b8 [btrfs]
CPU: 2 UID: 0 PID: 3020984 Comm: kworker/u24:1 Tainted: G OE 6.13.0-rc1-custom+ #89
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
pc : can_finish_ordered_extent+0x370/0x3b8 [btrfs]
lr : can_finish_ordered_extent+0x1ec/0x3b8 [btrfs]
Call trace:
can_finish_ordered_extent+0x370/0x3b8 [btrfs] (P)
can_finish_ordered_extent+0x1ec/0x3b8 [btrfs] (L)
btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
extent_writepage+0x10c/0x3b8 [btrfs]
extent_write_cache_pages+0x21c/0x4e8 [btrfs]
btrfs_writepages+0x94/0x160 [btrfs]
do_writepages+0x74/0x190
filemap_fdatawrite_wbc+0x74/0xa0
start_delalloc_inodes+0x17c/0x3b0 [btrfs]
btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
shrink_delalloc+0x11c/0x280 [btrfs]
flush_space+0x288/0x328 [btrfs]
btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
process_one_work+0x228/0x680
worker_thread+0x1bc/0x360
kthread+0x100/0x118
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1605632 OE len=16384 to_dec=16384 left=0
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1622016 OE len=12288 to_dec=12288 left=0
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1634304 OE len=8192 to_dec=4096 left=0
CPU: 1 UID: 0 PID: 3286940 Comm: kworker/u24:3 Tainted: G W OE 6.13.0-rc1-custom+ #89
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: btrfs_work_helper [btrfs] (btrfs-endio-write)
pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : process_one_work+0x110/0x680
lr : worker_thread+0x1bc/0x360
Call trace:
process_one_work+0x110/0x680 (P)
worker_thread+0x1bc/0x360 (L)
worker_thread+0x1bc/0x360
kthread+0x100/0x118
ret_from_fork+0x10/0x20
Code: f84086a1 f9000fe1 53041c21 b9003361 (f9400661)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 2-3
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: 0x275bb9540000 from 0xffff800080000000
PHYS_OFFSET: 0xffff8fbba0000000
CPU features: 0x100,00000070,00801250,8201720b
[CAUSE]
The above warning is triggered immediately after the delalloc range
failure, this happens in the following sequence:
- Range [1568K, 1636K) is dirty
1536K 1568K 1600K 1636K 1664K
| |/////////|////////| |
Where 1536K, 1600K and 1664K are page boundaries (64K page size)
- Enter extent_writepage() for page 1536K
- Enter run_delalloc_nocow() with locked page 1536K and range
[1568K, 1636K)
This is due to the inode has preallocated extents.
- Enter cow_file_range() with locked page 1536K and range
[1568K, 1636K)
- btrfs_reserve_extent() only reserved two extents
The main loop of cow_file_range() only reserved two data extents,
Now we have:
1536K 1568K 1600K 1636K 1664K
| |<-->|<--->|/|///////| |
1584K 1596K
Range [1568K, 1596K) has ordered extent reserved.
- btrfs_reserve_extent() failed inside cow_file_range() for file offset
1596K
This is already a bug in our space reservation code, but for now let's
focus on the error handling path.
Now cow_file_range() returned -ENOSPC.
- btrfs_run_delalloc_range() do error cleanup <<< ROOT CAUSE
Call btrfs_cleanup_ordered_extents() with locked folio 1536K and range
[1568K, 1636K)
Function btrfs_cleanup_ordered_extents() normally needs to skip the
ranges inside the folio, as it will normally be cleaned up by
extent_writepage().
Such split error handling is already problematic in the first place.
What's worse is the folio range skipping itself, which is not taking
subpage cases into consideration at all, it will only skip the range
if the page start >= the range start.
In our case, the page start < the range start, since for subpage cases
we can have delalloc ranges inside the folio but not covering the
folio.
So it doesn't skip the page range at all.
This means all the ordered extents, both [1568K, 1584K) and
[1584K, 1596K) will be marked as IOERR.
And those two ordered extents have no more pending ios, it is marked
finished, and *QUEUED* to be deleted from the io tree.
- extent_writepage() do error cleanup
Call btrfs_mark_ordered_io_finished() for the range [1536K, 1600K).
Although ranges [1568K, 1584K) and [1584K, 1596K) are finished, the
deletion from io tree is async, it may or may not happen at this
timing.
If the ranges are not yet removed, we will do double cleaning on those
ranges, triggers the above ordered extent warnings.
In theory there are other bugs, like the cleanup in extent_writepage()
can cause double accounting on ranges that are submitted async
(compression for example).
But that's much harder to trigger because normally we do not mix regular
and compression delalloc ranges.
[FIX]
The folio range split is already buggy and not subpage compatible, it's
introduced a long time ago where subpage support is not even considered.
So instead of splitting the ordered extents cleanup into the folio range
and out of folio range, do all the cleanup inside writepage_delalloc().
- Pass @NULL as locked_folio for btrfs_cleanup_ordered_extents() in
btrfs_run_delalloc_range()
- Skip the btrfs_cleanup_ordered_extents() if writepage_delalloc()
failed
So all ordered extents are only cleaned up by
btrfs_run_delalloc_range().
- Handle the ranges that already have ordered extents allocated
If part of the folio already has ordered extent allocated, and
btrfs_run_delalloc_range() failed, we also need to cleanup that range.
Now we have a concentrated error handling for ordered extents during
btrfs_run_delalloc_range().
Cc: stable(a)vger.kernel.org # 5.15+
Fixes: d1051d6ebf8e ("btrfs: Fix error handling in btrfs_cleanup_ordered_extents")
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 57 ++++++++++++++++++++++++++++++++++++--------
fs/btrfs/inode.c | 2 +-
2 files changed, 48 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c068a442753c..7aa3d8c2b0c3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1136,12 +1136,17 @@ static bool find_next_delalloc_bitmap(struct folio *folio,
/*
* helper for extent_writepage(), doing all of the delayed allocation setup.
*
- * This returns 1 if btrfs_run_delalloc_range function did all the work required
- * to write the page (copy into inline extent). In this case the IO has
- * been started and the page is already unlocked.
+ * Return >0 if all the dirty blocks are submitted async (compression) or inlined.
+ * The @folio should no longer be touched (treat it as already unlocked).
*
- * This returns 0 if all went well (page still locked)
- * This returns < 0 if there were errors (page still locked)
+ * Return 0 if there is still dirty block that needs to be submitted through
+ * extent_writepage_io().
+ * bio_ctrl->submit_bitmap will indicate which blocks of the folio should be
+ * submitted, and @folio is still kept locked.
+ *
+ * Return <0 if there is any error hit.
+ * Any allocated ordered extent range covering this folio will be marked
+ * finished (IOERR), and @folio is still kept locked.
*/
static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
struct folio *folio,
@@ -1159,6 +1164,16 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
* last delalloc end.
*/
u64 last_delalloc_end = 0;
+ /*
+ * The range end (exclusive) of the last successfully finished delalloc
+ * range.
+ * Any range covered by ordered extent must either be manually marked
+ * finished (error handling), or has IO submitted (and finish the
+ * ordered extent normally).
+ *
+ * This records the end of ordered extent cleanup if we hit an error.
+ */
+ u64 last_finished_delalloc_end = page_start;
u64 delalloc_start = page_start;
u64 delalloc_end = page_end;
u64 delalloc_to_write = 0;
@@ -1227,11 +1242,19 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
found_len = last_delalloc_end + 1 - found_start;
if (ret >= 0) {
+ /*
+ * Some delalloc range may be created by previous folios.
+ * Thus we still need to clean those range up during error
+ * handling.
+ */
+ last_finished_delalloc_end = found_start;
/* No errors hit so far, run the current delalloc range. */
ret = btrfs_run_delalloc_range(inode, folio,
found_start,
found_start + found_len - 1,
wbc);
+ if (ret >= 0)
+ last_finished_delalloc_end = found_start + found_len;
} else {
/*
* We've hit an error during previous delalloc range,
@@ -1266,8 +1289,22 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = found_start + found_len;
}
- if (ret < 0)
+ /*
+ * It's possible we have some ordered extents created before we hit
+ * an error, cleanup non-async successfully created delalloc ranges.
+ */
+ if (unlikely(ret < 0)) {
+ unsigned int bitmap_size = min(
+ (last_finished_delalloc_end - page_start) >>
+ fs_info->sectorsize_bits,
+ fs_info->sectors_per_page);
+
+ for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size)
+ btrfs_mark_ordered_io_finished(inode, folio,
+ page_start + (bit << fs_info->sectorsize_bits),
+ fs_info->sectorsize, false);
return ret;
+ }
out:
if (last_delalloc_end)
delalloc_end = last_delalloc_end;
@@ -1501,13 +1538,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
-done:
- if (ret) {
+ if (ret)
btrfs_mark_ordered_io_finished(inode, folio,
page_start, PAGE_SIZE, !ret);
- mapping_set_error(folio->mapping, ret);
- }
+done:
+ if (ret < 0)
+ mapping_set_error(folio->mapping, ret);
/*
* Only unlock ranges that are submitted. As there can be some async
* submitted ranges inside the folio.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1546f341f9a4..85e36dbe1e1f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2301,7 +2301,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
out:
if (ret < 0)
- btrfs_cleanup_ordered_extents(inode, locked_folio, start,
+ btrfs_cleanup_ordered_extents(inode, NULL, start,
end - start + 1);
return ret;
}
--
2.47.1
lockdep detects the following circular locking dependency:
CPU 0 CPU 1
========================== ============================
cdns_uart_isr() printk()
uart_port_lock(port) console_lock()
cdns_uart_console_write()
if (!port->sysrq)
uart_port_lock(port)
uart_handle_break()
port->sysrq = ...
uart_handle_sysrq_char()
printk()
console_lock()
The fixed commit attempts to avoid this situation by only taking the
port lock in cdns_uart_console_write if port->sysrq unset. However, if
(as shown above) cdns_uart_console_write runs before port->sysrq is set,
then it will try to take the port lock anyway. This may result in a
deadlock.
Fix this by splitting sysrq handling into two parts. We use the prepare
helper under the port lock and defer handling until we release the lock.
Fixes: 74ea66d4ca06 ("tty: xuartps: Improve sysrq handling")
Signed-off-by: Sean Anderson <sean.anderson(a)linux.dev>
Cc: <stable(a)vger.kernel.org> # c980248179d: serial: xilinx_uartps: Use port lock wrappers
---
drivers/tty/serial/xilinx_uartps.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index beb151be4d32..92ec51870d1d 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -287,7 +287,7 @@ static void cdns_uart_handle_rx(void *dev_id, unsigned int isrstatus)
continue;
}
- if (uart_handle_sysrq_char(port, data))
+ if (uart_prepare_sysrq_char(port, data))
continue;
if (is_rxbs_support) {
@@ -495,7 +495,7 @@ static irqreturn_t cdns_uart_isr(int irq, void *dev_id)
!(readl(port->membase + CDNS_UART_CR) & CDNS_UART_CR_RX_DIS))
cdns_uart_handle_rx(dev_id, isrstatus);
- uart_port_unlock(port);
+ uart_unlock_and_check_sysrq(port);
return IRQ_HANDLED;
}
@@ -1380,9 +1380,7 @@ static void cdns_uart_console_write(struct console *co, const char *s,
unsigned int imr, ctrl;
int locked = 1;
- if (port->sysrq)
- locked = 0;
- else if (oops_in_progress)
+ if (oops_in_progress)
locked = uart_port_trylock_irqsave(port, &flags);
else
uart_port_lock_irqsave(port, &flags);
--
2.35.1.1320.gc452695387.dirty
Currently, on book3s-hv, the capability KVM_CAP_SPAPR_TCE_VFIO is only
available for KVM Guests running on PowerNV and not for the KVM guests
running on pSeries hypervisors. This prevents a pSeries hypervisor from
leveraging the in-kernel acceleration for H_PUT_TCE_INDIRECT and
H_STUFF_TCE hcalls that results in slow startup times for large memory
guests.
Fix this by enabling the CAP_SPAPR_TCE_VFIO on the pSeries hosts for the
nested PAPR guests.
Fixes: f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries")
Cc: stable(a)vger.kernel.org
Signed-off-by: Amit Machhiwal <amachhiw(a)linux.ibm.com>
---
arch/powerpc/kvm/powerpc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ce1d91eed231..9c479c7381e4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -554,7 +554,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
case KVM_CAP_SPAPR_TCE_VFIO:
- r = !!cpu_has_feature(CPU_FTR_HVMODE);
+ r = !!cpu_has_feature(CPU_FTR_HVMODE) || is_kvmppc_hv_enabled(kvm);
break;
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
base-commit: eea6e4b4dfb8859446177c32961c96726d0117be
--
2.47.1
The patch titled
Subject: scripts/gdb: fix aarch64 userspace detection in get_current_task
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
scripts-gdb-fix-aarch64-userspace-detection-in-get_current_task.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jan Kiszka <jan.kiszka(a)siemens.com>
Subject: scripts/gdb: fix aarch64 userspace detection in get_current_task
Date: Fri, 10 Jan 2025 11:36:33 +0100
At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long
which lets the right-shift always return 0.
Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/cpus.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/scripts/gdb/linux/cpus.py~scripts-gdb-fix-aarch64-userspace-detection-in-get_current_task
+++ a/scripts/gdb/linux/cpus.py
@@ -167,7 +167,7 @@ def get_current_task(cpu):
var_ptr = gdb.parse_and_eval("&pcpu_hot.current_task")
return per_cpu(var_ptr, cpu).dereference()
elif utils.is_target_arch("aarch64"):
- current_task_addr = gdb.parse_and_eval("$SP_EL0")
+ current_task_addr = gdb.parse_and_eval("(unsigned long)$SP_EL0")
if (current_task_addr >> 63) != 0:
current_task = current_task_addr.cast(task_ptr_type)
return current_task.dereference()
_
Patches currently in -mm which might be from jan.kiszka(a)siemens.com are
scripts-gdb-fix-aarch64-userspace-detection-in-get_current_task.patch
The patch titled
Subject: mm/vmscan: fix pgdemote_* accounting with lru_gen_enabled
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmscan-fix-pgdemote_-accounting-with-lru_gen_enabled.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Li Zhijian <lizhijian(a)fujitsu.com>
Subject: mm/vmscan: fix pgdemote_* accounting with lru_gen_enabled
Date: Fri, 10 Jan 2025 20:21:33 +0800
Commit f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA
balancing operations") moved the accounting of PGDEMOTE_* statistics to
shrink_inactive_list(). However, shrink_inactive_list() is not called
when lrugen_enabled is true, leading to incorrect demotion statistics
despite actual demotion events occurring.
Add the PGDEMOTE_* accounting in evict_folios(), ensuring that demotion
statistics are correctly updated regardless of the lru_gen_enabled state.
This fix is crucial for systems that rely on accurate NUMA balancing
metrics for performance tuning and resource management.
Link: https://lkml.kernel.org/r/20250110122133.423481-2-lizhijian@fujitsu.com
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Cc: Kaiyang Zhao <kaiyang2(a)cs.cmu.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/vmscan.c~mm-vmscan-fix-pgdemote_-accounting-with-lru_gen_enabled
+++ a/mm/vmscan.c
@@ -4649,6 +4649,8 @@ retry:
__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
stat.nr_demoted);
+ __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
+ stat.nr_demoted);
item = PGSTEAL_KSWAPD + reclaimer_offset();
if (!cgroup_reclaim(sc))
__count_vm_events(item, reclaimed);
_
Patches currently in -mm which might be from lizhijian(a)fujitsu.com are
mm-vmscan-accumulate-nr_demoted-for-accurate-demotion-statistics.patch
mm-vmscan-fix-pgdemote_-accounting-with-lru_gen_enabled.patch
selftests-mm-add-a-few-missing-gitignore-files.patch
The patch titled
Subject: mm/vmscan: accumulate nr_demoted for accurate demotion statistics
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmscan-accumulate-nr_demoted-for-accurate-demotion-statistics.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Li Zhijian <lizhijian(a)fujitsu.com>
Subject: mm/vmscan: accumulate nr_demoted for accurate demotion statistics
Date: Fri, 10 Jan 2025 20:21:32 +0800
In shrink_folio_list(), demote_folio_list() can be called multiple times,
which can lead to inaccurate demotion statistics if the number of demoted
pages is not accumulated correctly.
Accumulate the nr_demoted count across multiple calls to
demote_folio_list(), ensuring accurate reporting of demotion statistics.
Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Acked-by: Kaiyang Zhao <kaiyang2(a)cs.cmu.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-accumulate-nr_demoted-for-accurate-demotion-statistics
+++ a/mm/vmscan.c
@@ -1522,7 +1522,7 @@ keep:
/* 'folio_list' is always empty here */
/* Migrate folios selected for demotion */
- stat->nr_demoted = demote_folio_list(&demote_folios, pgdat);
+ stat->nr_demoted += demote_folio_list(&demote_folios, pgdat);
nr_reclaimed += stat->nr_demoted;
/* Folios that could not be demoted are still in @demote_folios */
if (!list_empty(&demote_folios)) {
_
Patches currently in -mm which might be from lizhijian(a)fujitsu.com are
mm-vmscan-accumulate-nr_demoted-for-accurate-demotion-statistics.patch
selftests-mm-add-a-few-missing-gitignore-files.patch
The patch titled
Subject: fs/proc: fix softlockup in __read_vmcore (part 2)
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-proc-fix-softlockup-in-__read_vmcore-part-2.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: fs/proc: fix softlockup in __read_vmcore (part 2)
Date: Fri, 10 Jan 2025 10:28:21 -0500
Since commit 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") the
number of softlockups in __read_vmcore at kdump time have gone down, but
they still happen sometimes.
In a memory constrained environment like the kdump image, a softlockup is
not just a harmless message, but it can interfere with things like RCU
freeing memory, causing the crashdump to get stuck.
The second loop in __read_vmcore has a lot more opportunities for natural
sleep points, like scheduling out while waiting for a data write to
happen, but apparently that is not always enough.
Add a cond_resched() to the second loop in __read_vmcore to (hopefully)
get rid of the softlockups.
Link: https://lkml.kernel.org/r/20250110102821.2a37581b@fangorn
Fixes: 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Reported-by: Breno Leitao <leitao(a)debian.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/vmcore.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/proc/vmcore.c~fs-proc-fix-softlockup-in-__read_vmcore-part-2
+++ a/fs/proc/vmcore.c
@@ -404,6 +404,8 @@ static ssize_t __read_vmcore(struct iov_
if (!iov_iter_count(iter))
return acc;
}
+
+ cond_resched();
}
return acc;
_
Patches currently in -mm which might be from riel(a)surriel.com are
fs-proc-fix-softlockup-in-__read_vmcore-part-2.patch
mm-remove-unnecessary-calls-to-lru_add_drain.patch
Recent reports have shown how we sometimes call vsock_*_has_data()
when a vsock socket has been de-assigned from a transport (see attached
links), but we shouldn't.
Previous commits should have solved the real problems, but we may have
more in the future, so to avoid null-ptr-deref, we can return 0
(no space, no data available) but with a warning.
This way the code should continue to run in a nearly consistent state
and have a warning that allows us to debug future problems.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-A…
Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Co-developed-by: Hyunwoo Kim <v4bel(a)theori.io>
Signed-off-by: Hyunwoo Kim <v4bel(a)theori.io>
Co-developed-by: Wongi Lee <qwerty(a)theori.io>
Signed-off-by: Wongi Lee <qwerty(a)theori.io>
Signed-off-by: Stefano Garzarella <sgarzare(a)redhat.com>
---
net/vmw_vsock/af_vsock.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 74d35a871644..fa9d1b49599b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -879,6 +879,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
s64 vsock_stream_has_data(struct vsock_sock *vsk)
{
+ if (WARN_ON(!vsk->transport))
+ return 0;
+
return vsk->transport->stream_has_data(vsk);
}
EXPORT_SYMBOL_GPL(vsock_stream_has_data);
@@ -887,6 +890,9 @@ s64 vsock_connectible_has_data(struct vsock_sock *vsk)
{
struct sock *sk = sk_vsock(vsk);
+ if (WARN_ON(!vsk->transport))
+ return 0;
+
if (sk->sk_type == SOCK_SEQPACKET)
return vsk->transport->seqpacket_has_data(vsk);
else
@@ -896,6 +902,9 @@ EXPORT_SYMBOL_GPL(vsock_connectible_has_data);
s64 vsock_stream_has_space(struct vsock_sock *vsk)
{
+ if (WARN_ON(!vsk->transport))
+ return 0;
+
return vsk->transport->stream_has_space(vsk);
}
EXPORT_SYMBOL_GPL(vsock_stream_has_space);
--
2.47.1
During virtio_transport_release() we can schedule a delayed work to
perform the closing of the socket before destruction.
The destructor is called either when the socket is really destroyed
(reference counter to zero), or it can also be called when we are
de-assigning the transport.
In the former case, we are sure the delayed work has completed, because
it holds a reference until it completes, so the destructor will
definitely be called after the delayed work is finished.
But in the latter case, the destructor is called by AF_VSOCK core, just
after the release(), so there may still be delayed work scheduled.
Refactor the code, moving the code to delete the close work already in
the do_close() to a new function. Invoke it during destruction to make
sure we don't leave any pending work.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable(a)vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel(a)theori.io>
Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare(a)redhat.com>
---
net/vmw_vsock/virtio_transport_common.c | 29 ++++++++++++++++++-------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 51a494b69be8..7f7de6d88096 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,9 @@
/* Threshold for detecting small packets to copy */
#define GOOD_COPY_LEN 128
+static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
+ bool cancel_timeout);
+
static const struct virtio_transport *
virtio_transport_get_ops(struct vsock_sock *vsk)
{
@@ -1109,6 +1112,8 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
{
struct virtio_vsock_sock *vvs = vsk->trans;
+ virtio_transport_cancel_close_work(vsk, true);
+
kfree(vvs);
vsk->trans = NULL;
}
@@ -1204,17 +1209,11 @@ static void virtio_transport_wait_close(struct sock *sk, long timeout)
}
}
-static void virtio_transport_do_close(struct vsock_sock *vsk,
- bool cancel_timeout)
+static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
+ bool cancel_timeout)
{
struct sock *sk = sk_vsock(vsk);
- sock_set_flag(sk, SOCK_DONE);
- vsk->peer_shutdown = SHUTDOWN_MASK;
- if (vsock_stream_has_data(vsk) <= 0)
- sk->sk_state = TCP_CLOSING;
- sk->sk_state_change(sk);
-
if (vsk->close_work_scheduled &&
(!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
vsk->close_work_scheduled = false;
@@ -1226,6 +1225,20 @@ static void virtio_transport_do_close(struct vsock_sock *vsk,
}
}
+static void virtio_transport_do_close(struct vsock_sock *vsk,
+ bool cancel_timeout)
+{
+ struct sock *sk = sk_vsock(vsk);
+
+ sock_set_flag(sk, SOCK_DONE);
+ vsk->peer_shutdown = SHUTDOWN_MASK;
+ if (vsock_stream_has_data(vsk) <= 0)
+ sk->sk_state = TCP_CLOSING;
+ sk->sk_state_change(sk);
+
+ virtio_transport_cancel_close_work(vsk, cancel_timeout);
+}
+
static void virtio_transport_close_timeout(struct work_struct *work)
{
struct vsock_sock *vsk =
--
2.47.1
The reference count of the device incremented in device_initialize() is
not decremented when device_add() fails. Add a put_device() call before
returning from the function to decrement reference count for cleanup.
Or it could cause memory leak.
As comment of device_add() says, if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: ed542bed126c ("[SCSI] raid class: handle component-add errors")
Signed-off-by: Ma Ke <make_ruc2021(a)163.com>
---
drivers/scsi/raid_class.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/raid_class.c b/drivers/scsi/raid_class.c
index 898a0bdf8df6..2cb2949a78c6 100644
--- a/drivers/scsi/raid_class.c
+++ b/drivers/scsi/raid_class.c
@@ -251,6 +251,7 @@ int raid_component_add(struct raid_template *r,struct device *raid_dev,
list_del(&rc->node);
rd->component_count--;
put_device(component_dev);
+ put_device(&rc->dev);
kfree(rc);
return err;
}
--
2.25.1
From: Juntong Deng <juntong.deng(a)outlook.com>
commit bdcb8aa434c6d36b5c215d02a9ef07551be25a37 upstream.
In gfs2_put_super(), whether withdrawn or not, the quota should
be cleaned up by gfs2_quota_cleanup().
Otherwise, struct gfs2_sbd will be freed before gfs2_qd_dealloc (rcu
callback) has run for all gfs2_quota_data objects, resulting in
use-after-free.
Also, gfs2_destroy_threads() and gfs2_quota_cleanup() is already called
by gfs2_make_fs_ro(), so in gfs2_put_super(), after calling
gfs2_make_fs_ro(), there is no need to call them again.
Reported-by: syzbot+29c47e9e51895928698c(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=29c47e9e51895928698c
Signed-off-by: Juntong Deng <juntong.deng(a)outlook.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Clayton Casciato <majortomtosourcecontrol(a)gmail.com>
(cherry picked from commit 7ad4e0a4f61c57c3ca291ee010a9d677d0199fba)
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
Backport to fix CVE-2023-52760
Link: https://www.cve.org/CVERecord/?id=CVE-2023-52760
---
fs/gfs2/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 268651ac9fc848..98158559893f48 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -590,6 +590,8 @@ static void gfs2_put_super(struct super_block *sb)
if (!sb_rdonly(sb)) {
gfs2_make_fs_ro(sdp);
+ } else {
+ gfs2_quota_cleanup(sdp);
}
WARN_ON(gfs2_withdrawing(sdp));
--
2.33.8
The FRED RSP0 MSR is only used for delivering events when running
userspace. Linux leverages this property to reduce expensive MSR
writes and optimize context switches. The kernel only writes the
MSR when about to run userspace *and* when the MSR has actually
changed since the last time userspace ran.
This optimization is implemented by maintaining a per-CPU cache of
FRED RSP0 and then checking that against the value for the top of
current task stack before running userspace.
However cpu_init_fred_exceptions() writes the MSR without updating
the per-CPU cache. This means that the kernel might return to
userspace with MSR_IA32_FRED_RSP0==0 when it needed to point to the
top of current task stack. This would induce a double fault (#DF),
which is bad.
A context switch after cpu_init_fred_exceptions() can paper over
the issue since it updates the cached value. That evidently
happens most of the time explaining how this bug got through.
Fix the bug through resynchronizing the FRED RSP0 MSR with its
per-CPU cache in cpu_init_fred_exceptions().
Fixes: fe85ee391966 ("x86/entry: Set FRED RSP0 on return to userspace instead of context switch")
Signed-off-by: Xin Li (Intel) <xin(a)zytor.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/kernel/fred.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fred.c b/arch/x86/kernel/fred.c
index 8d32c3f48abc..5e2cd1004980 100644
--- a/arch/x86/kernel/fred.c
+++ b/arch/x86/kernel/fred.c
@@ -50,7 +50,13 @@ void cpu_init_fred_exceptions(void)
FRED_CONFIG_ENTRYPOINT(asm_fred_entrypoint_user));
wrmsrl(MSR_IA32_FRED_STKLVLS, 0);
- wrmsrl(MSR_IA32_FRED_RSP0, 0);
+
+ /*
+ * Ater a CPU offline/online cycle, the FRED RSP0 MSR should be
+ * resynchronized with its per-CPU cache.
+ */
+ wrmsrl(MSR_IA32_FRED_RSP0, __this_cpu_read(fred_rsp0));
+
wrmsrl(MSR_IA32_FRED_RSP1, 0);
wrmsrl(MSR_IA32_FRED_RSP2, 0);
wrmsrl(MSR_IA32_FRED_RSP3, 0);
base-commit: 59011effc84d7b167f4b6542bd05c7aff1b7574a
--
2.47.1
For interrupt-map entries, the DTS specification requires
that #address-cells is defined for both the child node and the
interrupt parent. For the PCIe interrupt-map entries, the parent
node ("gic") has not specified #address-cells. The existing layout
of the PCIe interrupt-map entries indicates that it assumes
that #address-cells is zero for this node.
Explicitly set #address-cells to zero for "gic" so that it complies
with the device tree specification.
NVIDIA EDK2 works around this issue by assuming #address-cells
is zero in this scenario, but that workaround is being removed and so
this update is needed or else NVIDIA EDK2 cannot successfully parse the
device tree and the board cannot boot.
Fixes: ec142c44b026 ("arm64: tegra: Add P2U and PCIe controller nodes to Tegra234 DT")
Signed-off-by: Brad Griffis <bgriffis(a)nvidia.com>
Cc: stable(a)vger.kernel.org
---
v2
* Add "Fixes" and CC stable
* Give further context about why the patch is needed
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index 984c85eab41a..e1c07c99e9bd 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -4010,6 +4010,7 @@ ccplex@e000000 {
gic: interrupt-controller@f400000 {
compatible = "arm,gic-v3";
+ #address-cells = <0>;
reg = <0x0 0x0f400000 0x0 0x010000>, /* GICD */
<0x0 0x0f440000 0x0 0x200000>; /* GICR */
interrupt-parent = <&gic>;
--
2.34.1
This series is aimed at fixing a soundness issue with how dynamically
allocated LockClassKeys are handled. Currently, LockClassKeys can be
used without being Pin'd, which can break lockdep since it relies on
address stability. Similarly, these keys are not automatically
(de)registered with lockdep.
At the suggestion of Alice Ryhl, this series includes a patch for
-stable kernels that disables dynamically allocated keys. This prevents
backported patches from using the unsound implementation.
Currently, this series requires that all dynamically allocated
LockClassKeys have a lifetime of 'static (i.e., they must be leaked
after allocation). This is because Lock does not currently keep a
reference to the LockClassKey, instead passing it to C via FFI. This
causes a problem because the rust compiler would allow creating a
'static Lock with a 'a LockClassKey (with 'a < 'static) while C would
expect the LockClassKey to live as long as the lock. This problem
represents an avenue for future work.
---
Changes from RFC:
- Split into two commits so that dynamically allocated LockClassKeys are
removed from stable kernels. (Thanks Alice Ryhl)
- Extract calls to C lockdep functions into helpers so things build
properly when LOCKDEP=n. (Thanks Benno Lossin)
- Remove extraneous `get_ref()` calls. (Thanks Benno Lossin)
- Provide better documentation for `new_dynamic()`. (Thanks Benno
Lossin)
- Ran rustfmt to fix formatting and some extraneous changes. (Thanks
Alice Ryhl and Benno Lossin)
- Link to RFC: https://lore.kernel.org/r/20240905-rust-lockdep-v1-1-d2c9c21aa8b2@gmail.com
---
Changes in v2:
- Dropped formatting change that's already fixed upstream (Thanks Dirk
Behme).
- Moved safety comment to the right point in the patch series (Thanks
Dirk Behme and Boqun Feng).
- Added an example of dynamic LockClassKey usage (Thanks Boqun Feng).
- Link to v1: https://lore.kernel.org/r/20241004-rust-lockdep-v1-0-e9a5c45721fc@gmail.com
---
Mitchell Levy (2):
rust: lockdep: Remove support for dynamically allocated LockClassKeys
rust: lockdep: Use Pin for all LockClassKey usages
rust/helpers/helpers.c | 1 +
rust/helpers/sync.c | 13 +++++++++
rust/kernel/sync.rs | 63 ++++++++++++++++++++++++++++++++++-------
rust/kernel/sync/condvar.rs | 5 ++--
rust/kernel/sync/lock.rs | 9 ++----
rust/kernel/sync/lock/global.rs | 5 ++--
rust/kernel/sync/poll.rs | 2 +-
rust/kernel/workqueue.rs | 3 +-
8 files changed, 78 insertions(+), 23 deletions(-)
---
base-commit: 0c5928deada15a8d075516e6e0d9ee19011bb000
change-id: 20240905-rust-lockdep-d3e30521c8ba
Best regards,
--
Mitchell Levy <levymitchell0(a)gmail.com>
Hi!
When updating the Alpine Linux kernel to 6.6.70 I bumped into a new compile error:
LD .tmp_vmlinux.kallsyms1
ld: vmlinux.o: in function `__crash_kexec':
/home/ncopa/aports/main/linux-lts/src/linux-6.6/kernel/kexec_core.c:1075:(.text+0x1bffd0): undefined reference to `machine_crash_shutdown'
ld: vmlinux.o: in function `do_kexec_load':
/home/ncopa/aports/main/linux-lts/src/linux-6.6/kernel/kexec.c:166:(.text+0x1c1b4e): undefined reference to `arch_kexec_protect_crashkres'
ld: /home/ncopa/aports/main/linux-lts/src/linux-6.6/kernel/kexec.c:105:(.text+0x1c1b94): undefined reference to `arch_kexec_unprotect_crashkres'
make[2]: *** [/home/ncopa/aports/main/linux-lts/src/linux-6.6/scripts/Makefile.vmlinux:37: vmlinux] Error 1
make[1]: *** [/home/ncopa/aports/main/linux-lts/src/linux-6.6/Makefile:1164: vmlinux] Error 2
make: ***
[/home/ncopa/aports/main/linux-lts/src/linux-6.6/Makefile:234:
__sub-make] Error 2
$ grep -E '(CRASH_CORE|_KEXEC)' .config
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC=y
# CONFIG_KEXEC_FILE is not set
# CONFIG_KEXEC_JUMP is not set
CONFIG_ARCH_SUPPORTS_KEXEC=y
CONFIG_ARCH_SUPPORTS_KEXEC_FILE=y
CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY=y
CONFIG_ARCH_SUPPORTS_KEXEC_SIG=y
CONFIG_ARCH_SUPPORTS_KEXEC_SIG_FORCE=y
CONFIG_ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG=y
CONFIG_ARCH_SUPPORTS_KEXEC_JUMP=y
Looking at the git history I notice this commit:
> From e5b1574a8ca28c40cf53eda43f6c3b016ed41e27 Mon Sep 17 00:00:00 2001
> From: Baoquan He <bhe(a)redhat.com>
> Date: Wed, 24 Jan 2024 13:12:46 +0800
> Subject: x86, crash: wrap crash dumping code into crash related ifdefs
>
> [ Upstream commit a4eeb2176d89fdf2785851521577b94b31690a60 ]
>
> Now crash codes under kernel/ folder has been split out from kexec
> code, crash dumping can be separated from kexec reboot in config
> items on x86 with some adjustments.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
So a wild guess is that the commit(s) that splits out crash codes under
kernel/ directory from kexec code also needs to be backported to
linux-6.6.y?
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
Thanks!
-nc
On removal of the device or unloading of the kernel module a potential
NULL pointer dereference occurs.
The following sequence deletes the interface:
brcmf_detach()
brcmf_remove_interface()
brcmf_del_if()
Inside the brcmf_del_if() function the drvr->if2bss[ifidx] is updated to
BRCMF_BSSIDX_INVALID (-1) if the bsscfgidx matches.
After brcmf_remove_interface() call the brcmf_proto_detach() function is
called providing the following sequence:
brcmf_detach()
brcmf_proto_detach()
brcmf_proto_msgbuf_detach()
brcmf_flowring_detach()
brcmf_msgbuf_delete_flowring()
brcmf_msgbuf_remove_flowring()
brcmf_flowring_delete()
brcmf_get_ifp()
brcmf_txfinalize()
Since brcmf_get_ip() can and actually will return NULL in this case the
call to brcmf_txfinalize() will result in a NULL pointer dereference
inside brcmf_txfinalize() when trying to update
ifp->ndev->stats.tx_errors.
This will only happen if a flowring still has an skb.
Cc: stable(a)vger.kernel.org
Signed-off-by: Marcel Hamer <marcel.hamer(a)windriver.com>
Link: https://lore.kernel.org/all/b519e746-ddfd-421f-d897-7620d229e4b2@gmail.com/
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index c3a57e30c855..cf731bc7ae24 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -549,7 +549,7 @@ void brcmf_txfinalize(struct brcmf_if *ifp, struct sk_buff *txp, bool success)
wake_up(&ifp->pend_8021x_wait);
}
- if (!success)
+ if (!success && ifp)
ifp->ndev->stats.tx_errors++;
brcmu_pkt_buf_free_skb(txp);
--
2.34.1
I'm announcing the release of the 6.6.71 kernel.
It's only needed if you could not properly build 6.6.70 as there is a
configuration that is pretty common that would fail to build properly.
That is now resolved. If you did not have a problem building 6.6.70, no
need to upgrade at this point in time.
The updated 6.6.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/x86/kernel/Makefile | 4 ++--
arch/x86/kernel/cpu/mshyperv.c | 10 ++--------
arch/x86/kernel/kexec-bzimage64.c | 4 ----
arch/x86/kernel/kvm.c | 4 ++--
arch/x86/kernel/machine_kexec_64.c | 3 ---
arch/x86/kernel/reboot.c | 4 ++--
arch/x86/kernel/setup.c | 2 +-
arch/x86/kernel/smp.c | 2 +-
arch/x86/xen/enlighten_hvm.c | 4 ----
arch/x86/xen/mmu_pv.c | 2 +-
11 files changed, 12 insertions(+), 29 deletions(-)
Greg Kroah-Hartman (3):
Revert "x86/hyperv: Fix hv tsc page based sched_clock for hibernation"
Revert "x86, crash: wrap crash dumping code into crash related ifdefs"
Linux 6.6.71
Naman Jain (1):
x86/hyperv: Fix hv tsc page based sched_clock for hibernation
Transport's release() and destruct() are called when de-assigning the
vsock transport. These callbacks can touch some socket state like
sock flags, sk_state, and peer_shutdown.
Since we are reassigning the socket to a new transport during
vsock_connect(), let's reset these fields to have a clean state with
the new transport.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare(a)redhat.com>
---
net/vmw_vsock/af_vsock.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5cf8109f672a..74d35a871644 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -491,6 +491,15 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
*/
vsk->transport->release(vsk);
vsock_deassign_transport(vsk);
+
+ /* transport's release() and destruct() can touch some socket
+ * state, since we are reassigning the socket to a new transport
+ * during vsock_connect(), let's reset these fields to have a
+ * clean state.
+ */
+ sock_reset_flag(sk, SOCK_DONE);
+ sk->sk_state = TCP_CLOSE;
+ vsk->peer_shutdown = 0;
}
/* We increase the module refcnt to prevent the transport unloading
--
2.47.1
After commit f7fe64ad0f22 ("drm/sched: Split free_job into own work item")
and with drivers who use the unordered workqueue sched_jobs can be freed
in parallel as soon as the complete_all(&entity->entity_idle) is called.
This makes all dereferencing in the lower part of the worker unsafe so
lets fix it by moving the complete_all() call to after the worker is done
touching the job.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: f7fe64ad0f22 ("drm/sched: Split free_job into own work item")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Danilo Krummrich <dakr(a)redhat.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Philipp Stanner <pstanner(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/scheduler/sched_main.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 57da84908752..f0d02c061c23 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1188,7 +1188,6 @@ static void drm_sched_run_job_work(struct work_struct *w)
container_of(w, struct drm_gpu_scheduler, work_run_job);
struct drm_sched_entity *entity;
struct dma_fence *fence;
- struct drm_sched_fence *s_fence;
struct drm_sched_job *sched_job;
int r;
@@ -1207,15 +1206,12 @@ static void drm_sched_run_job_work(struct work_struct *w)
return;
}
- s_fence = sched_job->s_fence;
-
atomic_add(sched_job->credits, &sched->credit_count);
drm_sched_job_begin(sched_job);
trace_drm_run_job(sched_job, entity);
fence = sched->ops->run_job(sched_job);
- complete_all(&entity->entity_idle);
- drm_sched_fence_scheduled(s_fence, fence);
+ drm_sched_fence_scheduled(sched_job->s_fence, fence);
if (!IS_ERR_OR_NULL(fence)) {
/* Drop for original kref_init of the fence */
@@ -1232,6 +1228,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
PTR_ERR(fence) : 0);
}
+ complete_all(&entity->entity_idle);
wake_up(&sched->job_scheduled);
drm_sched_run_job_queue(sched);
}
--
2.47.1
Some of the core functions can only be called if the transport
has been assigned.
As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:
BUG: kernel NULL pointer dereference, address: 00000000000000a0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
RIP: 0010:vsock_connectible_has_data+0x1f/0x40
Call Trace:
vsock_bpf_recvmsg+0xca/0x5e0
sock_recvmsg+0xb9/0xc0
__sys_recvfrom+0xb3/0x130
__x64_sys_recvfrom+0x20/0x30
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Cc: stable(a)vger.kernel.org
Reported-by: Michal Luczaj <mhal(a)rbox.co>
Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Tested-by: Michal Luczaj <mhal(a)rbox.co>
Reported-by: syzbot+3affdbfc986ecd9200fd(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Tested-by: syzbot+3affdbfc986ecd9200fd(a)syzkaller.appspotmail.com
Reviewed-by: Hyunwoo Kim <v4bel(a)theori.io>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Reviewed-by: Luigi Leonardi <leonardi(a)redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare(a)redhat.com>
---
net/vmw_vsock/vsock_bpf.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index 4aa6e74ec295..f201d9eca1df 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -77,6 +77,7 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
size_t len, int flags, int *addr_len)
{
struct sk_psock *psock;
+ struct vsock_sock *vsk;
int copied;
psock = sk_psock_get(sk);
@@ -84,6 +85,13 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
return __vsock_recvmsg(sk, msg, len, flags);
lock_sock(sk);
+ vsk = vsock_sk(sk);
+
+ if (!vsk->transport) {
+ copied = -ENODEV;
+ goto out;
+ }
+
if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
release_sock(sk);
sk_psock_put(sk, psock);
@@ -108,6 +116,7 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
}
+out:
release_sock(sk);
sk_psock_put(sk, psock);
--
2.47.1