Am 07.04.23 um 05:40 schrieb Huang, Tim:
> [AMD Official Use Only - General]
>
> On Thu, Apr 06, 2023 at 05:39:07PM +0200, Rainer Fiebig wrote:
>> Am 06.04.23 um 15:30 schrieb Linux regression tracking (Thorsten Leemhuis):
>>> [CCing the regression list, as it should be in the loop for regressions:
>>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>>
>>> On 06.04.23 14:06, Rainer Fiebig wrote:
>>>> Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
>>>> key on the keyboard fails. However, if the PC was switched off and on
>>>> again (or reset), the resume is OK. The APU is a Ryzen 5600G.
>>>>
>>>> Bisecting between 6.1.21/22 turned up this:
>>>>
>>>>
>>>> Author: Tim Huang <tim.huang(a)amd.com>
>>>> Date: Thu Mar 9 16:27:51 2023 +0800
>>>>
>>>> drm/amdgpu: skip ASIC reset for APUs when go to S4
>>>>
>>>> commit b589626674de94d977e81c99bf7905872b991197 upstream.
>>>>
>>>> For GC IP v11.0.4/11, PSP TMR need to be reserved
>>>> for ASIC mode2 reset. But for S4, when psp suspend,
>>>> it will destroy the TMR that fails the ASIC reset.
>>>> [...]
>>>>
>>>>
>>>> Reverting the commit solves the problem.
>>>> Thanks.
>>>
>>> Please try 6.1.23 and report back, because from the thread
>>> https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/
>>> it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when
>>> go to S4" might be fixing this, which went into 6.1.23.
>> Yes, 6.1.23 seems OK so far.
>>
>
>
> The patch " drm/amdgpu: allow more APUs to do mode2 reset when go to S4" is to fix this hibernate regression issue.
>
> Sorry to have troubled you.
No problem, please don't take it personally. It wasn't a big deal and I
was just a bit grumpy yesterday.
Thanks for the info and have a good day!
Rainer
usb_udc_connect_control does not check to see if the udc has already
been started. This causes gadget->ops->pullup to be called through
usb_gadget_connect when invoked from usb_udc_vbus_handler even before
usb_gadget_udc_start is called. Guard this by checking for udc->started
in usb_udc_connect_control before invoking usb_gadget_connect.
Guarding udc->vbus, udc->started, gadget->connect, gadget->deactivate
related functions with connect_lock. usb_gadget_connect_locked,
usb_gadget_disconnect_locked, usb_udc_connect_control_locked,
usb_gadget_udc_start_locked, usb_gadget_udc_stop_locked are called with
this lock held as they can be simulataneously invoked from different code
paths.
Adding an additional check to make sure udc is started(udc->started)
before pullup callback is invoked.
Cc: stable(a)vger.kernel.org
Fixes: 628ef0d273a6 ("usb: udc: add usb_udc_vbus_handler")
Signed-off-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
* Fixed commit message comments.
* Renamed udc_connect_control_lock to connect_lock and made it per
device.
* udc->vbus, udc->started, gadget->connect, gadget->deactivate are all
now guarded by connect_lock.
* Code now checks for udc->started to be set before invoking pullup
callback.
---
drivers/usb/gadget/udc/core.c | 140 +++++++++++++++++++++++-----------
1 file changed, 96 insertions(+), 44 deletions(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 3dcbba739db6..41d3a1998cff 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -37,6 +37,10 @@ static struct bus_type gadget_bus_type;
* @vbus: for udcs who care about vbus status, this value is real vbus status;
* for udcs who do not care about vbus status, this value is always true
* @started: the UDC's started state. True if the UDC had started.
+ * @connect_lock: protects udc->vbus, udc->started, gadget->connect, gadget->deactivate related
+ * functions. usb_gadget_connect_locked, usb_gadget_disconnect_locked,
+ * usb_udc_connect_control_locked, usb_gadget_udc_start_locked, usb_gadget_udc_stop_locked are
+ * called with this lock held.
*
* This represents the internal data structure which is used by the UDC-class
* to hold information about udc driver and gadget together.
@@ -48,6 +52,7 @@ struct usb_udc {
struct list_head list;
bool vbus;
bool started;
+ struct mutex connect_lock;
};
static struct class *udc_class;
@@ -687,17 +692,8 @@ int usb_gadget_vbus_disconnect(struct usb_gadget *gadget)
}
EXPORT_SYMBOL_GPL(usb_gadget_vbus_disconnect);
-/**
- * usb_gadget_connect - software-controlled connect to USB host
- * @gadget:the peripheral being connected
- *
- * Enables the D+ (or potentially D-) pullup. The host will start
- * enumerating this gadget when the pullup is active and a VBUS session
- * is active (the link is powered).
- *
- * Returns zero on success, else negative errno.
- */
-int usb_gadget_connect(struct usb_gadget *gadget)
+/* Internal version of usb_gadget_connect needs to be called with udc_connect_control_lock held. */
+int usb_gadget_connect_locked(struct usb_gadget *gadget)
{
int ret = 0;
@@ -706,10 +702,12 @@ int usb_gadget_connect(struct usb_gadget *gadget)
goto out;
}
- if (gadget->deactivated) {
+ if (gadget->deactivated || !gadget->udc->started) {
/*
* If gadget is deactivated we only save new state.
* Gadget will be connected automatically after activation.
+ *
+ * udc first needs to be started before gadget can be pulled up.
*/
gadget->connected = true;
goto out;
@@ -724,22 +722,31 @@ int usb_gadget_connect(struct usb_gadget *gadget)
return ret;
}
-EXPORT_SYMBOL_GPL(usb_gadget_connect);
/**
- * usb_gadget_disconnect - software-controlled disconnect from USB host
- * @gadget:the peripheral being disconnected
- *
- * Disables the D+ (or potentially D-) pullup, which the host may see
- * as a disconnect (when a VBUS session is active). Not all systems
- * support software pullup controls.
+ * usb_gadget_connect - software-controlled connect to USB host
+ * @gadget:the peripheral being connected
*
- * Following a successful disconnect, invoke the ->disconnect() callback
- * for the current gadget driver so that UDC drivers don't need to.
+ * Enables the D+ (or potentially D-) pullup. The host will start
+ * enumerating this gadget when the pullup is active and a VBUS session
+ * is active (the link is powered).
*
* Returns zero on success, else negative errno.
*/
-int usb_gadget_disconnect(struct usb_gadget *gadget)
+int usb_gadget_connect(struct usb_gadget *gadget)
+{
+ int ret;
+
+ mutex_lock(&gadget->udc->connect_lock);
+ ret = usb_gadget_connect_locked(gadget);
+ mutex_unlock(&gadget->udc->connect_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(usb_gadget_connect);
+
+/* Internal version of usb_gadget_disconnect needs to be called with udc->connect_lock held. */
+int usb_gadget_disconnect_locked(struct usb_gadget *gadget)
{
int ret = 0;
@@ -751,10 +758,12 @@ int usb_gadget_disconnect(struct usb_gadget *gadget)
if (!gadget->connected)
goto out;
- if (gadget->deactivated) {
+ if (gadget->deactivated || !gadget->udc->started) {
/*
* If gadget is deactivated we only save new state.
* Gadget will stay disconnected after activation.
+ *
+ * udc should have been started before gadget being pulled down.
*/
gadget->connected = false;
goto out;
@@ -774,6 +783,30 @@ int usb_gadget_disconnect(struct usb_gadget *gadget)
return ret;
}
+
+/**
+ * usb_gadget_disconnect - software-controlled disconnect from USB host
+ * @gadget:the peripheral being disconnected
+ *
+ * Disables the D+ (or potentially D-) pullup, which the host may see
+ * as a disconnect (when a VBUS session is active). Not all systems
+ * support software pullup controls.
+ *
+ * Following a successful disconnect, invoke the ->disconnect() callback
+ * for the current gadget driver so that UDC drivers don't need to.
+ *
+ * Returns zero on success, else negative errno.
+ */
+int usb_gadget_disconnect(struct usb_gadget *gadget)
+{
+ int ret;
+
+ mutex_lock(&gadget->udc->connect_lock);
+ ret = usb_gadget_disconnect_locked(gadget);
+ mutex_unlock(&gadget->udc->connect_lock);
+
+ return ret;
+}
EXPORT_SYMBOL_GPL(usb_gadget_disconnect);
/**
@@ -794,10 +827,11 @@ int usb_gadget_deactivate(struct usb_gadget *gadget)
if (gadget->deactivated)
goto out;
+ mutex_lock(&gadget->udc->connect_lock);
if (gadget->connected) {
- ret = usb_gadget_disconnect(gadget);
+ ret = usb_gadget_disconnect_locked(gadget);
if (ret)
- goto out;
+ goto unlock;
/*
* If gadget was being connected before deactivation, we want
@@ -807,6 +841,8 @@ int usb_gadget_deactivate(struct usb_gadget *gadget)
}
gadget->deactivated = true;
+unlock:
+ mutex_unlock(&gadget->udc->connect_lock);
out:
trace_usb_gadget_deactivate(gadget, ret);
@@ -830,6 +866,7 @@ int usb_gadget_activate(struct usb_gadget *gadget)
if (!gadget->deactivated)
goto out;
+ mutex_lock(&gadget->udc->connect_lock);
gadget->deactivated = false;
/*
@@ -837,7 +874,8 @@ int usb_gadget_activate(struct usb_gadget *gadget)
* while it was being deactivated, we call usb_gadget_connect().
*/
if (gadget->connected)
- ret = usb_gadget_connect(gadget);
+ ret = usb_gadget_connect_locked(gadget);
+ mutex_unlock(&gadget->udc->connect_lock);
out:
trace_usb_gadget_activate(gadget, ret);
@@ -1078,12 +1116,13 @@ EXPORT_SYMBOL_GPL(usb_gadget_set_state);
/* ------------------------------------------------------------------------- */
-static void usb_udc_connect_control(struct usb_udc *udc)
+/* Acquire udc_connect_control_lock before calling this function. */
+static void usb_udc_connect_control_locked(struct usb_udc *udc)
{
- if (udc->vbus)
- usb_gadget_connect(udc->gadget);
+ if (udc->vbus && udc->started)
+ usb_gadget_connect_locked(udc->gadget);
else
- usb_gadget_disconnect(udc->gadget);
+ usb_gadget_disconnect_locked(udc->gadget);
}
/**
@@ -1099,10 +1138,12 @@ void usb_udc_vbus_handler(struct usb_gadget *gadget, bool status)
{
struct usb_udc *udc = gadget->udc;
+ mutex_lock(&udc->connect_lock);
if (udc) {
udc->vbus = status;
- usb_udc_connect_control(udc);
+ usb_udc_connect_control_locked(udc);
}
+ mutex_unlock(&udc->connect_lock);
}
EXPORT_SYMBOL_GPL(usb_udc_vbus_handler);
@@ -1124,7 +1165,7 @@ void usb_gadget_udc_reset(struct usb_gadget *gadget,
EXPORT_SYMBOL_GPL(usb_gadget_udc_reset);
/**
- * usb_gadget_udc_start - tells usb device controller to start up
+ * usb_gadget_udc_start_locked - tells usb device controller to start up
* @udc: The UDC to be started
*
* This call is issued by the UDC Class driver when it's about
@@ -1136,7 +1177,7 @@ EXPORT_SYMBOL_GPL(usb_gadget_udc_reset);
*
* Returns zero on success, else negative errno.
*/
-static inline int usb_gadget_udc_start(struct usb_udc *udc)
+static inline int usb_gadget_udc_start_locked(struct usb_udc *udc)
{
int ret;
@@ -1153,7 +1194,7 @@ static inline int usb_gadget_udc_start(struct usb_udc *udc)
}
/**
- * usb_gadget_udc_stop - tells usb device controller we don't need it anymore
+ * usb_gadget_udc_stop_locked - tells usb device controller we don't need it anymore
* @udc: The UDC to be stopped
*
* This call is issued by the UDC Class driver after calling
@@ -1163,7 +1204,7 @@ static inline int usb_gadget_udc_start(struct usb_udc *udc)
* far as powering off UDC completely and disable its data
* line pullups.
*/
-static inline void usb_gadget_udc_stop(struct usb_udc *udc)
+static inline void usb_gadget_udc_stop_locked(struct usb_udc *udc)
{
if (!udc->started) {
dev_err(&udc->dev, "UDC had already stopped\n");
@@ -1322,6 +1363,7 @@ int usb_add_gadget(struct usb_gadget *gadget)
udc->gadget = gadget;
gadget->udc = udc;
+ mutex_init(&udc->connect_lock);
udc->started = false;
@@ -1523,11 +1565,15 @@ static int gadget_bind_driver(struct device *dev)
if (ret)
goto err_bind;
- ret = usb_gadget_udc_start(udc);
- if (ret)
+ mutex_lock(&udc->connect_lock);
+ ret = usb_gadget_udc_start_locked(udc);
+ if (ret) {
+ mutex_unlock(&udc->connect_lock);
goto err_start;
+ }
usb_gadget_enable_async_callbacks(udc);
- usb_udc_connect_control(udc);
+ usb_udc_connect_control_locked(udc);
+ mutex_unlock(&udc->connect_lock);
kobject_uevent(&udc->dev.kobj, KOBJ_CHANGE);
return 0;
@@ -1558,12 +1604,14 @@ static void gadget_unbind_driver(struct device *dev)
kobject_uevent(&udc->dev.kobj, KOBJ_CHANGE);
- usb_gadget_disconnect(gadget);
+ mutex_lock(&udc->connect_lock);
+ usb_gadget_disconnect_locked(gadget);
usb_gadget_disable_async_callbacks(udc);
if (gadget->irq)
synchronize_irq(gadget->irq);
udc->driver->unbind(gadget);
- usb_gadget_udc_stop(udc);
+ usb_gadget_udc_stop_locked(udc);
+ mutex_unlock(&udc->connect_lock);
mutex_lock(&udc_lock);
driver->is_bound = false;
@@ -1649,11 +1697,15 @@ static ssize_t soft_connect_store(struct device *dev,
}
if (sysfs_streq(buf, "connect")) {
- usb_gadget_udc_start(udc);
- usb_gadget_connect(udc->gadget);
+ mutex_lock(&udc->connect_lock);
+ usb_gadget_udc_start_locked(udc);
+ usb_gadget_connect_locked(udc->gadget);
+ mutex_unlock(&udc->connect_lock);
} else if (sysfs_streq(buf, "disconnect")) {
- usb_gadget_disconnect(udc->gadget);
- usb_gadget_udc_stop(udc);
+ mutex_lock(&udc->connect_lock);
+ usb_gadget_disconnect_locked(udc->gadget);
+ usb_gadget_udc_stop_locked(udc);
+ mutex_unlock(&udc->connect_lock);
} else {
dev_err(dev, "unsupported command '%s'\n", buf);
ret = -EINVAL;
base-commit: d629c0e221cd99198b843d8351a0a9bfec6c0423
--
2.40.0.348.gf938b09366-goog
usb_udc_connect_control does not check to see if the udc has already
been started. This causes gadget->ops->pullup to be called through
usb_gadget_connect when invoked from usb_udc_vbus_handler even before
usb_gadget_udc_start is called. Guard this by checking for udc->started
in usb_udc_connect_control before invoking usb_gadget_connect.
Guarding udc->vbus, udc->started, gadget->connect, gadget->deactivate
related functions with connect_lock. usb_gadget_connect_locked,
usb_gadget_disconnect_locked, usb_udc_connect_control_locked,
usb_gadget_udc_start_locked, usb_gadget_udc_stop_locked are called with
this lock held as they can be simulataneously invoked from different code
paths.
Adding an additional check to make sure udc is started(udc->started)
before pullup callback is invoked.
Cc: stable(a)vger.kernel.org
Fixes: 628ef0d273a6 ("usb: udc: add usb_udc_vbus_handler")
Signed-off-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
Changes since v2:
* Added __must_hold marking for connect_lock
Changes since v1:
* Fixed commit message comments.
* Renamed udc_connect_control_lock to connect_lock and made it per
device.
* udc->vbus, udc->started, gadget->connect, gadget->deactivate are all
now guarded by connect_lock.
* Code now checks for udc->started to be set before invoking pullup
callback.
---
drivers/usb/gadget/udc/core.c | 146 ++++++++++++++++++++++++----------
1 file changed, 102 insertions(+), 44 deletions(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 3dcbba739db6..3eb5d4e392db 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -37,6 +37,10 @@ static struct bus_type gadget_bus_type;
* @vbus: for udcs who care about vbus status, this value is real vbus status;
* for udcs who do not care about vbus status, this value is always true
* @started: the UDC's started state. True if the UDC had started.
+ * @connect_lock: protects udc->vbus, udc->started, gadget->connect, gadget->deactivate related
+ * functions. usb_gadget_connect_locked, usb_gadget_disconnect_locked,
+ * usb_udc_connect_control_locked, usb_gadget_udc_start_locked, usb_gadget_udc_stop_locked are
+ * called with this lock held.
*
* This represents the internal data structure which is used by the UDC-class
* to hold information about udc driver and gadget together.
@@ -48,6 +52,7 @@ struct usb_udc {
struct list_head list;
bool vbus;
bool started;
+ struct mutex connect_lock;
};
static struct class *udc_class;
@@ -687,17 +692,8 @@ int usb_gadget_vbus_disconnect(struct usb_gadget *gadget)
}
EXPORT_SYMBOL_GPL(usb_gadget_vbus_disconnect);
-/**
- * usb_gadget_connect - software-controlled connect to USB host
- * @gadget:the peripheral being connected
- *
- * Enables the D+ (or potentially D-) pullup. The host will start
- * enumerating this gadget when the pullup is active and a VBUS session
- * is active (the link is powered).
- *
- * Returns zero on success, else negative errno.
- */
-int usb_gadget_connect(struct usb_gadget *gadget)
+/* Internal version of usb_gadget_connect needs to be called with connect_lock held. */
+int usb_gadget_connect_locked(struct usb_gadget *gadget) __must_hold(&gadget->udc->connect_lock)
{
int ret = 0;
@@ -706,10 +702,12 @@ int usb_gadget_connect(struct usb_gadget *gadget)
goto out;
}
- if (gadget->deactivated) {
+ if (gadget->deactivated || !gadget->udc->started) {
/*
* If gadget is deactivated we only save new state.
* Gadget will be connected automatically after activation.
+ *
+ * udc first needs to be started before gadget can be pulled up.
*/
gadget->connected = true;
goto out;
@@ -724,22 +722,31 @@ int usb_gadget_connect(struct usb_gadget *gadget)
return ret;
}
-EXPORT_SYMBOL_GPL(usb_gadget_connect);
/**
- * usb_gadget_disconnect - software-controlled disconnect from USB host
- * @gadget:the peripheral being disconnected
- *
- * Disables the D+ (or potentially D-) pullup, which the host may see
- * as a disconnect (when a VBUS session is active). Not all systems
- * support software pullup controls.
+ * usb_gadget_connect - software-controlled connect to USB host
+ * @gadget:the peripheral being connected
*
- * Following a successful disconnect, invoke the ->disconnect() callback
- * for the current gadget driver so that UDC drivers don't need to.
+ * Enables the D+ (or potentially D-) pullup. The host will start
+ * enumerating this gadget when the pullup is active and a VBUS session
+ * is active (the link is powered).
*
* Returns zero on success, else negative errno.
*/
-int usb_gadget_disconnect(struct usb_gadget *gadget)
+int usb_gadget_connect(struct usb_gadget *gadget)
+{
+ int ret;
+
+ mutex_lock(&gadget->udc->connect_lock);
+ ret = usb_gadget_connect_locked(gadget);
+ mutex_unlock(&gadget->udc->connect_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(usb_gadget_connect);
+
+/* Internal version of usb_gadget_disconnect needs to be called with connect_lock held. */
+int usb_gadget_disconnect_locked(struct usb_gadget *gadget) __must_hold(&gadget->udc->connect_lock)
{
int ret = 0;
@@ -751,10 +758,12 @@ int usb_gadget_disconnect(struct usb_gadget *gadget)
if (!gadget->connected)
goto out;
- if (gadget->deactivated) {
+ if (gadget->deactivated || !gadget->udc->started) {
/*
* If gadget is deactivated we only save new state.
* Gadget will stay disconnected after activation.
+ *
+ * udc should have been started before gadget being pulled down.
*/
gadget->connected = false;
goto out;
@@ -774,6 +783,30 @@ int usb_gadget_disconnect(struct usb_gadget *gadget)
return ret;
}
+
+/**
+ * usb_gadget_disconnect - software-controlled disconnect from USB host
+ * @gadget:the peripheral being disconnected
+ *
+ * Disables the D+ (or potentially D-) pullup, which the host may see
+ * as a disconnect (when a VBUS session is active). Not all systems
+ * support software pullup controls.
+ *
+ * Following a successful disconnect, invoke the ->disconnect() callback
+ * for the current gadget driver so that UDC drivers don't need to.
+ *
+ * Returns zero on success, else negative errno.
+ */
+int usb_gadget_disconnect(struct usb_gadget *gadget)
+{
+ int ret;
+
+ mutex_lock(&gadget->udc->connect_lock);
+ ret = usb_gadget_disconnect_locked(gadget);
+ mutex_unlock(&gadget->udc->connect_lock);
+
+ return ret;
+}
EXPORT_SYMBOL_GPL(usb_gadget_disconnect);
/**
@@ -794,10 +827,11 @@ int usb_gadget_deactivate(struct usb_gadget *gadget)
if (gadget->deactivated)
goto out;
+ mutex_lock(&gadget->udc->connect_lock);
if (gadget->connected) {
- ret = usb_gadget_disconnect(gadget);
+ ret = usb_gadget_disconnect_locked(gadget);
if (ret)
- goto out;
+ goto unlock;
/*
* If gadget was being connected before deactivation, we want
@@ -807,6 +841,8 @@ int usb_gadget_deactivate(struct usb_gadget *gadget)
}
gadget->deactivated = true;
+unlock:
+ mutex_unlock(&gadget->udc->connect_lock);
out:
trace_usb_gadget_deactivate(gadget, ret);
@@ -830,6 +866,7 @@ int usb_gadget_activate(struct usb_gadget *gadget)
if (!gadget->deactivated)
goto out;
+ mutex_lock(&gadget->udc->connect_lock);
gadget->deactivated = false;
/*
@@ -837,7 +874,8 @@ int usb_gadget_activate(struct usb_gadget *gadget)
* while it was being deactivated, we call usb_gadget_connect().
*/
if (gadget->connected)
- ret = usb_gadget_connect(gadget);
+ ret = usb_gadget_connect_locked(gadget);
+ mutex_unlock(&gadget->udc->connect_lock);
out:
trace_usb_gadget_activate(gadget, ret);
@@ -1078,12 +1116,13 @@ EXPORT_SYMBOL_GPL(usb_gadget_set_state);
/* ------------------------------------------------------------------------- */
-static void usb_udc_connect_control(struct usb_udc *udc)
+/* Acquire connect_lock before calling this function. */
+static void usb_udc_connect_control_locked(struct usb_udc *udc) __must_hold(&udc->connect_lock)
{
- if (udc->vbus)
- usb_gadget_connect(udc->gadget);
+ if (udc->vbus && udc->started)
+ usb_gadget_connect_locked(udc->gadget);
else
- usb_gadget_disconnect(udc->gadget);
+ usb_gadget_disconnect_locked(udc->gadget);
}
/**
@@ -1099,10 +1138,12 @@ void usb_udc_vbus_handler(struct usb_gadget *gadget, bool status)
{
struct usb_udc *udc = gadget->udc;
+ mutex_lock(&udc->connect_lock);
if (udc) {
udc->vbus = status;
- usb_udc_connect_control(udc);
+ usb_udc_connect_control_locked(udc);
}
+ mutex_unlock(&udc->connect_lock);
}
EXPORT_SYMBOL_GPL(usb_udc_vbus_handler);
@@ -1124,7 +1165,7 @@ void usb_gadget_udc_reset(struct usb_gadget *gadget,
EXPORT_SYMBOL_GPL(usb_gadget_udc_reset);
/**
- * usb_gadget_udc_start - tells usb device controller to start up
+ * usb_gadget_udc_start_locked - tells usb device controller to start up
* @udc: The UDC to be started
*
* This call is issued by the UDC Class driver when it's about
@@ -1135,8 +1176,11 @@ EXPORT_SYMBOL_GPL(usb_gadget_udc_reset);
* necessary to have it powered on.
*
* Returns zero on success, else negative errno.
+ *
+ * Caller should acquire connect_lock before invoking this function.
*/
-static inline int usb_gadget_udc_start(struct usb_udc *udc)
+static inline int usb_gadget_udc_start_locked(struct usb_udc *udc)
+ __must_hold(&udc->connect_lock)
{
int ret;
@@ -1153,7 +1197,7 @@ static inline int usb_gadget_udc_start(struct usb_udc *udc)
}
/**
- * usb_gadget_udc_stop - tells usb device controller we don't need it anymore
+ * usb_gadget_udc_stop_locked - tells usb device controller we don't need it anymore
* @udc: The UDC to be stopped
*
* This call is issued by the UDC Class driver after calling
@@ -1162,8 +1206,11 @@ static inline int usb_gadget_udc_start(struct usb_udc *udc)
* The details are implementation specific, but it can go as
* far as powering off UDC completely and disable its data
* line pullups.
+ *
+ * Caller should acquire connect lock before invoking this function.
*/
-static inline void usb_gadget_udc_stop(struct usb_udc *udc)
+static inline void usb_gadget_udc_stop_locked(struct usb_udc *udc)
+ __must_hold(&udc->connect_lock)
{
if (!udc->started) {
dev_err(&udc->dev, "UDC had already stopped\n");
@@ -1322,6 +1369,7 @@ int usb_add_gadget(struct usb_gadget *gadget)
udc->gadget = gadget;
gadget->udc = udc;
+ mutex_init(&udc->connect_lock);
udc->started = false;
@@ -1523,11 +1571,15 @@ static int gadget_bind_driver(struct device *dev)
if (ret)
goto err_bind;
- ret = usb_gadget_udc_start(udc);
- if (ret)
+ mutex_lock(&udc->connect_lock);
+ ret = usb_gadget_udc_start_locked(udc);
+ if (ret) {
+ mutex_unlock(&udc->connect_lock);
goto err_start;
+ }
usb_gadget_enable_async_callbacks(udc);
- usb_udc_connect_control(udc);
+ usb_udc_connect_control_locked(udc);
+ mutex_unlock(&udc->connect_lock);
kobject_uevent(&udc->dev.kobj, KOBJ_CHANGE);
return 0;
@@ -1558,12 +1610,14 @@ static void gadget_unbind_driver(struct device *dev)
kobject_uevent(&udc->dev.kobj, KOBJ_CHANGE);
- usb_gadget_disconnect(gadget);
+ mutex_lock(&udc->connect_lock);
+ usb_gadget_disconnect_locked(gadget);
usb_gadget_disable_async_callbacks(udc);
if (gadget->irq)
synchronize_irq(gadget->irq);
udc->driver->unbind(gadget);
- usb_gadget_udc_stop(udc);
+ usb_gadget_udc_stop_locked(udc);
+ mutex_unlock(&udc->connect_lock);
mutex_lock(&udc_lock);
driver->is_bound = false;
@@ -1649,11 +1703,15 @@ static ssize_t soft_connect_store(struct device *dev,
}
if (sysfs_streq(buf, "connect")) {
- usb_gadget_udc_start(udc);
- usb_gadget_connect(udc->gadget);
+ mutex_lock(&udc->connect_lock);
+ usb_gadget_udc_start_locked(udc);
+ usb_gadget_connect_locked(udc->gadget);
+ mutex_unlock(&udc->connect_lock);
} else if (sysfs_streq(buf, "disconnect")) {
- usb_gadget_disconnect(udc->gadget);
- usb_gadget_udc_stop(udc);
+ mutex_lock(&udc->connect_lock);
+ usb_gadget_disconnect_locked(udc->gadget);
+ usb_gadget_udc_stop_locked(udc);
+ mutex_unlock(&udc->connect_lock);
} else {
dev_err(dev, "unsupported command '%s'\n", buf);
ret = -EINVAL;
base-commit: d629c0e221cd99198b843d8351a0a9bfec6c0423
--
2.40.0.577.gac1e443424-goog
The si->lock must be held when deleting the si from
the available list. Otherwise, another thread can
re-add the si to the available list, which can lead
to memory corruption. The only place we have found
where this happens is in the swapoff path. This case
can be described as below:
core 0 core 1
swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock
and re-add si into
swap_avail_head
acquire si->lock but
missing si already be
added again, and continuing
to clear SWP_WRITEOK, etc.
It can be easily found a massive warning messages can
be triggered inside get_swap_pages() by some special
cases, for example, we call madvise(MADV_PAGEOUT) on
blocks of touched memory concurrently, meanwhile, run
much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the
above scene. In swapoff(), the memory used by si could
be kept in swap_info[] after turning off a swap. This
means memory corruption will not be caused immediately
until allocated and reset for a new swap in the swapon
path. A panic message caused:
(with CONFIG_PLIST_DEBUG enabled)
------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
plist_check_prev_next_node+0x50/0x70
plist_check_head+0x80/0xf0
plist_add+0x28/0x140
add_to_avail_list+0x9c/0xf0
_enable_swap_info+0x78/0xb4
__do_sys_swapon+0x918/0xa10
__arm64_sys_swapon+0x20/0x30
el0_svc_common+0x8c/0x220
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180
irq event stamp: 2082270
Now, si->lock locked before calling 'del_from_avail_list()'
to make sure other thread see the si had been deleted
and SWP_WRITEOK cleared together, will not reinsert again.
This problem exists in versions after stable 5.10.y.
Cc: stable(a)vger.kernel.org
Tested-by: Yongchen Yin <wb-yyc939293(a)alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
---
mm/swapfile.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 62ba2bf577d7..2c718f45745f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -679,6 +679,7 @@ static void __del_from_avail_list(struct swap_info_struct *p)
{
int nid;
+ assert_spin_locked(&p->lock);
for_each_node(nid)
plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
}
@@ -2434,8 +2435,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
spin_unlock(&swap_lock);
goto out_dput;
}
- del_from_avail_list(p);
spin_lock(&p->lock);
+ del_from_avail_list(p);
if (p->prio < 0) {
struct swap_info_struct *si = p;
int nid;
--
2.27.0
sfp->i2c_block_size is initialized at SFP module insertion in
sfp_sm_mod_probe(). Because of that, if SFP module was not inserted
since boot, ethtool -m leads to zero-length I2C read attempt.
# ethtool -m xge0
i2c i2c-3: adapter quirk: no zero length (addr 0x0050, size 0, read)
Cannot get Module EEPROM data: Operation not supported
If SFP module was plugged then removed at least once,
sfp->i2c_block_size will be initialized and ethtool -m will fail with
different error
# ethtool -m xge0
Cannot get Module EEPROM data: Remote I/O error
Fix this by initializing sfp->i2_block_size at struct sfp allocation
stage so ethtool -m with SFP module removed will fail the same way, i.e.
-EREMOTEIO, in both cases and without errors from I2C adapter.
Signed-off-by: Ivan Bornyakov <i.bornyakov(a)metrotek.ru>
Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround")
Cc: stable(a)vger.kernel.org
---
drivers/net/phy/sfp.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 40c9a64c5e30..5663a184644d 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -212,6 +212,12 @@ static const enum gpiod_flags gpio_flags[] = {
#define SFP_PHY_ADDR 22
#define SFP_PHY_ADDR_ROLLBALL 17
+/* SFP_EEPROM_BLOCK_SIZE is the size of data chunk to read the EEPROM
+ * at a time. Some SFP modules and also some Linux I2C drivers do not like
+ * reads longer than 16 bytes.
+ */
+#define SFP_EEPROM_BLOCK_SIZE 16
+
struct sff_data {
unsigned int gpios;
bool (*module_supported)(const struct sfp_eeprom_id *id);
@@ -1928,11 +1934,7 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report)
u8 check;
int ret;
- /* Some SFP modules and also some Linux I2C drivers do not like reads
- * longer than 16 bytes, so read the EEPROM in chunks of 16 bytes at
- * a time.
- */
- sfp->i2c_block_size = 16;
+ sfp->i2c_block_size = SFP_EEPROM_BLOCK_SIZE;
ret = sfp_read(sfp, false, 0, &id.base, sizeof(id.base));
if (ret < 0) {
@@ -2615,6 +2617,7 @@ static struct sfp *sfp_alloc(struct device *dev)
return ERR_PTR(-ENOMEM);
sfp->dev = dev;
+ sfp->i2c_block_size = SFP_EEPROM_BLOCK_SIZE;
mutex_init(&sfp->sm_mutex);
mutex_init(&sfp->st_mutex);
--
2.39.2
The patch titled
Subject: mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memoryc-warn-with-pr_warn_ratelimited-instead-of-vm_warn_on_once_folio.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Subject: mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
Date: Thu, 6 Apr 2023 17:20:04 +0900
split_huge_page_to_list() WARNs when called for huge zero pages, which
sounds to me too harsh because it does not imply a kernel bug, but just
notifies the event to admins. On the other hand, this is considered as
critical by syzkaller and makes its testing less efficient, which seems to
me harmful.
So replace the VM_WARN_ON_ONCE_FOLIO with pr_warn_ratelimited.
Link: https://lkml.kernel.org/r/20230406082004.2185420-1-naoya.horiguchi@linux.dev
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: syzbot+07a218429c8d19b1fb25(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/000000000000a6f34a05e6efcd01@google.com/
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memoryc-warn-with-pr_warn_ratelimited-instead-of-vm_warn_on_once_folio
+++ a/mm/huge_memory.c
@@ -2665,9 +2665,10 @@ int split_huge_page_to_list(struct page
VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
is_hzp = is_huge_zero_page(&folio->page);
- VM_WARN_ON_ONCE_FOLIO(is_hzp, folio);
- if (is_hzp)
+ if (is_hzp) {
+ pr_warn_ratelimited("Called split_huge_page for huge zero page\n");
return -EBUSY;
+ }
if (folio_test_writeback(folio))
return -EBUSY;
_
Patches currently in -mm which might be from naoya.horiguchi(a)nec.com are
mm-huge_memoryc-warn-with-pr_warn_ratelimited-instead-of-vm_warn_on_once_folio.patch
We keep track of different types of reclaimed pages through
reclaim_state->reclaimed_slab, and we add them to the reported number
of reclaimed pages. For non-memcg reclaim, this makes sense. For memcg
reclaim, we have no clue if those pages are charged to the memcg under
reclaim.
Slab pages are shared by different memcgs, so a freed slab page may have
only been partially charged to the memcg under reclaim. The same goes for
clean file pages from pruned inodes (on highmem systems) or xfs buffer
pages, there is no simple way to currently link them to the memcg under
reclaim.
Stop reporting those freed pages as reclaimed pages during memcg reclaim.
This should make the return value of writing to memory.reclaim, and may
help reduce unnecessary reclaim retries during memcg charging. Writing to
memory.reclaim on the root memcg is considered as cgroup_reclaim(), but
for this case we want to include any freed pages, so use the
global_reclaim() check instead of !cgroup_reclaim().
Generally, this should make the return value of
try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g.
freed a slab page that was mostly charged to the memcg under reclaim),
the return value of try_to_free_mem_cgroup_pages() can be underestimated,
but this should be fine. The freed pages will be uncharged anyway, and we
can charge the memcg the next time around as we usually do memcg reclaim
in a retry loop.
The next patch performs some cleanups around reclaim_state and adds an
elaborate comment explaining this to the code. This patch is kept
minimal for easy backporting.
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Cc: stable(a)vger.kernel.org
---
global_reclaim(sc) does not exist in kernels before 6.3. It can be
replaced with:
!cgroup_reclaim(sc) || mem_cgroup_is_root(sc->target_mem_cgroup)
---
mm/vmscan.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c1c5e8b24b8f..c82bd89f90364 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5346,8 +5346,10 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
sc->nr_reclaimed - reclaimed);
- sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
- current->reclaim_state->reclaimed_slab = 0;
+ if (global_reclaim(sc)) {
+ sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
+ current->reclaim_state->reclaimed_slab = 0;
+ }
return success ? MEMCG_LRU_YOUNG : 0;
}
@@ -6472,7 +6474,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
shrink_node_memcgs(pgdat, sc);
- if (reclaim_state) {
+ if (reclaim_state && global_reclaim(sc)) {
sc->nr_reclaimed += reclaim_state->reclaimed_slab;
reclaim_state->reclaimed_slab = 0;
}
--
2.40.0.348.gf938b09366-goog
When the loop over the VMA is terminated early due to an error, the
return code could be overwritten with ENOMEM. Fix the return code by
only setting the error on early loop termination when the error is not
set.
Fixes: 2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
mm/mprotect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 13e84d8c0797..36351a00c0e8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -838,7 +838,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
}
tlb_finish_mmu(&tlb);
- if (vma_iter_end(&vmi) < end)
+ if (!error && vma_iter_end(&vmi) < end)
error = -ENOMEM;
out:
--
2.39.2
The patch titled
Subject: mm/mprotect: fix do_mprotect_pkey() return on error
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mprotect-fix-do_mprotect_pkey-return-on-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm/mprotect: fix do_mprotect_pkey() return on error
Date: Thu, 6 Apr 2023 15:30:50 -0400
When the loop over the VMA is terminated early due to an error, the return
code could be overwritten with ENOMEM. Fix the return code by only
setting the error on early loop termination when the error is not set.
User-visible effects include: attempts to run mprotect() against a special
mapping or with a poorly-aligned hugetlb address should return -EINVAL,
but they presently return -ENOMEM.
Link: https://lkml.kernel.org/r/20230406193050.1363476-1-Liam.Howlett@oracle.com
Fixes: 2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mprotect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mprotect.c~mm-mprotect-fix-do_mprotect_pkey-return-on-error
+++ a/mm/mprotect.c
@@ -838,7 +838,7 @@ static int do_mprotect_pkey(unsigned lon
}
tlb_finish_mmu(&tlb);
- if (vma_iter_end(&vmi) < end)
+ if (!error && vma_iter_end(&vmi) < end)
error = -ENOMEM;
out:
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
mm-mprotect-fix-do_mprotect_pkey-return-on-error.patch
By default the indirect state sampler data (border colors) are stored
in the same heap as the SAMPLER_STATE structure. For userspace drivers
that can be 2 different heaps (dynamic state heap & bindless sampler
state heap). This means that border colors have to copied in 2
different places so that the same SAMPLER_STATE structure find the
right data.
This change is forcing the indirect state sampler data to only be in
the dynamic state pool (more convinient for userspace drivers, they
only have to have one copy of the border colors). This is reproducing
the behavior of the Windows drivers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin(a)intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
drivers/gpu/drm/i915/gt/intel_workarounds.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 08d76aa06974c..1aaa471d08c56 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1141,6 +1141,7 @@
#define ENABLE_SMALLPL REG_BIT(15)
#define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9)
#define GEN11_SAMPLER_ENABLE_HEADLESS_MSG REG_BIT(5)
+#define GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE REG_BIT(0)
#define GEN9_HALF_SLICE_CHICKEN7 MCR_REG(0xe194)
#define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA REG_BIT(15)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 32aa1647721ae..734b64e714647 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2542,6 +2542,23 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
ENABLE_SMALLPL);
}
+ if (GRAPHICS_VER(i915) >= 11) {
+ /* This is not a Wa (although referred to as
+ * WaSetInidrectStateOverride in places), this allows
+ * applications that reference sampler states through
+ * the BindlessSamplerStateBaseAddress to have their
+ * border color relative to DynamicStateBaseAddress
+ * rather than BindlessSamplerStateBaseAddress.
+ *
+ * Otherwise SAMPLER_STATE border colors have to be
+ * copied in multiple heaps (DynamicStateBaseAddress &
+ * BindlessSamplerStateBaseAddress)
+ */
+ wa_mcr_masked_en(wal,
+ GEN10_SAMPLER_MODE,
+ GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE);
+ }
+
if (GRAPHICS_VER(i915) == 11) {
/* This is not an Wa. Enable for better image quality */
wa_masked_en(wal,
--
2.34.1
The rust_fmt_argument function is called from printk() to handle the %pA
format specifier.
Since it's called from C, we should mark it extern "C" to make sure it's
ABI compatible.
Cc: stable(a)vger.kernel.org
Fixes: 247b365dc8dc ("rust: add `kernel` crate")
Signed-off-by: David Gow <davidgow(a)google.com>
---
See discussion in:
https://github.com/Rust-for-Linux/linux/pull/967
and
https://github.com/Rust-for-Linux/linux/pull/966
---
rust/kernel/print.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rust/kernel/print.rs b/rust/kernel/print.rs
index 30103325696d..ec457f0952fe 100644
--- a/rust/kernel/print.rs
+++ b/rust/kernel/print.rs
@@ -18,7 +18,7 @@ use crate::bindings;
// Called from `vsprintf` with format specifier `%pA`.
#[no_mangle]
-unsafe fn rust_fmt_argument(buf: *mut c_char, end: *mut c_char, ptr: *const c_void) -> *mut c_char {
+unsafe extern "C" fn rust_fmt_argument(buf: *mut c_char, end: *mut c_char, ptr: *const c_void) -> *mut c_char {
use fmt::Write;
// SAFETY: The C contract guarantees that `buf` is valid if it's less than `end`.
let mut w = unsafe { RawFormatter::from_ptrs(buf.cast(), end.cast()) };
--
2.39.1.581.gbfd45094c4-goog
Since the introduction of scrub interface, the only flag that we support
is BTRFS_SCRUB_READONLY.
Thus there is no sanity checks, if there are some undefined flags passed
in, we just ignore them.
This is problematic if we want to introduce new scrub flags, as we have
no way to determine if such flags are supported.
Thus this patch would address the problem by introducing a check for the
flags, and if unsupported flags are set, return -EOPNOTSUPP to inform
the user space.
This check should be backported for all supported kernels before any new
scrub flags are introduced.
CC: stable(a)vger.kernel.org
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/ioctl.c | 5 +++++
include/uapi/linux/btrfs.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ba769a1eb87a..25833b4eeaf5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3161,6 +3161,11 @@ static long btrfs_ioctl_scrub(struct file *file, void __user *arg)
if (IS_ERR(sa))
return PTR_ERR(sa);
+ if (sa->flags & ~BTRFS_SCRUB_SUPPORTED_FLAGS) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
if (!(sa->flags & BTRFS_SCRUB_READONLY)) {
ret = mnt_want_write_file(file);
if (ret)
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index ada0a489bf2b..dbb8b96da50d 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -187,6 +187,7 @@ struct btrfs_scrub_progress {
};
#define BTRFS_SCRUB_READONLY 1
+#define BTRFS_SCRUB_SUPPORTED_FLAGS (BTRFS_SCRUB_READONLY)
struct btrfs_ioctl_scrub_args {
__u64 devid; /* in */
__u64 start; /* in */
--
2.39.2
sfp->i2c_block_size is initialized at SFP module insertion in
sfp_sm_mod_probe(). Because of that, if SFP module was never inserted
since boot, sfp_read() call will lead to zero-length I2C read attempt,
and not all I2C controllers are happy with zero-length reads.
One way to issue sfp_read() on empty SFP cage is to execute ethtool -m.
If SFP module was never plugged since boot, there will be a zero-length
I2C read attempt.
# ethtool -m xge0
i2c i2c-3: adapter quirk: no zero length (addr 0x0050, size 0, read)
Cannot get Module EEPROM data: Operation not supported
If SFP module was plugged then removed at least once,
sfp->i2c_block_size will be initialized and ethtool -m will fail with
different exit code and without I2C error
# ethtool -m xge0
Cannot get Module EEPROM data: Remote I/O error
Fix this by initializing sfp->i2_block_size at struct sfp allocation
stage so no wild sfp_read() could issue zero-length I2C read.
Signed-off-by: Ivan Bornyakov <i.bornyakov(a)metrotek.ru>
Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround")
Cc: stable(a)vger.kernel.org
---
drivers/net/phy/sfp.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 40c9a64c5e30..5663a184644d 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -212,6 +212,12 @@ static const enum gpiod_flags gpio_flags[] = {
#define SFP_PHY_ADDR 22
#define SFP_PHY_ADDR_ROLLBALL 17
+/* SFP_EEPROM_BLOCK_SIZE is the size of data chunk to read the EEPROM
+ * at a time. Some SFP modules and also some Linux I2C drivers do not like
+ * reads longer than 16 bytes.
+ */
+#define SFP_EEPROM_BLOCK_SIZE 16
+
struct sff_data {
unsigned int gpios;
bool (*module_supported)(const struct sfp_eeprom_id *id);
@@ -1928,11 +1934,7 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report)
u8 check;
int ret;
- /* Some SFP modules and also some Linux I2C drivers do not like reads
- * longer than 16 bytes, so read the EEPROM in chunks of 16 bytes at
- * a time.
- */
- sfp->i2c_block_size = 16;
+ sfp->i2c_block_size = SFP_EEPROM_BLOCK_SIZE;
ret = sfp_read(sfp, false, 0, &id.base, sizeof(id.base));
if (ret < 0) {
@@ -2615,6 +2617,7 @@ static struct sfp *sfp_alloc(struct device *dev)
return ERR_PTR(-ENOMEM);
sfp->dev = dev;
+ sfp->i2c_block_size = SFP_EEPROM_BLOCK_SIZE;
mutex_init(&sfp->sm_mutex);
mutex_init(&sfp->st_mutex);
--
2.39.2
From: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.
This fixes a regression introduced by commit ee7a69aa38d8 ("fbdev:
Disable sysfb device registration when removing conflicting FBs"),
where we remove the sysfb when loading a driver for an unrelated pci
device, resulting in the user losing their efifb console or similar.
Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.
Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
drivers do not use drm_aperture_remove_conflicting_framebuffers(),
so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
this.
For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.
v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
have any issues with making all this fully pci specific.
v3
- polish commit message (Javier)
v4:
- Fix commit message style (i.e., commit 1234 ("..."))
- fix Daniel's S-o-b address
v5:
- add back an S-o-b tag with Daniel's Intel address
Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Tested-by: Aaron Plattner <aplattner(a)nvidia.com>
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Aaron Plattner <aplattner(a)nvidia.com>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: <stable(a)vger.kernel.org> # v5.19+ (if someone else does the backport)
---
drivers/video/aperture.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 1356f0e88241..e4091688b5eb 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -322,15 +322,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
if (pdev == vga_default_device())
primary = true;
+ if (primary)
+ sysfb_disable();
+
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
continue;
base = pci_resource_start(pdev, bar);
size = pci_resource_len(pdev, bar);
- ret = aperture_remove_conflicting_devices(base, size, name);
- if (ret)
- return ret;
+ aperture_detach_devices(base, size);
}
if (primary) {
--
2.40.0
From: Douglas Raillard <douglas.raillard(a)arm.com>
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ]
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard(a)arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/trace/events/f2fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 7ab40491485bc..8ecfc8e68507d 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -485,7 +485,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(nid_t, nid[3])
+ __array(nid_t, nid, 3)
__field(int, depth)
__field(int, err)
),
--
2.39.2
From: Douglas Raillard <douglas.raillard(a)arm.com>
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ]
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard(a)arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/trace/events/f2fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 52e6456bdb922..098d6dff20bef 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -498,7 +498,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(nid_t, nid[3])
+ __array(nid_t, nid, 3)
__field(int, depth)
__field(int, err)
),
--
2.39.2
From: Douglas Raillard <douglas.raillard(a)arm.com>
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ]
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard(a)arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/trace/events/f2fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index a7613efc271ab..88266a7fbad26 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -499,7 +499,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(nid_t, nid[3])
+ __array(nid_t, nid, 3)
__field(int, depth)
__field(int, err)
),
--
2.39.2
From: Douglas Raillard <douglas.raillard(a)arm.com>
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ]
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard(a)arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/trace/events/f2fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index df293bc7f03b8..e8cd19e91de11 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -513,7 +513,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(nid_t, nid[3])
+ __array(nid_t, nid, 3)
__field(int, depth)
__field(int, err)
),
--
2.39.2
From: Douglas Raillard <douglas.raillard(a)arm.com>
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ]
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard(a)arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/trace/events/f2fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 4cb055af1ec0b..f5dcf7c9b7076 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -513,7 +513,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(nid_t, nid[3])
+ __array(nid_t, nid, 3)
__field(int, depth)
__field(int, err)
),
--
2.39.2
From: Douglas Raillard <douglas.raillard(a)arm.com>
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ]
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard(a)arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/trace/events/f2fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index e57f867191ef1..eb53e96b7a29c 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -505,7 +505,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(nid_t, nid[3])
+ __array(nid_t, nid, 3)
__field(int, depth)
__field(int, err)
),
--
2.39.2
From: Douglas Raillard <douglas.raillard(a)arm.com>
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ]
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard(a)arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/trace/events/f2fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 35ecb3118c7d5..111fafe049f7d 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -512,7 +512,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(nid_t, nid[3])
+ __array(nid_t, nid, 3)
__field(int, depth)
__field(int, err)
),
--
2.39.2
The PSP IRQ is edge-triggered (MSI or MSI-X) in all cases supported by
the psp module so clear the interrupt status register early in the
handler to prevent missed interrupts. sev_irq_handler() calls wake_up()
on a wait queue, which can result in a new command being submitted from
a different CPU. This then races with the clearing of isr and can result
in missed interrupts. A missed interrupt results in a command waiting
until it times out, which results in the psp being declared dead.
This is unlikely on bare metal, but has been observed when running
virtualized. In the cases where this is observed, sev->cmdresp_reg has
PSP_CMDRESP_RESP set which indicates that the command was processed
correctly but no interrupt was asserted.
The full sequence of events looks like this:
CPU 1: submits SEV cmd #1
CPU 1: calls wait_event_timeout()
CPU 0: enters psp_irq_handler()
CPU 0: calls sev_handler()->wake_up()
CPU 1: wakes up; finishes processing cmd #1
CPU 1: submits SEV cmd #2
CPU 1: calls wait_event_timeout()
PSP: finishes processing cmd #2; interrupt status is still set; no interrupt
CPU 0: clears intsts
CPU 0: exits psp_irq_handler()
CPU 1: wait_event_timeout() times out; psp_dead=true
Fixes: 200664d5237f ("crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
---
drivers/crypto/ccp/psp-dev.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index c9c741ac8442..949a3fa0b94a 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -42,6 +42,9 @@ static irqreturn_t psp_irq_handler(int irq, void *data)
/* Read the interrupt status: */
status = ioread32(psp->io_regs + psp->vdata->intsts_reg);
+ /* Clear the interrupt status by writing the same value we read. */
+ iowrite32(status, psp->io_regs + psp->vdata->intsts_reg);
+
/* invoke subdevice interrupt handlers */
if (status) {
if (psp->sev_irq_handler)
@@ -51,9 +54,6 @@ static irqreturn_t psp_irq_handler(int irq, void *data)
psp->tee_irq_handler(irq, psp->tee_irq_data, status);
}
- /* Clear the interrupt status by writing the same value we read. */
- iowrite32(status, psp->io_regs + psp->vdata->intsts_reg);
-
return IRQ_HANDLED;
}
--
2.34.1
From: Steve French <stfrench(a)microsoft.com>
[ Upstream commit 87f93d82e0952da18af4d978e7d887b4c5326c0b ]
Add check for null cifs_sb to create_options helper
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Aurelien Aptel <aaptel(a)suse.com>
Signed-off-by: Pratyush Yadav <ptyadav(a)amazon.de>
---
Only compile-tested. This was discovered by our static code analysis
tool. I do not use CIFS and do not know how to actually reproduce the
NULL dereference.
Follow up from [0]. Original patch is at [1].
Mandatory text due to licensing terms:
This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
[0] https://lore.kernel.org/stable/20230405114220.108739-1-ptyadav@amazon.de/T/…
[1] https://lore.kernel.org/all/CAH2r5mtu69KEWU94qZK32H_8cvyhVU8GyOKrZqbdjH0ZLd…
fs/cifs/cifsproto.h | 2 +-
fs/cifs/smb2ops.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index a5fab9afd699f..2dde83a969680 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -602,7 +602,7 @@ static inline int get_dfs_path(const unsigned int xid, struct cifs_ses *ses,
static inline int cifs_create_options(struct cifs_sb_info *cifs_sb, int options)
{
- if (backup_cred(cifs_sb))
+ if (cifs_sb && (backup_cred(cifs_sb)))
return options | CREATE_OPEN_BACKUP_INTENT;
else
return options;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 4cb0ebe7330eb..44a261b9850de 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2343,7 +2343,7 @@ smb2_queryfs(const unsigned int xid, struct cifs_tcon *tcon,
FS_FULL_SIZE_INFORMATION,
SMB2_O_INFO_FILESYSTEM,
sizeof(struct smb2_fs_full_size_info),
- &rsp_iov, &buftype, NULL);
+ &rsp_iov, &buftype, cifs_sb);
if (rc)
goto qfs_exit;
--
2.39.2
From: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.
This fixes a regression introduced by commit ee7a69aa38d8 ("fbdev:
Disable sysfb device registration when removing conflicting FBs"),
where we remove the sysfb when loading a driver for an unrelated pci
device, resulting in the user losing their efifb console or similar.
Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.
Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
drivers do not use drm_aperture_remove_conflicting_framebuffers(),
so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
this.
For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.
v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
have any issues with making all this fully pci specific.
v3
- polish commit message (Javier)
v4:
- Fix commit message style (i.e., commit 1234 ("..."))
- fix Daniel's S-o-b address
Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Tested-by: Aaron Plattner <aplattner(a)nvidia.com>
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Aaron Plattner <aplattner(a)nvidia.com>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: <stable(a)vger.kernel.org> # v5.19+ (if someone else does the backport)
---
drivers/video/aperture.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 1356f0e88241..e4091688b5eb 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -322,15 +322,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
if (pdev == vga_default_device())
primary = true;
+ if (primary)
+ sysfb_disable();
+
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
continue;
base = pci_resource_start(pdev, bar);
size = pci_resource_len(pdev, bar);
- ret = aperture_remove_conflicting_devices(base, size, name);
- if (ret)
- return ret;
+ aperture_detach_devices(base, size);
}
if (primary) {
--
2.40.0
Some Soundwire buses (like &swr0 on Qualcomm HDK8450) have two devices,
which can be brought from powerdown state one after another. We need to
keep enumerating them on each slave attached interrupt, otherwise only
first will appear.
Cc: <stable(a)vger.kernel.org>
Fixes: a6e6581942ca ("soundwire: qcom: add auto enumeration support")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Cc: Patrick Lai <quic_plai(a)quicinc.com>
---
drivers/soundwire/qcom.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
index c296e0bf897b..1e5077d91f59 100644
--- a/drivers/soundwire/qcom.c
+++ b/drivers/soundwire/qcom.c
@@ -587,14 +587,9 @@ static irqreturn_t qcom_swrm_irq_handler(int irq, void *dev_id)
case SWRM_INTERRUPT_STATUS_CHANGE_ENUM_SLAVE_STATUS:
dev_dbg_ratelimited(swrm->dev, "SWR new slave attached\n");
swrm->reg_read(swrm, SWRM_MCP_SLV_STATUS, &slave_status);
- if (swrm->slave_status == slave_status) {
- dev_dbg(swrm->dev, "Slave status not changed %x\n",
- slave_status);
- } else {
- qcom_swrm_get_device_status(swrm);
- qcom_swrm_enumerate(&swrm->bus);
- sdw_handle_slave_status(&swrm->bus, swrm->status);
- }
+ qcom_swrm_get_device_status(swrm);
+ qcom_swrm_enumerate(&swrm->bus);
+ sdw_handle_slave_status(&swrm->bus, swrm->status);
break;
case SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET:
dev_err_ratelimited(swrm->dev,
--
2.34.1
For SCI, the TE (transmit enable) must be set after setting TIE (transmit
interrupt enable) or in the same instruction to start the transmission.
Set TE bit in sci_start_tx() instead of set_termios() for SCI and clear
TE bit, if circular buffer is empty in sci_transmit_chars().
Fixes: 93bcefd4c6ba ("serial: sh-sci: Fix setting SCSCR_TIE while transferring data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Biju Das <biju.das.jz(a)bp.renesas.com>
---
v3->v4:
* Updated fixes tag.
v3:
* New patch
---
drivers/tty/serial/sh-sci.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 15954ca3e9dc..bcc4152ce043 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -596,6 +596,15 @@ static void sci_start_tx(struct uart_port *port)
if (!s->chan_tx || port->type == PORT_SCIFA || port->type == PORT_SCIFB) {
/* Set TIE (Transmit Interrupt Enable) bit in SCSCR */
ctrl = serial_port_in(port, SCSCR);
+
+ /*
+ * For SCI, TE (transmit enable) must be set after setting TIE
+ * (transmit interrupt enable) or in the same instruction to start
+ * the transmit process.
+ */
+ if (port->type == PORT_SCI)
+ ctrl |= SCSCR_TE;
+
serial_port_out(port, SCSCR, ctrl | SCSCR_TIE);
}
}
@@ -834,6 +843,12 @@ static void sci_transmit_chars(struct uart_port *port)
c = xmit->buf[xmit->tail];
xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
} else {
+ if (port->type == PORT_SCI) {
+ ctrl = serial_port_in(port, SCSCR);
+ ctrl &= ~SCSCR_TE;
+ serial_port_out(port, SCSCR, ctrl);
+ return;
+ }
break;
}
@@ -2580,8 +2595,14 @@ static void sci_set_termios(struct uart_port *port, struct ktermios *termios,
sci_set_mctrl(port, port->mctrl);
}
- scr_val |= SCSCR_RE | SCSCR_TE |
- (s->cfg->scscr & ~(SCSCR_CKE1 | SCSCR_CKE0));
+ /*
+ * For SCI, TE (transmit enable) must be set after setting TIE
+ * (transmit interrupt enable) or in the same instruction to
+ * start the transmitting process. So skip setting TE here for SCI.
+ */
+ if (port->type != PORT_SCI)
+ scr_val |= SCSCR_TE;
+ scr_val |= SCSCR_RE | (s->cfg->scscr & ~(SCSCR_CKE1 | SCSCR_CKE0));
serial_port_out(port, SCSCR, scr_val | s->hscif_tot);
if ((srr + 1 == 5) &&
(port->type == PORT_SCIFA || port->type == PORT_SCIFB)) {
--
2.25.1
From: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Infineon(Cypress) SEMPER NOR flash family has on-die ECC and its program
granularity is 16-byte ECC data unit size. JFFS2 supports write buffer
mode for ECC'd NOR flash. Provide a way to clear the MTD_BIT_WRITEABLE
flag in order to enable JFFS2 write buffer mode support. Drop the
comment as the same info is now specified in cypress_nor_ecc_init().
Fixes: 6afcc84080c4 ("mtd: spi-nor: spansion: Add support for Infineon S25FS256T")
Suggested-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Cc: stable(a)vger.kernel.org
---
drivers/mtd/spi-nor/spansion.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/mtd/spi-nor/spansion.c b/drivers/mtd/spi-nor/spansion.c
index 4d0cc10e3d85..ffeede78700d 100644
--- a/drivers/mtd/spi-nor/spansion.c
+++ b/drivers/mtd/spi-nor/spansion.c
@@ -384,13 +384,7 @@ static void s25fs256t_post_sfdp_fixup(struct spi_nor *nor)
static void s25fs256t_late_init(struct spi_nor *nor)
{
- /*
- * Programming is supported only in 16-byte ECC data unit granularity.
- * Byte-programming, bit-walking, or multiple program operations to the
- * same ECC data unit without an erase are not allowed. See chapter
- * 5.3.1 and 5.6 in the datasheet.
- */
- nor->params->writesize = 16;
+ cypress_nor_ecc_init(nor);
}
static struct spi_nor_fixups s25fs256t_fixups = {
--
2.34.1
From: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Infineon(Cypress) SEMPER NOR flash family has on-die ECC and its program
granularity is 16-byte ECC data unit size. JFFS2 supports write buffer
mode for ECC'd NOR flash. Provide a way to clear the MTD_BIT_WRITEABLE
flag in order to enable JFFS2 write buffer mode support.
A new SNOR_F_ECC flag is introduced to determine if the part has on-die
ECC and if it has, MTD_BIT_WRITEABLE is unset.
In vendor specific driver, a common cypress_nor_ecc_init() helper is
added. This helper takes care for ECC related initialization for SEMPER
flash family by setting up params->writesize and SNOR_F_ECC.
Fixes: c3266af101f2 ("mtd: spi-nor: spansion: add support for Cypress Semper flash")
Suggested-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Cc: stable(a)vger.kernel.org
---
drivers/mtd/spi-nor/core.c | 3 +++
drivers/mtd/spi-nor/core.h | 1 +
drivers/mtd/spi-nor/debugfs.c | 1 +
drivers/mtd/spi-nor/spansion.c | 13 ++++++++++++-
4 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 1e30737b607b..143ca3c9b477 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -3407,6 +3407,9 @@ static void spi_nor_set_mtd_info(struct spi_nor *nor)
mtd->name = dev_name(dev);
mtd->type = MTD_NORFLASH;
mtd->flags = MTD_CAP_NORFLASH;
+ /* Unset BIT_WRITEABLE to enable JFFS2 write buffer for ECC'd NOR */
+ if (nor->flags & SNOR_F_ECC)
+ mtd->flags &= ~MTD_BIT_WRITEABLE;
if (nor->info->flags & SPI_NOR_NO_ERASE)
mtd->flags |= MTD_NO_ERASE;
else
diff --git a/drivers/mtd/spi-nor/core.h b/drivers/mtd/spi-nor/core.h
index ea9033cb0a01..8cfa82ed06c7 100644
--- a/drivers/mtd/spi-nor/core.h
+++ b/drivers/mtd/spi-nor/core.h
@@ -131,6 +131,7 @@ enum spi_nor_option_flags {
SNOR_F_SOFT_RESET = BIT(12),
SNOR_F_SWP_IS_VOLATILE = BIT(13),
SNOR_F_RWW = BIT(14),
+ SNOR_F_ECC = BIT(15),
};
struct spi_nor_read_command {
diff --git a/drivers/mtd/spi-nor/debugfs.c b/drivers/mtd/spi-nor/debugfs.c
index e200f5b9234c..082c0c5a8626 100644
--- a/drivers/mtd/spi-nor/debugfs.c
+++ b/drivers/mtd/spi-nor/debugfs.c
@@ -26,6 +26,7 @@ static const char *const snor_f_names[] = {
SNOR_F_NAME(SOFT_RESET),
SNOR_F_NAME(SWP_IS_VOLATILE),
SNOR_F_NAME(RWW),
+ SNOR_F_NAME(ECC),
};
#undef SNOR_F_NAME
diff --git a/drivers/mtd/spi-nor/spansion.c b/drivers/mtd/spi-nor/spansion.c
index 352c40dd3864..19b1436f36ea 100644
--- a/drivers/mtd/spi-nor/spansion.c
+++ b/drivers/mtd/spi-nor/spansion.c
@@ -332,6 +332,17 @@ static int cypress_nor_set_page_size(struct spi_nor *nor)
return 0;
}
+static void cypress_nor_ecc_init(struct spi_nor *nor)
+{
+ /*
+ * Programming is supported only in 16-byte ECC data unit granularity.
+ * Byte-programming, bit-walking, or multiple program operations to the
+ * same ECC data unit without an erase are not allowed.
+ */
+ nor->params->writesize = 16;
+ nor->flags |= SNOR_F_ECC;
+}
+
static int
s25fs256t_post_bfpt_fixup(struct spi_nor *nor,
const struct sfdp_parameter_header *bfpt_header,
@@ -506,7 +517,7 @@ static int s28hx_t_post_bfpt_fixup(struct spi_nor *nor,
static void s28hx_t_late_init(struct spi_nor *nor)
{
nor->params->octal_dtr_enable = cypress_nor_octal_dtr_enable;
- nor->params->writesize = 16;
+ cypress_nor_ecc_init(nor);
}
static const struct spi_nor_fixups s28hx_t_fixups = {
--
2.34.1
The quilt patch titled
Subject: maple_tree: fix a potential concurrency bug in RCU mode
has been removed from the -mm tree. Its filename was
maple_tree-fix-a-potential-concurrency-bug-in-rcu-mode.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: fix a potential concurrency bug in RCU mode
Date: Tue, 14 Mar 2023 20:42:03 +0800
There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.
CPU1:
mtree_insert_range()
mas_insert()
mas_store_root()
...
mas_root_expand()
...
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
ma_set_meta(node, maple_leaf_64, 0, slot); <---IP
CPU2:
mtree_load()
mtree_lookup_walk()
ma_data_end();
When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.
An example of triggering the bug:
Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().
static DEFINE_MTREE(tree);
int work(void *p) {
unsigned long val;
for (int i = 0 ; i< 30; ++i) {
val = (unsigned long)mtree_load(&tree, 8);
mdelay(5);
pr_info("%lu",val);
}
return 0;
}
mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL. Fix
it by put ma_set_meta() before rcu_assign_pointer().
Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-a-potential-concurrency-bug-in-rcu-mode
+++ a/lib/maple_tree.c
@@ -3725,10 +3725,9 @@ static inline int mas_root_expand(struct
slot++;
mas->depth = 1;
mas_set_height(mas);
-
+ ma_set_meta(node, maple_leaf_64, 0, slot);
/* swap the new root into the tree */
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
- ma_set_meta(node, maple_leaf_64, 0, slot);
return slot;
}
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch
maple_tree-simplify-mas_wr_node_walk.patch
The quilt patch titled
Subject: maple_tree: fix get wrong data_end in mtree_lookup_walk()
has been removed from the -mm tree. Its filename was
maple_tree-fix-get-wrong-data_end-in-mtree_lookup_walk.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: fix get wrong data_end in mtree_lookup_walk()
Date: Tue, 14 Mar 2023 20:42:01 +0800
if (likely(offset > end))
max = pivots[offset];
The above code should be changed to if (likely(offset < end)), which is
correct. This affects the correctness of ma_data_end(). Now it seems
that the final result will not be wrong, but it is best to change it.
This patch does not change the code as above, because it simplifies the
code by the way.
Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-get-wrong-data_end-in-mtree_lookup_walk
+++ a/lib/maple_tree.c
@@ -3941,18 +3941,13 @@ static inline void *mtree_lookup_walk(st
end = ma_data_end(node, type, pivots, max);
if (unlikely(ma_dead_node(node)))
goto dead_node;
-
- if (pivots[offset] >= mas->index)
- goto next;
-
do {
- offset++;
- } while ((offset < end) && (pivots[offset] < mas->index));
-
- if (likely(offset > end))
- max = pivots[offset];
+ if (pivots[offset] >= mas->index) {
+ max = pivots[offset];
+ break;
+ }
+ } while (++offset < end);
-next:
slots = ma_slots(node, type);
next = mt_slot(mas->tree, slots, offset);
if (unlikely(ma_dead_node(node)))
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch
maple_tree-simplify-mas_wr_node_walk.patch
The quilt patch titled
Subject: mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
has been removed from the -mm tree. Its filename was
mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Subject: mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Date: Tue, 4 Apr 2023 23:47:16 +0800
The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption. The only place we have found where this
happens is in the swapoff path. This case can be described as below:
core 0 core 1
swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock
and re-add si into
swap_avail_head
acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.
It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the above scene. In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap. This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)
------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
plist_check_prev_next_node+0x50/0x70
plist_check_head+0x80/0xf0
plist_add+0x28/0x140
add_to_avail_list+0x9c/0xf0
_enable_swap_info+0x78/0xb4
__do_sys_swapon+0x918/0xa10
__arm64_sys_swapon+0x20/0x30
el0_svc_common+0x8c/0x220
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180
irq event stamp: 2082270
Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.
This problem exists in versions after stable 5.10.y.
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba…
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293(a)alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages
+++ a/mm/swapfile.c
@@ -679,6 +679,7 @@ static void __del_from_avail_list(struct
{
int nid;
+ assert_spin_locked(&p->lock);
for_each_node(nid)
plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
}
@@ -2434,8 +2435,8 @@ SYSCALL_DEFINE1(swapoff, const char __us
spin_unlock(&swap_lock);
goto out_dput;
}
- del_from_avail_list(p);
spin_lock(&p->lock);
+ del_from_avail_list(p);
if (p->prio < 0) {
struct swap_info_struct *si = p;
int nid;
_
Patches currently in -mm which might be from rongwei.wang(a)linux.alibaba.com are
The quilt patch titled
Subject: nilfs2: fix sysfs interface lifetime
has been removed from the -mm tree. Its filename was
nilfs2-fix-sysfs-interface-lifetime.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix sysfs interface lifetime
Date: Fri, 31 Mar 2023 05:55:15 +0900
The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.
Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.
Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.
In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".
Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking. This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".
This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.
Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com
Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+979fa7f9c0d086fdc282(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com
Reported-by: syzbot+5b7d542076d9bddc3c6a(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com
Cc: Viacheslav Dubeyko <slava(a)dubeyko.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/super.c | 2 ++
fs/nilfs2/the_nilfs.c | 12 +++++++-----
2 files changed, 9 insertions(+), 5 deletions(-)
--- a/fs/nilfs2/super.c~nilfs2-fix-sysfs-interface-lifetime
+++ a/fs/nilfs2/super.c
@@ -482,6 +482,7 @@ static void nilfs_put_super(struct super
up_write(&nilfs->ns_sem);
}
+ nilfs_sysfs_delete_device_group(nilfs);
iput(nilfs->ns_sufile);
iput(nilfs->ns_cpfile);
iput(nilfs->ns_dat);
@@ -1105,6 +1106,7 @@ nilfs_fill_super(struct super_block *sb,
nilfs_put_root(fsroot);
failed_unload:
+ nilfs_sysfs_delete_device_group(nilfs);
iput(nilfs->ns_sufile);
iput(nilfs->ns_cpfile);
iput(nilfs->ns_dat);
--- a/fs/nilfs2/the_nilfs.c~nilfs2-fix-sysfs-interface-lifetime
+++ a/fs/nilfs2/the_nilfs.c
@@ -87,7 +87,6 @@ void destroy_nilfs(struct the_nilfs *nil
{
might_sleep();
if (nilfs_init(nilfs)) {
- nilfs_sysfs_delete_device_group(nilfs);
brelse(nilfs->ns_sbh[0]);
brelse(nilfs->ns_sbh[1]);
}
@@ -305,6 +304,10 @@ int load_nilfs(struct the_nilfs *nilfs,
goto failed;
}
+ err = nilfs_sysfs_create_device_group(sb);
+ if (unlikely(err))
+ goto sysfs_error;
+
if (valid_fs)
goto skip_recovery;
@@ -366,6 +369,9 @@ int load_nilfs(struct the_nilfs *nilfs,
goto failed;
failed_unload:
+ nilfs_sysfs_delete_device_group(nilfs);
+
+ sysfs_error:
iput(nilfs->ns_cpfile);
iput(nilfs->ns_sufile);
iput(nilfs->ns_dat);
@@ -697,10 +703,6 @@ int init_nilfs(struct the_nilfs *nilfs,
if (err)
goto failed_sbh;
- err = nilfs_sysfs_create_device_group(sb);
- if (err)
- goto failed_sbh;
-
set_nilfs_init(nilfs);
err = 0;
out:
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: mm: take a page reference when removing device exclusive entries
has been removed from the -mm tree. Its filename was
mm-take-a-page-reference-when-removing-device-exclusive-entries.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Alistair Popple <apopple(a)nvidia.com>
Subject: mm: take a page reference when removing device exclusive entries
Date: Thu, 30 Mar 2023 12:25:19 +1100
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
--- a/mm/memory.c~mm-take-a-page-reference-when-removing-device-exclusive-entries
+++ a/mm/memory.c
@@ -3563,8 +3563,21 @@ static vm_fault_t remove_device_exclusiv
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a reference to lock the folio because we don't hold
+ * the PTL so a racing thread can remove the device-exclusive
+ * entry and unmap it. If the folio is free the entry must
+ * have been removed already. If it happens to have already
+ * been re-allocated after being freed all we do is lock and
+ * unlock it.
+ */
+ if (!folio_try_get(folio))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ folio_put(folio);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3577,6 +3590,7 @@ static vm_fault_t remove_device_exclusiv
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ folio_put(folio);
mmu_notifier_invalidate_range_end(&range);
return 0;
_
Patches currently in -mm which might be from apopple(a)nvidia.com are
The quilt patch titled
Subject: nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
has been removed from the -mm tree. Its filename was
nilfs2-fix-potential-uaf-of-struct-nilfs_sc_info-in-nilfs_segctor_thread.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
Date: Tue, 28 Mar 2023 02:53:18 +0900
The finalization of nilfs_segctor_thread() can race with
nilfs_segctor_kill_thread() which terminates that thread, potentially
causing a use-after-free BUG as KASAN detected.
At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member
of "struct nilfs_sc_info" to indicate the thread has finished, and then
notifies nilfs_segctor_kill_thread() of this using waitqueue
"sc_wait_task" on the struct nilfs_sc_info.
However, here, immediately after the NULL assignment to "sc_task", it is
possible that nilfs_segctor_kill_thread() will detect it and return to
continue the deallocation, freeing the nilfs_sc_info structure before the
thread does the notification.
This fixes the issue by protecting the NULL assignment to "sc_task" and
its notification, with spinlock "sc_state_lock" of the struct
nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to
see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate
the race.
Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+b08ebcc22f8f3e6be43a(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-potential-uaf-of-struct-nilfs_sc_info-in-nilfs_segctor_thread
+++ a/fs/nilfs2/segment.c
@@ -2609,11 +2609,10 @@ static int nilfs_segctor_thread(void *ar
goto loop;
end_thread:
- spin_unlock(&sci->sc_state_lock);
-
/* end sync. */
sci->sc_task = NULL;
wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */
+ spin_unlock(&sci->sc_state_lock);
return 0;
}
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: zsmalloc: document freeable stats
has been removed from the -mm tree. Its filename was
zsmalloc-document-freeable-stats.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Subject: zsmalloc: document freeable stats
Date: Sat, 25 Mar 2023 11:46:31 +0900
When freeable class stat was added to classes file (back in 2016) we
forgot to update zsmalloc documentation. Fix that.
Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org
Fixes: 1120ed548394 ("mm/zsmalloc: add `freeable' column to pool stat")
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/mm/zsmalloc.rst | 2 ++
1 file changed, 2 insertions(+)
--- a/Documentation/mm/zsmalloc.rst~zsmalloc-document-freeable-stats
+++ a/Documentation/mm/zsmalloc.rst
@@ -83,6 +83,8 @@ pages_used
the number of pages allocated for the class
pages_per_zspage
the number of 0-order pages to make a zspage
+freeable
+ the approximate number of pages class compaction can free
Each zspage maintains inuse counter which keeps track of the number of
objects stored in the zspage. The inuse counter determines the zspage's
_
Patches currently in -mm which might be from senozhatsky(a)chromium.org are
The quilt patch titled
Subject: mm/hugetlb: fix uffd wr-protection for CoW optimization path
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix uffd wr-protection for CoW optimization path
Date: Tue, 21 Mar 2023 15:18:40 -0400
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set. It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), then a write to the same page without any prior
access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
reaching hugetlb_wp() to avoid taking more locks that userfault won't
need. However there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
is detected. The new path will only trigger in the CoW optimization path
because generic hugetlb_fault() (e.g. when a present pte was
wr-protected) will resolve the uffd-wp bit already. Also make sure
anonymous UNSHARE won't be affected and can still be resolved, IOW only
skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
[peterx(a)redhat.com: v2]
Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n
[peterx(a)redhat.com: v3]
Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path
+++ a/mm/hugetlb.c
@@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_s
struct folio *pagecache_folio, spinlock_t *ptl)
{
const bool unshare = flags & FAULT_FLAG_UNSHARE;
- pte_t pte;
+ pte_t pte = huge_ptep_get(ptep);
struct hstate *h = hstate_vma(vma);
struct page *old_page;
struct folio *new_folio;
@@ -5488,6 +5488,17 @@ static vm_fault_t hugetlb_wp(struct mm_s
struct mmu_notifier_range range;
/*
+ * Never handle CoW for uffd-wp protected pages. It should be only
+ * handled when the uffd-wp protection is removed.
+ *
+ * Note that only the CoW optimization path (in hugetlb_no_page())
+ * can trigger this, because hugetlb_fault() will always resolve
+ * uffd-wp bit first.
+ */
+ if (!unshare && huge_pte_uffd_wp(pte))
+ return 0;
+
+ /*
* hugetlb does not support FOLL_FORCE-style write faults that keep the
* PTE mapped R/O such as maybe_mkwrite() would do.
*/
@@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_s
return 0;
}
- pte = huge_ptep_get(ptep);
old_page = pte_page(pte);
delayacct_wpcopy_start();
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation.patch
mm-uffd-uffd_feature_wp_unpopulated.patch
mm-uffd-uffd_feature_wp_unpopulated-fix.patch
selftests-mm-smoke-test-uffd_feature_wp_unpopulated.patch
The quilt patch titled
Subject: mm: enable maple tree RCU mode by default
has been removed from the -mm tree. Its filename was
mm-enable-maple-tree-rcu-mode-by-default.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm: enable maple tree RCU mode by default
Date: Mon, 27 Feb 2023 09:36:07 -0800
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[Liam.Howlett(a)Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: syzbot+8d95422d3537159ca390(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm_types.h | 3 ++-
kernel/fork.c | 3 +++
mm/mmap.c | 3 ++-
3 files changed, 7 insertions(+), 2 deletions(-)
--- a/include/linux/mm_types.h~mm-enable-maple-tree-rcu-mode-by-default
+++ a/include/linux/mm_types.h
@@ -774,7 +774,8 @@ struct mm_struct {
unsigned long cpu_bitmap[];
};
-#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+ MT_FLAGS_USE_RCU)
extern struct mm_struct init_mm;
/* Pointer magic because the dynamic array size confuses some compilers. */
--- a/kernel/fork.c~mm-enable-maple-tree-rcu-mode-by-default
+++ a/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(str
if (retval)
goto out;
+ mt_clear_in_rcu(vmi.mas.tree);
for_each_vma(old_vmi, mpnt) {
struct file *file;
@@ -700,6 +701,8 @@ static __latent_entropy int dup_mmap(str
retval = arch_dup_mmap(oldmm, mm);
loop_out:
vma_iter_free(&vmi);
+ if (!retval)
+ mt_set_in_rcu(vmi.mas.tree);
out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
--- a/mm/mmap.c~mm-enable-maple-tree-rcu-mode-by-default
+++ a/mm/mmap.c
@@ -2277,7 +2277,7 @@ do_vmi_align_munmap(struct vma_iterator
int count = 0;
int error = -ENOMEM;
MA_STATE(mas_detach, &mt_detach, 0, 0);
- mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+ mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
mt_set_external_lock(&mt_detach, &mm->mmap_lock);
/*
@@ -3037,6 +3037,7 @@ void exit_mmap(struct mm_struct *mm)
*/
set_bit(MMF_OOM_SKIP, &mm->flags);
mmap_write_lock(mm);
+ mt_clear_in_rcu(&mm->mm_mt);
free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: maple_tree: add smp_rmb() to dead node detection
has been removed from the -mm tree. Its filename was
maple_tree-add-smp_rmb-to-dead-node-detection.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: add smp_rmb() to dead node detection
Date: Mon, 27 Feb 2023 09:36:05 -0800
Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.
The is necessary for RCU mode.
Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/lib/maple_tree.c~maple_tree-add-smp_rmb-to-dead-node-detection
+++ a/lib/maple_tree.c
@@ -539,9 +539,11 @@ static inline struct maple_node *mte_par
*/
static inline bool ma_dead_node(const struct maple_node *node)
{
- struct maple_node *parent = (void *)((unsigned long)
- node->parent & ~MAPLE_NODE_MASK);
+ struct maple_node *parent;
+ /* Do not reorder reads from the node prior to the parent check */
+ smp_rmb();
+ parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
return (parent == node);
}
@@ -556,6 +558,8 @@ static inline bool mte_dead_node(const s
struct maple_node *parent, *node;
node = mte_to_node(enode);
+ /* Do not reorder reads from the node prior to the parent check */
+ smp_rmb();
parent = mte_parent(enode);
return (parent == node);
}
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: maple_tree: remove extra smp_wmb() from mas_dead_leaves()
has been removed from the -mm tree. Its filename was
maple_tree-remove-extra-smp_wmb-from-mas_dead_leaves.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Subject: maple_tree: remove extra smp_wmb() from mas_dead_leaves()
Date: Mon, 27 Feb 2023 09:36:03 -0800
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed. This is an optimization for the RCU mode
of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 1 -
1 file changed, 1 deletion(-)
--- a/lib/maple_tree.c~maple_tree-remove-extra-smp_wmb-from-mas_dead_leaves
+++ a/lib/maple_tree.c
@@ -5503,7 +5503,6 @@ unsigned char mas_dead_leaves(struct ma_
break;
mte_set_node_dead(entry);
- smp_wmb(); /* Needed for RCU */
node->type = type;
rcu_assign_pointer(slots[offset], node);
}
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: maple_tree: detect dead nodes in mas_start()
has been removed from the -mm tree. Its filename was
maple_tree-detect-dead-nodes-in-mas_start.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Subject: maple_tree: detect dead nodes in mas_start()
Date: Mon, 27 Feb 2023 09:36:01 -0800
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode. Detect and restart the walk if
this is the case. This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/lib/maple_tree.c~maple_tree-detect-dead-nodes-in-mas_start
+++ a/lib/maple_tree.c
@@ -1360,12 +1360,16 @@ static inline struct maple_enode *mas_st
mas->max = ULONG_MAX;
mas->depth = 0;
+retry:
root = mas_root(mas);
/* Tree with nodes */
if (likely(xa_is_node(root))) {
mas->depth = 1;
mas->node = mte_safe_root(root);
mas->offset = 0;
+ if (mte_dead_node(mas->node))
+ goto retry;
+
return NULL;
}
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: maple_tree: be more cautious about dead nodes
has been removed from the -mm tree. Its filename was
maple_tree-be-more-cautious-about-dead-nodes.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liam Howlett <Liam.Howlett(a)oracle.com>
Subject: maple_tree: be more cautious about dead nodes
Date: Mon, 27 Feb 2023 09:36:00 -0800
Patch series "Fix VMA tree modification under mmap read lock".
Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree. The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.
This only happens for stack VMAs. We had thought this was safe as it only
modifies a single pivot in the tree. Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA. This causes us to delete the NULL entry between the two VMAs
and rewrite the node.
We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node. We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.
These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset. They
were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.
Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes. This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.
This patch (of 8):
ma_pivots() and ma_data_end() may be called with a dead node. Ensure to
that the node isn't dead before using the returned values.
This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Arjun Roy <arjunroy(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Chris Li <chriscli(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: freak07 <michalechner92(a)googlemail.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Laurent Dufour <ldufour(a)linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Minchan Kim <minchan(a)google.com>
Cc: Paul E. McKenney <paulmck(a)kernel.org>
Cc: Peter Oskolkov <posk(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Punit Agrawal <punit.agrawal(a)bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Soheil Hassas Yeganeh <soheil(a)google.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 52 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 43 insertions(+), 9 deletions(-)
--- a/lib/maple_tree.c~maple_tree-be-more-cautious-about-dead-nodes
+++ a/lib/maple_tree.c
@@ -544,6 +544,7 @@ static inline bool ma_dead_node(const st
return (parent == node);
}
+
/*
* mte_dead_node() - check if the @enode is dead.
* @enode: The encoded maple node
@@ -625,6 +626,8 @@ static inline unsigned int mas_alloc_req
* @node - the maple node
* @type - the node type
*
+ * In the event of a dead node, this array may be %NULL
+ *
* Return: A pointer to the maple node pivots
*/
static inline unsigned long *ma_pivots(struct maple_node *node,
@@ -1096,8 +1099,11 @@ static int mas_ascend(struct ma_state *m
a_type = mas_parent_enum(mas, p_enode);
a_node = mte_parent(p_enode);
a_slot = mte_parent_slot(p_enode);
- pivots = ma_pivots(a_node, a_type);
a_enode = mt_mk_node(a_node, a_type);
+ pivots = ma_pivots(a_node, a_type);
+
+ if (unlikely(ma_dead_node(a_node)))
+ return 1;
if (!set_min && a_slot) {
set_min = true;
@@ -1401,6 +1407,9 @@ static inline unsigned char ma_data_end(
{
unsigned char offset;
+ if (!pivots)
+ return 0;
+
if (type == maple_arange_64)
return ma_meta_end(node, type);
@@ -1436,6 +1445,9 @@ static inline unsigned char mas_data_end
return ma_meta_end(node, type);
pivots = ma_pivots(node, type);
+ if (unlikely(ma_dead_node(node)))
+ return 0;
+
offset = mt_pivots[type] - 1;
if (likely(!pivots[offset]))
return ma_meta_end(node, type);
@@ -4505,6 +4517,9 @@ static inline int mas_prev_node(struct m
node = mas_mn(mas);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
mas->max = pivots[offset];
if (offset)
mas->min = pivots[offset - 1] + 1;
@@ -4526,6 +4541,9 @@ static inline int mas_prev_node(struct m
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
offset = ma_data_end(node, mt, pivots, mas->max);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
if (offset)
mas->min = pivots[offset - 1] + 1;
@@ -4574,6 +4592,7 @@ static inline int mas_next_node(struct m
struct maple_enode *enode;
int level = 0;
unsigned char offset;
+ unsigned char node_end;
enum maple_type mt;
void __rcu **slots;
@@ -4597,7 +4616,11 @@ static inline int mas_next_node(struct m
node = mas_mn(mas);
mt = mte_node_type(mas->node);
pivots = ma_pivots(node, mt);
- } while (unlikely(offset == ma_data_end(node, mt, pivots, mas->max)));
+ node_end = ma_data_end(node, mt, pivots, mas->max);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
+ } while (unlikely(offset == node_end));
slots = ma_slots(node, mt);
pivot = mas_safe_pivot(mas, pivots, ++offset, mt);
@@ -4613,6 +4636,9 @@ static inline int mas_next_node(struct m
mt = mte_node_type(mas->node);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+ if (unlikely(ma_dead_node(node)))
+ return 1;
+
offset = 0;
pivot = pivots[0];
}
@@ -4659,11 +4685,14 @@ static inline void *mas_next_nentry(stru
return NULL;
}
- pivots = ma_pivots(node, type);
slots = ma_slots(node, type);
- mas->index = mas_safe_min(mas, pivots, mas->offset);
+ pivots = ma_pivots(node, type);
count = ma_data_end(node, type, pivots, mas->max);
- if (ma_dead_node(node))
+ if (unlikely(ma_dead_node(node)))
+ return NULL;
+
+ mas->index = mas_safe_min(mas, pivots, mas->offset);
+ if (unlikely(ma_dead_node(node)))
return NULL;
if (mas->index > max)
@@ -4817,6 +4846,11 @@ retry:
slots = ma_slots(mn, mt);
pivots = ma_pivots(mn, mt);
+ if (unlikely(ma_dead_node(mn))) {
+ mas_rewalk(mas, index);
+ goto retry;
+ }
+
if (offset == mt_pivots[mt])
pivot = mas->max;
else
@@ -6617,11 +6651,11 @@ static inline void *mas_first_entry(stru
while (likely(!ma_is_leaf(mt))) {
MT_BUG_ON(mas->tree, mte_dead_node(mas->node));
slots = ma_slots(mn, mt);
- pivots = ma_pivots(mn, mt);
- max = pivots[0];
entry = mas_slot(mas, slots, 0);
+ pivots = ma_pivots(mn, mt);
if (unlikely(ma_dead_node(mn)))
return NULL;
+ max = pivots[0];
mas->node = entry;
mn = mas_mn(mas);
mt = mte_node_type(mas->node);
@@ -6641,13 +6675,13 @@ static inline void *mas_first_entry(stru
if (likely(entry))
return entry;
- pivots = ma_pivots(mn, mt);
- mas->index = pivots[0] + 1;
mas->offset = 1;
entry = mas_slot(mas, slots, 1);
+ pivots = ma_pivots(mn, mt);
if (unlikely(ma_dead_node(mn)))
return NULL;
+ mas->index = pivots[0] + 1;
if (mas->index > limit)
goto none;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
The quilt patch titled
Subject: mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path-v3
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path-v3.patch
This patch was dropped because it was folded into mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path.patch
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path-v3
Date: Fri, 24 Mar 2023 10:26:20 -0400
Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com
Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path-v3
+++ a/mm/hugetlb.c
@@ -5491,11 +5491,11 @@ static vm_fault_t hugetlb_wp(struct mm_s
* Never handle CoW for uffd-wp protected pages. It should be only
* handled when the uffd-wp protection is removed.
*
- * Note that only the CoW optimization path can trigger this and
- * got skipped, because hugetlb_fault() will always resolve uffd-wp
- * bit first.
+ * Note that only the CoW optimization path (in hugetlb_no_page())
+ * can trigger this, because hugetlb_fault() will always resolve
+ * uffd-wp bit first.
*/
- if (huge_pte_uffd_wp(pte))
+ if (!unshare && huge_pte_uffd_wp(pte))
return 0;
/*
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path.patch
mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation.patch
mm-uffd-uffd_feature_wp_unpopulated.patch
mm-uffd-uffd_feature_wp_unpopulated-fix.patch
selftests-mm-smoke-test-uffd_feature_wp_unpopulated.patch
From: Oleksij Rempel <o.rempel(a)pengutronix.de>
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
could occur during the memcpy() operation if the size of skb->cb is
larger than the size of struct j1939_sk_buff_cb. This is because the
memcpy() operation uses the size of skb->cb, leading to a read beyond
the struct j1939_sk_buff_cb.
Updated the memcpy() operation to use the size of struct
j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
memcpy() operation only reads the memory within the bounds of struct
j1939_sk_buff_cb, preventing out-of-bounds memory access.
Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
is greater than or equal to the size of struct j1939_sk_buff_cb. This
ensures that the skb->cb buffer is large enough to hold the
j1939_sk_buff_cb structure.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Reported-by: Shuangpeng Bai <sjb7183(a)psu.edu>
Tested-by: Shuangpeng Bai <sjb7183(a)psu.edu>
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
Cc: stable(a)vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/j1939/transport.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index fb92c3609e17..fe3df23a2595 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -604,7 +604,10 @@ sk_buff *j1939_tp_tx_dat_new(struct j1939_priv *priv,
/* reserve CAN header */
skb_reserve(skb, offsetof(struct can_frame, data));
- memcpy(skb->cb, re_skcb, sizeof(skb->cb));
+ /* skb->cb must be large enough to hold a j1939_sk_buff_cb structure */
+ BUILD_BUG_ON(sizeof(skb->cb) < sizeof(*re_skcb));
+
+ memcpy(skb->cb, re_skcb, sizeof(*re_skcb));
skcb = j1939_skb_to_cb(skb);
if (swap_src_dst)
j1939_skbcb_swap(skcb);
base-commit: 3ce9345580974863c060fa32971537996a7b2d57
--
2.39.2
The patch titled
Subject: mm/khugepaged: check again on anon uffd-wp during isolation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/khugepaged: check again on anon uffd-wp during isolation
Date: Wed, 5 Apr 2023 11:51:20 -0400
Khugepaged collapse an anonymous thp in two rounds of scans. The 2nd
round done in __collapse_huge_page_isolate() after
hpage_collapse_scan_pmd(), during which all the locks will be released
temporarily. It means the pgtable can change during this phase before 2nd
round starts.
It's logically possible some ptes got wr-protected during this phase, and
we can errornously collapse a thp without noticing some ptes are
wr-protected by userfault. e1e267c7928f wanted to avoid it but it only
did that for the 1st phase, not the 2nd phase.
Since __collapse_huge_page_isolate() happens after a round of small page
swapins, we don't need to worry on any !present ptes - if it existed
khugepaged will already bail out. So we only need to check present ptes
with uffd-wp bit set there.
This is something I found only but never had a reproducer, I thought it
was one caused a bug in Muhammad's recent pagemap new ioctl work, but it
turns out it's not the cause of that but an userspace bug. However this
seems to still be a real bug even with a very small race window, still
worth to have it fixed and copy stable.
Link: https://lkml.kernel.org/r/20230405155120.3608140-1-peterx@redhat.com
Fixes: e1e267c7928f ("khugepaged: skip collapse if uffd-wp detected")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation
+++ a/mm/khugepaged.c
@@ -573,6 +573,10 @@ static int __collapse_huge_page_isolate(
result = SCAN_PTE_NON_PRESENT;
goto out;
}
+ if (pte_uffd_wp(pteval)) {
+ result = SCAN_PTE_UFFD_WP;
+ goto out;
+ }
page = vm_normal_page(vma, address, pteval);
if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
result = SCAN_PAGE_NULL;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path.patch
mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path-v2.patch
mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path-v3.patch
mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation.patch
mm-uffd-uffd_feature_wp_unpopulated.patch
mm-uffd-uffd_feature_wp_unpopulated-fix.patch
selftests-mm-smoke-test-uffd_feature_wp_unpopulated.patch
The patch titled
Subject: mm/userfaultfd: fix uffd-wp handling for THP migration entries
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-fix-uffd-wp-handling-for-thp-migration-entries.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/userfaultfd: fix uffd-wp handling for THP migration entries
Date: Wed, 5 Apr 2023 18:02:35 +0200
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/huge_memory.c~mm-userfaultfd-fix-uffd-wp-handling-for-thp-migration-entries
+++ a/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *t
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *t
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_uffd_wp(pmdval))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-userfaultfd-fix-uffd-wp-handling-for-thp-migration-entries.patch
m68k-mm-use-correct-bit-number-in-_page_swp_exclusive-comment.patch
mm-userfaultfd-dont-consider-uffd-wp-bit-of-writable-migration-entries.patch
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The kernel command line ftrace_boot_snapshot by itself is supposed to
trigger a snapshot at the end of boot up of the main top level trace
buffer. A ftrace_boot_snapshot=foo will do the same for an instance called
foo that was created by trace_instance=foo,...
The logic was broken where if ftrace_boot_snapshot was by itself, it would
trigger a snapshot for all instances that had tracing enabled, regardless
if it asked for a snapshot or not.
When a snapshot is requested for a buffer, the buffer's
tr->allocated_snapshot is set to true. Use that to know if a trace buffer
wants a snapshot at boot up or not.
Since the top level buffer is part of the ftrace_trace_arrays list,
there's no reason to treat it differently than the other buffers. Just
iterate the list if ftrace_boot_snapshot was specified.
Link: https://lkml.kernel.org/r/20230405022341.895334039@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 9c1c251d670bc ("tracing: Allow boot instances to have snapshot buffers")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ed1d1093f5e9..4686473b8497 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10393,19 +10393,20 @@ __init static int tracer_alloc_buffers(void)
void __init ftrace_boot_snapshot(void)
{
+#ifdef CONFIG_TRACER_MAX_TRACE
struct trace_array *tr;
- if (snapshot_at_boot) {
- tracing_snapshot();
- internal_trace_puts("** Boot snapshot taken **\n");
- }
+ if (!snapshot_at_boot)
+ return;
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
- if (tr == &global_trace)
+ if (!tr->allocated_snapshot)
continue;
- trace_array_puts(tr, "** Boot snapshot taken **\n");
+
tracing_snapshot_instance(tr);
+ trace_array_puts(tr, "** Boot snapshot taken **\n");
}
+#endif
}
void __init early_trace_init(void)
--
2.39.2
Khugepaged collapse an anonymous thp in two rounds of scans. The 2nd round
done in __collapse_huge_page_isolate() after hpage_collapse_scan_pmd(),
during which all the locks will be released temporarily. It means the
pgtable can change during this phase before 2nd round starts.
It's logically possible some ptes got wr-protected during this phase, and
we can errornously collapse a thp without noticing some ptes are
wr-protected by userfault. e1e267c7928f wanted to avoid it but it only did
that for the 1st phase, not the 2nd phase.
Since __collapse_huge_page_isolate() happens after a round of small page
swapins, we don't need to worry on any !present ptes - if it existed
khugepaged will already bail out. So we only need to check present ptes
with uffd-wp bit set there.
This is something I found only but never had a reproducer, I thought it was
one caused a bug in Muhammad's recent pagemap new ioctl work, but it turns
out it's not the cause of that but an userspace bug. However this seems to
still be a real bug even with a very small race window, still worth to have
it fixed and copy stable.
Cc: linux-stable <stable(a)vger.kernel.org>
Fixes: e1e267c7928f ("khugepaged: skip collapse if uffd-wp detected")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/khugepaged.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a19aa140fd52..42ac93b4bd87 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -575,6 +575,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
result = SCAN_PTE_NON_PRESENT;
goto out;
}
+ if (pte_uffd_wp(pteval)) {
+ result = SCAN_PTE_UFFD_WP;
+ goto out;
+ }
page = vm_normal_page(vma, address, pteval);
if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
result = SCAN_PAGE_NULL;
--
2.39.1
This is a note to let you know that I've just added the patch titled
usb: cdnsp: Fixes error: uninitialized symbol 'len'
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 1edf48991a783d00a3a18dc0d27c88139e4030a2 Mon Sep 17 00:00:00 2001
From: Pawel Laszczak <pawell(a)cadence.com>
Date: Fri, 31 Mar 2023 05:06:00 -0400
Subject: usb: cdnsp: Fixes error: uninitialized symbol 'len'
The patch 5bc38d33a5a1: "usb: cdnsp: Fixes issue with redundant
Status Stage" leads to the following Smatch static checker warning:
drivers/usb/cdns3/cdnsp-ep0.c:470 cdnsp_setup_analyze()
error: uninitialized symbol 'len'.
cc: <stable(a)vger.kernel.org>
Fixes: 5bc38d33a5a1 ("usb: cdnsp: Fixes issue with redundant Status Stage")
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
Link: https://lore.kernel.org/r/20230331090600.454674-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdnsp-ep0.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-ep0.c b/drivers/usb/cdns3/cdnsp-ep0.c
index d63d5d92f255..f317d3c84781 100644
--- a/drivers/usb/cdns3/cdnsp-ep0.c
+++ b/drivers/usb/cdns3/cdnsp-ep0.c
@@ -414,7 +414,7 @@ static int cdnsp_ep0_std_request(struct cdnsp_device *pdev,
void cdnsp_setup_analyze(struct cdnsp_device *pdev)
{
struct usb_ctrlrequest *ctrl = &pdev->setup;
- int ret = 0;
+ int ret = -EINVAL;
u16 len;
trace_cdnsp_ctrl_req(ctrl);
@@ -424,7 +424,6 @@ void cdnsp_setup_analyze(struct cdnsp_device *pdev)
if (pdev->gadget.state == USB_STATE_NOTATTACHED) {
dev_err(pdev->dev, "ERR: Setup detected in unattached state\n");
- ret = -EINVAL;
goto out;
}
--
2.40.0
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in hugetlb_change_protection()")
similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
mm/huge_memory.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..e3706a2b34b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_uffd_wp(pmdval))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
--
2.39.2
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in hugetlb_change_protection()")
similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
mm/huge_memory.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..bdda4f426d58 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_swp_uffd_wp(*pvmw->pmd))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
--
2.39.2
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ross Zwisler <zwisler(a)google.com>
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
--
2.39.2
Hi all,
there seems to be a regression which allows you to bind the same port
twice when the first bind call bound to all ip addresses (i. e. dual stack).
A second bind call for the same port will succeed if you try to bind to
a specific ipv4 (e. g. 127.0.0.1), binding to 0.0.0.0 or an ipv6 address
fails correctly with EADDRINUSE.
I included a small c program below to show the issue. Normally the
second bind call should fail, this was the case before v6.1.
I bisected the regression to commit 5456262d2baa ("net: Fix incorrect
address comparison when searching for a bind2 bucket").
I also checked that the issue is still present in v6.3-rc1.
Original report: https://github.com/containers/podman/issues/17719
#regzbot introduced: 5456262d2baa
```
#include <sys/socket.h>
#include <sys/un.h>
#include <stdlib.h>
#include <stdio.h>
#include <netinet/in.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
int ret, sock1, sock2;
struct sockaddr_in6 addr;
struct sockaddr_in addr2;
sock1 = socket(AF_INET6, SOCK_STREAM, 0);
if (sock1 == -1)
{
perror("socket1");
exit(1);
}
sock2 = socket(AF_INET, SOCK_STREAM, 0);
if (sock2 == -1)
{
perror("socket2");
exit(1);
}
memset(&addr, 0, sizeof(addr));
addr.sin6_family = AF_INET6;
addr.sin6_addr = in6addr_any;
addr.sin6_port = htons(8080);
memset(&addr2, 0, sizeof(addr2));
addr2.sin_family = AF_INET;
addr2.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
addr2.sin_port = htons(8080);
ret = bind(sock1, (struct sockaddr *)&addr, sizeof(addr));
if (ret == -1)
{
perror("bind1");
exit(1);
}
printf("bind1 ret: %d\n", ret);
if ((listen(sock1, 5)) != 0)
{
perror("listen1");
exit(1);
}
ret = bind(sock2, (struct sockaddr *)&addr2, sizeof(addr2));
if (ret == -1)
{
perror("bind2");
exit(1);
}
printf("bind2 ret: %d\n", ret);
if ((listen(sock2, 5)) != 0)
{
perror("listen2");
exit(1);
}
// uncomment pause() to see with ss -tlpn the bound ports
// pause();
return 0;
}
```
Best regards,
Paul
Drivers are supposed to fix this up if needed if they don't outright
reject it. Uncovered by 6c11df58fd1a ("fbmem: Check virtual screen
sizes in fb_set_var()").
Reported-by: syzbot+20dcf81733d43ddff661(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=c5faf983bfa4a607de530cd3bb008888bf06ce…
Cc: stable(a)vger.kernel.org # v5.4+
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/drm_fb_helper.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 59409820f424..eb4ece3e0027 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -1595,6 +1595,9 @@ int drm_fb_helper_check_var(struct fb_var_screeninfo *var,
return -EINVAL;
}
+ var->xres_virtual = fb->width;
+ var->yres_virtual = fb->height;
+
/*
* Workaround for SDL 1.2, which is known to be setting all pixel format
* fields values to zero in some cases. We treat this situation as a
--
2.40.0
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.
This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
sysfb device registration when removing conflicting FBs"), where we
remove the sysfb when loading a driver for an unrelated pci device,
resulting in the user loosing their efifb console or similar.
Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.
Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
drivers do not use drm_aperture_remove_conflicting_framebuffers(),
so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
this.
For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.
v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
have any issues with making all this fully pci specific.
v3
- polish commit message (Javier)
Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Tested-by: Aaron Plattner <aplattner(a)nvidia.com>
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Aaron Plattner <aplattner(a)nvidia.com>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: <stable(a)vger.kernel.org> # v5.19+ (if someone else does the backport)
---
drivers/video/aperture.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 8f1437339e49..2394c2d310f8 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
primary = pdev == vga_default_device();
+ if (primary)
+ sysfb_disable();
+
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
continue;
base = pci_resource_start(pdev, bar);
size = pci_resource_len(pdev, bar);
- ret = aperture_remove_conflicting_devices(base, size, name);
- if (ret)
- return ret;
+ aperture_detach_devices(base, size);
}
if (primary) {
--
2.40.0
Re-enable the console device after suspending, causes its cflags,
ispeed and ospeed to be set anew, basing on the values stored in
uport->cons. These values are set only once,
when parsing console parameters after boot (see uart_set_options()),
next after configuring a port in uart_port_startup() these parameters
(cflags, ispeed and ospeed) are copied to termios structure and
the original one (stored in uport->cons) are cleared, but there is no place
in code where those fields are checked against 0.
When kernel calls uart_resume_port() and setups console, it copies cflags,
ispeed and ospeed values from uart->cons, but those are already cleared.
The effect is that the console is broken.
This patch address the issue by preserving the cflags, ispeed and
ospeed fields in uart->cons during uart_port_startup().
Signed-off-by: Lukasz Majczak <lma(a)semihalf.com>
Cc: <stable(a)vger.kernel.org> # 4.14+
---
drivers/tty/serial/serial_core.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 2bd32c8ece39..394a05c09d87 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -225,9 +225,6 @@ static int uart_port_startup(struct tty_struct *tty, struct uart_state *state,
tty->termios.c_cflag = uport->cons->cflag;
tty->termios.c_ispeed = uport->cons->ispeed;
tty->termios.c_ospeed = uport->cons->ospeed;
- uport->cons->cflag = 0;
- uport->cons->ispeed = 0;
- uport->cons->ospeed = 0;
}
/*
* Initialise the hardware port settings.
--
2.40.0.577.gac1e443424-goog
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg()
function in isotp.c might get into a race condition when restoring the
former tx.state from the old_state.
Remove the old_state concept and implement proper locking for the
ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a
simplification idea from Hillf Danton.
Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking
mechanism from isotp_release() which resolves a potential race between
isotp_sendsmg() and isotp_release().
[1] https://lore.kernel.org/linux-can/ZB%2F93xJxq%2FBUqAgG@dragonet
v1: https://lore.kernel.org/all/20230331102114.15164-1-socketcan@hartkopp.net
v2: https://lore.kernel.org/all/20230331123600.3550-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_release()
v3: https://lore.kernel.org/all/20230331130654.9886-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in the wait_tx_done case
v4: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
take care of signal interrupts for wait_event_interruptible() in
isotp_sendmsg() in ALL cases
Cc: Dae R. Jeong <threeearcat(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Fixes: 4f027cba8216 ("can: isotp: split tx timer into transmission and timeout")
Link: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
Cc: stable(a)vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/isotp.c | 55 ++++++++++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 24 deletions(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 281b7766c54e..5761d4ab839d 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -119,7 +119,8 @@ enum {
ISOTP_WAIT_FIRST_FC,
ISOTP_WAIT_FC,
ISOTP_WAIT_DATA,
- ISOTP_SENDING
+ ISOTP_SENDING,
+ ISOTP_SHUTDOWN,
};
struct tpcon {
@@ -880,8 +881,8 @@ static enum hrtimer_restart isotp_tx_timer_handler(struct hrtimer *hrtimer)
txtimer);
struct sock *sk = &so->sk;
- /* don't handle timeouts in IDLE state */
- if (so->tx.state == ISOTP_IDLE)
+ /* don't handle timeouts in IDLE or SHUTDOWN state */
+ if (so->tx.state == ISOTP_IDLE || so->tx.state == ISOTP_SHUTDOWN)
return HRTIMER_NORESTART;
/* we did not get any flow control or echo frame in time */
@@ -918,7 +919,6 @@ static int isotp_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
struct sock *sk = sock->sk;
struct isotp_sock *so = isotp_sk(sk);
- u32 old_state = so->tx.state;
struct sk_buff *skb;
struct net_device *dev;
struct canfd_frame *cf;
@@ -928,23 +928,24 @@ static int isotp_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
int off;
int err;
- if (!so->bound)
+ if (!so->bound || so->tx.state == ISOTP_SHUTDOWN)
return -EADDRNOTAVAIL;
+wait_free_buffer:
/* we do not support multiple buffers - for now */
- if (cmpxchg(&so->tx.state, ISOTP_IDLE, ISOTP_SENDING) != ISOTP_IDLE ||
- wq_has_sleeper(&so->wait)) {
- if (msg->msg_flags & MSG_DONTWAIT) {
- err = -EAGAIN;
- goto err_out;
- }
+ if (wq_has_sleeper(&so->wait) && (msg->msg_flags & MSG_DONTWAIT))
+ return -EAGAIN;
- /* wait for complete transmission of current pdu */
- err = wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE);
- if (err)
- goto err_out;
+ /* wait for complete transmission of current pdu */
+ err = wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE);
+ if (err)
+ goto err_event_drop;
- so->tx.state = ISOTP_SENDING;
+ if (cmpxchg(&so->tx.state, ISOTP_IDLE, ISOTP_SENDING) != ISOTP_IDLE) {
+ if (so->tx.state == ISOTP_SHUTDOWN)
+ return -EADDRNOTAVAIL;
+
+ goto wait_free_buffer;
}
if (!size || size > MAX_MSG_LENGTH) {
@@ -1074,7 +1075,9 @@ static int isotp_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
if (wait_tx_done) {
/* wait for complete transmission of current pdu */
- wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE);
+ err = wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE);
+ if (err)
+ goto err_event_drop;
if (sk->sk_err)
return -sk->sk_err;
@@ -1082,13 +1085,15 @@ static int isotp_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
return size;
+err_event_drop:
+ /* got signal: force tx state machine to be idle */
+ so->tx.state = ISOTP_IDLE;
+ hrtimer_cancel(&so->txfrtimer);
+ hrtimer_cancel(&so->txtimer);
err_out_drop:
/* drop this PDU and unlock a potential wait queue */
- old_state = ISOTP_IDLE;
-err_out:
- so->tx.state = old_state;
- if (so->tx.state == ISOTP_IDLE)
- wake_up_interruptible(&so->wait);
+ so->tx.state = ISOTP_IDLE;
+ wake_up_interruptible(&so->wait);
return err;
}
@@ -1150,10 +1155,12 @@ static int isotp_release(struct socket *sock)
net = sock_net(sk);
/* wait for complete transmission of current pdu */
- wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE);
+ while (wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE) == 0 &&
+ cmpxchg(&so->tx.state, ISOTP_IDLE, ISOTP_SHUTDOWN) != ISOTP_IDLE)
+ ;
/* force state machines to be idle also when a signal occurred */
- so->tx.state = ISOTP_IDLE;
+ so->tx.state = ISOTP_SHUTDOWN;
so->rx.state = ISOTP_IDLE;
spin_lock(&isotp_notifier_lock);
--
2.39.2
From: Michal Sojka <michal.sojka(a)cvut.cz>
When using select()/poll()/epoll() with a non-blocking ISOTP socket to
wait for when non-blocking write is possible, a false EPOLLOUT event
is sometimes returned. This can happen at least after sending a
message which must be split to multiple CAN frames.
The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
not equal to ISOTP_IDLE and this behavior is not reflected in
datagram_poll(), which is used in isotp_ops.
This is fixed by introducing ISOTP-specific poll function, which
suppresses the EPOLLOUT events in that case.
v2: https://lore.kernel.org/all/20230302092812.320643-1-michal.sojka@cvut.cz
v1: https://lore.kernel.org/all/20230224010659.48420-1-michal.sojka@cvut.czhttps://lore.kernel.org/all/b53a04a2-ba1f-3858-84c1-d3eb3301ae15@hartkopp.n…
Signed-off-by: Michal Sojka <michal.sojka(a)cvut.cz>
Reported-by: Jakub Jira <jirajak2(a)fel.cvut.cz>
Tested-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/isotp.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 47c2ebad10ed..281b7766c54e 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1608,6 +1608,21 @@ static int isotp_init(struct sock *sk)
return 0;
}
+static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait)
+{
+ struct sock *sk = sock->sk;
+ struct isotp_sock *so = isotp_sk(sk);
+
+ __poll_t mask = datagram_poll(file, sock, wait);
+ poll_wait(file, &so->wait, wait);
+
+ /* Check for false positives due to TX state */
+ if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE))
+ mask &= ~(EPOLLOUT | EPOLLWRNORM);
+
+ return mask;
+}
+
static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
unsigned long arg)
{
@@ -1623,7 +1638,7 @@ static const struct proto_ops isotp_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = isotp_getname,
- .poll = datagram_poll,
+ .poll = isotp_poll,
.ioctl = isotp_sock_no_ioctlcmd,
.gettstamp = sock_gettstamp,
.listen = sock_no_listen,
--
2.39.2
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
isotp.c was still using sock_recv_timestamp() which does not provide
control messages to detect dropped PDUs in the receive path.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/isotp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 9bc344851704..47c2ebad10ed 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1120,7 +1120,7 @@ static int isotp_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
if (ret < 0)
goto out_err;
- sock_recv_timestamp(msg, sk, skb);
+ sock_recv_cmsgs(msg, sk, skb);
if (msg->msg_name) {
__sockaddr_check_size(ISOTP_MIN_NAMELEN);
--
2.39.2
This is the start of the stable review cycle for the 4.14.312 release.
There are 66 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 05 Apr 2023 14:03:18 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.312-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.312-rc1
Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: cbq: dont intepret cls results when asked to drop
Ye Bin <yebin10(a)huawei.com>
ext4: fix kernel BUG in 'ext4_write_inline_data_end()'
Dan Carpenter <dan.carpenter(a)oracle.com>
usb: host: ohci-pxa27x: Fix and & vs | typo
Heiko Carstens <hca(a)linux.ibm.com>
s390/uaccess: add missing earlyclobber annotations to __clear_user()
Lucas Stach <l.stach(a)pengutronix.de>
drm/etnaviv: fix reference leak when mmaping imported buffer
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix regression on detection of Roland VS-100
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda/conexant: Partial revert of a quirk for Lenovo
Johan Hovold <johan+linaro(a)kernel.org>
pinctrl: at91-pio4: fix domain name assignment
Juergen Gross <jgross(a)suse.com>
xen/netback: don't do grant copy across page boundary
David Disseldorp <ddiss(a)suse.de>
cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
Jason A. Donenfeld <Jason(a)zx2c4.com>
Input: focaltech - use explicitly signed char type
Radoslaw Tyl <radoslawx.tyl(a)intel.com>
i40e: fix registers dump after run ethtool adapter self test
Ivan Orlov <ivan.orlov0322(a)gmail.com>
can: bcm: bcm_tx_setup(): fix KMSAN uninit-value in vfs_write
Tomas Henzl <thenzl(a)redhat.com>
scsi: megaraid_sas: Fix crash after a double completion
Wei Chen <harperchen1110(a)gmail.com>
fbdev: au1200fb: Fix potential divide by zero
Wei Chen <harperchen1110(a)gmail.com>
fbdev: lxfb: Fix potential divide by zero
Wei Chen <harperchen1110(a)gmail.com>
fbdev: intelfb: Fix potential divide by zero
Wei Chen <harperchen1110(a)gmail.com>
fbdev: nvidia: Fix potential divide by zero
Linus Torvalds <torvalds(a)linux-foundation.org>
sched_getaffinity: don't assume 'cpumask_size()' is fully initialized
Wei Chen <harperchen1110(a)gmail.com>
fbdev: tgafb: Fix potential divide by zero
Kuninori Morimoto <kuninori.morimoto.gx(a)renesas.com>
ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set()
Kuninori Morimoto <kuninori.morimoto.gx(a)renesas.com>
ALSA: asihpi: check pao in control_message()
NeilBrown <neilb(a)suse.de>
md: avoid signed overflow in slot_store()
Jan Kara via Ocfs2-devel <ocfs2-devel(a)oss.oracle.com>
ocfs2: fix data corruption after failed write
Vincent Guittot <vincent.guittot(a)linaro.org>
sched/fair: Sanitize vruntime of entity being migrated
Zhang Qiao <zhangqiao22(a)huawei.com>
sched/fair: sanitize vruntime of entity being placed
Mikulas Patocka <mpatocka(a)redhat.com>
dm crypt: add cond_resched() to dmcrypt_write()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
dm stats: check for and propagate alloc_percpu failure
Wei Chen <harperchen1110(a)gmail.com>
i2c: xgene-slimpro: Fix out-of-bounds bug in xgene_slimpro_i2c_xfer()
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
Xu Yang <xu.yang_2(a)nxp.com>
usb: chipidea: core: fix possible concurrent when switch role
Xu Yang <xu.yang_2(a)nxp.com>
usb: chipdea: core: fix return -EINVAL if request role is the same with current role
Lin Ma <linma(a)zju.edu.cn>
igb: revert rtnl_lock() that causes deadlock
Alvin Šipraga <alsi(a)bang-olufsen.dk>
usb: gadget: u_audio: don't let userspace block driver unbind
Joel Selvaraj <joelselvaraj.oss(a)gmail.com>
scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR
Al Viro <viro(a)zeniv.linux.org.uk>
sh: sanitize the flags on sigreturn
Enrico Sau <enrico.sau(a)gmail.com>
net: usb: qmi_wwan: add Telit 0x1080 composition
Enrico Sau <enrico.sau(a)gmail.com>
net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
Adrien Thierry <athierry(a)redhat.com>
scsi: ufs: core: Add soft dependency on governor_simpleondemand
Maurizio Lombardi <mlombard(a)redhat.com>
scsi: target: iscsi: Fix an error message in iscsi_check_key()
Michael Schmitz <schmitzmic(a)gmail.com>
m68k: Only force 030 bus error if PC not in exception table
Alexander Aring <aahringo(a)redhat.com>
ca8210: fix mac_len negative array access
Alexandre Ghiti <alex(a)ghiti.fr>
riscv: Bump COMMAND_LINE_SIZE value to 1024
Mario Limonciello <mario.limonciello(a)amd.com>
thunderbolt: Use const qualifier for `ring_interrupt_index`
Yaroslav Furman <yaro330(a)gmail.com>
uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2
Frank Crawford <frank(a)crawford.emu.id.au>
hwmon (it87): Fix voltage scaling for chips with 10.9mV ADCs
Zheng Wang <zyytlz.wz(a)163.com>
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
Stephan Gerhold <stephan.gerhold(a)kernkonzept.com>
Bluetooth: btqcomsmd: Fix command timeout after setting BD address
Liang He <windhl(a)126.com>
net: mdio: thunder: Add missing fwnode_handle_put()
Roger Pau Monne <roger.pau(a)citrix.com>
hvc/xen: prevent concurrent accesses to the shared ring
Li Zetao <lizetao1(a)huawei.com>
atm: idt77252: fix kmemleak when rmmod idt77252
Maher Sanalla <msanalla(a)nvidia.com>
net/mlx5: Read the TC mapping of all priorities on ETS query
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Adjust insufficient default bpf_jit_limit
Geoff Levand <geoff(a)infradead.org>
net/ps3_gelic_net: Use dma_mapping_error
Geoff Levand <geoff(a)infradead.org>
net/ps3_gelic_net: Fix RX sk_buff length
Zheng Wang <zyytlz.wz(a)163.com>
net: qcom/emac: Fix use after free bug in emac_remove due to race condition
Zheng Wang <zyytlz.wz(a)163.com>
xirc2ps_cs: Fix use after free bug in xirc2ps_detach
Daniil Tatianin <d-tatianin(a)yandex-team.ru>
qed/qed_sriov: guard against NULL derefs from qed_iov_get_vf_info
Szymon Heidrich <szymon.heidrich(a)gmail.com>
net: usb: smsc95xx: Limit packet length to skb->len
Yu Kuai <yukuai3(a)huawei.com>
scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate()
Alexander Stein <alexander.stein(a)ew.tq-group.com>
i2c: imx-lpi2c: check only for enabled interrupt flags
Akihiko Odaki <akihiko.odaki(a)daynix.com>
igbvf: Regard vf reset nack as success
Gaosheng Cui <cuigaosheng1(a)huawei.com>
intel/igbvf: free irq on the error path in igbvf_request_msix()
Alexander Lobakin <aleksander.lobakin(a)intel.com>
iavf: fix inverted Rx hash condition leading to disabled hash
Zheng Wang <zyytlz.wz(a)163.com>
power: supply: da9150: Fix use after free bug in da9150_charger_remove due to race condition
-------------
Diffstat:
Makefile | 4 +-
arch/m68k/kernel/traps.c | 4 +-
arch/riscv/include/uapi/asm/setup.h | 8 ++++
arch/s390/lib/uaccess.c | 2 +-
arch/sh/include/asm/processor_32.h | 1 +
arch/sh/kernel/signal_32.c | 3 ++
drivers/atm/idt77252.c | 11 +++++
drivers/bluetooth/btqcomsmd.c | 17 ++++++-
drivers/bluetooth/btsdio.c | 1 +
drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 10 +++-
drivers/hwmon/it87.c | 4 +-
drivers/i2c/busses/i2c-imx-lpi2c.c | 4 ++
drivers/i2c/busses/i2c-xgene-slimpro.c | 3 ++
drivers/input/mouse/focaltech.c | 8 ++--
drivers/md/dm-crypt.c | 1 +
drivers/md/dm-stats.c | 7 ++-
drivers/md/dm-stats.h | 2 +-
drivers/md/dm.c | 4 +-
drivers/md/md.c | 3 ++
drivers/net/ethernet/intel/i40e/i40e_diag.c | 11 +++--
drivers/net/ethernet/intel/i40e/i40e_diag.h | 2 +-
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 -
drivers/net/ethernet/intel/igbvf/netdev.c | 8 +++-
drivers/net/ethernet/intel/igbvf/vf.c | 13 ++++--
drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 6 ++-
drivers/net/ethernet/qlogic/qed/qed_sriov.c | 5 +-
drivers/net/ethernet/qualcomm/emac/emac.c | 6 +++
drivers/net/ethernet/toshiba/ps3_gelic_net.c | 41 ++++++++--------
drivers/net/ethernet/toshiba/ps3_gelic_net.h | 5 +-
drivers/net/ethernet/xircom/xirc2ps_cs.c | 5 ++
drivers/net/ieee802154/ca8210.c | 5 +-
drivers/net/phy/mdio-thunder.c | 1 +
drivers/net/usb/cdc_mbim.c | 5 ++
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/usb/smsc95xx.c | 6 +++
drivers/net/xen-netback/common.h | 2 +-
drivers/net/xen-netback/netback.c | 25 +++++++++-
drivers/pinctrl/pinctrl-at91-pio4.c | 1 -
drivers/power/supply/da9150-charger.c | 1 +
drivers/scsi/device_handler/scsi_dh_alua.c | 6 ++-
drivers/scsi/megaraid/megaraid_sas_fusion.c | 4 +-
drivers/scsi/scsi_devinfo.c | 1 +
drivers/scsi/ufs/ufshcd.c | 1 +
drivers/target/iscsi/iscsi_target_parameters.c | 12 +++--
drivers/thunderbolt/nhi.c | 2 +-
drivers/tty/hvc/hvc_xen.c | 19 +++++++-
drivers/usb/chipidea/ci.h | 2 +
drivers/usb/chipidea/core.c | 11 ++++-
drivers/usb/chipidea/otg.c | 5 +-
drivers/usb/gadget/function/u_audio.c | 2 +-
drivers/usb/host/ohci-pxa27x.c | 2 +-
drivers/usb/storage/unusual_uas.h | 7 +++
drivers/video/fbdev/au1200fb.c | 3 ++
drivers/video/fbdev/geode/lxfb_core.c | 3 ++
drivers/video/fbdev/intelfb/intelfbdrv.c | 3 ++
drivers/video/fbdev/nvidia/nvidia.c | 2 +
drivers/video/fbdev/tgafb.c | 3 ++
fs/cifs/cifsfs.h | 5 +-
fs/ext4/inode.c | 3 +-
fs/nilfs2/ioctl.c | 2 +-
fs/ocfs2/aops.c | 18 +++++++-
kernel/bpf/core.c | 2 +-
kernel/compat.c | 2 +-
kernel/sched/core.c | 7 ++-
kernel/sched/fair.c | 54 ++++++++++++++++++++--
net/can/bcm.c | 16 ++++---
net/sched/sch_cbq.c | 4 +-
sound/pci/asihpi/hpi6205.c | 2 +-
sound/pci/hda/patch_ca0132.c | 4 +-
sound/pci/hda/patch_conexant.c | 6 ++-
sound/usb/format.c | 8 +++-
72 files changed, 370 insertions(+), 101 deletions(-)
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The kernel command line ftrace_boot_snapshot by itself is supposed to
trigger a snapshot at the end of boot up of the main top level trace
buffer. A ftrace_boot_snapshot=foo will do the same for an instance called
foo that was created by trace_instance=foo,...
The logic was broken where if ftrace_boot_snapshot was by itself, it would
trigger a snapshot for all instances that had tracing enabled, regardless
if it asked for a snapshot or not.
When a snapshot is requested for a buffer, the buffer's
tr->allocated_snapshot is set to true. Use that to know if a trace buffer
wants a snapshot at boot up or not.
Since the top level buffer is part of the ftrace_trace_arrays list,
there's no reason to treat it differently than the other buffers. Just
iterate the list if ftrace_boot_snapshot was specified.
Cc: stable(a)vger.kernel.org
Fixes: 9c1c251d670bc ("tracing: Allow boot instances to have snapshot buffers")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ed1d1093f5e9..8ae51f1dea8e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10395,16 +10395,15 @@ void __init ftrace_boot_snapshot(void)
{
struct trace_array *tr;
- if (snapshot_at_boot) {
- tracing_snapshot();
- internal_trace_puts("** Boot snapshot taken **\n");
- }
+ if (!snapshot_at_boot)
+ return;
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
- if (tr == &global_trace)
+ if (!tr->allocated_snapshot)
continue;
- trace_array_puts(tr, "** Boot snapshot taken **\n");
+
tracing_snapshot_instance(tr);
+ trace_array_puts(tr, "** Boot snapshot taken **\n");
}
}
--
2.39.2
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The kernel command line ftrace_boot_snapshot by itself is supposed to
trigger a snapshot at the end of boot up of the main top level trace
buffer. A ftrace_boot_snapshot=foo will do the same for an instance called
foo that was created by trace_instance=foo,...
The logic was broken where if ftrace_boot_snapshot was by itself, it would
trigger a snapshot for all instances that had tracing enabled, regardless
if it asked for a snapshot or not.
When a snapshot is requested for a buffer, the buffer's
tr->allocated_snapshot is set to true. Use that to know if a trace buffer
wants a snapshot at boot up or not.
Since the top level buffer is part of the ftrace_trace_arrays list,
there's no reason to treat it differently than the other buffers. Just
iterate the list if ftrace_boot_snapshot was specified.
Cc: stable(a)vger.kernel.org
Fixes: 9c1c251d670bc ("tracing: Allow boot instances to have snapshot buffers")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v1: https://lkml.kernel.org/r/20230404230308.501833715@goodmis.org
- Protect use of tr->allocated_snapshot around #ifdef TRACER_MAX_TRACE
kernel/trace/trace.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 93740a9370c6..36a6037823cd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10394,19 +10394,20 @@ __init static int tracer_alloc_buffers(void)
void __init ftrace_boot_snapshot(void)
{
+#ifdef CONFIG_TRACER_MAX_TRACE
struct trace_array *tr;
- if (snapshot_at_boot) {
- tracing_snapshot();
- internal_trace_puts("** Boot snapshot taken **\n");
- }
+ if (!snapshot_at_boot)
+ return;
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
- if (tr == &global_trace)
+ if (!tr->allocated_snapshot)
continue;
- trace_array_puts(tr, "** Boot snapshot taken **\n");
+
tracing_snapshot_instance(tr);
+ trace_array_puts(tr, "** Boot snapshot taken **\n");
}
+#endif
}
void __init early_trace_init(void)
--
2.39.2
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Cc: stable(a)vger.kernel.org
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0a7ea02c9f08..93740a9370c6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
--
2.39.2
SUBJECT: act_mirred: use the backlog for nested calls to mirred ingress
COMMIT: commit ca22da2fbd693b54dc8e3b7b54ccc9f7e9ba3640
Reason for request:
The commit above resolves CVE-2022-4269.
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.
Cc: stable(a)vger.kernel.org
Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 937e9676dfd4..ed1d1093f5e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1149,22 +1149,22 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
unsigned long flags;
if (in_nmi()) {
- internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
- internal_trace_puts("*** snapshot is being ignored ***\n");
+ trace_array_puts(tr, "*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
+ trace_array_puts(tr, "*** snapshot is being ignored ***\n");
return;
}
if (!tr->allocated_snapshot) {
- internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
- internal_trace_puts("*** stopping trace here! ***\n");
- tracing_off();
+ trace_array_puts(tr, "*** SNAPSHOT NOT ALLOCATED ***\n");
+ trace_array_puts(tr, "*** stopping trace here! ***\n");
+ tracer_tracing_off(tr);
return;
}
/* Note, snapshot can not be used when the tracer uses it */
if (tracer->use_max_tr) {
- internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
- internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
+ trace_array_puts(tr, "*** LATENCY TRACER ACTIVE ***\n");
+ trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
return;
}
--
2.39.2
The patch titled
Subject: mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_alloc-fix-potential-deadlock-on-zonelist_update_seq-seqlock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Subject: mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
Date: Tue, 4 Apr 2023 23:31:58 +0900
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKUR…
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10(a)syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Ilpo J��rvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: John Ogness <john.ogness(a)linutronix.de>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Patrick Daly <quic_pdaly(a)quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/mm/page_alloc.c~mm-page_alloc-fix-potential-deadlock-on-zonelist_update_seq-seqlock
+++ a/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *
int nid;
int __maybe_unused cpu;
pg_data_t *self = data;
+ unsigned long flags;
+ /*
+ * Explicitly disable this CPU's interrupts before taking seqlock
+ * to prevent any IRQ handler from calling into the page allocator
+ * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ */
+ local_irq_save(flags);
+ /*
+ * Explicitly disable this CPU's synchronous printk() before taking
+ * seqlock to prevent any printk() from trying to hold port->lock, for
+ * tty_insert_flip_string_and_push_buffer() on other CPU might be
+ * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+ */
+ printk_deferred_enter();
write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *
}
write_sequnlock(&zonelist_update_seq);
+ printk_deferred_exit();
+ local_irq_restore(flags);
}
static noinline void __init
_
Patches currently in -mm which might be from penguin-kernel(a)I-love.SAKURA.ne.jp are
nilfs2-initialize-struct-nilfs_binfo_dat-bi_pad-field.patch
mm-page_alloc-fix-potential-deadlock-on-zonelist_update_seq-seqlock.patch
The patch titled
Subject: mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Subject: mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Date: Tue, 4 Apr 2023 23:47:16 +0800
The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption. The only place we have found where this
happens is in the swapoff path. This case can be described as below:
core 0 core 1
swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock
and re-add si into
swap_avail_head
acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.
It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the above scene. In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap. This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)
------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
plist_check_prev_next_node+0x50/0x70
plist_check_head+0x80/0xf0
plist_add+0x28/0x140
add_to_avail_list+0x9c/0xf0
_enable_swap_info+0x78/0xb4
__do_sys_swapon+0x918/0xa10
__arm64_sys_swapon+0x20/0x30
el0_svc_common+0x8c/0x220
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180
irq event stamp: 2082270
Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.
This problem exists in versions after stable 5.10.y.
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba…
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293(a)alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages
+++ a/mm/swapfile.c
@@ -679,6 +679,7 @@ static void __del_from_avail_list(struct
{
int nid;
+ assert_spin_locked(&p->lock);
for_each_node(nid)
plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
}
@@ -2434,8 +2435,8 @@ SYSCALL_DEFINE1(swapoff, const char __us
spin_unlock(&swap_lock);
goto out_dput;
}
- del_from_avail_list(p);
spin_lock(&p->lock);
+ del_from_avail_list(p);
if (p->prio < 0) {
struct swap_info_struct *si = p;
int nid;
_
Patches currently in -mm which might be from rongwei.wang(a)linux.alibaba.com are
mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages.patch
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x f7b58a69fad9d2c4c90cab0247811155dd0d48e7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040332-dodgy-chamomile-54f5@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
f7b58a69fad9 ("dm: fix improper splitting for abnormal bios")
86a3238c7b9b ("dm: change "unsigned" to "unsigned int"")
7533afa1d27b ("dm: send just one event on resize, not two")
5cd6d1d53a1f ("dm integrity: Remove bi_sector that's only used by commented debug code")
22c40e134c4c ("dm cache: Add some documentation to dm-cache-background-tracker.h")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7b58a69fad9d2c4c90cab0247811155dd0d48e7 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)kernel.org>
Date: Thu, 30 Mar 2023 14:56:38 -0400
Subject: [PATCH] dm: fix improper splitting for abnormal bios
"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().
It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before this fix:
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
After this fix;
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
Fixes: 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
Cc: stable(a)vger.kernel.org
Reported-by: Orange Kao <orange(a)aiven.io>
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2d0f934ba6e6..e67a2757c53e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1467,7 +1467,8 @@ static void setup_split_accounting(struct clone_info *ci, unsigned int len)
}
static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
- struct dm_target *ti, unsigned int num_bios)
+ struct dm_target *ti, unsigned int num_bios,
+ unsigned *len)
{
struct bio *bio;
int try;
@@ -1478,7 +1479,7 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
if (try)
mutex_lock(&ci->io->md->table_devices_lock);
for (bio_nr = 0; bio_nr < num_bios; bio_nr++) {
- bio = alloc_tio(ci, ti, bio_nr, NULL,
+ bio = alloc_tio(ci, ti, bio_nr, len,
try ? GFP_NOIO : GFP_NOWAIT);
if (!bio)
break;
@@ -1514,7 +1515,7 @@ static int __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
break;
default:
/* dm_accept_partial_bio() is not supported with shared tio->len_ptr */
- alloc_multiple_bios(&blist, ci, ti, num_bios);
+ alloc_multiple_bios(&blist, ci, ti, num_bios, len);
while ((clone = bio_list_pop(&blist))) {
dm_tio_set_flag(clone_to_tio(clone), DM_TIO_IS_DUPLICATE_BIO);
__map_bio(clone);
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Reiserfs sets a security xattr at inode creation time in two stages: first,
it calls reiserfs_security_init() to obtain the xattr from active LSMs;
then, it calls reiserfs_security_write() to actually write that xattr.
Unfortunately, it seems there is a wrong expectation that LSMs provide the
full xattr name in the form 'security.<suffix>'. However, LSMs always
provided just the suffix, causing reiserfs to not write the xattr at all
(if the suffix is shorter than the prefix), or to write an xattr with the
wrong name.
Add a temporary buffer in reiserfs_security_write(), and write to it the
full xattr name, before passing it to reiserfs_xattr_set_handle().
Also replace the name length check with a check that the full xattr name is
not larger than XATTR_NAME_MAX.
Cc: stable(a)vger.kernel.org # v2.6.x
Fixes: 57fe60df6241 ("reiserfs: add atomic addition of selinux attributes during inode creation")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
fs/reiserfs/xattr_security.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/reiserfs/xattr_security.c b/fs/reiserfs/xattr_security.c
index 6bffdf9a4fd..6e0a099dd78 100644
--- a/fs/reiserfs/xattr_security.c
+++ b/fs/reiserfs/xattr_security.c
@@ -95,11 +95,15 @@ int reiserfs_security_write(struct reiserfs_transaction_handle *th,
struct inode *inode,
struct reiserfs_security_handle *sec)
{
+ char xattr_name[XATTR_NAME_MAX + 1] = XATTR_SECURITY_PREFIX;
int error;
- if (strlen(sec->name) < sizeof(XATTR_SECURITY_PREFIX))
+
+ if (XATTR_SECURITY_PREFIX_LEN + strlen(sec->name) > XATTR_NAME_MAX)
return -EINVAL;
- error = reiserfs_xattr_set_handle(th, inode, sec->name, sec->value,
+ strlcat(xattr_name, sec->name, sizeof(xattr_name));
+
+ error = reiserfs_xattr_set_handle(th, inode, xattr_name, sec->value,
sec->length, XATTR_CREATE);
if (error == -ENODATA || error == -EOPNOTSUPP)
error = 0;
--
2.25.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f7b58a69fad9d2c4c90cab0247811155dd0d48e7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040334-attire-drone-2c8b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f7b58a69fad9 ("dm: fix improper splitting for abnormal bios")
86a3238c7b9b ("dm: change "unsigned" to "unsigned int"")
7533afa1d27b ("dm: send just one event on resize, not two")
5cd6d1d53a1f ("dm integrity: Remove bi_sector that's only used by commented debug code")
22c40e134c4c ("dm cache: Add some documentation to dm-cache-background-tracker.h")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7b58a69fad9d2c4c90cab0247811155dd0d48e7 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)kernel.org>
Date: Thu, 30 Mar 2023 14:56:38 -0400
Subject: [PATCH] dm: fix improper splitting for abnormal bios
"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().
It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before this fix:
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
After this fix;
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
Fixes: 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
Cc: stable(a)vger.kernel.org
Reported-by: Orange Kao <orange(a)aiven.io>
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2d0f934ba6e6..e67a2757c53e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1467,7 +1467,8 @@ static void setup_split_accounting(struct clone_info *ci, unsigned int len)
}
static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
- struct dm_target *ti, unsigned int num_bios)
+ struct dm_target *ti, unsigned int num_bios,
+ unsigned *len)
{
struct bio *bio;
int try;
@@ -1478,7 +1479,7 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
if (try)
mutex_lock(&ci->io->md->table_devices_lock);
for (bio_nr = 0; bio_nr < num_bios; bio_nr++) {
- bio = alloc_tio(ci, ti, bio_nr, NULL,
+ bio = alloc_tio(ci, ti, bio_nr, len,
try ? GFP_NOIO : GFP_NOWAIT);
if (!bio)
break;
@@ -1514,7 +1515,7 @@ static int __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
break;
default:
/* dm_accept_partial_bio() is not supported with shared tio->len_ptr */
- alloc_multiple_bios(&blist, ci, ti, num_bios);
+ alloc_multiple_bios(&blist, ci, ti, num_bios, len);
while ((clone = bio_list_pop(&blist))) {
dm_tio_set_flag(clone_to_tio(clone), DM_TIO_IS_DUPLICATE_BIO);
__map_bio(clone);
Currently, with VHE, KVM enables the EL0 event counting for the
guest on vcpu_load() or KVM enables it as a part of the PMU
register emulation process, when needed. However, in the migration
case (with VHE), the same handling is lacking, as vPMU register
values that were restored by userspace haven't been propagated yet
(the PMU events haven't been created) at the vcpu load-time on the
first KVM_RUN (kvm_vcpu_pmu_restore_guest() called from vcpu_load()
on the first KVM_RUN won't do anything as events_{guest,host} of
kvm_pmu_events are still zero).
So, with VHE, enable the guest's EL0 event counting on the first
KVM_RUN (after the migration) when needed. More specifically,
have kvm_pmu_handle_pmcr() call kvm_vcpu_pmu_restore_guest()
so that kvm_pmu_handle_pmcr() on the first KVM_RUN can take
care of it.
Fixes: d0c94c49792c ("KVM: arm64: Restore PMU configuration on first run")
Cc: stable(a)vger.kernel.org
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Reiji Watanabe <reijiw(a)google.com>
---
v2:
- Added more explanation to the commit message [Marc]
- Added Marc's r-b tag (Thank you!)
v1: https://lore.kernel.org/all/20230328034725.2051499-1-reijiw@google.com/
---
arch/arm64/kvm/pmu-emul.c | 1 +
arch/arm64/kvm/sys_regs.c | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 24908400e190..74e0d2b153b5 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -557,6 +557,7 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
for_each_set_bit(i, &mask, 32)
kvm_pmu_set_pmc_value(kvm_vcpu_idx_to_pmc(vcpu, i), 0, true);
}
+ kvm_vcpu_pmu_restore_guest(vcpu);
}
static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 53749d3a0996..425e1e9adae7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -794,7 +794,6 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (!kvm_supports_32bit_el0())
val |= ARMV8_PMU_PMCR_LC;
kvm_pmu_handle_pmcr(vcpu, val);
- kvm_vcpu_pmu_restore_guest(vcpu);
} else {
/* PMCR.P & PMCR.C are RAZ */
val = __vcpu_sys_reg(vcpu, PMCR_EL0)
base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
--
2.40.0.348.gf938b09366-goog
Hi!
The Tegra20 requires an enabled VDE power domain during startup.
As the VDE is currently not used, it is disabled during runtime.
Since [1], there is a workaround for the "normal restart path" which
enables the VDE before doing PMC's warm reboot.
This workaround is not executed in the "emergency restart path", leading
to a hang-up during start.
This series implements and registers a new pmic-based restart handler for
boards with tps6586x.
This cold reboot ensures that the VDE power domain is enabled on
tegra20-based boards.
During panic(), preemption is disabled.
This should be correctly detected by i2c_in_atomic_xfer_mode() to use
atomic i2c xfer in this late stage. This avoids warnings regarding
"Voluntary context switch within RCU".
[1] 8f0c714ad9be1ef774c98e8819a7a571451cb019
v2: https://lore.kernel.org/all/20230320220345.1463687-1-bbara93@gmail.com/
system_state: https://lore.kernel.org/all/20230320213230.1459532-1-bbara93@gmail.com/
v1: https://lore.kernel.org/all/20230316164703.1157813-1-bbara93@gmail.com/
v3:
- bring system_state back in this series
- do atomic i2c xfer if not preemptible (as suggested by Dmitry)
- fix style issues mentioned by Dmitry
- add cc stable as suggested by Dmitry
- add explanation why this is needed for Jon
v2:
- use devm-based restart handler
- convert the existing power_off handler to a devm-based handler
- handle system_state in extra series
---
Benjamin Bara (4):
kernel/reboot: emergency_restart: set correct system_state
i2c: core: run atomic i2c xfer when !preemptible
mfd: tps6586x: use devm-based power off handler
mfd: tps6586x: register restart handler
drivers/i2c/i2c-core.h | 2 +-
drivers/mfd/tps6586x.c | 43 +++++++++++++++++++++++++++++++++++--------
kernel/reboot.c | 1 +
3 files changed, 37 insertions(+), 9 deletions(-)
---
base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
change-id: 20230327-tegra-pmic-reboot-4175ff814a4b
Best regards,
--
Benjamin Bara <benjamin.bara(a)skidata.com>
The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used
together with one of the older nvidia binary driver series (the latest
series does not support it), then backlight control does not work.
This is caused by commit 3dbc80a3e4c5 ("ACPI: video: Make backlight
class device registration a separate step (v2)") combined with
commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default").
After these changes the acpi_video# backlight device is only registered
when requested by a GPU driver calling acpi_video_register_backlight()
which the nvidia binary driver does not do.
I realize that using the nvidia binary driver is not a supported use-case
and users can workaround this by adding acpi_backlight=video on the kernel
commandline, but the ThinkPad W530 is a popular model under Linux users,
so it seems worthwhile to add a quirk for this.
I will also email Nvidia asking them to make the driver call
acpi_video_register_backlight() when an internal LCD panel is detected.
So maybe the next maintenance release of the drivers will fix this...
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/video_detect.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
index 295744fe7c92..e85729fc481f 100644
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -299,6 +299,20 @@ static const struct dmi_system_id video_detect_dmi_table[] = {
},
},
+ /*
+ * Older models with nvidia GPU which need acpi_video backlight
+ * control and where the old nvidia binary driver series does not
+ * call acpi_video_register_backlight().
+ */
+ {
+ .callback = video_detect_force_video,
+ /* ThinkPad W530 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad W530"),
+ },
+ },
+
/*
* These models have a working acpi_video backlight control, and using
* native backlight causes a regression where backlight does not work
--
2.39.1
On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC")
the connection between the GPU and the panel is seen by the GPU driver as
regular DP instead of eDP, causing the GPU driver to never call
acpi_video_register_backlight().
(GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.)
Fix the missing acpi_video# backlight device on these all-in-ones by
adding a acpi_backlight=video DMI quirk, so that video.ko will
immediately register the backlight device instead of waiting for
an acpi_video_register_backlight() call.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/video_detect.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
index f7c218dd8742..295744fe7c92 100644
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -276,6 +276,29 @@ static const struct dmi_system_id video_detect_dmi_table[] = {
},
},
+ /*
+ * Models which need acpi_video backlight control where the GPU drivers
+ * do not call acpi_video_register_backlight() because no internal panel
+ * is detected. Typically these are all-in-ones (monitors with builtin
+ * PC) where the panel connection shows up as regular DP instead of eDP.
+ */
+ {
+ .callback = video_detect_force_video,
+ /* Apple iMac14,1 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "iMac14,1"),
+ },
+ },
+ {
+ .callback = video_detect_force_video,
+ /* Apple iMac14,2 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "iMac14,2"),
+ },
+ },
+
/*
* These models have a working acpi_video backlight control, and using
* native backlight causes a regression where backlight does not work
--
2.39.1
Commit 3dbc80a3e4c5 ("ACPI: video: Make backlight class device
registration a separate step (v2)") combined with
commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default")
Means that the video.ko code now fully depends on the GPU driver calling
acpi_video_register_backlight() for the acpi_video# backlight class
devices to get registered.
This means that if the GPU driver does not do this, acpi_backlight=video
on the cmdline, or DMI quirks for selecting acpi_video# will not work.
This is a problem on for example Apple iMac14,1 all-in-ones where
the monitor's LCD panel shows up as a regular DP connection instead of
eDP so the GPU driver will not call acpi_video_register_backlight() [1].
Fix this by making video.ko directly register the acpi_video# devices
when these have been explicitly requested either on the cmdline or
through DMI quirks (rather then auto-detection being used).
[1] GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/acpi_video.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
index 97b711e57bff..c7a6d0b69dab 100644
--- a/drivers/acpi/acpi_video.c
+++ b/drivers/acpi/acpi_video.c
@@ -1984,6 +1984,7 @@ static int instance;
static int acpi_video_bus_add(struct acpi_device *device)
{
struct acpi_video_bus *video;
+ bool auto_detect;
int error;
acpi_status status;
@@ -2045,10 +2046,20 @@ static int acpi_video_bus_add(struct acpi_device *device)
mutex_unlock(&video_list_lock);
/*
- * The userspace visible backlight_device gets registered separately
- * from acpi_video_register_backlight().
+ * If backlight-type auto-detection is used then a native backlight may
+ * show up later and this may change the result from video to native.
+ * Therefor normally the userspace visible /sys/class/backlight device
+ * gets registered separately by the GPU driver calling
+ * acpi_video_register_backlight() when an internal panel is detected.
+ * Register the backlight now when not using auto-detection, so that
+ * when the kernel cmdline or DMI-quirks are used the backlight will
+ * get registered even if acpi_video_register_backlight() is not called.
*/
acpi_video_run_bcl_for_osi(video);
+ if (__acpi_video_get_backlight_type(false, &auto_detect) == acpi_backlight_video &&
+ !auto_detect)
+ acpi_video_bus_register_backlight(video);
+
acpi_video_bus_add_notify_handler(video);
return 0;
--
2.39.1
Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk and set to true if auto-detection was used.
And make __acpi_video_get_backlight_type() non static so that it can
be called directly outside of video_detect.c .
While at it turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/video_detect.c | 21 ++++++++-------------
include/acpi/video.h | 15 +++++++++++++--
2 files changed, 21 insertions(+), 15 deletions(-)
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
index fd7cbce8076e..f7c218dd8742 100644
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -782,7 +782,7 @@ static bool prefer_native_over_acpi_video(void)
* Determine which type of backlight interface to use on this system,
* First check cmdline, then dmi quirks, then do autodetect.
*/
-static enum acpi_backlight_type __acpi_video_get_backlight_type(bool native)
+enum acpi_backlight_type __acpi_video_get_backlight_type(bool native, bool *auto_detect)
{
static DEFINE_MUTEX(init_mutex);
static bool nvidia_wmi_ec_present;
@@ -807,6 +807,9 @@ static enum acpi_backlight_type __acpi_video_get_backlight_type(bool native)
native_available = true;
mutex_unlock(&init_mutex);
+ if (auto_detect)
+ *auto_detect = false;
+
/*
* The below heuristics / detection steps are in order of descending
* presedence. The commandline takes presedence over anything else.
@@ -818,6 +821,9 @@ static enum acpi_backlight_type __acpi_video_get_backlight_type(bool native)
if (acpi_backlight_dmi != acpi_backlight_undef)
return acpi_backlight_dmi;
+ if (auto_detect)
+ *auto_detect = true;
+
/* Special cases such as nvidia_wmi_ec and apple gmux. */
if (nvidia_wmi_ec_present)
return acpi_backlight_nvidia_wmi_ec;
@@ -837,15 +843,4 @@ static enum acpi_backlight_type __acpi_video_get_backlight_type(bool native)
/* No ACPI video/native (old hw), use vendor specific fw methods. */
return acpi_backlight_vendor;
}
-
-enum acpi_backlight_type acpi_video_get_backlight_type(void)
-{
- return __acpi_video_get_backlight_type(false);
-}
-EXPORT_SYMBOL(acpi_video_get_backlight_type);
-
-bool acpi_video_backlight_use_native(void)
-{
- return __acpi_video_get_backlight_type(true) == acpi_backlight_native;
-}
-EXPORT_SYMBOL(acpi_video_backlight_use_native);
+EXPORT_SYMBOL(__acpi_video_get_backlight_type);
diff --git a/include/acpi/video.h b/include/acpi/video.h
index 8ed9bec03e53..ff5a8da5d883 100644
--- a/include/acpi/video.h
+++ b/include/acpi/video.h
@@ -59,8 +59,6 @@ extern void acpi_video_unregister(void);
extern void acpi_video_register_backlight(void);
extern int acpi_video_get_edid(struct acpi_device *device, int type,
int device_id, void **edid);
-extern enum acpi_backlight_type acpi_video_get_backlight_type(void);
-extern bool acpi_video_backlight_use_native(void);
/*
* Note: The value returned by acpi_video_handles_brightness_key_presses()
* may change over time and should not be cached.
@@ -69,6 +67,19 @@ extern bool acpi_video_handles_brightness_key_presses(void);
extern int acpi_video_get_levels(struct acpi_device *device,
struct acpi_video_device_brightness **dev_br,
int *pmax_level);
+
+extern enum acpi_backlight_type __acpi_video_get_backlight_type(bool native,
+ bool *auto_detect);
+
+static inline enum acpi_backlight_type acpi_video_get_backlight_type(void)
+{
+ return __acpi_video_get_backlight_type(false, NULL);
+}
+
+static inline bool acpi_video_backlight_use_native(void)
+{
+ return __acpi_video_get_backlight_type(true, NULL) == acpi_backlight_native;
+}
#else
static inline void acpi_video_report_nolcd(void) { return; };
static inline int acpi_video_register(void) { return -ENODEV; }
--
2.39.1
Assert PCI Configuration Enable bit after probe. When this bit is left to
0 in the endpoint mode, the RK3399 PCIe endpoint core will generate
configuration request retry status (CRS) messages back to the root complex.
Assert this bit after probe to allow the RK3399 PCIe endpoint core to reply
to configuration requests from the root complex.
This is documented in section 17.5.8.1.2 of the RK3399 TRM.
Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rick Wertenbroek <rick.wertenbroek(a)gmail.com>
Reviewed-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
---
drivers/pci/controller/pcie-rockchip-ep.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index 9b835377bd9e..4c84e403e155 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -623,6 +623,8 @@ static int rockchip_pcie_ep_probe(struct platform_device *pdev)
ep->irq_pci_addr = ROCKCHIP_PCIE_EP_DUMMY_IRQ_ADDR;
+ rockchip_pcie_write(rockchip, PCIE_CLIENT_CONF_ENABLE, PCIE_CLIENT_CONFIG);
+
return 0;
err_epc_mem_exit:
pci_epc_mem_exit(epc);
--
2.25.1
Commit 5829f8a897e4 ("platform/x86: ideapad-laptop: Send
KEY_TOUCHPAD_TOGGLE on some models") made ideapad-laptop send
KEY_TOUCHPAD_TOGGLE when we receive an ACPI notify with VPC event bit 5 set
and the touchpad-state has not been changed by the EC itself already.
This was done under the assumption that this would be good to do to make
the touchpad-toggle hotkey work on newer models where the EC does not
toggle the touchpad on/off itself (because it is not routed through
the PS/2 controller, but uses I2C).
But it turns out that at least some models, e.g. the Yoga 7-15ITL5 the EC
triggers an ACPI notify with VPC event bit 5 set on resume, which would
now cause a spurious KEY_TOUCHPAD_TOGGLE on resume to which the desktop
environment responds by disabling the touchpad in software, breaking
the touchpad (until manually re-enabled) on resume.
It was never confirmed that sending KEY_TOUCHPAD_TOGGLE actually improves
things on new models and at least some new models like the Yoga 7-15ITL5
don't have a touchpad on/off toggle hotkey at all, while still sending
ACPI notify events with VPC event bit 5 set.
So it seems best to revert the change to send KEY_TOUCHPAD_TOGGLE when
receiving an ACPI notify events with VPC event bit 5 and the touchpad
state as reported by the EC has not changed.
Note this is not a full revert the code to cache the last EC touchpad
state is kept to avoid sending spurious KEY_TOUCHPAD_ON / _OFF events
on resume.
Fixes: 5829f8a897e4 ("platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217234
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/platform/x86/ideapad-laptop.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c
index b5ef3452da1f..35c63cce0479 100644
--- a/drivers/platform/x86/ideapad-laptop.c
+++ b/drivers/platform/x86/ideapad-laptop.c
@@ -1170,7 +1170,6 @@ static const struct key_entry ideapad_keymap[] = {
{ KE_KEY, 65, { KEY_PROG4 } },
{ KE_KEY, 66, { KEY_TOUCHPAD_OFF } },
{ KE_KEY, 67, { KEY_TOUCHPAD_ON } },
- { KE_KEY, 68, { KEY_TOUCHPAD_TOGGLE } },
{ KE_KEY, 128, { KEY_ESC } },
/*
@@ -1526,18 +1525,16 @@ static void ideapad_sync_touchpad_state(struct ideapad_private *priv, bool send_
if (priv->features.ctrl_ps2_aux_port)
i8042_command(¶m, value ? I8042_CMD_AUX_ENABLE : I8042_CMD_AUX_DISABLE);
- if (send_events) {
- /*
- * On older models the EC controls the touchpad and toggles it
- * on/off itself, in this case we report KEY_TOUCHPAD_ON/_OFF.
- * If the EC did not toggle, report KEY_TOUCHPAD_TOGGLE.
- */
- if (value != priv->r_touchpad_val) {
- ideapad_input_report(priv, value ? 67 : 66);
- sysfs_notify(&priv->platform_device->dev.kobj, NULL, "touchpad");
- } else {
- ideapad_input_report(priv, 68);
- }
+ /*
+ * On older models the EC controls the touchpad and toggles it on/off
+ * itself, in this case we report KEY_TOUCHPAD_ON/_OFF. Some models do
+ * an acpi-notify with VPC bit 5 set on resume, so this function get
+ * called with send_events=true on every resume. Therefor if the EC did
+ * not toggle, do nothing to avoid sending spurious KEY_TOUCHPAD_TOGGLE.
+ */
+ if (send_events && value != priv->r_touchpad_val) {
+ ideapad_input_report(priv, value ? 67 : 66);
+ sysfs_notify(&priv->platform_device->dev.kobj, NULL, "touchpad");
}
priv->r_touchpad_val = value;
--
2.39.1
In the memory allocation of rp->domains in rapl_detect_domains(), there
is an additional memory of struct rapl_domain allocated to prevent the
pointer out of bounds called later.
Optimize the code here to save sizeof(struct rapl_domain) bytes of
memory.
Test in Intel NUC (i5-1135G7).
Cc: stable(a)vger.kernel.org
Signed-off-by: xiongxin <xiongxin(a)kylinos.cn>
Tested-by: xiongxin <xiongxin(a)kylinos.cn>
---
drivers/powercap/intel_rapl_common.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 8970c7b80884..f8971282498a 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -1171,13 +1171,14 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
{
struct rapl_domain *rd;
struct powercap_zone *power_zone = NULL;
- int nr_pl, ret;
+ int nr_pl, ret, i;
/* Update the domain data of the new package */
rapl_update_domain_data(rp);
/* first we register package domain as the parent zone */
- for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
+ for (i = 0; i < rp->nr_domains; i++) {
+ rd = &rp->domains[i];
if (rd->id == RAPL_DOMAIN_PACKAGE) {
nr_pl = find_nr_power_limit(rd);
pr_debug("register package domain %s\n", rp->name);
@@ -1201,8 +1202,9 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
return -ENODEV;
}
/* now register domains as children of the socket/package */
- for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
+ for (i = 0; i < rp->nr_domains; i++) {
struct powercap_zone *parent = rp->power_zone;
+ rd = &rp->domains[i];
if (rd->id == RAPL_DOMAIN_PACKAGE)
continue;
@@ -1302,7 +1304,6 @@ static void rapl_detect_powerlimit(struct rapl_domain *rd)
*/
static int rapl_detect_domains(struct rapl_package *rp, int cpu)
{
- struct rapl_domain *rd;
int i;
for (i = 0; i < RAPL_DOMAIN_MAX; i++) {
@@ -1319,15 +1320,15 @@ static int rapl_detect_domains(struct rapl_package *rp, int cpu)
}
pr_debug("found %d domains on %s\n", rp->nr_domains, rp->name);
- rp->domains = kcalloc(rp->nr_domains + 1, sizeof(struct rapl_domain),
+ rp->domains = kcalloc(rp->nr_domains, sizeof(struct rapl_domain),
GFP_KERNEL);
if (!rp->domains)
return -ENOMEM;
rapl_init_domains(rp);
- for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++)
- rapl_detect_powerlimit(rd);
+ for (i = 0; i < rp->nr_domains; i++)
+ rapl_detect_powerlimit(&rp->domains[i]);
return 0;
}
--
2.34.1
Dzień dobry,
jakiś czas temu zgłosiła się do nas firma, której strona internetowa nie pozycjonowała się wysoko w wyszukiwarce Google.
Na podstawie wykonanego przez nas audytu SEO zoptymalizowaliśmy treści na stronie pod kątem wcześniej opracowanych słów kluczowych. Nasz wewnętrzny system codziennie analizuje prawidłowe działanie witryny. Dzięki indywidualnej strategii, firma zdobywa coraz więcej Klientów.
Czy chcieliby Państwo zwiększyć liczbę osób odwiedzających stronę internetową firmy? Mógłbym przedstawić ofertę?
Pozdrawiam
Przemysław Polak
This series fixes two bugs in the RTW88 USB driver I was reported from
several people and that I also encountered myself.
The first one resulted in "timed out to flush queue 3" messages from the
driver and sometimes a complete stall of the TX queues.
The second one is specific to the RTW8821CU chipset. Here 2GHz networks
were hardly seen and impossible to connect to. This goes down to
misinterpreting the rfe_option field in the efuse.
Sascha Hauer (2):
wifi: rtw88: usb: fix priority queue to endpoint mapping
wifi: rtw88: rtw8821c: Fix rfe_option field width
drivers/net/wireless/realtek/rtw88/rtw8821c.c | 3 +-
drivers/net/wireless/realtek/rtw88/usb.c | 70 +++++++++++++------
2 files changed, 48 insertions(+), 25 deletions(-)
--
2.39.2
From: Hui Li <caelli(a)tencent.com>
We have met a hang on pty device, the reader was blocking
at epoll on master side, the writer was sleeping at wait_woken
inside n_tty_write on slave side, and the write buffer on
tty_port was full, we found that the reader and writer would
never be woken again and blocked forever.
The problem was caused by a race between reader and kworker:
n_tty_read(reader): n_tty_receive_buf_common(kworker):
copy_from_read_buf()|
|room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
|room <= 0
n_tty_kick_worker() |
|ldata->no_room = true
After writing to slave device, writer wakes up kworker to flush
data on tty_port to reader, and the kworker finds that reader
has no room to store data so room <= 0 is met. At this moment,
reader consumes all the data on reader buffer and calls
n_tty_kick_worker to check ldata->no_room which is false and
reader quits reading. Then kworker sets ldata->no_room=true
and quits too.
If write buffer is not full, writer will wake kworker to flush data
again after following writes, but if write buffer is full and writer
goes to sleep, kworker will never be woken again and tty device is
blocked.
This problem can be solved with a check for read buffer size inside
n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
is true, a call to n_tty_kick_worker is necessary to keep flushing
data to reader.
Cc: <stable(a)vger.kernel.org>
Fixes: 42458f41d08f ("n_tty: Ensure reader restarts worker for next reader")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Hui Li <caelli(a)tencent.com>
---
Patch changelogs between v1 and v2:
-add barrier inside n_tty_read and n_tty_receive_buf_common;
-comment why barrier is needed;
-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
-in function n_tty_receive_buf_common, add unlikely to check
ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
is removed here to get locality;
-change comment for barrier to show the race condition to make
comment easier to understand;
Patch changelogs between v3 and v4:
-change subject from 'tty: fix a possible hang on tty device' to
'tty: fix hang on tty device with no_room set' to make subject
more obvious;
Patch changelogs between v4 and v5:
-name is changed from cael to caelli, li is added as the family
name and caelli is the fullname.
Patch changelogs between v5 and v6:
-change from and Signed-off-by, from 'caelli <juanfengpy(a)gmail.com>'
to 'caelli <caelli(a)tencent.com>', later one is my corporate address.
Patch changelogs between v6 and v7:
-change name from caelli to 'Hui Li', which is my name in chinese.
-the comment for barrier is improved, and a Fixes and Reviewed-by
tags is added.
drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
1 file changed, 37 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c8f56c9b1a1c..8c17304fffcf 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -204,8 +204,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
struct n_tty_data *ldata = tty->disc_data;
/* Did the input worker stop? Restart it */
- if (unlikely(ldata->no_room)) {
- ldata->no_room = 0;
+ if (unlikely(READ_ONCE(ldata->no_room))) {
+ WRITE_ONCE(ldata->no_room, 0);
WARN_RATELIMIT(tty->port->itty == NULL,
"scheduling with invalid itty\n");
@@ -1698,7 +1698,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
if (overflow && room < 0)
ldata->read_head--;
room = overflow;
- ldata->no_room = flow && !room;
+ WRITE_ONCE(ldata->no_room, flow && !room);
} else
overflow = 0;
@@ -1729,6 +1729,27 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
} else
n_tty_check_throttle(tty);
+ if (unlikely(ldata->no_room)) {
+ /*
+ * Barrier here is to ensure to read the latest read_tail in
+ * chars_in_buffer() and to make sure that read_tail is not loaded
+ * before ldata->no_room is set, otherwise, following race may occur:
+ * n_tty_receive_buf_common()
+ * n_tty_read()
+ * if (!chars_in_buffer(tty))->false
+ * copy_from_read_buf()
+ * read_tail=commit_head
+ * n_tty_kick_worker()
+ * if (ldata->no_room)->false
+ * ldata->no_room = 1
+ * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+ * smp_mb is paired with smp_mb() in n_tty_read().
+ */
+ smp_mb();
+ if (!chars_in_buffer(tty))
+ n_tty_kick_worker(tty);
+ }
+
up_read(&tty->termios_rwsem);
return rcvd;
@@ -2282,8 +2303,25 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
if (time)
timeout = time;
}
- if (old_tail != ldata->read_tail)
+ if (old_tail != ldata->read_tail) {
+ /*
+ * Make sure no_room is not read in n_tty_kick_worker()
+ * before setting ldata->read_tail in copy_from_read_buf(),
+ * otherwise, following race may occur:
+ * n_tty_read()
+ * n_tty_receive_buf_common()
+ * n_tty_kick_worker()
+ * if(ldata->no_room)->false
+ * ldata->no_room = 1
+ * if (!chars_in_buffer(tty))->false
+ * copy_from_read_buf()
+ * read_tail=commit_head
+ * Both reader and kworker will fail to kick tty_buffer_restart_work(),
+ * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+ */
+ smp_mb();
n_tty_kick_worker(tty);
+ }
up_read(&tty->termios_rwsem);
remove_wait_queue(&tty->read_wait, &wait);
--
2.27.0
Hi stable maintainers,
This is a backport patch for commit df205b5c6328 ("KVM: arm64: Filter
out invalid core register IDs in KVM_GET_REG_LIST") to 4.14 and 4.19.
This commit was not applied to the 4.14-stable tree due to merge
conflict [1]. To backport this, commit be25bbb392fa ("KVM: arm64: Factor
out core register ID enumeration") that has no functional changes is
cherry-picked.
I'd appreciate if if you could consider backporting this to 4.14 and
4.19.
Best regards,
Takahiro
[1] https://lore.kernel.org/all/1560343489-22906-1-git-send-email-Dave.Martin@a…
Dave Martin (2):
KVM: arm64: Factor out core register ID enumeration
KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
arch/arm64/kvm/guest.c | 79 ++++++++++++++++++++++++++++++++++--------
1 file changed, 65 insertions(+), 14 deletions(-)
--
2.39.2
On Mon, Apr 03, 2023 at 09:11:43PM +0800, cuigaosheng wrote:
> On 2023/4/3 20:53, Greg KH wrote:
> > On Mon, Apr 03, 2023 at 08:17:04PM +0800, Gaosheng Cui wrote:
> > > This reverts commit c7a218cbf67fffcd99b76ae3b5e9c2e8bef17c8c.
> > >
> > > The memory of ctx is allocated by devm_kzalloc in cal_ctx_create,
> > > it should not be freed by kfree when cal_ctx_v4l2_init() fails,
> > > otherwise kfree() will cause double free, so revert this patch.
> > >
> > > Fixes: c7a218cbf67f ("media: ti: cal: fix possible memory leak in cal_ctx_create()")
> > > Signed-off-by: Gaosheng Cui <cuigaosheng1(a)huawei.com>
> > > ---
> > > drivers/media/platform/ti-vpe/cal.c | 4 +---
> > > 1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/media/platform/ti-vpe/cal.c b/drivers/media/platform/ti-vpe/cal.c
> > > index 93121c90d76a..2eef245c31a1 100644
> > > --- a/drivers/media/platform/ti-vpe/cal.c
> > > +++ b/drivers/media/platform/ti-vpe/cal.c
> > > @@ -624,10 +624,8 @@ static struct cal_ctx *cal_ctx_create(struct cal_dev *cal, int inst)
> > > ctx->cport = inst;
> > > ret = cal_ctx_v4l2_init(ctx);
> > > - if (ret) {
> > > - kfree(ctx);
> > > + if (ret)
> > > return NULL;
> > > - }
> > > return ctx;
> > > }
> > > --
> > > 2.25.1
> > >
> > Why is this not needed to be reverted in Linus's tree first?
> >
> > thanks,
> >
> > greg k-h
> > .
>
> Thanks for taking time to review this patch.
>
> The memory of ctx is allocated by kzalloc since commit 9e67f24e4d90
> ("media: ti-vpe: cal: fix ctx uninitialization"), so the fixes tag
> of patch c7a218cbf67fis not entirely accurate, mainline should merge this
> patch, but it should not be merged into 5.10, my apologies for
> notdiscovering this bug earlier. Gaosheng.
>
Great, can you please put all of this information in the changelog
explaining why this is only needed for this one branch and resend it?
thanks,
greg k-h