Am 23.05.24 um 15:13 schrieb Barry Kauler:
> On Wed, May 22, 2024 at 12:58 AM Armin Wolf <W_Armin(a)gmx.de> wrote:
>> Am 20.05.24 um 18:22 schrieb Alex Deucher:
>>
>>> On Sat, May 18, 2024 at 8:17 PM Armin Wolf <W_Armin(a)gmx.de> wrote:
>>>> Am 17.05.24 um 03:30 schrieb Barry Kauler:
>>>>
>>>>> Armin, Yifan, Prike,
>>>>> I will top-post, so you don't have to scroll down.
>>>>> After identifying the commit that causes black screen with my gpu, I
>>>>> posted the result to you guys, on May 9.
>>>>> It is now May 17 and no reply.
>>>>> OK, I have now created a patch that reverts Yifan's commit, compiled
>>>>> 5.15.158, and my gpu now works.
>>>>> Note, the radeon module is not loaded, so it is not a factor.
>>>>> I'm not a kernel developer. I have identified the culprit and it is up
>>>>> to you guys to fix it, Yifan especially, as you are the person who has
>>>>> created the regression.
>>>>> I will attach my patch.
>>>>> Regards,
>>>>> Barry Kauler
>>>> Hi,
>>>>
>>>> sorry for not responding to your findings. I normally do not work with GPU drivers,
>>>> so i hoped one of the amdgpu developers would handle this.
>>>>
>>>> I CCeddri-devel(a)lists.freedesktop.org and amd-gfx(a)lists.freedesktop.org so that other
>>>> amdgpu developers hear from this issue.
>>>>
>>>> Thanks you for you persistence in finding the offending commit.
>>> Likely this patch should not have been ported to 5.15 in the first
>>> place. The IOMMU requirements have been dropped from the driver for
>>> the last few kernel versions so it is no longer relevant on newer
>>> kernels.
>>>
>>> Alex
>> Barry, can you verify that the latest upstream kernel works on you device?
>> If yes, then the commit itself is ok and just the backporting itself was wrong.
>>
>> Thanks,
>> Armin Wolf
> Armin,
> The unmodified 6.8.1 kernel works ok.
> I presume that patch was applied long before 6.8.1 got released and
> only got backported to 5.15.x recently.
>
> Regards,
> Barry
>
Great to hear, that means we only have to revert commit 56b522f46681 ("drm/amdgpu: init iommu after amdkfd device init")
from the 5.15.y series.
I CCed the stable mailing list so that they can revert the offending commit.
Thanks,
Armin Wolf
>>>> Armin Wolf
>>>>
>>>>> On Thu, May 9, 2024 at 4:08 PM Barry Kauler <bkauler(a)gmail.com> wrote:
>>>>>> On Fri, May 3, 2024 at 9:03 PM Armin Wolf <W_Armin(a)gmx.de> wrote:
>>>>>>>> ...
>>>>>>>> # lspci | grep VGA
>>>>>>>> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>>>>>>>> [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
>>>>>>>> Series] (rev c2)
>>>>>>>> 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
>>>>>>>> [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
>>>>>>>>
>>>>>>>> # lspci -n -k
>>>>>>>> ...
>>>>>>>> 05:00.0 0300: 1002:15d8 (rev c2)
>>>>>>>> Subsystem: 1025:1456
>>>>>>>> Kernel driver in use: amdgpu
>>>>>>>> Kernel modules: amdgpu
>>>>>>>> ...
>>>>>>> thanks for informing us of this regression. Since there are four commits affecting
>>>>>>> amdgpu in 5.15.150, i suggest that you use "git bisect" to find the faulty commits,
>>>>>>> see https://docs.kernel.org/admin-guide/bug-bisect.html for details.
>>>>>>>
>>>>>>> I think you can speed up the bisecting process by limiting yourself to the AMD DRM
>>>>>>> driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the
>>>>>>> man page of "git bisect" for details.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Armin Wolf
>>>>>> Armin,
>>>>>> Thanks for the advice. I am unfamiliar with git on the commandline.
>>>>>> Previously only used SmartGit gui.
>>>>>> EasyOS requires aufs patch, and for a few days tried to figure out how
>>>>>> to use that with git bisect, then gave up. Changed to testing with my
>>>>>> "QV" distro, which is more conventional, doesn't need any kernel
>>>>>> patches. Managed to get it down to one commit. Here are the steps I
>>>>>> followed:
>>>>>>
>>>>>> # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>>>>>> # cd linux-stable
>>>>>> # git tag -l | grep '5\.15\.150'
>>>>>> v5.15.150
>>>>>> # git checkout -b my5.15.150 v5.15.150
>>>>>> Updating files: 100% (65776/65776), done.
>>>>>> Switched to a new branch 'my5.15.150'
>>>>>>
>>>>>> Copied in my .config then...
>>>>>>
>>>>>> # make menuconfig
>>>>>> # git bisect start -- drivers/gpu/drm/amd
>>>>>> # git bisect bad
>>>>>> # git bisect good v5.15.149
>>>>>> Bisecting: 1 revision left to test after this (roughly 1 step)
>>>>>> [b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for
>>>>>> s3 suspend abort case
>>>>>> # make
>>>>>> # rm -rf /boot2
>>>>>> # mkdir -p /boot2/lib/modules
>>>>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
>>>>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
>>>>>> # sync
>>>>>> ...QV on Acer laptop, with amdgpu, works!
>>>>>> # git bisect good
>>>>>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>>>>>> [56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu
>>>>>> after amdkfd device init
>>>>>> # make
>>>>>> # mkdir -p /boot2/lib/modules
>>>>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
>>>>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
>>>>>> # sync
>>>>>> ...QV on Acer laptop, black screen!
>>>>>>
>>>>>> # git bisect bad
>>>>>> 56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
>>>>>> commit 56b522f4668167096a50c39446d6263c96219f5f
>>>>>> Author: Yifan Zhang <yifan1.zhang(a)amd.com>
>>>>>> Date: Tue Sep 28 15:42:35 2021 +0800
>>>>>>
>>>>>> drm/amdgpu: init iommu after amdkfd device init
>>>>>>
>>>>>> [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]
>>>>>>
>>>>>> This patch is to fix clinfo failure in Raven/Picasso:
>>>>>>
>>>>>> Number of platforms: 1
>>>>>> Platform Profile: FULL_PROFILE
>>>>>> Platform Version: OpenCL 2.2 AMD-APP (3364.0)
>>>>>> Platform Name: AMD Accelerated Parallel Processing
>>>>>> Platform Vendor: Advanced Micro Devices, Inc.
>>>>>> Platform Extensions: cl_khr_icd cl_amd_event_callback
>>>>>>
>>>>>> Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
>>>>>>
>>>>>> Signed-off-by: Yifan Zhang <yifan1.zhang(a)amd.com>
>>>>>> Reviewed-by: James Zhu <James.Zhu(a)amd.com>
>>>>>> Tested-by: James Zhu <James.Zhu(a)amd.com>
>>>>>> Acked-by: Felix Kuehling <Felix.Kuehling(a)amd.com>
>>>>>> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
>>>>>> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>>>>>>
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>>>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> Anything else I should do, to identify what in this commit is the
>>>>>> likely culprit?
>>>>>> Regards,
>>>>>> Barry Kauler
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a63c357b9fd56ad5fe64616f5b22835252c6a76a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024012226-unmanned-marshy-5819@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a63c357b9fd5 ("iommu/dma: Trace bounce buffer usage when mapping buffers")
f316ba0a8814 ("dma-iommu: Check that swiotlb is active before trying to use it")
a17e3026bc4d ("iommu: Move flush queue data into iommu_dma_cookie")
f7f07484542f ("iommu/iova: Move flush queue code to iommu-dma")
ea4d71bb5e3f ("iommu/iova: Consolidate flush queue code")
87f60cc65d24 ("iommu/vt-d: Use put_pages_list")
649ad9835a37 ("iommu/iova: Squash flush_cb abstraction")
d5c383f2c98a ("iommu/iova: Squash entry_dtor abstraction")
d7061627d701 ("iommu/iova: Fix race between FQ timeout and teardown")
2e727bffbe93 ("iommu/dma: Check CONFIG_SWIOTLB more broadly")
9b49bbc2c4df ("iommu/dma: Fold _swiotlb helpers into callers")
ee9d4097cc14 ("iommu/dma: Skip extra sync during unmap w/swiotlb")
06e620345d54 ("iommu/dma: Fix arch_sync_dma for map")
08ae5d4a1ae9 ("iommu/dma: Fix sync_sg with swiotlb")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a63c357b9fd56ad5fe64616f5b22835252c6a76a Mon Sep 17 00:00:00 2001
From: "Isaac J. Manjarres" <isaacmanjarres(a)google.com>
Date: Fri, 8 Dec 2023 15:41:40 -0800
Subject: [PATCH] iommu/dma: Trace bounce buffer usage when mapping buffers
When commit 82612d66d51d ("iommu: Allow the dma-iommu api to
use bounce buffers") was introduced, it did not add the logic
for tracing the bounce buffer usage from iommu_dma_map_page().
All of the users of swiotlb_tbl_map_single() trace their bounce
buffer usage, except iommu_dma_map_page(). This makes it difficult
to track SWIOTLB usage from that function. Thus, trace bounce buffer
usage from iommu_dma_map_page().
Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Cc: stable(a)vger.kernel.org # v5.15+
Cc: Tom Murphy <murphyt7(a)tcd.ie>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Saravana Kannan <saravanak(a)google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
Link: https://lore.kernel.org/r/20231208234141.2356157-1-isaacmanjarres@google.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 85163a83df2f..037fcf826407 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -29,6 +29,7 @@
#include <linux/spinlock.h>
#include <linux/swiotlb.h>
#include <linux/vmalloc.h>
+#include <trace/events/swiotlb.h>
#include "dma-iommu.h"
@@ -1156,6 +1157,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;
}
+ trace_swiotlb_bounced(dev, phys, size);
+
aligned_size = iova_align(iovad, size);
phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
iova_mask(iovad), dir, attrs);
From: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
commit 0774d19038c496f0c3602fb505c43e1b2d8eed85 upstream.
If an input device declares too many capability bits then modalias
string for such device may become too long and not fit into uevent
buffer, resulting in failure of sending said uevent. This, in turn,
may prevent userspace from recognizing existence of such devices.
This is typically not a concern for real hardware devices as they have
limited number of keys, but happen with synthetic devices such as
ones created by xen-kbdfront driver, which creates devices as being
capable of delivering all possible keys, since it doesn't know what
keys the backend may produce.
To deal with such devices input core will attempt to trim key data,
in the hope that the rest of modalias string will fit in the given
buffer. When trimming key data it will indicate that it is not
complete by placing "+," sign, resulting in conversions like this:
old: k71,72,73,74,78,7A,7B,7C,7D,8E,9E,A4,AD,E0,E1,E4,F8,174,
new: k71,72,73,74,78,7A,7B,7C,+,
This should allow existing udev rules continue to work with existing
devices, and will also allow writing more complex rules that would
recognize trimmed modalias and check input device characteristics by
other means (for example by parsing KEY= data in uevent or parsing
input device sysfs attributes).
Note that the driver core may try adding more uevent environment
variables once input core is done adding its own, so when forming
modalias we can not use the entire available buffer, so we reduce
it by somewhat an arbitrary amount (96 bytes).
Reported-by: Jason Andryuk <jandryuk(a)gmail.com>
Reviewed-by: Peter Hutterer <peter.hutterer(a)who-t.net>
Tested-by: Jason Andryuk <jandryuk(a)gmail.com>
Link: https://lore.kernel.org/r/ZjAWMQCJdrxZkvkB@google.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
[ Apply to linux-6.1.y ]
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
Patch did not automatically apply to 6.1.y because
input_print_modalias_parts() does not have const on *id.
Tested on 6.1. Seems to also apply and build on 5.4 and 4.19.
drivers/input/input.c | 104 ++++++++++++++++++++++++++++++++++++------
1 file changed, 89 insertions(+), 15 deletions(-)
diff --git a/drivers/input/input.c b/drivers/input/input.c
index 8b6a922f8470..eb2bb8cbec3c 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -1374,19 +1374,19 @@ static int input_print_modalias_bits(char *buf, int size,
char name, unsigned long *bm,
unsigned int min_bit, unsigned int max_bit)
{
- int len = 0, i;
+ int bit = min_bit;
+ int len = 0;
len += snprintf(buf, max(size, 0), "%c", name);
- for (i = min_bit; i < max_bit; i++)
- if (bm[BIT_WORD(i)] & BIT_MASK(i))
- len += snprintf(buf + len, max(size - len, 0), "%X,", i);
+ for_each_set_bit_from(bit, bm, max_bit)
+ len += snprintf(buf + len, max(size - len, 0), "%X,", bit);
return len;
}
-static int input_print_modalias(char *buf, int size, struct input_dev *id,
- int add_cr)
+static int input_print_modalias_parts(char *buf, int size, int full_len,
+ const struct input_dev *id)
{
- int len;
+ int len, klen, remainder, space;
len = snprintf(buf, max(size, 0),
"input:b%04Xv%04Xp%04Xe%04X-",
@@ -1395,8 +1395,48 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id,
len += input_print_modalias_bits(buf + len, size - len,
'e', id->evbit, 0, EV_MAX);
- len += input_print_modalias_bits(buf + len, size - len,
+
+ /*
+ * Calculate the remaining space in the buffer making sure we
+ * have place for the terminating 0.
+ */
+ space = max(size - (len + 1), 0);
+
+ klen = input_print_modalias_bits(buf + len, size - len,
'k', id->keybit, KEY_MIN_INTERESTING, KEY_MAX);
+ len += klen;
+
+ /*
+ * If we have more data than we can fit in the buffer, check
+ * if we can trim key data to fit in the rest. We will indicate
+ * that key data is incomplete by adding "+" sign at the end, like
+ * this: * "k1,2,3,45,+,".
+ *
+ * Note that we shortest key info (if present) is "k+," so we
+ * can only try to trim if key data is longer than that.
+ */
+ if (full_len && size < full_len + 1 && klen > 3) {
+ remainder = full_len - len;
+ /*
+ * We can only trim if we have space for the remainder
+ * and also for at least "k+," which is 3 more characters.
+ */
+ if (remainder <= space - 3) {
+ /*
+ * We are guaranteed to have 'k' in the buffer, so
+ * we need at least 3 additional bytes for storing
+ * "+," in addition to the remainder.
+ */
+ for (int i = size - 1 - remainder - 3; i >= 0; i--) {
+ if (buf[i] == 'k' || buf[i] == ',') {
+ strcpy(buf + i + 1, "+,");
+ len = i + 3; /* Not counting '\0' */
+ break;
+ }
+ }
+ }
+ }
+
len += input_print_modalias_bits(buf + len, size - len,
'r', id->relbit, 0, REL_MAX);
len += input_print_modalias_bits(buf + len, size - len,
@@ -1412,12 +1452,25 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id,
len += input_print_modalias_bits(buf + len, size - len,
'w', id->swbit, 0, SW_MAX);
- if (add_cr)
- len += snprintf(buf + len, max(size - len, 0), "\n");
-
return len;
}
+static int input_print_modalias(char *buf, int size, const struct input_dev *id)
+{
+ int full_len;
+
+ /*
+ * Printing is done in 2 passes: first one figures out total length
+ * needed for the modalias string, second one will try to trim key
+ * data in case when buffer is too small for the entire modalias.
+ * If the buffer is too small regardless, it will fill as much as it
+ * can (without trimming key data) into the buffer and leave it to
+ * the caller to figure out what to do with the result.
+ */
+ full_len = input_print_modalias_parts(NULL, 0, 0, id);
+ return input_print_modalias_parts(buf, size, full_len, id);
+}
+
static ssize_t input_dev_show_modalias(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -1425,7 +1478,9 @@ static ssize_t input_dev_show_modalias(struct device *dev,
struct input_dev *id = to_input_dev(dev);
ssize_t len;
- len = input_print_modalias(buf, PAGE_SIZE, id, 1);
+ len = input_print_modalias(buf, PAGE_SIZE, id);
+ if (len < PAGE_SIZE - 2)
+ len += snprintf(buf + len, PAGE_SIZE - len, "\n");
return min_t(int, len, PAGE_SIZE);
}
@@ -1637,6 +1692,23 @@ static int input_add_uevent_bm_var(struct kobj_uevent_env *env,
return 0;
}
+/*
+ * This is a pretty gross hack. When building uevent data the driver core
+ * may try adding more environment variables to kobj_uevent_env without
+ * telling us, so we have no idea how much of the buffer we can use to
+ * avoid overflows/-ENOMEM elsewhere. To work around this let's artificially
+ * reduce amount of memory we will use for the modalias environment variable.
+ *
+ * The potential additions are:
+ *
+ * SEQNUM=18446744073709551615 - (%llu - 28 bytes)
+ * HOME=/ (6 bytes)
+ * PATH=/sbin:/bin:/usr/sbin:/usr/bin (34 bytes)
+ *
+ * 68 bytes total. Allow extra buffer - 96 bytes
+ */
+#define UEVENT_ENV_EXTRA_LEN 96
+
static int input_add_uevent_modalias_var(struct kobj_uevent_env *env,
struct input_dev *dev)
{
@@ -1646,9 +1718,11 @@ static int input_add_uevent_modalias_var(struct kobj_uevent_env *env,
return -ENOMEM;
len = input_print_modalias(&env->buf[env->buflen - 1],
- sizeof(env->buf) - env->buflen,
- dev, 0);
- if (len >= (sizeof(env->buf) - env->buflen))
+ (int)sizeof(env->buf) - env->buflen -
+ UEVENT_ENV_EXTRA_LEN,
+ dev);
+ if (len >= ((int)sizeof(env->buf) - env->buflen -
+ UEVENT_ENV_EXTRA_LEN))
return -ENOMEM;
env->buflen += len;
--
2.40.1
It will return all zero data when DIO reading from inline_data inode, it
is because f2fs_iomap_begin() assign iomap->type w/ IOMAP_HOLE incorrectly
for this case.
We can let iomap framework handle inline data via assigning iomap->type
and iomap->inline_data correctly, however, it will be a little bit
complicated when handling race case in between direct IO and buffered IO.
So, let's force to use buffered IO to fix this issue.
Cc: stable(a)vger.kernel.org
Reported-by: Barry Song <v-songbaohua(a)oppo.com>
Signed-off-by: Chao Yu <chao(a)kernel.org>
---
fs/f2fs/file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index db6236f27852..e038910ad1e5 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -851,6 +851,8 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw)
return true;
if (f2fs_compressed_file(inode))
return true;
+ if (f2fs_has_inline_data(inode))
+ return true;
/* disallow direct IO if any of devices has unaligned blksize */
if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
--
2.40.1
These commits reference out.of.bound between v6.9 and v6.10-rc1
These commits are not, yet, in stable/linux-rolling-stable.
Let me know if you would rather me compare to a different repo/branch.
The list has been manually pruned to only contain commits that look like
actual issues.
If they contain a Fixes line it has been verified that at least one of the
commits that the Fixes tag(s) reference is in stable/linux-rolling-stable
2ba24864d2f61b52210b Syz Fuzzers, Out of bounds
3ebc46ca8675de6378e3 Syz Fuzzers, Out of bounds
9841991a446c87f90f66 Kernel panic, NULL pointer, Out of bounds
51fafb3cd7fcf4f46826 Out of bounds
45cf976008ddef4a9c9a Out of bounds
8b2faf1a4f3b6c748c0d Out of bounds
faa4364bef2ec0060de3 Buffer overflow, Out of bounds
8ee1b439b1540ae54314 Out of bounds
7b4c74cf22d7584d1eb4 Out of bounds
1008368e1c7e36bdec01 Out of bounds
--
Ronnie Sahlberg [Principal Software Engineer, Linux]
P 775 384 8203 | E [email] | W ciq.com
These commits reference KASAN between v6.9 and v6.10-rc1
These commits are not, yet, in stable/linux-rolling-stable.
Let me know if you would rather me compare to a different repo/branch.
The list has been manually pruned to only contain commits that look like
actual issues.
If they contain a Fixes line it has been verified that at least one of the
commits that the Fixes tag(s) reference is in stable/linux-rolling-stable
195aba96b854dd664768 KASAN, Out of bounds
2e577732e8d28b9183df Kernel panic, KASAN
20faaf30e55522bba2b5 KASAN, Syz Fuzzers, Out of bounds
c1115ddbda9c930fba0f KASAN, NULL pointer
--
Ronnie Sahlberg [Principal Software Engineer, Linux]
P 775 384 8203 | E [email] | W ciq.com