Validate that there are no duplicate hwirqs per chip name in the irq debug file system, /sys/kernel/debug/irq/irqs/*.
The example log below shows two interrupts with the same hwirq in the irq debug file system.
  $ sudo cat /sys/kernel/debug/irq/irqs/163
  handler:  handle_fasteoi_irq
  device:   0019:00:00.0
       <SNIP>
  node:     1
  affinity: 72-143
  effectiv: 76
  domain:   irqchip@0x0000100022040000-3
   hwirq:   0xc8000000
   chip:    ITS-MSI
    flags:   0x20

  $ sudo cat /sys/kernel/debug/irq/irqs/174
  handler:  handle_fasteoi_irq
  device:   0039:00:00.0
       <SNIP>
  node:     3
  affinity: 216-287
  effectiv: 221
  domain:   irqchip@0x0000300022040000-3
   hwirq:   0xc8000000
   chip:    ITS-MSI
    flags:   0x20
The irq-check.sh can help to collect hwirq and chip name from /sys/kernel/debug/irq/irqs/* and print an error log when it finds a duplicate hwirq per chip name.
Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
[1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
Signed-off-by: Joseph Jang <jjang@nvidia.com>
Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
---
 tools/testing/selftests/drivers/irq/Makefile  |  5 +++
 tools/testing/selftests/drivers/irq/config    |  2 +
 .../selftests/drivers/irq/irq-check.sh        | 39 +++++++++++++++++++
 3 files changed, 46 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/irq/Makefile
 create mode 100644 tools/testing/selftests/drivers/irq/config
 create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh

diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
new file mode 100644
index 000000000000..d6998017c861
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := irq-check.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
new file mode 100644
index 000000000000..a53d3b713728
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/config
@@ -0,0 +1,2 @@
+CONFIG_GENERIC_IRQ_DEBUGFS=y
+CONFIG_GENERIC_IRQ_INJECTION=y
diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
new file mode 100755
index 000000000000..e784777043a1
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/irq-check.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This script need root permission
+uid=$(id -u)
+if [ $uid -ne 0 ]; then
+	echo "SKIP: Must be run as root"
+	exit 4
+fi
+
+# Ensure debugfs is mounted
+mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
+if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
+	echo "SKIP: irq debugfs not found"
+	exit 4
+fi
+
+# Traverse the irq debug file system directory to collect chip_name and hwirq
+hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
+	# Read chip name and hwirq from the irq_file
+	chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
+	hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
+
+	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
+		continue
+	fi
+
+	echo "$chip_name $hwirq"
+done)
+
+dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
+
+if [ -n "$dup_hwirq_list" ]; then
+	echo "ERROR: Found duplicate hwirq"
+	echo "$dup_hwirq_list"
+	exit 1
+fi
+
+exit 0
On 2024/9/4 9:44 AM, Joseph Jang wrote:
[...]
Hi Tglx,
I followed your suggestions in https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg16952.html to enable the IRQ debugfs and created a new script to scan for duplicate hwirqs. If you have time, could you please take another look at the new patch?
https://lore.kernel.org/all/20240904014426.3404397-1-jjang@nvidia.com/T/
Hi Shuah,
If you have time, could you please take a look at the new patch?
Thank you, Joseph.
On 10/17/24 22:29, Joseph Jang wrote:
[...]
> Hi Shuah,
>
> If you have time, could you please take a look at the new patch?
Once Thomas reviews this and gives me okay - I will accept the patch.
thanks,
-- Shuah
On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> Validate that there are no duplicate hwirqs per chip name in the irq
> debug file system, /sys/kernel/debug/irq/irqs/*.
>
> The example log below shows two interrupts with the same hwirq in the
> irq debug file system.
>
>   $ sudo cat /sys/kernel/debug/irq/irqs/163
>   handler:  handle_fasteoi_irq
>   device:   0019:00:00.0
>        <SNIP>
>   node:     1
>   affinity: 72-143
>   effectiv: 76
>   domain:   irqchip@0x0000100022040000-3
>    hwirq:   0xc8000000
>    chip:    ITS-MSI
>     flags:   0x20
>
>   $ sudo cat /sys/kernel/debug/irq/irqs/174
>   handler:  handle_fasteoi_irq
>   device:   0039:00:00.0
>        <SNIP>
>   node:     3
>   affinity: 216-287
>   effectiv: 221
>   domain:   irqchip@0x0000300022040000-3
>    hwirq:   0xc8000000
>    chip:    ITS-MSI
>     flags:   0x20
>
> The irq-check.sh can help to collect hwirq and chip name from
> /sys/kernel/debug/irq/irqs/* and print an error log when it finds a
> duplicate hwirq per chip name.
>
> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above
> issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
I don't know enough about this issue to understand the details. It seems like you look for duplicate hwirqs in chips with the same name, e.g., "ITS-MSI" in this case? That name seems too generic to me (might there be several instances of "ITS-MSI" in a system?)
Also, the name may come from chip->irq_print_chip(), so it apparently relies on irqchip drivers to make the names unique if there are multiple instances?
I would have expected looking for duplicates inside something more specific, like "irqchip@0x0000300022040000-3". But again, I don't know enough about the problem to speak confidently here.
Cosmetic nits:

  - Tweak subject to match history (use "git log --oneline
    tools/testing/selftests/drivers/" to see it), e.g.,

      selftests: irq: Add check for duplicate hwirq

  - Rewrap commit log to fill 75 columns.  No point in using shorter
    lines.

  - Indent the "$ sudo cat ..." block by a couple spaces since it's
    effectively a quotation, not part of the main text body.

  - Possibly include sample output of irq-check.sh (also indented as a
    quote) when run on the system where you manually found the
    duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."

  - Reword "The irq-check.sh can help ..." to something like this:

      Add an irq-check.sh test to report errors when there are
      duplicate hwirqs per chip name.

  - Since the kernel patch has already been merged, cite it like this
    instead of using the https://lore URL:

      db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
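
For illustration, the per-domain lookup suggested above could be sketched roughly as follows. This is only a sketch: find_dup_hwirq_per_domain is a hypothetical helper that is not part of the posted patch, and it assumes each debugfs file carries "domain:" and "hwirq:" lines in the layout shown in the quoted log. Like the patch's "grep -m 1", it only looks at the first occurrence of each field in a file.

```shell
#!/bin/bash
# Sketch: report "<domain> <hwirq>" pairs that occur more than once.
# find_dup_hwirq_per_domain is a hypothetical helper, not part of the patch.
find_dup_hwirq_per_domain() {
	local dir=$1 f

	for f in "$dir"/*; do
		[ -f "$f" ] || continue
		# Remember the first "domain:" and "hwirq:" values of the file
		# and print them as one record; files lacking either field
		# print nothing.
		awk '$1 == "domain:" && d == "" { d = $2 }
		     $1 == "hwirq:"  && h == "" { h = $2 }
		     END { if (d != "" && h != "") print d, h }' "$f"
	done | sort | uniq -d
}

# Usage (as root, with debugfs mounted):
#   find_dup_hwirq_per_domain /sys/kernel/debug/irq/irqs
```

Keying on the domain name instead of the chip name means that the same hwirq number showing up in two different irq domains is not reported.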
On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
[...]
> I don't know enough about this issue to understand the details. It
> seems like you look for duplicate hwirqs in chips with the same name,
> e.g., "ITS-MSI" in this case? That name seems too generic to me
> (might there be several instances of "ITS-MSI" in a system?)
As far as I know, each PCIe device typically has only one ITS-MSI controller. Having multiple ITS-MSI instances for the same device would lead to confusion and potential conflicts in interrupt routing.
> Also, the name may come from chip->irq_print_chip(), so it apparently
> relies on irqchip drivers to make the names unique if there are
> multiple instances?
>
> I would have expected looking for duplicates inside something more
> specific, like "irqchip@0x0000300022040000-3". But again, I don't
> know enough about the problem to speak confidently here.
In our case, if we look for duplicates per irq domain, for example in "irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3" as shown below:

  $ sudo cat /sys/kernel/debug/irq/irqs/163
  handler:  handle_fasteoi_irq
  device:   0019:00:00.0
       <SNIP>
  node:     1
  affinity: 72-143
  effectiv: 76
  domain:   irqchip@0x0000100022040000-3
   hwirq:   0xc8000000
   chip:    ITS-MSI
    flags:   0x20

  $ sudo cat /sys/kernel/debug/irq/irqs/174
  handler:  handle_fasteoi_irq
  device:   0039:00:00.0
       <SNIP>
  node:     3
  affinity: 216-287
  effectiv: 221
  domain:   irqchip@0x0000300022040000-3
   hwirq:   0xc8000000
   chip:    ITS-MSI
    flags:   0x20

we could not detect the duplicated hwirq number (0xc8000000).
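
The difference between the two keys can be checked on the two records from the example alone. The sketch below reduces each record to "<domain> <chip> <hwirq>" by hand (the variable names are made up for illustration) and runs the same sort/uniq step with either key:

```shell
#!/bin/bash
# The two records from the example, reduced to "<domain> <chip> <hwirq>".
records='irqchip@0x0000100022040000-3 ITS-MSI 0xc8000000
irqchip@0x0000300022040000-3 ITS-MSI 0xc8000000'

# Keyed by chip name (what irq-check.sh does): the pair is reported.
dups_by_chip=$(printf '%s\n' "$records" | awk '{ print $2, $3 }' | sort | uniq -d)

# Keyed by irq domain: each domain holds the hwirq once, nothing is reported.
dups_by_domain=$(printf '%s\n' "$records" | awk '{ print $1, $3 }' | sort | uniq -d)

echo "by chip:   ${dups_by_chip:-none}"
echo "by domain: ${dups_by_domain:-none}"
```

With the chip name as key the pair "ITS-MSI 0xc8000000" is reported, while with the domain as key the list stays empty.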
> Cosmetic nits:
>
>   - Tweak subject to match history (use "git log --oneline
>     tools/testing/selftests/drivers/" to see it), e.g.,
>
>       selftests: irq: Add check for duplicate hwirq
>
>   - Rewrap commit log to fill 75 columns.  No point in using shorter
>     lines.
>
>   - Indent the "$ sudo cat ..." block by a couple spaces since it's
>     effectively a quotation, not part of the main text body.
>
>   - Possibly include sample output of irq-check.sh (also indented as a
>     quote) when run on the system where you manually found the
>     duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."
>
>   - Reword "The irq-check.sh can help ..." to something like this:
>
>       Add an irq-check.sh test to report errors when there are
>       duplicate hwirqs per chip name.
>
>   - Since the kernel patch has already been merged, cite it like this
>     instead of using the https://lore URL:
>
>       db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
If you agree with using the irq chip name ("ITS-MSI") to scan for duplicate hwirqs, I can send a version 2 of the patch that addresses the suggestions above.
Thank you, Joseph.
On Mon, Nov 11, 2024 at 03:21:36PM +0800, Joseph Jang wrote:
[...]
> In our case, if we look for duplicates per irq domain, for example in
> "irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3" as
> shown below:
>
>   $ sudo cat /sys/kernel/debug/irq/irqs/163
>   handler:  handle_fasteoi_irq
>   device:   0019:00:00.0
>        <SNIP>
>   node:     1
>   affinity: 72-143
>   effectiv: 76
>   domain:   irqchip@0x0000100022040000-3
>    hwirq:   0xc8000000
>    chip:    ITS-MSI
>     flags:   0x20
>
>   $ sudo cat /sys/kernel/debug/irq/irqs/174
>   handler:  handle_fasteoi_irq
>   device:   0039:00:00.0
>        <SNIP>
>   node:     3
>   affinity: 216-287
>   effectiv: 221
>   domain:   irqchip@0x0000300022040000-3
>    hwirq:   0xc8000000
>    chip:    ITS-MSI
>     flags:   0x20
>
> we could not detect the duplicated hwirq number (0xc8000000).
Again, this is really out of my area, but based on Documentation/core-api/irq/irq-domain.rst, I assumed the point of hwirq was that hwirq numbers were local to an interrupt controller, i.e., to an irq_domain.
If that's the case, it should not be a problem if hwirq number 0xc8000000 is used in two separate irq_domains.
Bjorn
On Fri, Nov 22 2024 at 11:54, Bjorn Helgaas wrote:
> On Mon, Nov 11, 2024 at 03:21:36PM +0800, Joseph Jang wrote:
>> We could not detect the duplicated hwirq number (0xc8000000) in this case.
>
> Again, this is really out of my area, but based on
> Documentation/core-api/irq/irq-domain.rst, I assumed the point of
> hwirq was that hwirq numbers were local to an interrupt controller,
> i.e., to an irq_domain.
Correct.
But due to the truncation problem in pci_msi_domain_calc_hwirq() we ended up with the same hwirq number for two different interrupts in the same domain.
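
The arithmetic behind the truncation can be reproduced with the two devices from the example log. The sketch below assumes that the PCI segment (domain) number is placed at bit 27 of the hwirq and that the unfixed code dropped everything above bit 31; both the shift value and the helper names are assumptions made for illustration, not taken from the thread.

```shell
#!/bin/bash
# Assumption: the PCI segment (domain) number lands at bit 27 of the hwirq.
SEGMENT_SHIFT=27

# Segment part of the hwirq with only the low 32 bits kept, modelling a
# value that is squeezed through a 32-bit type.
truncated_hwirq() {
	printf '0x%x\n' $(( ($1 << SEGMENT_SHIFT) & 0xFFFFFFFF ))
}

# The same value without truncation.
full_hwirq() {
	printf '0x%x\n' $(( $1 << SEGMENT_SHIFT ))
}

truncated_hwirq 0x19   # segment of device 0019:00:00.0
truncated_hwirq 0x39   # segment of device 0039:00:00.0
full_hwirq 0x39
```

Under these assumptions both segments yield 0xc8000000 once truncated, which matches the hwirq shown for both interrupts in the log, whereas the untruncated value for segment 0x39 is 0x1c8000000.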
That said, I'm not really convinced about the value of the proposed script as it just checks at a random point in time, which does not give any meaningful test coverage.
I'd rather want to see a check in the irq domain code itself. At the point where an interrupt is inserted, the irqdomain can validate that there is no existing mapping for the hardware interrupt number. This check can be unconditionally enabled as interrupt setup is not really a hotpath operation and the lookup in the revmap or the tree is cheap.
Thanks,
tglx
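
As a toy model of the insertion-time check described above, written in shell only so that it matches the rest of the thread: an associative array stands in for a domain's reverse map, and insert_mapping is a hypothetical helper. The real check would live in the kernel's irq domain code, which this sketch does not attempt to reproduce.

```shell
#!/bin/bash
# Toy model only: an associative array (bash 4+) stands in for the reverse
# map of the irq domains, keyed by "<domain>/<hwirq>".
declare -A revmap

# insert_mapping <domain> <hwirq> <virq>  (hypothetical helper)
# Refuses the insert when the (domain, hwirq) pair is already mapped.
insert_mapping() {
	local key="$1/$2"

	if [ -n "${revmap[$key]+set}" ]; then
		echo "ERROR: $1 hwirq $2 already mapped to virq ${revmap[$key]}" >&2
		return 1
	fi
	revmap[$key]=$3
}

# Usage:
#   insert_mapping dom0 0xc8000000 163   # accepted
#   insert_mapping dom0 0xc8000000 174   # rejected, same domain and hwirq
#   insert_mapping dom1 0xc8000000 174   # accepted, different domain
```

Because the check runs on every insert rather than at one point in time, a duplicate is caught at the moment it is created.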