From: Vernon Yang <yanglincheng@kylinos.cn>
[ Upstream commit 0a27bdb14b028fed30a10cec2f945c38cb5ca4fa ]
The kzalloc(GFP_KERNEL) may return NULL, so all accesses to aer_info->xxx will result in kernel panic. Fix it.
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250904182527.67371-1-vernon2gm@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
**Why It Matters**
- Prevents a NULL pointer dereference and kernel panic during device enumeration when `kzalloc(GFP_KERNEL)` fails in AER initialization. This is a real bug users can hit under memory pressure and affects any kernel with `CONFIG_PCIEAER` enabled.

**Change Details**
- Adds a NULL check after allocating `dev->aer_info` and returns early on failure, resetting `dev->aer_cap` to keep state consistent:
  - drivers/pci/pcie/aer.c:395
  - drivers/pci/pcie/aer.c:396
  - drivers/pci/pcie/aer.c:397
- The dereferences that would otherwise panic immediately follow the allocation (ratelimit initialization), so without this guard, OOM leads to instant crash:
  - drivers/pci/pcie/aer.c:401
  - drivers/pci/pcie/aer.c:403

**Consistency With AER Flows**
- Resetting `dev->aer_cap` to 0 on allocation failure is correct and keeps all AER-related code paths coherent:
  - Save/restore explicitly no-op when `aer_cap == 0`, avoiding config space accesses:
    - drivers/pci/pcie/aer.c:349
    - drivers/pci/pcie/aer.c:371
  - AER enablement and ECRC setup get skipped because AER is treated as unavailable:
    - drivers/pci/pcie/aer.c:417 (enable reporting)
    - drivers/pci/pcie/aer.c:420 (ECRC)
  - ECRC helpers themselves also gate on `aer_cap`:
    - drivers/pci/pcie/aer.c:164
    - drivers/pci/pcie/aer.c:188
- Sysfs attributes that unconditionally dereference `pdev->aer_info` are already hidden when `aer_info == NULL`:
  - Visibility gating for stats attrs checks `pdev->aer_info`:
    - drivers/pci/pcie/aer.c:632
  - Visibility gating for ratelimit attrs checks `pdev->aer_info`:
    - drivers/pci/pcie/aer.c:769
- AER initialization is called during capability setup for every device; avoiding a panic here is critical:
  - drivers/pci/probe.c:2671

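The guard-and-degrade pattern described above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: every `mock_*` name, the `mock_fail_alloc` test hook, and the burst value are hypothetical stand-ins introduced only for illustration. It shows why clearing the capability offset on allocation failure lets later callers that gate on `aer_cap` skip their work safely.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's real types. */
struct mock_aer_info {
	int correctable_burst;
};

struct mock_pci_dev {
	unsigned short aer_cap;          /* capability offset; 0 means "no AER" */
	struct mock_aer_info *aer_info;  /* NULL when the allocation failed */
};

/* Test hook (assumption): when true, the allocator simulates an OOM failure. */
static bool mock_fail_alloc;

static void *mock_kzalloc(size_t size)
{
	return mock_fail_alloc ? NULL : calloc(1, size);
}

/* Mirrors the shape of the fix: check the allocation, clear aer_cap, return. */
static void mock_aer_init(struct mock_pci_dev *dev, unsigned short cap_offset)
{
	dev->aer_cap = cap_offset;
	if (!dev->aer_cap)
		return;

	dev->aer_info = mock_kzalloc(sizeof(*dev->aer_info));
	if (!dev->aer_info) {
		dev->aer_cap = 0;
		return;
	}

	/* Without the guard above, this write would dereference NULL. */
	dev->aer_info->correctable_burst = 10;  /* arbitrary illustrative value */
}

/* A later user that gates on aer_cap, like the save/restore paths described. */
static bool mock_aer_save_state(const struct mock_pci_dev *dev)
{
	if (!dev->aer_cap)
		return false;  /* AER treated as unavailable: nothing to save */
	return true;
}

static void mock_aer_free(struct mock_pci_dev *dev)
{
	free(dev->aer_info);
	dev->aer_info = NULL;
}
```

Calling `mock_aer_init()` with `mock_fail_alloc` set leaves both `aer_cap` and `aer_info` zeroed, so `mock_aer_save_state()` returns early instead of touching freed or missing state. That is the graceful degradation the explanation attributes to the real change.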
**Risk and Side Effects**
- Impact is limited and defensive:
  - On allocation failure, AER features are disabled for that device (graceful degradation) instead of panicking.
  - No architectural changes; no ABI changes; minimal lines touched.
  - All later AER users already handle `aer_info == NULL` and/or `aer_cap == 0` via existing guards.
- Side effects are intentional and safe:
  - Port driver IRQ message number programming for AER is skipped if `aer_cap == 0`, consistent with AER being unavailable:
    - drivers/pci/pcie/portdrv.c:81
    - drivers/pci/pcie/portdrv.c:242

**Stable Criteria**
- Fixes a real crash bug that can affect users (OOM during enumeration or hotplug).
- Small, contained change in a single function.
- No new features or interfaces; no architectural churn.
- Very low regression risk due to consistent gating on `aer_cap`/`aer_info`.

Given the clear correctness and robustness benefits with minimal risk, this is a strong candidate for backporting to stable trees.
 drivers/pci/pcie/aer.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9d23294ceb2f6..3dba9c0c6ae11 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -383,6 +383,10 @@ void pci_aer_init(struct pci_dev *dev)
 		return;
 
 	dev->aer_info = kzalloc(sizeof(*dev->aer_info), GFP_KERNEL);
+	if (!dev->aer_info) {
+		dev->aer_cap = 0;
+		return;
+	}
 
 	ratelimit_state_init(&dev->aer_info->correctable_ratelimit,
 			     DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);