Hi,
On 7/2/2025 09:31, Greg KH wrote:
On Tue, Jul 01, 2025 at 05:10:32PM +0000, Avadhut Naik wrote:
Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD Zen-based SOCs has an Address Mask and a Secondary Address Mask register associated with it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS granularity during init using these two registers.
Currently, the module primarily considers only the Address Mask register when computing DIMM sizes. The Secondary Address Mask register is considered only for odd CS, and when it is, the Address Mask register is ignored altogether for that CS. For power-of-two DIMMs, i.e., DIMMs whose total capacity is a power of two (32GB, 64GB, etc.), this is not an issue since only the Address Mask register is used.
For non-power-of-two DIMMs, i.e., DIMMs whose total capacity is not a power of two (48GB, 96GB, etc.), however, the Secondary Address Mask register is used in conjunction with the Address Mask register. Since the module considers only one of the two registers for a given CS, the size it computes is incorrect: the Secondary Address Mask register is not considered for even CS, and the Address Mask register is not considered for odd CS.
Introduce a new helper function so that both the Address Mask and Secondary Address Mask registers are considered, when valid, for computing DIMM sizes. Also, rename some variables for greater clarity.
Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250529205013.403450-1-avadhut.naik@amd.com
(cherry picked from commit a3f3040657417aeadb9622c629d4a0c2693a0f93)
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
This was not a clean cherry-pick at all. Please document what you did differently from the original commit.
thanks,
greg k-h
Yes, the cherry-pick was not clean, but the core logic of the change is the same in the original and the cherry-picked commits.
The amd64_edac module has been reworked considerably over the last year or two, and support has been added for new SoC families and models. This rework, done predominantly through the commits below, is missing from the 6.1 kernel:
9c42edd571aa ("EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh")
ed623d55eef4 ("EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt")
a2e59ab8e933 ("EDAC/amd64: Drop dbam_to_cs() for Family 17h and later")
In this particular context, the original patch modifies the umc_addr_mask_to_cs_size() and __addr_mask_to_cs_size() functions. These functions, however, do not exist in 6.1; they were introduced by commits a2e59ab8e933 and 9c42edd571aa. In 6.1, their functionality lives in a single function, f17_addr_mask_to_cs_size(), so the cherry-picked patch modifies f17_addr_mask_to_cs_size() instead.
Additionally, gpu_addr_mask_to_cs_size() does not exist in 6.1; it was introduced by commit 9c42edd571aa. Hence, the cherry-picked patch omits the original patch's changes to that function.
I also tested the cherry-picked patch on a Zen4 system with a 96GB (non-power-of-two) DIMM. Below are the relevant dmesg snippets:
Ubuntu24 default kernel:
[root avadnaik]# uname -r
6.8.0-62-generic
[root avadnaik]# dmesg | awk '/UMC7 chip selects:/ {print; getline; print; getline; print}'
[ 27.584535] EDAC MC: UMC7 chip selects:
[ 27.584537] EDAC amd64: MC: 0: 32768MB 1: 16384MB
[ 27.584539] EDAC amd64: MC: 2: 0MB 3: 0MB
[root avadnaik]#
6.1 kernel with the cherry-picked commit applied:
[root avadnaik]# uname -r
6.1.142-edac-6.1-stable-24153-g431fa5011469
[root avadnaik]# dmesg | awk '/UMC7 chip selects:/ {print; getline; print; getline; print}'
[ 24.600370] EDAC MC: UMC7 chip selects:
[ 24.600371] EDAC amd64: MC: 0: 49152MB 1: 49152MB
[ 24.600373] EDAC amd64: MC: 2: 0MB 3: 0MB
[root avadnaik]#
Without the cherry-picked patch, the module reports incorrect DIMM sizes: 32768MB for CS0 and 16384MB for CS1 (48GB in total) instead of 49152MB for each (96GB in total).
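For illustration only, here is a minimal userspace sketch (not the driver code) of why the two outputs differ. It assumes a simplified mask encoding in which mask bits [31:1] map to address bits [39:9], bit 0 is unused, and a zero mask means the register is not in use. The helper names and the mask values 0x07FFFFFE (32GB) and 0x03FFFFFE (16GB) are hypothetical and were not read from the test system. legacy_cs_size_mb() picks one register per CS as described in the commit message, while fixed_cs_size_mb() sums both.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size, in KB, of the region described by one mask.
 * Assumed encoding: mask bits [31:1] = address bits [39:9], bit 0 unused.
 * Interleaving can leave holes in the mask, so the size is derived from
 * the number of set bits rather than from the most significant bit.
 */
static uint64_t mask_to_kb(uint32_t mask)
{
	unsigned int weight = 0;

	if (!mask)
		return 0;	/* zero mask: register not in use (assumption) */

	assert((mask & 1u) == 0);	/* assumed: bit 0 is always clear */

	/* Portable popcount: clear the lowest set bit until none remain. */
	for (uint32_t m = mask; m; m &= m - 1)
		weight++;

	/* w set bits cover 2^(w + 9) bytes, i.e. 2^(w - 1) KB. */
	return 1ull << (weight - 1);
}

/* Old behaviour (hypothetical model): only one register counts per CS. */
static unsigned int legacy_cs_size_mb(int csrow, uint32_t primary,
				      uint32_t secondary)
{
	uint32_t mask = primary;

	/* Odd CS with a secondary mask in use: primary mask is ignored. */
	if ((csrow & 1) && secondary)
		mask = secondary;

	return (unsigned int)(mask_to_kb(mask) >> 10);
}

/* Fixed behaviour (hypothetical model): both registers contribute. */
static unsigned int fixed_cs_size_mb(uint32_t primary, uint32_t secondary)
{
	return (unsigned int)((mask_to_kb(primary) + mask_to_kb(secondary)) >> 10);
}
```

With those illustrative masks, legacy_cs_size_mb() returns 32768 for CS0 and 16384 for CS1, the same pattern as the 6.8 output, while fixed_cs_size_mb() returns 49152 for both, as in the patched 6.1 output.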
Please let me know if any further clarification is required from my end.