On 7/7/2025 02:44, Greg KH wrote:
On Mon, Jul 07, 2025 at 02:00:24AM -0500, Naik, Avadhut wrote:
On 7/3/2025 00:28, Greg KH wrote:
On Wed, Jul 02, 2025 at 12:19:41PM -0500, Naik, Avadhut wrote:
Hi,
On 7/2/2025 09:31, Greg KH wrote:
On Tue, Jul 01, 2025 at 05:10:32PM +0000, Avadhut Naik wrote:
Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD Zen-based SOCs has an Address Mask and a Secondary Address Mask register associated with it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS granularity during init using these two registers.
Currently, the module primarily considers only the Address Mask register for computing DIMM sizes. The Secondary Address Mask register is only considered for odd CS. Additionally, if it has been considered, the Address Mask register is ignored altogether for that CS. For power-of-two DIMMs, i.e., DIMMs whose total capacity is a power of two (32GB, 64GB, etc.), this is not an issue since only the Address Mask register is used.
For non-power-of-two DIMMs, i.e., DIMMs whose total capacity is not a power of two (48GB, 96GB, etc.), however, the Secondary Address Mask register is used in conjunction with the Address Mask register. Since the module considers only one of the two registers for a given CS, the size it computes is incorrect: the Secondary Address Mask register is not considered for even CS, and the Address Mask register is not considered for odd CS.
Introduce a new helper function so that both Address Mask and Secondary Address Mask registers are considered, when valid, for computing DIMM sizes. Also rename some variables for greater clarity.
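The arithmetic described above can be sketched outside the kernel. This is only an illustration, not the amd64_edac code: the function names cs_size_kb() and cs_size_mb() and the mask values in the usage note are hypothetical. It assumes that register bits [31:1] map to address bits [39:9], that bit 0 of a mask is always 0, and it ignores the special 3-rank interleaving case.

```python
def cs_size_kb(mask: int) -> int:
    """Size in kB described by one address mask value; 0 if the mask is unused.

    Assumption: register bits [31:1] correspond to address bits [39:9].
    """
    if mask == 0:
        return 0

    msb = mask.bit_length() - 1      # position of the highest set bit
    weight = bin(mask).count("1")    # number of set bits

    # Bit 0 is assumed to be always 0, so a mask with no holes has exactly
    # `msb` set bits. Any shortfall is counted as interleave bits.
    num_zero_bits = msb - weight

    # Take the zero bits off the top of the mask: keep bits [top:1] set.
    top = msb - num_zero_bits
    deinterleaved = (1 << (top + 1)) - 2

    return (deinterleaved >> 2) + 1


def cs_size_mb(addr_mask: int, addr_mask_sec: int) -> int:
    """Chip-select size in MB, summing the primary and secondary masks."""
    return (cs_size_kb(addr_mask) + cs_size_kb(addr_mask_sec)) >> 10
```

With made-up values, cs_size_mb(0x3FFFFFE, 0) gives 16384 (a 16GB chip select described by the primary mask alone), while cs_size_mb(0x3FFFFFE, 0x1FFFFFE) gives 24576, i.e. 16GB + 8GB. Using only one of the two masks would report 16GB or 8GB for that chip select instead of 24GB.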
Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250529205013.403450-1-avadhut.naik@amd.com
(cherry picked from commit a3f3040657417aeadb9622c629d4a0c2693a0f93)
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
This was not a clean cherry-pick at all. Please document what you did differently from the original commit.
thanks,
greg k-h
Yes, the cherry-pick was not clean, but the core logic of the change is the same in the original commit and the cherry-picked commit.
The amd64_edac module has been reworked quite a lot in the last year or two. Support has also been introduced for new SOC families and models. This rework and support, predominantly undertaken through the commits below, is missing in the 6.1 kernel.
9c42edd571aa EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
ed623d55eef4 EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
a2e59ab8e933 EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
Why not take these as prerequisite changes? Taking changes that are radically different from what is upstream is almost always wrong: it makes future backports impossible and is usually buggy.
Just to ensure that I have understood correctly, are you suggesting that we backport the above three commits to 6.1 too?
Yes, why not?
I mentioned the above commits only because I think they modify the code in question for this backport. But these commits were merged as part of larger patchsets (links below):
9c42edd571aa: https://lore.kernel.org/all/20230515113537.1052146-5-muralimk@amd.com/
ed623d55eef4: https://lore.kernel.org/all/20230127170419.1824692-11-yazen.ghannam@amd.com/
a2e59ab8e933: https://lore.kernel.org/all/20230127170419.1824692-9-yazen.ghannam@amd.com/
Backporting these commits might require us to backport the entire patchsets to 6.1. I wasn't completely sure that this is the road we want to take, hence the question in my earlier mail.