On 3/28/22 5:52 AM, Borislav Petkov wrote: [...]
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
How did you decide this is the commit that this is fixing?
I examined the history in those lines by git blame. Will recheck after the below doubt is cleared.
Link: https://lore.kernel.org/lkml/9dfe087a-f941-1bc4-657d-7e7c198888ff@gnuweeb.or...
That Link tag is not needed.
Co-authored-by: Alviro Iskandar Setiawan alviro.iskandar@gnuweeb.org Signed-off-by: Alviro Iskandar Setiawan alviro.iskandar@gnuweeb.org Co-authored-by: Yazen Ghannam yazen.ghannam@amd.com
There's no "Co-authored-by".
The correct tag is described in
Documentation/process/submitting-patches.rst
Will fix them in the v6.
...
@@ -1350,15 +1357,14 @@ int mce_threshold_create_device(unsigned int cpu) if (!(this_cpu_read(bank_map) & (1 << bank))) continue; err = threshold_create_bank(bp, cpu, bank);
if (err)
goto out_err;
if (err) {
_mce_threshold_remove_device(bp, numbanks);
return err;
} this_cpu_write(threshold_banks, bp);}
Do I see it correctly that the publishing of the @bp pointer - i.e., this line - should be moved right above the for loop?
Then mce_threshold_remove_device() would properly free it in the error case and your patch turns into a oneliner?
Previously, in v4 I did that too. But after discussion with Yazen, we got a conclusion that placing `this_cpu_write(threshold_banks, bp);` before the for loop is not the right thing to do.
And then your Fixes: tag would be correct too...
The reason is based on the discussion with Yazen, the full discussion can be read in the Link tag above.
================== The point is:
On Wed, 2 Mar 2022 17:26:32 +0000, Yazen Ghannam yazen.ghannam@amd.com wrote:
The threshold interrupt handler uses this pointer. I think the goal here is to set this pointer when the list is fully formed and clear this pointer before making any changes to the list. Otherwise, the interrupt handler will operate on incomplete data if an interrupt comes in the middle of these updates.
==================
Also, looking at the comment in mce_threshold_remove_device() function:
/* * Clear the pointer before cleaning up, so that the interrupt won't * touch anything of this. */ this_cpu_write(threshold_banks, NULL);
I think it's reasonable to place `this_cpu_write(threshold_banks, bp);` after the "for loop" on the creation process for the similar reason. In short, don't let the interrupt sees incomplete data.
Although, I am not sure if that 100% guarantees mce_threshold_remove_device() will not mess up with the interrupt (e.g. freeing the data while the interrupt reading it), unless we're using RCU stuff.
What do you think?