Thanks for catching this; it definitely looks like a real issue! I guess you're probably seeing junk in the sysfs files?
That is correct. The initial report was seeing junk when reading sysfs files. As descibed, this is happening because we reading data past the end of the stats counters array.
I think maybe we should populate the currently NULL entries in the string[] arrays and simplify the code here, e.g.,
static const char *aer_correctable_error_string[] = { "RxErr", /* Bit Position 0 */ "dev_cor_errs_bit[1]", ...
if (stats[i]) len += sysfs_emit_at(buf, len, "%s %llu\n", strings_array[i], stats[i]);
Doing it this way will change the output format. In this case we will show stats only if their value is greater than zero. The current code shows all the stats those have names (regardless of their value) plus those have non-zero values.
@@ -1342,6 +1342,11 @@ static int aer_probe(struct pcie_device *dev) struct device *device = &dev->device; struct pci_dev *port = dev->port;
- BUILD_BUG_ON(ARRAY_SIZE(aer_correctable_error_string) <
AER_MAX_TYPEOF_COR_ERRS);
- BUILD_BUG_ON(ARRAY_SIZE(aer_uncorrectable_error_string) <
AER_MAX_TYPEOF_UNCOR_ERRS);
And make these check for "!=" instead of "<".
This will require unnecessarily extending stats arrays to have 32 entries in order to match names arrays. If you don't feel strogly about changing "<" to "!=", I prefer to keep the code as it is.