From: Pavel Machek
Sent: 31 January 2020 12:58
On Thu 2020-01-30 19:39:17, Greg Kroah-Hartman wrote:
From: Fenghua Yu <fenghua.yu@intel.com>
[ Upstream commit f11421ba4af706cb4f5703de34fa77fba8472776 ]
This is not suitable for stable. It does not fix anything. It prepares for a theoretical bug that the author claims might be introduced into BIOSes in the future... I doubt it; even BIOS authors boot their machines from time to time.
Atomic operations that span cache lines are super-expensive on x86 (not just to the current processor, but also to other processors, as all memory operations are blocked until the operation completes). Upcoming x86 processors have a switch to cause such operations to generate an #AC trap. It is expected that some real-time systems will enable this mode in the BIOS.
And I wonder if this is even a good idea for mainline. The x86 architecture has been around for a long time, and I doubt Intel is going to break it like this. Do you have a documentation pointer?
Locked operations that cross cache-line boundaries work at all only because of compatibility with very old processors (which always locked the bus).
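For concreteness, here is a minimal userspace sketch of such a split-locked operation (my example, not from the patch; it assumes x86, 64-byte cache lines, and the gcc/clang __atomic builtins, and the misaligned cast is deliberately not valid portable C):

	#include <stdint.h>

	/* 64-byte aligned buffer so an operand can be placed across a
	 * cache-line boundary on purpose. */
	static _Alignas(64) uint8_t buf[128];

	int main(void)
	{
		/* A 4-byte operand at offset 62 straddles the cache-line
		 * boundary at offset 64. */
		uint32_t *p = (uint32_t *)(buf + 62);

		/* Compiles to a lock-prefixed RMW.  The CPU must lock both
		 * cache lines (historically: the whole bus); with split-lock
		 * detection enabled it raises #AC instead. */
		__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
		return 0;
	}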
In preparation for this, it is necessary to fix code that may execute atomic instructions with operands that cross cache lines, because the #AC trap will crash the kernel.
How does a single bit operation "cross a cache line"? How is this going to impact non-x86 architectures?
The CPU 'bit' instructions used always access a full 'word' of memory at a 'word' offset from the specified base address. With a 64-bit bit offset the 'word' is 64 bits, so if the base address of the array isn't 8-byte aligned, the CPU does a misaligned RMW cycle.
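i.e. a small userspace mock-up of where the bit instruction actually lands (my illustration; assumes 64-bit longs, and the bit number 70 is arbitrary):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint8_t pmask[16];
		unsigned int len = 70;

		/* set_bit(len, (unsigned long *)pmask) does a locked RMW
		 * on the full 8-byte word containing the bit, i.e. on: */
		void *word = (unsigned long *)pmask + len / 64;

		printf("array base %p, word accessed %p\n",
		       (void *)pmask, word);

		/* If pmask isn't 8-byte aligned, neither is 'word', so
		 * the locked RMW can straddle a cache-line boundary. */
		return 0;
	}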
Non-x86 architectures probably either:
1) Fault on the mis-aligned transfer.
2) Ignore the 'lock'.
3) Use a software 'array of mutexes' to emulate locked bit updates (sketched below).
4) Any random combination of the above.
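A sketch of option 3 (a hypothetical emulated_set_bit() helper of mine, not any architecture's actual implementation; uses a GNU designated range initializer):

	#include <pthread.h>
	#include <stdint.h>

	#define NLOCKS		16
	#define BITS_PER_LONG	(8 * sizeof(unsigned long))

	/* Hashed lock array: cheaper than one mutex per word. */
	static pthread_mutex_t bitlocks[NLOCKS] = {
		[0 ... NLOCKS - 1] = PTHREAD_MUTEX_INITIALIZER
	};

	static void emulated_set_bit(unsigned int nr, unsigned long *addr)
	{
		unsigned long *word = addr + nr / BITS_PER_LONG;
		/* Hash the word address to pick a lock. */
		pthread_mutex_t *l =
			&bitlocks[((uintptr_t)word >> 3) % NLOCKS];

		pthread_mutex_lock(l);
		*word |= 1UL << (nr % BITS_PER_LONG);
		pthread_mutex_unlock(l);
	}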
Since "pwol_mask" is local and never exposed to concurrency, there is no need to set bits in pwol_mask using atomic operations.
Directly operate on the byte which contains the bit instead of using __set_bit(), to avoid any big-endian concern due to the type cast to unsigned long in __set_bit().
What concerns? Is __set_bit() now useless and are we going to open-code it everywhere? Is set_bit() now unusable on x86?
Both set_bit() and __set_bit() are defined to work on bitmaps that are defined as 'long[]'. They are not there because people are too lazy to write foo |= 1 << n.
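In kernel terms the distinction looks like this (a sketch of mine using the real DECLARE_BITMAP()/BIT() helpers):

	/* A real bitmap: unsigned long[], safe for set_bit() and
	 * __set_bit() on any architecture. */
	DECLARE_BITMAP(flags, 128);
	__set_bit(5, flags);

	/* A byte array destined for hardware: the cast makes the
	 * result depend on alignment and endianness. */
	u8 pmask[16];
	__set_bit(5, (unsigned long *)pmask);	/* wrong */
	pmask[5 >> 3] |= BIT(5 & 7);		/* endian-safe */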
...
 	memset(ppattern + offset, 0xff, magicsync);
-	for (j = 0; j < magicsync; j++)
-		set_bit(len++, (unsigned long *) pmask);
+	for (j = 0; j < magicsync; j++) {
+		pmask[len >> 3] |= BIT(len & 7);
+		len++;
+	}

 	for (j = 0; j < B44_MAX_PATTERNS; j++) {
 		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
@@ -1532,7 +1534,8 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 		for (k = 0; k < ethaddr_bytes; k++) {
 			ppattern[offset + magicsync + (j * ETH_ALEN) + k] = macaddr[k];
-			set_bit(len++, (unsigned long *) pmask);
+			pmask[len >> 3] |= BIT(len & 7);
+			len++;
In this case I believe the pmask[] is passed to hardware. It is very much an array of bytes initialised in a specific way.
set_bit() (and __set_bit()) would do completely the wrong thing on BE systems.
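A userspace mock-up makes the breakage visible (my example; the union stands in for __set_bit()'s full-word store, assuming 64-bit longs):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		/* What __set_bit(0, (unsigned long *)pmask) stores: */
		union {
			unsigned long word;
			uint8_t bytes[sizeof(unsigned long)];
		} u = { 0 };

		u.word |= 1UL << 0;
		printf("long cast:    byte 0 = %#x, last byte = %#x\n",
		       u.bytes[0], u.bytes[sizeof(unsigned long) - 1]);
		/* little-endian: byte 0 == 0x1 (what the hardware expects)
		 * big-endian:    the last byte == 0x1 (the wrong byte)    */

		/* The byte-indexed replacement is endian-neutral: */
		uint8_t pmask[8] = { 0 };
		pmask[0 >> 3] |= 1u << (0 & 7);
		printf("byte indexed: byte 0 = %#x\n", pmask[0]);
		return 0;
	}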
David