On Tue, Aug 18, 2020 at 11:49:29AM -0400, Mike Marshall wrote:
upstream commit id: ec95f1dedc9c64ac5a8b0bdb7c276936c70fdedd
I verified that ec95f1de "orangefs: get rid of knob code..." will apply to 5.4 and I compiled and ran a patched 5.4 kernel against my normal xfstests... I wish that ec95f1de could be in the 5.4 long term stable kernel.
ec95f1de went upstream in 5.7. When I sent up the patch it was just a theoretical race condition to me: I accepted what Christoph said about it. We have now seen in the real world how important the patch is...
Someone was trying to read a whole large (more than 100 meg) file from orangefs into some kind of cloud bucket. The resulting read failed with a "Bad address" error. I immediately thought of this patch. I reproduced the "Bad address" error with dd in kernel versions that lack ec95f1de. The "Bad address" error does not occur in kernels that include ec95f1de:
5.7.11-100.fc31.x86_64:
$ ./wr.sh 10000000 > /pvfsmnt/wr.10000000
$ dd if=/pvfsmnt/wr.10000000 of=/tmp/wr.10000000 count=10 bs=419430400
$ ls -l /pvfsmnt/wr.10000000 /tmp/wr.10000000
-rw-rw-r--. 1 hubcap hubcap 498888897 Aug 14 15:41 /pvfsmnt/wr.10000000
-rw-rw-r--. 1 hubcap hubcap 498888897 Aug 14 16:51 /tmp/wr.10000000
$ md5sum /pvfsmnt/wr.10000000 /tmp/wr.10000000
669daa04f91f561f5fb2851fb30e4ffe /pvfsmnt/wr.10000000
669daa04f91f561f5fb2851fb30e4ffe /tmp/wr.10000000
5.6.0hubcap:
$ ./wr.sh 10000000 > /pvfsmnt/wr.10000000
$ dd if=/pvfsmnt/wr.10000000 of=/tmp/wr.10000000 count=10 bs=419430400
dd: error reading '/pvfsmnt/wr.10000000': Bad address
0+0 records in
0+0 records out
0 bytes copied, 10.3365 s, 0.0 kB/s
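
The check above can be wrapped in a small helper for repeated runs across kernels. This is only a sketch: the function name check_large_read is made up for illustration, and since wr.sh is not shown in the message, the helper assumes the source file already exists. It copies the file with dd using one large block size and then compares md5sum output for the two copies.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): copy SRC to DST
# with dd, then compare checksums of the two files.
# Returns 0 if dd succeeds and the checksums match, non-zero otherwise.
check_large_read() {
    src=$1
    dst=$2
    bs=${3:-419430400}   # default: block size used in the report
    count=${4:-10}       # default: count used in the report

    # dd's own error message (for example "Bad address") is left on stderr.
    dd if="$src" of="$dst" bs="$bs" count="$count" || return 1

    # Read via stdin so the output does not include the file names.
    src_sum=$(md5sum < "$src") || return 1
    dst_sum=$(md5sum < "$dst") || return 1

    [ "$src_sum" = "$dst_sum" ]
}
```

With the paths from the report, a call might look like `check_large_read /pvfsmnt/wr.10000000 /tmp/wr.10000000`. Note that the checksums only match when the source file is no larger than bs times count, since dd stops after count blocks.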
Sounds reasonable, I'll queue this up after this next round of releases in the next few days, thanks!
greg k-h