Hi Miquel,
On 9/12/23 16:17, Miquel Raynal wrote:
Hi Michal,
michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:
Hi Miquel,
On 9/11/23 17:52, Miquel Raynal wrote:
Hi Michal,
miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:
The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, additional care must be taken to manually perform the final status check.
Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
Cc: Michal Simek <michal.simek@amd.com>
Cc: stable@vger.kernel.org
Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
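For readers following the thread, the final status check described in the commit message can be sketched roughly as below. This is a minimal user-space illustration, not the actual driver change: finish_page_write(), read_status_fn and mock_read_status() are hypothetical names invented for this sketch, and the status bit values are assumed to match the ONFI status register layout (bit 0 = FAIL, bit 6 = READY).

```c
#include <errno.h>
#include <stdint.h>

/* ONFI status register bits (assumed layout: bit 0 = FAIL, bit 6 = READY). */
#define STATUS_FAIL  0x01 /* last program/erase operation failed */
#define STATUS_READY 0x40 /* chip is ready to accept a new command */

/* Hypothetical callback standing in for a READ STATUS operation on the chip. */
typedef int (*read_status_fn)(void *ctx, uint8_t *status);

/*
 * Hypothetical tail of a custom page write helper: once the hardware has
 * completed the program cycles on its own, read the chip status and report
 * -EIO if the chip flagged the operation as failed.
 */
static int finish_page_write(read_status_fn read_status, void *ctx)
{
	uint8_t status;
	int ret;

	ret = read_status(ctx, &status);
	if (ret)
		return ret;	/* the status read itself failed */

	if (status & STATUS_FAIL)
		return -EIO;	/* the chip reports a failed program operation */

	return 0;
}

/* Mock for illustration only: returns the status byte stored in ctx. */
static int mock_read_status(void *ctx, uint8_t *status)
{
	*status = *(uint8_t *)ctx;
	return 0;
}
```

With the mock, a status byte of 0xE0 (ready, no failure) makes finish_page_write() return 0, while 0xE1 (FAIL bit set) makes it return -EIO. In a real driver the status read would go through the NAND core rather than a callback like this one.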
Hello Michal,
I have not tested this, but based on a report on another driver, I believe the status check is also missing here and could sometimes lead to unnoticed partial writes.
Please test on your side that everything still works and let me know how it goes.
Any news from the testing team about patches 2/3 and 3/3?
I asked Amit to test, but he didn't get back to me even though I asked a couple of times.
Ok.
Can you please tell me how to test it? I will set up the HW myself, test it, and get back to you.
I believe setting up the board to use the hardware BCH engine, performing basic erase/write/read testing with a known file, and checking that it still behaves correctly would work. You can also run
nandbiterrs -i /dev/mtdx
as a second step and verify there is no difference with and without the patch and finally check the impact:
flash_speed -d -c 10 /dev/mtdx (be careful: this is a destructive operation)
I ran this myself.
pl353 test log before the patch:
# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 10000000 00020000 "pl35x-nand-controller"
# nandbiterrs -i /dev/mtd0
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Read reported 1 corrected bit errors
Successfully corrected 1 bit errors per subpage
Inserted biterror @ 0/2
Failed to recover 1 bitflips
Read error after 2 bit errors per page
# flash_speed -d -c 10 /dev/mtd0
scanning for bad eraseblocks
scanned 10 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 4555 KiB/s
testing eraseblock read speed
eraseblock read speed is 5765 KiB/s
testing page write speed
page write speed is 4383 KiB/s
testing page read speed
page read speed is 5614 KiB/s
testing 2 page write speed
2 page write speed is 4444 KiB/s
testing 2 page read speed
2 page read speed is 5688 KiB/s
Testing erase speed
erase speed is 320000 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 320000 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 320000 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 320000 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 320000 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 320000 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 320000 KiB/s
finished
# dmesg | grep nand
[    2.876719] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[    2.883130] nand: Micron MT29F2G08ABAEAWP
[    2.887230] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
#
With the patch applied:
# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 10000000 00020000 "pl35x-nand-controller"
# nandbiterrs -i /dev/mtd0
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Read reported 1 corrected bit errors
Successfully corrected 1 bit errors per subpage
Inserted biterror @ 0/2
Failed to recover 1 bitflips
Read error after 2 bit errors per page
# flash_speed -d -c 10 /dev/mtd0
scanning for bad eraseblocks
scanned 10 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 4522 KiB/s
testing eraseblock read speed
eraseblock read speed is 5765 KiB/s
testing page write speed
page write speed is 4383 KiB/s
testing page read speed
page read speed is 5638 KiB/s
testing 2 page write speed
2 page write speed is 4444 KiB/s
testing 2 page read speed
2 page read speed is 5714 KiB/s
Testing erase speed
erase speed is 320000 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 320000 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 320000 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 320000 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 320000 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 320000 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 320000 KiB/s
finished
# dmesg | grep nand
[    2.896206] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[    2.902648] nand: Micron MT29F2G08ABAEAWP
[    2.906667] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
Behavior is the same. The measured speed varies slightly from run to run.
I don't have a zynqmp board here, but I will try to get the data ASAP.
Thanks, Michal