It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure, which leads all READ operations following the failing one to report an ECC failure. Reset the chip to clear the NAND_STATUS_FAIL bit.
Note that this behavior is not documented in the datasheet, but resetting the chip is the only solution we found to fix the problem.
Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
Cc: stable@vger.kernel.org
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Bean Huo <beanhuo@micron.com>
Cc: Peter Pan <peterpandong@micron.com>
---
Peter, Bean,
Can you confirm this behavior, or ask someone at Micron who can? Also, if a RESET is actually needed, it would be good to update the datasheet accordingly. If that's not the case, can you explain why the NAND_STATUS_FAIL bit is stuck and how to clear it? I tried a 0x00 command (a.k.a. READ STATUS EXIT), but it does not clear this bit. ERASE and PROGRAM seem to clear it, but that's clearly not the kind of operation I can do when the user asks for a READ.
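To make the reported failure mode concrete, here is a small user-space model of it. It is a sketch, not driver code: `struct fake_nand`, `fake_read_page()`, `fake_reset()` and `replay_sticky_fail()` are hypothetical names. The model assumes, as described above, that a subsequent READ does not clear the FAIL bit and that a RESET does; READ STATUS EXIT, ERASE and PROGRAM are not modelled. `NAND_STATUS_FAIL` uses the same value (0x01) as the kernel's raw NAND header, and all other status bits are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as NAND_STATUS_FAIL in the Linux raw NAND headers. */
#define NAND_STATUS_FAIL 0x01

/* Hypothetical model: only the FAIL bit of the status register is tracked. */
struct fake_nand {
	uint8_t status;
};

/*
 * READ with on-die ECC enabled. An uncorrectable page sets FAIL, but a
 * clean page does not clear it (the sticky behavior reported above).
 * Returns the status byte the driver would read afterwards.
 */
static uint8_t fake_read_page(struct fake_nand *nand, bool uncorrectable)
{
	if (uncorrectable)
		nand->status |= NAND_STATUS_FAIL;
	return nand->status;
}

/* RESET is the only command modelled here that clears FAIL. */
static void fake_reset(struct fake_nand *nand)
{
	nand->status &= (uint8_t)~NAND_STATUS_FAIL;
}

/* One failing READ followed by a clean one, then the RESET workaround. */
static void replay_sticky_fail(void)
{
	struct fake_nand nand = { 0 };

	fake_read_page(&nand, true);
	/* The clean page is still reported as an ECC failure... */
	assert(fake_read_page(&nand, false) & NAND_STATUS_FAIL);
	/* ...until the chip is reset. */
	fake_reset(&nand);
	assert(!(fake_read_page(&nand, false) & NAND_STATUS_FAIL));
}
```

In this model every READ after the first failure reports FAIL, which is exactly why the patch resets the chip as soon as it sees the bit set.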
Thanks,
Boris
---
 drivers/mtd/nand/raw/nand_micron.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 0af45b134c0c..a915f568f6a3 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -153,6 +153,23 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
 		ret = nand_read_data_op(chip, chip->oob_poi, mtd->oobsize, false);
 
+	/*
+	 * Looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
+	 * which leads all READ operations following the failing one to report
+	 * an ECC failure.
+	 * Reset the chip to clear it.
+	 *
+	 * Note that this behavior is not document in the datasheet, but
+	 * resetting the chip is the only solution we found to clear this bit.
+	 */
+	if (status & NAND_STATUS_FAIL) {
+		int cs = page >> (chip->chip_shift - chip->page_shift);
+
+		chip->select_chip(mtd, -1);
+		nand_reset(chip, cs);
+		chip->select_chip(mtd, cs);
+	}
+
 out:
 	micron_nand_on_die_ecc_setup(chip, false);
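The chip-select expression in the hunk can be checked with a worked example. `chip_shift` and `page_shift` are log2 of the per-die size and of the page size, so a die holds `1 << (chip_shift - page_shift)` pages and the right shift turns a page index into a die index. The helper name `page_to_cs()` and the geometry used below (2 KiB pages, 128 MiB dies) are assumptions for illustration. Note that the result is only a die index if `page` counts pages across all dies: any page number below the per-die page count yields 0, and which convention the caller uses is not visible in this hunk.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the expression used in the patch:
 *   cs = page >> (chip->chip_shift - chip->page_shift);
 * chip_shift: log2(bytes per die), page_shift: log2(bytes per page).
 */
static int page_to_cs(int page, int chip_shift, int page_shift)
{
	assert(chip_shift >= page_shift);
	/* A die holds 1 << (chip_shift - page_shift) pages. */
	return page >> (chip_shift - page_shift);
}
```

With 2 KiB pages (`page_shift` = 11) and 128 MiB dies (`chip_shift` = 27), each die holds 65536 pages, so `page_to_cs(65535, 27, 11)` is 0 and `page_to_cs(70000, 27, 11)` is 1.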
Hi Boris,
On Thu, 3 May 2018 09:49:08 +0200, Boris Brezillon <boris.brezillon@bootlin.com> wrote:
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
On Fri, 4 May 2018 11:58:35 +0200 Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Queued to mtd/master.
On Tue, 8 May 2018 23:12:59 +0200 Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> Queued to mtd/master.
I'm dropping this patch because I'm no longer sure this is the correct way to fix the bug. It seems that nand_set_features_op() checks the FAIL bit, while the ONFI spec clearly says that the FAIL bit is only valid after a PROGRAM, ERASE or READ-with-on-die-ECC-enabled operation. That might explain why ->set_features() fails with -EIO after an ECC failure (apparently Micron only clears the FAIL bit when launching a PROGRAM, ERASE or READ-with-on-die-ECC-enabled operation, not on a SET_FEATURES operation).
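The reasoning above can be expressed as a status check that only interprets FAIL when the preceding operation defines it. This is a hypothetical sketch, not the kernel fix: `enum nand_last_op`, `fail_bit_is_valid()`, `check_status()` and `replay_set_features_after_ecc_failure()` are illustrative names, and the code encodes only the rule as stated in this message (FAIL is meaningful after PROGRAM, ERASE, or READ with on-die ECC enabled). Under that rule, a stale FAIL left over from an earlier ECC failure no longer turns a SET_FEATURES into -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same value as NAND_STATUS_FAIL in the Linux raw NAND headers. */
#define NAND_STATUS_FAIL 0x01

/* Hypothetical tag for the operation whose status is being checked. */
enum nand_last_op {
	LAST_OP_PROGRAM,
	LAST_OP_ERASE,
	LAST_OP_READ_ON_DIE_ECC,
	LAST_OP_SET_FEATURES,
};

/* FAIL only carries information after these operations. */
static int fail_bit_is_valid(enum nand_last_op op)
{
	switch (op) {
	case LAST_OP_PROGRAM:
	case LAST_OP_ERASE:
	case LAST_OP_READ_ON_DIE_ECC:
		return 1;
	default:
		return 0;
	}
}

/* Returns -EIO only when FAIL is set and meaningful for @op. */
static int check_status(enum nand_last_op op, uint8_t status)
{
	if (fail_bit_is_valid(op) && (status & NAND_STATUS_FAIL))
		return -EIO;
	return 0;
}

/* FAIL is still set from an earlier uncorrectable READ. */
static void replay_set_features_after_ecc_failure(void)
{
	uint8_t stale = NAND_STATUS_FAIL;

	/* After a READ with on-die ECC, FAIL is a real error. */
	assert(check_status(LAST_OP_READ_ON_DIE_ECC, stale) == -EIO);
	/* After SET_FEATURES, the same stale bit is ignored. */
	assert(check_status(LAST_OP_SET_FEATURES, stale) == 0);
}
```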