This is a note to let you know that I've just added the patch titled
libata: disable LPM for Crucial BX100 SSD 500GB drive
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libata-disable-lpm-for-crucial-bx100-ssd-500gb-drive.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b17e5729a630d8326a48ec34ef02e6b4464a6aef Mon Sep 17 00:00:00 2001
From: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Date: Sun, 18 Feb 2018 22:17:09 +0800
Subject: libata: disable LPM for Crucial BX100 SSD 500GB drive
From: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
commit b17e5729a630d8326a48ec34ef02e6b4464a6aef upstream.
After Laptop Mode Tools starts to use min_power for LPM, a user found
out Crucial BX100 SSD can't get mounted.
Crucial BX100 SSD 500GB drive don't work well with min_power. This also
happens to med_power_with_dipm.
So let's disable LPM for Crucial BX100 SSD 500GB drive.
BugLink: https://bugs.launchpad.net/bugs/1726930
Signed-off-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/ata/libata-core.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4225,6 +4225,9 @@ static const struct ata_blacklist_entry
{ "PIONEER DVD-RW DVR-212D", NULL, ATA_HORKAGE_NOSETXFER },
{ "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER },
+ /* Crucial BX100 SSD 500GB has broken LPM support */
+ { "CT500BX100SSD1", "MU02", ATA_HORKAGE_NOLPM },
+
/* The 512GB version of the MX100 has both queued TRIM and LPM issues */
{ "Crucial_CT512MX100*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM |
Patches currently in stable-queue which might be from kai.heng.feng(a)canonical.com are
queue-3.18/libata-disable-lpm-for-crucial-bx100-ssd-500gb-drive.patch
queue-3.18/libata-make-crucial-bx100-500gb-lpm-quirk-apply-to-all-firmware-versions.patch
This is a note to let you know that I've just added the patch titled
libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libata-apply-nolpm-quirk-to-crucial-mx100-512gb-ssds.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9c7be59fc519af9081c46c48f06f2b8fadf55ad8 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Fri, 16 Feb 2018 10:48:20 +0100
Subject: libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
From: Hans de Goede <hdegoede(a)redhat.com>
commit 9c7be59fc519af9081c46c48f06f2b8fadf55ad8 upstream.
Various people have reported the Crucial MX100 512GB model not working
with LPM set to min_power. I've now received a report that it also does
not work with the new med_power_with_dipm level.
It does work with medium_power, but that has no measurable power-savings
and given the amount of people being bitten by the other levels not
working, this commit just disables LPM altogether.
Note all reporters of this have either the 512GB model (max capacity), or
are not specifying their SSD's size. So for now this quirk assumes this is
a problem with the 512GB model only.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=89261
Buglink: https://github.com/linrunner/TLP/issues/84
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/ata/libata-core.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4225,6 +4225,11 @@ static const struct ata_blacklist_entry
{ "PIONEER DVD-RW DVR-212D", NULL, ATA_HORKAGE_NOSETXFER },
{ "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER },
+ /* The 512GB version of the MX100 has both queued TRIM and LPM issues */
+ { "Crucial_CT512MX100*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
+ ATA_HORKAGE_ZERO_AFTER_TRIM |
+ ATA_HORKAGE_NOLPM, },
+
/* devices that don't properly handle queued TRIM commands */
{ "Micron_M500_*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
Patches currently in stable-queue which might be from hdegoede(a)redhat.com are
queue-3.18/libata-apply-nolpm-quirk-to-crucial-m500-480-and-960gb-ssds.patch
queue-3.18/libata-apply-nolpm-quirk-to-crucial-mx100-512gb-ssds.patch
queue-3.18/libata-modify-quirks-for-mx100-to-limit-ncq_trim-quirk-to-mu01-version.patch
queue-3.18/libata-make-crucial-bx100-500gb-lpm-quirk-apply-to-all-firmware-versions.patch
This is a note to let you know that I've just added the patch titled
libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libata-apply-nolpm-quirk-to-crucial-m500-480-and-960gb-ssds.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 62ac3f7305470e3f52f159de448bc1a771717e88 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Mon, 19 Mar 2018 16:33:58 +0100
Subject: libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
From: Hans de Goede <hdegoede(a)redhat.com>
commit 62ac3f7305470e3f52f159de448bc1a771717e88 upstream.
There have been reports of the Crucial M500 480GB model not working
with LPM set to min_power / med_power_with_dipm level.
It has not been tested with medium_power, but that typically has no
measurable power-savings.
Note the reporters Crucial_CT480M500SSD3 has a firmware version of MU03
and there is a MU05 update available, but that update does not mention any
LPM fixes in its changelog, so the quirk matches all firmware versions.
In my experience the LPM problems with (older) Crucial SSDs seem to be
limited to higher capacity versions of the SSDs (different firmware?),
so this commit adds a NOLPM quirk for the 480 and 960GB versions of the
M500, to avoid LPM causing issues with these SSDs.
Cc: stable(a)vger.kernel.org
Reported-and-tested-by: Martin Steigerwald <martin(a)lichtvoll.de>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/ata/libata-core.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4233,6 +4233,14 @@ static const struct ata_blacklist_entry
ATA_HORKAGE_ZERO_AFTER_TRIM |
ATA_HORKAGE_NOLPM, },
+ /* 480GB+ M500 SSDs have both queued TRIM and LPM issues */
+ { "Crucial_CT480M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
+ ATA_HORKAGE_ZERO_AFTER_TRIM |
+ ATA_HORKAGE_NOLPM, },
+ { "Crucial_CT960M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
+ ATA_HORKAGE_ZERO_AFTER_TRIM |
+ ATA_HORKAGE_NOLPM, },
+
/* devices that don't properly handle queued TRIM commands */
{ "Micron_M500_*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
Patches currently in stable-queue which might be from hdegoede(a)redhat.com are
queue-3.18/libata-apply-nolpm-quirk-to-crucial-m500-480-and-960gb-ssds.patch
queue-3.18/libata-apply-nolpm-quirk-to-crucial-mx100-512gb-ssds.patch
queue-3.18/libata-modify-quirks-for-mx100-to-limit-ncq_trim-quirk-to-mu01-version.patch
queue-3.18/libata-make-crucial-bx100-500gb-lpm-quirk-apply-to-all-firmware-versions.patch
The struct xfs_agfl v5 header was originally introduced with
unexpected padding that caused the AGFL to operate with one less
slot than intended. The header has since been packed, but the fix
left an incompatibility for users who upgrade from an old kernel
with the unpacked header to a newer kernel with the packed header
while the AGFL happens to wrap around the end. The newer kernel
recognizes one extra slot at the physical end of the AGFL that the
previous kernel did not. The new kernel will eventually attempt to
allocate a block from that slot, which contains invalid data, and
cause a crash.
This condition can be detected by comparing the active range of the
AGFL to the count. While this detects a padding mismatch, it can
also trigger false positives for unrelated flcount corruption. Since
we cannot distinguish a size mismatch due to padding from unrelated
corruption, we can't trust the AGFL enough to simply repopulate the
empty slot.
Instead, avoid unnecessarily complex detection logic and and use a
solution that can handle any form of flcount corruption that slips
through read verifiers: distrust the entire AGFL and reset it to an
empty state. Any valid blocks within the AGFL are intentionally
leaked. This requires xfs_repair to rectify (which was already
necessary based on the state the AGFL was found in). The reset
mitigates the side effect of the padding mismatch problem from a
filesystem crash to a free space accounting inconsistency. The
generic approach also means that this patch can be safely backported
to kernels with or without a packed struct xfs_agfl.
Check the AGF for an invalid freelist count on initial read from
disk. If detected, set a flag on the xfs_perag to indicate that a
reset is required before the AGFL can be used. In the first
transaction that attempts to use a flagged AGFL, reset it to empty,
warn the user about the inconsistency and allow the freelist fixup
code to repopulate the AGFL with new blocks. The xfs_perag flag is
cleared to eliminate the need for repeated checks on each block
allocation operation.
Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
CC: <stable(a)vger.kernel.org>
Suggested-by: Dave Chinner <david(a)fromorbit.com>
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Reviewed-by Dave Chiluk <chiluk+linuxxfs(a)indeed.com>
---
v2:
- Function rename and logic cleanup.
- CC stable.
v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
- Use a simplified AGFL reset mechanism.
- Trigger on AGFL fixup rather than get/put.
- Various refactors, cleanups and comments.
rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_trace.h | 9 ++++-
3 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3db90b707fb2..39387bdd225d 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
}
/*
+ * Check the agfl fields of the agf for inconsistency or corruption. The purpose
+ * is to detect an agfl header padding mismatch between current and early v5
+ * kernels. This problem manifests as a 1-slot size difference between the
+ * on-disk flcount and the active [first, last] range of a wrapped agfl. This
+ * may also catch variants of agfl count corruption unrelated to padding. Either
+ * way, we'll reset the agfl and warn the user.
+ *
+ * Return true if a reset is required before the agfl can be used, false
+ * otherwise.
+ */
+static bool
+xfs_agfl_needs_reset(
+ struct xfs_mount *mp,
+ struct xfs_agf *agf)
+{
+ uint32_t f = be32_to_cpu(agf->agf_flfirst);
+ uint32_t l = be32_to_cpu(agf->agf_fllast);
+ uint32_t c = be32_to_cpu(agf->agf_flcount);
+ int agfl_size = xfs_agfl_size(mp);
+ int active;
+
+ /* no agfl header on v4 supers */
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return false;
+
+ /*
+ * The agf read verifier catches severe corruption of these fields.
+ * Repeat some sanity checks to cover a packed -> unpacked mismatch if
+ * the verifier allows it.
+ */
+ if (f >= agfl_size || l >= agfl_size)
+ return true;
+ if (c > agfl_size)
+ return true;
+
+ /*
+ * Check consistency between the on-disk count and the active range. An
+ * agfl padding mismatch manifests as an inconsistent flcount.
+ */
+ if (c && l >= f)
+ active = l - f + 1;
+ else if (c)
+ active = agfl_size - f + l + 1;
+ else
+ active = 0;
+
+ return active != c;
+}
+
+/*
+ * Reset the agfl to an empty state. Ignore/drop any existing blocks since the
+ * agfl content cannot be trusted. Warn the user that a repair is required to
+ * recover leaked blocks.
+ *
+ * The purpose of this mechanism is to handle filesystems affected by the agfl
+ * header padding mismatch problem. A reset keeps the filesystem online with a
+ * relatively minor free space accounting inconsistency rather than suffer the
+ * inevitable crash from use of an invalid agfl block.
+ */
+static void
+xfs_agfl_reset(
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ struct xfs_perag *pag)
+{
+ struct xfs_mount *mp = tp->t_mountp;
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp);
+
+ ASSERT(pag->pagf_agflreset);
+ trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
+
+ xfs_warn(mp,
+ "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
+ "Please unmount and run xfs_repair.",
+ pag->pag_agno, pag->pagf_flcount);
+
+ agf->agf_flfirst = 0;
+ agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
+ agf->agf_flcount = 0;
+ xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
+ XFS_AGF_FLCOUNT);
+
+ pag->pagf_flcount = 0;
+ pag->pagf_agflreset = false;
+}
+
+/*
* Decide whether to use this allocation group for this allocation.
* If so, fix up the btree freelist's size.
*/
@@ -2123,6 +2210,10 @@ xfs_alloc_fix_freelist(
}
}
+ /* reset a padding mismatched agfl before final free space check */
+ if (pag->pagf_agflreset)
+ xfs_agfl_reset(tp, agbp, pag);
+
/* If there isn't enough total space or single-extent, reject it. */
need = xfs_alloc_min_freelist(mp, pag);
if (!xfs_alloc_space_available(args, need, flags))
@@ -2279,6 +2370,7 @@ xfs_alloc_get_freelist(
agf->agf_flfirst = 0;
pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+ ASSERT(!pag->pagf_agflreset);
be32_add_cpu(&agf->agf_flcount, -1);
xfs_trans_agflist_delta(tp, -1);
pag->pagf_flcount--;
@@ -2390,6 +2482,7 @@ xfs_alloc_put_freelist(
agf->agf_fllast = 0;
pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+ ASSERT(!pag->pagf_agflreset);
be32_add_cpu(&agf->agf_flcount, 1);
xfs_trans_agflist_delta(tp, 1);
pag->pagf_flcount++;
@@ -2597,6 +2690,7 @@ xfs_alloc_read_agf(
pag->pagb_count = 0;
pag->pagb_tree = RB_ROOT;
pag->pagf_init = 1;
+ pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
}
#ifdef DEBUG
else if (!XFS_FORCED_SHUTDOWN(mp)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 1808f56decaa..10b90bbc5162 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -353,6 +353,7 @@ typedef struct xfs_perag {
char pagi_inodeok; /* The agi is ok for inodes */
uint8_t pagf_levels[XFS_BTNUM_AGF];
/* # of levels in bno & cnt btree */
+ bool pagf_agflreset; /* agfl requires reset before use */
uint32_t pagf_flcount; /* count of blocks in freelist */
xfs_extlen_t pagf_freeblks; /* total free blocks */
xfs_extlen_t pagf_longest; /* longest free space */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 945de08af7ba..a982c0b623d0 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim,
__entry->tlen)
);
-TRACE_EVENT(xfs_agf,
+DECLARE_EVENT_CLASS(xfs_agf_class,
TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags,
unsigned long caller_ip),
TP_ARGS(mp, agf, flags, caller_ip),
@@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf,
__entry->longest,
(void *)__entry->caller_ip)
);
+#define DEFINE_AGF_EVENT(name) \
+DEFINE_EVENT(xfs_agf_class, name, \
+ TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \
+ unsigned long caller_ip), \
+ TP_ARGS(mp, agf, flags, caller_ip))
+DEFINE_AGF_EVENT(xfs_agf);
+DEFINE_AGF_EVENT(xfs_agfl_reset);
TRACE_EVENT(xfs_free_extent,
TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
--
2.13.6