This is a note to let you know that I've just added the patch titled
rtc-opal: Fix handling of firmware error codes, prevent busy loops
to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: rtc-opal-fix-handling-of-firmware-error-codes-prevent-busy-loops.patch and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From 5b8b58063029f02da573120ef4dc9079822e3cda Mon Sep 17 00:00:00 2001
From: Stewart Smith stewart@linux.vnet.ibm.com Date: Tue, 2 Aug 2016 11:50:16 +1000 Subject: rtc-opal: Fix handling of firmware error codes, prevent busy loops
From: Stewart Smith stewart@linux.vnet.ibm.com
commit 5b8b58063029f02da573120ef4dc9079822e3cda upstream.
According to the OPAL docs: skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this indicates either a transient or permanent error.
Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a permanent error particularly well, in that you could end up in a busy loop.
This was not too hard to trigger on an AMI BMC based OpenPOWER machine doing a continuous "ipmitool mc reset cold" to the BMC, the result of that being that we'd get stuck in an infinite loop in opal_get_rtc_time().
We now retry a few times before returning the error higher up the stack.
Fixes: 16b1d26e77b1 ("rtc/tpo: Driver to support rtc and wakeup on PowerNV platform") Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Stewart Smith stewart@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/rtc/rtc-opal.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/rtc/rtc-opal.c +++ b/drivers/rtc/rtc-opal.c @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time * static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) { long rc = OPAL_BUSY; + int retries = 10; u32 y_m_d; u64 h_m_s_ms; __be32 __y_m_d; @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct devi rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms); if (rc == OPAL_BUSY_EVENT) opal_poll_events(NULL); - else + else if (retries-- && (rc == OPAL_HARDWARE + || rc == OPAL_INTERNAL_ERROR)) msleep(10); + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) + break; }
if (rc != OPAL_SUCCESS) @@ -84,6 +88,7 @@ static int opal_get_rtc_time(struct devi static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm) { long rc = OPAL_BUSY; + int retries = 10; u32 y_m_d = 0; u64 h_m_s_ms = 0;
@@ -92,8 +97,11 @@ static int opal_set_rtc_time(struct devi rc = opal_rtc_write(y_m_d, h_m_s_ms); if (rc == OPAL_BUSY_EVENT) opal_poll_events(NULL); - else + else if (retries-- && (rc == OPAL_HARDWARE + || rc == OPAL_INTERNAL_ERROR)) msleep(10); + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) + break; }
return rc == OPAL_SUCCESS ? 0 : -EIO;
Patches currently in stable-queue which might be from stewart@linux.vnet.ibm.com are
queue-4.14/rtc-opal-fix-handling-of-firmware-error-codes-prevent-busy-loops.patch