On Thu, Sep 1, 2022, at 1:31 PM, Reinier Kuipers wrote:
I'm working to fix the y2038 issue for an existing sama5d3-based product. This involves updating the kernel and glibc to appropriate versions (5.10 and 2.35.1 respectively) and I got things running up to a state where, from userspace, both date and hwclock commands have no issue accepting dates beyond 2038. However, even with the RTC_HCTOSYS and RTC_HCTOSYS_DEVICE options configured correctly, the RTC driver fails to initialize the system clock at bootup.
Some digging in rtc/class.c::rtc_hctosys() indicates that do_settimeofday64() is deliberately not executed on systems with BITS_PER_LONG==32 and a second counter higher than INT_MAX. I assumed that the work on 64-bit timestamps was already fully implemented for 32-bit systems as well, so my gut feeling is that this BITS_PER_LONG/INT_MAX check has become unnecessary. A test build with these checks disabled results in correct time initialization at bootup with, at a glance, no adverse effects. Does anybody here know whether do_settimeofday64() is robust on 32-bit systems, or whether the checks are still required to prevent further breakage?
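For readers following along, the guard described above can be modelled in userspace roughly as follows. This is a sketch only, not the kernel source: the function name and the explicit bits_per_long parameter are assumptions made for illustration (in the kernel, BITS_PER_LONG is a compile-time constant).

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the check described in the mail above.
 * hctosys_would_set_clock() and its bits_per_long parameter are
 * hypothetical names used only for this illustration.
 */
bool hctosys_would_set_clock(int64_t rtc_seconds, int bits_per_long)
{
	/* On a 32-bit build, refuse values that a 32-bit time_t cannot hold. */
	if (bits_per_long == 32 && rtc_seconds > INT_MAX)
		return false;

	return true;
}
```

With this model, an RTC value one second past INT_MAX is rejected when bits_per_long is 32 and accepted when it is 64, which matches the behaviour reported above: the system clock is simply left unset at boot.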
Please see commit b3a5ac42ab18 ("rtc: hctosys: Ensure system time doesn't overflow time_t") and https://github.com/systemd/systemd/issues/1143 for the problem that originally caused this to be added.
Removing this check would probably break systemd again for machines that return a post-y2038 time with systemd built on 32-bit time_t.
The only reliable fix I can see would be to disable CONFIG_RTC_HCTOSYS_DEVICE. I think this is Alexandre's plan for the long run anyway, but I don't know if there has been any progress in convincing distros to turn it off.
Arnd
On 01/09/2022 13:55:19+0200, Arnd Bergmann wrote:
On Thu, Sep 1, 2022, at 1:31 PM, Reinier Kuipers wrote:
I'm working to fix the y2038 issue for an existing sama5d3-based product. This involves updating the kernel and glibc to appropriate versions (5.10 and 2.35.1 respectively) and I got things running up to a state where, from userspace, both date and hwclock commands have no issue accepting dates beyond 2038. However, even with the RTC_HCTOSYS and RTC_HCTOSYS_DEVICE options configured correctly, the RTC driver fails to initialize the system clock at bootup.
Some digging in rtc/class.c::rtc_hctosys() indicates that do_settimeofday64() is deliberately not executed on systems with BITS_PER_LONG==32 and a second counter higher than INT_MAX. I assumed that the work on 64-bit timestamps was already fully implemented for 32-bit systems as well, so my gut feeling is that this BITS_PER_LONG/INT_MAX check has become unnecessary. A test build with these checks disabled results in correct time initialization at bootup with, at a glance, no adverse effects. Does anybody here know whether do_settimeofday64() is robust on 32-bit systems, or whether the checks are still required to prevent further breakage?
Please see commit b3a5ac42ab18 ("rtc: hctosys: Ensure system time doesn't overflow time_t") and https://github.com/systemd/systemd/issues/1143 for the problem that originally caused this to be added.
Removing this check would probably break systemd again for machines that return a post-y2038 time with systemd built on 32-bit time_t.
The only reliable fix I can see would be to disable CONFIG_RTC_HCTOSYS_DEVICE. I think this is Alexandre's plan for the long run anyway, but I don't know if there has been any progress in convincing distros to turn it off.
This is still my plan but systemd mandates RTC_HCTOSYS and I couldn't convince Lennart otherwise.
On Thu, Sep 1, 2022, at 2:49 PM, Alexandre Belloni wrote:
On 01/09/2022 13:55:19+0200, Arnd Bergmann wrote:
The only reliable fix I can see would be to disable CONFIG_RTC_HCTOSYS_DEVICE. I think this is Alexandre's plan for the long run anyway, but I don't know if there has been any progress in convincing distros to turn it off.
This is still my plan but systemd mandates RTC_HCTOSYS and I couldn't convince Lennart otherwise.
Ah, I forgot that systemd actually needs it. So I guess there is currently no way to use systemd on 32-bit machines that are meant to survive 2038, regardless of whether systemd and glibc are built with a 64-bit time_t or not, right?
Is there perhaps a way to change the logic in a way that it does not depend on the current time but instead depends on a property of the RTC device itself, so we make systems break immediately instead of by surprise in 2038?
As far as I remember, the workaround was only needed for certain devices that may set the time to something after 2038 on a depleted battery, but other devices would have a better failure case, right?
Arnd
On Thu, Sep 01, 2022 at 03:12:57PM +0200, Arnd Bergmann wrote:
Ah, I forgot that systemd actually needs it. So I guess there is currently no way to use systemd on 32-bit machines that are meant to survive 2038, regardless of whether systemd and glibc are built with a 64-bit time_t or not, right?
Is there perhaps a way to change the logic in a way that it does not depend on the current time but instead depends on a property of the RTC device itself, so we make systems break immediately instead of by surprise in 2038?
Are you seriously suggesting to cause regressions on systems where the RTC can send the kernel's timekeeping back to the early 1900s, rather than printing a big fat warning message in the kernel log?
On Thu, Sep 1, 2022, at 3:46 PM, Russell King (Oracle) wrote:
On Thu, Sep 01, 2022 at 03:12:57PM +0200, Arnd Bergmann wrote:
Ah, I forgot that systemd actually needs it. So I guess there is currently no way to use systemd on 32-bit machines that are meant to survive 2038, regardless of whether systemd and glibc are built with a 64-bit time_t or not, right?
Is there perhaps a way to change the logic in a way that it does not depend on the current time but instead depends on a property of the RTC device itself, so we make systems break immediately instead of by surprise in 2038?
Are you seriously suggesting to cause regressions on systems where the RTC can send the kernel's timekeeping back to the early 1900s, rather than printing a big fat warning message in the kernel log?
I think the systems that can send the timekeeping back into the early 1900s (or at least after 1970) are fine, the problem is the systems that can randomly send the timekeeping into the post-2038 future.
What kind of warning would you suggest to print here? I don't see how warning about broken hardware at every boot would help, since there is no way for users to react to that warning. Similarly, warning about a time_t value past 2038 does not help because at that time one either has a bricked system (if using a systemd with 32-bit time_t) or it is actually 2038 and the system reverts back to 1970.
What might work is to have all drivers for broken RTC devices default to a 1902-2037 (or 1970-2037) date range to ensure that only those devices are broken in 2038, but still allow overriding the "start-year" property in DT for machines that don't use the broken systemd.
Arnd
On Thu, Sep 01, 2022 at 05:48:01PM +0200, Arnd Bergmann wrote:
On Thu, Sep 1, 2022, at 3:46 PM, Russell King (Oracle) wrote:
On Thu, Sep 01, 2022 at 03:12:57PM +0200, Arnd Bergmann wrote:
Ah, I forgot that systemd actually needs it. So I guess there is currently no way to use systemd on 32-bit machines that are meant to survive 2038, regardless of whether systemd and glibc are built with a 64-bit time_t or not, right?
Is there perhaps a way to change the logic in a way that it does not depend on the current time but instead depends on a property of the RTC device itself, so we make systems break immediately instead of by surprise in 2038?
Are you seriously suggesting to cause regressions on systems where the RTC can send the kernel's timekeeping back to the early 1900s, rather than printing a big fat warning message in the kernel log?
I think the systems that can send the timekeeping back into the early 1900s (or at least after 1970) are fine, the problem is the systems that can randomly send the timekeeping into the post-2038 future.
I believe Armada 388 systems can do that - and since Armada 388 systems are involved in my connectivity, I would very much prefer it if someone doesn't patch stuff that causes them to explode when I decide to upgrade the kernel.
(Yes, I've run into the broken systemd issue with them when the RTC was not correctly set on platform delivery.)
What kind of warning would you suggest to print here? I don't see how warning about broken hardware at every boot would help, since there is no way for users to react to that warning. Similarly, warning about a time_t value past 2038 does not help because at that time one either has a bricked system (if using a systemd with 32-bit time_t) or it is actually 2038 and the system reverts back to 1970.
What might work is to have all drivers for broken RTC devices default to a 1902-2037 (or 1970-2037) date range to ensure that only those devices are broken in 2038, but still allow overriding the "start-year" property in DT for machines that don't use the broken systemd.
I don't care too much how it's handled, but my objection is purely against the intentional breaking of platforms such that they cause people pain.
Sure, they will break in 2038, but that's no reason to break them in 2022/3.
On Thu, Sep 1, 2022, at 6:02 PM, Russell King (Oracle) wrote:
On Thu, Sep 01, 2022 at 05:48:01PM +0200, Arnd Bergmann wrote:
I think the systems that can send the timekeeping back into the early 1900s (or at least after 1970) are fine, the problem is the systems that can randomly send the timekeeping into the post-2038 future.
I believe Armada 388 systems can do that - and since Armada 388 systems are involved in my connectivity, I would very much prefer it if someone doesn't patch stuff that causes them to explode when I decide to upgrade the kernel.
(Yes, I've run into the broken systemd issue with them when the RTC was not correctly set on platform delivery.)
Ok, good to know. I wonder if this patch would be sufficient for this particular driver:
diff --git a/drivers/rtc/rtc-armada38x.c b/drivers/rtc/rtc-armada38x.c
index cc542e6b1d5b..f2bbb8efed18 100644
--- a/drivers/rtc/rtc-armada38x.c
+++ b/drivers/rtc/rtc-armada38x.c
@@ -219,7 +219,7 @@ static int armada38x_rtc_read_time(struct device *dev, struct rtc_time *tm)
 	time = rtc->data->read_rtc_reg(rtc, RTC_TIME);
 	spin_unlock_irqrestore(&rtc->lock, flags);

-	rtc_time64_to_tm(time, tm);
+	rtc_time64_to_tm((s32)time, tm);

 	return 0;
 }
@@ -541,7 +541,8 @@ static __init int armada38x_rtc_probe(struct platform_device *pdev)
 	rtc->data->update_mbus_timing(rtc);

 	rtc->rtc_dev->ops = &armada38x_rtc_ops;
-	rtc->rtc_dev->range_max = U32_MAX;
+	rtc->rtc_dev->range_min = S32_MIN;
+	rtc->rtc_dev->range_max = S32_MAX;

 	return devm_rtc_register_device(rtc->rtc_dev);
 }
The effect of this is to interpret the RTC values as range 1902...2038 instead of 1970...2106, which should make systemd not crash any more on random input, but have no other side-effects within the 1970...2038 range.
Users that care about running systems beyond 2038 and run a time64 userland can then set the wrap-around point in DT e.g. to 2022...2156 using the 'start-year=<2022>;' property, or any other value they like. If we can do the equivalent for all RTC drivers that may suffer from the same problem, the HCTOSYS hack for the S32_MAX value can just get removed.
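The reinterpretation that the patch above relies on can be illustrated in plain C. This is a minimal sketch with hypothetical helper names, not driver code; it only shows how the same 32-bit register value maps to a different point in time depending on whether it is read as unsigned or signed.

```c
#include <stdint.h>

/* Raw counter read as unsigned: covers 1970...2106 (0 .. U32_MAX seconds). */
int64_t rtc_seconds_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/*
 * Raw counter reinterpreted as signed: covers 1902...2038.
 * Converting an out-of-range uint32_t to int32_t is implementation-defined
 * in ISO C; GCC and Clang wrap modulo 2^32, which is what the (s32) cast
 * in the patch depends on.
 */
int64_t rtc_seconds_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}
```

For raw values up to 0x7fffffff both helpers agree, which is why nothing changes within the 1970...2038 range. Above that, the unsigned reading lands after 2038, while the signed reading lands before 1970.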
Arnd
[I accidentally dropped Reinier from Cc, adding him back now. For reference, the other mails are archived at https://lore.kernel.org/linux-arm-kernel/CAKYb531CyL8XRVRcRN30cC3xRgsd-1FzXU...]
On 01/09/2022 22:33:46+0200, Arnd Bergmann wrote:
On Thu, Sep 1, 2022, at 6:02 PM, Russell King (Oracle) wrote:
On Thu, Sep 01, 2022 at 05:48:01PM +0200, Arnd Bergmann wrote:
I think the systems that can send the timekeeping back into the early 1900s (or at least after 1970) are fine, the problem is the systems that can randomly send the timekeeping into the post-2038 future.
I believe Armada 388 systems can do that - and since Armada 388 systems are involved in my connectivity, I would very much prefer it if someone doesn't patch stuff that causes them to explode when I decide to upgrade the kernel.
(Yes, I've run into the broken systemd issue with them when the RTC was not correctly set on platform delivery.)
Ok, good to know. I wonder if this patch would be sufficient for this particular driver:
I'm pretty sure we don't want to play whack-a-mole with all the drivers, especially with those for RTCs that are available on both 32b and 64b systems.
diff --git a/drivers/rtc/rtc-armada38x.c b/drivers/rtc/rtc-armada38x.c
index cc542e6b1d5b..f2bbb8efed18 100644
--- a/drivers/rtc/rtc-armada38x.c
+++ b/drivers/rtc/rtc-armada38x.c
@@ -219,7 +219,7 @@ static int armada38x_rtc_read_time(struct device *dev, struct rtc_time *tm)
 	time = rtc->data->read_rtc_reg(rtc, RTC_TIME);
 	spin_unlock_irqrestore(&rtc->lock, flags);

-	rtc_time64_to_tm(time, tm);
+	rtc_time64_to_tm((s32)time, tm);
You may as well just clamp the value here, the RTC subsystem specifically considers a timestamp to be positive and this is why it is not affected by y2038 with 32bit second counters.
 	return 0;
 }
@@ -541,7 +541,8 @@ static __init int armada38x_rtc_probe(struct platform_device *pdev)
 	rtc->data->update_mbus_timing(rtc);

 	rtc->rtc_dev->ops = &armada38x_rtc_ops;
-	rtc->rtc_dev->range_max = U32_MAX;
+	rtc->rtc_dev->range_min = S32_MIN;
+	rtc->rtc_dev->range_max = S32_MAX;

 	return devm_rtc_register_device(rtc->rtc_dev);
 }
The effect of this is to interpret the RTC values as range 1902...2038 instead of 1970...2106, which should make systemd not crash any more on random input, but have no other side-effects within the 1970...2038 range.
Users that care about running systems beyond 2038 and run a time64 userland can then set the wrap-around point in DT e.g. to 2022...2156 using the 'start-year=<2022>;' property, or any other value they like. If we can do the equivalent for all RTC drivers that may suffer from the same problem, the HCTOSYS hack for the S32_MAX value can just get removed.
Arnd
[I accidentally dropped Reinier from Cc, adding him back now. For reference, the other mails are archived at https://lore.kernel.org/linux-arm-kernel/CAKYb531CyL8XRVRcRN30cC3xRgsd-1FzXU...]
On Thu, Sep 1, 2022, at 11:11 PM, Alexandre Belloni wrote:
On 01/09/2022 22:33:46+0200, Arnd Bergmann wrote:
On Thu, Sep 1, 2022, at 6:02 PM, Russell King (Oracle) wrote:
On Thu, Sep 01, 2022 at 05:48:01PM +0200, Arnd Bergmann wrote:
I think the systems that can send the timekeeping back into the early 1900s (or at least after 1970) are fine, the problem is the systems that can randomly send the timekeeping into the post-2038 future.
I believe Armada 388 systems can do that - and since Armada 388 systems are involved in my connectivity, I would very much prefer it if someone doesn't patch stuff that causes them to explode when I decide to upgrade the kernel.
(Yes, I've run into the broken systemd issue with them when the RTC was not correctly set on platform delivery.)
Ok, good to know. I wonder if this patch would be sufficient for this particular driver:
I'm pretty sure we don't want to play whack-a-mole with all the drivers, especially with those for RTCs that are available on both 32b and 64b systems.
If we want to address all drivers at the same time, this also affects every architecture including x86: any 32-bit setup that relies on RTC_HCTOSYS will break in 2038 unless we remove the INT_MAX hack, but removing it everywhere immediately breaks setups that run systemd when the RTC fails.
Note that this is not actually 32-bit specific, since the kernel has no idea if it's running 32-bit or 64-bit userspace. I think we are increasingly seeing users run 32-bit userland on arm64 and rv64 kernels as low-end SoCs are moving to 64-bit cores but remain memory constrained.
diff --git a/drivers/rtc/rtc-armada38x.c b/drivers/rtc/rtc-armada38x.c
index cc542e6b1d5b..f2bbb8efed18 100644
--- a/drivers/rtc/rtc-armada38x.c
+++ b/drivers/rtc/rtc-armada38x.c
@@ -219,7 +219,7 @@ static int armada38x_rtc_read_time(struct device *dev, struct rtc_time *tm)
 	time = rtc->data->read_rtc_reg(rtc, RTC_TIME);
 	spin_unlock_irqrestore(&rtc->lock, flags);

-	rtc_time64_to_tm(time, tm);
+	rtc_time64_to_tm((s32)time, tm);
You may as well just clamp the value here, the RTC subsystem specifically considers a timestamp to be positive and this is why it is not affected by y2038 with 32bit second counters.
Do you mean clamping to a non-negative value? That would break the 'start-year' trick that I suggested. As far as I can tell, the rtc_device_get_offset() function should work correctly and not apply any translation when start-year is unset, but use a translated range when it is set. If all negative values are capped in the armada38x driver, anything that falls in the wrong half of the translated range would break.
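To make the objection concrete, here is a small sketch of why clamping would conflict with a translated range. It assumes a simple linear mapping in which the lowest raw value corresponds to the start of the window; that mapping and both helper names are assumptions for illustration and are not taken from rtc_device_get_offset().

```c
#include <stdint.h>

/*
 * Hypothetical linear translation: the lowest signed raw value maps to
 * window_start (seconds since 1970), the highest to window_start + 2^32 - 1.
 */
int64_t windowed_seconds(int32_t raw, int64_t window_start)
{
	return window_start + ((int64_t)raw - INT32_MIN);
}

/* What a driver-level clamp to non-negative values would do to the raw value. */
int32_t clamp_non_negative(int32_t raw)
{
	return raw < 0 ? 0 : raw;
}
```

With window_start set to 1640995200 (2022-01-01 00:00:00 UTC), unclamped raw values span the whole window, but after clamp_non_negative() every negative raw value collapses onto the midpoint of the window, so the first half of the translated range can no longer be represented.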
Arnd
On 01/09/2022 15:12:57+0200, Arnd Bergmann wrote:
On Thu, Sep 1, 2022, at 2:49 PM, Alexandre Belloni wrote:
On 01/09/2022 13:55:19+0200, Arnd Bergmann wrote:
The only reliable fix I can see would be to disable CONFIG_RTC_HCTOSYS_DEVICE. I think this is Alexandre's plan for the long run anyway, but I don't know if there has been any progress in convincing distros to turn it off.
This is still my plan but systemd mandates RTC_HCTOSYS and I couldn't convince Lennart otherwise.
Ah, I forgot that systemd actually needs it. So I guess there is currently no way to use systemd on 32-bit machines that are meant to survive 2038, regardless of whether systemd and glibc are built with a 64-bit time_t or not, right?
Well, it doesn't actually need it. It could as well go and read the RTC and decide what to do with the time it gets. The main reason they want it is because the log timestamps are correct earlier when the kernel does it.
Is there perhaps a way to change the logic in a way that it does not depend on the current time but instead depends on a property of the RTC device itself, so we make systems break immediately instead of by surprise in 2038?
The safe thing to do is really to not use hctosys and have a systemd unit that reads the RTC from userspace. See https://github.com/systemd/systemd/issues/17737 for the whole discussion.
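As a rough illustration of the userspace approach, the sketch below reads an RTC through the RTC_RD_TIME ioctl and lets the caller decide what to do with the result. It is not what systemd does; the helper names are hypothetical, the RTC is assumed to keep UTC, and the decision about whether to actually set the system clock is deliberately left out.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/rtc.h>

/* Days since 1970-01-01 for a Gregorian date (month 1..12, day 1..31). */
int64_t days_from_civil(int64_t y, int m, int d)
{
	y -= (m <= 2);
	int64_t era = (y >= 0 ? y : y - 399) / 400;
	int64_t yoe = y - era * 400;
	int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

	return era * 146097 + doe - 719468;
}

/* Convert a struct rtc_time (assumed UTC) to 64-bit seconds since 1970. */
int64_t rtc_tm_to_epoch(const struct rtc_time *tm)
{
	int64_t days = days_from_civil((int64_t)tm->tm_year + 1900,
				       tm->tm_mon + 1, tm->tm_mday);

	return days * 86400 + tm->tm_hour * 3600 + tm->tm_min * 60 + tm->tm_sec;
}

/* True if the value would survive a round trip through a 32-bit time_t. */
bool fits_time32(int64_t seconds)
{
	return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/* Read the RTC at path (e.g. "/dev/rtc0"); returns 0 on success, -1 on error. */
int read_rtc_epoch(const char *path, int64_t *out)
{
	struct rtc_time tm;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	int ret = ioctl(fd, RTC_RD_TIME, &tm);
	close(fd);
	if (ret < 0)
		return -1;

	*out = rtc_tm_to_epoch(&tm);
	return 0;
}
```

A caller built with a 32-bit time_t could use fits_time32() to refuse a post-2038 reading instead of crashing on it, which is the kind of policy decision that this approach moves out of the kernel.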
As far as I remember, the workaround was only needed for certain devices that may set the time to something after 2038 on a depleted battery, but other devices would have a better failure case, right?
Yes, this is the main cause: anything able to set the system time after 2038 with a 32-bit userspace will cause that (and basically I think this is only hctosys). The issue is that many RTCs don't have a default value for the time registers after power failure. This is usually not an issue, as there is also a bit that allows detecting whether the time is correct. Note that this will also be an issue once we actually reach 2038 with a 32-bit userspace.
On Thu, Sep 1, 2022, at 3:57 PM, Alexandre Belloni wrote:
On 01/09/2022 15:12:57+0200, Arnd Bergmann wrote:
As far as I remember, the workaround was only needed for certain devices that may set the time to something after 2038 on a depleted battery, but other devices would have a better failure case, right?
Yes, this is the main cause: anything able to set the system time after 2038 with a 32-bit userspace will cause that (and basically I think this is only hctosys). The issue is that many RTCs don't have a default value for the time registers after power failure. This is usually not an issue, as there is also a bit that allows detecting whether the time is correct. Note that this will also be an issue once we actually reach 2038 with a 32-bit userspace.
The problem is that people are deploying systems already with the expectation that they will survive y2038, and it is rather unlikely that the developers that build these systems will be around to update the systems anywhere close to that. glibc now has the 64-bit time_t support (and musl has had it for a while), so even if you do unit tests on your own software to check for bugs, you wouldn't necessarily run into the issue unless you reboot the system with the RTC set to the future as part of the testing.
In effect, whatever we will need in 2038, we also need to have today, so the current code cannot remain unchanged; the question is just how to minimize the damage.
Is there any way to find out which RTC drivers are affected by this?
Arnd