On Tue, 2 Oct 2018, Arnd Bergmann wrote:
On Mon, Oct 1, 2018 at 8:53 PM Thomas Gleixner tglx@linutronix.de wrote:
On Mon, 1 Oct 2018, Eric W. Biederman wrote:
In the context of process migration there is a simpler subproblem that I think is worth exploring, to see if we can do something about it.
For a cluster of machines all running with synchronized clocks, CLOCK_REALTIME matches. CLOCK_MONOTONIC does not match between machines. Not having a matching CLOCK_MONOTONIC prevents successful process migration between nodes in that cluster.
Would it be possible to allow setting CLOCK_MONOTONIC at the very beginning of time? So that all of the nodes in a cluster can be in sync?
No change in skew just in offset for CLOCK_MONOTONIC.
There are also dragons involved in coordinating things so that CLOCK_MONOTONIC gets set before CLOCK_MONOTONIC gets used. So I don't know if allowing CLOCK_MONOTONIC to be set would be practical but it seems worth exploring all on its own.
It's used very early on in the kernel, so that would be a major surprise for many things, including user space, which has expectations about CLOCK_MONOTONIC.
It would be reasonably easy to add CLOCK_MONOTONIC_SYNC which can be set in the way you described and then in name spaces make it possible to magically map CLOCK_MONOTONIC to CLOCK_MONOTONIC_SYNC.
It still wouldn't allow having different NTP/PTP time domains, but might be a good start to address the main migration headaches.
If we make CLOCK_MONOTONIC settable this way in a namespace, do you think that should include device drivers that report timestamps in the CLOCK_MONOTONIC base, or only the timekeeping clock and timer interfaces?
Uurgh. That gets messy very fast.
Examples for drivers that can report timestamps are input, sound, v4l, and drm. I think most of these can report stamps in either monotonic or realtime base, while socket timestamps notably are always in realtime.
We can probably get away with not setting the timebase for those device drivers as long as the checkpoint/restart and migration features are not expected to restore the state of an open character device in that way. I don't know if that is a reasonable assumption to make for the examples I listed.
No idea. I'm not a container migration wizard.
Thanks,
tglx