which does only one 64-bit division, and it's one that we can probably optimize out in the future (ktime_ms_delta could check whether the difference is less than 2^32 nanoseconds and take a fast path in that case).
It looks like ktime_divns already has that optimization for a 32-bit divisor, so your solution should avoid the 64-bit division.
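
To make the fast-path idea concrete, here is a rough user-space sketch. It is not the kernel's ktime_ms_delta or ktime_divns; sketch_ms_delta and SKETCH_NSEC_PER_MSEC are made-up names for illustration, and I'm assuming plain int64_t nanosecond timestamps. When the delta is non-negative and fits in 32 bits (a bit under 4.3 seconds), a 32-bit divide is enough:

```c
#include <stdint.h>

/* Hypothetical constant for this sketch: nanoseconds per millisecond. */
#define SKETCH_NSEC_PER_MSEC 1000000

/*
 * Hypothetical helper, not a kernel function: millisecond difference
 * between two nanosecond timestamps.
 */
static inline int64_t sketch_ms_delta(int64_t later_ns, int64_t earlier_ns)
{
	int64_t delta = later_ns - earlier_ns;

	/* Fast path: delta fits in 32 bits, so a 32-bit divide suffices. */
	if (delta >= 0 && delta <= UINT32_MAX)
		return (uint32_t)delta / (uint32_t)SKETCH_NSEC_PER_MSEC;

	/* Slow path: negative or large delta, 64-bit dividend. */
	return delta / SKETCH_NSEC_PER_MSEC;
}
```

In the kernel the slow path would go through ktime_divns rather than a bare '/', since a plain 64-bit division is not available on 32-bit architectures.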