On Monday 11 May 2015 21:00:25 Ed Cashin wrote:
First, thanks for the patch. I do appreciate the attempt to simplify this part of the driver, but I don't think that this patch is good to merge.
I'll make some comments inline below.
On 05/10/2015 10:35 PM, Tina Ruchandani wrote:
'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038.
The comment in the deleted lines below mentions the fact that the overflow does not matter for calculating rough-grained deltas in time. So there is no problem in 2038 or on systems with the clock set to 2038 accidentally.
To clarify: because 'struct timeval' is known to be broken in general, we want to remove it from the kernel entirely and replace it with something that is known to be correct in all cases, even for the drivers that are currently correct. Out of the ~250 files in the kernel that use 'timeval', we can't easily tell which ones are correct, so by replacing them all, we can eliminate all the bugs.
We should be able to do that in a way that generally improves all the drivers, because using 'timeval' tends to be suboptimal to start with.
If the currently available interfaces make things worse for the aoe driver, we may have to add extra infrastructure, and get something that also helps the conversion of other drivers.
This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is a 64-bit timestamp and is year 2038 safe.
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time.
But the patch only changes the struct frame data. The aoe driver only has the struct frame for an incoming AoE response when that response is "expected". If the response comes in a bit late, the frame may have already been used for a new command.
You can see that in aoecmd_ata_rsp when getframe_deferred returns NULL and tsince is called instead of tsince_hr.
In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these "unexpected" responses, this timing information can improve performance significantly without introducing extra overhead or risk.
That path is not changed at all by this patch, right? It also looks like the jiffies information from there is only used to print an error message.
I don't think the patch considers this aspect of the way the round trip time is calculated, and I don't think the primary motivation is justified (if that's 2038 safety, which we have already).
Simplifying it would be nice, but it would be difficult to thoroughly test all of the performance implications. There are still people using 32-bit systems, for example.
Here is my analysis regarding the performance implications:
- Avoiding the access to 'jiffies' in a few places has basically zero impact in small systems, but may help on large SMP machines because it avoids cache line bouncing when you have use multiple concurrent accesses to jiffies.
- Replacing do_gettimeofday() with ktime_get() will improve things slightly on all machines, because it avoids a 32-bit division that takes a couple of cycles, up to hundreds of cycles on some CPU architectures.
- This leaves a single change that is currently making things worse in tsince_hr():
- do_gettimeofday(&now);
- n = now.tv_usec - f->sent.tv_usec;
- n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC;
- now = ktime_get();
- n = ktime_to_us(ktime_sub(now, f->sent));
ktime_to_us() requires a constant 64-bit integer division that is significantly more expensive than the 32-bit it replaces. Thanks to your analysis, I think it's fair to say that the function is indeed timing critical and we should try hard avoid introducing this overhead.
There are of course multiple ways to do this. One way would be to change the code to work on 32-bit nanoseconds instead of 32-bit microseconds. This requires proving that the we cannot exceed 4.29 seconds of round-trip time in calc_rttavg(). Is that a valid assumption or not?
If not, we could replace do_gettimeofday() with ktime_get_ts64(). This will ensure we don't need a 64-bit division when converting the ts64 to a 32-bit microsecond value, and combined with the conversion is still no slower than do_gettimeofday(), and it still avoids the double bookkeeping because it uses a monotonic timebase that is robust against settimeofday.
Arnd