'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038. This patch does the following: - Replace the use of 'struct timeval' and jiffies with ktime_t, which is a 64-bit timestamp and is year 2038 safe. - ktime_t provides both long range (like jiffies) and high resolution (like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time.
Signed-off-by: Tina Ruchandani ruchandani.tina@gmail.com --- drivers/block/aoe/aoe.h | 3 +-- drivers/block/aoe/aoecmd.c | 36 +++++++----------------------------- 2 files changed, 8 insertions(+), 31 deletions(-)
diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index 9220f8e..4582b3c 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -112,8 +112,7 @@ enum frame_flags { struct frame { struct list_head head; u32 tag; - struct timeval sent; /* high-res time packet was sent */ - u32 sent_jiffs; /* low-res jiffies-based sent time */ + ktime_t sent; ulong waited; ulong waited_total; struct aoetgt *t; /* parent target I belong to */ diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 422b7d8..7f78780 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -398,8 +398,7 @@ aoecmd_ata_rw(struct aoedev *d)
skb = skb_clone(f->skb, GFP_ATOMIC); if (skb) { - do_gettimeofday(&f->sent); - f->sent_jiffs = (u32) jiffies; + f->sent = ktime_get(); __skb_queue_head_init(&queue); __skb_queue_tail(&queue, skb); aoenet_xmit(&queue); @@ -489,8 +488,7 @@ resend(struct aoedev *d, struct frame *f) skb = skb_clone(skb, GFP_ATOMIC); if (skb == NULL) return; - do_gettimeofday(&f->sent); - f->sent_jiffs = (u32) jiffies; + f->sent = ktime_get(); __skb_queue_head_init(&queue); __skb_queue_tail(&queue, skb); aoenet_xmit(&queue); @@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) static int tsince_hr(struct frame *f) { - struct timeval now; + ktime_t now; int n;
- do_gettimeofday(&now); - n = now.tv_usec - f->sent.tv_usec; - n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC; + now = ktime_get(); + n = ktime_to_us(ktime_sub(now, f->sent));
if (n < 0) n = -n;
- /* For relatively long periods, use jiffies to avoid - * discrepancies caused by updates to the system time. - * - * On system with HZ of 1000, 32-bits is over 49 days - * worth of jiffies, or over 71 minutes worth of usecs. - * - * Jiffies overflow is handled by subtraction of unsigned ints: - * (gdb) print (unsigned) 2 - (unsigned) 0xfffffffe - * $3 = 4 - * (gdb) - */ - if (n > USEC_PER_SEC / 4) { - n = ((u32) jiffies) - f->sent_jiffs; - n *= USEC_PER_SEC / HZ; - } - return n; }
@@ -589,7 +570,6 @@ reassign_frame(struct frame *f) nf->waited = 0; nf->waited_total = f->waited_total; nf->sent = f->sent; - nf->sent_jiffs = f->sent_jiffs; f->skb = skb;
return nf; @@ -633,8 +613,7 @@ probe(struct aoetgt *t)
skb = skb_clone(f->skb, GFP_ATOMIC); if (skb) { - do_gettimeofday(&f->sent); - f->sent_jiffs = (u32) jiffies; + f->sent = ktime_get(); __skb_queue_head_init(&queue); __skb_queue_tail(&queue, skb); aoenet_xmit(&queue); @@ -1474,8 +1453,7 @@ aoecmd_ata_id(struct aoedev *d)
skb = skb_clone(skb, GFP_ATOMIC); if (skb) { - do_gettimeofday(&f->sent); - f->sent_jiffs = (u32) jiffies; + f->sent = ktime_get(); }
return skb;
On Monday 11 May 2015 08:05:05 Tina Ruchandani wrote:
'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038. This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is a 64-bit timestamp and is year 2038 safe.
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time.
Signed-off-by: Tina Ruchandani ruchandani.tina@gmail.com
Very nice!
@@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) static int tsince_hr(struct frame *f) {
- struct timeval now;
- ktime_t now; int n;
- do_gettimeofday(&now);
- n = now.tv_usec - f->sent.tv_usec;
- n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC;
- now = ktime_get();
- n = ktime_to_us(ktime_sub(now, f->sent));
I would cut four extra lines by writing this as
return ktime_us_delta(ktime_get(), f->sent));
but the effect is exactly the same.
With that change, please add
Reviewed-by: Arnd Bergmann arnd@arndb.de
Arnd
I would like to see some performance measurements for this patch on a system with fast storage and multiple 10 GbE links.
If not, at least a good analysis of the expected performance impact the patch will have on major architectures.
Tonight I will think about whether the 2038 thing even matters or whether we just need a comment explaining why it's safe.
On May 11, 2015 11:38 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 11 May 2015 08:05:05 Tina Ruchandani wrote:
'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038. This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is a 64-bit timestamp and is year 2038 safe.
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time.
Signed-off-by: Tina Ruchandani ruchandani.tina@gmail.com
Very nice!
@@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) static int tsince_hr(struct frame *f) {
- struct timeval now;
- ktime_t now;
int n;
- do_gettimeofday(&now);
- n = now.tv_usec - f->sent.tv_usec;
- n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC;
- now = ktime_get();
- n = ktime_to_us(ktime_sub(now, f->sent));
I would cut four extra lines by writing this as
return ktime_us_delta(ktime_get(), f->sent));
but the effect is exactly the same.
With that change, please add
Reviewed-by: Arnd Bergmann arnd@arndb.de
Arnd
First, thanks for the patch. I do appreciate the attempt to simplify this part of the driver, but I don't think that this patch is good to merge.
I'll make some comments inline below.
On 05/10/2015 10:35 PM, Tina Ruchandani wrote:
'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038.
The comment in the deleted lines below mentions the fact that the overflow does not matter for calculating rough-grained deltas in time. So there is no problem in 2038 or on systems with the clock set to 2038 accidentally.
This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is a 64-bit timestamp and is year 2038 safe.
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time.
But the patch only changes the struct frame data. The aoe driver only has the struct frame for an incoming AoE response when that response is "expected". If the response comes in a bit late, the frame may have already been used for a new command.
You can see that in aoecmd_ata_rsp when getframe_deferred returns NULL and tsince is called instead of tsince_hr.
In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these "unexpected" responses, this timing information can improve performance significantly without introducing extra overhead or risk.
I don't think the patch considers this aspect of the way the round trip time is calculated, and I don't think the primary motivation is justified (if that's 2038 safety, which we have already).
Simplifying it would be nice, but it would be difficult to thoroughly test all of the performance implications. There are still people using 32-bit systems, for example.
Signed-off-by: Tina Ruchandani ruchandani.tina@gmail.com
drivers/block/aoe/aoe.h | 3 +-- drivers/block/aoe/aoecmd.c | 36 +++++++----------------------------- 2 files changed, 8 insertions(+), 31 deletions(-)
diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h index 9220f8e..4582b3c 100644 --- a/drivers/block/aoe/aoe.h +++ b/drivers/block/aoe/aoe.h @@ -112,8 +112,7 @@ enum frame_flags { struct frame { struct list_head head; u32 tag;
- struct timeval sent; /* high-res time packet was sent */
- u32 sent_jiffs; /* low-res jiffies-based sent time */
- ktime_t sent; ulong waited; ulong waited_total; struct aoetgt *t; /* parent target I belong to */
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c index 422b7d8..7f78780 100644 --- a/drivers/block/aoe/aoecmd.c +++ b/drivers/block/aoe/aoecmd.c @@ -398,8 +398,7 @@ aoecmd_ata_rw(struct aoedev *d) skb = skb_clone(f->skb, GFP_ATOMIC); if (skb) {
do_gettimeofday(&f->sent);
f->sent_jiffs = (u32) jiffies;
__skb_queue_head_init(&queue); __skb_queue_tail(&queue, skb); aoenet_xmit(&queue);f->sent = ktime_get();
@@ -489,8 +488,7 @@ resend(struct aoedev *d, struct frame *f) skb = skb_clone(skb, GFP_ATOMIC); if (skb == NULL) return;
- do_gettimeofday(&f->sent);
- f->sent_jiffs = (u32) jiffies;
- f->sent = ktime_get(); __skb_queue_head_init(&queue); __skb_queue_tail(&queue, skb); aoenet_xmit(&queue);
@@ -499,32 +497,15 @@ resend(struct aoedev *d, struct frame *f) static int tsince_hr(struct frame *f) {
- struct timeval now;
- ktime_t now; int n;
- do_gettimeofday(&now);
- n = now.tv_usec - f->sent.tv_usec;
- n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC;
- now = ktime_get();
- n = ktime_to_us(ktime_sub(now, f->sent));
if (n < 0) n = -n;
- /* For relatively long periods, use jiffies to avoid
* discrepancies caused by updates to the system time.
*
* On system with HZ of 1000, 32-bits is over 49 days
* worth of jiffies, or over 71 minutes worth of usecs.
*
* Jiffies overflow is handled by subtraction of unsigned ints:
* (gdb) print (unsigned) 2 - (unsigned) 0xfffffffe
* $3 = 4
* (gdb)
*/
- if (n > USEC_PER_SEC / 4) {
n = ((u32) jiffies) - f->sent_jiffs;
n *= USEC_PER_SEC / HZ;
- }
- return n; }
@@ -589,7 +570,6 @@ reassign_frame(struct frame *f) nf->waited = 0; nf->waited_total = f->waited_total; nf->sent = f->sent;
- nf->sent_jiffs = f->sent_jiffs; f->skb = skb;
return nf; @@ -633,8 +613,7 @@ probe(struct aoetgt *t) skb = skb_clone(f->skb, GFP_ATOMIC); if (skb) {
do_gettimeofday(&f->sent);
f->sent_jiffs = (u32) jiffies;
__skb_queue_head_init(&queue); __skb_queue_tail(&queue, skb); aoenet_xmit(&queue);f->sent = ktime_get();
@@ -1474,8 +1453,7 @@ aoecmd_ata_id(struct aoedev *d) skb = skb_clone(skb, GFP_ATOMIC); if (skb) {
do_gettimeofday(&f->sent);
f->sent_jiffs = (u32) jiffies;
}f->sent = ktime_get();
return skb;
On Monday 11 May 2015 21:00:25 Ed Cashin wrote:
First, thanks for the patch. I do appreciate the attempt to simplify this part of the driver, but I don't think that this patch is good to merge.
I'll make some comments inline below.
On 05/10/2015 10:35 PM, Tina Ruchandani wrote:
'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' uses 32-bit representation for seconds which will overflow in year 2038.
The comment in the deleted lines below mentions the fact that the overflow does not matter for calculating rough-grained deltas in time. So there is no problem in 2038 or on systems with the clock set to 2038 accidentally.
To clarify: because 'struct timeval' is known to be broken in general, we want to remove it from the kernel entirely and replace it with something that is known to be correct in all cases, even for the drivers that are currently correct. Out of the ~250 files in the kernel that use 'timeval', we can't easily tell which ones are correct, so by replacing them all, we can eliminate all the bugs.
We should be able to do that in a way that generally improves all the drivers, because using 'timeval' tends to be suboptimal to start with.
If the currently available interfaces make things worse for the aoe driver, we may have to add extra infrastructure, and get something that also helps the conversion of other drivers.
This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is a 64-bit timestamp and is year 2038 safe.
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time.
But the patch only changes the struct frame data. The aoe driver only has the struct frame for an incoming AoE response when that response is "expected". If the response comes in a bit late, the frame may have already been used for a new command.
You can see that in aoecmd_ata_rsp when getframe_deferred returns NULL and tsince is called instead of tsince_hr.
In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these "unexpected" responses, this timing information can improve performance significantly without introducing extra overhead or risk.
That path is not changed at all by this patch, right? It also looks like the jiffies information from there is only used to print an error message.
I don't think the patch considers this aspect of the way the round trip time is calculated, and I don't think the primary motivation is justified (if that's 2038 safety, which we have already).
Simplifying it would be nice, but it would be difficult to thoroughly test all of the performance implications. There are still people using 32-bit systems, for example.
Here is my analysis regarding the performance implications:
- Avoiding the access to 'jiffies' in a few places has basically zero impact in small systems, but may help on large SMP machines because it avoids cache line bouncing when you have use multiple concurrent accesses to jiffies.
- Replacing do_gettimeofday() with ktime_get() will improve things slightly on all machines, because it avoids a 32-bit division that takes a couple of cycles, up to hundreds of cycles on some CPU architectures.
- This leaves a single change that is currently making things worse in tsince_hr():
- do_gettimeofday(&now);
- n = now.tv_usec - f->sent.tv_usec;
- n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC;
- now = ktime_get();
- n = ktime_to_us(ktime_sub(now, f->sent));
ktime_to_us() requires a constant 64-bit integer division that is significantly more expensive than the 32-bit it replaces. Thanks to your analysis, I think it's fair to say that the function is indeed timing critical and we should try hard avoid introducing this overhead.
There are of course multiple ways to do this. One way would be to change the code to work on 32-bit nanoseconds instead of 32-bit microseconds. This requires proving that the we cannot exceed 4.29 seconds of round-trip time in calc_rttavg(). Is that a valid assumption or not?
If not, we could replace do_gettimeofday() with ktime_get_ts64(). This will ensure we don't need a 64-bit division when converting the ts64 to a 32-bit microsecond value, and combined with the conversion is still no slower than do_gettimeofday(), and it still avoids the double bookkeeping because it uses a monotonic timebase that is robust against settimeofday.
Arnd
On Tuesday 12 May 2015 11:44:21 Arnd Bergmann wrote:
There are of course multiple ways to do this. One way would be to change the code to work on 32-bit nanoseconds instead of 32-bit microseconds. This requires proving that the we cannot exceed 4.29 seconds of round-trip time in calc_rttavg(). Is that a valid assumption or not?
If not, we could replace do_gettimeofday() with ktime_get_ts64(). This will ensure we don't need a 64-bit division when converting the ts64 to a 32-bit microsecond value, and combined with the conversion is still no slower than do_gettimeofday(), and it still avoids the double bookkeeping because it uses a monotonic timebase that is robust against settimeofday.
Two other approaches that occurred to me later:
- introduce common ktime_get_ms(), ktime_get_us(), ktime_get_real_ms() and ktime_get_real_is() interfaces, to match the other interfaces we already provide. These could be done as efficiently or better than what aoe does manually today.
- change the timebase that is used for the computations in aoe to use scaled nanoseconds instead of microseconds. Using
u32 time = ktime_get_ns() >> 10;
would give you a similar range and precision as microseconds, but completely avoid integer division. You could also use a different shift value to either extend the range beyond 71 minutes, or the extend the precision to something below a microsecond. This would be the most efficient implementation, but also require significant changes to the driver.
Arnd
On 05/12/2015 07:14 AM, Arnd Bergmann wrote:
On Tuesday 12 May 2015 11:44:21 Arnd Bergmann wrote:
There are of course multiple ways to do this. One way would be to change the code to work on 32-bit nanoseconds instead of 32-bit microseconds. This requires proving that the we cannot exceed 4.29 seconds of round-trip time in calc_rttavg(). Is that a valid assumption or not?
If not, we could replace do_gettimeofday() with ktime_get_ts64(). This will ensure we don't need a 64-bit division when converting the ts64 to a 32-bit microsecond value, and combined with the conversion is still no slower than do_gettimeofday(), and it still avoids the double bookkeeping because it uses a monotonic timebase that is robust against settimeofday.
Two other approaches that occurred to me later:
introduce common ktime_get_ms(), ktime_get_us(), ktime_get_real_ms() and ktime_get_real_is() interfaces, to match the other interfaces we already provide. These could be done as efficiently or better than what aoe does manually today.
change the timebase that is used for the computations in aoe to use scaled nanoseconds instead of microseconds. Using
u32 time = ktime_get_ns() >> 10;
would give you a similar range and precision as microseconds, but completely avoid integer division. You could also use a different shift value to either extend the range beyond 71 minutes, or the extend the precision to something below a microsecond. This would be the most efficient implementation, but also require significant changes to the driver.
That is an interesting idea. People do care about aoe_deadsecs being pretty accurate, so there would need to be a way to make that remain accurate. The driver will fail outstanding I/O to the target and mark it as "down" after unsuccessfully retransmitting commands to the target for a number of seconds equal to aoe_deadsecs.
As to the efficient ktime_get_us idea, that sounds appealing since you mention that they would be efficient.
Thanks for the analysis.
On Tuesday 12 May 2015 21:23:04 Ed Cashin wrote:
On 05/12/2015 07:14 AM, Arnd Bergmann wrote:
On Tuesday 12 May 2015 11:44:21 Arnd Bergmann wrote:
There are of course multiple ways to do this. One way would be to change the code to work on 32-bit nanoseconds instead of 32-bit microseconds. This requires proving that the we cannot exceed 4.29 seconds of round-trip time in calc_rttavg(). Is that a valid assumption or not?
If not, we could replace do_gettimeofday() with ktime_get_ts64(). This will ensure we don't need a 64-bit division when converting the ts64 to a 32-bit microsecond value, and combined with the conversion is still no slower than do_gettimeofday(), and it still avoids the double bookkeeping because it uses a monotonic timebase that is robust against settimeofday.
Two other approaches that occurred to me later:
introduce common ktime_get_ms(), ktime_get_us(), ktime_get_real_ms() and ktime_get_real_is() interfaces, to match the other interfaces we already provide. These could be done as efficiently or better than what aoe does manually today.
change the timebase that is used for the computations in aoe to use scaled nanoseconds instead of microseconds. Using
u32 time = ktime_get_ns() >> 10;
would give you a similar range and precision as microseconds, but completely avoid integer division. You could also use a different shift value to either extend the range beyond 71 minutes, or the extend the precision to something below a microsecond. This would be the most efficient implementation, but also require significant changes to the driver.
That is an interesting idea. People do care about aoe_deadsecs being pretty accurate, so there would need to be a way to make that remain accurate. The driver will fail outstanding I/O to the target and mark it as "down" after unsuccessfully retransmitting commands to the target for a number of seconds equal to aoe_deadsecs.
As to the efficient ktime_get_us idea, that sounds appealing since you mention that they would be efficient.
Thanks for the analysis.
Shall we do the ktime_get_us() approach then? It still requires a 32-bit division like do_gettimeofday(), so it will not be as efficient as the shifted nanoseconds.
As for the aoe_deadsecs computation, converting the aoe_deadsec module parameter into scaled nanoseconds can be done at module load time, and that way you also save the integer division you currently do for each frame in rexmit_timer() to turn the microseconds into seconds.
Arnd
On 05/13/2015 04:04 AM, Arnd Bergmann wrote: ...
Shall we do the ktime_get_us() approach then? It still requires a 32-bit division like do_gettimeofday(), so it will not be as efficient as the shifted nanoseconds.
It's no worse, though, right? So I think it's a good transition. Further optimization could be attempted in an experimental branch at some point for easy testing.
As for the aoe_deadsecs computation, converting the aoe_deadsec module parameter into scaled nanoseconds can be done at module load time, and that way you also save the integer division you currently do for each frame in rexmit_timer() to turn the microseconds into seconds. Arnd
That's true, but the "secs" in the identifier stands for "seconds". It would be misleading to have something called seconds be scaled nanoseconds. And we could just use another variable if it weren't for the fact that this module parameter is exposed through sysfs and can be changed through that mechanism at any time.
Thanks for the expanded motivation. I'll return to your ideas tonight, but I wanted to mention that it is possible for round trips to take well over five seconds, partly because the disk on the target might be resetting.
If the rough-grained mechanism in the AoE tag still works, though, that's very helpful.
On 05/12/2015 05:44 AM, Arnd Bergmann wrote:
On Monday 11 May 2015 21:00:25 Ed Cashin wrote:
...
In that case, there is still information about the timing embedded in the AoE tag. The send time in jiffies is a rough-grained record of the send time, and it's extracted from the tag. For these "unexpected" responses, this timing information can improve performance significantly without introducing extra overhead or risk. That path is not changed at all by this patch, right? It also looks like the jiffies information from there is only used to print an error message.
That's right, thanks, the tag still has the old jiffies embedded in it.
The information, though, is used to update the round-trip-time average and the running estimate of the RTT variance. For the unexpected responses that information can help the driver to maintain high performance when there are inconsistencies in the network performance. (calc_rttavg)