On 22/11/16 14:51, Vincent Guittot wrote:
On 22 November 2016 at 13:32, Juri Lelli <juri.lelli@arm.com> wrote:
On 22/11/16 11:54, Patrick Bellasi wrote:
On 22-Nov 12:01, Vincent Guittot wrote:
[...]
The example above is: start run 1 40ms after the start of run 0
I would say run1 starts 20ms after the start of run0, doesn't it?
and start next event 20ms after the start of run1
mmm... not following here.
so you will not be impacted by the real duration of run0 and run1 (because of freq scaling or micro-arch). Do this sequence 2 times, then start the next sequence/phase
To me this is the expected behavior:
+-----+   +----------+        +-----+   +----------+
|r0   |   |r1        |        |r0   |   |r1        |
|     |   |          |        |     |   |          |
+----------------------------------------------------------->
0    10   20   30   40   50   60   70   80        100     120
          ^                   ^
          |                   |
          +                   +
        Timer0              Timer1
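FWIW, this is easy to sanity-check with a tiny standalone program (toy model in ms, not rt-app code; the 10/20ms runs, the 20/40ms periods and the shared timer reference are just my reading of the figure):

#include <stdio.h>

int main(void)
{
	int run[]    = { 10, 20 };	/* r0 and r1 run durations (ms) */
	int period[] = { 20, 40 };	/* timer0 and timer1 periods (ms) */
	int now = 0, t_next = 0;	/* t_next anchored at t_first = 0 */
	int loop, e;

	for (loop = 0; loop < 2; loop++) {
		for (e = 0; e < 2; e++) {
			printf("r%d: %3dms -> %3dms\n", e, now, now + run[e]);
			now += run[e];
			t_next += period[e];	/* absolute reference */
			if (now < t_next)
				now = t_next;	/* timer sleeps until t_next */
			/* else: timer already expired, no sleep */
		}
	}
	return 0;	/* prints r0@0, r1@20, r0@60, r1@80; ends at 120ms */
}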
So, if we really need this level of flexibility, we can add up periods as well (as Vincent was already saying), so that we end up with a row in the log for each phase reporting a "super-period" of 60ms.
If we agree that this makes sense, what do we do for the slack? One possibility could be to start adding up there as well, so that the sub-slack of the first timer and the sub-slack of the second one are reflected in the slack reported in the per-phase row.
One thing that might be counter-intuitive looking at traces, though, is that if, for example, the first run executes for more than the first period (say 25ms, e.g. because frequency was low), it will report a -5ms slack, which adds up with the sub-slack of r1 (which instead recovered and effectively executed for 20ms, finishing 15ms before its timer fires at 60ms). So we now report a 10ms overall slack (-5ms + 15ms), while in the trace we see a 15ms gap before the next event.
+------------+----------+     +-----+   +----------+
|r0          |r1        |     |r0   |   |r1        |
|            |          |     |     |   |          |
+----------------------------------------------------------->
0    10   20   30   40   50   60   70   80        100     120
          ^                   ^
          |                   |
          +                   +
        Timer0              Timer1
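Spelling out the numbers with the same kind of toy model (standalone C in ms, not rt-app code; r0 bumped to 25ms):

#include <stdio.h>

int main(void)
{
	int run[]    = { 25, 20 };	/* r0 overshoots to 25ms */
	int period[] = { 20, 40 };
	int now = 0, t_next = 0, slack_sum = 0;
	int e, slack;

	for (e = 0; e < 2; e++) {
		now += run[e];
		t_next += period[e];
		slack = t_next - now;	/* -5ms for timer0, +15ms for timer1 */
		slack_sum += slack;
		printf("timer%d sub-slack: %dms\n", e, slack);
		if (now < t_next)
			now = t_next;
	}
	printf("reported slack: %dms\n", slack_sum);	/* 10ms */
	printf("gap in the trace: %dms\n", t_next - (run[0] + run[1]));	/* 15ms */
	return 0;
}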
Oh, BTW, I also think that the following change is actually needed, otherwise the first run overshooting will influence the super-period.
Agree?
--->8---
From 52492c9ba118b686b76fcda05389a3a9c7c73120 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@arm.com>
Date: Tue, 22 Nov 2016 12:29:27 +0000
Subject: [PATCH] rt-app: don't let overshooting run phases affect timers
A run phase that overshoots might run past the next timer event. If that happens, we still want timers not to be affected by it.
Don't use current clock for setting the next timer event, always refer to t_first.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
 src/rt-app.c | 1 -
 1 file changed, 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c
index bec511121d77..36013758f718 100644
--- a/src/rt-app.c
+++ b/src/rt-app.c
@@ -338,7 +338,6 @@ static int run_event(event_data_t *event, int dry_run,
 			t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
 			ldata->wu_latency = timespec_to_usec(&t_wu);
 		} else {
-			clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
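To see what this changes in practice, the timer event then behaves roughly like the standalone sketch below (plain timespec arithmetic instead of rt-app's helpers, error handling elided):

#include <time.h>

static void timer_event(struct timespec *t_next, long period_us)
{
	struct timespec t_now;

	/* Advance the absolute expiry: t_next is only ever derived from
	 * its previous value, ultimately anchored at t_first. */
	t_next->tv_nsec += period_us * 1000L;
	while (t_next->tv_nsec >= 1000000000L) {
		t_next->tv_nsec -= 1000000000L;
		t_next->tv_sec++;
	}

	clock_gettime(CLOCK_MONOTONIC, &t_now);
	if (t_now.tv_sec < t_next->tv_sec ||
	    (t_now.tv_sec == t_next->tv_sec &&
	     t_now.tv_nsec < t_next->tv_nsec)) {
		/* timer still in the future: sleep until it fires */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, t_next, NULL);
	}
	/* else: timer already expired. With the clock_gettime() above
	 * removed, t_next is NOT reset to "now", so the overshoot is
	 * carried over into the following periods. */
}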
The 1st implementation didn't update t_next when the timer had already expired. But then the following timers were also reported as expired, because the previous period had stolen time even though the run time was correct. We faced the issue while testing a frequency scaling policy: the governor was slow to react to the 1st period, and the overshoot during this 1st period was so huge compared to the margin of the use case that the next periods had already expired when their run finished, even though the governor had reacted and increased the frequency to max. In fact, the delay generated by the 1st period was so huge that it was never compensated afterwards; the result was meaningless, because 1 period was really wrong but the following ones were also seen as wrong.
As an example: run 9ms in a period of 10ms. Your platform starts at the lowest OPP, which is 10 times slower than the max OPP. If the 1st run event takes something like 50ms (the time for the governor to react), we have an overshoot of 40ms. This means that even if we are now at the max OPP and the run event really takes 9ms, the next 40 periods will be seen as wrong, because we will set the timer in the past.
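That's easy to check numerically; a quick standalone comparison of the two behaviours (toy model in ms, all names mine):

#include <stdio.h>

int main(void)
{
	int relative;

	/* 9ms of work per 10ms period; the 1st run takes 50ms because
	 * the governor is still at the lowest OPP. */
	for (relative = 0; relative <= 1; relative++) {
		int period = 10, now = 0, t_next = 0, expired = 0;
		int i;

		for (i = 0; i < 100; i++) {
			now += (i == 0) ? 50 : 9;
			t_next += period;
			if (now <= t_next) {
				now = t_next;	/* sleep until next expiry */
			} else {
				expired++;	/* timer was set in the past */
				if (relative)
					t_next = now;	/* re-anchor on current clock */
			}
		}
		printf("%s: %d periods seen as expired\n",
		       relative ? "relative (reset t_next)" :
				  "absolute (keep t_first)", expired);
	}
	return 0;	/* absolute: 40, relative: 1 */
}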
OK, right. I think we already discussed this point in the past (and didn't agree on a common solution :). Since we have now actually imposed and used the "cumulative" approach (where successive activations can get penalized, potentially indefinitely), it models a stricter behaviour by always keeping an absolute (t_first) reference no matter what happens to the run phases.
Since I think that both approaches might be useful to implement, why don't we add a "mode" : [absolute|relative] to timers, and then we check the mode when executing the event to decide whether we want to use the current clock for t_next or keep t_first as the reference?
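Something along these lines, perhaps (hypothetical names, none of this exists in rt-app yet):

#include <time.h>

/* Hypothetical per-timer mode, parsed from a "mode" key in the JSON. */
enum timer_mode { TIMER_ABSOLUTE, TIMER_RELATIVE };

struct timer_state {
	struct timespec t_next;		/* initialised from t_first */
	enum timer_mode mode;
};

/* Would run on the expired path of the timer event (the else branch
 * in the hunk quoted above). */
static void timer_expired(struct timer_state *t)
{
	if (t->mode == TIMER_RELATIVE)
		/* re-anchor on the current clock: only this one period
		 * is accounted as wrong */
		clock_gettime(CLOCK_MONOTONIC, &t->t_next);
	/* TIMER_ABSOLUTE: keep the t_first based t_next, so the
	 * overshoot keeps penalising the following activations */
}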