On 26 January 2013 07:19, Alex Shi <alex.shi(a)intel.com> wrote:
> This patchset can be used, but it causes the burst-waking benchmark aim9 to drop 5~7%
> on my 2-socket machine. The reason is that the too-light runnable load in the early stage
> of woken tasks causes imbalance in balancing.
>
> So it is immature and just a reference for those who want to go further.
>
> V2 change:
> 1, attached patches 1~3, which were sent in the power-aware scheduling series
> 2, removed the CONFIG_FAIR_GROUP_SCHED mask in the 5th patch.
>
> Thanks for Ingo's comments and the testing provided by Fengguang's kbuild system.
> It is now an independent patchset based on Linus' tree.
Pushed as: runnable-load-avg-in-load-balance-v2
The nr_busy_cpus field of the sched_group_power is sometimes different from 0
even though the platform is fully idle. This series fixes 3 use cases:
- when the SCHED softirq is raised on an idle core for idle load balance but
the platform doesn't go out of the cpuidle state
- when some CPUs enter idle state while booting all CPUs
- when a CPU is unplugged and/or replugged
Vincent Guittot (3):
  sched: fix nr_busy_cpus with coupled cpuidle
  sched: fix init NOHZ_IDLE flag
  sched: fix update NOHZ_IDLE flag

 kernel/sched/core.c      |    1 +
 kernel/sched/fair.c      |    2 +-
 kernel/time/tick-sched.c |    2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)
--
1.7.9.5
Anyone seen this?
http://www.hardkernel.com/renewal_2011/products/prdt_info.php
It's cheaper than a Pandaboard, has a quad-core CPU and 2GB of RAM, and is
ridiculously small. That would probably get my LLVM builds under 1h...
But it seems too good to be true. Does anyone have experience with it?
cheers,
--renato
On 24 January 2013 09:00, Alex Shi <alex.shi(a)intel.com> wrote:
> This patchset can be used, but it causes the burst-waking benchmark aim9 to drop 5~7%
> on my 2-socket machine. The reason is that the too-light runnable load in the early stage
> of woken tasks causes imbalance in balancing.
>
> So it is immature and just a reference for those who want to go further.
Pushed as runnable-load-avg-in-load-balance-v1-resent at:
http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git;a=summary
On 24 January 2013 08:36, Alex Shi <alex.shi(a)intel.com> wrote:
> Since the runnable info needs 345ms to accumulate, balancing
> doesn't do well when many tasks wake in a burst. After talking with Mike
> Galbraith, we agreed to just use runnable avg in power friendly
> scheduling and keep current instant load in performance scheduling for
> low latency.
>
> So the biggest change in this version is removing runnable load avg in
> balance and just using runnable data in power balance.
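For reference, a rough sketch of why a freshly woken task looks "light" to a balancer that uses the runnable average. This is a toy floating-point model, not kernel code. The half-life of 32 periods and the contribution of 1024 per ~1ms period are assumptions about the per-entity load tracking parameters, and the names runnable_sum and fraction_of_limit are made up for illustration. In floating point the sum only approaches its limit; the 345ms figure quoted above presumably comes from the kernel's integer arithmetic, which this sketch does not reproduce.

```python
# Toy model of a geometrically decayed runnable sum (not kernel code).
# Assumptions: the per-period decay factor Y satisfies Y**32 == 0.5, and a
# task that is runnable for one whole period contributes 1024 to the sum.
Y = 0.5 ** (1 / 32)
PERIOD_CONTRIB = 1024

# Value the sum converges to for a task that is runnable all the time.
LIMIT = PERIOD_CONTRIB / (1 - Y)


def runnable_sum(periods):
    """Sum accumulated by a task that has been runnable for `periods` periods."""
    total = 0.0
    for _ in range(periods):
        # Older history decays by Y, the newest period is added in full.
        total = total * Y + PERIOD_CONTRIB
    return total


def fraction_of_limit(periods):
    """How much of its eventual load the task shows after `periods` periods."""
    return runnable_sum(periods) / LIMIT


if __name__ == "__main__":
    for n in (1, 8, 32, 64, 345):
        print(n, round(fraction_of_limit(n), 3))
```

Under these assumptions a task shows only half of its eventual load after 32 periods, which is consistent with burst-woken tasks being underestimated early on.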
Pushed as power-aware-scheduling-v4 at:
http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git;a=summary
Calendar Week 4, 2013: Here is the test result summary for the Linux Linaro
Ubuntu Quantal image on the following boards:
1) ARM Versatile Express A9;
2) Samsung Origen;
3) TI Panda 4430;
4) TI Panda 4460;
5) ST Ericsson Snowball.
Synopsis: Snowball can now boot into a serial console successfully; Device
Tree is unavailable in all images, and there is no Internet connection on the
Samsung Origen board.
1. ARM Versatile Express A9 + Linux Linaro Quantal (Column H):
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AroPySpr4FnEdFNmV…
The status is exactly the same as in the last test result: only the "Halt" and
"Device Tree" tests failed; all other features work well.
2. Samsung Origen + Linux Linaro Quantal (Column H):
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AroPySpr4FnEdEowN…
Device Tree is unavailable this week, and there is no Internet connection.
"Halt" works well.
3. TI Panda 4430 + Linux Linaro Quantal (Column H):
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AroPySpr4FnEdEwwZ…
Only Device Tree is unavailable; all other features work well.
4. TI Panda 4460 + Linux Linaro Quantal (Column H):
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AroPySpr4FnEdEwwZ…
Same as TI Panda 4430: only Device Tree is unavailable; other features work well.
5. ST Ericsson Snowball + Linux Linaro Quantal (Column H):
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AroPySpr4FnEdFJ4X…
The board can now boot into a serial console successfully, but many features
are unavailable, such as HDMI, reboot, halt and Ethernet.
For the previous week's test summary (Calendar Week 3), please refer to the
attachment.
Thank you.
Best Regards
Botao Sun
Hi everyone,
I have been looking at how different workloads react when the per entity
load tracking metric is integrated into the load balancer and what are
the possible reasons for it.
I had posted the integration patch earlier:
https://lkml.org/lkml/2012/11/15/391
Essentially what I am doing is:
1. I have disabled CONFIG_FAIR_GROUP_SCHED to make the analysis simple.
2. I have replaced cfs_rq->load.weight in weighted_cpuload() with
cfs.runnable_load_avg, the active load tracking metric.
3. I have replaced se.load.weight in task_h_load() with
se.load.avg.contrib, the per entity load tracking metric.
4. The load balancer will end up using these metrics.
After conducting experiments on several workloads, I found that the
performance of the workloads with the above integration neither
improved nor deteriorated, and this observation was consistent.
Ideally the performance should have improved, considering that the metric
does a better job of tracking load.
Let me explain with a simple example why we should ideally see a
performance improvement: consider one 80% task and two 40% tasks.
With integration:
----------------
       40%
80%    40%
cpu1   cpu2
The above will be the scenario when the tasks initially fork. This is
a perfectly balanced system, so no more load balancing is needed and the
load is properly distributed across the cpus.
Without integration
-------------------
40%                            40%
80%    40%             80%     40%
cpu1   cpu2     OR     cpu1    cpu2
Because all the tasks are viewed as having the same load, the load
balancer could ping-pong tasks between these two situations.
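To make the two views concrete, here is a toy Python sketch (not kernel code). It assumes the simplified picture used in this mail: without integration every task counts as one equal unit, and with integration every task counts by its tracked utilization. The helper names weight_view, tracked_view and imbalance are hypothetical.

```python
# Toy comparison of the two load views for the example above (not kernel code).
# A placement maps each cpu to the utilizations (in percent) of its tasks.

def weight_view(placement):
    """Load per cpu when every task counts as one equal unit."""
    return {cpu: len(tasks) for cpu, tasks in placement.items()}


def tracked_view(placement):
    """Load per cpu when every task counts by its tracked utilization."""
    return {cpu: sum(tasks) for cpu, tasks in placement.items()}


def imbalance(loads):
    """Difference between the most and the least loaded cpu."""
    return max(loads.values()) - min(loads.values())


# The two placements from the diagrams.
split_a = {"cpu1": [80], "cpu2": [40, 40]}
split_b = {"cpu1": [80, 40], "cpu2": [40]}

if __name__ == "__main__":
    for name, placement in (("a", split_a), ("b", split_b)):
        print(name,
              "equal-weight imbalance:", imbalance(weight_view(placement)),
              "tracked imbalance:", imbalance(tracked_view(placement)))
```

In this toy model the equal-weight view sees the same imbalance of one task in both placements, so it has no reason to prefer either, while the tracked view sees the first placement as balanced.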
When I performed this experiment, however, I did not see a performance
improvement in the former case. On further observation I found
that the following was actually happening.
With integration
----------------
Initially          40% task sleeps        40% task wakes up
                                          and select_idle_sibling()
                                          decides to wake it up on cpu1
       40%     ->                  ->     40%
80%    40%         80%    40%             80%    40%
cpu1   cpu2        cpu1   cpu2            cpu1   cpu2
This makes load balancing move the 40% task from cpu1 back to
cpu2, so the stability that the load balancer was trying to achieve is
gone. The culprit therefore boils down to select_idle_sibling(). How is it
the culprit, and how is it hindering the performance of the workloads?
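The sequence above can be replayed with a small toy model in Python. This is only a sketch under stated assumptions: load is the sum of runnable task utilizations, the wake-up target is passed in by hand rather than computed, and periodic_balance is a made-up greedy rule (move one task if that narrows the gap), not the real load balancer.

```python
# Toy replay of the sleep / wake / rebalance sequence (not kernel code).
# Runqueues map each cpu to the utilizations (percent) of its runnable tasks.

def loads(rq):
    """Runnable load per cpu."""
    return {cpu: sum(tasks) for cpu, tasks in rq.items()}


def periodic_balance(rq):
    """Hypothetical greedy balancer: move one task from the busiest cpu to
    the idlest cpu if doing so narrows the gap. Returns the moved task or None."""
    current = loads(rq)
    busiest = max(current, key=current.get)
    idlest = min(current, key=current.get)
    gap = current[busiest] - current[idlest]
    for task in sorted(rq[busiest]):
        # Moving `task` changes the gap to |gap - 2 * task|, which is
        # smaller than the old gap exactly when 0 < task < gap.
        if 0 < task < gap:
            rq[busiest].remove(task)
            rq[idlest].append(task)
            return task
    return None


rq = {"cpu1": [80], "cpu2": [40, 40]}
migrations = []

migrations.append(periodic_balance(rq))   # initially balanced: nothing moves
rq["cpu2"].remove(40)                     # one 40% task sleeps
migrations.append(periodic_balance(rq))   # 80 vs 40: no move narrows the gap
rq["cpu1"].append(40)                     # task is woken on cpu1 (by assumption)
migrations.append(periodic_balance(rq))   # 120 vs 40: the 40% task is moved back

if __name__ == "__main__":
    print("migrations:", migrations)
    print("final runqueues:", rq)
```

In this model the only migration is the one forced by the wake-up placement, which matches the observation that the wake-up path, not the periodic balancer, disturbs the steady state.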
*What is the way ahead with the per entity load tracking metric in the
load balancer then?*
In replies to a post by Paul in https://lkml.org/lkml/2012/12/6/105,
he mentions the following:
"It is my intuition that the greatest carnage here is actually caused
by wake-up load-balancing getting in the way of periodic in
establishing a steady state. I suspect more mileage would result from
reducing the interference wake-up load-balancing has with steady
state."
"The whole point of using blocked load is so that you can converge on a
steady state where you don't NEED to move tasks. What disrupts this is
we naturally prefer idle cpus on wake-up balance to reduce wake-up
latency. I think the better answer is making these two processes load
balancing() and select_idle_sibling() more co-operative."
I had not realised how this would happen until I saw it happening in the
above experiment.
Based on what Paul explained above, let us use the runnable load plus the
blocked load for calculating the load on a cfs runqueue, rather than just
the runnable load (which is what I am doing now), and see the consequence.
Initially:             40% task sleeps
       40%
80%    40%      ->     80%    40%
cpu1   cpu2            cpu1   cpu2
So initially the load on cpu1 is, say, 80 and the load on cpu2 is also
80: balanced. Now when the 40% task sleeps, the total load on cpu2 is the
runnable load plus the blocked load, which is still 80.
As a consequence, during periodic load balancing no load is
moved from cpu1 to cpu2 when the 40% task sleeps. (The balancer sees the
load on cpu2 as 80 and not as 40.)
Hence the above scenario remains the same. What happens on wake up?
This is where load balancing and wake up balancing
(select_idle_sibling) need to cooperate. How about we always schedule
the woken up task on the prev_cpu? This seems more sensible considering that
load balancing counts the blocked load as part of the load of cpu2.
If we do that, we end up scheduling the 40% task back on cpu2, which is the
scenario that load balancing intended. Hence a steady state is
maintained no matter what, unless other tasks show up.
Note that considering prev_cpu as the default cpu to run the woken up
task on is possible only because we use blocked load for load balancing
purposes.
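A toy Python sketch of this proposal (not kernel code) is given here. It assumes that a cpu's load is its runnable load plus the load of sleeping tasks that last ran on it, that blocked load does not decay while the task sleeps (in practice it would decay over time), and that the woken task always goes back to its prev_cpu. The Cpu class and the helper names are hypothetical.

```python
# Toy model of "runnable + blocked load" with wake-up on prev_cpu
# (not kernel code; blocked load is assumed not to decay while sleeping).

class Cpu:
    def __init__(self, name, runnable):
        self.name = name
        self.runnable = list(runnable)  # utilizations of runnable tasks
        self.blocked = []               # utilizations of tasks sleeping here

    def load(self):
        """Load as the balancer would see it: runnable plus blocked."""
        return sum(self.runnable) + sum(self.blocked)


def task_sleeps(cpu, task):
    """The task blocks; its load stays accounted to the cpu it ran on."""
    cpu.runnable.remove(task)
    cpu.blocked.append(task)


def wake_on_prev_cpu(prev_cpu, task):
    """The task wakes up and is placed back on the cpu it last ran on."""
    prev_cpu.blocked.remove(task)
    prev_cpu.runnable.append(task)


cpu1 = Cpu("cpu1", [80])
cpu2 = Cpu("cpu2", [40, 40])

history = [(cpu1.load(), cpu2.load())]   # initially
task_sleeps(cpu2, 40)
history.append((cpu1.load(), cpu2.load()))  # 40% task sleeps
wake_on_prev_cpu(cpu2, 40)
history.append((cpu1.load(), cpu2.load()))  # 40% task wakes on prev_cpu

if __name__ == "__main__":
    for step, (l1, l2) in zip(("initially", "after sleep", "after wake"), history):
        print(step, "cpu1:", l1, "cpu2:", l2)
```

Under these assumptions the balancer sees the same loads on both cpus at every step, so it never has a reason to migrate anything.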
Using blocked load and selecting the prev_cpu as the
target for the woken up task seems to me to be the next step. This could
allow load balancing with the per entity load tracking metric to
behave as it is supposed to, without anything else disrupting it, and here
I expect a performance improvement.
Please do let me know your suggestions. They will greatly help in taking the
right steps from here on towards the correct integration.
Thank you
Regards
Preeti U Murthy