From: Peter Zijlstra <peterz@infradead.org>
commit ea1dc6fc6242f991656e35e2ed3d90ec1cd13418 upstream.
Commit:

  fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")

did something non-obvious and, in doing so, introduced a bug that remained
latent.
The problem was exposed for real by a later commit in the v4.7 merge window:

  2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")

... after which tg->load_avg and cfs_rq->load.weight had different units
(10-bit fixed point and 20-bit fixed point, respectively).
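For reference, the unit difference comes from the load-scaling helpers; a
simplified sketch of the v4.7-era 64-bit definitions (not the verbatim
kernel source):

  # define SCHED_FIXEDPOINT_SHIFT 10
  # define scale_load(w)      ((w) << SCHED_FIXEDPOINT_SHIFT) /* 10 bit -> 20 bit */
  # define scale_load_down(w) ((w) >> SCHED_FIXEDPOINT_SHIFT) /* 20 bit -> 10 bit */

cfs_rq->load.weight is scale_load()'d (20-bit fixed point), while
tg->load_avg accumulates cfs_rq->avg.load_avg contributions, which are kept
in 10-bit fixed point, so summing the two directly is off by a factor of
roughly 2^10.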
Add a comment to explain the use of cfs_rq->load.weight over the 'natural'
cfs_rq->avg.load_avg, and add scale_load_down() to correct for the
difference in units.
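As a rough worked example of the skew (hypothetical numbers: a single
fully-busy nice-0 task on this CPU, so cfs_rq->load.weight == 1024 << 10):

  load      = cfs_rq->load.weight;             /* 1048576 (20 bit)            */
  tg_weight = atomic_long_read(&tg->load_avg); /* ~1024 per busy CPU (10 bit) */
  tg_weight -= cfs_rq->tg_load_avg_contrib;
  tg_weight += load;                           /* 20-bit term swamps the sum  */
  shares    = tg->shares * load / tg_weight;   /* ~= tg->shares               */

Without scale_load_down(), every CPU's cfs_rq computes shares close to the
full tg->shares regardless of the load elsewhere in the group; with it,
load drops back to ~1024 and the sum is in consistent units.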
Since this is (now, as per a previous commit) the only user of calc_tg_weight(), collapse it.
The expected effect of this bug is randomly inconsistent SMP balancing of
cgroup workloads.
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/sched/fair.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2394,28 +2394,22 @@ account_entity_dequeue(struct cfs_rq *cf
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 # ifdef CONFIG_SMP
-static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
+static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 {
-	long tg_weight;
+	long tg_weight, load, shares;
 
 	/*
-	 * Use this CPU's real-time load instead of the last load contribution
-	 * as the updating of the contribution is delayed, and we will use the
-	 * the real-time load to calc the share. See update_tg_load_avg().
+	 * This really should be: cfs_rq->avg.load_avg, but instead we use
+	 * cfs_rq->load.weight, which is its upper bound. This helps ramp up
+	 * the shares for small weight interactive tasks.
 	 */
-	tg_weight = atomic_long_read(&tg->load_avg);
-	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq->load.weight;
-
-	return tg_weight;
-}
+	load = scale_load_down(cfs_rq->load.weight);
 
-static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
-{
-	long tg_weight, load, shares;
+	tg_weight = atomic_long_read(&tg->load_avg);
 
-	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq->load.weight;
+	/* Ensure tg_weight >= load */
+	tg_weight -= cfs_rq->tg_load_avg_contrib;
+	tg_weight += load;
 
 	shares = (tg->shares * load);
 	if (tg_weight)
@@ -2434,6 +2428,7 @@ static inline long calc_cfs_shares(struc
 	return tg->shares;
 }
 # endif /* CONFIG_SMP */
+
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {