Hi.
On Sat, Mar 12, 2022 at 07:07:15PM +0000, Shakeel Butt <shakeelb@google.com> wrote:
> So, I will focus on the error rate in this email.
(OK, I'll stick to the error estimate (long-term) in this message and will send another about the current patch.)
[...]
> > The benefit this was traded for was the greater accuracy; the possible
> > error is:
> > - before
> >   O(nr_cpus * nr_cgroups(subtree) * MEMCG_CHARGE_BATCH)        (1)
>
> Please note that (1) is the possible error for each stat item, and
> without any time bound.
I agree (I forgot to highlight that this can stay stuck forever).
> > - after
> >   O(nr_cpus * MEMCG_CHARGE_BATCH)                              // sync. flush
>
> The above is across all the stat items.
Can that be used to argue about the per-item error? E.g. nr_cpus * MEMCG_CHARGE_BATCH / nr_counters looks appealing, but that's IMO too optimistic.
The individual item updates are correlated, so in practice a single item would see a lower error than my first relation suggests, but without delving too much into correlations, the upper bound is independent of nr_counters.
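
To get a feel for the magnitudes, a rough back-of-envelope (plain userspace C, not kernel code; nr_cpus/nr_cgroups/nr_counters below are made-up illustrative values, and MEMCG_CHARGE_BATCH is taken as 32 here):

#include <stdio.h>

#define MEMCG_CHARGE_BATCH      32

int main(void)
{
        long nr_cpus = 128, nr_cgroups = 1000, nr_counters = 100;

        /* before: bound per stat item, with no time bound */
        long before_per_item = nr_cpus * nr_cgroups * MEMCG_CHARGE_BATCH;
        /* after: one shared budget across all stat items (sync flush) */
        long after_all_items = nr_cpus * MEMCG_CHARGE_BATCH;
        /* the "appealing but too optimistic" per-item split */
        long after_per_item = after_all_items / nr_counters;

        printf("before, per item:        %ld\n", before_per_item);
        printf("after, all items:        %ld\n", after_all_items);
        printf("after, per item (naive): %ld\n", after_per_item);
        return 0;
}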
> I don't get the reason for breaking 'cr' into individual stat items or
> counters. What is the benefit? We want to keep the error rate decoupled
> from the number of counters (or stat items).
It's just a model; it should capture that every stat item (change) contributes to the common error estimate. (So it moves more towards the nr_cpus * MEMCG_CHARGE_BATCH / nr_counters per-item error, but here we're asking about processing time.)
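
A minimal sketch of what I mean by the model (again userspace C; the change_rate[] values are purely hypothetical): every item's change rate feeds the one shared budget that triggers the sync flush, and an item's error at flush time is just its share of that budget:

#include <stdio.h>

#define MEMCG_CHARGE_BATCH      32

int main(void)
{
        /* hypothetical per-item change rates (updates per second) */
        double change_rate[] = { 1000.0, 1000.0, 500.0, 10.0 };
        int nr_items = sizeof(change_rate) / sizeof(change_rate[0]);
        long nr_cpus = 128;
        double budget = nr_cpus * MEMCG_CHARGE_BATCH;   /* shared by all items */
        double cr = 0.0, dt;
        int i;

        for (i = 0; i < nr_items; i++)
                cr += change_rate[i];           /* aggregate rate, the 'cr' above */

        dt = budget / cr;                       /* time until the budget is hit */

        for (i = 0; i < nr_items; i++)
                printf("item %d: error at flush ~ %.0f of %.0f total\n",
                       i, change_rate[i] * dt, budget);
        return 0;
}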
[...]
> My main reason behind trying NR_MEMCG_EVENTS was to reduce flush_work by
> reducing nr_counters, and I don't think nr_counters should have an impact
> on Δt.
The more items are changing, the sooner they accumulate the target error, no?
(Δt is not the periodic flush period; it's the variable time between two sync flushes.)
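
Same toy model, looking only at Δt (illustrative numbers again): doubling the number of items changing at a given rate halves the time until the shared budget is hit:

#include <stdio.h>

#define MEMCG_CHARGE_BATCH      32

int main(void)
{
        long nr_cpus = 128;
        double budget = nr_cpus * MEMCG_CHARGE_BATCH;
        double per_item_rate = 1000.0;  /* updates/s per changing item, assumed */
        int nr_changing;

        /* more items changing -> the shared budget is reached sooner */
        for (nr_changing = 25; nr_changing <= 100; nr_changing *= 2)
                printf("%3d changing items -> dt ~ %.3f s\n",
                       nr_changing, budget / (nr_changing * per_item_rate));
        return 0;
}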
Michal