On Wed, Jul 24, 2024 at 7:49 AM Johannes Weiner hannes@cmpxchg.org wrote:
On Tue, Jul 23, 2024 at 09:55:19PM -0400, Waiman Long wrote:
Could you use the "-v <n>" option of git-format-patch to add a version number to the patch title? Without that, it can be confusing as to whether the patch is new or a resend of the previous one.
+1
Sorry, I forgot that that flag exists. I'll use that with the next patch. (which I'll send out shortly)
@@ -775,6 +775,11 @@ struct cgroup_subsys {
extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
+struct cgroup_of_peak {
- long value;
- struct list_head list;
+};
The name "cgroup_of_peak" is kind of confusing. Maybe local_peak?
It's the peak associated with an 'of' (which is a known concept in cgroup code), and it pairs up nicely with of_peak(). I'd prefer keeping that over local_peak.
@@ -26,6 +26,7 @@ struct page_counter { atomic_long_t children_low_usage;
unsigned long watermark;
- unsigned long local_watermark; /* track min of fd-local resets */
track "min"? I thought it is used to track local maximum after a reset.
Yeah, the comment doesn't sound quite right.
Yeah, it's not explicitly the min. At reset-time, it's the current value at reset-time, and all the fd-local watermarks will all be greater than or equal. Which does effectively make it the min of watermarks at the time it's being set by the reset code.
However, yeah, the page-charging code will increase it, which makes it not a min.
However, I think we'd be hard-pressed to explain correctly and comprehensively what this thing does in <40 characters.
I'd just remove the comment tbh.
Yeah, I definitely didn't think that comment through. Deleting.
@@ -78,7 +79,10 @@ int page_counter_memparse(const char *buf, const char *max,
static inline void page_counter_reset_watermark(struct page_counter *counter) {
- counter->watermark = page_counter_read(counter);
- unsigned long usage = page_counter_read(counter);
- counter->watermark = usage;
- counter->local_watermark = usage; }
Could you set the local_watermark first before setting watermark? There is a very small time window that the invariant "local_watermark <= watermark" is not true.
Does it matter? Only cgroup1 supports global resets; only cgroup2 supports local peaks watching. This doesn't add anything to the race that already exists between reset and global watermark update on cg1.
Hmm, since the global watermark update is now conditional on both watermarks being <= the current usage, it does make sense. Witht that said, since we're assigning without any barriers, as-is, the CPU and compiler are quite free to re-order them anyway.
I've swapped them and added a comment.
@@ -3950,12 +3955,90 @@ static u64 memory_current_read(struct cgroup_subsys_state *css, return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE; }
-static u64 memory_peak_read(struct cgroup_subsys_state *css,
struct cftype *cft)
+static int peak_show(struct seq_file *sf, void *v, struct page_counter *pc) {
- struct mem_cgroup *memcg = mem_cgroup_from_css(css);
- struct cgroup_of_peak *ofp = of_peak(sf->private);
- s64 fd_peak = ofp->value, peak;
- /* User wants global or local peak? */
- if (fd_peak == -1)
peak = pc->watermark;
- else
peak = max(fd_peak, (s64)pc->local_watermark);
Should you save the local_watermark value into ofp->value if local_watermark is bigger? This will ensure that each successive read of the fd is monotonically increasing. Otherwise the value may go up or down if there are multiple resets in between.
The reset saves local_watermark into ofp->value if it's bigger..?
I do see another problem, though. The compiler might issue multiple reads to ofp->value in arbitrary order. We could print max(-1, ...) which is nonsense. Saving ofp->value into a local variable is the right idea, but the compiler might still issue two reads anyway. It needs a READ_ONCE() to force a single read.
Thanks, I didn't realize the compiler had the latitude to decide to read from that struct field a second time when referencing the local variable.
Added.
I'd use unsigned long for fd_peak. This way the "specialness" is on the -1UL comparison. The max() must be between two positive numbers, so the (s64) there is confusing.
I've switched fd_peak to `u64`.
Thanks again,
-- David Finkel Senior Principal Software Engineer, Core Services