On Fri, Dec 17 2021 at 12:30, David Hildenbrand wrote:
Sometimes it is required to have a seqcount implementation that uses a structure with a fixed and minimal size -- just a bare unsigned int -- independent of the kernel configuration. This is especially valuable, when the raw_ variants of the seqlock function will be used and the additional lockdep part of the seqcount_t structure remains essentially unused.
Let's provide a lockdep-free raw_seqcount_t variant that can be used via the raw functions to have a basic seqlock.
The target use case is embedding a raw_seqcount_t in the "struct page", where we really want a minimal size and cannot tolerate a sudden grow of the seqcount_t structure resulting in a significant "struct page" increase or even a layout change.
Cannot tolerate? Could you please provide a reason and not just a statement?
Provide raw_read_seqcount_retry(), to make it easy to match to raw_read_seqcount_begin() in the code.
Let's add a short documentation as well.
Note: There might be other possible users for raw_seqcount_t where the lockdep part might be completely unused and just wastes memory -- essentially any users that only use the raw_ function variants.
Even when the reader side uses raw_seqcount_begin/retry() the writer side still can use the non-raw variant which validates that the associated lock is held on write.
Aside of that your proposed extra raw sequence count needs extra care vs. PREEMPT_RT and this want's to be very clearly documented. Why?
The lock association has two purposes:
1) Lockdep coverage which unearthed bugs already
2) PREEMPT_RT livelock prevention
Assume the following:
spin_lock(wrlock); write_seqcount_begin(seq);
---> preemption by a high priority reader
seqcount_begin(seq); <-- live lock
The RT substitution does:
seqcount_begin(seq) cnt = READ_ONCE(seq->sequence);
if (cnt & 1) { lock(s->lock); unlock(s->lock); }
which prevents the deadlock because it makes the reader block on the associated lock, which allows the preempted writer to make progress.
This applies to raw_seqcount_begin() as well.
I have no objections against the construct itself, but this has to be properly documented vs. the restriction this imposes.
As you can see above the writer side therefore has to ensure that it cannot preempted on PREEMPT_RT, which limits the possibilities what you can do inside a preemption (or interrupt) disabled section on RT enabled kernels. See Documentation/locking/locktypes.rst for further information.
Thanks,
tglx