On Fri, Aug 20, 2021, Mathieu Desnoyers wrote:
Without the lazy clear scheme, a rseq c.s. would look like:
  init(rseq_cs)
  cpu = TLS->rseq::cpu_id_start
  [1]   TLS->rseq::rseq_cs = rseq_cs
  [start_ip]            ----------------------------
  [2]   if (cpu != TLS->rseq::cpu_id)
                goto abort_ip;
  [3]   <last_instruction_in_cs>
  [post_commit_ip]      ----------------------------
  [4]   TLS->rseq::rseq_cs = NULL
But as a fast-path optimization, [4] is not strictly needed, because the rseq_cs descriptor already contains the instruction pointer range of the critical section. Userspace can therefore omit [4]. However, if the kernel never clears the pointer, it has to re-read the rseq_cs descriptor's content every time it needs to confirm that it is not nested over a rseq c.s.
So making the kernel lazily clear the rseq_cs pointer is just an optimization: once the kernel has validated that the userspace code is not within the rseq c.s. advertised by the rseq_cs field, clearing the pointer ensures that it won't do useless work the next time it needs to check rseq_cs.
Thanks for the explanation, much appreciated!
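
For readers following along, the lazy clear scheme described above can be sketched as a small userspace model in C. This is only an illustration under stated assumptions, not the kernel's implementation: the struct layouts are simplified, and the names model_rseq_cs, model_rseq, model_ip_fixup and count_descriptor_reads are hypothetical. The model counts how often the descriptor has to be read when a thread that has left its critical section (and omitted step [4]) is preempted repeatedly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the rseq_cs descriptor. */
struct model_rseq_cs {
	uintptr_t start_ip;           /* first instruction of the c.s. */
	uintptr_t post_commit_offset; /* length of the c.s. in bytes */
	uintptr_t abort_ip;           /* where to resume on abort */
};

/* Simplified, hypothetical stand-in for the per-thread TLS rseq area. */
struct model_rseq {
	const struct model_rseq_cs *rseq_cs;
};

/*
 * Model of the check done when the thread was preempted at 'ip'.
 * Returns the instruction pointer to resume at. '*reads' counts how
 * many times the descriptor content had to be loaded.
 */
static uintptr_t model_ip_fixup(struct model_rseq *tls, uintptr_t ip,
				bool lazy_clear, unsigned int *reads)
{
	const struct model_rseq_cs *cs = tls->rseq_cs;

	if (!cs)
		return ip; /* nothing advertised: no descriptor read */

	(*reads)++; /* descriptor content must be read and checked */

	/* Unsigned compare covers both ip < start_ip and ip >= end. */
	if (ip - cs->start_ip < cs->post_commit_offset) {
		tls->rseq_cs = NULL; /* aborting the c.s. */
		return cs->abort_ip;
	}

	/* Not inside the advertised c.s.: optionally clear lazily. */
	if (lazy_clear)
		tls->rseq_cs = NULL;
	return ip;
}

/*
 * Userspace left the c.s. without doing step [4], then gets
 * preempted 'preemptions' times at an address outside the c.s.
 */
static unsigned int count_descriptor_reads(bool lazy_clear,
					   unsigned int preemptions)
{
	static const struct model_rseq_cs cs = { 0x1000, 0x20, 0x2000 };
	struct model_rseq tls = { .rseq_cs = &cs };
	unsigned int reads = 0;

	for (unsigned int i = 0; i < preemptions; i++)
		model_ip_fixup(&tls, 0x5000, lazy_clear, &reads);
	return reads;
}
```

In this model, calling count_descriptor_reads(true, n) reads the descriptor once no matter how large n is, while count_descriptor_reads(false, n) reads it on every preemption, which is the "useless work" that the lazy clear avoids.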