Because pids->limit can be changed concurrently (but we don't want to take a lock because it would be needlessly expensive), use the appropriate memory barriers.
Fixes: 49b786ea146f ("cgroup: implement the PIDs subsystem")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 kernel/cgroup/pids.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 8e513a573fe9..a726e4a20177 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -152,7 +152,7 @@ static int pids_try_charge(struct pids_cgroup *pids, int num)
 		 * p->limit is %PIDS_MAX then we know that this test will never
 		 * fail.
 		 */
-		if (new > p->limit)
+		if (new > READ_ONCE(p->limit))
 			goto revert;
 	}
@@ -277,7 +277,7 @@ static ssize_t pids_max_write(struct kernfs_open_file *of, char *buf,
 	 * Limit updates don't need to be mutex'd, since it isn't
 	 * critical that any racing fork()s follow the new limit.
 	 */
-	pids->limit = limit;
+	WRITE_ONCE(pids->limit, limit);
 
 	return nbytes;
 }
@@ -285,7 +285,7 @@ static int pids_max_show(struct seq_file *sf, void *v)
 {
 	struct cgroup_subsys_state *css = seq_css(sf);
 	struct pids_cgroup *pids = css_pids(css);
-	int64_t limit = pids->limit;
+	int64_t limit = READ_ONCE(pids->limit);
 
 	if (limit >= PIDS_MAX)
 		seq_printf(sf, "%s\n", PIDS_MAX_STR);
On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> Because pids->limit can be changed concurrently (but we don't want to take a lock because it would be needlessly expensive), use the appropriate memory barriers.
I can't quite tell what problem it's fixing. Can you elaborate a scenario where the current code would break that your patch fixes?
Thanks.
On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > Because pids->limit can be changed concurrently (but we don't want to take a lock because it would be needlessly expensive), use the appropriate memory barriers.
> I can't quite tell what problem it's fixing. Can you elaborate a scenario where the current code would break that your patch fixes?
As far as I can tell, not using *_ONCE() here means that if you had a process changing pids->limit from A to B, a process might be able to temporarily exceed pids->limit -- because pids->limit accesses are not protected by mutexes and the C compiler can produce confusing intermediate values for pids->limit[1].
But this is more of a correctness fix than one fixing an actually exploitable bug -- given the kernel memory model work, it seems like a good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory access.
[1]: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
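(Illustrative aside, not part of the original mail: the class of miscompilation [1] describes can be sketched in plain C. Every name below -- EXAMPLE_READ_ONCE(), struct example_pids, the two headroom functions -- is invented for this example; the kernel's real READ_ONCE() is more elaborate, but its core is the same volatile cast.)

    #include <stdint.h>

    /* Toy stand-in for the kernel's READ_ONCE(): one volatile-qualified
     * load that the compiler may neither repeat nor re-derive from the
     * variable later.
     */
    #define EXAMPLE_READ_ONCE(x) (*(const volatile typeof(x) *)&(x))

    struct example_pids {
        int64_t limit;  /* written concurrently, no lock held */
    };

    /* Racy: the compiler is allowed to load p->limit twice, so the bounds
     * check and the subtraction can see different values if another
     * thread updates the limit in between.
     */
    int64_t headroom_racy(struct example_pids *p, int64_t counter)
    {
        if (counter > p->limit)
            return 0;
        return p->limit - counter;
    }

    /* With a *_ONCE-style accessor the value is loaded exactly once. */
    int64_t headroom_once(struct example_pids *p, int64_t counter)
    {
        int64_t limit = EXAMPLE_READ_ONCE(p->limit);

        if (counter > limit)
            return 0;
        return limit - counter;
    }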
Hello, Aleksa.
On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > Because pids->limit can be changed concurrently (but we don't want to take a lock because it would be needlessly expensive), use the appropriate memory barriers.
> > I can't quite tell what problem it's fixing. Can you elaborate a scenario where the current code would break that your patch fixes?
> As far as I can tell, not using *_ONCE() here means that if you had a process changing pids->limit from A to B, a process might be able to temporarily exceed pids->limit -- because pids->limit accesses are not protected by mutexes and the C compiler can produce confusing intermediate values for pids->limit[1].
> But this is more of a correctness fix than one fixing an actually exploitable bug -- given the kernel memory model work, it seems like a good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory access.
READ/WRITE_ONCE provides protection against the compiler generating multiple accesses for a single operation. It won't prevent split writes / reads of 64bit variables on 32bit machines. For that, you'd have to switch them to atomic64_t's.
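(An untested, abbreviated sketch of that atomic64_t conversion against the original code -- atomic64_read()/atomic64_set() are the stock <linux/atomic.h> accessors; hunk contexts are elided:)

    (struct pids_cgroup)
    -	int64_t			limit;
    +	atomic64_t		limit;

    (pids_css_alloc)
    -	pids->limit = PIDS_MAX;
    +	atomic64_set(&pids->limit, PIDS_MAX);

    (pids_try_charge)
    -	if (new > p->limit)
    +	if (new > atomic64_read(&p->limit))

    (pids_max_write)
    -	pids->limit = limit;
    +	atomic64_set(&pids->limit, limit);

    (pids_max_show)
    -	int64_t limit = pids->limit;
    +	int64_t limit = atomic64_read(&pids->limit);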
Thanks.
On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> Hello, Aleksa.
> On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> > On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> > > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > > Because pids->limit can be changed concurrently (but we don't want to take a lock because it would be needlessly expensive), use the appropriate memory barriers.
> > > I can't quite tell what problem it's fixing. Can you elaborate a scenario where the current code would break that your patch fixes?
> > As far as I can tell, not using *_ONCE() here means that if you had a process changing pids->limit from A to B, a process might be able to temporarily exceed pids->limit -- because pids->limit accesses are not protected by mutexes and the C compiler can produce confusing intermediate values for pids->limit[1].
> > But this is more of a correctness fix than one fixing an actually exploitable bug -- given the kernel memory model work, it seems like a good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory access.
> READ/WRITE_ONCE provides protection against the compiler generating multiple accesses for a single operation. It won't prevent split writes / reads of 64bit variables on 32bit machines. For that, you'd have to switch them to atomic64_t's.
Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to me like it's explicitly saying that I shouldn't use atomic64_t if I'm just using it for fetching and assignment.
    The non-RMW ops are (typically) regular LOADs and STOREs and are canonically implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and smp_store_release() respectively. Therefore, if you find yourself only using the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all and are doing it wrong.
As for 64-bit on 32-bit machines -- that is a separate issue, but from [1] it seems to me like there are more problems that *_ONCE() fixes than just split reads and writes.
[1]: https://lwn.net/Articles/793253/
Hello, Aleksa.
On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to me like it's explicitly saying that I shouldn't use atomic64_t if I'm just using it for fetching and assignment.
Hah, where is it saying that? The alternative would be seqlock or u64_stats or straight-up locking, but idk, for this atomic64_t should be fine.
>     The non-RMW ops are (typically) regular LOADs and STOREs and are canonically implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and smp_store_release() respectively. Therefore, if you find yourself only using the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all and are doing it wrong.
> As for 64-bit on 32-bit machines -- that is a separate issue, but from [1] it seems to me like there are more problems that *_ONCE() fixes than just split reads and writes.
Your explanations are too wishy washy. If you wanna fix it, please do it correctly. R/W ONCE isn't the right solution here.
Thanks.
On 2019-10-16, Tejun Heo <tj@kernel.org> wrote:
> Hello, Aleksa.
> On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to me like it's explicitly saying that I shouldn't use atomic64_t if I'm just using it for fetching and assignment.
> Hah, where is it saying that?
Isn't that what this says:
    Therefore, if you find yourself only using the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all and are doing it wrong.
Doesn't using just atomic64_read() and atomic64_set() fall under "only using the non-RMW operations of atomic_t"? But yes, I agree that any locking is overkill.
> > As for 64-bit on 32-bit machines -- that is a separate issue, but from [1] it seems to me like there are more problems that *_ONCE() fixes than just split reads and writes.
> Your explanations are too wishy washy. If you wanna fix it, please do it correctly. R/W ONCE isn't the right solution here.
Sure, I will switch it to use atomic64_read() and atomic64_set() instead if that's what you'd prefer. Though I will mention that on quite a few architectures atomic64_read() is defined as:
    #define atomic64_read(v) READ_ONCE((v)->counter)
Hello,
On Thu, Oct 17, 2019 at 02:29:46AM +1100, Aleksa Sarai wrote:
> > Hah, where is it saying that?
> Isn't that what this says:
>     Therefore, if you find yourself only using the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all and are doing it wrong.
> Doesn't using just atomic64_read() and atomic64_set() fall under "only using the non-RMW operations of atomic_t"? But yes, I agree that any locking is overkill.
Yeah, I mean, it's overkill. We could use seqlock or u64_stats here but it doesn't matter that much.
> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from [1] it seems to me like there are more problems that *_ONCE() fixes than just split reads and writes.
> > Your explanations are too wishy washy. If you wanna fix it, please do it correctly. R/W ONCE isn't the right solution here.
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead if that's what you'd prefer. Though I will mention that on quite a few architectures atomic64_read() is defined as:
>     #define atomic64_read(v) READ_ONCE((v)->counter)
Yeah, on archs which don't have split access on 64bits. On the ones which do, it does something else. The generic implementation is straight-up locking, I think.
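(A condensed sketch of that generic fallback, modeled on lib/atomic64.c: every atomic64_t operation takes a spinlock picked by hashing the variable's address, which is what makes the 64-bit access untearable on such machines.)

    static union {
        raw_spinlock_t lock;
        char pad[L1_CACHE_BYTES];
    } atomic64_lock[NR_LOCKS] __cacheline_aligned_in_smp;

    static raw_spinlock_t *lock_addr(const atomic64_t *v)
    {
        unsigned long addr = (unsigned long)v;

        /* hash the address so unrelated atomic64_t's rarely share a lock */
        addr >>= L1_CACHE_SHIFT;
        addr ^= (addr >> 8) ^ (addr >> 16);
        return &atomic64_lock[addr & (NR_LOCKS - 1)].lock;
    }

    s64 atomic64_read(const atomic64_t *v)
    {
        unsigned long flags;
        raw_spinlock_t *lock = lock_addr(v);
        s64 val;

        raw_spin_lock_irqsave(lock, flags);
        val = v->counter;   /* plain load, serialized by the lock */
        raw_spin_unlock_irqrestore(lock, flags);
        return val;
    }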
Thanks.
On 2019-10-17, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2019-10-16, Tejun Heo <tj@kernel.org> wrote:
> > Hello, Aleksa.
> > On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to me like it's explicitly saying that I shouldn't use atomic64_t if I'm just using it for fetching and assignment.
> > Hah, where is it saying that?
> Isn't that what this says:
>     Therefore, if you find yourself only using the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all and are doing it wrong.
> Doesn't using just atomic64_read() and atomic64_set() fall under "only using the non-RMW operations of atomic_t"? But yes, I agree that any locking is overkill.
> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from [1] it seems to me like there are more problems that *_ONCE() fixes than just split reads and writes.
> > Your explanations are too wishy washy. If you wanna fix it, please do it correctly. R/W ONCE isn't the right solution here.
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead if that's what you'd prefer. Though I will mention that on quite a few architectures atomic64_read() is defined as:
>     #define atomic64_read(v) READ_ONCE((v)->counter)
Though I guess that's because on those architectures it turns out that READ_ONCE is properly atomic?
On Thu, Oct 17, 2019 at 02:35:20AM +1100, Aleksa Sarai wrote:
> > Sure, I will switch it to use atomic64_read() and atomic64_set() instead if that's what you'd prefer. Though I will mention that on quite a few architectures atomic64_read() is defined as:
> >     #define atomic64_read(v) READ_ONCE((v)->counter)
> Though I guess that's because on those architectures it turns out that READ_ONCE is properly atomic?
Oh yeah, on archs where 64bit accesses are atomic, READ_ONCE() / WRITE_ONCE() would work here. If the limit variable were ulong instead of an explicit 64bit variable, RW ONCE would work too as ulong accesses are atomic on all archs IIRC.
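(A hypothetical sketch of that ulong variant -- nobody proposed it in the thread, and whether unsigned long leaves enough range for the limit on 32-bit is a separate question:)

    (struct pids_cgroup)
    -	int64_t			limit;
    +	unsigned long		limit;

    (pids_try_charge)
    -	if (new > p->limit)
    +	if (new > READ_ONCE(p->limit))

    (pids_max_write)
    -	pids->limit = limit;
    +	WRITE_ONCE(pids->limit, limit);

Word-sized loads and stores are single-copy atomic on every architecture Linux supports, so with unsigned long the *_ONCE annotations alone would be sufficient.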
Thanks.