Hi Michal,
On 8/26/25 15:18, Michal Koutný wrote:
Hi Djalal.
On Mon, Aug 18, 2025 at 10:04:21AM +0100, Djalal Harouni <tixxdz@gmail.com> wrote:
This patch series adds support for writing cgroup interfaces from BPF.
It is useful to freeze a cgroup hierarchy on suspicious activity for a more thorough analysis before killing it. Planned users of this feature are: systemd and BPF tools where the cgroup hierarchy could be a system service, user session, k8s pod or a container.
Could you please give a more specific example of the "suspicious activity"? The last time (v1) it was referring to LSM hooks, where such an asynchronous approach wasn't ideal.
It solves the case perfectly: you detect something, fail the security hook with -EPERM, and optionally freeze the cgroup and snapshot the runtime state.
Oh, I thought the attached example was an obvious one: customers want to restrict bpf() usage to specific cgroups (a given container/pod), so when we detect a bpf() call from a cgroup that is not allowed, we fail it and freeze the cgroup.
Take this and build on top: detect a bash/shell exec or any other newly dropped binary, then fail and freeze the exec early at the linux_binprm object checks.
Also, why couldn't all these tools execute the cgroup actions themselves through the traditional userspace API?
- Freezing from BPF is obviously better: it is less racy, since you don't need access to the corresponding cgroup fs and namespace. Not all tools run as a supervisor/container manager.
- bpf_send_signal is not enough in some cases: what if you race with a task clone, for example? Freezing the cgroup hierarchy, or the one above it, is a catch-all.
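
For comparison, the traditional userspace route comes down to writing "1" or "0" to the cgroup's cgroup.freeze file. The following is only a minimal sketch, assuming cgroup v2 and a caller that can reach the cgroup directory. The helper name cgroup_set_freeze and the example path are hypothetical and not part of the series:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Freeze (freeze != 0) or thaw (freeze == 0) a cgroup v2 directory by
 * writing to its cgroup.freeze file. Returns 0 on success or a negative
 * errno value. cgroup_dir is an assumed path such as
 * "/sys/fs/cgroup/pod-x/container-y".
 */
int cgroup_set_freeze(const char *cgroup_dir, int freeze)
{
	char path[4096];
	ssize_t written;
	int fd, n, err = 0;

	n = snprintf(path, sizeof(path), "%s/cgroup.freeze", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	/* The caller needs the cgroup fs visible in its mount namespace. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, freeze ? "1" : "0", 1);
	if (written < 0)
		err = -errno;
	else if (written != 1)
		err = -EIO;

	close(fd);
	return err;
}
```

The open() step is where the race mentioned above lives: the tool has to resolve the right cgroup path from its own namespace before it can act, whereas a BPF program already has the cgroup at hand.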
One more point (for possible interference with lifecycles) -- what is the relation between the cgroup in which the BPF code "runs" and the cgroup that's the target of the operation? (I hope this isn't supposed to run from BPF without process context.)
The feature is supposed to be used by sleepable BPF programs, so I don't think we need extra checks here?
It could be that this BPF code runs in a process that is under pod-x/container-y/cgroup-z/ and maybe you want to freeze "cgroup-z" or "container-y" and so on... or, in the case of delegated hierarchies, freezing the parent is a catch-all.
Todo:
- Limit size of data to be written.
- Further tests.
- Add cgroup kill support.
I'm missing the retrieval of freeze result in this plan :) cgroup kill
Indeed, you are right: a small kfunc to read the result back, yes ;) !
would be simpler for PoC (and maybe even sufficient for your use case?).
I think both are useful cases.
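
On the read-back point: from userspace, the freeze result shows up as the "frozen" key in the cgroup's cgroup.events file. The following is a minimal sketch of that check, assuming cgroup v2. The helper name cgroup_is_frozen is hypothetical, and this is the userspace equivalent rather than the kfunc discussed above:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Report whether a cgroup v2 directory is frozen by parsing the "frozen"
 * key of its cgroup.events file (flat "key value" lines).
 * Returns 1 if frozen, 0 if not, or a negative errno value
 * (-ENODATA when the key is absent).
 */
int cgroup_is_frozen(const char *cgroup_dir)
{
	char path[4096], key[32];
	int val, n, ret = -ENODATA;
	FILE *f;

	n = snprintf(path, sizeof(path), "%s/cgroup.events", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "r");
	if (!f)
		return -errno;

	/* Each line looks like "populated 1" or "frozen 0". */
	while (fscanf(f, "%31s %d", key, &val) == 2) {
		if (strcmp(key, "frozen") == 0) {
			ret = (val != 0);
			break;
		}
	}

	fclose(f);
	return ret;
}
```

Writing to cgroup.freeze only requests the freeze, and the cgroup can take some time to actually reach the frozen state, so a caller that needs certainty has to check this value (or wait for a change notification on cgroup.events) after the write.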
Thank you!
Regards, Michal