On Tue, Apr 29, 2025 at 4:47 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
This looks to me like an artificial benchmark. Surely perf will be higher when wq is executed on a free cpu. In production all cpus likely have work to do, so this whole approach of 'let's ask wq to run on that cpu' isn't going to work.

Looks like RPS helps. Use that. I think it will scale and work better when the whole server is loaded.

pw-bot: cr
Hi Alexei, you're right that for requests coming from a remote host, all CPUs are busy. However, in cloud-native scenarios where sidecars are widely used, the services reach each other over loopback, and for loopback traffic the wq (workqueue) will always run on the CPU where the client is located (a consequence of how loopback and wq are implemented). Since the sidecar itself is bound to a CPU, in real deployments the CPU bound to the gateway (reverse proxy) program using sockmap cannot be fully utilized.
Enabling RPS can alleviate the sockmap issue, but it adds extra computation in software, so from a performance perspective we would still like a solution that achieves the highest performance.
And I think it's wrong to optimize for performance of one particular setup. An API that picks a cpu is difficult to get right. Too easy to make performance worse.
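
For reference, the RPS setup suggested above amounts to writing a CPU bitmap to the per-queue sysfs file /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. What follows is only a minimal sketch of that step, not something taken from the patch or the benchmark in this thread. The helper names (rps_cpu_mask, set_rps_cpus), the sysfs_root parameter, and the CPU numbers in the usage note are hypothetical and chosen for illustration.

```python
from pathlib import Path


def rps_cpu_mask(cpus):
    """Build the hex bitmap string that rps_cpus expects from a list of CPU ids.

    The kernel parses the value as comma-separated 32-bit hex groups,
    most significant group first.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU id: {cpu}")
        mask |= 1 << cpu

    groups = []
    # Peel off full 32-bit groups from the low end, zero-padded to 8 digits.
    while mask > 0xFFFFFFFF:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
    # The most significant group does not need padding.
    groups.append(f"{mask:x}")
    return ",".join(reversed(groups))


def set_rps_cpus(dev, cpus, rx_queue=0, sysfs_root="/sys"):
    """Write the RPS CPU bitmap for one receive queue of a device.

    sysfs_root is a hypothetical parameter that exists only so the
    function can be exercised against a scratch directory.
    Writing to the real /sys requires root.
    """
    path = (Path(sysfs_root) / "class" / "net" / dev
            / "queues" / f"rx-{rx_queue}" / "rps_cpus")
    path.write_text(rps_cpu_mask(cpus))
    return path
```

As an illustration, set_rps_cpus("lo", [2, 3]) run as root would steer loopback receive processing to CPUs 2 and 3. Which CPUs are appropriate depends on where the client and the proxy are pinned, and this sketch does not attempt to decide that.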