Just came across Facebook's announcement that they are open-sourcing BOLT, a tool for reducing instruction-cache and TLB misses. Looks interesting. They claim a 2-15% performance improvement in their services with BOLT, and they have a port for aarch64: https://code.facebook.com/posts/605721433136474/accelerate-large-scale-appli...
Regards, Prathamesh