[PATCH 5.4 149/178] x86/asm/64: Align start of __clear_user() loop to 16-bytes