[TCWG CI] 433.milc slowed down by 5% after llvm: [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32