Hi Sumit,
I think I've found it!
You specified hard-float when compiling to IR (.bc), but not when converting that IR to assembly, which is the step where the AAPCS calling convention actually gets lowered:
[pocl] executing [/usr/lib/llvm-3.2/bin/llc -relocation-model=pic -o /tmp/pthread/test_as_type/1-1-1.0-0-0/parallel.s /tmp/pthread/test_as_type/1-1-1.0-0-0/parallel.bc]
Later on you pass the hard-float argument to the assembler (clang, which passes it on to as), but that's too late: by then llc has already emitted the assembly with the wrong calling convention.
If you want to compile in separate steps, you'll have to provide consistent flags at each step, so that no stage silently falls back to its default. All the tools (clang, llc, as, etc.) need the same set of flags, or the equivalent ones where they accept slightly different syntax.
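As a rough sketch of what I mean (the file names and the triple below are placeholders I made up, not taken from your build, and I'm assuming your clang accepts -mfloat-abi= and your llc accepts -float-abi=), you could keep the float ABI in one variable and derive each tool's flags from it. The script only prints the commands, so you can check them before running anything:

```shell
#!/bin/sh
# Dry-run sketch: derive every tool's float-ABI flag from a single setting.
# FLOAT_ABI, TRIPLE and the kernel.* file names are illustrative assumptions.
FLOAT_ABI=hard
TRIPLE=arm-linux-gnueabihf

# clang spells the option -mfloat-abi=, llc spells it -float-abi=
CLANG_FLAGS="-target $TRIPLE -mfloat-abi=$FLOAT_ABI"
LLC_FLAGS="-mtriple=$TRIPLE -float-abi=$FLOAT_ABI -relocation-model=pic"

# Step 1: source -> IR
echo "clang $CLANG_FLAGS -emit-llvm -c kernel.c -o kernel.bc"
# Step 2: IR -> assembly (the step that was missing the hard-float flag)
echo "llc $LLC_FLAGS -o kernel.s kernel.bc"
# Step 3: assembly -> object (clang forwards to as)
echo "clang $CLANG_FLAGS -c kernel.s -o kernel.o"
```

The point is just that step 2 gets the same ABI setting as steps 1 and 3, rather than whatever llc picks by default.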
Hope that helps!
--renato