Hello,
While testing SMS on Crotex-A9 I see that the latency of load instruction
is 1
cycle when compiling with -mcpu=cortex-a9 -mthumb -mtune=cortex-a9 -O3.
Below is a snippet from the SMS dump file showing the DDG, created for the
loop in foo function, which depicts the edge between the load of input[i]
(insn 181) and the mult instruction (insn 184).
[181 -(T,1,0)-> 184] is the true dependence edge created between the
two insns; with latency of 1.
On Crotex-A8 the latency of the load is 3 as expected.
I've read in crotex-a9.md file that loads should have a latency of 4 cycles
so I just wanted to check if I should have used other combination of flags
for Crotex-A9 or the load latency should indeed be of 1 cycle here.
Thanks,
Revital
int foo (int max, signed short *input, int y)
{
int i, accum;
for (i = 0; i < max; i++) {
accum += (signed int) input[i] * (signed int) input[i+y];
}
return accum;
}
The snippet from the DDG:
Node num: 2
(insn 181 178 184 13 (set (reg:SI 216 [ D.2019 ])
(zero_extend:SI (mem:HI (plus:SI (reg:SI 319 [ ivtmp.34 ])
(reg:SI 345)) [2 MEM[base: D.2076_257, index:
D.2079_226, offset: 0B]+0 S2 A16]))) tmp.c:7 714
{*thumb2_zero_extendhisi2_v6}
(nil))
OUT ARCS: [181 -(A,0,1)-> 176] [181 -(T,1,0)-> 184]
IN ARCS: [184 -(A,0,1)-> 181] [176 -(T,1,0)-> 181]
Node num: 3
(insn 184 181 234 13 (set (reg/v:SI 209 [ accum ])
(plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 212
[ D.2013 ]) 0))
(sign_extend:SI (subreg/s/u:HI (reg:SI 216 [ D.2019 ]) 0)))
(reg/v:SI 209 [ accum ]))) tmp.c:7 64 {maddhisi4}
(expr_list:REG_DEAD (reg:SI 216 [ D.2019 ])
(expr_list:REG_DEAD (reg:SI 212 [ D.2013 ])
(nil))))
== Last Week ==
* Worked to improve libunwind test suite results using new ARM-specific
unwinding. Managed to reduce the number of failures, but some tests
still fail.
* Tested latest ltrace upstream tree and determined that It Is Good.
Hopefully, this marks the cusp of a new release that includes my work.
== This Week ==
* Finish up work with libunwind. Update wiki brain dump.
--
Zach Welch
CodeSourcery
zwelch(a)codesourcery.com
(650) 331-3385 x743
== Last week ==
* Some discussion about libffi variadic function support. The current
proposed design is an additional API function to prepare a new call
interface (CIF) structure with the argument number settings for each
variadic call site. This IMHO, is slightly less flexible that doing a
call-time placement style interface, but should be faster and easier to
implement.
* Collecting a few cases of ARM/Thumb-2 integration in the GCC ARM backend.
* Prepared and traveled to Dallas for the Linaro sprint.
== This week ==
* The Linaro sprint.
== Linaro GCC ==
* Continued hacking on (cleaning up, testing) NEON
element/structure load/store (etc.) intrinsics-improvement patch. This
now passes all the auto-generated neon.exp tests (admittedly only a
very weak test for correctness), and the small number of hand-written
tests I threw at it, so hopefully we can declare victory soon. (This
patch provides huge improvements to generated code for the "fancy" NEON
load/store intrinsics and vtbl/vtbx lookup instructions, and hopefully
also provides part of the solution for allowing the vectorizer to use
the fancy loads/stores also.)
== Misc ==
* Public holiday Monday.
== GCC ==
Pulled down new commits from upstream GCC. My test build failed due to a
new cross-build configure problem. Found the problem in GCC Bugzilla,
and rolled back to the revision before the problem one. Those sources
built ok, so I've pushed the changes to Linaro GCC 4.6 branch. It it now
updated to 31st December.
Discussed cross-build problems with Alexander Sack on IRC.
Merged, tested and pushed all the outstanding Launchpad merge requests
into GCC 4.4 and 4.5.
Merged, tested and pushed all new CS SG++ patches into Linaro GCC 4.5.
Merged, tested and pushed GCC 4.5.2 from upstream into Linaro 4.5.
Spun releases for both Linaro GCC 4.4 and 4.5.
Brought the Linaro patch tracker up to date.
== Other ==
Catch up with lots of email following my two week holiday.
Updated some of the CS patch tracker.
Travel to Dallas for the Linaro sprint.
-------
Next week (10th-14th Jan):
Linaro sprint in Dallas
Following week (17th-21st Jan):
CS annual meeting in Phoenix
Got h264ref working in SPEC; this was another signed-char issue (very
difficult one to find - it didn't crash, it just came out with subtly
the wrong result);
compiling that and sphinx3 with -fsigned-char and they seem happy.
With Richard's fix for gromacs that leaves just the zeusmp binary
that's too large to run on silverberry; it seems to startup on canis1
(that has more RAM).
So that should be a full set; I just need to get the fortran stuff
going on canis1.
Kicked off discussion on libffi-discuss about variadic calls; people
seem OK with the idea of adding it (although unsure exactly how many
things really use it
- even though there are examples in Python documentation). Also
kicked off some of the internal paperwork to contribute code for it.
Dave
== This week ==
* Away Monday, and a fair bit of time on non-Linaro duties.
* Looked at Dave's gromacs bug (693502). Turned out to be a reload
inheritance problem. Tested a patch. Spent some time coming up with
a brute-force testcase that I can submit with the patch.
* Found a bug in the x86 and x86_64 ifunc support that would affect
ARM too if we weren't careful. Came up with an example testcase
and filed the bug upstream:
http://sourceware.org/bugzilla/show_bug.cgi?id=12366
It has been fixed by H.J. Lu. I'm wondering about taking a
slightly different approach for ARM.
* More ifunc work.
* Tried again to reproduce the chrome failure, making sure to use
DEB_BUILD_HARDENING=1. (I hadn't realised first time round that,
in reaction to this bug, debian/rules specifically excluded armel
from the automatic DEB_BUILD_HARDENING=1 setting.) The build
takes a couple of days on my BeagleBoard.
Heh, and just as I wrote that, the build failed with the reported
link error. Neat.
== Next week ==
* Finish testing the patch for 693502 and submit it upstream.
Backport the patch to our tree once accepted.
* Look at the chromium problem.
* More ifunc.
Richard
RAG:
Red:
Amber:
Green: qemu git pull request accepted, patches seem to be flowing
into qemu upstream more freely now
Milestones:
| Planned | Estimate | Actual |
finish virtio-system | 2010-08-27 | postponed | |
finish testing PCI patches | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2010-01-25 | 2010-01-25 | |
Bonus extended holiday edition:
This report includes a number of things that happened over
the Christmas holidays as well as this week (which is
a short one, only 3 days).
Progress:
* merge-correctness-fixes:
** my git pull request for various ARM qemu patches was merged!
** a number of other patches were merged to qemu master:
+ implement correct NaN propagation rules
+ rename softfloat float*_is_nan() functions
+ fix UMAAL (Aurelien's patch, reviewed by me)
+ VQSHL (reg) patchset
** diagnosed the segfault in
https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/604872
and wrote a patchset which fixes it (and related problems):
http://patchwork.ozlabs.org/patch/77887/ (n/7)
** wrote and posted a patchset which implements save/restore
for the versatile platform (so I didn't have to wait 10
minutes for the test case to reach the segfault)
http://patchwork.ozlabs.org/patch/76529/
** finished and posted a patchset implementing flushing of
denormals to zero on input:
http://patchwork.ozlabs.org/patch/77798/ (n/3), now committed
** reviewed/tested Aurelien's SMMLA/SMMLS patch (now committed)
** wrote and posted patches which implement the FS_IOC_FIEMAP
ioctl (http://patchwork.ozlabs.org/patch/77725/) and
the file_sync_range{,2} syscalls
(http://patchwork.ozlabs.org/patch/77723/) -- these are used
by apt, and so linaro-media-create was generating a lot of
warnings from qemu about their lack of implementation
** posted patch to clean up NaN handling in linux-user NWFPE
emulation (follow-on from earlier NaN cleanups):
http://patchwork.ozlabs.org/patch/77795/
* maintain-beagle-models:
** implemented minimal ARM cp14 debug registers
and a random register in the TWL4030 emulation (both needed
to get recent Linaro kernels to boot on the beagle model)
Submitted merge request upstream:
http://meego.gitorious.org/qemu-maemo/qemu/merge_requests/3
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Meetings: toolchain standup, pdsw doughnuts
Absences:
2011: Dallas Linaro sprint 9-15 Jan. Holiday 21 Jan, 22 Apr - 2 May.
Hi,
* implemented reduction support in SLP, I'll check if it helps
DenBench next week
* helping Sebastian Pop with if-conversion for vectorization
improvements (BTW, Sebastian's goal is to vectorize kernels from
ffmpeg)
* fixed GCC PR47139
Ira
Hi All,
Thanks for attending the call. I think we had some interesting discussions.
I've posted the minutes from the call on the same page as before:
https://wiki.linaro.org/AndrewStubbs/Sandbox/GCCoptimizations
I'll try to get the audio posted somewhere for anybody that's interested.
Andrew