Re: [RFC] NEON vs. ARM register selection

4 Mar 2012


      ...
> The basic idea is that we add a new RTL optimization pass (or two) that
> assesses the usage of pseudo registers, and makes recommendations about
> what register class each should end up in, if there's a choice. These
> recommendations would then be used by later passes to get a better use
> of NEON. I might call this the "prealloc" pass, or something.
That sounds very much like the pre-reload that "new-ra" had at one
point (http://gcc.gnu.org/viewcvs/branches/new-regalloc-branch/gcc/pre-reload.c).
The problem with pre-reload for new-ra was that it was basically
reload instead of something nicer and cleaner. It also only ran just
before the register allocator, which is too late for the problem you
are trying to solve.
...
> Firstly, for each pseudo-register in a function, the pass would look at
> the insn constraints for each "def" and "use", and see how the registers
> relate to one another. This might determine things like "if rN is in
> class A, then rM must be also in class A".
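A rough sketch of how such "rN and rM must share a class" relations could be
tracked, using union-find plus a set of still-permitted classes per group.
This is only an illustration under stated assumptions: ClassGroups, its
methods, and the class labels "CORE" and "NEON" are hypothetical names, not
GCC data structures or real register class names.

```python
class ClassGroups:
    """Groups of pseudos that must share one register class (hypothetical model)."""

    def __init__(self):
        self.parent = {}   # union-find parent links, keyed by pseudo name
        self.allowed = {}  # group root -> set of classes still permitted

    def find(self, reg):
        """Return the representative pseudo of reg's group."""
        self.parent.setdefault(reg, reg)
        while self.parent[reg] != reg:
            self.parent[reg] = self.parent[self.parent[reg]]  # path halving
            reg = self.parent[reg]
        return reg

    def restrict(self, reg, classes):
        """Record that some def or use of reg only accepts these classes."""
        root = self.find(reg)
        current = self.allowed.get(root)
        self.allowed[root] = set(classes) if current is None else current & set(classes)

    def tie(self, a, b):
        """Record that pseudos a and b must end up in the same class."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        self.parent[rb] = ra
        sets = [s for s in (self.allowed.get(ra), self.allowed.pop(rb, None))
                if s is not None]
        if sets:
            self.allowed[ra] = set.intersection(*sets)

    def allowed_classes(self, reg):
        """Classes still possible for reg's group; None means unconstrained."""
        return self.allowed.get(self.find(reg))


groups = ClassGroups()
groups.restrict("r1", {"CORE", "NEON"})  # r1 could live in either class
groups.restrict("r2", {"NEON"})          # some insn needs r2 in NEON
groups.tie("r1", "r2")                   # a constraint ties r1 to r2
```

After the tie, the group holding r1 and r2 is left with NEON as its only
candidate. An empty set would mean the constraints conflict, so the group
would have to be split.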
At SUSE I tried to do this with the webizer pass (web.c). I wrote down
the ideas we implemented at the time (see
http://gcc.gnu.org/ml/gcc/2005-01/msg00179.html):
- web class, to replace regclass and choose register classes for webs
instead of pseudos.  This also includes splitting webs if a register
in a web really wants to be in two different classes to satisfy
constraints in two different insns.  Right now, as far as I
understand, regclass just picks one and lets reload figure out how to
fix up that mistake.
- A semi-strict RTL mode.  Right now there is just strict and
non-strict.  On the branch there is a semi-strict mode which is the
same as strict RTL except that pseudo-registers are still allowed.
- pre-reload (which is related to web class) to make sure as many insn
constraints as possible are satisfied before the register allocator
goes to work.  Basically, after pre-reload the insn stream should be
in semi-strict RTL form.
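
To make the web-splitting idea in the first item concrete, here is a minimal
sketch. It assumes each reference in a web has already been labelled with the
one class its insn wants; split_web_by_class, copies_needed, the insn ids and
the class labels are all hypothetical and do not come from web.c or the
branch.

```python
def split_web_by_class(refs):
    """Group a web's references by the register class each insn wants.

    refs is a list of (insn_id, wanted_class) pairs. Each resulting group
    would become its own sub-web (a fresh pseudo) in a real pass.
    """
    groups = {}
    for insn_id, wanted in refs:
        groups.setdefault(wanted, []).append(insn_id)
    return groups


def copies_needed(groups):
    """One copy per extra sub-web, assuming a single defining sub-web."""
    return max(len(groups) - 1, 0)


# Hypothetical web: two insns want the value in NEON, one wants it in CORE.
web_refs = [(10, "NEON"), (12, "NEON"), (15, "CORE")]
sub_webs = split_web_by_class(web_refs)
```

Here sub_webs has one entry per class, and copies_needed(sub_webs) is 1: the
single move that connects the two pieces. A real pass would also have to
check that a move between the two classes exists and what it costs.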
I used the webizer to unify defs and uses. I would split a web if it
needed multiple register classes (I inserted a mov, without checking
that a move existed from the source to the target register class), and
I put pseudos r1 and r2 in the same register class if there was an
insn (set (r1) (r2)) somewhere. The selection of the register classes
had a cost function, but I used rtx_cost, which is not very effective,
really. But I never took this experiment very far because for x86-64
the plan didn't work as well as I had hoped. I don't remember the
details, but the biggest problem I had with the experimental
implementation of these ideas (apart from lots of trouble with recog
for semi-strict RTL) was that there is a bit of an ordering problem
between combine on the one hand and web-based register classes on the other. If
you assign classes too early and don't allow things to change, then
combine fails too often. If you assign register classes after combine,
you may not get the instructions selected the way you want them to be.
This was when GCC still had the old local-alloc.c and global.c
allocators. Things may be different (better) with IRA and the upcoming
LRA stuff.
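
For illustration only, a cost-driven class choice for one web might look
roughly like the sketch below. It is not the rtx_cost-based code from that
experiment: pick_class, the per-reference cost tables, the flat move_cost
penalty and the class labels are all assumptions made for the example.

```python
def pick_class(ref_costs, candidates, move_cost):
    """Pick the cheapest class for one web.

    ref_costs holds one dict per reference, mapping class -> cost of
    satisfying that reference with the web in that class. A class missing
    from a dict is not accepted by that insn, so a copy costing move_cost
    is charged instead. Returns (best_class, total_cost), or None if
    candidates is empty.
    """
    best = None
    for cls in candidates:
        total = sum(costs.get(cls, move_cost) for costs in ref_costs)
        if best is None or total < best[1]:
            best = (cls, total)
    return best


# Hypothetical web with three references.
ref_costs = [
    {"NEON": 1, "CORE": 2},  # this insn accepts either class
    {"NEON": 1},             # this insn only accepts NEON
    {"CORE": 1},             # this insn only accepts CORE
]
choice = pick_class(ref_costs, ["CORE", "NEON"], move_cost=4)
```

With these made-up numbers CORE totals 2 + 4 + 1 = 7 and NEON totals
1 + 1 + 4 = 6, so NEON is picked. How good the answer is depends entirely on
how good the cost inputs are.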
If you plan to work on this, I would suggest you discuss the plan on
the GCC mailing list also, with Jeff Law and Vladimir Makarov in CC
because they are working on a reload rewrite (LRA).
Ciao!
Steven
