I discovered some excessive memory usage in gas recently when defining macros. It turns out that this is a weird implementation feature rather than a bug.
This patch has a possible fix for the issue, but I'd be interested in people's views before I go so far as cleaning it up and discussing it upstream.
Cheers
---Dave
Dave Martin (1):
  gas: Allow for a more sensible number of macro arguments

 gas/as.c           | 17 +++++++++++++++++
 gas/doc/as.texinfo |  9 ++++++++-
 gas/hash.c         |  5 +++--
 gas/hash.h         |  1 +
 gas/macro.c        | 22 +++++++++++++++++++++-
 gas/macro.h        |  1 +
 6 files changed, 51 insertions(+), 4 deletions(-)

--
1.7.4.1
Defining a macro seems to eat up about half a megabyte of memory, due to the excessively large size of the macro arguments hash table (large enough for 65537 entries by default).
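(As a rough sanity check: assuming the table is allocated as a flat array of bucket pointers, 65537 buckets times an 8-byte pointer on a 64-bit host is about 512 KiB per macro, which matches the half-megabyte figure above.)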
As a result, gas memory consumption becomes rather huge when a modestly large number (hundreds) of macros are defined.
In fact, it appears to make little sense to force the size of macro argument hash tables to be the same as the size of other hash tables, since the typical number of macro arguments is very small compared with the number of opcodes, macros or symbols, and because the number of macros is not limited to a small constant number.
This patch uses a smaller default size for the macro argument hash tables (43), which should be enough for virtually everyone. hash_new_sized () is exported from hash.c in order to allow hash tables of multiple sizes to be created.
A new argument, --macro-args, is added to allow this number to be customised.
These numbers are deliberately conservative -- they could perhaps be reduced further.
Caveats:
* I'm not a hash table expert, so the exact size values I've chosen may not be optimal.
* The user should not be expected to choose good hash table sizes themselves when using --macro-args. However, we can't use bfd's knowledge of good hash table sizes, because that knowledge is conflated into bfd_hash_set_default_size (), which has side effects that we don't want. Possibly suitable tables of sizes are buried inside libbfd and libiberty, but there's no direct way to get at them.
Factoring out hash table size choice as a separate function might be better, but I don't know whether the same choice function is suitable for all the various hash table implementations used in binutils.
---
 gas/as.c           | 17 +++++++++++++++++
 gas/doc/as.texinfo |  9 ++++++++-
 gas/hash.c         |  5 +++--
 gas/hash.h         |  1 +
 gas/macro.c        | 22 +++++++++++++++++++++-
 gas/macro.h        |  1 +
 6 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/gas/as.c b/gas/as.c
index b99ea1e..07ed806 100644
--- a/gas/as.c
+++ b/gas/as.c
@@ -301,6 +301,8 @@ Options:\n\
   fprintf (stream, _("\
   --hash-size=<value>     set the hash table size close to <value>\n"));
   fprintf (stream, _("\
+  --macro-args=<value>    tune for <value> arguments per macro"));
+  fprintf (stream, _("\
   --help                  show this message and exit\n"));
   fprintf (stream, _("\
   --target-help           show target specific options\n"));
@@ -450,6 +452,7 @@ parse_args (int * pargc, char *** pargv)
      OPTION_ALTERNATE,
      OPTION_AL,
      OPTION_HASH_TABLE_SIZE,
+     OPTION_MACRO_ARGS_HASH_TABLE_SIZE,
      OPTION_REDUCE_MEMORY_OVERHEADS,
      OPTION_WARN_FATAL,
      OPTION_COMPRESS_DEBUG,
@@ -491,6 +494,7 @@ parse_args (int * pargc, char *** pargv)
     ,{"gstabs", no_argument, NULL, OPTION_GSTABS}
     ,{"gstabs+", no_argument, NULL, OPTION_GSTABS_PLUS}
     ,{"hash-size", required_argument, NULL, OPTION_HASH_TABLE_SIZE}
+    ,{"macro-args", required_argument, NULL, OPTION_MACRO_ARGS_HASH_TABLE_SIZE}
     ,{"help", no_argument, NULL, OPTION_HELP}
 #ifdef HAVE_ITBL_CPU
     /* New option for extending instruction set (see also -t above).
@@ -936,6 +940,7 @@ This program has absolutely no warranty.\n"));
           /* The only change we make at the moment is to reduce the size
              of the hash tables that we use.  */
           set_gas_hash_table_size (4051);
+          set_macro_args_hash_table_size (0);
           break;

         case OPTION_HASH_TABLE_SIZE:
@@ -949,6 +954,18 @@ This program has absolutely no warranty.\n"));
               as_fatal (_("--hash-size needs a numeric argument"));
             break;
           }
+
+        case OPTION_MACRO_ARGS_HASH_TABLE_SIZE:
+          {
+            unsigned long new_size;
+
+            new_size = strtoul (optarg, NULL, 0);
+            if (new_size)
+              set_macro_args_hash_table_size (new_size);
+            else
+              as_fatal (_("--macro-args needs a numeric argument"));
+            break;
+          }
         }
     }

diff --git a/gas/doc/as.texinfo b/gas/doc/as.texinfo
index 7a41f02..a37467d 100644
--- a/gas/doc/as.texinfo
+++ b/gas/doc/as.texinfo
@@ -241,6 +241,7 @@ gcc(1), ld(1), and the Info entries for @file{binutils} and @file{ld}.
  [@b{--listing-lhs-width2}=@var{NUM}] [@b{--listing-rhs-width}=@var{NUM}]
  [@b{--listing-cont-lines}=@var{NUM}] [@b{--keep-locals}] [@b{-o} @var{objfile}]
  [@b{-R}] [@b{--reduce-memory-overheads}] [@b{--statistics}]
+ [@b{--hash-size}=@var{NUM}] [@b{--macro-args}=@var{NUM}]
  [@b{-v}] [@b{-version}] [@b{--version}] [@b{-W}] [@b{--warn}] [@b{--fatal-warnings}]
  [@b{-w}] [@b{-x}] [@b{-Z}] [@b{@@@var{FILE}}]
  [@b{--size-check=[error|warning]}]
@@ -682,13 +683,19 @@ Name the object-file output from @command{@value{AS}} @var{objfile}.
 @item -R
 Fold the data section into the text section.
 
-@kindex --hash-size=@var{number}
+@item --hash-size=@var{number}
 Set the default size of GAS's hash tables to a prime number close to
 @var{number}.  Increasing this value can reduce the length of time it takes the
 assembler to perform its tasks, at the expense of increasing the assembler's
 memory requirements.  Similarly reducing this value can reduce the memory
 requirements at the expense of speed.
 
+@item --macro-args=@var{number}
+Set the default hash table size for macro arguments to @var{number}.  The
+default value should be sensible for almost all users: normally you should not
+use this option unless you use a lot of macros with very many arguments (tens
+or more).
+
 @item --reduce-memory-overheads
 This option reduces GAS's memory requirements, at the expense of making the
 assembly processes slower.  Currently this switch is a synonym for
diff --git a/gas/hash.c b/gas/hash.c
index a58c948..586f0cd 100644
--- a/gas/hash.c
+++ b/gas/hash.c
@@ -81,9 +81,10 @@ set_gas_hash_table_size (unsigned long size)
   gas_hash_table_size = bfd_hash_set_default_size (size);
 }
 
-/* Create a hash table.  This return a control block.  */
+/* Create a hash table.  This return a control block.
+   The caller is responsible for choosing a sensible size.  */
 
-static struct hash_control *
+struct hash_control *
 hash_new_sized (unsigned long size)
 {
   unsigned long alloc;
diff --git a/gas/hash.h b/gas/hash.h
index fac814e..5269a5d 100644
--- a/gas/hash.h
+++ b/gas/hash.h
@@ -31,6 +31,7 @@ void set_gas_hash_table_size (unsigned long);
 /* Create a hash table.  This return a control block.  */
 
 extern struct hash_control *hash_new (void);
+extern struct hash_control *hash_new_sized (unsigned long size);
 
 /* Delete a hash table, freeing all allocated memory.  */
 
diff --git a/gas/macro.c b/gas/macro.c
index a74b40b..d0aba07 100644
--- a/gas/macro.c
+++ b/gas/macro.c
@@ -621,6 +621,26 @@ free_macro (macro_entry *macro)
   free (macro);
 }
 
+static unsigned long macro_args_hash_table_size = 43;
+
+/* Set the default hash table size for a macro's arguments.  Macros
+   with a number of arguments which approach or exceed this value will
+   perform poorly.
+
+   Calling this function with size equal to 0 sets the hash table size
+   to the minimum reasonable size ARGS_HASH_MIN defined below.
+
+   Note: apart from attempting to ensure a reasonably sensible minimum,
+   there is no protection against poorly-chosen hash sizes.  */
+
+#define ARGS_HASH_MIN 23
+
+void
+set_macro_args_hash_table_size (unsigned long size)
+{
+  macro_args_hash_table_size = size < ARGS_HASH_MIN ? ARGS_HASH_MIN : size;
+}
+
 /* Define a new macro.  Returns NULL on success, otherwise returns an
    error message.  If NAMEP is not NULL, *NAMEP is set to the name of the
    macro which was defined.  */
@@ -643,7 +663,7 @@ define_macro (int idx, sb *in, sb *label,
 
   macro->formal_count = 0;
   macro->formals = 0;
-  macro->formal_hash = hash_new ();
+  macro->formal_hash = hash_new_sized (macro_args_hash_table_size);
 
   idx = sb_skip_white (idx, in);
   if (! buffer_and_nest ("MACRO", "ENDM", &macro->sub, get_line))
diff --git a/gas/macro.h b/gas/macro.h
index edc1b6b..8909f80 100644
--- a/gas/macro.h
+++ b/gas/macro.h
@@ -92,5 +92,6 @@ extern const char *define_macro
 extern int check_macro (const char *, sb *, const char **, macro_entry **);
 extern void delete_macro (const char *);
 extern const char *expand_irp (int, int, sb *, sb *, int (*) (sb *));
+extern void set_macro_args_hash_table_size (unsigned long size);
 
 #endif
On Tue, Nov 22, 2011 at 12:10 AM, Dave Martin <dave.martin@linaro.org> wrote:
> Defining a macro seems to eat up about half a megabyte of memory, due to the excessively large size of the macro arguments hash table (large enough for 65537 entries by default).
Hi Dave. Just bikeshedding responses from me, but that's better than nothing.
> As a result, gas memory consumption becomes rather huge when a modestly large number (hundreds) of macros are defined.
> In fact, it appears to make little sense to force the size of macro argument hash tables to be the same as the size of other hash tables, since the typical number of macro arguments is very small compared with the number of opcodes, macros or symbols; and because the number of macros is not limited to a small constant number.
> This patch uses a smaller default size for the macro argument hash tables (43) which should be enough for virtually everyone. hash_new_sized () is exported from hash.c in order to allow hash tables of multiple sizes to be created.
What's the most arguments you've seen in real code? Does recursion or nesting of macros ever blow out this number?
> A new argument, --macro-args is added to allow this number to be customised.
I don't like this. The default should be high enough that no one ever needs the argument.
> These numbers are deliberately conservative -- they could perhaps be reduced further.
> Caveats:
> * I'm not a hash table expert, so the exact size values I've chosen may not be optimal.
Does it matter? Is the hash table checked often, or is it just used as a convenient name/value container?
On Wed, Nov 23, 2011 at 04:00:58PM +1300, Michael Hope wrote:
> On Tue, Nov 22, 2011 at 12:10 AM, Dave Martin <dave.martin@linaro.org> wrote:
> > Defining a macro seems to eat up about half a megabyte of memory, due to the excessively large size of the macro arguments hash table (large enough for 65537 entries by default).
> Hi Dave. Just bikeshedding responses from me, but that's better than nothing.
Thanks for the comments
> > As a result, gas memory consumption becomes rather huge when a modestly large number (hundreds) of macros are defined.
> > In fact, it appears to make little sense to force the size of macro argument hash tables to be the same as the size of other hash tables, since the typical number of macro arguments is very small compared with the number of opcodes, macros or symbols; and because the number of macros is not limited to a small constant number.
> > This patch uses a smaller default size for the macro argument hash tables (43) which should be enough for virtually everyone. hash_new_sized () is exported from hash.c in order to allow hash tables of multiple sizes to be created.
> What's the most arguments you've seen in real code? Does recursion or nesting of macros ever blow out this number?
The power of the gas macro processor is not widely appreciated, so far as I can tell. Except for experimental things I've written myself, more than, say, 20 arguments, or any use of recursion, appears to be exceptionally rare. The largest number of arguments I can find in the Linux kernel is 16. I can't find a single example of a recursive macro (though this is tricky to search for, so I may have missed some).
Parameterised definition of macros from within other macros (which is where a lot of the real power, and the scope for having really large numbers of arguments, comes from) is something I've never seen an example of in the wild.
The fact that it's impossible to split a statement over multiple lines (?) also serves as a rather effective deterrent against having very large numbers of arguments.
Auto-generated source, or macro recursion, could result in the definition of large numbers of macros, and/or macros with essentially unlimited numbers of arguments. But such usage is far, far away from the common case, so far as I can tell.
Note that proliferation of macro arguments is not implied by the use of recursion. Most sane uses of recursion that I can think of only use/generate macros with a fairly conventional number of arguments.
> > A new argument, --macro-args is added to allow this number to be customised.
> I don't like this. The default should be high enough that no one ever needs the argument.
Well, yes -- I sort of agree, although the same argument applies to --hash-size. You need a really huge compilation unit for that option actually to be useful, and since most people seem not to use gcc -pipe, gas is usually passed a file, whose size could provide the basis for a good guess at appropriate hash table sizes. gcc -combine could also produce huge compilation units, and that's a more plausible way to get a huge compilation unit, but I haven't seen much use of this either.
Now you mention it, though, because we only have to examine a single line of assembler source to count the arguments for a macro being defined, we may simply be able to create a hash table of about the right size rather easily.
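To make that concrete, something along these lines might do. This is only a rough sketch knocked up for illustration: the helper name and the ad-hoc tokenisation are invented here, and it isn't wired into gas's real sb-based parsing:

#include <ctype.h>

/* Hypothetical helper, sketch only -- not the real gas code: count the
   formals on a ".macro name arg1, arg2, ..." definition line and derive a
   table size from that, with a sane minimum.  */
static unsigned long
guess_formal_hash_size (const char *line)
{
  unsigned long n_formals = 0;
  const char *p = line;

  /* Skip the ".macro" directive and the macro name itself.  */
  while (*p && !isspace ((unsigned char) *p))
    p++;
  while (*p && isspace ((unsigned char) *p))
    p++;
  while (*p && !isspace ((unsigned char) *p) && *p != ',')
    p++;

  /* Every remaining comma- or whitespace-separated token is a formal.  */
  while (*p)
    {
      while (*p && (isspace ((unsigned char) *p) || *p == ','))
        p++;
      if (!*p)
        break;
      n_formals++;
      while (*p && !isspace ((unsigned char) *p) && *p != ',')
        p++;
    }

  /* Leave some headroom; ideally round up to a prime from a shared table.  */
  return n_formals < 11 ? 23 : 2 * n_formals + 1;
}

Default values and qualifiers on the formals would need a little more care than this, but the principle is the same.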
That would certainly be better than having a command-line switch, but we still have the problem of choosing a good hash table size. Unfortunately, there is no exposed "create a hash table of a sensible size close to this" function; libiberty would seem the obvious place for that, but every part of binutils appears to roll its own hash table implementation.
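For the sake of discussion, the shared helper needn't be complicated; something like the following would do. The prime table here is invented for the example (it is not the table bfd or libiberty actually use):

/* Pick a bucket count: the smallest prime from a fixed table that is
   >= the requested size.  Illustrative only; a real version would live
   somewhere shared, with a properly chosen table.  */
static unsigned long
hash_choose_size (unsigned long wanted)
{
  static const unsigned long primes[] =
    { 23, 43, 127, 251, 509, 1021, 2039, 4051, 8209, 16411, 32771, 65537 };
  unsigned int i;

  for (i = 0; i < sizeof primes / sizeof primes[0]; i++)
    if (primes[i] >= wanted)
      return primes[i];

  return primes[sizeof primes / sizeof primes[0] - 1];
}

Both --hash-size and any auto-sizing of the formal hash could then funnel through the same function.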
> > These numbers are deliberately conservative -- they could perhaps be reduced further.
> > Caveats:
> > * I'm not a hash table expert, so the exact size values I've chosen may not be optimal.
> Does it matter? Is the hash table checked often, or is it just used as a convenient name/value container?
Lookups will be done in the hash every time an argument is substituted during macro expansion.
So, both -- but if you make heavy use of macros, performance could be significantly impacted if the choice of hash table size is poor.
For most common usage, I suspect it isn't going to matter much though, so long as the hash table size isn't something pathologically bad like a power of 2.
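To illustrate what I mean by pathological, here's a generic toy example (nothing to do with gas's actual hash function, and the key names are made up): with a naive shift-and-add hash, a power-of-two bucket count keeps only the low bits of the hash value, so keys that differ only in their earlier characters all collide, while a nearby prime mixes the higher bytes in.

#include <stdio.h>

/* Toy hash, illustration only -- not gas's hash function.  */
static unsigned long
toy_hash (const char *s)
{
  unsigned long h = 0;
  while (*s)
    h = (h << 8) + (unsigned char) *s++;
  return h;
}

int
main (void)
{
  /* Hypothetical macro formals that all end in the same character.  */
  const char *keys[] = { "width1", "height1", "offset1", "stride1" };
  int i;

  for (i = 0; i < 4; i++)
    printf ("%-8s  mod 256 -> bucket %3lu   mod 251 -> bucket %3lu\n",
            keys[i], toy_hash (keys[i]) % 256, toy_hash (keys[i]) % 251);
  return 0;
}

On a 64-bit host all four keys land in the same bucket with the power-of-two size, because the reduction throws away everything but the last character, whereas mod 251 they end up in four different buckets.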
Really, the knowledge about good choice of hash table sizes is not local to this problem -- it should exist in one place and get reused wherever appropriate. Do you have any views on where that should go?
Cheers
---Dave