Successfully identified regression in *glibc* in CI configuration tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit f79f2065817e080f65f3c3a2fee966f5a97f1746
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Apr 21 19:49:50 2021 +0200
nptl: Move legacy unwinding implementation into libc
It is still used internally. Since unwinding is now available unconditionally, avoid indirect calls through function pointers loaded from the stack by inlining the non-cancellation cleanup code. This avoids a regression in security hardening.
The out-of-line __libc_cleanup_routine implementation is no longer needed because the inline definition is now static __always_inline.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
</cut>
Results regressed to (for first_bad == f79f2065817e080f65f3c3a2fee966f5a97f1746)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -O2_LTO -- artifacts/build-f79f2065817e080f65f3c3a2fee966f5a97f1746/results_id: 1
# 410.bwaves,bwaves_base.default regressed by 104
from (for last_good == 5715c29e91076800418833f2196f2082f439da75)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -O2_LTO -- artifacts/build-5715c29e91076800418833f2196f2082f439da75/results_id: 1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-master-aarch64-spec2k6-O2_LTO/461
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-master-aarch64-spec2k6-O2_LTO/470
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-glibc-f79f2065817e080f65f3c3a2fee966f5a97f1746
cd investigate-glibc-f79f2065817e080f65f3c3a2fee966f5a97f1746
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd glibc
# Reproduce first_bad build
git checkout --detach f79f2065817e080f65f3c3a2fee966f5a97f1746
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5715c29e91076800418833f2196f2082f439da75
../artifacts/test.sh
cd .. </cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
Full commit (up to 1000 lines):
<cut>
commit f79f2065817e080f65f3c3a2fee966f5a97f1746
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Apr 21 19:49:50 2021 +0200
nptl: Move legacy unwinding implementation into libc
It is still used internally. Since unwinding is now available unconditionally, avoid indirect calls through function pointers loaded from the stack by inlining the non-cancellation cleanup code. This avoids a regression in security hardening.
The out-of-line __libc_cleanup_routine implementation is no longer needed because the inline definition is now static __always_inline.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
---
 nptl/Versions                    |  2 ++
 nptl/cleanup_defer_compat.c      | 56 ++---------------------------------
 nptl/libc-cleanup.c              | 64 ++++++++++++++++++++++++++++++++++++++--
 nptl/nptl-init.c                 |  2 --
 sysdeps/nptl/libc-lock.h         | 59 ++++++++++++++++++------------------
 sysdeps/nptl/libc-lockP.h        | 26 +---------------
 sysdeps/nptl/pthread-functions.h |  4 ---
 7 files changed, 97 insertions(+), 116 deletions(-)
diff --git a/nptl/Versions b/nptl/Versions
index 60202b4969..2e5a964b11 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -87,6 +87,8 @@ libc {
     __futex_abstimed_wait64;
     __futex_abstimed_wait_cancelable64;
     __libc_alloca_cutoff;
+    __libc_cleanup_pop_restore;
+    __libc_cleanup_push_defer;
     __libc_dl_error_tsd;
     __libc_pthread_init;
     __lll_clocklock_elision;
diff --git a/nptl/cleanup_defer_compat.c b/nptl/cleanup_defer_compat.c
index 49ef53ea60..1957318208 100644
--- a/nptl/cleanup_defer_compat.c
+++ b/nptl/cleanup_defer_compat.c
@@ -17,41 +17,15 @@
    https://www.gnu.org/licenses/.  */
 
 #include "pthreadP.h"
-
+#include <libc-lock.h>
 
 void
 _pthread_cleanup_push_defer (struct _pthread_cleanup_buffer *buffer,
                              void (*routine) (void *), void *arg)
 {
-  struct pthread *self = THREAD_SELF;
-
   buffer->__routine = routine;
   buffer->__arg = arg;
-  buffer->__prev = THREAD_GETMEM (self, cleanup);
-
-  int cancelhandling = THREAD_GETMEM (self, cancelhandling);
-
-  /* Disable asynchronous cancellation for now.  */
-  if (__glibc_unlikely (cancelhandling & CANCELTYPE_BITMASK))
-    while (1)
-      {
-        int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
-                                                cancelhandling
-                                                & ~CANCELTYPE_BITMASK,
-                                                cancelhandling);
-        if (__glibc_likely (curval == cancelhandling))
-          /* Successfully replaced the value.  */
-          break;
-
-        /* Prepare for the next round.  */
-        cancelhandling = curval;
-      }
-
-  buffer->__canceltype = (cancelhandling & CANCELTYPE_BITMASK
-                          ? PTHREAD_CANCEL_ASYNCHRONOUS
-                          : PTHREAD_CANCEL_DEFERRED);
-
-  THREAD_SETMEM (self, cleanup, buffer);
+  __libc_cleanup_push_defer (buffer);
 }
 strong_alias (_pthread_cleanup_push_defer, __pthread_cleanup_push_defer)
 
@@ -60,31 +34,7 @@ void
 _pthread_cleanup_pop_restore (struct _pthread_cleanup_buffer *buffer,
                               int execute)
 {
-  struct pthread *self = THREAD_SELF;
-
-  THREAD_SETMEM (self, cleanup, buffer->__prev);
-
-  int cancelhandling;
-  if (__builtin_expect (buffer->__canceltype != PTHREAD_CANCEL_DEFERRED, 0)
-      && ((cancelhandling = THREAD_GETMEM (self, cancelhandling))
-          & CANCELTYPE_BITMASK) == 0)
-    {
-      while (1)
-        {
-          int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
-                                                  cancelhandling
-                                                  | CANCELTYPE_BITMASK,
-                                                  cancelhandling);
-          if (__glibc_likely (curval == cancelhandling))
-            /* Successfully replaced the value.  */
-            break;
-
-          /* Prepare for the next round.  */
-          cancelhandling = curval;
-        }
-
-      CANCELLATION_P (self);
-    }
+  __libc_cleanup_pop_restore (buffer);
 
   /* If necessary call the cleanup routine after we removed the
      current cleanup block from the list.  */
diff --git a/nptl/libc-cleanup.c b/nptl/libc-cleanup.c
index 61f97eceda..14ccfe9285 100644
--- a/nptl/libc-cleanup.c
+++ b/nptl/libc-cleanup.c
@@ -17,11 +17,69 @@
    https://www.gnu.org/licenses/.  */
 
 #include "pthreadP.h"
+#include <tls.h>
+#include <libc-lock.h>
 
+void
+__libc_cleanup_push_defer (struct _pthread_cleanup_buffer *buffer)
+{
+  struct pthread *self = THREAD_SELF;
+
+  buffer->__prev = THREAD_GETMEM (self, cleanup);
+
+  int cancelhandling = THREAD_GETMEM (self, cancelhandling);
+
+  /* Disable asynchronous cancellation for now.  */
+  if (__glibc_unlikely (cancelhandling & CANCELTYPE_BITMASK))
+    while (1)
+      {
+        int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
+                                                cancelhandling
+                                                & ~CANCELTYPE_BITMASK,
+                                                cancelhandling);
+        if (__glibc_likely (curval == cancelhandling))
+          /* Successfully replaced the value.  */
+          break;
+
+        /* Prepare for the next round.  */
+        cancelhandling = curval;
+      }
+
+  buffer->__canceltype = (cancelhandling & CANCELTYPE_BITMASK
+                          ? PTHREAD_CANCEL_ASYNCHRONOUS
+                          : PTHREAD_CANCEL_DEFERRED);
+
+  THREAD_SETMEM (self, cleanup, buffer);
+}
+libc_hidden_def (__libc_cleanup_push_defer)
 
 void
-__libc_cleanup_routine (struct __pthread_cleanup_frame *f)
+__libc_cleanup_pop_restore (struct _pthread_cleanup_buffer *buffer)
 {
-  if (f->__do_it)
-    f->__cancel_routine (f->__cancel_arg);
+  struct pthread *self = THREAD_SELF;
+
+  THREAD_SETMEM (self, cleanup, buffer->__prev);
+
+  int cancelhandling;
+  if (__builtin_expect (buffer->__canceltype != PTHREAD_CANCEL_DEFERRED, 0)
+      && ((cancelhandling = THREAD_GETMEM (self, cancelhandling))
+          & CANCELTYPE_BITMASK) == 0)
+    {
+      while (1)
+        {
+          int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
+                                                  cancelhandling
+                                                  | CANCELTYPE_BITMASK,
+                                                  cancelhandling);
+          if (__glibc_likely (curval == cancelhandling))
+            /* Successfully replaced the value.  */
+            break;
+
+          /* Prepare for the next round.  */
+          cancelhandling = curval;
+        }
+
+      CANCELLATION_P (self);
+    }
 }
+libc_hidden_def (__libc_cleanup_pop_restore)
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index 2c7e2222d4..43e2564e59 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -95,8 +95,6 @@ static const struct pthread_functions pthread_functions =
     .ptr___pthread_key_create = __pthread_key_create,
     .ptr___pthread_getspecific = __pthread_getspecific,
     .ptr___pthread_setspecific = __pthread_setspecific,
-    .ptr__pthread_cleanup_push_defer = __pthread_cleanup_push_defer,
-    .ptr__pthread_cleanup_pop_restore = __pthread_cleanup_pop_restore,
     .ptr_nthreads = &__nptl_nthreads,
     .ptr___pthread_unwind = &__pthread_unwind,
     .ptr__nptl_deallocate_tsd = __nptl_deallocate_tsd,
diff --git a/sysdeps/nptl/libc-lock.h b/sysdeps/nptl/libc-lock.h
index dea96121b3..e8a5e68a12 100644
--- a/sysdeps/nptl/libc-lock.h
+++ b/sysdeps/nptl/libc-lock.h
@@ -143,39 +143,40 @@ typedef struct __libc_lock_recursive_opaque__ __libc_lock_recursive_t;
   __libc_maybe_call (__pthread_mutex_unlock, (&(NAME).mutex), 0)
 #endif
 
-/* Note that for I/O cleanup handling we are using the old-style
-   cancel handling.  It does not have to be integrated with C++ since
-   no C++ code is called in the middle.  The old-style handling is
-   faster and the support is not going away.  */
-extern void _pthread_cleanup_push_defer (struct _pthread_cleanup_buffer *buffer,
-                                         void (*routine) (void *), void *arg);
-extern void _pthread_cleanup_pop_restore (struct _pthread_cleanup_buffer *buffer,
-                                          int execute);
+/* Put the unwind buffer BUFFER on the per-thread callback stack.  The
+   caller must fill BUFFER->__routine and BUFFER->__arg before calling
+   this function.  */
+void __libc_cleanup_push_defer (struct _pthread_cleanup_buffer *buffer);
+libc_hidden_proto (__libc_cleanup_push_defer)
+/* Remove BUFFER from the unwind callback stack.  The caller must invoke
+   the callback if desired.  */
+void __libc_cleanup_pop_restore (struct _pthread_cleanup_buffer *buffer);
+libc_hidden_proto (__libc_cleanup_pop_restore)
 
 /* Start critical region with cleanup.  */
-#define __libc_cleanup_region_start(DOIT, FCT, ARG)                          \
-  { struct _pthread_cleanup_buffer _buffer;                                  \
-    int _avail;                                                              \
-    if (DOIT) {                                                              \
-      _avail = PTFAVAIL (_pthread_cleanup_push_defer);                       \
-      if (_avail) {                                                          \
-        __libc_ptf_call_always (_pthread_cleanup_push_defer, (&_buffer, FCT, \
-                                                              ARG));         \
-      } else {                                                               \
-        _buffer.__routine = (FCT);                                           \
-        _buffer.__arg = (ARG);                                               \
-      }                                                                      \
-    } else {                                                                 \
-      _avail = 0;                                                            \
-    }
+#define __libc_cleanup_region_start(DOIT, FCT, ARG)                          \
+  { bool _cleanup_start_doit;                                                \
+    struct _pthread_cleanup_buffer _buffer;                                  \
+    /* Non-addressable copy of FCT, so that we avoid indirect calls on       \
+       the non-unwinding path.  */                                           \
+    void (*_cleanup_routine) (void *) = (FCT);                               \
+    _buffer.__arg = (ARG);                                                   \
+    if (DOIT)                                                                \
+      {                                                                      \
+        _cleanup_start_doit = true;                                          \
+        _buffer.__routine = _cleanup_routine;                                \
+        __libc_cleanup_push_defer (&_buffer);                                \
+      }                                                                      \
+    else                                                                     \
+      _cleanup_start_doit = false;
 
 /* End critical region with cleanup.  */
-#define __libc_cleanup_region_end(DOIT)                                      \
-    if (_avail) {                                                            \
-      __libc_ptf_call_always (_pthread_cleanup_pop_restore, (&_buffer, DOIT));\
-    } else if (DOIT)                                                         \
-      _buffer.__routine (_buffer.__arg);                                     \
-  }
+#define __libc_cleanup_region_end(DOIT)                                      \
+    if (_cleanup_start_doit)                                                 \
+      __libc_cleanup_pop_restore (&_buffer);                                 \
+    if (DOIT)                                                                \
+      _cleanup_routine (_buffer.__arg);                                      \
+  } /* matches __libc_cleanup_region_start */
 
 /* Hide the definitions which are only supposed to be used inside libc in
diff --git a/sysdeps/nptl/libc-lockP.h b/sysdeps/nptl/libc-lockP.h
index 63b605dee2..2c928b7a76 100644
--- a/sysdeps/nptl/libc-lockP.h
+++ b/sysdeps/nptl/libc-lockP.h
@@ -251,32 +251,12 @@ _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
 /* Get once control variable.  */
 #define __libc_once_get(ONCE_CONTROL) ((ONCE_CONTROL) != PTHREAD_ONCE_INIT)
 
-/* Note that for I/O cleanup handling we are using the old-style
-   cancel handling.  It does not have to be integrated with C++ snce
-   no C++ code is called in the middle.  The old-style handling is
-   faster and the support is not going away.  */
-extern void _pthread_cleanup_push (struct _pthread_cleanup_buffer *buffer,
-                                   void (*routine) (void *), void *arg);
-extern void _pthread_cleanup_pop (struct _pthread_cleanup_buffer *buffer,
-                                  int execute);
-extern void _pthread_cleanup_push_defer (struct _pthread_cleanup_buffer *buffer,
-                                         void (*routine) (void *), void *arg);
-extern void _pthread_cleanup_pop_restore (struct _pthread_cleanup_buffer *buffer,
-                                          int execute);
-
-/* Sometimes we have to exit the block in the middle.  */
-#define __libc_cleanup_end(DOIT) \
-    if (_avail) {                                                            \
-      __libc_ptf_call_always (_pthread_cleanup_pop_restore, (&_buffer, DOIT));\
-    } else if (DOIT)                                                         \
-      _buffer.__routine (_buffer.__arg)
-
 /* __libc_cleanup_push and __libc_cleanup_pop depend on exception
    handling and stack unwinding.  */
 #ifdef __EXCEPTIONS
 
 /* Normal cleanup handling, based on C cleanup attribute.  */
-__extern_inline void
+static __always_inline void
 __libc_cleanup_routine (struct __pthread_cleanup_frame *f)
 {
   if (f->__do_it)
@@ -388,8 +368,6 @@ weak_extern (__pthread_once)
 weak_extern (__pthread_initialize)
 weak_extern (__pthread_atfork)
 weak_extern (__pthread_setcancelstate)
-weak_extern (_pthread_cleanup_push_defer)
-weak_extern (_pthread_cleanup_pop_restore)
 # else
 #  pragma weak __pthread_mutex_init
 #  pragma weak __pthread_mutex_destroy
@@ -412,8 +390,6 @@ weak_extern (_pthread_cleanup_pop_restore)
 #  pragma weak __pthread_initialize
 #  pragma weak __pthread_atfork
 #  pragma weak __pthread_setcancelstate
-#  pragma weak _pthread_cleanup_push_defer
-#  pragma weak _pthread_cleanup_pop_restore
 # endif
 #endif
 
diff --git a/sysdeps/nptl/pthread-functions.h b/sysdeps/nptl/pthread-functions.h
index 97a5c48939..4268084b66 100644
--- a/sysdeps/nptl/pthread-functions.h
+++ b/sysdeps/nptl/pthread-functions.h
@@ -57,10 +57,6 @@ struct pthread_functions
   int (*ptr___pthread_key_create) (pthread_key_t *, void (*) (void *));
   void *(*ptr___pthread_getspecific) (pthread_key_t);
   int (*ptr___pthread_setspecific) (pthread_key_t, const void *);
-  void (*ptr__pthread_cleanup_push_defer) (struct _pthread_cleanup_buffer *,
-                                           void (*) (void *), void *);
-  void (*ptr__pthread_cleanup_pop_restore) (struct _pthread_cleanup_buffer *,
-                                            int);
 #define HAVE_PTR_NTHREADS
   unsigned int *ptr_nthreads;
   void (*ptr___pthread_unwind) (__pthread_unwind_buf_t *)
</cut>
* ci notify:
Successfully identified regression in *glibc* in CI configuration tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2_LTO
# 410.bwaves,bwaves_base.default regressed by 104
Does this report reflect a performance regression?
Thanks,
Florian
Hi Florian,
This is a report from Linaro’s automatic benchmarking system, which tracks upstream components and automatically bisects performance regressions down to a single commit. We have recently enabled these emails to be sent to patch authors automatically.
It appears that after your patch SPEC CPU2006’s 410.bwaves slowed down by 4% on aarch64-linux-gnu when built with -O2 -flto. Any theories on what could’ve caused it?
This is one of the first reports of this kind, and we welcome feedback on improving the reporting template.
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
On 26 Jun 2021, at 12:12, ci_notify@linaro.org wrote:
[...]
* Maxim Kuvyrkov:
This is a report from Linaro’s automatic benchmarking system, which tracks upstream components and automatically bisects performance regressions down to a single commit. We have recently enabled these emails to be sent to patch authors automatically.
It appears that after your patch SPEC CPU2006’s 410.bwaves slowed down by 4% on aarch64-linux-gnu when built with -O2 -flto. Any theories on what could’ve caused it?
Does the benchmark link against libpthread? (It doesn't matter if it is actually multi-threaded, only linking counts.)
If it does not link against libpthread, then the changes may have introduced additional atomic instructions. However, Adhemerval removed these atomic instructions again from the current development version. So if they were the cause, performance should be back to the baseline.
But I find it somewhat unlikely that the benchmark exercises this code.
This is one of the first reports of this kind, and we welcome feedback on improving the reporting template.
Perhaps indicate more clearly that it's about a performance regression.
In the end, someone familiar with the benchmark and experience in performance work needs to look at differences in execution profiles, I expect.
Thanks,
Florian
linaro-toolchain@lists.linaro.org