This is an automated email from the git hooks/post-receive script.
bernie.ogden pushed a commit to branch bernie/jenkmarking-multinode in repository toolchain/abe.
commit c5af45f0b4dc1b46bc0c5b1640a28360c3c14741
Author: Bernard Ogden bernie.ogden@linaro.org
Date: Thu Jul 23 13:49:58 2015 +0200
Remove last listener
Remove a bunch of support code that was only there for the listener, too.
We replace this with code that sends a ping when the benchmark run finishes on a target. On receipt of the ping, the host tries to fetch the benchmark data from the target that sent it, on the assumption that the target's hostname (or IP address, on networks without names) is stable. To guard against connecting to the wrong target, we require that the public key for each target is in known_hosts before benchmarking begins, and we set StrictHostKeyChecking to yes.
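The known_hosts precondition could be checked before a run starts. The following is only a sketch, not code from this commit: the function name check_known_hosts and the known_hosts_file variable are hypothetical, and it assumes an OpenSSH ssh-keygen is installed. It relies on `ssh-keygen -F`, which prints any matching known_hosts entries, and treats empty output as "no key recorded".

```shell
#!/bin/bash
# Sketch only: names below are illustrative and do not appear in the commit.
# Default location of the known_hosts file; override via KNOWN_HOSTS_FILE.
known_hosts_file="${KNOWN_HOSTS_FILE:-${HOME}/.ssh/known_hosts}"

# Return 0 only if every target named in the arguments has an entry in
# known_hosts_file. Reports each missing target on stderr.
check_known_hosts()
{
  local target
  local missing=0
  for target in "$@"; do
    # ssh-keygen -F prints the matching entries (hashed ones included).
    # Empty output means there is no entry for this target.
    if test -z "$(ssh-keygen -F "${target}" -f "${known_hosts_file}" 2>/dev/null)"; then
      echo "No known_hosts entry for ${target}" 1>&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A caller might run `check_known_hosts "${ip}" || exit 1` before dispatching the benchmark, so that a missing key is reported up front rather than when the host tries to reconnect after the ping.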
On the plus side, this is less complicated and thus less error-prone and more maintainable. It's also more secure, in that our only service is now ssh - not ssh + netcat.
On the minus side, it is less capable. The scripts now work only:
1) Within a single network (all machines must be able to ping one another)
2) With reliable addresses (stable IP or hostname)
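Because targets ping the host at a single address, it may help to confirm that the host really has exactly one usable IPv4 address, as the removed get_addr helper did. This is a hedged sketch rather than part of the commit: the function name single_ipv4 is hypothetical, and it takes the text produced by `hostname -I` as an argument instead of calling it directly.

```shell
#!/bin/bash
# Sketch only: single_ipv4 is an illustrative name, not code from the commit.
# Print the one IPv4 address found in a whitespace-separated address list
# (such as the output of 'hostname -I'). Fail if there are none or several.
single_ipv4()
{
  local ipv4_re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  local addr
  local addrs=()
  local found=()

  # Split the list on whitespace without triggering pathname expansion.
  read -r -a addrs <<< "$1"
  for addr in "${addrs[@]}"; do
    # Keep dotted-quad addresses only; IPv6 entries are ignored.
    if [[ ${addr} =~ ${ipv4_re} ]]; then
      found+=("${addr}")
    fi
  done

  if test "${#found[@]}" -ne 1; then
    echo "Expected exactly one IPv4 address, found ${#found[@]}" 1>&2
    return 1
  fi
  echo "${found[0]}"
}
```

Possible use: `host_ip="$(single_ipv4 "$(hostname -I)")" || exit 1`. Failing early on a multi-homed host avoids handing targets an address they cannot ping.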
If these conditions are not met, failure is reasonably visible: the host will either not notice that the target completed its benchmark (but the user might), or the host will get a message that a target completed the benchmark from the wrong IP address and thus fail to connect to it. Either way, the target will stay up, and so data can be collected 'by hand', provided the necessary private key is available.
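The wrong-address case can be made explicit by checking the source of each observed ping against the expected target. The sketch below is an assumption-laden illustration, not the commit's code: ping_is_from is a hypothetical name, and the line format it matches is what `tcpdump -n` typically prints for an ICMP echo request ("... IP SRC > DST: ICMP echo request, ...").

```shell
#!/bin/bash
# Sketch only: ping_is_from is an illustrative name, not code from the commit.
# Return 0 if a line of 'tcpdump -n' output is an ICMP echo request whose
# source address is exactly expected_ip, and 1 otherwise.
ping_is_from()
{
  local line="$1"
  local expected_ip="$2"
  case "${line}" in
    # The surrounding spaces stop 10.0.0.5 from also matching 10.0.0.55.
    *" IP ${expected_ip} > "*"ICMP echo request"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible use (needs privileges to capture; the interface name is assumed):
#   while read -r line; do
#     ping_is_from "${line}" "${ip}" && break
#     echo "Ignoring ping from unexpected source: ${line}" 1>&2
#   done < <(tcpdump -l -n -i eth0 'icmp and icmp[icmptype]=icmp-echo')
```

Logging pings from unexpected sources keeps the failure visible: a target whose address changed shows up in the host's output instead of being silently ignored.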
Change-Id: Ieb9393a2b9d665eea268dab1ab24232e213360cc
---
 scripts/benchutil.sh          | 107 -------------------------
 scripts/establish_listener.sh | 181 ------------------------------------------
 scripts/runbenchmark.sh       |  99 ++++-------------------
 3 files changed, 15 insertions(+), 372 deletions(-)
diff --git a/scripts/benchutil.sh b/scripts/benchutil.sh deleted file mode 100644 index 94cf75a..0000000 --- a/scripts/benchutil.sh +++ /dev/null @@ -1,107 +0,0 @@ -set -o pipefail -set -o nounset - -function get_addr -{ - local hostname="${1:-localhost}" - local listener_addr - if test x"${hostname}" = xlocalhost; then - listener_addr="`hostname -I`" - else - listener_addr="`ssh -F /dev/null -o PasswordAuthentication=no -o PubkeyAuthent [...] - fi - if test $? -ne 0; then - echo "Failed to run 'hostname -I' on ${hostname}" 1>&2 - return 1 - fi - - #We only try to support IPv4 for now - listener_addr="`echo ${listener_addr} | tr ' ' '\n' | grep '^([[:digit:]]{1,3\ [...] - if test $? -ne 0; then - echo "No IPv4 address found, aborting" 1>&2 - return 1 - fi - - #We don't try to figure out what'll happen if we've got multiple IPv4 interfaces - if test "`echo ${listener_addr} | wc -w`" -ne 1; then - echo "Multiple IPv4 addresses found, aborting" 1>&2 - return 1 - fi - - echo "${listener_addr}" | sed 's/^[[:blank:]]*//' | sed 's/[[:blank:]]*$//' -} - -#Attempt to use read to discover whether there is a record to read from the producer -#If we time out, check to see whether the producer still seems to be alive. -#We can check more than one pid, if we have visibility of some other process(s) that we -#would also like to monitor while we wait to read something. If any of these -#processes seem dead then return 2, otherwise keep trying to read. -#Once we've established that there seems to be a record, try to read it with a -#read timeout. If we fail to read within read_timeout, return 1 to indicate -#read failure - but the producer may still be alive in this case. -#Can also set a deadline - bgread will return 3 if it hasn't seen any new output -#before deadline expires. deadline is only checked at read_timeout intervals. 
-#Typical invocation: foo="`bgread ${child_pid} <&3`" -#Invocation with read checks every 5 seconds, failure after 2 minutes and two -#pids to monitor: -#foo="`bgread -T 120 -t 5 ${child_pid} ${other_pid} <&3`" -function bgread -{ - OPTIND=1 - local read_timeout=60 - local deadline= - local pid - - while getopts T:t: flag; do - case "${flag}" in - t) read_timeout="${OPTARG}";; - T) deadline="$((${OPTARG} + `date +%s`))";; - *) - echo "Bad arg" 1>&2 - return 1 - ;; - esac - done - shift $((OPTIND - 1)) - - if test $# -eq 0; then - echo "No pid(s) to poll!" 1>&2 - return 1 - fi - local buffer='' - local line='' - - #We have to be careful here. If the read times out when there was a partial - #record on the fifo then the part that has been read just gets discarded. We - #can avoid this problem by using -N to ensure that we read the minimal amount - #and DO NOT discard it. -N 0 might be intuited to do the right thing, but is - #arguably undefined behaviour and empirically doesn't work. - #Read 1 char then timeout if it isn't a delimiter: buffer is the char, exit code 0 OR - #Read the delimiter, don't timeout: buffer is empty, exit code 0 OR - #Fail to read any chars coz there aren't any, then timeout: buffer is empty, exi [...] - while ! read -N 1 -t "${read_timeout}" buffer; do - for pid in "$@"; do - kill -0 "${pid}" > /dev/null 2>&1 || return 2 - done - if test x"${deadline:-}" != x; then - if test `date +%s` -ge ${deadline}; then - echo "bgread timed out" 1>&2 - return 3 - fi - fi - done - - #If we get here, we managed to read 1 char. If we have a null string just - #return it (the record was empty). Otherwise, assume the rest of the record is r [...] - #especially within the generous timeout that we allow. - if test x"${buffer:-}" != x; then - read -t 60 line - if test $? 
-ne 0; then - echo "Record did not complete" 1>&2 - return 1 - fi - fi - echo "${buffer}${line}" - return 0 -} - diff --git a/scripts/establish_listener.sh b/scripts/establish_listener.sh deleted file mode 100755 index 5266faa..0000000 --- a/scripts/establish_listener.sh +++ /dev/null @@ -1,181 +0,0 @@ -#!/bin/bash -set -o pipefail -set -o nounset - -# load the configure file produced by configure -if test -e "${PWD}/host.conf"; then - . "${PWD}/host.conf" -else - echo "ERROR: no host.conf file! Did you run configure?" 1>&2 - exit 1 -fi -topdir="${abe_path}" #abe global, but this should be the right value for abe -. "${topdir}"/scripts/benchutil.sh -if test $? -ne 0; then - echo "Unable to source ${topdir}/benchutil.sh" 1>&2 - exit 1 -fi - -trap cleanup EXIT -trap 'exit ${error}' TERM INT HUP QUIT - -error=1 -listener_pid= -forward_pid= -pseudofifo_pid= -temps="`mktemp -dt XXXXXXXXX`" || exit 1 -listener_file="${temps}/listener_file" -gateway= - -function cleanup -{ - error=$? - if test x"${listener_pid:-}" != x; then - kill "${listener_pid}" 2>/dev/null - wait "${listener_pid}" - fi - if test x"${pseudofifo_pid:-}" != x; then - kill "${pseudofifo_pid}" 2>/dev/null - #Substituted process is not our child and cannot be waited on. Fortunately, - #it doesn't matter too much when it dies. - fi - if test x"${forward_pid:-}" != x; then - kill "${forward_pid}" 2>/dev/null - wait "${forward_pid}" - fi - if test -d "${temps}"; then - exec {pseudofifo_handle}<&- - rm -rf ${temps} - if test $? -ne 0; then - echo "Failed to delete temporary file store ${temps}" 1>&2 - fi - error=1 - fi - exit "${error}" -} - -#A fifo would make much more sense, but nc doesn't like it -touch "${listener_file}" -if test $? -ne 0; then - echo "Failed to create listener file '${listener_file}'" 1>&2 - exit 1 -fi - -#The trap is just to suppress the 'Terminated' message -exec {pseudofifo_handle}< <(trap 'exit' TERM; tail -f "${listener_file}"& echo $! [...] 
-read -t 60 pseudofifo_pid <&"${pseudofifo_handle}" -if test $? -ne 0; then - echo "Failed to read pseudofifo pid" 1>&2 - exit 1 -fi - -forward_fifo="${temps}/forward_fifo" -mkfifo "${forward_fifo}" || exit 1 - -while getopts f: flag; do - case "${flag}" in - f) gateway="${OPTARG}" ;; - *) - echo "Bad arg" 1>&2 - exit 1 - ;; - esac -done -shift $((OPTIND - 1)) -if test $# -ne 3; then - echo "establish_listener needs 3 args, got $#" 1>&2 - for arg in "$@"; do echo "${arg}" 1>&2; done - exit 1 -fi -if test x"${gateway:-}" != x; then - if ! echo "${gateway}" | grep -q '.+:.+'; then - echo "If specifying a gateway to forward through, must be in format 'internal_ [...] - echo "Got: ${gateway}" 1>&2 - exit 1 - fi -fi - -listener_addr="$1" -ping -c 1 "${listener_addr}" > /dev/null -if test $? -ne 0; then - echo "Unable to ping host ${listener_addr}" 1>&2 - exit 1 -fi -start_port="$2" -end_port="$3" - -for ((listener_port=${start_port}; listener_port < ${end_port}; listener_port++)); do - #Try to listen on the port. nc will fail if something has snatched it. - echo "Attempting to establish listener on ${listener_addr}:${listener_port}" 1>&2 - - if test -e /bin/nc.traditional; then - /bin/nc.traditional -l -s "${listener_addr}" -p "${listener_port}" >> "${liste [...] - elif test -e /bin/nc.openbsd; then - /bin/nc.openbsd -l "${listener_addr}" "${listener_port}" >> "${liste [...] - else - echo "Unable to identify netcat" 1>&2 - exit 1 - fi - listener_pid=$! - - #nc doesn't confirm that it's got the port, so we spin until either: - #1) We see that the port has been taken by our process - #2) We see our process exit (implying that the port was taken) - #3) We've waited long enough - #(listener_pid can't be reused until we wait on it) - for j in {1..5}; do - if test x"`lsof -i tcp@${listener_addr}:${listener_port} | sed 1d | awk '{prin [...] - break 2; #success, exit outer loop - elif ! 
ps -p "${listener_pid}" > /dev/null; then - #listener has exited, reap it and go back to start of outer loop - wait "${listener_pid}" - listener_pid= - continue 2 - else - sleep 1 - fi - done - - #We gave up waiting, kill and reap the nc process - kill "${listener_pid}" - wait "${listener_pid}" - listener_pid= -done - -if test x"${listener_pid:-}" = x; then - echo "Failed to find a free port in range ${start_port}-${end_port}" 1>&2 - exit 1 -fi - -if test x"${gateway:-}" != x; then - internal_interface="${gateway/%:*}" - external_interface="${gateway/#*:}" - ssh -F /dev/null -o PasswordAuthentication=no -o PubkeyAuthentication=yes -NR ${ [...] - forward_pid=$! - read -t 30 line < "${forward_fifo}" - if test $? -ne 0; then - echo "Timeout while establishing port forward" 1>&2 - exit 1 - fi - if echo ${line} | grep -q "^Allocated port [[:digit:]]\+ for remote forward to [...] - listener_port="`echo ${line} | cut -d ' ' -f 3`" - else - echo "Unable to get port forwarded for listener" 1>&2 - echo "Tried: ssh -F /dev/null -o PasswordAuthentication=no -o PubkeyAuthentica [...] - echo "Got: $line" 1>&2 - exit 1 - fi - listener_addr="`get_addr ${external_interface}`" || exit 1 -fi - -echo "${listener_addr}" -echo "${listener_port}" - -while true; do - line="`bgread ${pseudofifo_pid} <&${pseudofifo_handle}`" - if test $? -ne 0; then - echo "Failed to read pseudofifo pid" 1>&2 - exit 1 - fi - echo $line -done diff --git a/scripts/runbenchmark.sh b/scripts/runbenchmark.sh index 6e3c487..cbaff46 100755 --- a/scripts/runbenchmark.sh +++ b/scripts/runbenchmark.sh @@ -7,9 +7,12 @@ set -o nounset trap clean_benchmark EXIT trap 'exit ${error}' TERM INT HUP QUIT
+#Precondition: the target is in known_hosts +ssh_opts="-F /dev/null -o StrictHostKeyChecking=yes -o CheckHostIP=yes" +host_ip="`hostname -I | tr -d '[[:space:]]'`" #hostname -I includes a trailing space + tag= session_pid= -listener_pid= benchmark= device= keep= @@ -61,23 +64,12 @@ if test $? -ne 0; then exit 1 fi
-. "${topdir}"/scripts/benchutil.sh -if test $? -ne 0; then - echo "+++ Unable to source ${topdir}/benchutil.sh" 1>&2 - exit 1 -fi . "${confdir}/${device}.conf" #We can't use abe's source_config here as it require [...] if test $? -ne 0; then echo "+++ Failed to source ${confdir}/${device}.conf" 1>&2 exit 1 fi
-temps="`mktemp -dt XXXXXXXXX`" || exit 1 -listener_file="${temps}/listener_file" -listener_fifo="${temps}/listener_fifo" -mkfifo "${listener_fifo}" || exit 1 -exec {listener_handle}<>${listener_fifo} - #Make sure that subscripts clean up - we must not leave benchmark sources or data [...] clean_benchmark() { @@ -111,47 +103,9 @@ clean_benchmark() wait "${session_pid}" fi
- if test x"${listener_pid:-}" != x; then - kill "${listener_pid}" 2>/dev/null - wait "${listener_pid}" - fi - - if test -d "${temps}"; then - exec {listener_handle}>&- - exec {listener_handle}<&- - rm -rf "${temps}" - if test $? -ne 0; then - echo "Failed to delete ${temps}" 1>&2 - error=1 - fi - fi - exit "${error}" }
-ssh_opts="-F /dev/null" -establish_listener_opts= - -#Set up our listener -listener_addr="`get_addr`" -if test $? -ne 0; then - echo "Unable to get IP for listener" 1>&2 - exit 1 -fi -"${topdir}"/scripts/establish_listener.sh ${establish_listener_opts} "${listener_a [...] -listener_pid=$! -listener_addr="`bgread -T 60 ${listener_pid} <&${listener_handle}`" -if test $? -ne 0; then - echo "Failed to read listener address" 1>&2 - exit 1 -fi -listener_port="`bgread -T 60 ${listener_pid} <&${listener_handle}`" -if test $? -ne 0; then - echo "Failed to read listener port" 1>&2 - exit 1 -fi -echo "Listener ${listener_addr}:${listener_port}" - if ! (. "${topdir}"/lib/common.sh; remote_exec "${ip}" true ${ssh_opts}) > /dev/nu [...] echo "Unable to connect to target ${ip:-(unknown)} after boot" 1>&2 exit 1 @@ -209,8 +163,6 @@ if test x"${uncontrolled:-}" = xyes; then flags="-u" fi
-#TODO: Repetition of hostname echoing is ugly, but seems to be needed - -# perhaps there is some delay after the interface comes up ( pids=() cleanup() @@ -234,23 +186,13 @@ fi cd `tar tjf ${buildtar} | head -n1` && \ ../controlledrun.sh ${cautious} ${flags} -l ${tee_output} -- ./linarobench.sh [...] ret=\$?; \ - listener_found=0; \ - for i in {1..10}; do \ - if ping -c 1 ${listener_addr}; then \ - listener_found=1; \ + while true; do \ + if ping -i 11 -c 1 ${host_ip}; then \ break; \ fi \ done; \ - if test \${listener_found} -eq 1; then \ - echo "\${USER}@\`/sbin/ifconfig eth0 | grep 'inet addr' | sed 's/[^:]* [...] - if test \${ret} -eq 0; then \ - true; \ - else \ - false; \ - fi \ - else \ - false; \ - fi" \ + echo \${ret} > ${target_dir}/RETCODE \ + exit \${ret}" \ "${target_dir}/stdout" "${target_dir}/stderr" \ ${ssh_opts} if test $? -ne 0; then @@ -265,28 +207,17 @@ fi )& session_pid=$!
-#No sense in setting a deadline on this one, it's order of days for many cases -ip="`bgread ${listener_pid} <&${listener_handle}`" -if test $? -ne 0; then - echo "Failed to read post-benchmark-run IP" 1>&2 - exit 1 -fi - -error="`echo ${ip} | sed 's/.*://'`" +#Wait for a ping from the target +#This assumes that the target's identifier does not change +#This should hold for name in a DNS network, but not necessarily for IP +#Today LAVA lab does not provide DNS, but IP seems stable in practice +#Rather than work around lack of DNS, just make sure we notice if the IP changes +while ! tcpdump -c 1 -i eth0 'icmp and icmp[icmptype]=icmp-echo' | grep -q "${ip} [...] +error="`(. ${topdir}/lib/common.sh; remote_exec "${ip}" 'cat ${target_dir}/RETCODE [...] if test $? -ne 0; then echo "Unable to determine exit code, assuming the worst." 1>&2 error=1 fi -ip="`echo ${ip} | sed 's/:.*//' | sed 's/\s*$//'`" -if test $? -ne 0; then - echo "Unable to determine IP, giving up." 1>&2 - exit 1 -fi - -if ! (. "${topdir}"/lib/common.sh; remote_exec "${ip}" true ${ssh_opts}) > /dev/nu [...] - echo "Unable to connect to target after ${ip:-(unknown)} benchmark run" 1>&2 - exit 1 -fi
if test ${error} -ne 0; then echo "Command failed: will try to get logs" 1>&2