For cases like IPv6 addresses, it would be convenient to be able to supply tracing predicates for fields larger than 8 bytes. This series provides a simple way to support this by allowing simple ==, != memory comparisons against the supplied predicate value when the field size exceeds 8 bytes. For example, to trace ::1, the predicate
"dst == 0x00000000000000000000000000000001"
...could be used. Patch 1 implements this.
As a convenience, support for IPv4, IPv6 and MAC addresses is also included; patches 2-4 cover these and allow simpler comparisons that do not require getting the exact number of bytes right; for example
"dst == ::1"
"src != 127.0.0.1"
"mac_addr == ab:cd:ef:01:23:45"
Patch 5 adds tests for existing and new filter predicates, and patch 6 documents that only == and != are supported for the various address types and for the >8 byte memory comparison.
Changes since v1 [1]:
- added support for IPv4, IPv6 and MAC addresses (patches 2-4) (Masami and Steven)
- added selftests for IPv4, IPv6 and MAC addresses and updated docs accordingly (patches 5,6)
Changes since RFC [2]:
- originally a fix was intermixed with the new functionality as patch 1 in series [2]; the fix landed separately
- small tweaks to how filter predicates are defined via fn_num as opposed to via fn directly
[1] https://lore.kernel.org/linux-trace-kernel/1682414197-13173-1-git-send-email...
[2] https://lore.kernel.org/lkml/1659910883-18223-1-git-send-email-alan.maguire@...
Alan Maguire (6):
  tracing: support > 8 byte array filter predicates
  tracing: support IPv4 address filter predicate
  tracing: support IPv6 filter predicates
  tracing: support MAC address filter predicates
  selftests/ftrace: add test coverage for filter predicates
  tracing: document IPv4, IPv6, MAC address and > 8 byte numeric filtering support

 Documentation/trace/events.rst                |  21 +++
 kernel/trace/trace_events_filter.c            | 164 +++++++++++++++++-
 .../selftests/ftrace/test.d/event/filter.tc   |  91 ++++++++++
 3 files changed, 275 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/event/filter.tc
For > 8 byte values, allow simple binary '==', '!=' predicates where the user passes in a hex ASCII representation of the desired value. The representation must match the field size exactly, and a simple memory comparison between predicate and actual value is carried out. For example:
cd /sys/kernel/debug/tracing/events/tcp/tcp_receive_reset
echo "saddr_v6 == 0x00000000000000000000000000000001" > filter
Signed-off-by: Alan Maguire alan.maguire@oracle.com
---
 kernel/trace/trace_events_filter.c | 54 +++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 1dad64267878..64f1dfb72cb5 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -67,6 +67,7 @@ enum filter_pred_fn {
 	FILTER_PRED_FN_FUNCTION,
 	FILTER_PRED_FN_,
 	FILTER_PRED_TEST_VISITED,
+	FILTER_PRED_FN_MEMCMP,
 };

 struct filter_pred {
@@ -622,8 +623,11 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
 	kfree(op_stack);
 	kfree(inverts);
 	if (prog_stack) {
-		for (i = 0; prog_stack[i].pred; i++)
+		for (i = 0; prog_stack[i].pred; i++) {
+			if (prog_stack[i].pred->fn_num == FILTER_PRED_FN_MEMCMP)
+				kfree((u8 *)(uintptr_t)(prog_stack[i].pred->val));
 			kfree(prog_stack[i].pred);
+		}
 		kfree(prog_stack);
 	}
 	return ERR_PTR(ret);
@@ -890,6 +894,14 @@ static int filter_pred_function(struct filter_pred *pred, void *event)
 	return pred->op == OP_EQ ? ret : !ret;
 }

+static int filter_pred_memcmp(struct filter_pred *pred, void *event)
+{
+	u8 *mem = (u8 *)(event + pred->offset);
+	u8 *cmp = (u8 *)(uintptr_t)(pred->val);
+
+	return (memcmp(mem, cmp, pred->field->size) == 0) ^ pred->not;
+}
+
 /*
  * regex_match_foo - Basic regex callbacks
  *
@@ -1353,6 +1365,8 @@ static int filter_pred_fn_call(struct filter_pred *pred, void *event)
 		return filter_pred_function(pred, event);
 	case FILTER_PRED_TEST_VISITED:
 		return test_pred_visited_fn(pred, event);
+	case FILTER_PRED_FN_MEMCMP:
+		return filter_pred_memcmp(pred, event);
 	default:
 		return 0;
 	}
@@ -1370,6 +1384,7 @@ static int parse_pred(const char *str, void *data,
 	unsigned long size;
 	unsigned long ip;
 	char num_buf[24];	/* Big enough to hold an address */
+	u8 *pred_val;
 	char *field_name;
 	char *name;
 	bool function = false;
@@ -1631,6 +1646,43 @@ static int parse_pred(const char *str, void *data,
 		/* go past the last quote */
 		i++;

+	} else if (str[i] == '0' && tolower(str[i + 1]) == 'x' &&
+		   field->size > 8) {
+		/* For sizes > 8 bytes, we store hex bytes for comparison;
+		 * only '==' and '!=' are supported.
+		 * To keep things simple, the predicate value must specify
+		 * a value that matches the field size exactly, with leading
+		 * 0s if necessary.
+		 */
+		if (pred->op != OP_EQ && pred->op != OP_NE) {
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+
+		/* skip required 0x */
+		s += 2;
+		i += 2;
+
+		while (isalnum(str[i]))
+			i++;
+
+		len = i - s;
+		if (len != (field->size * 2)) {
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + s);
+			goto err_free;
+		}
+
+		pred_val = kzalloc(field->size, GFP_KERNEL);
+		if (hex2bin(pred_val, str + s, field->size)) {
+			parse_error(pe, FILT_ERR_ILLEGAL_INTVAL, pos + s);
+			kfree(pred_val);
+			goto err_free;
+		}
+		pred->val = (u64)pred_val;
+		pred->fn_num = FILTER_PRED_FN_MEMCMP;
+		if (pred->op == OP_NE)
+			pred->not = 1;
+
 	} else if (isdigit(str[i]) || str[i] == '-') {

 		/* Make sure the field is not a string */
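The parse-then-memcmp scheme that patch 1 adds can be modelled outside the kernel. Below is a minimal userspace sketch, assuming the hex value has already been split out as its own NUL-terminated token; struct memcmp_pred, memcmp_pred_parse(), memcmp_pred_match() and the 64-byte cap are hypothetical, not kernel APIs. Unlike the patch, which advances over isalnum() characters and leaves non-hex digits for hex2bin() to catch, it only counts isxdigit() characters, so any stray character fails the exact-length check. The match mirrors filter_pred_memcmp(): memcmp() equality XORed with a flag that is set for '!='.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MEMCMP_PRED_MAX 64	/* hypothetical size cap for this sketch */

/* Hypothetical userspace stand-in for a ">8 byte" predicate. */
struct memcmp_pred {
	unsigned char val[MEMCMP_PRED_MAX];	/* expected field bytes */
	size_t size;				/* field size in bytes */
	bool negate;				/* true for '!=' (like pred->not) */
};

static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;	/* caller checked isxdigit() */
}

/* Accept "0x" plus exactly 2 * size hex digits and nothing else.
 * Returns 0 on success, -1 on a bad prefix, length or digit. */
static int memcmp_pred_parse(struct memcmp_pred *p, const char *str,
			     size_t size, bool is_ne)
{
	size_t i, len = 0;

	if (size == 0 || size > MEMCMP_PRED_MAX)
		return -1;
	if (str[0] != '0' || tolower((unsigned char)str[1]) != 'x')
		return -1;
	str += 2;

	while (isxdigit((unsigned char)str[len]))
		len++;
	if (str[len] != '\0' || len != 2 * size)
		return -1;	/* wrong length, or a non-hex character */

	for (i = 0; i < size; i++)
		p->val[i] = (unsigned char)(hex_nibble(str[2 * i]) << 4 |
					    hex_nibble(str[2 * i + 1]));
	p->size = size;
	p->negate = is_ne;
	return 0;
}

/* Same shape as filter_pred_memcmp(): byte equality XOR negate. */
static bool memcmp_pred_match(const struct memcmp_pred *p, const void *field)
{
	return (memcmp(field, p->val, p->size) == 0) ^ p->negate;
}
```

For a 16-byte field holding ::1, parsing "0x" followed by 31 zeros and a final 1 with is_ne false makes memcmp_pred_match() return true; the same string with is_ne true makes it return false.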
Support '==' and '!=' predicates for IPv4 address format; for example
cd /sys/kernel/debug/tracing/events/tcp/tcp_receive_reset
echo "saddr == 127.0.0.1" > filter
Signed-off-by: Alan Maguire alan.maguire@oracle.com
---
 kernel/trace/trace_events_filter.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 64f1dfb72cb5..d8e08d3c3594 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1384,6 +1384,7 @@ static int parse_pred(const char *str, void *data,
 	unsigned long size;
 	unsigned long ip;
 	char num_buf[24];	/* Big enough to hold an address */
+	char scratch[4];	/* Big enough to hold an IPv4 address */
 	u8 *pred_val;
 	char *field_name;
 	char *name;
@@ -1646,6 +1647,24 @@ static int parse_pred(const char *str, void *data,
 		/* go past the last quote */
 		i++;

+	} else if (field->size == 4 &&
+		   sscanf(&str[i], "%hhd.%hhd.%hhd.%hhd",
+			  /* assume address in network byte order */
+			  &scratch[0], &scratch[1], &scratch[2], &scratch[3]) == 4) {
+		/* For IPv4 addresses, only '==' or '!=' are supported. */
+		if (pred->op != OP_EQ && pred->op != OP_NE) {
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+		while (isdigit(str[i]) || str[i] == '.')
+			i++;
+		pred_val = kzalloc(field->size, GFP_KERNEL);
+		memcpy(pred_val, scratch, field->size);
+		pred->val = (u64)pred_val;
+		pred->fn_num = FILTER_PRED_FN_MEMCMP;
+		if (pred->op == OP_NE)
+			pred->not = 1;
+
 	} else if (str[i] == '0' && tolower(str[i + 1]) == 'x' &&
 		   field->size > 8) {
 		/* For sizes > 8 bytes, we store hex bytes for comparison;
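For comparison, a userspace version of the same IPv4 '=='/'!=' check can lean on POSIX inet_pton(), which requires exactly four dotted-decimal octets, range-checks each one, and writes the address in network byte order (the same assumption the patch's comment makes about the field). This is a minimal sketch only; ipv4_pred_bytes() and ipv4_pred_match() are hypothetical helper names, and inet_pton() is a libc function, not something the in-kernel parser can call.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Text to the 4 bytes of an IPv4 field in network byte order.
 * inet_pton() returns 1 only for a well-formed dotted quad. */
static int ipv4_pred_bytes(const char *text, unsigned char out[4])
{
	return inet_pton(AF_INET, text, out) == 1 ? 0 : -1;
}

/* '==' (negate false) or '!=' (negate true) on a 4-byte field,
 * compared bytewise as the FILTER_PRED_FN_MEMCMP path does. */
static bool ipv4_pred_match(const unsigned char want[4], const void *field,
			    bool negate)
{
	return (memcmp(field, want, 4) == 0) ^ negate;
}
```

With this approach "127.0.0.1" matches a field set to htonl(INADDR_LOOPBACK), while "256.0.0.1" and "1.2.3" are rejected at parse time.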
Support '==' and '!=' predicates for IPv6 addresses; for example
cd /sys/kernel/debug/tracing/events/tcp/tcp_receive_reset
echo "saddr_v6 == ::1" > filter
or equivalently
echo "saddr_v6 == 0:0:0:0:0:0:0:1" > filter
Signed-off-by: Alan Maguire alan.maguire@oracle.com
---
 kernel/trace/trace_events_filter.c | 73 ++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index d8e08d3c3594..e2521574f3c4 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1665,6 +1665,79 @@ static int parse_pred(const char *str, void *data,
 		if (pred->op == OP_NE)
 			pred->not = 1;

+	} else if (field->size == 16 &&
+		   (str[i] == ':' ||
+		    (isalnum(str[i]) && tolower(str[i + 1]) != 'x'))) {
+		u8 j, gap_size, gap = 0, gap_count = 0, index = 0;
+		u16 tmp_v6addr[8] = {};
+		u16 v6addr[8] = {};
+
+		/* For IPv6 addresses, only '==' or '!=' are supported. */
+		if (pred->op != OP_EQ && pred->op != OP_NE) {
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+		/* Store the u16s in the address string consecutively in
+		 * tmp_v6addr while tracking the presence of a "::" (if any)
+		 * in the IPv6 address string; we will use its location
+		 * to determine how many u16s it represents (the gap_size
+		 * below). Only one "::" is allowed in an IPv6 address
+		 * string.
+		 */
+		while (isalnum(str[i]) || str[i] == ':') {
+			switch (str[i]) {
+			case ':':
+				i++;
+				/* mark "::" index by setting gap */
+				if (str[i] == ':') {
+					gap = index;
+					gap_count++;
+					i++;
+				}
+				if (gap_count > 1) {
+					parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP,
+						    pos + s);
+					goto err_free;
+				}
+				break;
+			default:
+				if (sscanf(&str[i], "%hx", &tmp_v6addr[index]) != 1) {
+					parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP,
+						    pos + s);
+					goto err_free;
+				}
+				index++;
+				while (isalnum(str[i]))
+					i++;
+				break;
+			}
+		}
+		/* The gap_size here represents the number of u16s the "::"
+		 * represents; for ::1 the gap size is 7, for feed::face
+		 * it is 6, etc.
+		 */
+		gap_size = 8 - index;
+		index = 0;
+		for (j = 0; j < 8; ) {
+			if (gap_size > 0 && j == gap) {
+				j += gap_size;
+			} else {
+#ifdef __BIG_ENDIAN
+				v6addr[j++] = tmp_v6addr[index];
+#else
+				v6addr[j++] = ((tmp_v6addr[index] & 0xff) << 8) +
+					      ((tmp_v6addr[index] & 0xff00) >> 8);
+#endif
+				index++;
+			}
+		}
+		pred_val = kzalloc(field->size, GFP_KERNEL);
+		memcpy(pred_val, v6addr, field->size);
+		pred->val = (u64)pred_val;
+		pred->fn_num = FILTER_PRED_FN_MEMCMP;
+		if (pred->op == OP_NE)
+			pred->not = 1;
+
 	} else if (str[i] == '0' && tolower(str[i + 1]) == 'x' &&
 		   field->size > 8) {
 		/* For sizes > 8 bytes, we store hex bytes for comparison;
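The #ifdef __BIG_ENDIAN block in this patch arranges for each 16-bit group to be stored high byte first in memory, i.e. in network byte order. A minimal userspace sketch (v6_groups_to_bytes() and v6_text_to_bytes() are hypothetical helper names) shows that writing the bytes with shifts gives the same layout on any host without an endianness branch, and that the result matches what inet_pton(AF_INET6, ...) produces for "::1", "0:0:0:0:0:0:0:1" and "feed::face".

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Lay out eight 16-bit groups high byte first (network byte order).
 * Shifts give this layout on any host, with no __BIG_ENDIAN branch. */
static void v6_groups_to_bytes(const uint16_t groups[8], unsigned char out[16])
{
	for (int j = 0; j < 8; j++) {
		out[2 * j] = (unsigned char)(groups[j] >> 8);
		out[2 * j + 1] = (unsigned char)(groups[j] & 0xff);
	}
}

/* Reference conversion via libc: 0 on success, -1 if rejected. */
static int v6_text_to_bytes(const char *text, unsigned char out[16])
{
	return inet_pton(AF_INET6, text, out) == 1 ? 0 : -1;
}
```

Either way, the 16 bytes handed to the memcmp predicate are the same ones an in6_addr-style field holds, so "::1" and its fully written-out form compare identically.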
BTW, the subjects for the tracing subsystem should always start with a capital letter.
"tracing: Support IPv6 filter predicates"
But that's not why I'm replying here.
On Fri, 28 Apr 2023 16:34:46 +0100 Alan Maguire alan.maguire@oracle.com wrote:
Support '==' and '!=' predicates for IPv6 addresses; for example
cd /sys/kernel/debug/tracing/events/tcp/tcp_receive_reset
echo "saddr_v6 == ::1" > filter
or equivalently
echo "saddr_v6 == 0:0:0:0:0:0:0:1" > filter
Signed-off-by: Alan Maguire alan.maguire@oracle.com
kernel/trace/trace_events_filter.c | 73 ++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index d8e08d3c3594..e2521574f3c4 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1665,6 +1665,79 @@ static int parse_pred(const char *str, void *data,
if (pred->op == OP_NE)
pred->not = 1;
} else if (field->size == 16 &&
(str[i] == ':' ||
(isalnum(str[i]) && tolower(str[i + 1]) != 'x'))) {
u8 j, gap_size, gap = 0, gap_count = 0, index = 0;
u16 tmp_v6addr[8] = {};
u16 v6addr[8] = {};
/* For IPv6 addresses, only '==' or '!=' are supported. */
if (pred->op != OP_EQ && pred->op != OP_NE) {
parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
goto err_free;
}
/* Store the u16s in the address string consecutively in
* tmp_v6addr while tracking the presence of a "::" (if any)
* in the IPv6 address string; we will use its location
* to determine how many u16s it represents (the gap_size
* below). Only one "::" is allowed in an IPv6 address
* string.
*/
while (isalnum(str[i]) || str[i] == ':') {
switch (str[i]) {
case ':':
i++;
/* mark "::" index by setting gap */
if (str[i] == ':') {
gap = index;
gap_count++;
i++;
}
if (gap_count > 1) {
parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP,
pos + s);
goto err_free;
}
break;
default:
if (sscanf(&str[i], "%hx", &tmp_v6addr[index]) != 1) {
parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP,
pos + s);
goto err_free;
}
index++;
while (isalnum(str[i]))
i++;
break;
}
}
There appears to be no limit to the above loop. I panic'd my machine with:
# echo 'saddr_v6 == 0123:4567:89ab:cdef:0123:4567:89ab:cdef:0123:4567:89ab:cdef:0123:4567:89ab:cdef:0123:4567:89ab:cdef:0123:4567:89ab:cdef:0123:4567:89ab:cdef:0123:4567:89ab:cdef' > /sys/kernel/tracing/events/sock/inet_sk_error_report/filter
-- Steve
/* The gap_size here represents the number of u16s the "::"
* represents; for ::1 the gap size is 7, for feed::face
* it is 6, etc.
*/
gap_size = 8 - index;
index = 0;
for (j = 0; j < 8; ) {
if (gap_size > 0 && j == gap) {
j += gap_size;
} else {
+#ifdef __BIG_ENDIAN
v6addr[j++] = tmp_v6addr[index];
+#else
v6addr[j++] = ((tmp_v6addr[index] & 0xff) << 8) +
((tmp_v6addr[index] & 0xff00) >> 8);
+#endif
index++;
}
}
pred_val = kzalloc(field->size, GFP_KERNEL);
memcpy(pred_val, v6addr, field->size);
pred->val = (u64)pred_val;
pred->fn_num = FILTER_PRED_FN_MEMCMP;
if (pred->op == OP_NE)
pred->not = 1;
} else if (str[i] == '0' && tolower(str[i + 1]) == 'x' &&
field->size > 8) {
/* For sizes > 8 bytes, we store hex bytes for comparison;
On Fri, 9 Jun 2023 17:12:27 -0400 Steven Rostedt rostedt@goodmis.org wrote:
while (isalnum(str[i]) || str[i] == ':') {
switch (str[i]) {
case ':':
i++;
/* mark "::" index by setting gap */
if (str[i] == ':') {
gap = index;
gap_count++;
i++;
}
if (gap_count > 1) {
parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP,
pos + s);
goto err_free;
}
break;
default:
if (sscanf(&str[i], "%hx", &tmp_v6addr[index]) != 1) {
parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP,
pos + s);
goto err_free;
}
index++;
while (isalnum(str[i]))
i++;
break;
There should also be a lot more checks here that the input coming in is correct. It also accepted:
"123456789abcdef0" as "def0", where I expected it to fail.
-- Steve
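Both review points (the unbounded loop and the silently truncated group) come down to missing limits in the parse loop. Below is a minimal userspace sketch of a bounded parser, assuming the address has already been isolated as a NUL-terminated token; parse_ipv6_bounded() and hex_val() are hypothetical names, not the patch's code. It stops at eight groups, rejects any group longer than four hex digits, allows at most one "::", and requires a group after every single ':'. Embedded IPv4 forms such as "::ffff:1.2.3.4" are deliberately not handled.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

static unsigned int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return (unsigned int)(c - '0');
	return (unsigned int)(tolower((unsigned char)c) - 'a' + 10);
}

/* Parse a NUL-terminated IPv6 token into 16 bytes, network byte order.
 * Returns 0 on success, -1 on malformed input. */
static int parse_ipv6_bounded(const char *s, unsigned char out[16])
{
	uint16_t g[8];
	int n = 0, gap = -1, i = 0, j, k;

	if (s[0] == ':') {
		if (s[1] != ':')
			return -1;		/* lone leading ':' */
		gap = 0;			/* "::" before any group */
		i = 2;
	}
	for (;;) {
		unsigned int v = 0;
		int digits = 0;

		while (isxdigit((unsigned char)s[i])) {
			if (++digits > 4)
				return -1;	/* e.g. "123456789abcdef0" */
			v = v << 4 | hex_val(s[i++]);
		}
		if (digits == 0)
			break;			/* empty input, or just after "::" */
		if (n == 8)
			return -1;		/* a ninth group: stop here */
		g[n++] = (uint16_t)v;
		if (s[i] != ':')
			break;
		if (s[i + 1] == ':') {
			if (gap >= 0)
				return -1;	/* second "::" */
			gap = n;
			i += 2;
		} else if (isxdigit((unsigned char)s[i + 1])) {
			i++;			/* single ':' then next group */
		} else {
			return -1;		/* e.g. trailing "1:" */
		}
	}
	if (s[i] != '\0')
		return -1;			/* leftover characters */
	if (gap < 0 ? n != 8 : n > 7)
		return -1;			/* wrong number of groups */

	memset(out, 0, 16);
	for (k = 0, j = 0; k < n; k++, j++) {
		if (k == gap)
			j += 8 - n;		/* skip the zero groups "::" stands for */
		out[2 * j] = (unsigned char)(g[k] >> 8);
		out[2 * j + 1] = (unsigned char)(g[k] & 0xff);
	}
	return 0;
}
```

Under these rules "123456789abcdef0" and the long colon-separated string from the first report are rejected, while "::1", "1::" and "feed::face" parse as expected. The selftest value "0000:0000:0000:0000:0000:0000:0000:00001" used later in this series has a five-digit final group and would also be rejected.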
Support '==' and '!=' predicates for MAC address format; for example
cd /sys/kernel/debug/tracing/events/cfg80211/rdev_get_key
echo "mac_addr == de:ad:be:ef:de:ad" > filter
Signed-off-by: Alan Maguire alan.maguire@oracle.com
---
 kernel/trace/trace_events_filter.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index e2521574f3c4..f38023f490b1 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1384,7 +1384,7 @@ static int parse_pred(const char *str, void *data,
 	unsigned long size;
 	unsigned long ip;
 	char num_buf[24];	/* Big enough to hold an address */
-	char scratch[4];	/* Big enough to hold an IPv4 address */
+	char scratch[6];	/* Big enough to hold a MAC address */
 	u8 *pred_val;
 	char *field_name;
 	char *name;
@@ -1738,6 +1738,24 @@ static int parse_pred(const char *str, void *data,
 		if (pred->op == OP_NE)
 			pred->not = 1;

+	} else if (field->size == 6 &&
+		   sscanf(&str[i], "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+			  &scratch[0], &scratch[1], &scratch[2], &scratch[3],
+			  &scratch[4], &scratch[5]) == 6) {
+		/* For MAC addresses, only '==' or '!=' are supported. */
+		if (pred->op != OP_EQ && pred->op != OP_NE) {
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+		while (isalnum(str[i]) || str[i] == ':')
+			i++;
+		pred_val = kzalloc(field->size, GFP_KERNEL);
+		memcpy(pred_val, scratch, field->size);
+		pred->val = (u64)pred_val;
+		pred->fn_num = FILTER_PRED_FN_MEMCMP;
+		if (pred->op == OP_NE)
+			pred->not = 1;
+
 	} else if (str[i] == '0' && tolower(str[i + 1]) == 'x' &&
 		   field->size > 8) {
 		/* For sizes > 8 bytes, we store hex bytes for comparison;
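A bare %hhx conversion does not by itself insist on two digits per octet; for example, the format string in this patch also accepts "1:2:3:4:5:6". If stricter input were wanted, a check along the lines of the userspace sketch below could enforce the exact "xx:xx:xx:xx:xx:xx" shape before converting. parse_mac_strict() and hex_digit() are hypothetical names, and whether the looser form should be accepted is a policy choice this sketch does not settle.

```c
#include <ctype.h>
#include <string.h>

static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;	/* caller checked isxdigit() */
}

/* Accept exactly six two-digit hex octets separated by ':' and nothing
 * else. Returns 0 on success, -1 otherwise (mac is then unspecified). */
static int parse_mac_strict(const char *s, unsigned char mac[6])
{
	if (strlen(s) != 17)			/* 6 * 2 digits + 5 colons */
		return -1;
	for (int k = 0; k < 6; k++) {
		const char *p = s + 3 * k;

		if (!isxdigit((unsigned char)p[0]) || !isxdigit((unsigned char)p[1]))
			return -1;
		if (k < 5 && p[2] != ':')
			return -1;
		mac[k] = (unsigned char)(hex_digit(p[0]) << 4 | hex_digit(p[1]));
	}
	return 0;
}
```

Both "de:ad:be:ef:de:ad" and the upper-case "AB:CD:EF:01:23:45" used in the selftest parse; "1:2:3:4:5:6", dash-separated forms and addresses with a seventh octet are rejected.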
Add tests verifying that filter predicates work for 1/2/4/8/16-byte values, IPv4, IPv6 and MAC addresses, and strings; use predicates at both event and subsystem level.
Signed-off-by: Alan Maguire alan.maguire@oracle.com
---
 .../selftests/ftrace/test.d/event/filter.tc | 91 +++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/event/filter.tc
diff --git a/tools/testing/selftests/ftrace/test.d/event/filter.tc b/tools/testing/selftests/ftrace/test.d/event/filter.tc
new file mode 100644
index 000000000000..21d4715e2176
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/event/filter.tc
@@ -0,0 +1,91 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event tracing - enable filter predicates
+# requires: set_event events/sched
+# flags:
+
+do_reset() {
+	echo 0 > ${event}/enable
+	echo 0 > ${event}/filter
+	clear_trace
+}
+
+fail() { #msg
+	echo $1
+	exit_fail
+}
+
+test_filter() { # event filter cmd
+	event=$1
+	filter="$2"
+	cmd=$3
+	findevent=`basename $event`
+	echo "$filter" > ${event}/filter
+	echo 1 > ${event}/enable
+	$cmd
+	count=`grep $findevent trace |wc -l`
+	if [ $count -lt 1 ]; then
+		fail "at least one $event should be recorded for '$filter'"
+	fi
+	do_reset
+}
+
+# verify filter predicates at trace event/subsys level for
+# - string (prev_comm)
+# - 1-byte value (common_flags)
+# - 2-byte value (common_type)
+# - 4-byte value (next_pid)
+# - 8-byte value (prev_state)
+
+for event in events/sched/sched_switch events/sched
+do
+	for filter in "prev_comm == 'ftracetest'" \
+		"common_flags != 0" \
+		"common_type >= 0" \
+		"next_pid > 0" \
+		"prev_state != 0"
+	do
+		test_filter "$event" "$filter" "yield"
+	done
+done
+
+# verify '==', '!=' filter predicates for IPv4 addresses at event/subsys
+# level
+for event in events/fib/fib_table_lookup events/fib ; do
+	for filter in "dst == 127.0.0.1" \
+		"src != 127.0.0.1"
+	do
+		test_filter "$event" "$filter" "ping -c 1 127.0.0.1"
+	done
+done
+
+# verify '==', '!=' filter predicates for IPv6 addresses/16-byte arrays
+# at event/subsys level
+for event in events/fib6/fib6_table_lookup events/fib6 ; do
+	for filter in "dst == 0x00000000000000000000000000000001" \
+		"src != 0x00000000000000000000000000000001" \
+		"dst == ::1" \
+		"src != ::1" \
+		"dst == 0:0:0:0:0:0:0:1" \
+		"dst == 0000:0000:0000:0000:0000:0000:0000:00001"
+	do
+		test_filter "$event" "$filter" "ping -c 1 -6 ::1"
+	done
+done
+
+set +e
+modprobe cfg80211
+set -e
+
+if [[ -d events/cfg80211/rdev_get_key ]]; then
+	for event in events/cfg80211/rdev_get_key ; do
+		for filter in "mac_addr == de:ad:be:ef:de:ad" \
+			"mac_addr != AB:CD:EF:01:23:45"
+		do
+			echo "$filter" > events/cfg80211/rdev_get_key/filter
+			echo 0 > events/cfg80211/rdev_get_key/filter
+		done
+	done
+fi
+
+exit 0
Document that only == and != predicates are supported for IPv4, IPv6 and MAC addresses.
For values > 8 bytes in size, only == and != filter predicates are supported; document this also.
Signed-off-by: Alan Maguire alan.maguire@oracle.com
---
 Documentation/trace/events.rst | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index f5fcb8e1218f..6a75e4e256c9 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -182,10 +182,31 @@ The field-names available for use in filters can be found in the

 The relational-operators depend on the type of the field being tested:

+For IPv4, IPv6 and MAC addresses, the available operators are:
+
+==, !=
+
+For example
+
+"dst == 127.0.0.1"
+
+"src != ::1"
+
+"mac_addr == ab:cd:ef:12:34:56"
+
 The operators available for numeric fields are:

 ==, !=, <, <=, >, >=, &

+For numeric fields larger than 8 bytes, only
+
+==, !=
+
+...are allowed, and values for comparison must match field size exactly.
+For example, to match the "::1" IPv6 address:
+
+"dst == 0x00000000000000000000000000000001"
+
 And for string fields they are:

 ==, !=, ~
linux-kselftest-mirror@lists.linaro.org