Hi,
Greg and Sasha add the "X-stable: review" header to their patch bombs, with the intention that people will be able to filter these out should they desire to do so. For example, I usually want all threads that match code I care about, but I don't regularly want to see thousand-patch stable series. So this header is helpful.
However, I'm not able to formulate a query for lore (to pass to `lei q`) that negates a match on it. The idea would be to exclude the thread if the parent has this header. It looks like public-inbox might only index some headers, and can't generically search all of them? I'm not sure how it works, but queries only seem to halfway work when searching for that header.
In the meantime, I've been using this ugly bash script, which gets the job done, but means I have to download everything locally first:
#!/bin/bash
PWD="${BASH_SOURCE[0]}"
PWD="${PWD%/*}"
set -e
cd "$PWD"
echo "[+] Syncing new mail" >&2
lei up "$PWD"
echo "[+] Cleaning up stable patch bombs" >&2
# Files that carry the stable header themselves
mapfile -d $'\0' -t parents < <(grep -F -x -Z -r -l 'X-stable: review' cur tmp new)
# Remember their Message-IDs so that later replies can be matched too
{
	[[ -f stable-message-ids ]] && cat stable-message-ids
	[[ ${#parents[@]} -gt 0 ]] && sed -n 's/^Message-ID: <\(.*\)>$/\1/p' "${parents[@]}"
} | sort -u > stable-message-ids.new
mv stable-message-ids.new stable-message-ids
[[ -s stable-message-ids ]] || exit 0
# Files that mention any remembered Message-ID (i.e. replies in those threads)
mapfile -d $'\0' -t children < <(grep -F -Z -r -l -f - cur tmp new < stable-message-ids)
total=$(( ${#parents[@]} + ${#children[@]} ))
[[ $total -gt 0 ]] || exit 0
echo "# rm <...$total messages...>" >&2
rm -f "${parents[@]}" "${children[@]}"
This results in something like:
zx2c4@thinkpad ~/Projects/lkml $ ./update.bash
[+] Syncing new mail
# https://lore.kernel.org/all/ limiting ...
# /usr/bin/curl -gSf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(...
[+] Cleaning up stable patch bombs
# rm <...24593 messages...>
It works, but it'd be nice to not even download these messages in the first place. Since I'm deleting messages I don't want, I have to keep track of the Message-IDs of the deleted messages with the stable header in case replies come in later. That's some bookkeeping, sheesh!
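The two bookkeeping steps can be tried in isolation on a throwaway directory. This is only a minimal sketch: the function names `stable_ids` and `thread_members` and the sample messages are hypothetical, and it assumes file names without newlines. Like the script above, it matches the header line anywhere in a file, so a body line reading exactly "X-stable: review" would also count.

```shell
#!/bin/bash
# Sketch only: stable_ids and thread_members are hypothetical helper names.

# Print the Message-ID (without angle brackets) of every file under "$1"
# that contains the exact line "X-stable: review".
stable_ids() {
	grep -F -x -r -l 'X-stable: review' "$1" |
		while IFS= read -r f; do
			sed -n 's/^Message-ID: <\(.*\)>$/\1/p' "$f"
		done | sort -u
}

# Read Message-IDs on stdin and print every file under "$1" that mentions
# one of them: the parents themselves plus replies that reference them.
thread_members() {
	grep -F -r -l -f - "$1" | sort
}

# Made-up sample data: one stable parent, one reply to it, one unrelated mail.
demo=$(mktemp -d)
mkdir -p "$demo/cur"
printf 'Message-ID: <parent@example>\nX-stable: review\n\nbody\n' > "$demo/cur/1"
printf 'Message-ID: <reply@example>\nIn-Reply-To: <parent@example>\n\nbody\n' > "$demo/cur/2"
printf 'Message-ID: <other@example>\n\nbody\n' > "$demo/cur/3"

stable_ids "$demo/cur"
stable_ids "$demo/cur" | thread_members "$demo/cur"
```

Keeping the ID list on disk between runs, as the script does with stable-message-ids, is what lets a reply that arrives days later still be matched after its parent has been deleted.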
Any thoughts on this workflow?
Jason
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
> Hi,
> Greg and Sasha add the "X-stable: review" header to their patch bombs, with the intention that people will be able to filter these out should they desire to do so. For example, I usually want all threads that match code I care about, but I don't regularly want to see thousand-patch stable series. So this header is helpful.
> However, I'm not able to formulate a query for lore (to pass to `lei q`) that negates a match on it. The idea would be to exclude the thread if the parent has this header. It looks like public-inbox might only index some headers, and can't generically search all of them? I'm not sure how it works, but queries only seem to halfway work when searching for that header.
Correct, public-inbox currently won't index every header due to cost, false positives, and otherwise lack of usefulness (general gibberish from DKIM sigs, various UUIDs, etc).
So it doesn't currently know about "X-stable:"
I started working on making headers indexing configurable last year, but didn't hear a response from the person that potentially was interested:
https://public-inbox.org/meta/20231120032132.M610564@dcvr/
Right now, indexing new headers + validations can be maintained as a Perl module in the public-inbox codebase.
For lore, it'd make sense to be able to configure a bunch (or all) inboxes at once instead of the per-inbox configuration in my proposed RFC.
At minimum, one would have to know:
1) the mail header name (e.g. `X-stable')
2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
3) the type of header value (phrase, string, sortable numeric, etc...)
I'm trying to avoid supporting sortable numeric values for this, since supporting them will cause problems if columns get repurposed with admins changing their minds. A full reindex would fix it, but those are crazy expensive.
So probably just supporting strings and/or phrases to start...
Validation to prevent poisoning by malicious/broken senders can be useful in some cases (and the reason the RFC was a per use case Perl module). That said, I'm not sure if much validation is necessary for X-stable: headers or if just any text is fine.
On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
> Correct, public-inbox currently won't index every header due to cost, false positives, and otherwise lack of usefulness (general gibberish from DKIM sigs, various UUIDs, etc).
> So it doesn't currently know about "X-stable:"
> I started working on making headers indexing configurable last year, but didn't hear a response from the person that potentially was interested:
> https://public-inbox.org/meta/20231120032132.M610564@dcvr/
> Right now, indexing new headers + validations can be maintained as a Perl module in the public-inbox codebase.
> For lore, it'd make sense to be able to configure a bunch (or all) inboxes at once instead of the per-inbox configuration in my proposed RFC.
> At minimum, one would have to know:
> - the mail header name (e.g. `X-stable')
> - the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
> - the type of header value (phrase, string, sortable numeric, etc...)
I'm whole-heartedly for this! This ties nicely to my b4 work where I'd like to be able to identify code-review trailers sent for a specific patch, even if that patch itself is not on lore. For example, this could be a patch that is part of a pull-request on a git forge, but we'd still like to be able to collect and find code-review trailers for it when a maintainer applies it.
Currently, I am using the following approach:
| Reviewed-by: Some Developer <some.dev@example.org>
| ---
| for-patch-id: abcd...1234
Then I can query 'nq:"for-patch-id: abcd...1234"', but this is probably much heavier than if I could provide this in a custom header:
| X-For-Patch-ID: abcd...1234
and query for "xforpatchid:abcd...1234"
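Both examples in this thread (`X-stable' -> `xstable:' and `X-For-Patch-ID' -> `xforpatchid:') follow the same pattern: lowercase the header name and drop the dashes. A small sketch of that pattern, where `header_to_prefix` is a hypothetical helper name and not anything public-inbox provides; the actual prefix would be whatever the admin configures:

```shell
#!/bin/bash
# Sketch only: derive a candidate search prefix from a mail header name by
# removing dashes and lowercasing. header_to_prefix is a hypothetical name.
header_to_prefix() {
	printf '%s' "$1" | tr -d '-' | tr '[:upper:]' '[:lower:]'
}

header_to_prefix 'X-stable'; echo
header_to_prefix 'X-For-Patch-ID'; echo
```

The two calls print xstable and xforpatchid, matching the prefixes used above.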
> I'm trying to avoid supporting sortable numeric values for this, since supporting them will cause problems if columns get repurposed with admins changing their minds. A full reindex would fix it, but those are crazy expensive.
I'm perfectly fine with it only being a string, honestly.
> So probably just supporting strings and/or phrases to start...
> Validation to prevent poisoning by malicious/broken senders can be useful in some cases (and the reason the RFC was a per use case Perl module). That said, I'm not sure if much validation is necessary for X-stable: headers or if just any text is fine.
I'd let the consumer clients worry about it.
-K
Konstantin Ryabitsev <mricon@kernel.org> wrote:
> On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
> > Correct, public-inbox currently won't index every header due to cost, false positives, and otherwise lack of usefulness (general gibberish from DKIM sigs, various UUIDs, etc).
> > So it doesn't currently know about "X-stable:"
> > I started working on making headers indexing configurable last year, but didn't hear a response from the person that potentially was interested:
> > https://public-inbox.org/meta/20231120032132.M610564@dcvr/
> > Right now, indexing new headers + validations can be maintained as a Perl module in the public-inbox codebase.
> > For lore, it'd make sense to be able to configure a bunch (or all) inboxes at once instead of the per-inbox configuration in my proposed RFC.
> > At minimum, one would have to know:
> > - the mail header name (e.g. `X-stable')
> > - the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
> > - the type of header value (phrase, string, sortable numeric, etc...)
> I'm whole-heartedly for this! This ties nicely to my b4 work where I'd like to be able to identify code-review trailers sent for a specific patch, even if that patch itself is not on lore. For example, this could be a patch that is part of a pull-request on a git forge, but we'd still like to be able to collect and find code-review trailers for it when a maintainer applies it.
OK, a more configurable version is available on a per-inbox basis:
https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/
But that's a PITA to configure with hundreds of inboxes and doesn't have extindex support, yet.
I made it share logic with the old altid code; so I'll also be getting altid into extindex since ISTR users wanting to be able to lookup gmane stuff via extindex.
And it also works with the new C++ xap_helper process (which I'll use for threadid: support (still working on that...)).
> I'm perfectly fine with it only being a string, honestly.
Yeah, though there's 3 ways of indexing strings, currently :x I've decided to keep some options open and support boolean_term, text, and phrase for now.
boolean_term is the cheapest and probably best for exactly matching labels/enums and such. The others may work better for more complex texts (comma-delimited labels, maybe).
> > So probably just supporting strings and/or phrases to start...
> > Validation to prevent poisoning by malicious/broken senders can be useful in some cases (and the reason the RFC was a per use case Perl module). That said, I'm not sure if much validation is necessary for X-stable: headers or if just any text is fine.
> I'd let the consumer clients worry about it.
Agreed.
On Wed, May 08, 2024 at 11:33:14AM GMT, Eric Wong wrote:
> > I'm whole-heartedly for this! This ties nicely to my b4 work where I'd like to be able to identify code-review trailers sent for a specific patch, even if that patch itself is not on lore. For example, this could be a patch that is part of a pull-request on a git forge, but we'd still like to be able to collect and find code-review trailers for it when a maintainer applies it.
> OK, a more configurable version is available on a per-inbox basis:
> https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/
> But that's a PITA to configure with hundreds of inboxes and doesn't have extindex support, yet.
> I made it share logic with the old altid code; so I'll also be getting altid into extindex since ISTR users wanting to be able to lookup gmane stuff via extindex.
Great, thanks for doing this. I'll wait until this has extindex support, because I really need to be able to look across all inboxes.
> Yeah, though there's 3 ways of indexing strings, currently :x I've decided to keep some options open and support boolean_term, text, and phrase for now.
What's the difference between "text" and "phrase"?
> boolean_term is the cheapest and probably best for exactly matching labels/enums and such.
So, this is for "X-Ignore-Me: Yes" type of headers?
-K
Konstantin Ryabitsev <mricon@kernel.org> wrote:
> On Wed, May 08, 2024 at 11:33:14AM GMT, Eric Wong wrote:
> > https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/
> > Yeah, though there's 3 ways of indexing strings, currently :x I've decided to keep some options open and support boolean_term, text, and phrase for now.
> What's the difference between "text" and "phrase"?
text is like indexlevel=medium (case-insensitive and sortable by relevance), while phrase is like indexlevel=full, so it adds positions to allow searching phrases via "double quotes".
(also documented in the proposed public-inbox-config.pod change, not sure how clear it was for non-Xapian-internals-aware folks...)
> > boolean_term is the cheapest and probably best for exactly matching labels/enums and such.
> So, this is for "X-Ignore-Me: Yes" type of headers?
Yes. Though I just remembered it's case-sensitive (same treatment as Message-ID with "m:"), so I guess that needs to be documented.