Konstantin Ryabitsev <mricon@kernel.org> wrote:
On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
Correct, public-inbox currently won't index every header due to cost, false positives, and a general lack of usefulness (gibberish from DKIM sigs, various UUIDs, etc).
So it doesn't currently know about "X-stable:"
I started working on making headers indexing configurable last year, but didn't hear a response from the person that potentially was interested:
https://public-inbox.org/meta/20231120032132.M610564@dcvr/
Right now, indexing new headers + validations can be maintained as a Perl module in the public-inbox codebase.
For lore, it'd make sense to be able to configure a bunch (or all) inboxes at once instead of the per-inbox configuration in my proposed RFC.
At minimum, one would have to know:
- the mail header name (e.g. `X-stable')
- the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
- the type of header value (phrase, string, sortable numeric, etc...)
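To make the three items above concrete, here is a minimal sketch in Python (not public-inbox's actual Perl code or config format). The names `HeaderIndexSpec`, `header_values`, and `specs_for`, and the inbox names in the usage note, are hypothetical. The no-dash check on the prefix simply mirrors the "AFAIK" above and is an assumption.

```python
from dataclasses import dataclass
from email import message_from_string

# Index types discussed in this thread; treated as plain labels here.
KINDS = ("boolean_term", "text", "phrase")


@dataclass(frozen=True)
class HeaderIndexSpec:  # hypothetical name, not part of public-inbox
    header: str  # mail header name, e.g. "X-stable"
    prefix: str  # search prefix without the trailing colon, e.g. "xstable"
    kind: str    # one of KINDS

    def __post_init__(self):
        # Assumption: prefixes are restricted to ASCII alphanumerics (no dash).
        if not (self.prefix.isascii() and self.prefix.isalnum()):
            raise ValueError(f"bad search prefix: {self.prefix!r}")
        if self.kind not in KINDS:
            raise ValueError(f"unknown index type: {self.kind!r}")


def header_values(spec, raw_message):
    """Return every value of spec.header found in a raw RFC 5322 message."""
    msg = message_from_string(raw_message)
    # Header lookup in the stdlib email package is case-insensitive.
    return msg.get_all(spec.header, [])


def specs_for(inboxes, shared_specs):
    """Apply one shared list of specs to many inboxes at once."""
    return {name: list(shared_specs) for name in inboxes}
```

Something like `specs_for(["git", "stable"], [HeaderIndexSpec("X-stable", "xstable", "boolean_term")])` would then stand in for configuring a bunch of inboxes in one place rather than one at a time.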
I'm whole-heartedly for this! This ties nicely to my b4 work where I'd like to be able to identify code-review trailers sent for a specific patch, even if that patch itself is not on lore. For example, this could be a patch that is part of a pull-request on a git forge, but we'd still like to be able to collect and find code-review trailers for it when a maintainer applies it.
OK, a more configurable version is available on a per-inbox basis:
https://public-inbox.org/meta/20240508110957.3108196-1-e@80x24.org/
But that's a PITA to configure with hundreds of inboxes and doesn't have extindex support, yet.
I made it share logic with the old altid code, so I'll also be getting altid into extindex, since ISTR users wanting to be able to look up gmane stuff via extindex.
And it also works with the new C++ xap_helper process (which I'll use for threadid: support (still working on that...)).
I'm perfectly fine with it only being a string, honestly.
Yeah, though there's 3 ways of indexing strings, currently :x I've decided to keep some options open and support boolean_term, text, and phrase for now.
boolean_term is the cheapest and probably best for exactly matching labels/enums and such. The others may work better for more complex texts (comma-delimited labels, maybe).
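A toy model of how the three string index types differ, in Python. This is only an illustration under simplifying assumptions, not how Xapian or public-inbox actually tokenizes: the function `index_terms` and the `XSTABLE` term prefix are hypothetical, and real stemming, term-length limits, and case handling are ignored.

```python
import re


def index_terms(kind, prefix, value):
    """Return (term, position) pairs; position is None when not recorded."""
    if kind == "boolean_term":
        # The whole value becomes one exact-match term: cheapest to store,
        # but only an exact value will match.
        return [(prefix + value.strip(), None)]
    # Assumption: split on non-word characters and lowercase each word.
    words = re.findall(r"\w+", value.lower())
    if kind == "text":
        # One term per word, no positions: individual words are searchable.
        return [(prefix + w, None) for w in words]
    if kind == "phrase":
        # One term per word plus its position, so word order can be matched.
        return [(prefix + w, pos) for pos, w in enumerate(words, start=1)]
    raise ValueError(f"unknown index type: {kind!r}")
```

With a value like "review, security", `boolean_term` yields a single term for the entire string, while `text` and `phrase` yield one term per label. That is why the latter two may suit comma-delimited labels better, at a higher indexing cost.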
So probably just supporting strings and/or phrases to start...
Validation to prevent poisoning by malicious/broken senders can be useful in some cases (and the reason the RFC was a per use case Perl module). That said, I'm not sure if much validation is necessary for X-stable: headers or if just any text is fine.
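For a sense of what light validation could look like if it were wanted, here is a small Python sketch. The function `clean_header_value` and the 64-character limit are hypothetical and chosen only for illustration. Nothing here reflects rules public-inbox actually applies.

```python
MAX_LEN = 64  # hypothetical cap, chosen only for illustration


def clean_header_value(raw, max_len=MAX_LEN):
    """Return a normalized value that is safe to index, or None to skip it."""
    # Collapse folded-header whitespace (CRLF, tabs, runs of spaces).
    value = " ".join(raw.split())
    if not value or len(value) > max_len:
        return None  # empty or suspiciously long: do not index
    if not value.isprintable():
        return None  # control characters suggest a broken or hostile sender
    return value
```

Skipping this step and indexing any text as-is leaves the filtering to whatever client consumes the search results.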
I'd let the consumer clients worry about it.
Agreed.