From: Maciej Fijalkowski maciej.fijalkowski@intel.com Date: Thu, 9 Oct 2025 16:27:50 +0200
On Wed, Oct 08, 2025 at 06:56:59PM +0200, Alexander Lobakin wrote:
Turned out certain clearly invalid values passed in &xdp_desc from userspace can pass xp_{,un}aligned_validate_desc() and then lead to UBs or just invalid frames to be queued for xmit.
desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len can cause positive integer overflow and wraparound, the same way low enough desc->addr with a non-zero pool->tx_metadata_len can cause negative integer overflow. Both scenarios can then pass the validation successfully.
Hmm, when underflow happens the addr would be enormous, passing existing validation would really be rare. However let us fix it while at it.
It depends on how big pool->addrs_cnt can be. I haven't dug deep into the internals, is this value also userspace-supplied or generated by the core code and is always valid?
Also see below (xp_aligned_validate_desc()).
This doesn't happen with valid XSk applications, but can be used to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive overflows of it. Use explicit check_{add,sub}_overflow() when validating desc->addr (which is ``u64`` already).
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44) Function old new delta xskq_cons_peek_desc 299 330 +31 xsk_tx_peek_release_desc_batch 973 1002 +29 xsk_generic_xmit 3148 3132 -16
but hopefully this doesn't hurt the performance much.
Let us be fully transparent and link the previous discussion here?
As per a quick discussion with the maintainers yesterday, we would like to not mention FSB-sponsored code/companies in any way...
I was commenting that breaking up single statement to multiple branches might affect subtly performance as this code is executed per each
The compilers successfully merge such stuff.
The only overhead introduced is the calls to __builtin_{add,sub}_overflow(), they are definitely inlined (compiler intrinsics), also check_{add,sub}_overflow() are wrapped with unlikely(), but still may take a couple instructions.
descriptor. Jason tested copy+aligned mode, let us see if zc+unaligned mode is affected.
<rant> I am also thinking about test side, but xsk tx metadata came with a separate test (xdp_hw_metadata), which was rather about testing positive cases. That is probably a separate discussion, but metadata negative tests should appear somewhere, I suppose xskxceiver would be a good fit, but then, should we merge the existing logic from xdp_hw_metadata? </rant>
I'd love to write a test that would prove that invalid descriptors are successfully rejected, but rather separately from this particular fix.
[...]
@@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options) static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) {
- u64 addr = desc->addr - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
- u64 offset = addr & (pool->chunk_size - 1);
- u64 len = desc->len;
- u64 addr, offset;
- if (!desc->len)
- if (!len)
This is yet another thing being fixed here as for non-zero tx_metadata_len we were allowing 0 length descriptors... :< overall feels like we relied too much on contract with userspace WRT descriptor layout.
If zc perf is fine, then: Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com
return false;
- if (offset + len > pool->chunk_size)
- /* Can overflow if desc->addr < pool->tx_metadata_len */
- if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
return false;
- offset = addr & (pool->chunk_size - 1);
If there's an overflow and @addr went crazy, @offset can still be valid as it's capped by pool->chunk_size here.
- /*
* Can't overflow: @offset is guaranteed to be < ``U32_MAX``
* (pool->chunk_size is ``u32``), @len is guaranteed
* to be <= ``U32_MAX``.
*/
- if (offset + len + pool->tx_metadata_len > pool->chunk_size) return false;
if (addr >= pool->addrs_cnt)
But if pool->addrs_cnt is always valid, insanely big @addr would be rejected here, right.
Thanks, Olek