On Wed, Oct 08, 2025 at 06:56:59PM +0200, Alexander Lobakin wrote:
Turned out certain clearly invalid values passed in &xdp_desc from userspace can pass xp_{,un}aligned_validate_desc() and then lead to UBs or just invalid frames to be queued for xmit.
desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len can cause positive integer overflow and wraparound, the same way low enough desc->addr with a non-zero pool->tx_metadata_len can cause negative integer overflow. Both scenarios can then pass the validation successfully.
Hmm, when underflow happens the addr would be enormous, passing existing validation would really be rare. However let us fix it while at it.
This doesn't happen with valid XSk applications, but can be used to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive overflows of it. Use explicit check_{add,sub}_overflow() when validating desc->addr (which is ``u64`` already).
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44) Function old new delta xskq_cons_peek_desc 299 330 +31 xsk_tx_peek_release_desc_batch 973 1002 +29 xsk_generic_xmit 3148 3132 -16
but hopefully this doesn't hurt the performance much.
Let us be fully transparent and link the previous discussion here?
I was commenting that breaking up single statement to multiple branches might affect subtly performance as this code is executed per each descriptor. Jason tested copy+aligned mode, let us see if zc+unaligned mode is affected.
<rant> I am also thinking about test side, but xsk tx metadata came with a separate test (xdp_hw_metadata), which was rather about testing positive cases. That is probably a separate discussion, but metadata negative tests should appear somewhere, I suppose xskxceiver would be a good fit, but then, should we merge the existing logic from xdp_hw_metadata? </rant>
Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len") Cc: stable@vger.kernel.org # 6.8+ Signed-off-by: Alexander Lobakin aleksander.lobakin@intel.com
net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 35 insertions(+), 10 deletions(-)
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index f16f390370dc..1eb8d9f8b104 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options) static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) {
- u64 addr = desc->addr - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
- u64 offset = addr & (pool->chunk_size - 1);
- u64 len = desc->len;
- u64 addr, offset;
- if (!desc->len)
- if (!len)
This is yet another thing being fixed here as for non-zero tx_metadata_len we were allowing 0 length descriptors... :< overall feels like we relied too much on contract with userspace WRT descriptor layout.
If zc perf is fine, then: Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com
return false;
- if (offset + len > pool->chunk_size)
- /* Can overflow if desc->addr < pool->tx_metadata_len */
- if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
return false;
- offset = addr & (pool->chunk_size - 1);
- /*
* Can't overflow: @offset is guaranteed to be < ``U32_MAX``
* (pool->chunk_size is ``u32``), @len is guaranteed
* to be <= ``U32_MAX``.
*/
- if (offset + len + pool->tx_metadata_len > pool->chunk_size) return false;
if (addr >= pool->addrs_cnt) @@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, if (xp_unused_options_set(desc->options)) return false;
- return true;
} static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) {
- u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
- u64 len = desc->len;
- u64 addr, end;
- if (!desc->len)
- if (!len) return false;
- /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
- len += pool->tx_metadata_len; if (len > pool->chunk_size) return false;
- if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
xp_desc_crosses_non_contig_pg(pool, addr, len))
- /* Can overflow if desc->addr is close to 0 */
- if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
pool->tx_metadata_len, &addr))
return false;
- if (addr >= pool->addrs_cnt)
return false;
- /* Can overflow if pool->addrs_cnt is high enough */
- if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
return false;
- if (xp_desc_crosses_non_contig_pg(pool, addr, len)) return false;
if (xp_unused_options_set(desc->options)) return false;
- return true;
} -- 2.51.0