It is observed sometimes when tethering is used over NCM with Windows 11 as host, at some instances, the gadget_giveback has one byte appended at the end of a proper NTB. When the NTB is parsed, unwrap call looks for any leftover bytes in SKB provided by u_ether and if there are any pending bytes, it treats them as a separate NTB and parses it. But in case the second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that were parsed properly in the first NTB and saved in rx_list are dropped.
Adding a few custom traces showed the following: [002] d..1 7828.532866: dwc3_gadget_giveback: ep1out: req 000000003868811a length 1025/16384 zsI ==> 0 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10 [002] d..1 7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames
In this case, the giveback is of 1025 bytes and block length is 1024. The rest 1 byte (which is 0x00) won't be parsed resulting in drop of all datagrams in rx_list.
Same is case with packets of size 2048: [002] d..1 7828.557948: dwc3_gadget_giveback: ep1out: req 0000000011dfd96e length 2049/16384 zsI ==> 0 [002] d..1 7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800
Lecroy shows one byte coming in extra confirming that the byte is coming in from PC:
Transfer 2959 - Bytes Transferred(1025) Timestamp((18.524 843 590) - Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590) --- Packet 4063861 Data(1024 bytes) Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590) --- Packet 4063863 Data(1 byte) Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)
According to Windows driver, no ZLP is needed if wBlockLength is non-zero, because the non-zero wBlockLength has already told the function side the size of transfer to be expected. However, there are in-market NCM devices that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize. To deal with such devices, it pads an extra 0 at end so the transfer is no longer multiple of wMaxPacketSize.
Cc: stable@vger.kernel.org Fixes: 9f6ce4240a2b ("usb: gadget: f_ncm.c added") Signed-off-by: Krishna Kurapati quic_kriskura@quicinc.com --- Link to v2: https://lore.kernel.org/all/20240131150332.1326523-1-quic_kriskura@quicinc.c...
Changes in v2: Added check to see if the padded byte is 0x00.
Changes in v3: Removed wMaxPacketSize check from v2.
drivers/usb/gadget/function/f_ncm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index ca5d5f564998..e2a059cfda2c 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -1338,7 +1338,15 @@ static int ncm_unwrap_ntb(struct gether *port, "Parsed NTB with %d frames\n", dgram_counter);
to_process -= block_len; - if (to_process != 0) { + + /* + * Windows NCM driver avoids USB ZLPs by adding a 1-byte + * zero pad as needed. + */ + if (to_process == 1 && + (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) { + to_process--; + } else if (to_process > 0) { ntb_ptr = (unsigned char *)(ntb_ptr + block_len); goto parse_ntb; }
On Sun, Feb 4, 2024 at 11:47 PM Krishna Kurapati quic_kriskura@quicinc.com wrote:
It is observed sometimes when tethering is used over NCM with Windows 11 as host, at some instances, the gadget_giveback has one byte appended at the end of a proper NTB. When the NTB is parsed, unwrap call looks for any leftover bytes in SKB provided by u_ether and if there are any pending bytes, it treats them as a separate NTB and parses it. But in case the second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that were parsed properly in the first NTB and saved in rx_list are dropped.
Adding a few custom traces showed the following: [002] d..1 7828.532866: dwc3_gadget_giveback: ep1out: req 000000003868811a length 1025/16384 zsI ==> 0 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10 [002] d..1 7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames
In this case, the giveback is of 1025 bytes and block length is 1024. The rest 1 byte (which is 0x00) won't be parsed resulting in drop of all datagrams in rx_list.
Same is case with packets of size 2048: [002] d..1 7828.557948: dwc3_gadget_giveback: ep1out: req 0000000011dfd96e length 2049/16384 zsI ==> 0 [002] d..1 7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800
Lecroy shows one byte coming in extra confirming that the byte is coming in from PC:
Transfer 2959 - Bytes Transferred(1025) Timestamp((18.524 843 590)
- Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590)
--- Packet 4063861 Data(1024 bytes) Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590) --- Packet 4063863 Data(1 byte) Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)
According to Windows driver, no ZLP is needed if wBlockLength is non-zero, because the non-zero wBlockLength has already told the function side the size of transfer to be expected. However, there are in-market NCM devices that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize. To deal with such devices, it pads an extra 0 at end so the transfer is no longer multiple of wMaxPacketSize.
Cc: stable@vger.kernel.org Fixes: 9f6ce4240a2b ("usb: gadget: f_ncm.c added") Signed-off-by: Krishna Kurapati quic_kriskura@quicinc.com
Link to v2: https://lore.kernel.org/all/20240131150332.1326523-1-quic_kriskura@quicinc.c...
Changes in v2: Added check to see if the padded byte is 0x00.
Changes in v3: Removed wMaxPacketSize check from v2.
drivers/usb/gadget/function/f_ncm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index ca5d5f564998..e2a059cfda2c 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -1338,7 +1338,15 @@ static int ncm_unwrap_ntb(struct gether *port, "Parsed NTB with %d frames\n", dgram_counter);
to_process -= block_len;
if (to_process != 0) {
/*
* Windows NCM driver avoids USB ZLPs by adding a 1-byte
* zero pad as needed.
*/
if (to_process == 1 &&
(*(unsigned char *)(ntb_ptr + block_len) == 0x00)) {
to_process--;
} else if (to_process > 0) { ntb_ptr = (unsigned char *)(ntb_ptr + block_len); goto parse_ntb; }
-- 2.34.1
Reviewed-by: Maciej Żenczykowski maze@google.com
Let's get this fix out.
Greg, there's further code cleanups (here and elsewhere) I'll send once this is merged. I don't want to annoy Krishna further ;-)
Hi everyone,
I recently switch to f_ncm with Windows 10 since rndis 's safety issue. (the Windows 10 driver version is 10.0.19041.1 2009/4/21)
It seems Windows 10 ncm driver won't send ZLP to let udc properly separate the skbs.
On Mon, 5 Feb 2024 13:16:50 +0530 Krishna Kurapati wrote:
According to Windows driver, no ZLP is needed if wBlockLength is
non-zero,
because the non-zero wBlockLength has already told the function side the size of transfer to be expected. However, there are in-market NCM devices that rely on ZLP as long as the wBlockLength is multiple of
wMaxPacketSize.
To deal with such devices, it pads an extra 0 at end so the transfer
is no
longer multiple of wMaxPacketSize.
I do the iperf3 testing cause gadget constantly report similar error after a litle modification to get more concrete info:
[ 174] configfs-gadget.0: to process=512, so go to find second NTH from: 15872 [ 174] FIND NEXT NTH HEAD:000000006c26a12c: 6e 63 6d 68 10 00 86 16 b0 3b 00 00 48 3b 00 00 00 00 52 34 fc 5f 90 fd ca 40 c1 f4 f4 6e 08 00 [ 174] configfs-gadget.0: Wrong NDP SIGN of this ndp index: 15176, skb len: 16384, ureq_len: 16384, this wSeq: 5766 [ 174] NDP HEAD:00000000298f3cab: 2b 12 48 8f 12 ce 3c c8 d7 39 c0 0d 15 cf 86 14 17 4a 91 85 db df ad 87 f0 35 0d 76 ad 4d 4d 74 [ 174] NTH of this NDP HEAD:00000000af9fbfc9: 6e 63 6d 68 10 00 85 16 00 3e 00 00 90 3d 00 00 00 00 52 34 fc 5f 90 fd ca 40 c1 f4 f4 6e 08 00 [ 174] configfs-gadget.0: Wrong NTH SIGN, skblen 14768, last wSequence: 5766, last dgram_num: 11, ureqlen: 16384 [ 174] HEAD:00000000b1a72bfc: 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
Lecroy shows the wSequence=5765 have 10 Datagram consisting a 31*512 bytes=15872 bytes OUT Transfer but have no ZLP:
OUT Transfer wSequence=5765 NTH32 Datagrams: 1514B * 8 + 580B NDP32 Transfer length: 512B * 31 NO ZLP OUT Transfer wSequence=5766 NTH32 Datagrams: 1514B * 8 NDP32 Transfer length: 512B * 29 + 432
This lead to a result that the first givebacked 16K skb correponding to a usb_request contains two NTH but not complete:
USB Request 1 SKB 16384B (NTH32) (Datagrams) (NDP32) | (NTH32) (Datagrams piece of wSequence=5766) USB Request 2 SKB 14768B (Datagrams piece of wSequence=5766) (NDP32)
From the context, it seems the first report of Wrong NDP SIGN is caused by out-of-bound accessing, the second report of Wrong NTH SIGN is caused by a wrong beginning of NTB parsing.
Do you have any idea how can this be fixed so that the ncm compatibility is better for windows users.
Best Regards, Pan
On Thu, Jan 9, 2025 at 11:37 PM Junzhong Pan panjunzhong@outlook.com wrote:
Hi everyone,
I recently switch to f_ncm with Windows 10 since rndis 's safety issue. (the Windows 10 driver version is 10.0.19041.1 2009/4/21)
It seems Windows 10 ncm driver won't send ZLP to let udc properly separate the skbs.
On Mon, 5 Feb 2024 13:16:50 +0530 Krishna Kurapati wrote:
According to Windows driver, no ZLP is needed if wBlockLength is
non-zero,
because the non-zero wBlockLength has already told the function side the size of transfer to be expected. However, there are in-market NCM devices that rely on ZLP as long as the wBlockLength is multiple of
wMaxPacketSize.
To deal with such devices, it pads an extra 0 at end so the transfer
is no
longer multiple of wMaxPacketSize.
I do the iperf3 testing cause gadget constantly report similar error after a litle modification to get more concrete info:
[ 174] configfs-gadget.0: to process=512, so go to find second NTH from: 15872 [ 174] FIND NEXT NTH HEAD:000000006c26a12c: 6e 63 6d 68 10 00 86 16 b0 3b 00 00 48 3b 00 00 00 00 52 34 fc 5f 90 fd ca 40 c1 f4 f4 6e 08 00 [ 174] configfs-gadget.0: Wrong NDP SIGN of this ndp index: 15176, skb len: 16384, ureq_len: 16384, this wSeq: 5766 [ 174] NDP HEAD:00000000298f3cab: 2b 12 48 8f 12 ce 3c c8 d7 39 c0 0d 15 cf 86 14 17 4a 91 85 db df ad 87 f0 35 0d 76 ad 4d 4d 74 [ 174] NTH of this NDP HEAD:00000000af9fbfc9: 6e 63 6d 68 10 00 85 16 00 3e 00 00 90 3d 00 00 00 00 52 34 fc 5f 90 fd ca 40 c1 f4 f4 6e 08 00 [ 174] configfs-gadget.0: Wrong NTH SIGN, skblen 14768, last wSequence: 5766, last dgram_num: 11, ureqlen: 16384 [ 174] HEAD:00000000b1a72bfc: 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
Lecroy shows the wSequence=5765 have 10 Datagram consisting a 31*512 bytes=15872 bytes OUT Transfer but have no ZLP:
OUT Transfer wSequence=5765 NTH32 Datagrams: 1514B * 8 + 580B NDP32 Transfer length: 512B * 31 NO ZLP OUT Transfer wSequence=5766 NTH32 Datagrams: 1514B * 8 NDP32 Transfer length: 512B * 29 + 432
This lead to a result that the first givebacked 16K skb correponding to a usb_request contains two NTH but not complete:
USB Request 1 SKB 16384B (NTH32) (Datagrams) (NDP32) | (NTH32) (Datagrams piece of wSequence=5766) USB Request 2 SKB 14768B (Datagrams piece of wSequence=5766) (NDP32)
From the context, it seems the first report of Wrong NDP SIGN is caused by out-of-bound accessing, the second report of Wrong NTH SIGN is caused by a wrong beginning of NTB parsing.
Do you have any idea how can this be fixed so that the ncm compatibility is better for windows users.
Best Regards, Pan
Could you clarify which Linux Kernel you're testing against? Either X.Y.Z version or some git kernel sha1 (not including your debug code of course).
Could you provide some pcap of the actual usb frames? Or perhaps describe better the problem, because I'm not quite following from your email. (I'm not sure if the problem is what windows is sending, or how Linux is parsing it)
I *think* what you're saying is that wSequence=5765 & 5766 are being treated as a single ncm message due to their being a multiple of 512 in the former, not followed by a ZLP? I thought that was precisely when microsoft ncm added an extra zero byte...
What's at the end of 5755? Padding? No padding? Is there an NTH32 header in 5766? Should there be?
-- Maciej Żenczykowski, Kernel Networking Developer @ Google
Hi Maciej
Thanks for the reply, I am sorry for the unclear description.
+remove hgajjar@de.adit-jv.com from CC list due to mail delivery failure.
On 2025/1/11 3:00, Maciej Żenczykowski wrote:
On Thu, Jan 9, 2025 at 11:37 PM Junzhong Pan panjunzhong@outlook.com wrote:
Lecroy shows the wSequence=5765 have 10 Datagram consisting a 31*512 bytes=15872 bytes OUT Transfer but have no ZLP:
OUT Transfer wSequence=5765 NTH32 Datagrams: 1514B * 8 + 580B NDP32 Transfer length: 512B * 31 NO ZLP OUT Transfer wSequence=5766 NTH32 Datagrams: 1514B * 8 NDP32 Transfer length: 512B * 29 + 432
This lead to a result that the first givebacked 16K skb correponding to a usb_request contains two NTH but not complete:
USB Request 1 SKB 16384B (NTH32) (Datagrams) (NDP32) | (NTH32) (Datagrams piece of wSequence=5766) USB Request 2 SKB 14768B (Datagrams piece of wSequence=5766) (NDP32)
From the context, it seems the first report of Wrong NDP SIGN is caused by out-of-bound accessing, the second report of Wrong NTH SIGN is caused by a wrong beginning of NTB parsing.
Could you clarify which Linux Kernel you're testing against?
I am using linux 6.6.63, but the related codes have not newer updates since.
Could you provide some pcap of the actual usb frames? Or perhaps describe better the problem, because I'm not quite following from your email. (I'm not sure if the problem is what windows is sending, or how Linux is parsing it)
I think the root cause of the problem is because Windows 10 ncm class driver doesn't send ZLP, meanwhile the current parsing is depend on a condition that NTB won't spread across usb_request.
This is observed only on Windows 10 but not on Windows 11.
To reproduce the issue, you just need a Windows 10 machine and run iperf3 -c from the windows and iperf3 -s on the gadget board, configure the gadget with the following os_desc, windows10 will bind its ncm driver automatically: echo 0xEF > $GADGET/bDeviceClass echo 0x02 > $GADGET/bDeviceSubClass echo 0x01 > $GADGET/bDeviceProtocol echo 1 > $GADGET/os_desc/use echo 0x1 > $GADGET/os_desc/b_vendor_code echo "MSFT100" > $GADGET/os_desc/qw_sign mkdir -p $FUNCTIONS/ncm.0/os_desc/interface.ncm echo WINNCM > $FUNCTIONS/ncm.0/os_desc/interface.ncm/compatible_id
Reproduced on dwc2 and dwc3 controller with linux 6.6.63.
I *think* what you're saying is that wSequence=5765 & 5766 are being treated as a single ncm message due to their being a multiple of 512 in the former, not followed by a ZLP? I thought that was precisely when microsoft ncm added an extra zero byte...
What's at the end of 5755? Padding? No padding? Is there an NTH32 header in 5766? Should there be?
Okay, I will explain it more precisely (some numbers are corrected). On the USB wire, the transfers on OUT endpoint is like this:
OUT Transfer #1 issued by win10, Transfer length: 512B * 31 = 15872 Bytes OUT Transaction 512B for 31 times, no ZLP and other things. Parsed as below: [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90 [Offset 0x0012] Datagram blocks [Offset 0x3D90] A NDP32 describing 11 Datagram blocks plus one zero length item and padding, len = 112 Bytes. (0)index=0x0012 length=1514 (1)index=0x05FE length=1514 (2)index=0x0BEA length=1514 (3)index=0x11D6 length=1514 (4)index=0x17C2 length=1514 (5)index=0x1DAE length=1514 (6)index=0x239A length=1514 (7)index=0x2986 length=1514 (8)index=...... length=1514 (9)index=...... length=1514 (10)index=..... length=580 (11)index=0x0 length=0 ...padding...
OUT Transfer #2 issued by win10, Transfer length: 512B * 29 + 432 = 15280 Bytes OUT Transaction 512B for 29 times, and OUT Transaction 432B for one time. [Offset 0x0000] A NTH32 with wSequence=5766, dwNdpIndex=0x3B48 [Offset 0x0012] Datagram block [Offset 0x3B48] A NDP32 describing Datagram blocks
In the u_ether driver, we first queued a usb_request(buf=skb.data, length=16384) to the udc driver, the udc should give it back when it receive a 16384B data or encounter ZLP/Short packet.
Since the OUT Xfer #1 doesn't have ZLP or SP, the udc won't giveback the usb_request with req->actual = 15872, instead, it gather some piece of the data from OUT Transfer #2 to make a req->actual=16384 then return to rx_complete.
For f_ncm, we now have a the first usb_request givebacked from udc driver having skb->data organized as the following structure:
usb_request #1 -----------------------from OUT Transfer #1:------------------- [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90 [Offset 0x0012] Datagram blocks [Offset 0x3D90] A NDP32 describing 10 Datagram blocks plus one zero length item and padding, len = 112 Bytes. (0)index=0x0012 length=1514 (1)index=0x05FE length=1514 (2)index=0x0BEA length=1514 (3)index=0x11D6 length=1514 (4)index=0x17C2 length=1514 (5)index=0x1DAE length=1514 (6)index=0x239A length=1514 (7)index=0x2986 length=1514 (8)index=...... length=1514 (9)index=...... length=1514 (10)index=..... length=580 (11)index=0x0 length=0 ...padding... --------------------from OUT Transfer #2:------------------- [Offset 0x3E00/15872] A NTH32 Header with wSequence=5766, dwNdpIndex=0x3B48 [Offset 0x3E00+???] Datagram block piece from next NDP (wSequence=5766)
During the unwrap, we will first report a "Wrong NTH SIGN" when we try to redirect to the NDP of wSequence=5765 parsing the second NTB in the usb_request, since theie is no bound checking, we read some garbage data from skb->data + [0x3E00 + 0x3B48].
Raising the following message (without modification): [ 174] configfs-gadget.0: Wrong NDP SIGN
After this, we go to the next usb_request, at this time, we have a short package(432B) from the OUT Transfer #2, thus the udc would giveback next usb_request to us like this:
usb_request #2 --------------------from OUT Transfer #2:------------------- [Offset 0x0000] Datagram block piece from NTB of wSequence=5765 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa [Offset 0x????] A NDP32 describing Datagram blocks
At this point, the unwrap function try to read a NTH sign but got unexpected user data, then it raise this message: [ 174] configfs-gadget.0: Wrong NTH SIGN, skblen 14768 [ 174] HEAD:00000000b1a72bfc: 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
In short, there is two NTBs across two usb_requests, the beginning of a usb_request is not necessary a NTH sign. Do this make sense to you?
Best Regards, Pan
On Sat, Jan 11, 2025 at 2:31 AM Junzhong Pan panjunzhong@outlook.com wrote:
Hi Maciej
Thanks for the reply, I am sorry for the unclear description.
+remove hgajjar@de.adit-jv.com from CC list due to mail delivery failure.
On 2025/1/11 3:00, Maciej Żenczykowski wrote:
On Thu, Jan 9, 2025 at 11:37 PM Junzhong Pan panjunzhong@outlook.com wrote:
Lecroy shows the wSequence=5765 have 10 Datagram consisting a 31*512 bytes=15872 bytes OUT Transfer but have no ZLP:
OUT Transfer wSequence=5765 NTH32 Datagrams: 1514B * 8 + 580B NDP32 Transfer length: 512B * 31 NO ZLP OUT Transfer wSequence=5766 NTH32 Datagrams: 1514B * 8 NDP32 Transfer length: 512B * 29 + 432
This lead to a result that the first givebacked 16K skb correponding to a usb_request contains two NTH but not complete:
USB Request 1 SKB 16384B (NTH32) (Datagrams) (NDP32) | (NTH32) (Datagrams piece of wSequence=5766) USB Request 2 SKB 14768B (Datagrams piece of wSequence=5766) (NDP32)
From the context, it seems the first report of Wrong NDP SIGN is caused by out-of-bound accessing, the second report of Wrong NTH SIGN is caused by a wrong beginning of NTB parsing.
Could you clarify which Linux Kernel you're testing against?
I am using linux 6.6.63, but the related codes have not newer updates since.
Could you provide some pcap of the actual usb frames? Or perhaps describe better the problem, because I'm not quite following from your email. (I'm not sure if the problem is what windows is sending, or how Linux is parsing it)
I think the root cause of the problem is because Windows 10 ncm class driver doesn't send ZLP, meanwhile the current parsing is depend on a condition that NTB won't spread across usb_request.
This is observed only on Windows 10 but not on Windows 11.
To reproduce the issue, you just need a Windows 10 machine and run iperf3 -c from the windows and iperf3 -s on the gadget board, configure the gadget with the following os_desc, windows10 will bind its ncm driver automatically: echo 0xEF > $GADGET/bDeviceClass echo 0x02 > $GADGET/bDeviceSubClass echo 0x01 > $GADGET/bDeviceProtocol echo 1 > $GADGET/os_desc/use echo 0x1 > $GADGET/os_desc/b_vendor_code echo "MSFT100" > $GADGET/os_desc/qw_sign mkdir -p $FUNCTIONS/ncm.0/os_desc/interface.ncm echo WINNCM > $FUNCTIONS/ncm.0/os_desc/interface.ncm/compatible_id
Reproduced on dwc2 and dwc3 controller with linux 6.6.63.
I *think* what you're saying is that wSequence=5765 & 5766 are being treated as a single ncm message due to their being a multiple of 512 in the former, not followed by a ZLP? I thought that was precisely when microsoft ncm added an extra zero byte...
What's at the end of 5755? Padding? No padding? Is there an NTH32 header in 5766? Should there be?
Okay, I will explain it more precisely (some numbers are corrected). On the USB wire, the transfers on OUT endpoint is like this:
OUT Transfer #1 issued by win10, Transfer length: 512B * 31 = 15872 Bytes OUT Transaction 512B for 31 times, no ZLP and other things. Parsed as below: [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90 [Offset 0x0012] Datagram blocks [Offset 0x3D90] A NDP32 describing 11 Datagram blocks plus one zero length item and padding, len = 112 Bytes. (0)index=0x0012 length=1514 (1)index=0x05FE length=1514 (2)index=0x0BEA length=1514 (3)index=0x11D6 length=1514 (4)index=0x17C2 length=1514 (5)index=0x1DAE length=1514 (6)index=0x239A length=1514 (7)index=0x2986 length=1514 (8)index=...... length=1514 (9)index=...... length=1514 (10)index=..... length=580 (11)index=0x0 length=0 ...padding...
OUT Transfer #2 issued by win10, Transfer length: 512B * 29 + 432 = 15280 Bytes OUT Transaction 512B for 29 times, and OUT Transaction 432B for one time. [Offset 0x0000] A NTH32 with wSequence=5766, dwNdpIndex=0x3B48 [Offset 0x0012] Datagram block [Offset 0x3B48] A NDP32 describing Datagram blocks
In the u_ether driver, we first queued a usb_request(buf=skb.data, length=16384) to the udc driver, the udc should give it back when it receive a 16384B data or encounter ZLP/Short packet.
Since the OUT Xfer #1 doesn't have ZLP or SP, the udc won't giveback the usb_request with req->actual = 15872, instead, it gather some piece of the data from OUT Transfer #2 to make a req->actual=16384 then return to rx_complete.
For f_ncm, we now have a the first usb_request givebacked from udc driver having skb->data organized as the following structure:
usb_request #1 -----------------------from OUT Transfer #1:------------------- [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90 [Offset 0x0012] Datagram blocks [Offset 0x3D90] A NDP32 describing 10 Datagram blocks plus one zero length item and padding, len = 112 Bytes. (0)index=0x0012 length=1514 (1)index=0x05FE length=1514 (2)index=0x0BEA length=1514 (3)index=0x11D6 length=1514 (4)index=0x17C2 length=1514 (5)index=0x1DAE length=1514 (6)index=0x239A length=1514 (7)index=0x2986 length=1514 (8)index=...... length=1514 (9)index=...... length=1514 (10)index=..... length=580 (11)index=0x0 length=0 ...padding... --------------------from OUT Transfer #2:------------------- [Offset 0x3E00/15872] A NTH32 Header with wSequence=5766, dwNdpIndex=0x3B48 [Offset 0x3E00+???] Datagram block piece from next NDP (wSequence=5766)
During the unwrap, we will first report a "Wrong NTH SIGN" when we try to redirect to the NDP of wSequence=5765 parsing the second NTB in the usb_request, since theie is no bound checking, we read some garbage data from skb->data + [0x3E00 + 0x3B48].
Raising the following message (without modification): [ 174] configfs-gadget.0: Wrong NDP SIGN
After this, we go to the next usb_request, at this time, we have a short package(432B) from the OUT Transfer #2, thus the udc would giveback next usb_request to us like this:
usb_request #2 --------------------from OUT Transfer #2:------------------- [Offset 0x0000] Datagram block piece from NTB of wSequence=5765 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa [Offset 0x????] A NDP32 describing Datagram blocks
At this point, the unwrap function try to read a NTH sign but got unexpected user data, then it raise this message: [ 174] configfs-gadget.0: Wrong NTH SIGN, skblen 14768 [ 174] HEAD:00000000b1a72bfc: 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
In short, there is two NTBs across two usb_requests, the beginning of a usb_request is not necessary a NTH sign. Do this make sense to you?
Best Regards, Pan
Ok, this was *very* helpful. I'll dig into this more during the week, but here's a few quick questions/statements that immediately come to mind.
(a) I think this looks like a bug on the sending Win10 side, rather than a parsing bug in Linux, with there being no ZLP, and no short (<512) frame, there's simply no way for the receiver to split at the right spot. Indeed, fixing this on the Linux/parsing side seems non-trivial... I guess we could try to treat the connection as simply a serial connection (ie. ignore frame boundaries), but then we might have issues with other senders...
I guess the most likely 'correct' hack/fix would be to hold on to the extra 'N*512' bytes (it doesn't even have to be 1, though likely the N is odd), if it starts with a NTH header...
(b) I notice the '512' not '1024', I think this implies a USB2 connection instead of USB3 -- could you try replicating this with a USB3 capable data cable (and USB3 ports), this should result in 1024 block size instead of 512.
I'm wondering if the win10 stack is avoiding generating N*1024, but then hitting N*512 with odd N...
Presumably '512' would be '64' with USB1.0/1.1, but I guess finding a USB1.x port/host to test against is likely to be near impossible...
I'll try to see if I can find the source of the bug in the Win driver's sources (though based on it being Win10 only, may need to search history)
-- Maciej Żenczykowski, Kernel Networking Developer @ Google
Hi Maciej,
On 2025/1/13 1:49, Maciej Żenczykowski Wrote:> (a) I think this looks like a bug on the sending Win10 side, rather
than a parsing bug in Linux, with there being no ZLP, and no short (<512) frame, there's simply no way for the receiver to split at the right spot.
Indeed, fixing this on the Linux/parsing side seems non-trivial... I guess we could try to treat the connection as simply a serial connection (ie. ignore frame boundaries), but then we might have issues with other senders...
I guess the most likely 'correct' hack/fix would be to hold on to the extra 'N*512' bytes (it doesn't even have to be 1, though likely the N is odd), if it starts with a NTH header...Make sence, it seems we only need to save the rest data beside
dwBlockLength for next unwrap if a hack is acceptable, otherwise I may need to check if a custom host driver for Windows10 user feasible.
I didn't look carefully into the 1byte and padding stuff with Windows11 host yet, I will take a look then.
(b) I notice the '512' not '1024', I think this implies a USB2 connection instead of USB3 -- could you try replicating this with a USB3 capable data cable (and USB3 ports), this should result in 1024 block size instead of 512.
I'm wondering if the win10 stack is avoiding generating N*1024, but then hitting N*512 with odd N...Yes, I am using USB2.0 connection to better capture the crime scene.
Normally the OUT transfer on USB3.0 SuperSpeed connection comes with a bunch of 1024B Data Pakcet along with a Short Packet less than 1024B in the end from the Lecroy trace.
It's also reproducible on USB3.0 SuperSpeed connection using dwc3 controller, but it will cost more time and make it difficult to capture the online data (limited tracer HW buffer), I can try using software tracing or custom logs later:
[ 5] 26.00-27.00 sec 183 MBytes 1.54 Gbits/sec [ 5] 27.00-28.00 sec 182 MBytes 1.53 Gbits/sec [ 206.123935] configfs.gadget.2: Wrong NDP SIGN [ 206.129785] configfs.gadget.2: Wrong NTH SIGN, skblen 12208 [ 206.136802] HEAD:0000000004f66a88: 80 06 bc f9 c0 a8 24 66 c0 a8 24 65 f7 24 14 51 aa 1a 30 d5 01 f8 01 26 50 10 20 14 27 3d 00 00 [ 5] 28.00-29.00 sec 128 MBytes 1.07 Gbits/sec [ 5] 29.00-30.00 sec 191 MBytes 1.61 Gbits/sec>
Presumably '512' would be '64' with USB1.0/1.1, but I guess finding a USB1.x port/host to test against is likely to be near impossible...
I'll try to see if I can find the source of the bug in the Win driver's sources (though based on it being Win10 only, may need to search history) It's great if you can analyze from the host driver.
I didn't know if the NCM driver open-sourced on github by M$ is the correspond version. They said that only Win 11 officially support NCM in the issue on github yet they do have a built-in driver in Win10 and 2004 tag there in the repo.
On Mon, Jan 13, 2025 at 5:31 AM Junzhong Pan panjunzhong@outlook.com wrote:
Hi Maciej,
On 2025/1/13 1:49, Maciej Żenczykowski Wrote:> (a) I think this looks like a bug on the sending Win10 side, rather
than a parsing bug in Linux, with there being no ZLP, and no short (<512) frame, there's simply no way for the receiver to split at the right spot.
Indeed, fixing this on the Linux/parsing side seems non-trivial... I guess we could try to treat the connection as simply a serial connection (ie. ignore frame boundaries), but then we might have issues with other senders...
I guess the most likely 'correct' hack/fix would be to hold on to the extra 'N*512' bytes (it doesn't even have to be 1, though likely the N is odd), if it starts with a NTH header...Make sence, it seems we only need to save the rest data beside
dwBlockLength for next unwrap if a hack is acceptable, otherwise I may need to check if a custom host driver for Windows10 user feasible.
I didn't look carefully into the 1byte and padding stuff with Windows11 host yet, I will take a look then.
(b) I notice the '512' not '1024', I think this implies a USB2 connection instead of USB3 -- could you try replicating this with a USB3 capable data cable (and USB3 ports), this should result in 1024 block size instead of 512.
I'm wondering if the win10 stack is avoiding generating N*1024, but then hitting N*512 with odd N...Yes, I am using USB2.0 connection to better capture the crime scene.
Normally the OUT transfer on USB3.0 SuperSpeed connection comes with a bunch of 1024B Data Pakcet along with a Short Packet less than 1024B in the end from the Lecroy trace.
It's also reproducible on USB3.0 SuperSpeed connection using dwc3 controller, but it will cost more time and make it difficult to capture the online data (limited tracer HW buffer), I can try using software tracing or custom logs later:
[ 5] 26.00-27.00 sec 183 MBytes 1.54 Gbits/sec [ 5] 27.00-28.00 sec 182 MBytes 1.53 Gbits/sec [ 206.123935] configfs.gadget.2: Wrong NDP SIGN [ 206.129785] configfs.gadget.2: Wrong NTH SIGN, skblen 12208 [ 206.136802] HEAD:0000000004f66a88: 80 06 bc f9 c0 a8 24 66 c0 a8 24 65 f7 24 14 51 aa 1a 30 d5 01 f8 01 26 50 10 20 14 27 3d 00 00 [ 5] 28.00-29.00 sec 128 MBytes 1.07 Gbits/sec [ 5] 29.00-30.00 sec 191 MBytes 1.61 Gbits/sec>
Presumably '512' would be '64' with USB1.0/1.1, but I guess finding a USB1.x port/host to test against is likely to be near impossible...
I'll try to see if I can find the source of the bug in the Win driver's sources (though based on it being Win10 only, may need to search history) It's great if you can analyze from the host driver.
I didn't know if the NCM driver open-sourced on github by M$ is the correspond version. They said that only Win 11 officially support NCM in the issue on github yet they do have a built-in driver in Win10 and 2004 tag there in the repo.
Looking at https://github.com/microsoft/NCM-Driver-for-Windows
commit ded4839c5103ab91822bfde1932393bbb14afda3 (tag: windows_10.0.22000, origin/master) Author: Brandon Jiang jiangyue@microsoft.com Date: Mon Oct 4 14:30:44 2021 -0700
update NCM to Windows 11 (21H2) release, built with Windows 11 (22000) WDK and DMF v1.1.82
-- vs previous change to host/device.cpp
commit 40118f55a0843221f04c8036df8b97fa3512a007 (tag: windows_10.0.19041, origin/release_2004) Author: Brandon Jiang jiangyue@microsoft.com Date: Sun Feb 23 19:53:06 2020 -0800
update NCM to 20H1 Windows release, built with 20H1 WDK and DMF v1.1.20
it introduced
if (bufferRequest->TransferLength < bufferRequest->BufferLength && bufferRequest->TransferLength % hostDevice->m_DataBulkOutPipeMaximumPacketSize == 0) { //NCM spec is not explicit if a ZLP shall be sent when wBlockLength != 0 and it happens to be //multiple of wMaxPacketSize. Our interpretation is that no ZLP needed if wBlockLength is non-zero, //because the non-zero wBlockLength has already told the function side the size of transfer to be expected. // //However, there are in-market NCM devices rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize. //To deal with such devices, we pad an extra 0 at end so the transfer is no longer multiple of wMaxPacketSize
bufferRequest->Buffer[bufferRequest->TransferLength] = 0; bufferRequest->TransferLength++; }
Which I think is literally the fix for this bug you're reporting. That 'fix' is what then caused us to add the patch at the top of this thread.
So that fix was present in the very first official Win11 release (build 22000), but is likely not present in any Win10 release...
https://en.wikipedia.org/wiki/Windows_10_version_history (2004 - 20H1 - May 2020 Update - 19041 - May 27, 2020) https://en.wikipedia.org/wiki/Windows_11 (first version is 21H2 - Sun Valley - 22000 - October 5, 2021)
Hi Maciej,
Thanks for your quick reply.
On 2025/1/14 3:22, Maciej Żenczykowski Wrote:
Looking at https://github.com/microsoft/NCM-Driver-for-Windows
commit ded4839c5103ab91822bfde1932393bbb14afda3 (tag: windows_10.0.22000, origin/master) Author: Brandon Jiang jiangyue@microsoft.com Date: Mon Oct 4 14:30:44 2021 -0700
update NCM to Windows 11 (21H2) release, built with Windows 11
(22000) WDK and DMF v1.1.82
-- vs previous change to host/device.cpp
commit 40118f55a0843221f04c8036df8b97fa3512a007 (tag: windows_10.0.19041, origin/release_2004) Author: Brandon Jiang jiangyue@microsoft.com Date: Sun Feb 23 19:53:06 2020 -0800
update NCM to 20H1 Windows release, built with 20H1 WDK and DMF v1.1.20
it introduced
if (bufferRequest->TransferLength < bufferRequest->BufferLength && bufferRequest->TransferLength %
hostDevice->m_DataBulkOutPipeMaximumPacketSize == 0) { //NCM spec is not explicit if a ZLP shall be sent when wBlockLength != 0 and it happens to be //multiple of wMaxPacketSize. Our interpretation is that no ZLP needed if wBlockLength is non-zero,
In NCM10, 3.2.2 dwBlockLength description, it states:
If exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent, and the size is a multiple of wMaxPacketSize for the given pipe, then no ZLP shall be sent.
I don't know if its a Microsoft's problem or really **not explicit**. Maybe most of the device implementations treat the incoming data as a stream and do contiguous parsing on it.
//because the non-zero wBlockLength has already told the
function side the size of transfer to be expected. // //However, there are in-market NCM devices rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize. //To deal with such devices, we pad an extra 0 at end so the transfer is no longer multiple of wMaxPacketSize
bufferRequest->Buffer[bufferRequest->TransferLength] = 0; bufferRequest->TransferLength++; }
Which I think is literally the fix for this bug you're reporting. That 'fix' is what then caused us to add the patch at the top of this thread.
So that fix was present in the very first official Win11 release (build 22000), but is likely not present in any Win10 release...
As you mentioned before to fix it in the gadget side, it seems very complicated, maybe we need a extra skb with size=ntb_size as an intermediate buffer to move around those ntb data before parsing it, but may (or may not) lead to performance drop. Any other idea?
Do you think hacking in the gadget side to fix this compatible issue is a good idea consider that there are still a large number of users using Win10?
(Though Win10 will reach end of support on October 14, 2025, but people may still use it for a long time since many PCs in good condition cannot install win11.)
Best Regards
On Mon, Jan 13, 2025 at 7:46 PM Junzhong Pan panjunzhong@outlook.com wrote:
Hi Maciej,
Thanks for your quick reply.
On 2025/1/14 3:22, Maciej Żenczykowski Wrote:
Looking at https://github.com/microsoft/NCM-Driver-for-Windows
commit ded4839c5103ab91822bfde1932393bbb14afda3 (tag: windows_10.0.22000, origin/master) Author: Brandon Jiang jiangyue@microsoft.com Date: Mon Oct 4 14:30:44 2021 -0700
update NCM to Windows 11 (21H2) release, built with Windows 11
(22000) WDK and DMF v1.1.82
-- vs previous change to host/device.cpp
commit 40118f55a0843221f04c8036df8b97fa3512a007 (tag: windows_10.0.19041, origin/release_2004) Author: Brandon Jiang jiangyue@microsoft.com Date: Sun Feb 23 19:53:06 2020 -0800
update NCM to 20H1 Windows release, built with 20H1 WDK and DMF v1.1.20
it introduced
if (bufferRequest->TransferLength < bufferRequest->BufferLength && bufferRequest->TransferLength %
hostDevice->m_DataBulkOutPipeMaximumPacketSize == 0) { //NCM spec is not explicit if a ZLP shall be sent when wBlockLength != 0 and it happens to be //multiple of wMaxPacketSize. Our interpretation is that no ZLP needed if wBlockLength is non-zero,
In NCM10, 3.2.2 dwBlockLength description, it states:
If exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent, and the size is a multiple of wMaxPacketSize for the given pipe, then no ZLP shall be sent.
But the Linux ncm gadget driver sets both of those (dwNtbIn/OutMaxSize) to 16 kiB (I can never remember which one is relevant to which direction, I think in this case it's 'In' cause it's relevant to the gadget/device and thus affects what is sent by Windows and parsed by Linux). So with 15.5 kiB this is not relevant, right? (Please correct me if I'm wrong)
Furthermore that 16 kiB is also the size of the preallocated receive buffer it passes to the usb stack, so there won't be a problem without ZLP (post 16 kiB xfer), because the buffer will naturally terminate.
I don't know if its a Microsoft's problem or really **not explicit**.
I *think* (in this case) this is very much an MS problem (and the fix in the newer driver confirms it). Short packet / ZLP termination is simply how USB works to transfer packets...
Unfortunately MS is not the only one with problems with ZLP. See Linux's drivers/net/usb/cdc_ncm.c 'NO ZLP' vs 'SEND ZLP' and the FLAG_SEND_ZLP. and note it's set on Apple tethering... [I think your quote up above is exactly why the standard requires 'NO ZLP' operation]
So there's very clearly ample confusion here...
Maybe most of the device implementations treat the incoming data as a stream and do contiguous parsing on it.
//because the non-zero wBlockLength has already told the
function side the size of transfer to be expected. // //However, there are in-market NCM devices rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize. //To deal with such devices, we pad an extra 0 at end so the transfer is no longer multiple of wMaxPacketSize
bufferRequest->Buffer[bufferRequest->TransferLength] = 0; bufferRequest->TransferLength++; }
Which I think is literally the fix for this bug you're reporting. That 'fix' is what then caused us to add the patch at the top of this thread.
So that fix was present in the very first official Win11 release (build 22000), but is likely not present in any Win10 release...
As you mentioned before to fix it in the gadget side, it seems very complicated, maybe we need a extra skb with size=ntb_size as an intermediate buffer to move around those ntb data before parsing it, but may (or may not) lead to performance drop. Any other idea?
It would definitely lead to a fair bit of code complication, and in the particular case of this happening it would involve (at least) an extra copy (to linearize), so definitely be a performance hit.
I think we'd have to have a potential extra buffer/offset/length. Initially it would be null/0/0.
Whenever we receive a frame and parsing it leaves us with leftover bytes, we'd have to allocate this buffer, and copy the leftover stuff into this temporary area.
Try to parse it (note: potentially repeatedly, because we could have 8 2kiB merged pkts...) and swallow the part that parsed, but if the buffer is too short, then hold on to it until we receive more data. If we ever manage to fully parse it - we could potentially deallocate it (or hold on to the memory to avoid multiple alloc/deallocs).
When we receive a usb xfer, if the buffer already exists (or is non zero size), the new xfer needs to be appended to it, and parsing repeats.
This btw. implies this needs to be a 32 kiB (2*16) buffer... vmalloc would be fine.
I think we'd likely need to get rid of the way this stuff abuses skbs for usb anyway. I've wanted to do this anyway (note: not sure I've seen this in the gadget or host side ncm driver).
Ugh...
Do you think hacking in the gadget side to fix this compatible issue is a good idea consider that there are still a large number of users using Win10?
I'm still thinking about it, but I'd far prefer for MS to fix their Win10 driver. This just seems really hard to fix in the gadget.
(Though Win10 will reach end of support on October 14, 2025,
Far longer than that. Since there's (purchasable) extended support (+2 years), and non consumer Win10 EOL is further out as well (Enterprise is Jan 2027, business/school Oct 2028, IoT Jan 2032, etc).
but people may still use it for a long time
Yeah, Win10 will be around for many many years to come...
since many PCs in good condition cannot install win11.)
Or people don't want to, even when they could :-) I personally have a win11 capable PC that I'm refusing to upgrade... (and a pair that are too old)
This is only the *second* time Win11 actually has something (beyond what's in Win10) I could potentially maybe actually want (the first being related to WSL, though I've since stopped using it). I miss XP with classic UI (maybe Win7 was better? not sure).
linux-stable-mirror@lists.linaro.org