Shyam,

Could the two test failures in:
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/3/b...
be related to this patch? Tests generic/133 and generic/471 failed with hundreds of "CIFS: Status code returned 0xc000009a STATUS_INSUFFICIENT_RESOURCES" errors. The only failing tests were multichannel-related.
This test run was with these six patches:

600ed21fe802 (HEAD -> for-next, origin/for-next, origin/HEAD) netfs: Fix a number of read-retry hangs
9f75ff5536b1 smb: client, common: Avoid multiple -Wflex-array-member-not-at-end warnings
fab0eddb9fe7 cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes
69476da76b9c cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point()
ef590eae88cf cifs: deal with the channel loading lag while picking channels
f1bf10d7e909 cifs: pick channels for individual subrequests
Anyone else seeing the same errors with multichannel on these tests?
On Wed, Feb 12, 2025 at 2:35 PM Steve French smfrench@gmail.com wrote:
Tentatively merged into cifs-2.6.git for-next, pending more reviews and testing.
On Wed, Feb 12, 2025 at 1:35 AM nspmangalore@gmail.com wrote:
From: Shyam Prasad N sprasad@microsoft.com
Our current approach to selecting a channel for sending requests is:
- iterate all channels to find the min and max queue depth
- if min and max are not the same, pick the channel with min depth
- if min and max are the same, fall back to round-robin, as all channels are equally loaded
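The three steps above can be sketched in user space roughly as follows. This is a simplified model under stated assumptions, not the kernel function: `struct chan` and `pick_channel_old()` are hypothetical names, locking is left out, terminated channels are not skipped, and `*seq` stands in for `ses->chan_seq`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; only its queue depth matters here. */
struct chan {
	unsigned int in_flight;
};

/*
 * Hypothetical user-space model of the pre-patch selection:
 *  1. scan every channel for the min and max queue depth;
 *  2. if they differ, pick the first channel with the min depth;
 *  3. if they are equal, fall back to round-robin via *seq.
 */
static size_t pick_channel_old(const struct chan *chans, size_t n,
			       unsigned int *seq)
{
	unsigned int min_in_flight = UINT_MAX, max_in_flight = 0;
	size_t index = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		/* strict '<' means ties for the minimum go to the lowest index */
		if (chans[i].in_flight < min_in_flight) {
			min_in_flight = chans[i].in_flight;
			index = i;
		}
		if (chans[i].in_flight > max_in_flight)
			max_in_flight = chans[i].in_flight;
	}

	/* all channels equally loaded: round-robin on the sequence counter */
	if (min_in_flight == max_in_flight)
		index = ++(*seq) % n;

	return index;
}
```

Starting from `seq == 0`, depths {2, 0, 5} yield channel 1 without touching `seq`, and three equally loaded channels yield 1, 2, 0 on successive calls. The strict `<` in the scan is what makes ties for the minimum always resolve to the lowest index.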
The problem with this approach is that there is a lag between selecting a channel and sending the request (which is what increases the channel's queue depth). Although these counts eventually catch up, channel usage can become skewed in the meantime, depending on the application's I/O parallelism and how quickly the server handles requests.
With sufficient parallelism, this lag can artificially increase the queue depth, hurting performance.
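A toy simulation can make the skew concrete. It uses a hypothetical user-space copy of the pre-patch picker; `simulate_picks()` and its `lagged` flag are illustrative assumptions, and a fully lagged run (no pick ever sees an earlier pick's increment) is an exaggeration of the real, transient lag.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; only its queue depth matters here. */
struct chan {
	unsigned int in_flight;
};

/* Pre-patch model: first channel with the min depth, round-robin on a tie. */
static size_t pick_channel_old(const struct chan *chans, size_t n,
			       unsigned int *seq)
{
	unsigned int min_in_flight = UINT_MAX, max_in_flight = 0;
	size_t index = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		if (chans[i].in_flight < min_in_flight) {
			min_in_flight = chans[i].in_flight;
			index = i;
		}
		if (chans[i].in_flight > max_in_flight)
			max_in_flight = chans[i].in_flight;
	}
	if (min_in_flight == max_in_flight)
		index = ++(*seq) % n;
	return index;
}

/*
 * Issue `picks` requests and count how many land on each channel.
 * If `lagged` is non-zero, the in_flight increment caused by sending a
 * request is never seen by later picks (all picks read a stale snapshot);
 * otherwise the increment is applied immediately after each pick.
 */
static void simulate_picks(struct chan *chans, size_t n, unsigned int picks,
			   int lagged, unsigned int *counts)
{
	unsigned int seq = 0;

	for (size_t i = 0; i < n; i++)
		counts[i] = 0;

	for (unsigned int p = 0; p < picks; p++) {
		size_t idx = pick_channel_old(chans, n, &seq);

		counts[idx]++;
		if (!lagged)
			chans[idx].in_flight++;	/* request sent: depth visible at once */
	}
}
```

With starting depths {0, 0, 0, 4} and six picks, the lagged run sends all six requests to channel 0, while the run with immediate updates spreads them 2/2/2/0 across channels 0 to 2.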
This change modifies step 1 above so that the iteration starts from the last selected channel, reducing the skew in channel usage even in the presence of this lag.
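The modified step 1 can be sketched the same way. As in the diff, a sequence counter is advanced on every call and used as the starting offset of the scan, so ties for the minimum depth rotate between channels; if all depths are equal, the offset itself is the pick. `struct chan`, `pick_channel_new()` and `count_lagged_picks()` are hypothetical names, and unlike the patch, which keeps `start` in an `int`, this sketch keeps the counter unsigned so the `%` result is always a valid index.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; only its queue depth matters here. */
struct chan {
	unsigned int in_flight;
};

/*
 * Hypothetical model of the patched picker: the scan starts at a rotating
 * offset taken from *seq, which is advanced on every call (like `start`
 * in the diff), so ties on the minimum depth go to a different channel.
 */
static size_t pick_channel_new(const struct chan *chans, size_t n,
			       unsigned int *seq)
{
	unsigned int min_in_flight = UINT_MAX, max_in_flight = 0;
	unsigned int start;
	size_t index = 0;

	assert(n > 0);
	start = ++(*seq);
	for (size_t i = 0; i < n; i++) {
		/* same as (start + i) % n, but cannot overflow */
		size_t cur = (start % n + i) % n;

		if (chans[cur].in_flight < min_in_flight) {
			min_in_flight = chans[cur].in_flight;
			index = cur;
		}
		if (chans[cur].in_flight > max_in_flight)
			max_in_flight = chans[cur].in_flight;
	}

	/* all channels equally loaded: the rotating offset is the pick */
	if (min_in_flight == max_in_flight)
		index = start % n;

	return index;
}

/* Count picks per channel when depths never update (fully lagged). */
static void count_lagged_picks(const struct chan *chans, size_t n,
			       unsigned int picks, unsigned int *counts)
{
	unsigned int seq = 0;

	for (size_t i = 0; i < n; i++)
		counts[i] = 0;
	for (unsigned int p = 0; p < picks; p++)
		counts[pick_channel_new(chans, n, &seq)]++;
}
```

Given the same stale depths {0, 0, 0, 4} for six consecutive picks, this version spreads the requests 2/2/2/0 rather than sending every one to channel 0, which is what a scan that always starts at index 0 would do.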
Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N sprasad@microsoft.com
 fs/smb/client/transport.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
index 0dc80959ce48..e2fbf8b18eb2 100644
--- a/fs/smb/client/transport.c
+++ b/fs/smb/client/transport.c
@@ -1015,14 +1015,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
 	uint index = 0;
 	unsigned int min_in_flight = UINT_MAX, max_in_flight = 0;
 	struct TCP_Server_Info *server = NULL;
-	int i;
+	int i, start, cur;
 
 	if (!ses)
 		return NULL;
 
 	spin_lock(&ses->chan_lock);
+	start = atomic_inc_return(&ses->chan_seq);
 	for (i = 0; i < ses->chan_count; i++) {
-		server = ses->chans[i].server;
+		cur = (start + i) % ses->chan_count;
+		server = ses->chans[cur].server;
 		if (!server || server->terminate)
 			continue;
 
@@ -1039,17 +1041,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
 		 */
 		if (server->in_flight < min_in_flight) {
 			min_in_flight = server->in_flight;
-			index = i;
+			index = cur;
 		}
 		if (server->in_flight > max_in_flight)
 			max_in_flight = server->in_flight;
 	}
 
 	/* if all channels are equally loaded, fall back to round-robin */
-	if (min_in_flight == max_in_flight) {
-		index = (uint)atomic_inc_return(&ses->chan_seq);
-		index %= ses->chan_count;
-	}
+	if (min_in_flight == max_in_flight)
+		index = (uint)start % ses->chan_count;
 
 	server = ses->chans[index].server;
 	spin_unlock(&ses->chan_lock);
--
2.43.0
--
Thanks,
Steve