The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0b3d6e6f2dd0a7b697b1aa8c167265908940624b Mon Sep 17 00:00:00 2001
From: Greg Thelen <gthelen(a)google.com>
Date: Fri, 5 Apr 2019 18:39:18 -0700
Subject: [PATCH] mm: writeback: use exact memcg dirty counts
Since commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed
as:
1) per-memcg per-cpu values in range of [-32..32]
2) per-memcg atomic counter
When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic. Stat readers only check the atomic. Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per
cpu.
Assuming 100 cpus:
4k x86 page_size: 13 MiB error per memcg
64k ppc page_size: 200 MiB error per memcg
Considering that dirty+writeback are used together for some decisions the
errors double.
This inaccuracy can lead to undeserved oom kills. One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages. If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom
kill.
It could be argued that tiny containers are not supported, but it's more
subtle. It's the amount the space available for file lru that matters.
If a container has memory.max-200MiB of non reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.
The following test reliably ooms without this patch. This patch avoids
oom kills.
$ cat test
mount -t cgroup2 none /dev/cgroup
cd /dev/cgroup
echo +io +memory > cgroup.subtree_control
mkdir test
cd test
echo 10M > memory.max
(echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
(echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)
$ cat memcg-writeback-stress.c
/*
* Dirty pages from all but one cpu.
* Clean pages from the non dirtying cpu.
* This is to stress per cpu counter imbalance.
* On a 100 cpu machine:
* - per memcg per cpu dirty count is 32 pages for each of 99 cpus
* - per memcg atomic is -99*32 pages
* - thus the complete dirty limit: sum of all counters 0
* - balance_dirty_pages() only sees atomic count -99*32 pages, which
* it max()s to 0.
* - So a workload can dirty -99*32 pages before balance_dirty_pages()
* cares.
*/
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <sys/types.h>
#include <unistd.h>
static char *buf;
static int bufSize;
static void set_affinity(int cpu)
{
cpu_set_t affinity;
CPU_ZERO(&affinity);
CPU_SET(cpu, &affinity);
if (sched_setaffinity(0, sizeof(affinity), &affinity))
err(1, "sched_setaffinity");
}
static void dirty_on(int output_fd, int cpu)
{
int i, wrote;
set_affinity(cpu);
for (i = 0; i < 32; i++) {
for (wrote = 0; wrote < bufSize; ) {
int ret = write(output_fd, buf+wrote, bufSize-wrote);
if (ret == -1)
err(1, "write");
wrote += ret;
}
}
}
int main(int argc, char **argv)
{
int cpu, flush_cpu = 1, output_fd;
const char *output;
if (argc != 2)
errx(1, "usage: output_file");
output = argv[1];
bufSize = getpagesize();
buf = malloc(getpagesize());
if (buf == NULL)
errx(1, "malloc failed");
output_fd = open(output, O_CREAT|O_RDWR);
if (output_fd == -1)
err(1, "open(%s)", output);
for (cpu = 0; cpu < get_nprocs(); cpu++) {
if (cpu != flush_cpu)
dirty_on(output_fd, cpu);
}
set_affinity(flush_cpu);
if (fsync(output_fd))
err(1, "fsync(%s)", output);
if (close(output_fd))
err(1, "close(%s)", output);
free(buf);
}
Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters. This avoids the aforementioned oom
kills.
This does not affect the overhead of memory.stat, which still reads the
single atomic counter.
Why not use percpu_counter? memcg already handles cpus going offline, so
no need for that overhead from percpu_counter. And the percpu_counter
spinlocks are more heavyweight than is required.
It probably also makes sense to use exact dirty and writeback counters
in memcg oom reports. But that is saved for later.
Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1f3d880b7ca1..dbb6118370c1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -566,7 +566,10 @@ struct mem_cgroup *lock_page_memcg(struct page *page);
void __unlock_page_memcg(struct mem_cgroup *memcg);
void unlock_page_memcg(struct page *page);
-/* idx can be of type enum memcg_stat_item or node_stat_item */
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page_state().
+ */
static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
int idx)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 532e0e2a4817..81a0d3914ec9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3882,6 +3882,22 @@ struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb)
return &memcg->cgwb_domain;
}
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page().
+ */
+static unsigned long memcg_exact_page_state(struct mem_cgroup *memcg, int idx)
+{
+ long x = atomic_long_read(&memcg->stat[idx]);
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ x += per_cpu_ptr(memcg->stat_cpu, cpu)->count[idx];
+ if (x < 0)
+ x = 0;
+ return x;
+}
+
/**
* mem_cgroup_wb_stats - retrieve writeback related stats from its memcg
* @wb: bdi_writeback in question
@@ -3907,10 +3923,10 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
struct mem_cgroup *parent;
- *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
+ *pdirty = memcg_exact_page_state(memcg, NR_FILE_DIRTY);
/* this should eventually include NR_UNSTABLE_NFS */
- *pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
+ *pwriteback = memcg_exact_page_state(memcg, NR_WRITEBACK);
*pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) |
(1 << LRU_ACTIVE_FILE));
*pheadroom = PAGE_COUNTER_MAX;
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: 8b298d3a0bd5 - Linux 5.0.7
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out a ref:
Repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Ref: 8b298d3a0bd5 - Linux 5.0.7
We then merged the patchset with `git am`:
drm-i915-gvt-do-not-let-pin-count-of-shadow-mm-go-ne.patch
kbuild-pkg-use-f-srctree-makefile-to-recurse-to-top-.patch
netfilter-nft_compat-use-.release_ops-and-remove-lis.patch
netfilter-nf_tables-use-after-free-in-dynamic-operat.patch
netfilter-nf_tables-add-missing-release_ops-in-error.patch
hv_netvsc-fix-unwanted-wakeup-after-tx_disable.patch
ibmvnic-fix-completion-structure-initialization.patch
ip6_tunnel-match-to-arphrd_tunnel6-for-dev-type.patch
ipv6-fix-dangling-pointer-when-ipv6-fragment.patch
ipv6-sit-reset-ip-header-pointer-in-ipip6_rcv.patch
kcm-switch-order-of-device-registration-to-fix-a-cra.patch
net-ethtool-not-call-vzalloc-for-zero-sized-memory-r.patch
net-gro-fix-gro-flush-when-receiving-a-gso-packet.patch
net-mlx5-decrease-default-mr-cache-size.patch
netns-provide-pure-entropy-for-net_hash_mix.patch
net-rds-force-to-destroy-connection-if-t_sock-is-nul.patch
net-sched-act_sample-fix-divide-by-zero-in-the-traff.patch
net-sched-fix-get-helper-of-the-matchall-cls.patch
openvswitch-fix-flow-actions-reallocation.patch
qmi_wwan-add-olicard-600.patch
r8169-disable-aspm-again.patch
sctp-initialize-_pad-of-sockaddr_in-before-copying-t.patch
tcp-ensure-dctcp-reacts-to-losses.patch
tcp-fix-a-potential-null-pointer-dereference-in-tcp_.patch
vrf-check-accept_source_route-on-the-original-netdev.patch
net-mlx5e-fix-error-handling-when-refreshing-tirs.patch
net-mlx5e-add-a-lock-on-tir-list.patch
nfp-validate-the-return-code-from-dev_queue_xmit.patch
nfp-disable-netpoll-on-representors.patch
bnxt_en-improve-rx-consumer-index-validity-check.patch
bnxt_en-reset-device-on-rx-buffer-errors.patch
net-ip_gre-fix-possible-use-after-free-in-erspan_rcv.patch
net-ip6_gre-fix-possible-use-after-free-in-ip6erspan.patch
net-bridge-always-clear-mcast-matching-struct-on-rep.patch
net-thunderx-fix-null-pointer-dereference-in-nicvf_o.patch
net-vrf-fix-ping-failed-when-vrf-mtu-is-set-to-0.patch
net-core-netif_receive_skb_list-unlist-skb-before-pa.patch
r8169-disable-default-rx-interrupt-coalescing-on-rtl.patch
net-mlx5-add-a-missing-check-on-idr_find-free-buf.patch
net-mlx5e-update-xoff-formula.patch
net-mlx5e-update-xon-formula.patch
kbuild-clang-choose-gcc_toolchain_dir-not-on-ld.patch
lib-string.c-implement-a-basic-bcmp.patch
revert-clk-meson-clean-up-clock-registration.patch
tty-mark-siemens-r3964-line-discipline-as-broken.patch
tty-ldisc-add-sysctl-to-prevent-autoloading-of-ldiscs.patch
hwmon-w83773g-select-regmap_i2c-to-fix-build-error.patch
hwmon-occ-fix-power-sensor-indexing.patch
smb3-allow-persistent-handle-timeout-to-be-configurable-on-mount.patch
hid-logitech-handle-0-scroll-events-for-the-m560.patch
acpica-clear-status-of-gpes-before-enabling-them.patch
acpica-namespace-remove-address-node-from-global-list-after-method-termination.patch
alsa-seq-fix-oob-reads-from-strlcpy.patch
alsa-hda-realtek-enable-headset-mic-of-acer-travelmate-b114-21-with-alc233.patch
alsa-hda-realtek-add-quirk-for-tuxedo-xc-1509.patch
alsa-xen-front-do-not-use-stream-buffer-size-before-it-is-set.patch
alsa-hda-add-two-more-machines-to-the-power_save_blacklist.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
kernel build: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
ppc64le:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
kernel build: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
x86_64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
kernel build: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
✅ tuned: tune-processes-through-perf [8]
🚧 ✅ stress: stress-ng [9]
ppc64le:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [10]
✅ tuned: tune-processes-through-perf [8]
🚧 ✅ stress: stress-ng [9]
x86_64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [10]
✅ tuned: tune-processes-through-perf [8]
🚧 ✅ stress: stress-ng [9]
Test source:
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
Waived tests (marked with 🚧)
-----------------------------
This test run included waived tests. Such tests are executed but their results
are not taken into account. Tests are waived when their results are not
reliable enough, e.g. when they're just introduced or are being fixed.
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b9d0a85d6b2e76630cfd4c475ee3af4109bfd87a Mon Sep 17 00:00:00 2001
From: Yue Haibing <yuehaibing(a)huawei.com>
Date: Wed, 20 Feb 2019 16:25:38 +0800
Subject: [PATCH] tpm: Fix the type of the return value in
calc_tpm2_event_size()
calc_tpm2_event_size() has an invalid signature because
it returns a 'size_t' where as its signature says that
it returns 'int'.
Cc: <stable(a)vger.kernel.org>
Fixes: 4d23cc323cdb ("tpm: add securityfs support for TPM 2.0 firmware event log")
Suggested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: James Morris <james.morris(a)microsoft.com>
diff --git a/drivers/char/tpm/eventlog/tpm2.c b/drivers/char/tpm/eventlog/tpm2.c
index d8b77133a83a..f824563fc28d 100644
--- a/drivers/char/tpm/eventlog/tpm2.c
+++ b/drivers/char/tpm/eventlog/tpm2.c
@@ -37,8 +37,8 @@
*
* Returns size of the event. If it is an invalid event, returns 0.
*/
-static int calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
- struct tcg_pcr_event *event_header)
+static size_t calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
+ struct tcg_pcr_event *event_header)
{
struct tcg_efi_specid_event_head *efispecid;
struct tcg_event_field *event_field;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b9d0a85d6b2e76630cfd4c475ee3af4109bfd87a Mon Sep 17 00:00:00 2001
From: Yue Haibing <yuehaibing(a)huawei.com>
Date: Wed, 20 Feb 2019 16:25:38 +0800
Subject: [PATCH] tpm: Fix the type of the return value in
calc_tpm2_event_size()
calc_tpm2_event_size() has an invalid signature because
it returns a 'size_t' where as its signature says that
it returns 'int'.
Cc: <stable(a)vger.kernel.org>
Fixes: 4d23cc323cdb ("tpm: add securityfs support for TPM 2.0 firmware event log")
Suggested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: James Morris <james.morris(a)microsoft.com>
diff --git a/drivers/char/tpm/eventlog/tpm2.c b/drivers/char/tpm/eventlog/tpm2.c
index d8b77133a83a..f824563fc28d 100644
--- a/drivers/char/tpm/eventlog/tpm2.c
+++ b/drivers/char/tpm/eventlog/tpm2.c
@@ -37,8 +37,8 @@
*
* Returns size of the event. If it is an invalid event, returns 0.
*/
-static int calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
- struct tcg_pcr_event *event_header)
+static size_t calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
+ struct tcg_pcr_event *event_header)
{
struct tcg_efi_specid_event_head *efispecid;
struct tcg_event_field *event_field;
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b9d0a85d6b2e76630cfd4c475ee3af4109bfd87a Mon Sep 17 00:00:00 2001
From: Yue Haibing <yuehaibing(a)huawei.com>
Date: Wed, 20 Feb 2019 16:25:38 +0800
Subject: [PATCH] tpm: Fix the type of the return value in
calc_tpm2_event_size()
calc_tpm2_event_size() has an invalid signature because
it returns a 'size_t' where as its signature says that
it returns 'int'.
Cc: <stable(a)vger.kernel.org>
Fixes: 4d23cc323cdb ("tpm: add securityfs support for TPM 2.0 firmware event log")
Suggested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: James Morris <james.morris(a)microsoft.com>
diff --git a/drivers/char/tpm/eventlog/tpm2.c b/drivers/char/tpm/eventlog/tpm2.c
index d8b77133a83a..f824563fc28d 100644
--- a/drivers/char/tpm/eventlog/tpm2.c
+++ b/drivers/char/tpm/eventlog/tpm2.c
@@ -37,8 +37,8 @@
*
* Returns size of the event. If it is an invalid event, returns 0.
*/
-static int calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
- struct tcg_pcr_event *event_header)
+static size_t calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
+ struct tcg_pcr_event *event_header)
{
struct tcg_efi_specid_event_head *efispecid;
struct tcg_event_field *event_field;
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 153322f7536a181e4d1b288aa6f01c0ce65f5c7c Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 28 Mar 2019 22:32:49 -0500
Subject: [PATCH] smb3: Fix enumerating snapshots to Azure
Some servers (see MS-SMB2 protocol specification
section 3.3.5.15.1) expect that the FSCTL enumerate snapshots
is done twice, with the first query having EXACTLY the minimum
size response buffer requested (16 bytes) which refreshes
the snapshot list (otherwise that and subsequent queries get
an empty list returned). So had to add code to set
the maximum response size differently for the first snapshot
query (which gets the size needed for the second query which
contains the actual list of snapshots).
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
CC: Stable <stable(a)vger.kernel.org> # 4.19+
diff --git a/fs/cifs/smb2file.c b/fs/cifs/smb2file.c
index b204e84b87fb..ac97e0c01d4e 100644
--- a/fs/cifs/smb2file.c
+++ b/fs/cifs/smb2file.c
@@ -74,7 +74,7 @@ smb2_open_file(const unsigned int xid, struct cifs_open_parms *oparms,
fid->volatile_fid, FSCTL_LMR_REQUEST_RESILIENCY,
true /* is_fsctl */,
(char *)&nr_ioctl_req, sizeof(nr_ioctl_req),
- NULL, NULL /* no return info */);
+ CIFSMaxBufSize, NULL, NULL /* no return info */);
if (rc == -EOPNOTSUPP) {
cifs_dbg(VFS,
"resiliency not supported by server, disabling\n");
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 7cfafac255aa..b22be10ee980 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -581,7 +581,7 @@ SMB3_request_interfaces(const unsigned int xid, struct cifs_tcon *tcon)
rc = SMB2_ioctl(xid, tcon, NO_FILE_ID, NO_FILE_ID,
FSCTL_QUERY_NETWORK_INTERFACE_INFO, true /* is_fsctl */,
NULL /* no data input */, 0 /* no data input */,
- (char **)&out_buf, &ret_data_len);
+ CIFSMaxBufSize, (char **)&out_buf, &ret_data_len);
if (rc == -EOPNOTSUPP) {
cifs_dbg(FYI,
"server does not support query network interfaces\n");
@@ -1297,7 +1297,7 @@ SMB2_request_res_key(const unsigned int xid, struct cifs_tcon *tcon,
rc = SMB2_ioctl(xid, tcon, persistent_fid, volatile_fid,
FSCTL_SRV_REQUEST_RESUME_KEY, true /* is_fsctl */,
- NULL, 0 /* no input */,
+ NULL, 0 /* no input */, CIFSMaxBufSize,
(char **)&res_key, &ret_data_len);
if (rc) {
@@ -1402,7 +1402,7 @@ smb2_ioctl_query_info(const unsigned int xid,
rc = SMB2_ioctl_init(tcon, &rqst[1],
COMPOUND_FID, COMPOUND_FID,
qi.info_type, true, NULL,
- 0);
+ 0, CIFSMaxBufSize);
}
} else if (qi.flags == PASSTHRU_QUERY_INFO) {
memset(&qi_iov, 0, sizeof(qi_iov));
@@ -1530,8 +1530,8 @@ smb2_copychunk_range(const unsigned int xid,
rc = SMB2_ioctl(xid, tcon, trgtfile->fid.persistent_fid,
trgtfile->fid.volatile_fid, FSCTL_SRV_COPYCHUNK_WRITE,
true /* is_fsctl */, (char *)pcchunk,
- sizeof(struct copychunk_ioctl), (char **)&retbuf,
- &ret_data_len);
+ sizeof(struct copychunk_ioctl), CIFSMaxBufSize,
+ (char **)&retbuf, &ret_data_len);
if (rc == 0) {
if (ret_data_len !=
sizeof(struct copychunk_ioctl_rsp)) {
@@ -1691,7 +1691,7 @@ static bool smb2_set_sparse(const unsigned int xid, struct cifs_tcon *tcon,
rc = SMB2_ioctl(xid, tcon, cfile->fid.persistent_fid,
cfile->fid.volatile_fid, FSCTL_SET_SPARSE,
true /* is_fctl */,
- &setsparse, 1, NULL, NULL);
+ &setsparse, 1, CIFSMaxBufSize, NULL, NULL);
if (rc) {
tcon->broken_sparse_sup = true;
cifs_dbg(FYI, "set sparse rc = %d\n", rc);
@@ -1764,7 +1764,7 @@ smb2_duplicate_extents(const unsigned int xid,
true /* is_fsctl */,
(char *)&dup_ext_buf,
sizeof(struct duplicate_extents_to_file),
- NULL,
+ CIFSMaxBufSize, NULL,
&ret_data_len);
if (ret_data_len > 0)
@@ -1799,7 +1799,7 @@ smb3_set_integrity(const unsigned int xid, struct cifs_tcon *tcon,
true /* is_fsctl */,
(char *)&integr_info,
sizeof(struct fsctl_set_integrity_information_req),
- NULL,
+ CIFSMaxBufSize, NULL,
&ret_data_len);
}
@@ -1807,6 +1807,8 @@ smb3_set_integrity(const unsigned int xid, struct cifs_tcon *tcon,
/* GMT Token is @GMT-YYYY.MM.DD-HH.MM.SS Unicode which is 48 bytes + null */
#define GMT_TOKEN_SIZE 50
+#define MIN_SNAPSHOT_ARRAY_SIZE 16 /* See MS-SMB2 section 3.3.5.15.1 */
+
/*
* Input buffer contains (empty) struct smb_snapshot array with size filled in
* For output see struct SRV_SNAPSHOT_ARRAY in MS-SMB2 section 2.2.32.2
@@ -1818,13 +1820,29 @@ smb3_enum_snapshots(const unsigned int xid, struct cifs_tcon *tcon,
char *retbuf = NULL;
unsigned int ret_data_len = 0;
int rc;
+ u32 max_response_size;
struct smb_snapshot_array snapshot_in;
+ if (get_user(ret_data_len, (unsigned int __user *)ioc_buf))
+ return -EFAULT;
+
+ /*
+ * Note that for snapshot queries that servers like Azure expect that
+ * the first query be minimal size (and just used to get the number/size
+ * of previous versions) so response size must be specified as EXACTLY
+ * sizeof(struct snapshot_array) which is 16 when rounded up to multiple
+ * of eight bytes.
+ */
+ if (ret_data_len == 0)
+ max_response_size = MIN_SNAPSHOT_ARRAY_SIZE;
+ else
+ max_response_size = CIFSMaxBufSize;
+
rc = SMB2_ioctl(xid, tcon, cfile->fid.persistent_fid,
cfile->fid.volatile_fid,
FSCTL_SRV_ENUMERATE_SNAPSHOTS,
true /* is_fsctl */,
- NULL, 0 /* no input data */,
+ NULL, 0 /* no input data */, max_response_size,
(char **)&retbuf,
&ret_data_len);
cifs_dbg(FYI, "enum snaphots ioctl returned %d and ret buflen is %d\n",
@@ -2302,7 +2320,7 @@ smb2_get_dfs_refer(const unsigned int xid, struct cifs_ses *ses,
rc = SMB2_ioctl(xid, tcon, NO_FILE_ID, NO_FILE_ID,
FSCTL_DFS_GET_REFERRALS,
true /* is_fsctl */,
- (char *)dfs_req, dfs_req_size,
+ (char *)dfs_req, dfs_req_size, CIFSMaxBufSize,
(char **)&dfs_rsp, &dfs_rsp_size);
} while (rc == -EAGAIN);
@@ -2656,7 +2674,8 @@ static long smb3_zero_range(struct file *file, struct cifs_tcon *tcon,
rc = SMB2_ioctl_init(tcon, &rqst[num++], cfile->fid.persistent_fid,
cfile->fid.volatile_fid, FSCTL_SET_ZERO_DATA,
true /* is_fctl */, (char *)&fsctl_buf,
- sizeof(struct file_zero_data_information));
+ sizeof(struct file_zero_data_information),
+ CIFSMaxBufSize);
if (rc)
goto zero_range_exit;
@@ -2733,7 +2752,8 @@ static long smb3_punch_hole(struct file *file, struct cifs_tcon *tcon,
rc = SMB2_ioctl(xid, tcon, cfile->fid.persistent_fid,
cfile->fid.volatile_fid, FSCTL_SET_ZERO_DATA,
true /* is_fctl */, (char *)&fsctl_buf,
- sizeof(struct file_zero_data_information), NULL, NULL);
+ sizeof(struct file_zero_data_information),
+ CIFSMaxBufSize, NULL, NULL);
free_xid(xid);
return rc;
}
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 21ac19ff19cb..0dc66f36ff5b 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1002,7 +1002,8 @@ int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
rc = SMB2_ioctl(xid, tcon, NO_FILE_ID, NO_FILE_ID,
FSCTL_VALIDATE_NEGOTIATE_INFO, true /* is_fsctl */,
- (char *)pneg_inbuf, inbuflen, (char **)&pneg_rsp, &rsplen);
+ (char *)pneg_inbuf, inbuflen, CIFSMaxBufSize,
+ (char **)&pneg_rsp, &rsplen);
if (rc == -EOPNOTSUPP) {
/*
* Old Windows versions or Netapp SMB server can return
@@ -2478,7 +2479,8 @@ SMB2_open(const unsigned int xid, struct cifs_open_parms *oparms, __le16 *path,
int
SMB2_ioctl_init(struct cifs_tcon *tcon, struct smb_rqst *rqst,
u64 persistent_fid, u64 volatile_fid, u32 opcode,
- bool is_fsctl, char *in_data, u32 indatalen)
+ bool is_fsctl, char *in_data, u32 indatalen,
+ __u32 max_response_size)
{
struct smb2_ioctl_req *req;
struct kvec *iov = rqst->rq_iov;
@@ -2520,16 +2522,21 @@ SMB2_ioctl_init(struct cifs_tcon *tcon, struct smb_rqst *rqst,
req->OutputCount = 0; /* MBZ */
/*
- * Could increase MaxOutputResponse, but that would require more
- * than one credit. Windows typically sets this smaller, but for some
+ * In most cases max_response_size is set to 16K (CIFSMaxBufSize)
+ * We Could increase default MaxOutputResponse, but that could require
+ * more credits. Windows typically sets this smaller, but for some
* ioctls it may be useful to allow server to send more. No point
* limiting what the server can send as long as fits in one credit
- * Unfortunately - we can not handle more than CIFS_MAX_MSG_SIZE
- * (by default, note that it can be overridden to make max larger)
- * in responses (except for read responses which can be bigger.
- * We may want to bump this limit up
+ * We can not handle more than CIFS_MAX_BUF_SIZE yet but may want
+ * to increase this limit up in the future.
+ * Note that for snapshot queries that servers like Azure expect that
+ * the first query be minimal size (and just used to get the number/size
+ * of previous versions) so response size must be specified as EXACTLY
+ * sizeof(struct snapshot_array) which is 16 when rounded up to multiple
+ * of eight bytes. Currently that is the only case where we set max
+ * response size smaller.
*/
- req->MaxOutputResponse = cpu_to_le32(CIFSMaxBufSize);
+ req->MaxOutputResponse = cpu_to_le32(max_response_size);
if (is_fsctl)
req->Flags = cpu_to_le32(SMB2_0_IOCTL_IS_FSCTL);
@@ -2550,13 +2557,14 @@ SMB2_ioctl_free(struct smb_rqst *rqst)
cifs_small_buf_release(rqst->rq_iov[0].iov_base); /* request */
}
+
/*
* SMB2 IOCTL is used for both IOCTLs and FSCTLs
*/
int
SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
u64 volatile_fid, u32 opcode, bool is_fsctl,
- char *in_data, u32 indatalen,
+ char *in_data, u32 indatalen, u32 max_out_data_len,
char **out_data, u32 *plen /* returned data len */)
{
struct smb_rqst rqst;
@@ -2593,8 +2601,8 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
rqst.rq_iov = iov;
rqst.rq_nvec = SMB2_IOCTL_IOV_SIZE;
- rc = SMB2_ioctl_init(tcon, &rqst, persistent_fid, volatile_fid,
- opcode, is_fsctl, in_data, indatalen);
+ rc = SMB2_ioctl_init(tcon, &rqst, persistent_fid, volatile_fid, opcode,
+ is_fsctl, in_data, indatalen, max_out_data_len);
if (rc)
goto ioctl_exit;
@@ -2672,7 +2680,8 @@ SMB2_set_compression(const unsigned int xid, struct cifs_tcon *tcon,
rc = SMB2_ioctl(xid, tcon, persistent_fid, volatile_fid,
FSCTL_SET_COMPRESSION, true /* is_fsctl */,
(char *)&fsctl_input /* data input */,
- 2 /* in data len */, &ret_data /* out data */, NULL);
+ 2 /* in data len */, CIFSMaxBufSize /* max out data */,
+ &ret_data /* out data */, NULL);
cifs_dbg(FYI, "set compression rc %d\n", rc);
diff --git a/fs/cifs/smb2proto.h b/fs/cifs/smb2proto.h
index 3c32d0cfea69..52df125e9189 100644
--- a/fs/cifs/smb2proto.h
+++ b/fs/cifs/smb2proto.h
@@ -142,11 +142,12 @@ extern int SMB2_open_init(struct cifs_tcon *tcon, struct smb_rqst *rqst,
extern void SMB2_open_free(struct smb_rqst *rqst);
extern int SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon,
u64 persistent_fid, u64 volatile_fid, u32 opcode,
- bool is_fsctl, char *in_data, u32 indatalen,
+ bool is_fsctl, char *in_data, u32 indatalen, u32 maxoutlen,
char **out_data, u32 *plen /* returned data len */);
extern int SMB2_ioctl_init(struct cifs_tcon *tcon, struct smb_rqst *rqst,
u64 persistent_fid, u64 volatile_fid, u32 opcode,
- bool is_fsctl, char *in_data, u32 indatalen);
+ bool is_fsctl, char *in_data, u32 indatalen,
+ __u32 max_response_size);
extern void SMB2_ioctl_free(struct smb_rqst *rqst);
extern int SMB2_close(const unsigned int xid, struct cifs_tcon *tcon,
u64 persistent_file_id, u64 volatile_file_id);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4811e3096daaa56e145a1d2bec45e2e9fe790729 Mon Sep 17 00:00:00 2001
From: Ronnie Sahlberg <lsahlber(a)redhat.com>
Date: Mon, 1 Apr 2019 09:53:44 +1000
Subject: [PATCH] cifs: a smb2_validate_and_copy_iov failure does not mean the
handle is invalid.
It only means that we do not have a valid cached value for the
file_all_info structure.
CC: Stable <stable(a)vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index b22be10ee980..00225e699d03 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -733,14 +733,12 @@ int open_shroot(unsigned int xid, struct cifs_tcon *tcon, struct cifs_fid *pfid)
qi_rsp = (struct smb2_query_info_rsp *)rsp_iov[1].iov_base;
if (le32_to_cpu(qi_rsp->OutputBufferLength) < sizeof(struct smb2_file_all_info))
goto oshr_exit;
- rc = smb2_validate_and_copy_iov(
+ if (!smb2_validate_and_copy_iov(
le16_to_cpu(qi_rsp->OutputBufferOffset),
sizeof(struct smb2_file_all_info),
&rsp_iov[1], sizeof(struct smb2_file_all_info),
- (char *)&tcon->crfid.file_all_info);
- if (rc)
- goto oshr_exit;
- tcon->crfid.file_all_info_is_valid = 1;
+ (char *)&tcon->crfid.file_all_info))
+ tcon->crfid.file_all_info_is_valid = 1;
oshr_exit:
mutex_unlock(&tcon->crfid.fid_mutex);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4811e3096daaa56e145a1d2bec45e2e9fe790729 Mon Sep 17 00:00:00 2001
From: Ronnie Sahlberg <lsahlber(a)redhat.com>
Date: Mon, 1 Apr 2019 09:53:44 +1000
Subject: [PATCH] cifs: a smb2_validate_and_copy_iov failure does not mean the
handle is invalid.
It only means that we do not have a valid cached value for the
file_all_info structure.
CC: Stable <stable(a)vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index b22be10ee980..00225e699d03 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -733,14 +733,12 @@ int open_shroot(unsigned int xid, struct cifs_tcon *tcon, struct cifs_fid *pfid)
qi_rsp = (struct smb2_query_info_rsp *)rsp_iov[1].iov_base;
if (le32_to_cpu(qi_rsp->OutputBufferLength) < sizeof(struct smb2_file_all_info))
goto oshr_exit;
- rc = smb2_validate_and_copy_iov(
+ if (!smb2_validate_and_copy_iov(
le16_to_cpu(qi_rsp->OutputBufferOffset),
sizeof(struct smb2_file_all_info),
&rsp_iov[1], sizeof(struct smb2_file_all_info),
- (char *)&tcon->crfid.file_all_info);
- if (rc)
- goto oshr_exit;
- tcon->crfid.file_all_info_is_valid = 1;
+ (char *)&tcon->crfid.file_all_info))
+ tcon->crfid.file_all_info_is_valid = 1;
oshr_exit:
mutex_unlock(&tcon->crfid.fid_mutex);
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4811e3096daaa56e145a1d2bec45e2e9fe790729 Mon Sep 17 00:00:00 2001
From: Ronnie Sahlberg <lsahlber(a)redhat.com>
Date: Mon, 1 Apr 2019 09:53:44 +1000
Subject: [PATCH] cifs: a smb2_validate_and_copy_iov failure does not mean the
handle is invalid.
It only means that we do not have a valid cached value for the
file_all_info structure.
CC: Stable <stable(a)vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index b22be10ee980..00225e699d03 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -733,14 +733,12 @@ int open_shroot(unsigned int xid, struct cifs_tcon *tcon, struct cifs_fid *pfid)
qi_rsp = (struct smb2_query_info_rsp *)rsp_iov[1].iov_base;
if (le32_to_cpu(qi_rsp->OutputBufferLength) < sizeof(struct smb2_file_all_info))
goto oshr_exit;
- rc = smb2_validate_and_copy_iov(
+ if (!smb2_validate_and_copy_iov(
le16_to_cpu(qi_rsp->OutputBufferOffset),
sizeof(struct smb2_file_all_info),
&rsp_iov[1], sizeof(struct smb2_file_all_info),
- (char *)&tcon->crfid.file_all_info);
- if (rc)
- goto oshr_exit;
- tcon->crfid.file_all_info_is_valid = 1;
+ (char *)&tcon->crfid.file_all_info))
+ tcon->crfid.file_all_info_is_valid = 1;
oshr_exit:
mutex_unlock(&tcon->crfid.fid_mutex);