Changes since v4 [1]:
- Given that v4 was from March 2017, the bulk of the changes result
from rebasing the patch set from a v4.11-rc2 baseline to v5.1-rc1.
- A unit test is added to ndctl to exercise the creation and dax
mounting of multiple independent namespaces in a single 128M section.
[1]: https://lwn.net/Articles/717383/
---
Quote patch7:
"The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity. For example the following backtrace
is emitted when attempting arch_add_memory() with physical address
ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
within a given section:
WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
[..]
Call Trace:
dump_stack+0x86/0xc3
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5f/0x80
devm_memremap_pages+0x3b5/0x4c0
__wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
pmem_attach_disk+0x19a/0x440 [nd_pmem]
Recently it was discovered that the problem goes beyond RAM vs PMEM
collisions, as some platforms produce PMEM vs PMEM collisions within a
given section. The libnvdimm workaround for that case revealed that the
libnvdimm section-alignment-padding implementation has been broken for a
long while. A fix for that long-standing breakage introduces as many
problems as it solves, since it would require a backward-incompatible
change to the namespace metadata interpretation. Instead of that dubious
route [2], address the root problem in the memory-hotplug implementation."
The approach taken is to observe that each section already maintains
an array of 'unsigned long' values to hold the pageblock_flags. A single
additional 'unsigned long' is added to house a 'sub-section active'
bitmask. Each bit tracks the mapped state of one sub-section's worth of
capacity, i.e. SECTION_SIZE / BITS_PER_LONG, or 2MB on x86-64.
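For illustration, the following is a minimal userspace sketch of that
bookkeeping. The 'mem_section_usage' name is borrowed from the patch
titles; the field and helper names here are assumptions, not the exact
kernel code:

#define SECTION_SIZE    (128UL << 20)                  /* 128MB */
#define BITS_PER_LONG   64
#define SUBSECTION_SIZE (SECTION_SIZE / BITS_PER_LONG) /* 2MB */

struct mem_section_usage {
	unsigned long map_active; /* one bit per 2MB sub-section */
};

/* Mark the sub-sections covering [start, start + size) within a single
 * section as mapped. */
static void subsection_activate(struct mem_section_usage *usage,
				unsigned long start, unsigned long size)
{
	unsigned long first = (start % SECTION_SIZE) / SUBSECTION_SIZE;
	unsigned long last = ((start + size - 1) % SECTION_SIZE) / SUBSECTION_SIZE;

	while (first <= last)
		usage->map_active |= 1UL << first++;
}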
The implication of allowing sections to be piecemeal mapped/unmapped is
that the valid_section() helper is no longer authoritative for
determining whether a section is fully mapped. Instead, pfn_valid() is
updated to consult the sub-section active bitmask. Given that typical
memory hotplug still has deep "section" dependencies, the sub-section
capability is limited to 'want_memblock=false' invocations of
arch_add_memory(), effectively only devm_memremap_pages() users for now.
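Continuing the sketch above (same assumed definitions), a
pfn_valid()-style check then consults the bitmask rather than trusting
section granularity:

#include <stdio.h>

#define PAGE_SHIFT 12 /* x86-64 base pages */

static int sketch_pfn_valid(const struct mem_section_usage *usage,
			    unsigned long pfn)
{
	unsigned long addr = pfn << PAGE_SHIFT;
	unsigned long bit = (addr % SECTION_SIZE) / SUBSECTION_SIZE;

	return !!(usage->map_active & (1UL << bit));
}

int main(void)
{
	struct mem_section_usage usage = { 0 };

	/* Map a single 2MB sub-section at offset 4MB into the section. */
	subsection_activate(&usage, 4UL << 20, 2UL << 20);

	printf("mapped pfn valid:   %d\n",
	       sketch_pfn_valid(&usage, (4UL << 20) >> PAGE_SHIFT));  /* 1 */
	printf("unmapped pfn valid: %d\n",
	       sketch_pfn_valid(&usage, (64UL << 20) >> PAGE_SHIFT)); /* 0 */
	return 0;
}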
With this in place the hacks in the libnvdimm sub-system can be
dropped, and other devm_memremap_pages() users need no longer be
constrained to 128MB mapping granularity.
[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwi…
---
Dan Williams (10):
mm/sparsemem: Introduce struct mem_section_usage
mm/sparsemem: Introduce common definitions for the size and mask of a section
mm/sparsemem: Add helpers track active portions of a section at boot
mm/hotplug: Prepare shrink_{zone,pgdat}_span for sub-section removal
mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
mm/sparsemem: Prepare for sub-section ranges
mm/sparsemem: Support sub-section hotplug
mm/devm_memremap_pages: Enable sub-section remap
libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
libnvdimm/pfn: Stop padding pmem namespaces to section alignment
arch/x86/mm/init_64.c | 15 +-
drivers/nvdimm/dax_devs.c | 2
drivers/nvdimm/pfn.h | 12 -
drivers/nvdimm/pfn_devs.c | 93 +++-------
include/linux/memory_hotplug.h | 7 -
include/linux/mm.h | 4
include/linux/mmzone.h | 60 ++++++
kernel/memremap.c | 57 ++----
mm/hmm.c | 2
mm/memory_hotplug.c | 119 +++++++-----
mm/page_alloc.c | 6 -
mm/sparse-vmemmap.c | 21 +-
mm/sparse.c | 382 ++++++++++++++++++++++++++++------------
13 files changed, 476 insertions(+), 304 deletions(-)
From: Eric Biggers <ebiggers@google.com>
If the user-provided IV needs to be aligned to the algorithm's
alignmask, then skcipher_walk_virt() copies the IV into a new aligned
buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then
if the caller unconditionally accesses walk.iv, it's a use-after-free.
Fix this in the LRW template by checking the return value of
skcipher_walk_virt().
This bug was detected by my patches that improve testmgr to fuzz
algorithms against their generic implementation. When the extra
self-tests were run on a KASAN-enabled kernel, a KASAN use-after-free
splat occurred during lrw(aes) testing.
Fixes: c778f96bf347 ("crypto: lrw - Optimize tweak computation")
Cc: <stable@vger.kernel.org> # v4.20+
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
crypto/lrw.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/crypto/lrw.c b/crypto/lrw.c
index 0430ccd087286..b6666c595a686 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -162,8 +162,10 @@ static int xor_tweak(struct skcipher_request *req, bool second_pass)
}
err = skcipher_walk_virt(&w, req, false);
- iv = (__be32 *)w.iv;
+ if (err)
+ return err;
+ iv = (__be32 *)w.iv;
counter[0] = be32_to_cpu(iv[3]);
counter[1] = be32_to_cpu(iv[2]);
counter[2] = be32_to_cpu(iv[1]);
--
2.21.0
If blk_mq_try_issue_directly() returns BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE, that means the request has not been queued and
that the caller should retry submitting it. Both
blk_mq_request_bypass_insert() and blk_mq_sched_insert_request()
guarantee that a request will be processed. Hence return BLK_STS_OK if
one of these functions is called. This patch prevents
blk_mq_dispatch_rq_list() from crashing when using dm-mpath.
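As a toy userspace model of that contract (illustrative only, not the
kernel code): a request handed to an insert helper is already guaranteed
to be processed, so reporting a resource error would make the caller
queue it a second time:

#include <stdio.h>

enum blk_status { BLK_STS_OK, BLK_STS_RESOURCE };

static int queued; /* times the request was handed off */

static void insert_request(void) { queued++; }

static enum blk_status try_issue_directly(int out_of_resources)
{
	if (out_of_resources) {
		insert_request();  /* request WILL be processed */
		return BLK_STS_OK; /* BLK_STS_RESOURCE here => double queue */
	}
	/* ... direct-issue path ... */
	return BLK_STS_OK;
}

int main(void)
{
	if (try_issue_directly(1) == BLK_STS_RESOURCE)
		insert_request(); /* caller's retry path */
	printf("request queued %d time(s)\n", queued); /* must print 1 */
	return 0;
}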
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reported-by: Laurence Oberman <loberman@redhat.com>
Fixes: 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0.
Cc: <stable@vger.kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-mq.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 652d0c6d5945..b2c20dce8a30 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1859,16 +1859,11 @@ blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
case BLK_STS_RESOURCE:
if (force) {
blk_mq_request_bypass_insert(rq, run_queue);
- /*
- * We have to return BLK_STS_OK for the DM
- * to avoid livelock. Otherwise, we return
- * the real result to indicate whether the
- * request is direct-issued successfully.
- */
- ret = bypass ? BLK_STS_OK : ret;
+ ret = BLK_STS_OK;
} else if (!bypass) {
blk_mq_sched_insert_request(rq, false,
run_queue, false);
+ ret = BLK_STS_OK;
}
break;
default:
--
2.21.0.196.g041f5ea1cf98
The patch titled
Subject: mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
has been added to the -mm tree. Its filename is
mm-vmstat-fix-proc-vmstat-format-for-config_debug_tlbflush=y-config_smp=n.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-vmstat-fix-proc-vmstat-format-f…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-vmstat-fix-proc-vmstat-format-f…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Subject: mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly") depends
on the skipping of vmstat entries with an empty name, which was
introduced in 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in
/proc/vmstat") but reverted in b29940c1abd7 ("mm: rename and change
semantics of nr_indirectly_reclaimable_bytes").
So the skipping no longer works and /proc/vmstat contains misformatted
lines that read just " 0".
This patch simply shows the "nr_tlb_remote_*" debug counters on UP as
well.
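To see why an empty name misformats the output, here is a tiny
standalone sketch of the name/value printing loop (illustrative only,
not the kernel's seq_file code):

#include <stdio.h>

int main(void)
{
	const char *names[] = { "nr_tlb_remote_flush", "",
				"nr_tlb_local_flush_all" };
	unsigned long vals[] = { 12, 0, 34 };

	for (int i = 0; i < 3; i++)
		printf("%s %lu\n", names[i], vals[i]); /* "" yields " 0" */
	return 0;
}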
Link: http://lkml.kernel.org/r/155481488468.467.4295519102880913454.stgit@buzz
Fixes: 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Roman Gushchin <guro@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmstat.c | 5 -----
1 file changed, 5 deletions(-)
--- a/mm/vmstat.c~mm-vmstat-fix-proc-vmstat-format-for-config_debug_tlbflush=y-config_smp=n
+++ a/mm/vmstat.c
@@ -1274,13 +1274,8 @@ const char * const vmstat_text[] = {
#endif
#endif /* CONFIG_MEMORY_BALLOON */
#ifdef CONFIG_DEBUG_TLBFLUSH
-#ifdef CONFIG_SMP
"nr_tlb_remote_flush",
"nr_tlb_remote_flush_received",
-#else
- "", /* nr_tlb_remote_flush */
- "", /* nr_tlb_remote_flush_received */
-#endif /* CONFIG_SMP */
"nr_tlb_local_flush_all",
"nr_tlb_local_flush_one",
#endif /* CONFIG_DEBUG_TLBFLUSH */
_
Patches currently in -mm which might be from khlebnikov@yandex-team.ru are
mm-vmstat-fix-proc-vmstat-format-for-config_debug_tlbflush=y-config_smp=n.patch
If we enter smb2_query_symlink() for something that is not a symlink,
and the SMB2_open() succeeds, we never end up closing this handle and
thus leak a handle on the server.
Fix this by calling SMB2_close() immediately on a successful open.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
---
fs/cifs/smb2ops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 83a100dd2497..ab4737e3c31f 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2397,6 +2397,8 @@ smb2_query_symlink(const unsigned int xid, struct cifs_tcon *tcon,
rc = SMB2_open(xid, &oparms, utf16_path, &oplock, NULL, &err_iov,
&resp_buftype);
+ if (!rc)
+ SMB2_close(xid, tcon, fid.persistent_fid, fid.volatile_fid);
if (!rc || !err_iov.iov_base) {
rc = -ENOENT;
goto free_path;
--
2.13.6