From: John Stultz <john.stultz(a)linaro.org>
commit 3d88d56c5873f6eebe23e05c3da701960146b801 upstream.
Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.
This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.
Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.
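To make the effect concrete, here is a minimal userspace sketch (not
kernel code; the mult/shift/interval values are invented purely for
illustration) of why keeping the remainder in shifted sub-nanosecond
units, as CLOCK_MONOTONIC does, avoids the per-interval truncation:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t shift = 8;        /* assumed clocksource shift */
	uint64_t mult = 100;       /* assumed clocksource mult */
	uint64_t interval = 10;    /* assumed cycles per NTP interval */
	uint64_t xtime_nsec = 0;   /* accumulator in shifted ns */

	for (int i = 0; i < 3; i++) {
		/* old scheme: shifting down here throws away the
		 * fractional nanosecond on every accumulation */
		uint64_t truncated = (interval * mult) >> shift;

		/* new scheme: accumulate in shifted ns and only shift
		 * down when reporting, so the fraction carries over */
		xtime_nsec += interval * mult;
		uint64_t reported = xtime_nsec >> shift;
		xtime_nsec -= reported << shift;

		printf("interval %d: truncated=%llu reported=%llu\n",
		       i, (unsigned long long)truncated,
		       (unsigned long long)reported);
	}
	return 0;
}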
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Tested-by: Daniel Mentz <danielmentz(a)google.com>
Cc: Prarit Bhargava <prarit(a)redhat.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Richard Cochran <richardcochran(a)gmail.com>
Cc: Stephen Boyd <stephen.boyd(a)linaro.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: "stable #4 . 8+" <stable(a)vger.kernel.org>
Cc: Miroslav Lichvar <mlichvar(a)redhat.com>
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@lina…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
[fabrizio: cherry-pick to 4.4. Kept cycle_t type for function
logarithmic_accumulation local variable "interval". Dropped
casting of "interval" variable]
Signed-off-by: Fabrizio Castro <fabrizio.castro(a)bp.renesas.com>
Signed-off-by: Biju Das <biju.das(a)bp.renesas.com>
---
Hello Greg,
we noticed tools/testing/selftests/timers/clocksource-switch.c
was failing for us; this patch fixes the cause of the failure.
Are you happy to take this patch?
Thanks,
Fab
include/linux/timekeeper_internal.h | 4 ++--
kernel/time/timekeeping.c | 20 ++++++++++----------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index f0f1793..115216e 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -56,7 +56,7 @@ struct tk_read_base {
* interval.
* @xtime_remainder: Shifted nano seconds left over when rounding
* @cycle_interval
- * @raw_interval: Raw nano seconds accumulated per NTP interval.
+ * @raw_interval: Shifted raw nano seconds accumulated per NTP interval.
* @ntp_error: Difference between accumulated time and NTP time in ntp
* shifted nano seconds.
* @ntp_error_shift: Shift conversion between clock shifted nano seconds and
@@ -97,7 +97,7 @@ struct timekeeper {
cycle_t cycle_interval;
u64 xtime_interval;
s64 xtime_remainder;
- u32 raw_interval;
+ u64 raw_interval;
/* The ntp_tick_length() value currently being used.
* This cached copy ensures we consistently apply the tick
* length for an entire tick, as ntp_tick_length may change
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6e48668..fed86b2 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -277,8 +277,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
/* Go back from cycles -> shifted ns */
tk->xtime_interval = (u64) interval * clock->mult;
tk->xtime_remainder = ntpinterval - tk->xtime_interval;
- tk->raw_interval =
- ((u64) interval * clock->mult) >> clock->shift;
+ tk->raw_interval = interval * clock->mult;
/* if changing clocks, convert xtime_nsec shift units */
if (old_clock) {
@@ -1767,7 +1766,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
unsigned int *clock_set)
{
cycle_t interval = tk->cycle_interval << shift;
- u64 raw_nsecs;
+ u64 snsec_per_sec;
/* If the offset is smaller than a shifted interval, do nothing */
if (offset < interval)
@@ -1782,14 +1781,15 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
*clock_set |= accumulate_nsecs_to_secs(tk);
/* Accumulate raw time */
- raw_nsecs = (u64)tk->raw_interval << shift;
- raw_nsecs += tk->raw_time.tv_nsec;
- if (raw_nsecs >= NSEC_PER_SEC) {
- u64 raw_secs = raw_nsecs;
- raw_nsecs = do_div(raw_secs, NSEC_PER_SEC);
- tk->raw_time.tv_sec += raw_secs;
+ tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
+ tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
+ snsec_per_sec = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
+ while (tk->tkr_raw.xtime_nsec >= snsec_per_sec) {
+ tk->tkr_raw.xtime_nsec -= snsec_per_sec;
+ tk->raw_time.tv_sec++;
}
- tk->raw_time.tv_nsec = raw_nsecs;
+ tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift;
+ tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
/* Accumulate error between NTP and clock interval */
tk->ntp_error += tk->ntp_tick << shift;
--
2.7.4
Hi,
On Mon, May 21, 2018 at 3:12 AM, William Wu <william.wu(a)rock-chips.com> wrote:
> The dwc2_get_ls_map() function uses ttport to index into the
> bitmap if we're on a multi_tt hub. But the bitmaps are indexed
> from 0 to (hub->maxchild - 1), while ttport runs from 1 to
> hub->maxchild. This causes an invalid memory access when ttport
> equals hub->maxchild.
>
> Without this patch, I can easily trigger a kernel panic by
> connecting a low-speed USB mouse to the last port of an FE2.1
> multi-TT hub (1a40:0201) on the rk3288 platform.
>
> Signed-off-by: William Wu <william.wu(a)rock-chips.com>
> ---
> drivers/usb/dwc2/hcd_queue.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
> index d7c3d6c..9c55d1a 100644
> --- a/drivers/usb/dwc2/hcd_queue.c
> +++ b/drivers/usb/dwc2/hcd_queue.c
> @@ -383,7 +383,7 @@ static unsigned long *dwc2_get_ls_map(struct dwc2_hsotg *hsotg,
> /* Get the map and adjust if this is a multi_tt hub */
> map = qh->dwc_tt->periodic_bitmaps;
> if (qh->dwc_tt->usb_tt->multi)
> - map += DWC2_ELEMENTS_PER_LS_BITMAP * qh->ttport;
> + map += DWC2_ELEMENTS_PER_LS_BITMAP * (qh->ttport - 1);
Oops, thanks for the fix.
Fixes: 9f9f09b048f5 ("usb: dwc2: host: Totally redo the microframe scheduler")
Cc: stable(a)vger.kernel.org
Reviewed-by: Douglas Anderson <dianders(a)chromium.org>
-Doug
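For anyone following along, a minimal standalone sketch of the
off-by-one (the array size and the constant below are invented; only
the (ttport - 1) indexing mirrors the actual fix):

#include <stdio.h>

#define ELEMENTS_PER_LS_BITMAP 4   /* illustrative, not the dwc2 value */

int main(void)
{
	enum { MAXCHILD = 7 };
	unsigned long bitmaps[MAXCHILD * ELEMENTS_PER_LS_BITMAP];

	/* ttport is 1-based (1..maxchild); the per-port bitmaps are
	 * 0-based, so the last valid base offset is
	 * (MAXCHILD - 1) * ELEMENTS_PER_LS_BITMAP. Indexing with
	 * ttport instead of (ttport - 1) steps one whole bitmap past
	 * the end of the array when ttport == MAXCHILD. */
	for (int ttport = 1; ttport <= MAXCHILD; ttport++) {
		unsigned long *map =
			bitmaps + ELEMENTS_PER_LS_BITMAP * (ttport - 1);
		printf("port %d -> element offset %td\n",
		       ttport, map - bitmaps);
	}
	return 0;
}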
The patch titled
Subject: mm, devm_memremap_pages: handle errors allocating final devres action
has been added to the -mm tree. Its filename is
mm-devm_memremap_pages-handle-errors-allocating-final-devres-action.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-devm_memremap_pages-handle-erro…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-devm_memremap_pages-handle-erro…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, devm_memremap_pages: handle errors allocating final devres action
The last step before devm_memremap_pages() returns success is to allocate
a release action to tear the entire setup down. However, the result from
devm_add_action() is not checked.
Checking the error also means that we need to handle the fact that the
percpu_ref may not be killed by the time devm_memremap_pages_release()
runs. Add a new state flag for this case.
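For context, the devres pattern the patch moves to looks roughly like
this (the function names here are hypothetical; only
devm_add_action_or_reset() is the real API):

#include <linux/device.h>

/* If registering the release action fails,
 * devm_add_action_or_reset() invokes the action immediately, so the
 * caller only needs to propagate the error -- no manual unwind. */
static void my_release(void *data)
{
	/* tear the setup down; a state flag such as the 'registered'
	 * bit added by this patch lets the release path know whether
	 * registration completed */
}

static int my_setup(struct device *dev, void *data)
{
	int error;

	error = devm_add_action_or_reset(dev, my_release, data);
	if (error)
		return error;	/* my_release(data) has already run */

	return 0;
}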
Link: http://lkml.kernel.org/r/152694212460.5484.13180030631810166467.stgit@dwill…
Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memremap.h | 1 +
kernel/memremap.c | 8 ++++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff -puN include/linux/memremap.h~mm-devm_memremap_pages-handle-errors-allocating-final-devres-action include/linux/memremap.h
--- a/include/linux/memremap.h~mm-devm_memremap_pages-handle-errors-allocating-final-devres-action
+++ a/include/linux/memremap.h
@@ -115,6 +115,7 @@ struct dev_pagemap {
dev_page_free_t page_free;
struct vmem_altmap altmap;
bool altmap_valid;
+ bool registered;
struct resource res;
struct percpu_ref *ref;
struct device *dev;
diff -puN kernel/memremap.c~mm-devm_memremap_pages-handle-errors-allocating-final-devres-action kernel/memremap.c
--- a/kernel/memremap.c~mm-devm_memremap_pages-handle-errors-allocating-final-devres-action
+++ a/kernel/memremap.c
@@ -296,7 +296,7 @@ static void devm_memremap_pages_release(
for_each_device_pfn(pfn, pgmap)
put_page(pfn_to_page(pfn));
- if (percpu_ref_tryget_live(pgmap->ref)) {
+ if (pgmap->registered && percpu_ref_tryget_live(pgmap->ref)) {
dev_WARN(dev, "%s: page mapping is still live!\n", __func__);
percpu_ref_put(pgmap->ref);
}
@@ -418,7 +418,11 @@ void *devm_memremap_pages(struct device
percpu_ref_get(pgmap->ref);
}
- devm_add_action(dev, devm_memremap_pages_release, pgmap);
+ error = devm_add_action_or_reset(dev, devm_memremap_pages_release,
+ pgmap);
+ if (error)
+ return ERR_PTR(error);
+ pgmap->registered = true;
return __va(res->start);
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
mm-devm_memremap_pages-handle-errors-allocating-final-devres-action.patch
mm-hmm-use-devm-semantics-for-hmm_devmem_add-remove.patch
mm-hmm-replace-hmm_devmem_pages_create-with-devm_memremap_pages.patch
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
Hi, Boris
Sorry for the late reply; I was on a long vacation.
Here is how the SR (status register) should behave:
The status register is updated after each array operation and can be cleared with a reset command.
After a read operation, status register bit0 reports the ECC status of that read until a different array operation is performed (erase/program/read) or a reset occurs.
Status register bit1 reports the status of the operation before the last one, so this bit can report a fail (value 1) even if the very last operation was successful (bit0=0, bit1=1).
//beanhuo
>
>---
>Peter, Bean,
>
>Can you confirm this behavior, or ask someone at Micron who can confirm it?
>Also, if a RESET is actually needed, it would be good to update the datasheet
>accordingly. And if that's not the case, can you explain why the
>NAND_STATUS_FAIL bit is stuck and how to clear it? (I tried a 0x00 command,
>A.K.A. READ STATUS EXIT, but it does not clear this bit; ERASE and PROGRAM
>seem to clear the bit, but that's clearly not the kind of operation I can do
>when the user asks for a READ.)
>
>Thanks,
>
>Boris
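If the behavior Bean describes matches ONFI's usual FAIL/FAILC split
(an assumption on my side; the bit positions below are the standard
ONFI ones, and the helper is hypothetical), then a driver should trust
only bit0 for the most recent operation:

#include <stdint.h>
#include <stdbool.h>

#define SR_FAIL		(1u << 0)	/* bit0: status of the last array operation */
#define SR_FAILC	(1u << 1)	/* bit1: status of the operation before it */

/* Hypothetical helper: decide whether the *last* operation failed.
 * Per the description above, bit1 may still carry an older failure
 * (bit0=0, bit1=1), so it must not be folded into this check. */
static bool last_op_failed(uint8_t sr)
{
	return (sr & SR_FAIL) != 0;
}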
The audit_filter_rules() function in auditsc.c compared against the
session ID of the current task, while it should use the session ID of
the task given to audit_filter_rules() as a parameter (tsk).
GitHub issue:
https://github.com/linux-audit/audit-kernel/issues/82
Fixes: 8fae47705685 ("audit: add support for session ID user filter")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
---
kernel/auditsc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index ec38e4d97c23..6d577a34b16b 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -513,7 +513,7 @@ static int audit_filter_rules(struct task_struct *tsk,
result = audit_gid_comparator(cred->fsgid, f->op, f->gid);
break;
case AUDIT_SESSIONID:
- sessionid = audit_get_sessionid(current);
+ sessionid = audit_get_sessionid(tsk);
result = audit_comparator(sessionid, f->op, f->val);
break;
case AUDIT_PERS:
--
2.17.0
The patch titled
Subject: mm: don't allow deferred pages with NEED_PER_CPU_KM
has been removed from the -mm tree. Its filename was
mm-dont-allow-deferred-pages-with-need_per_cpu_km.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Subject: mm: don't allow deferred pages with NEED_PER_CPU_KM
It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS). This is because only in mm_init() do we
initialize struct pages for all the allocated memory when deferred struct
pages are used.
My recent fix c9e97a1997 ("mm: initialize pages on demand during boot")
exposed this problem, because it greatly reduced the number of pages that are
initialized before mm_init(), but the problem existed even before my fix,
as Fengguang Wu found.
Below is a more detailed explanation of the problem.
We initialize struct pages in four places:
1. Early in boot, a small set of struct pages is initialized to fill
   the first section and the lower zones.
2. During mm_init(), we initialize "struct pages" for all the memory
   that is allocated, i.e. reserved in memblock.
3. Using on-demand logic, when pages are allocated after the mm_init()
   call (once memblock is finished).
4. After smp_init(), when the remaining deferred pages are initialized.
The problem occurs if we try to do a va to phys translation of memory
between steps 1 and 2. Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access to "struct page", as in the case
of this combination: CONFIG_SPARSEMEM && !CONFIG_SPARSEMEM_VMEMMAP
The following path exposes the problem:

  start_kernel()
    trap_init()
      setup_cpu_entry_areas()
        setup_cpu_entry_area(cpu)
          get_cpu_gdt_paddr(cpu)
            per_cpu_ptr_to_phys(addr)
              pcpu_addr_to_page(addr)
                virt_to_page(addr)
                  pfn_to_page(__pa(addr) >> PAGE_SHIFT)
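The struct-page dependency sits at the end of that chain: with
CONFIG_SPARSEMEM && !CONFIG_SPARSEMEM_VMEMMAP, converting the page back
to a physical address reads the section number out of page->flags,
roughly like this (simplified; exact code varies by kernel version):

/* simplified from include/linux/mm.h: if this page's flags were never
 * initialized, the section number -- and hence the pfn -- is garbage */
static inline unsigned long page_to_section(const struct page *page)
{
	return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
}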
We disable this path by not allowing NEED_PER_CPU_KM together with the
deferred struct pages feature.
The problems are discussed in these threads:
http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel…
http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel…
http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Steven Sistare <steven.sistare(a)oracle.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Fengguang Wu <fengguang.wu(a)intel.com>
Cc: Dennis Zhou <dennisszhou(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff -puN mm/Kconfig~mm-dont-allow-deferred-pages-with-need_per_cpu_km mm/Kconfig
--- a/mm/Kconfig~mm-dont-allow-deferred-pages-with-need_per_cpu_km
+++ a/mm/Kconfig
@@ -636,6 +636,7 @@ config DEFERRED_STRUCT_PAGE_INIT
default n
depends on NO_BOOTMEM
depends on !FLATMEM
+ depends on !NEED_PER_CPU_KM
help
Ordinarily all struct pages are initialised during early boot in a
single thread. On very large machines this can take a considerable
_
Patches currently in -mm which might be from pasha.tatashin(a)oracle.com are
sparc64-ng4-memset-32-bits-overflow.patch
The patch titled
Subject: radix tree: fix multi-order iteration race
has been removed from the -mm tree. Its filename was
radix-tree-fix-multi-order-iteration-race.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Subject: radix tree: fix multi-order iteration race
Fix a race in the multi-order iteration code which causes the kernel to
hit a GP fault. This was first seen with a production v4.15 based kernel
(4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
DAX entries.
The race has to do with how we tear down multi-order sibling entries when
we are removing an item from the tree. Remember for example that an order
2 entry looks like this:
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
where 'entry' is in some slot in the struct radix_tree_node, and the three
slots following 'entry' contain sibling pointers which point back to
'entry.'
When we delete 'entry' from the tree, we call:

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()
replace_slot() first removes the siblings in order from the first to the
last, then at the end replaces 'entry' with NULL. This means that for a
brief period of time we end up with one or more of the siblings removed,
so:
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
This causes an issue if you have a reader iterating over the slots in the
tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection. This is a common case in
mm/filemap.c.
The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot. Normally
this works:
                                 V preceding slot
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                        ^ current slot
This lets you find the first sibling, and you skip them all in order.
But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:
                                        V preceding slot
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                              ^ current slot
This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers. This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.
In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at 'entry'.
We fix this race by changing the way that skip_siblings() detects
sibling entries. Instead of testing against the preceding slot, we look
for siblings via is_sibling_entry(), which compares against the position
in the struct radix_tree_node.slots[] array. This ensures that sibling
entries are properly identified, even if they are no longer contiguous
with the 'entry' they point to.
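Paraphrased, the position-based test looks like this (a sketch of the
idea, not the exact is_sibling_entry() implementation; the helper name
is invented):

/* A sibling entry is recognized by where it points -- back into the
 * parent node's own slots[] array -- rather than by matching whatever
 * slot happens to precede the iterator's current position. */
static bool points_into_parent_slots(const struct radix_tree_node *parent,
				     void **ptr)
{
	return ptr >= (void **)parent->slots &&
	       ptr < (void **)parent->slots + RADIX_TREE_MAP_SIZE;
}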
Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators")
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Reported-by: CR, Sapthagirish <sapthagirish.cr(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/radix-tree.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff -puN lib/radix-tree.c~radix-tree-fix-multi-order-iteration-race lib/radix-tree.c
--- a/lib/radix-tree.c~radix-tree-fix-multi-order-iteration-race
+++ a/lib/radix-tree.c
@@ -1612,11 +1612,9 @@ static void set_iter_tags(struct radix_t
static void __rcu **skip_siblings(struct radix_tree_node **nodep,
void __rcu **slot, struct radix_tree_iter *iter)
{
- void *sib = node_to_entry(slot - 1);
-
while (iter->index < iter->next_index) {
*nodep = rcu_dereference_raw(*slot);
- if (*nodep && *nodep != sib)
+ if (*nodep && !is_sibling_entry(iter->node, *nodep))
return slot;
slot++;
iter->index = __radix_tree_iter_add(iter, 1);
@@ -1631,7 +1629,7 @@ void __rcu **__radix_tree_next_slot(void
struct radix_tree_iter *iter, unsigned flags)
{
unsigned tag = flags & RADIX_TREE_ITER_TAG_MASK;
- struct radix_tree_node *node = rcu_dereference_raw(*slot);
+ struct radix_tree_node *node;
slot = skip_siblings(&node, slot, iter);
_
Patches currently in -mm which might be from ross.zwisler(a)linux.intel.com are
Hello Greg,
Here are trivial backports of an upstream commit that is tagged for stable:
02a3307aa9c2 ("btrfs: fix reading stale metadata blocks after degraded
raid1 mounts")
Regards,
Nikolay