The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a30df49f63ad92318ddf1f7498d1129a77dd4bd Mon Sep 17 00:00:00 2001
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Date: Thu, 13 Jun 2019 15:56:05 -0700
Subject: [PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush
A few new fields were added to mmu_gather to make TLB flush smarter for
huge page by telling what level of page table is changed.
__tlb_reset_range() is used to reset all these page table state to
unchanged, which is called by TLB flush for parallel mapping changes for
the same range under non-exclusive lock (i.e. read mmap_sem).
Before commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
munmap"), the syscalls (e.g. MADV_DONTNEED, MADV_FREE) which may update
PTEs in parallel don't remove page tables. But, the forementioned
commit may do munmap() under read mmap_sem and free page tables. This
may result in program hang on aarch64 reported by Jan Stancek. The
problem could be reproduced by his test program with slightly modified
below.
---8<---
static int map_size = 4096;
static int num_iter = 500;
static long threads_total;
static void *distant_area;
void *map_write_unmap(void *ptr)
{
int *fd = ptr;
unsigned char *map_address;
int i, j = 0;
for (i = 0; i < num_iter; i++) {
map_address = mmap(distant_area, (size_t) map_size, PROT_WRITE | PROT_READ,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (map_address == MAP_FAILED) {
perror("mmap");
exit(1);
}
for (j = 0; j < map_size; j++)
map_address[j] = 'b';
if (munmap(map_address, map_size) == -1) {
perror("munmap");
exit(1);
}
}
return NULL;
}
void *dummy(void *ptr)
{
return NULL;
}
int main(void)
{
pthread_t thid[2];
/* hint for mmap in map_write_unmap() */
distant_area = mmap(0, DISTANT_MMAP_SIZE, PROT_WRITE | PROT_READ,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
munmap(distant_area, (size_t)DISTANT_MMAP_SIZE);
distant_area += DISTANT_MMAP_SIZE / 2;
while (1) {
pthread_create(&thid[0], NULL, map_write_unmap, NULL);
pthread_create(&thid[1], NULL, dummy, NULL);
pthread_join(thid[0], NULL);
pthread_join(thid[1], NULL);
}
}
---8<---
The program may bring in parallel execution like below:
t1 t2
munmap(map_address)
downgrade_write(&mm->mmap_sem);
unmap_region()
tlb_gather_mmu()
inc_tlb_flush_pending(tlb->mm);
free_pgtables()
tlb->freed_tables = 1
tlb->cleared_pmds = 1
pthread_exit()
madvise(thread_stack, 8M, MADV_DONTNEED)
zap_page_range()
tlb_gather_mmu()
inc_tlb_flush_pending(tlb->mm);
tlb_finish_mmu()
if (mm_tlb_flush_nested(tlb->mm))
__tlb_reset_range()
__tlb_reset_range() would reset freed_tables and cleared_* bits, but this
may cause inconsistency for munmap() which do free page tables. Then it
may result in some architectures, e.g. aarch64, may not flush TLB
completely as expected to have stale TLB entries remained.
Use fullmm flush since it yields much better performance on aarch64 and
non-fullmm doesn't yields significant difference on x86.
The original proposed fix came from Jan Stancek who mainly debugged this
issue, I just wrapped up everything together.
Jan's testing results:
v5.2-rc2-24-gbec7550cca10
--------------------------
mean stddev
real 37.382 2.780
user 1.420 0.078
sys 54.658 1.855
v5.2-rc2-24-gbec7550cca10 + "mm: mmu_gather: remove __tlb_reset_range() for force flush"
---------------------------------------------------------------------------------------_
mean stddev
real 37.119 2.105
user 1.548 0.087
sys 55.698 1.357
[akpm(a)linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1558322252-113575-1-git-send-email-yang.shi@linux.…
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Signed-off-by: Jan Stancek <jstancek(a)redhat.com>
Reported-by: Jan Stancek <jstancek(a)redhat.com>
Tested-by: Jan Stancek <jstancek(a)redhat.com>
Suggested-by: Will Deacon <will.deacon(a)arm.com>
Tested-by: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Nick Piggin <npiggin(a)gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: <stable(a)vger.kernel.org> [4.20+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99740e1dd273..8c943a6e1696 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -245,14 +245,28 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
{
/*
* If there are parallel threads are doing PTE changes on same range
- * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
- * flush by batching, a thread has stable TLB entry can fail to flush
- * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
- * forcefully if we detect parallel PTE batching threads.
+ * under non-exclusive lock (e.g., mmap_sem read-side) but defer TLB
+ * flush by batching, one thread may end up seeing inconsistent PTEs
+ * and result in having stale TLB entries. So flush TLB forcefully
+ * if we detect parallel PTE batching threads.
+ *
+ * However, some syscalls, e.g. munmap(), may free page tables, this
+ * needs force flush everything in the given range. Otherwise this
+ * may result in having stale TLB entries for some architectures,
+ * e.g. aarch64, that could specify flush what level TLB.
*/
if (mm_tlb_flush_nested(tlb->mm)) {
+ /*
+ * The aarch64 yields better performance with fullmm by
+ * avoiding multiple CPUs spamming TLBI messages at the
+ * same time.
+ *
+ * On x86 non-fullmm doesn't yield significant difference
+ * against fullmm.
+ */
+ tlb->fullmm = 1;
__tlb_reset_range(tlb);
- __tlb_adjust_range(tlb, start, end - start);
+ tlb->freed_tables = 1;
}
tlb_flush_mmu(tlb);
I can produce a version of this patch specific to v4.14.113. Please
let me know the proper process for submitting such a patch.
Jason
---
Now instead of four in the eights place /
you’ve got three, ‘Cause you added one /
(That is to say, eight) to the two, /
But you can’t take seven from three, /
So you look at the sixty-fours....
On Thu, Apr 25, 2019 at 5:15 AM Sasha Levin <sashal(a)kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: a48324de6d4d HID: wacom: Bluetooth IRQ for Intuos Pro should handle prox/range.
>
> The bot has tested the following trees: v5.0.9, v4.19.36, v4.14.113.
>
> v5.0.9: Build OK!
> v4.19.36: Build OK!
> v4.14.113: Failed to apply! Possible dependencies:
> 87046b6c995c ("HID: wacom: Add support for 3rd generation Intuos BT")
> c947218951da ("HID: wacom: Add support for One by Wacom (CTL-472 / CTL-672)")
>
>
> How should we proceed with this patch?
>
> --
> Thanks,
> Sasha
Some old mice have a tendency to not accept the high resolution multiplier.
They reply with a -EPIPE which was previously ignored.
Force the call to resolution multiplier to be synchronous and actually
check for the answer. If this fails, consider the mouse like a normal one.
Fixes: 2dc702c991e377 ("HID: input: use the Resolution Multiplier for
high-resolution scrolling")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1700071
Reported-and-tested-by: James Feeney <james(a)nurealm.net>
Cc: stable(a)vger.kernel.org # v5.0+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
drivers/hid/hid-core.c | 7 +++-
drivers/hid/hid-input.c | 81 +++++++++++++++++++++++++----------------
include/linux/hid.h | 2 +-
3 files changed, 56 insertions(+), 34 deletions(-)
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index 76464aabd20c..92387fc0bf18 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -1637,7 +1637,7 @@ static struct hid_report *hid_get_report(struct hid_report_enum *report_enum,
* Implement a generic .request() callback, using .raw_request()
* DO NOT USE in hid drivers directly, but through hid_hw_request instead.
*/
-void __hid_request(struct hid_device *hid, struct hid_report *report,
+int __hid_request(struct hid_device *hid, struct hid_report *report,
int reqtype)
{
char *buf;
@@ -1646,7 +1646,7 @@ void __hid_request(struct hid_device *hid, struct hid_report *report,
buf = hid_alloc_report_buf(report, GFP_KERNEL);
if (!buf)
- return;
+ return -ENOMEM;
len = hid_report_len(report);
@@ -1663,8 +1663,11 @@ void __hid_request(struct hid_device *hid, struct hid_report *report,
if (reqtype == HID_REQ_GET_REPORT)
hid_input_report(hid, report->type, buf, ret, 0);
+ ret = 0;
+
out:
kfree(buf);
+ return ret;
}
EXPORT_SYMBOL_GPL(__hid_request);
diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c
index 1fce0076e7dc..fadf76f0a386 100644
--- a/drivers/hid/hid-input.c
+++ b/drivers/hid/hid-input.c
@@ -1542,52 +1542,71 @@ static void hidinput_close(struct input_dev *dev)
hid_hw_close(hid);
}
-static void hidinput_change_resolution_multipliers(struct hid_device *hid)
+static bool __hidinput_change_resolution_multipliers(struct hid_device *hid,
+ struct hid_report *report, bool use_logical_max)
{
- struct hid_report_enum *rep_enum;
- struct hid_report *rep;
struct hid_usage *usage;
+ bool update_needed = false;
int i, j;
- rep_enum = &hid->report_enum[HID_FEATURE_REPORT];
- list_for_each_entry(rep, &rep_enum->report_list, list) {
- bool update_needed = false;
+ if (report->maxfield == 0)
+ return false;
- if (rep->maxfield == 0)
- continue;
+ /*
+ * If we have more than one feature within this report we
+ * need to fill in the bits from the others before we can
+ * overwrite the ones for the Resolution Multiplier.
+ */
+ if (report->maxfield > 1) {
+ hid_hw_request(hid, report, HID_REQ_GET_REPORT);
+ hid_hw_wait(hid);
+ }
- /*
- * If we have more than one feature within this report we
- * need to fill in the bits from the others before we can
- * overwrite the ones for the Resolution Multiplier.
+ for (i = 0; i < report->maxfield; i++) {
+ __s32 value = use_logical_max ?
+ report->field[i]->logical_maximum :
+ report->field[i]->logical_minimum;
+
+ /* There is no good reason for a Resolution
+ * Multiplier to have a count other than 1.
+ * Ignore that case.
*/
- if (rep->maxfield > 1) {
- hid_hw_request(hid, rep, HID_REQ_GET_REPORT);
- hid_hw_wait(hid);
- }
+ if (report->field[i]->report_count != 1)
+ continue;
- for (i = 0; i < rep->maxfield; i++) {
- __s32 logical_max = rep->field[i]->logical_maximum;
+ for (j = 0; j < report->field[i]->maxusage; j++) {
+ usage = &report->field[i]->usage[j];
- /* There is no good reason for a Resolution
- * Multiplier to have a count other than 1.
- * Ignore that case.
- */
- if (rep->field[i]->report_count != 1)
+ if (usage->hid != HID_GD_RESOLUTION_MULTIPLIER)
continue;
- for (j = 0; j < rep->field[i]->maxusage; j++) {
- usage = &rep->field[i]->usage[j];
+ *report->field[i]->value = value;
+ update_needed = true;
+ }
+ }
+
+ return update_needed;
+}
- if (usage->hid != HID_GD_RESOLUTION_MULTIPLIER)
- continue;
+static void hidinput_change_resolution_multipliers(struct hid_device *hid)
+{
+ struct hid_report_enum *rep_enum;
+ struct hid_report *rep;
+ int ret;
- *rep->field[i]->value = logical_max;
- update_needed = true;
+ rep_enum = &hid->report_enum[HID_FEATURE_REPORT];
+ list_for_each_entry(rep, &rep_enum->report_list, list) {
+ bool update_needed = __hidinput_change_resolution_multipliers(hid,
+ rep, true);
+
+ if (update_needed) {
+ ret = __hid_request(hid, rep, HID_REQ_SET_REPORT);
+ if (ret) {
+ __hidinput_change_resolution_multipliers(hid,
+ rep, false);
+ return;
}
}
- if (update_needed)
- hid_hw_request(hid, rep, HID_REQ_SET_REPORT);
}
/* refresh our structs */
diff --git a/include/linux/hid.h b/include/linux/hid.h
index ac0c70b4ce10..5a24ebfb5b47 100644
--- a/include/linux/hid.h
+++ b/include/linux/hid.h
@@ -894,7 +894,7 @@ struct hid_field *hidinput_get_led_field(struct hid_device *hid);
unsigned int hidinput_count_leds(struct hid_device *hid);
__s32 hidinput_calc_abs_res(const struct hid_field *field, __u16 code);
void hid_output_report(struct hid_report *report, __u8 *data);
-void __hid_request(struct hid_device *hid, struct hid_report *rep, int reqtype);
+int __hid_request(struct hid_device *hid, struct hid_report *rep, int reqtype);
u8 *hid_alloc_report_buf(struct hid_report *report, gfp_t flags);
struct hid_device *hid_allocate_device(void);
struct hid_report *hid_register_report(struct hid_device *device,
--
2.19.2
Commit 1e07d63749 ("drm/nouveau: add kconfig option to turn off nouveau
legacy contexts. (v3)") has caused a build failure for me when I
actually tried that option (CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT=n):
,----
| Kernel: arch/x86/boot/bzImage is ready (#1)
| Building modules, stage 2.
| MODPOST 290 modules
| ERROR: "drm_legacy_mmap" [drivers/gpu/drm/nouveau/nouveau.ko] undefined!
| scripts/Makefile.modpost:91: recipe for target '__modpost' failed
`----
Upstream does not have that problem, as commit bed2dd8421 ("drm/ttm:
Quick-test mmap offset in ttm_bo_mmap()") has removed the use of
drm_legacy_mmap from nouveau_ttm.c. Unfortunately that commit does not
apply in 5.1.9.
Most likely 4.19.50 and 4.14.125 are also affected, I haven't tested
them yet.
Cheers,
Sven