From: Vaibhav Jain <vaibhav(a)linux.vnet.ibm.com>
[ Upstream commit 07f5ab6002a4f0b633f3495157166f9f6180871b ]
Fix a boundary condition where, in some cases, an EEH event with state ==
pci_channel_io_perm_failure won't be passed on to a driver attached to
the virtual PCI device associated with a slice. This happens when
the preceding slice (n-1) doesn't have any vPHB bus associated with
it, which results in an early return from the cxl_pci_error_detected()
callback.
With state == pci_channel_io_perm_failure, the adapter will be removed
irrespective of the return value of cxl_vphb_error_detected(). So we now
always return PCI_ERS_RESULT_DISCONNECT for this case, i.e. even if
the AFU isn't using a vPHB (it currently returns PCI_ERS_RESULT_NONE).
Fixes: e4f5fc001a6 ("cxl: Do not create vPHB if there are no AFU configuration records")
Signed-off-by: Vaibhav Jain <vaibhav(a)linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs(a)linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan(a)au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat(a)linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
drivers/misc/cxl/pci.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index eef202d4399b..9b8628273b1b 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1793,15 +1793,14 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
/* If we're permanently dead, give up. */
if (state == pci_channel_io_perm_failure) {
- /* Tell the AFU drivers; but we don't care what they
- * say, we're going away.
- */
for (i = 0; i < adapter->slices; i++) {
afu = adapter->afu[i];
- /* Only participate in EEH if we are on a virtual PHB */
- if (afu->phb == NULL)
- return PCI_ERS_RESULT_NONE;
- cxl_vphb_error_detected(afu, state);
+ /*
+ * Tell the AFU drivers; but we don't care what they
+ * say, we're going away.
+ */
+ if (afu->phb != NULL)
+ cxl_vphb_error_detected(afu, state);
}
return PCI_ERS_RESULT_DISCONNECT;
}
--
2.11.0
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
Note that the original premature free of the parent node has already
been fixed separately, but that fix was apparently never backported to
stable.
Fixes: 47654a162081 ("usb: chipidea: msm: Restore wrapper settings after reset")
Fixes: b74c43156c0c ("usb: chipidea: msm: ci_hdrc_msm_probe() missing of_node_get()")
Cc: stable <stable(a)vger.kernel.org> # 4.10: b74c43156c0c
Cc: Stephen Boyd <stephen.boyd(a)linaro.org>
Cc: Frank Rowand <frank.rowand(a)sony.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/chipidea/ci_hdrc_msm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_msm.c b/drivers/usb/chipidea/ci_hdrc_msm.c
index 3593ce0ec641..880009987460 100644
--- a/drivers/usb/chipidea/ci_hdrc_msm.c
+++ b/drivers/usb/chipidea/ci_hdrc_msm.c
@@ -247,7 +247,7 @@ static int ci_hdrc_msm_probe(struct platform_device *pdev)
if (ret)
goto err_mux;
- ulpi_node = of_find_node_by_name(of_node_get(pdev->dev.of_node), "ulpi");
+ ulpi_node = of_get_child_by_name(pdev->dev.of_node, "ulpi");
if (ulpi_node) {
phy_node = of_get_next_available_child(ulpi_node, NULL);
ci->hsic = of_device_is_compatible(phy_node, "qcom,usb-hsic-phy");
--
2.15.0
The patch titled
Subject: mm, oom_reaper: fix memory corruption
has been added to the -mm tree. Its filename is
mm-oom_reaper-fix-memory-corruption.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-oom_reaper-fix-memory-corruptio…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-oom_reaper-fix-memory-corruptio…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, oom_reaper: fix memory corruption
David Rientjes has reported the following memory corruption while the oom
reaper tries to unmap the victim's address space:
BUG: Bad page map in process oom_reaper pte:6353826300000000 pmd:00000000
addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping: (null) index:7f50cab1d
file: (null) fault: (null) mmap: (null) readpage: (null)
CPU: 2 PID: 1001 Comm: oom_reaper
Call Trace:
[<ffffffffa4bd967d>] dump_stack+0x4d/0x70
[<ffffffffa4a03558>] unmap_page_range+0x1068/0x1130
[<ffffffffa4a2e07f>] __oom_reap_task_mm+0xd5/0x16b
[<ffffffffa4a2e226>] oom_reaper+0xff/0x14c
[<ffffffffa48d6ad1>] kthread+0xc1/0xe0
Tetsuo Handa has noticed that the synchronization inside exit_mmap is
insufficient. We only synchronize with the oom reaper if
tsk_is_oom_victim(current) is true, which is not the case if the final
__mmput is called from a context other than the oom victim's exit path.
This can trivially happen from the context of any task that has grabbed
an mm reference (e.g. to read a /proc/<pid>/ file that requires the mm).
The race looks like this:
    oom_reaper              oom_victim              task
                                                    mmget_not_zero
                            do_exit
                              mmput
    __oom_reap_task_mm                              mmput
                                                      __mmput
                                                        exit_mmap
                                                          remove_vma
      unmap_page_range
Fix this issue by providing a new mm_is_oom_victim() helper, which operates
on the mm struct rather than on a task. Any context which operates on a
remote mm struct should use this helper in place of tsk_is_oom_victim().
The MMF_OOM_VICTIM flag is set in mark_oom_victim() and never cleared, so
it is stable in the exit_mmap() path.
Debugged by Tetsuo Handa.
Link: http://lkml.kernel.org/r/20171210095130.17110-1-mhocko@kernel.org
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: David Rientjes <rientjes(a)google.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Andrea Arcangeli <andrea(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/oom.h | 9 +++++++++
include/linux/sched/coredump.h | 1 +
mm/mmap.c | 10 +++++-----
mm/oom_kill.c | 4 +++-
4 files changed, 18 insertions(+), 6 deletions(-)
diff -puN include/linux/oom.h~mm-oom_reaper-fix-memory-corruption include/linux/oom.h
--- a/include/linux/oom.h~mm-oom_reaper-fix-memory-corruption
+++ a/include/linux/oom.h
@@ -67,6 +67,15 @@ static inline bool tsk_is_oom_victim(str
}
/*
+ * Use this helper if tsk->mm != mm and the victim mm needs a special
+ * handling. This is guaranteed to stay true after once set.
+ */
+static inline bool mm_is_oom_victim(struct mm_struct *mm)
+{
+ return test_bit(MMF_OOM_VICTIM, &mm->flags);
+}
+
+/*
* Checks whether a page fault on the given mm is still reliable.
* This is no longer true if the oom reaper started to reap the
* address space which is reflected by MMF_UNSTABLE flag set in
diff -puN include/linux/sched/coredump.h~mm-oom_reaper-fix-memory-corruption include/linux/sched/coredump.h
--- a/include/linux/sched/coredump.h~mm-oom_reaper-fix-memory-corruption
+++ a/include/linux/sched/coredump.h
@@ -70,6 +70,7 @@ static inline int get_dumpable(struct mm
#define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */
#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
+#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
diff -puN mm/mmap.c~mm-oom_reaper-fix-memory-corruption mm/mmap.c
--- a/mm/mmap.c~mm-oom_reaper-fix-memory-corruption
+++ a/mm/mmap.c
@@ -3019,20 +3019,20 @@ void exit_mmap(struct mm_struct *mm)
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- if (unlikely(tsk_is_oom_victim(current))) {
+ if (unlikely(mm_is_oom_victim(mm))) {
/*
* Wait for oom_reap_task() to stop working on this
* mm. Because MMF_OOM_SKIP is already set before
* calling down_read(), oom_reap_task() will not run
* on this "mm" post up_write().
*
- * tsk_is_oom_victim() cannot be set from under us
- * either because current->mm is already set to NULL
+ * mm_is_oom_victim() cannot be set from under us
+ * either because victim->mm is already set to NULL
* under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if current->mm
+ * set not NULL by the OOM killer only if victim->mm
* is found not NULL while holding the task_lock.
*/
+ set_bit(MMF_OOM_SKIP, &mm->flags);
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
diff -puN mm/oom_kill.c~mm-oom_reaper-fix-memory-corruption mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom_reaper-fix-memory-corruption
+++ a/mm/oom_kill.c
@@ -683,8 +683,10 @@ static void mark_oom_victim(struct task_
return;
/* oom_mm is bound to the signal struct life time. */
- if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
+ if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
mmgrab(tsk->signal->oom_mm);
+ set_bit(MMF_OOM_VICTIM, &mm->flags);
+ }
/*
* Make sure that the task is woken up from uninterruptible sleep
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-oom_reaper-fix-memory-corruption.patch
mm-drop-hotplug-lock-from-lru_add_drain_all.patch
mm-hugetlb-drop-hugepages_treat_as_movable-sysctl.patch
A verdict of NF_STOLEN after NF_QUEUE will cause an incorrect return value
and a potential kernel panic via a double free of skbs.
This was broken by commit 7034b566a4e7 ("netfilter: fix nf_queue handling")
and subsequently fixed in v4.10 by commit c63cbc460419 ("netfilter:
use switch() to handle verdict cases from nf_hook_slow()"). However, that
commit cannot be cleanly cherry-picked to v4.9.
Signed-off-by: Debabrata Banerjee <dbanerje(a)akamai.com>
---
This fix is only needed for v4.9 stable, since v4.10+ does not have the
issue.
---
net/netfilter/core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 004af030ef1a..d869ea50623e 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -364,6 +364,11 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
ret = nf_queue(skb, state, &entry, verdict);
if (ret == 1 && entry)
goto next_hook;
+ } else {
+ /* Implicit handling for NF_STOLEN, as well as any other
+ * non conventional verdicts.
+ */
+ ret = 0;
}
return ret;
}
--
2.15.1
The patch titled
Subject: kernel: make groups_sort calling a responsibility group_info allocators
has been added to the -mm tree. Its filename is
kernel-make-groups_sort-calling-a-responsibility-group_info-allocators.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/kernel-make-groups_sort-calling-a-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/kernel-make-groups_sort-calling-a-…
------------------------------------------------------
From: Thiago Rafael Becker <thiago.becker(a)gmail.com>
Subject: kernel: make groups_sort calling a responsibility group_info allocators
In testing, we found that nfsd threads may call set_groups in parallel for
the same entry cached in auth.unix.gid, racing in their calls to
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info.
- Remove the call to groups_sort from set_groups.
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker(a)gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Reviewed-by: NeilBrown <neilb(a)suse.com>
Acked-by: "J. Bruce Fields" <bfields(a)fieldses.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/s390/kernel/compat_linux.c | 1 +
fs/nfsd/auth.c | 3 +++
include/linux/cred.h | 1 +
kernel/groups.c | 5 +++--
kernel/uid16.c | 1 +
net/sunrpc/auth_gss/gss_rpc_xdr.c | 1 +
net/sunrpc/auth_gss/svcauth_gss.c | 1 +
net/sunrpc/svcauth_unix.c | 2 ++
8 files changed, 13 insertions(+), 2 deletions(-)
diff -puN arch/s390/kernel/compat_linux.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators arch/s390/kernel/compat_linux.c
--- a/arch/s390/kernel/compat_linux.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/arch/s390/kernel/compat_linux.c
@@ -263,6 +263,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16,
return retval;
}
+ groups_sort(group_info);
retval = set_current_groups(group_info);
put_group_info(group_info);
diff -puN fs/nfsd/auth.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators fs/nfsd/auth.c
--- a/fs/nfsd/auth.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/fs/nfsd/auth.c
@@ -60,6 +60,9 @@ int nfsd_setuser(struct svc_rqst *rqstp,
gi->gid[i] = exp->ex_anon_gid;
else
gi->gid[i] = rqgi->gid[i];
+
+ /* Each thread allocates its own gi, no race */
+ groups_sort(gi);
}
} else {
gi = get_group_info(rqgi);
diff -puN include/linux/cred.h~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators include/linux/cred.h
--- a/include/linux/cred.h~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/include/linux/cred.h
@@ -83,6 +83,7 @@ extern int set_current_groups(struct gro
extern void set_groups(struct cred *, struct group_info *);
extern int groups_search(const struct group_info *, kgid_t);
extern bool may_setgroups(void);
+extern void groups_sort(struct group_info *);
/*
* The security context of a task
diff -puN kernel/groups.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators kernel/groups.c
--- a/kernel/groups.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/kernel/groups.c
@@ -86,11 +86,12 @@ static int gid_cmp(const void *_a, const
return gid_gt(a, b) - gid_lt(a, b);
}
-static void groups_sort(struct group_info *group_info)
+void groups_sort(struct group_info *group_info)
{
sort(group_info->gid, group_info->ngroups, sizeof(*group_info->gid),
gid_cmp, NULL);
}
+EXPORT_SYMBOL(groups_sort);
/* a simple bsearch */
int groups_search(const struct group_info *group_info, kgid_t grp)
@@ -122,7 +123,6 @@ int groups_search(const struct group_inf
void set_groups(struct cred *new, struct group_info *group_info)
{
put_group_info(new->group_info);
- groups_sort(group_info);
get_group_info(group_info);
new->group_info = group_info;
}
@@ -206,6 +206,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsi
return retval;
}
+ groups_sort(group_info);
retval = set_current_groups(group_info);
put_group_info(group_info);
diff -puN kernel/uid16.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators kernel/uid16.c
--- a/kernel/uid16.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/kernel/uid16.c
@@ -192,6 +192,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidset
return retval;
}
+ groups_sort(group_info);
retval = set_current_groups(group_info);
put_group_info(group_info);
diff -puN net/sunrpc/auth_gss/gss_rpc_xdr.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators net/sunrpc/auth_gss/gss_rpc_xdr.c
--- a/net/sunrpc/auth_gss/gss_rpc_xdr.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/net/sunrpc/auth_gss/gss_rpc_xdr.c
@@ -231,6 +231,7 @@ static int gssx_dec_linux_creds(struct x
goto out_free_groups;
creds->cr_group_info->gid[i] = kgid;
}
+ groups_sort(creds->cr_group_info);
return 0;
out_free_groups:
diff -puN net/sunrpc/auth_gss/svcauth_gss.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators net/sunrpc/auth_gss/svcauth_gss.c
--- a/net/sunrpc/auth_gss/svcauth_gss.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/net/sunrpc/auth_gss/svcauth_gss.c
@@ -481,6 +481,7 @@ static int rsc_parse(struct cache_detail
goto out;
rsci.cred.cr_group_info->gid[i] = kgid;
}
+ groups_sort(rsci.cred.cr_group_info);
/* mech name */
len = qword_get(&mesg, buf, mlen);
diff -puN net/sunrpc/svcauth_unix.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators net/sunrpc/svcauth_unix.c
--- a/net/sunrpc/svcauth_unix.c~kernel-make-groups_sort-calling-a-responsibility-group_info-allocators
+++ a/net/sunrpc/svcauth_unix.c
@@ -520,6 +520,7 @@ static int unix_gid_parse(struct cache_d
ug.gi->gid[i] = kgid;
}
+ groups_sort(ug.gi);
ugp = unix_gid_lookup(cd, uid);
if (ugp) {
struct cache_head *ch;
@@ -819,6 +820,7 @@ svcauth_unix_accept(struct svc_rqst *rqs
kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
cred->cr_group_info->gid[i] = kgid;
}
+ groups_sort(cred->cr_group_info);
if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
*authp = rpc_autherr_badverf;
return SVC_DENIED;
_
Patches currently in -mm which might be from thiago.becker(a)gmail.com are
kernel-make-groups_sort-calling-a-responsibility-group_info-allocators.patch
Hi Eric,
A kernel bug report was opened against Ubuntu [0]. It was found that
reverting the following commit resolved this bug:
commit b2504a5dbef3305ef41988ad270b0e8ec289331c
Author: Eric Dumazet <edumazet(a)google.com>
Date: Tue Jan 31 10:20:32 2017 -0800
net: reduce skb_warn_bad_offload() noise
The regression was introduced in v4.11-rc1 and still exists in
current mainline.
I was hoping to get your feedback, since you are the patch author. Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?
This commit did in fact resolve another bug [1], but in the process
introduced this regression.
Thanks,
Joe
[0] http://pad.lv/1715609
[1] http://pad.lv/1705447
Hi,
This series corrects a number of issues with the NT_PRFPREG regset, most
importantly an FCSR access API regression introduced with the addition of
MSA support, and then a few smaller issues with the get/set handlers.
I have decided to factor out non-MSA and MSA context helpers as the first
step to avoid the issue with excessive indentation that would inevitably
happen if the regression fix was applied to current code as it stands.
It shouldn't be a big deal for backporting, as this code hasn't changed
much since the regression, and it will make any future backports easier.
Only a call to `init_fp_ctx' will have to be trivially resolved (though
arguably commit ac9ad83bc318 ("MIPS: prevent FP context set via ptrace
being discarded"), which has added `init_fp_ctx', would be good to
backport as far as possible instead).
These changes have been verified by examining the register state recorded
in core dumps manually with GDB, as well as by running the GDB test suite.
No user of the ptrace(2) PTRACE_GETREGSET and PTRACE_SETREGSET requests is
known for the MIPS port, so this part remains uncovered; however, it is
assumed to remain consistent with how core files are created.
See individual patch descriptions for further details, and for changes
made since v1 to address concerns raised in the review.
Maciej
Hi,
On Tue, Dec 12, 2017 at 3:06 AM, Felipe Balbi <balbi(a)kernel.org> wrote:
>
> Hi,
>
> Douglas Anderson <dianders(a)chromium.org> writes:
>> On rk3288-veyron devices on Chrome OS it was found that plugging in an
>> Arduino-based USB device could cause the system to lockup, especially
>> if the CPU Frequency was at one of the slower operating points (like
>> 100 MHz / 200 MHz).
>>
>> Upon tracing, I found that the following was happening:
>> * The USB device (full speed) was connected to a high speed hub and
>> then to the rk3288. Thus, we were dealing with split transactions,
>> which is all handled in software on dwc2.
>> * Userspace was initiating a BULK IN transfer
>> * When we sent the SSPLIT (to start the split transaction), we got an
>> ACK. Good. Then we issued the CSPLIT.
>> * When we sent the CSPLIT, we got back a NAK. We immediately (from
>> the interrupt handler) started to retry and sent another SSPLIT.
>> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>> sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>> handler.
>> * The handling of the interrupts was (because of the low CPU speed and
>> the inefficiency of the dwc2 interrupt handler) was actually taking
>> _longer_ than it took the other side to send the ACK/NAK. Thus we
>> were _always_ in the USB interrupt routine.
>> * The fact that USB interrupts were always going off was preventing
>> other things from happening in the system. This included preventing
>> the system from being able to transition to a higher CPU frequency.
>>
>> As I understand it, there is no requirement to retry super quickly
>> after a NAK, we just have to retry sometime in the future. Thus one
>> solution to the above is to just add a delay between getting a NAK and
>> retrying the transmission. If this delay is sufficiently long to get
>> out of the interrupt routine then the rest of the system will be able
>> to make forward progress. Even a 25 us delay would probably be
>> enough, but we'll be extra conservative and try to delay 1 ms (the
>> exact amount depends on HZ and the accuracy of the jiffy and how close
>> the current jiffy is to ticking, but could be as much as 20 ms or as
>> little as 1 ms).
>>
>> Presumably adding a delay like this could impact the USB throughput,
>> so we only add the delay with repeated NAKs.
>>
>> NOTE: Upon further testing of a pl2303 serial adapter, I found that
>> this fix may help with problems there. Specifically I found that the
>> pl2303 serial adapters tend to respond with a NAK when they have
>> nothing to say and thus we end up with this same sequence.
>>
>> Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
>> Cc: stable(a)vger.kernel.org
>> Reviewed-by: Julius Werner <jwerner(a)chromium.org>
>> Tested-by: Stefan Wahren <stefan.wahren(a)i2se.com>
>
> This seems too big for -rc or -stable inclusion.
I've removed the stable tag at your request. I originally added it at
your request in response to v2 of this patch. I'd agree that it's a
pretty big patch and therefore "risky" to pick back to stable. ...but
it does fix a bug reported by several people on the mailing lists, so
I'll leave it to your discretion. Previously in relation to the
stable tag, I had mentioned:
  It's a little weird since it doesn't "fix" any specific
  commit, so I guess it will be up to stable folks to decide how far to
  go back. The dwc2 devices I work with are actually on 3.14, but we
  have some pretty massive backports related to dwc2 there...
> In any case, this
> doesn't apply to my testing/next branch. Care to rebase and collect acks
> you received while doing that?
Sure, no problem. I've posted v4 with John Youn's Ack. The reason v3
didn't apply is that you've now got commit e99e88a9d2b0 ("treewide:
setup_timer() -> timer_setup()"). Originally my plan was to beat that
patch into the kernel and then I'd do the timer conversion myself.
That was patch #2 in the v3 series, AKA
<https://patchwork.kernel.org/patch/10032935/>. ...but since I failed
to beat Kees' patch in, I've now squashed patches #1 and #2 together
and resolved the trivial conflict.
If anyone were thinking of trying to backport this patch to older
kernels (where they presumably don't have Kees's timer patch) they can
always use the v3 patch posted here as a reference for how to make
things work. ;)
-Doug