On Fri, Apr 16, 2021 at 10:35:25PM +0800, dillon min wrote:
> Hi Johan
>
> Thanks for sharing your patch.
>
> Johan Hovold <johan(a)kernel.org> wrote on Fri, Apr 16, 2021 at 22:11:
>
> > When DMA is enabled the receive handler runs in a threaded handler, but
> > the primary handler up until very recently neither disabled interrupts
> > in the device nor used IRQF_ONESHOT. This would lead to a deadlock if an
> > interrupt comes in while the threaded receive handler is running under
> > the port lock.
> >
> Greg told me there was a patch that fixed this case. When both a hard irq
> handler and a threaded_fn are provided, local_irq_save() is executed before
> the driver's threaded handler is called.
>
> Here is the original mail from Greg:
>
> Please see 81e2073c175b ("genirq: Disable interrupts for force threaded
> handlers") for when threaded irq handlers have irqs disabled, isn't that
> the case you are trying to "protect" from here?
>
> Why is the "threaded" flag used at all? The driver should not care.
>
> Also see 9baedb7baeda ("serial: imx: drop workaround for forced irq
> threading") in linux-next for an example of how this was fixed up in a
> serial driver.
Neither of these commits is (directly) related to the problem this
patch addresses (they are about force-threaded handlers, while this is
about a normal threaded handler, which runs with interrupts enabled).

Johan
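
To make the scenario from the quoted commit message concrete, here is a toy single-CPU model of it in plain C. It is only a sketch, not driver code: cpu_model, raise_irq() and enter_thread() are made-up names, and it assumes the primary handler takes the same port lock as the threaded receive handler.

```c
#include <stdbool.h>

/* Toy single-CPU model; every name here is hypothetical, not kernel API. */
struct cpu_model {
	bool local_irqs_enabled; /* false after an irqsave-style lock */
	bool irq_line_masked;    /* true while a one-shot irq thread runs */
	bool port_lock_held;     /* held by the threaded receive handler */
};

enum irq_outcome { IRQ_RUNS, IRQ_HELD_OFF, IRQ_DEADLOCKS };

/* What happens if the device raises its interrupt right now? */
static enum irq_outcome raise_irq(const struct cpu_model *m)
{
	/* A masked line or disabled local irqs keep the interrupt pending. */
	if (m->irq_line_masked || !m->local_irqs_enabled)
		return IRQ_HELD_OFF;
	/* Assumption: the primary handler spins on the port lock. */
	if (m->port_lock_held)
		return IRQ_DEADLOCKS;
	return IRQ_RUNS;
}

/* State of the CPU while the threaded handler holds the port lock. */
static struct cpu_model enter_thread(bool oneshot, bool lock_disables_irqs)
{
	struct cpu_model m = {
		.local_irqs_enabled = !lock_disables_irqs,
		.irq_line_masked = oneshot,
		.port_lock_held = true,
	};
	return m;
}
```

In this model, enter_thread(false, false) followed by raise_irq() yields IRQ_DEADLOCKS, which is the case described above. Either masking the line while the thread runs (the one-shot behaviour) or disabling local interrupts around the lock turns that into IRQ_HELD_OFF.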
Commit 1340ccfa9a9a ("x86,sched: Allow topologies where NUMA nodes
share an LLC") added a vendor and model specific check to never
call topology_sane() for Intel Skylake Server systems where NUMA
nodes share an LLC.
Intel Ice Lake and Sapphire Rapids CPUs also enumerate an LLC that is
shared by multiple NUMA nodes. The LLC on these CPUs is shared for
off-package data access but private to the NUMA node for on-package
access. Rather than managing a list of allowable SNC topologies, make
this SNC topology the default, and treat Intel's Cluster-On-Die (COD)
topology as the exception.
With this change, Skylake, Ice Lake, and Sapphire Rapids servers in SNC
mode no longer emit this warning:

sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Suggested-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Alison Schofield <alison.schofield(a)intel.com>
Cc: stable(a)vger.kernel.org
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Igor Mammedov <imammedo(a)redhat.com>
Cc: Prarit Bhargava <prarit(a)redhat.com>
Cc: brice.goglin(a)gmail.com
---
Changes v2->v3:
- This is a v3 of this patch: https://lore.kernel.org/lkml/20210216195804.24204-1-alison.schofield@intel.…
- Implemented PeterZ suggestion, and his code, to check for COD instead
of SNC.
- Updated commit message and log.
- Added Cc: stable.
Changes v1->v2:
- Implemented the minimal required change of adding the new models to
the existing vendor and model specific check.
- Side effect of going minimalist: no longer labelled an X86_BUG (TonyL)
- Considered PeterZ suggestion of checking for COD CPUs, rather than
SNC CPUs. That meant this snc_cpu list would go away, and so it never
needs updating. That ups the stakes for this patch wrt LOC changed
and testing needed. It actually drove me back to this simplest solution.
- Considered DaveH suggestion to remove the check altogether and recognize
these topologies as sane. Not running with that further here. This patch
is what is needed now. The broader discussion of sane topologies can
carry on independent of this update.
- Updated commit message and log.
arch/x86/kernel/smpboot.c | 90 ++++++++++++++++++++-------------------
1 file changed, 46 insertions(+), 44 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 02813a7f3a7c..147b2f3a2a09 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -458,29 +458,52 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
return false;
}
+static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+ if (c->phys_proc_id == o->phys_proc_id &&
+ c->cpu_die_id == o->cpu_die_id)
+ return true;
+ return false;
+}
+
+/*
+ * Unlike the other levels, we do not enforce keeping a
+ * multicore group inside a NUMA node. If this happens, we will
+ * discard the MC level of the topology later.
+ */
+static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+ if (c->phys_proc_id == o->phys_proc_id)
+ return true;
+ return false;
+}
+
/*
- * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
+ * Define intel_cod_cpu[] for Intel COD (Cluster-on-Die) CPUs.
*
- * These are Intel CPUs that enumerate an LLC that is shared by
- * multiple NUMA nodes. The LLC on these systems is shared for
- * off-package data access but private to the NUMA node (half
- * of the package) for on-package access.
+ * Any Intel CPU that has multiple nodes per package and does not
+ * match intel_cod_cpu[] has the SNC (Sub-NUMA Cluster) topology.
*
- * CPUID (the source of the information about the LLC) can only
- * enumerate the cache as being shared *or* unshared, but not
- * this particular configuration. The CPU in this case enumerates
- * the cache to be shared across the entire package (spanning both
- * NUMA nodes).
+ * When in SNC mode, these CPUs enumerate an LLC that is shared
+ * by multiple NUMA nodes. The LLC is shared for off-package data
+ * access but private to the NUMA node (half of the package) for
+ * on-package access. CPUID (the source of the information about
+ * the LLC) can only enumerate the cache as shared or unshared,
+ * but not this particular configuration.
*/
-static const struct x86_cpu_id snc_cpu[] = {
- X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, NULL),
+static const struct x86_cpu_id intel_cod_cpu[] = {
+ X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, 0), /* COD */
+ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, 0), /* COD */
+ X86_MATCH_INTEL_FAM6_MODEL(ANY, 1), /* SNC */
{}
};
static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
+ const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu);
int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+ bool intel_snc = id && id->driver_data;
/* Do not match if we do not have a valid APICID for cpu: */
if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
@@ -495,32 +518,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
* means 'c' does not share the LLC of 'o'. This will be
* reflected to userspace.
*/
- if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+ if (match_pkg(c, o) && !topology_same_node(c, o) && intel_snc)
return false;
return topology_sane(c, o, "llc");
}
-/*
- * Unlike the other levels, we do not enforce keeping a
- * multicore group inside a NUMA node. If this happens, we will
- * discard the MC level of the topology later.
- */
-static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
- if (c->phys_proc_id == o->phys_proc_id)
- return true;
- return false;
-}
-
-static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
- if ((c->phys_proc_id == o->phys_proc_id) &&
- (c->cpu_die_id == o->cpu_die_id))
- return true;
- return false;
-}
-
#if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
static inline int x86_sched_itmt_flags(void)
@@ -592,14 +595,23 @@ void set_cpu_sibling_map(int cpu)
for_each_cpu(i, cpu_sibling_setup_mask) {
o = &cpu_data(i);
+ if (match_pkg(c, o) && !topology_same_node(c, o))
+ x86_has_numa_in_package = true;
+
if ((i == cpu) || (has_smt && match_smt(c, o)))
link_mask(topology_sibling_cpumask, cpu, i);
if ((i == cpu) || (has_mp && match_llc(c, o)))
link_mask(cpu_llc_shared_mask, cpu, i);
+ if ((i == cpu) || (has_mp && match_die(c, o)))
+ link_mask(topology_die_cpumask, cpu, i);
}
+ threads = cpumask_weight(topology_sibling_cpumask(cpu));
+ if (threads > __max_smt_threads)
+ __max_smt_threads = threads;
+
/*
* This needs a separate iteration over the cpus because we rely on all
* topology_sibling_cpumask links to be set-up.
@@ -613,8 +625,7 @@ void set_cpu_sibling_map(int cpu)
/*
* Does this new cpu bringup a new core?
*/
- if (cpumask_weight(
- topology_sibling_cpumask(cpu)) == 1) {
+ if (threads == 1) {
/*
* for each core in package, increment
* the booted_cores for this new cpu
@@ -631,16 +642,7 @@ void set_cpu_sibling_map(int cpu)
} else if (i != cpu && !c->booted_cores)
c->booted_cores = cpu_data(i).booted_cores;
}
- if (match_pkg(c, o) && !topology_same_node(c, o))
- x86_has_numa_in_package = true;
-
- if ((i == cpu) || (has_mp && match_die(c, o)))
- link_mask(topology_die_cpumask, cpu, i);
}
-
- threads = cpumask_weight(topology_sibling_cpumask(cpu));
- if (threads > __max_smt_threads)
- __max_smt_threads = threads;
}
/* maps the cpu to the sched domain representing multi-core */
--
2.20.1
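
For readers who want to see the COD-versus-SNC decision in isolation, the following userspace sketch mirrors the table lookup and the new condition in match_llc(). It is an illustration under stated assumptions, not the kernel code: the enums, cod_table[], match_cpu() and snc_splits_llc() are hypothetical stand-ins, and the CPU model is passed in explicitly, whereas x86_match_cpu() in the kernel checks the boot CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's vendor and model identifiers. */
enum vendor { VENDOR_INTEL, VENDOR_OTHER };
enum model { MODEL_HASWELL_X, MODEL_BROADWELL_X, MODEL_SKYLAKE_X,
	     MODEL_ICELAKE_X, MODEL_ANY };

struct cpu_id {
	enum vendor vendor;
	enum model model;
	int driver_data;	/* 0: COD, 1: SNC */
};

/* First match wins, so the COD models must precede the catch-all. */
static const struct cpu_id cod_table[] = {
	{ VENDOR_INTEL, MODEL_HASWELL_X, 0 },
	{ VENDOR_INTEL, MODEL_BROADWELL_X, 0 },
	{ VENDOR_INTEL, MODEL_ANY, 1 },
};

static const struct cpu_id *match_cpu(enum vendor v, enum model m)
{
	for (size_t i = 0; i < sizeof(cod_table) / sizeof(cod_table[0]); i++) {
		const struct cpu_id *e = &cod_table[i];

		if (e->vendor == v && (e->model == MODEL_ANY || e->model == m))
			return e;
	}
	return NULL;	/* non-Intel: never treated as SNC */
}

struct cpu {
	enum vendor vendor;
	enum model model;
	int pkg;	/* stands in for phys_proc_id */
	int node;	/* NUMA node */
};

/* True when the SNC rule says 'c' and 'o' must not share an LLC mask. */
static bool snc_splits_llc(const struct cpu *c, const struct cpu *o)
{
	const struct cpu_id *id = match_cpu(c->vendor, c->model);
	bool intel_snc = id && id->driver_data;

	return c->pkg == o->pkg && c->node != o->node && intel_snc;
}
```

With this model, two Skylake-X CPUs in the same package but on different nodes are split (SNC), while the same layout on Haswell-X or Broadwell-X is not split and would fall through to the topology_sane() check in the real function.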
This is the start of the stable review cycle for the 4.4.267 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Apr 2021 14:44:01 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.267-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.267-rc1
Juergen Gross <jgross(a)suse.com>
xen/events: fix setting irq affinity
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
Florian Westphal <fw(a)strlen.de>
netfilter: x_tables: fix compat match/target pad out-of-bound write
Arnd Bergmann <arnd(a)arndb.de>
drm/imx: imx-ldb: fix out of bounds array access warning
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: stop dump llsec params for monitors
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: forbid monitor for del llsec seclevel
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: forbid monitor for set llsec params
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: fix nl802154 del llsec devkey
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: fix nl802154 add llsec key
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: fix nl802154 del llsec dev
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: fix nl802154 del llsec key
Alexander Aring <aahringo(a)redhat.com>
net: ieee802154: nl-mac: fix check on panid
Pavel Skripkin <paskripkin(a)gmail.com>
net: mac802154: Fix general protection fault
Pavel Skripkin <paskripkin(a)gmail.com>
drivers: net: fix memory leak in peak_usb_create_dev
Pavel Skripkin <paskripkin(a)gmail.com>
drivers: net: fix memory leak in atusb_probe
Phillip Potter <phil(a)philpotter.co.uk>
net: tun: set tun->dev->addr_len during TUNSETLINK processing
Du Cheng <ducheng2(a)gmail.com>
cfg80211: remove WARN_ON() in cfg80211_sme_connect
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
clk: socfpga: fix iomem pointer cast on 64-bit
Potnuri Bharat Teja <bharat(a)chelsio.com>
RDMA/cxgb4: check for ipv6 address properly while destroying listener
Alexander Gordeev <agordeev(a)linux.ibm.com>
s390/cpcmd: fix inline assembly register clobbering
Zqiang <qiang.zhang(a)windriver.com>
workqueue: Move the position of debug_work_activate() in __queue_work()
Lukasz Bartosik <lb(a)semihalf.com>
clk: fix invalid usage of list cursor in unregister
Lv Yunlong <lyl2019(a)mail.ustc.edu.cn>
net:tipc: Fix a double free in tipc_sk_mcast_rcv
Claudiu Manoil <claudiu.manoil(a)nxp.com>
gianfar: Handle error code at MAC address change
Eric Dumazet <edumazet(a)google.com>
sch_red: fix off-by-one checks in red_check_params()
Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
net: sched: sch_teql: fix null-pointer dereference
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
Helge Deller <deller(a)gmx.de>
parisc: parisc-agp requires SBA IOMMU driver
Jack Qiu <jack.qiu(a)huawei.com>
fs: direct-io: fix missing sdio->boundary
Sergei Trofimovich <slyfox(a)gentoo.org>
ia64: fix user_stack_pointer() for ptrace()
Muhammad Usama Anjum <musamaanjum(a)gmail.com>
net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
Luca Fancellu <luca.fancellu(a)arm.com>
xen/evtchn: Change irq_info lock to raw_spinlock_t
Xiaoming Ni <nixiaoming(a)huawei.com>
nfc: Avoid endless loops caused by repeated llcp_sock_connect()
Xiaoming Ni <nixiaoming(a)huawei.com>
nfc: fix memory leak in llcp_sock_connect()
Xiaoming Ni <nixiaoming(a)huawei.com>
nfc: fix refcount leak in llcp_sock_connect()
Xiaoming Ni <nixiaoming(a)huawei.com>
nfc: fix refcount leak in llcp_sock_bind()
Jonas Holmberg <jonashg(a)axis.com>
ALSA: aloop: Fix initialization of controls
Ye Xiang <xiang.ye(a)intel.com>
iio: hid-sensor-prox: Fix scale not correct issue
-------------
Diffstat:
Makefile | 4 +--
arch/ia64/include/asm/ptrace.h | 8 +----
arch/s390/kernel/cpcmd.c | 6 ++--
drivers/char/agp/Kconfig | 2 +-
drivers/clk/clk.c | 30 ++++++++---------
drivers/clk/socfpga/clk-gate.c | 2 +-
drivers/gpu/drm/imx/imx-ldb.c | 10 ++++++
drivers/iio/light/hid-sensor-prox.c | 14 ++++++--
drivers/infiniband/hw/cxgb4/cm.c | 3 +-
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 6 +++-
drivers/net/ethernet/freescale/gianfar.c | 6 +++-
drivers/net/ieee802154/atusb.c | 1 +
drivers/net/tun.c | 48 ++++++++++++++++++++++++++++
drivers/xen/events/events_base.c | 14 ++++----
drivers/xen/events/events_internal.h | 2 +-
fs/direct-io.c | 5 +--
include/net/red.h | 4 +--
kernel/workqueue.c | 2 +-
net/batman-adv/translation-table.c | 1 +
net/ieee802154/nl-mac.c | 7 ++--
net/ieee802154/nl802154.c | 23 ++++++++++---
net/ipv4/netfilter/arp_tables.c | 2 ++
net/ipv4/netfilter/ip_tables.c | 2 ++
net/ipv6/netfilter/ip6_tables.c | 2 ++
net/ipv6/route.c | 8 +++--
net/mac802154/llsec.c | 2 +-
net/netfilter/x_tables.c | 10 ++----
net/nfc/llcp_sock.c | 10 ++++++
net/sched/sch_teql.c | 3 ++
net/tipc/socket.c | 2 +-
net/wireless/sme.c | 2 +-
sound/drivers/aloop.c | 11 +++++--
tools/perf/util/map.c | 7 ++--
33 files changed, 183 insertions(+), 76 deletions(-)