[PATCH AUTOSEL 4.9 01/26] net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware

List overview All Threads
Download

newer

older

[PATCH] ARM: imx: provide...

Re: [PATCH] uapi: fix userspace...

Sasha Levin

11 Apr 2020 11 Apr '20

11:13 p.m.

From: Masahiro Yamada masahiroy@kernel.org

[ Upstream commit 63b903dfebdea92aa92ad337d8451a6fbfeabf9d ]

As far as I understood from the Kconfig help text, this build rule is used to rebuild the driver firmware, which runs on an old m68k-based chip. So, you need m68k tools for the firmware rebuild.

wanxl.c is a PCI driver, but CONFIG_M68K does not select CONFIG_HAVE_PCI. So, you cannot enable CONFIG_WANXL_BUILD_FIRMWARE for ARCH=m68k. In other words, ifeq ($(ARCH),m68k) is false here.

I am keeping the dead code for now, but rebuilding the firmware requires 'as68k' and 'ld68k', which I do not have in hand.

Instead, the kernel.org m68k GCC [1] successfully built it.

Allowing a user to pass in CROSS_COMPILE_M68K= is handier.

[1] https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/9.2.0/x...

Suggested-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wan/Kconfig | 2 +- drivers/net/wan/Makefile | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wan/Kconfig b/drivers/net/wan/Kconfig index 4e9fe75d70675..21190dfbabb16 100644 --- a/drivers/net/wan/Kconfig +++ b/drivers/net/wan/Kconfig @@ -199,7 +199,7 @@ config WANXL_BUILD_FIRMWARE depends on WANXL && !PREVENT_FIRMWARE_BUILD help Allows you to rebuild firmware run by the QUICC processor. - It requires as68k, ld68k and hexdump programs. + It requires m68k toolchains and hexdump programs.

You should never need this option, say N.

diff --git a/drivers/net/wan/Makefile b/drivers/net/wan/Makefile index 73c2326603fcc..fbe8b2815a87c 100644 --- a/drivers/net/wan/Makefile +++ b/drivers/net/wan/Makefile @@ -40,17 +40,17 @@ $(obj)/wanxl.o: $(obj)/wanxlfw.inc

ifeq ($(CONFIG_WANXL_BUILD_FIRMWARE),y) ifeq ($(ARCH),m68k) - AS68K = $(AS) - LD68K = $(LD) + M68KAS = $(AS) + M68KLD = $(LD) else - AS68K = as68k - LD68K = ld68k + M68KAS = $(CROSS_COMPILE_M68K)as + M68KLD = $(CROSS_COMPILE_M68K)ld endif

quiet_cmd_build_wanxlfw = BLD FW $@ cmd_build_wanxlfw = \ - $(CPP) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi $< | $(AS68K) -m68360 -o $(obj)/wanxlfw.o; \ - $(LD68K) --oformat binary -Ttext 0x1000 $(obj)/wanxlfw.o -o $(obj)/wanxlfw.bin; \ + $(CPP) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi $< | $(M68KAS) -m68360 -o $(obj)/wanxlfw.o; \ + $(M68KLD) --oformat binary -Ttext 0x1000 $(obj)/wanxlfw.o -o $(obj)/wanxlfw.bin; \ hexdump -ve '"\n" 16/1 "0x%02X,"' $(obj)/wanxlfw.bin | sed 's/0x ,//g;1s/^/static const u8 firmware[]={/;$$s/,$$/\n};\n/' >$(obj)/wanxlfw.inc; \ rm -f $(obj)/wanxlfw.bin $(obj)/wanxlfw.o

-- 2.20.1

Show replies by date

Sasha Levin

11 Apr 11 Apr

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 02/26] net: phy: probe PHY drivers synchronously

From: Heiner Kallweit hkallweit1@gmail.com

[ Upstream commit 16983507742cbcaa5592af530872a82e82fb9c51 ]

If we have scenarios like

mdiobus_register() -> loads PHY driver module(s) -> registers PHY driver(s) -> may schedule async probe phydev = mdiobus_get_phy() <phydev action involving PHY driver>

phydev = phy_device_create() -> loads PHY driver module -> registers PHY driver -> may schedule async probe <phydev action involving PHY driver>

then we expect the PHY driver to be bound to the phydev when triggering the action. This may not be the case in case of asynchronous probing. Therefore ensure that PHY drivers are probed synchronously.

Default still is sync probing, except async probing is explicitly requested. I saw some comments that the intention is to promote async probing for more parallelism in boot process and want to be prepared for the case that the default is changed to async probing.

Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/phy_device.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 2f55873060220..8252df3779829 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1744,6 +1744,7 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) new_driver->mdiodrv.driver.probe = phy_probe; new_driver->mdiodrv.driver.remove = phy_remove; new_driver->mdiodrv.driver.owner = owner; + new_driver->mdiodrv.driver.probe_type = PROBE_FORCE_SYNCHRONOUS;

retval = driver_register(&new_driver->mdiodrv.driver); if (retval) {

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 03/26] serial: 8250_omap: Fix sleeping function called from invalid context during probe

From: Peter Ujfalusi peter.ujfalusi@ti.com

[ Upstream commit 4ce35a3617c0ac758c61122b2218b6c8c9ac9398 ]

When booting j721e the following bug is printed:

[ 1.154821] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 [ 1.154827] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 12, name: kworker/0:1 [ 1.154832] 3 locks held by kworker/0:1/12: [ 1.154836] #0: ffff000840030728 ((wq_completion)events){+.+.}, at: process_one_work+0x1d4/0x6e8 [ 1.154852] #1: ffff80001214fdd8 (deferred_probe_work){+.+.}, at: process_one_work+0x1d4/0x6e8 [ 1.154860] #2: ffff00084060b170 (&dev->mutex){....}, at: __device_attach+0x38/0x138 [ 1.154872] irq event stamp: 63096 [ 1.154881] hardirqs last enabled at (63095): [<ffff800010b74318>] _raw_spin_unlock_irqrestore+0x70/0x78 [ 1.154887] hardirqs last disabled at (63096): [<ffff800010b740d8>] _raw_spin_lock_irqsave+0x28/0x80 [ 1.154893] softirqs last enabled at (62254): [<ffff800010080c88>] _stext+0x488/0x564 [ 1.154899] softirqs last disabled at (62247): [<ffff8000100fdb3c>] irq_exit+0x114/0x140 [ 1.154906] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.6.0-rc6-next-20200318-00094-g45e4089b0bd3 #221 [ 1.154911] Hardware name: Texas Instruments K3 J721E SoC (DT) [ 1.154917] Workqueue: events deferred_probe_work_func [ 1.154923] Call trace: [ 1.154928] dump_backtrace+0x0/0x190 [ 1.154933] show_stack+0x14/0x20 [ 1.154940] dump_stack+0xe0/0x148 [ 1.154946] ___might_sleep+0x150/0x1f0 [ 1.154952] __might_sleep+0x4c/0x80 [ 1.154957] wait_for_completion_timeout+0x40/0x140 [ 1.154964] ti_sci_set_device_state+0xa0/0x158 [ 1.154969] ti_sci_cmd_get_device_exclusive+0x14/0x20 [ 1.154977] ti_sci_dev_start+0x34/0x50 [ 1.154984] genpd_runtime_resume+0x78/0x1f8 [ 1.154991] __rpm_callback+0x3c/0x140 [ 1.154996] rpm_callback+0x20/0x80 [ 1.155001] rpm_resume+0x568/0x758 [ 1.155007] __pm_runtime_resume+0x44/0xb0 [ 1.155013] omap8250_probe+0x2b4/0x508 [ 1.155019] platform_drv_probe+0x50/0xa0 [ 1.155023] really_probe+0xd4/0x318 [ 1.155028] driver_probe_device+0x54/0xe8 [ 1.155033] __device_attach_driver+0x80/0xb8 [ 1.155039] bus_for_each_drv+0x74/0xc0 [ 1.155044] __device_attach+0xdc/0x138 [ 1.155049] device_initial_probe+0x10/0x18 [ 1.155053] bus_probe_device+0x98/0xa0 [ 1.155058] deferred_probe_work_func+0x74/0xb0 [ 1.155063] process_one_work+0x280/0x6e8 [ 1.155068] worker_thread+0x48/0x430 [ 1.155073] kthread+0x108/0x138 [ 1.155079] ret_from_fork+0x10/0x18

To fix the bug we need to first call pm_runtime_enable() prior to any pm_runtime calls.

Reported-by: Tomi Valkeinen tomi.valkeinen@ti.com Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Link: https://lore.kernel.org/r/20200320125200.6772-1-peter.ujfalusi@ti.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/8250/8250_omap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c index a3adf21f9dcec..7d4680ef5307d 100644 --- a/drivers/tty/serial/8250/8250_omap.c +++ b/drivers/tty/serial/8250/8250_omap.c @@ -1194,11 +1194,11 @@ static int omap8250_probe(struct platform_device *pdev) spin_lock_init(&priv->rx_dma_lock);

device_init_wakeup(&pdev->dev, true); + pm_runtime_enable(&pdev->dev); pm_runtime_use_autosuspend(&pdev->dev); pm_runtime_set_autosuspend_delay(&pdev->dev, -1);

pm_runtime_irq_safe(&pdev->dev); - pm_runtime_enable(&pdev->dev);

pm_runtime_get_sync(&pdev->dev);

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 04/26] net: phy: mscc: accept all RGMII species in vsc85xx_mac_if_set

From: Vladimir Oltean vladimir.oltean@nxp.com

[ Upstream commit da206d65f2b293274f8082a26da4e43a1610da54 ]

The helper for configuring the pinout of the MII side of the PHY should do so irrespective of whether RGMII delays are used or not. So accept the ID, TXID and RXID variants as well, not just the no-delay RGMII variant.

Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/mscc.c | 3 +++ 1 file changed, 3 insertions(+)

diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c index 77a6671d572e3..2c4de7f87c3fd 100644 --- a/drivers/net/phy/mscc.c +++ b/drivers/net/phy/mscc.c @@ -260,6 +260,9 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev, reg_val = phy_read(phydev, MSCC_PHY_EXT_PHY_CNTL_1); reg_val &= ~(MAC_IF_SELECTION_MASK); switch (interface) { + case PHY_INTERFACE_MODE_RGMII_TXID: + case PHY_INTERFACE_MODE_RGMII_RXID: + case PHY_INTERFACE_MODE_RGMII_ID: case PHY_INTERFACE_MODE_RGMII: reg_val |= (MAC_IF_SELECTION_RGMII << MAC_IF_SELECTION_POS); break;

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 05/26] RDMA/cm: Add missing locking around id.state in cm_dup_req_handler

From: Jason Gunthorpe jgg@mellanox.com

[ Upstream commit d1de9a88074b66482443f0cd91618d7b51a7c9b6 ]

All accesses to id.state must be done under the spinlock.

Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/20200310092545.251365-10-leon@kernel.org Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cm.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 304429fd04ddb..c764a29c11323 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -1544,8 +1544,12 @@ static void cm_dup_req_handler(struct cm_work *work, counter[CM_REQ_COUNTER]);

/* Quick state check to discard duplicate REQs. */ - if (cm_id_priv->id.state == IB_CM_REQ_RCVD) + spin_lock_irq(&cm_id_priv->lock); + if (cm_id_priv->id.state == IB_CM_REQ_RCVD) { + spin_unlock_irq(&cm_id_priv->lock); return; + } + spin_unlock_irq(&cm_id_priv->lock);

ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg); if (ret)

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 06/26] mwifiex: set needed_headroom, not hard_header_len

From: Brian Norris briannorris@chromium.org

[ Upstream commit 9454f7a895b822dd8fb4588fc55fda7c96728869 ]

hard_header_len provides limitations for things like AF_PACKET, such that we don't allow transmitting packets smaller than this.

needed_headroom provides a suggested minimum headroom for SKBs, so that we can trivally add our headers to the front.

The latter is the correct field to use in this case, while the former mostly just prevents sending small AF_PACKET frames.

In any case, mwifiex already does its own bounce buffering [1] if we don't have enough headroom, so hints (not hard limits) are all that are needed.

This is the essentially the same bug (and fix) that brcmfmac had, fixed in commit cb39288fd6bb ("brcmfmac: use ndev->needed_headroom to reserve additional header space").

[1] mwifiex_hard_start_xmit(): if (skb_headroom(skb) < MWIFIEX_MIN_DATA_HEADER_LEN) { [...] /* Insufficient skb headroom - allocate a new skb */

Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex driver") Signed-off-by: Brian Norris briannorris@chromium.org Acked-by: Ganapathi Bhat ganapathi.gbhat@nxp.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/marvell/mwifiex/cfg80211.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c index 94901b0041cec..4c705cacd8813 100644 --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c @@ -2992,7 +2992,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy,

dev->flags |= IFF_BROADCAST | IFF_MULTICAST; dev->watchdog_timeo = MWIFIEX_DEFAULT_WATCHDOG_TIMEOUT; - dev->hard_header_len += MWIFIEX_MIN_DATA_HEADER_LEN; + dev->needed_headroom = MWIFIEX_MIN_DATA_HEADER_LEN; dev->ethtool_ops = &mwifiex_ethtool_ops;

mdev_priv = netdev_priv(dev);

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 07/26] Bluetooth: L2CAP: handle l2cap config request during open state

From: Howard Chung howardchung@google.com

[ Upstream commit 96298f640104e4cd9a913a6e50b0b981829b94ff ]

According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state.

The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change.

=== Before === ...

...

ACL Data RX: Handle 256 flags 0x02 dlen 12 #22

L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000

...

HCI Event: Number of Completed Packets (0x13) plen 5 #25

Num handles: 1 Handle: 256 Count: 1

...

HCI Event: Number of Completed Packets (0x13) plen 5 #26

Num handles: 1 Handle: 256 Count: 1

...

ACL Data RX: Handle 256 flags 0x02 dlen 16 #27

L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672

...

HCI Event: Number of Completed Packets (0x13) plen 5 #29

Num handles: 1 Handle: 256 Count: 1

...

ACL Data RX: Handle 256 flags 0x02 dlen 14 #30

L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000)

...

ACL Data RX: Handle 256 flags 0x02 dlen 20 #31

L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 #32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65

...

HCI Event: Number of Completed Packets (0x13) plen 5 #33

Num handles: 1 Handle: 256 Count: 1 ... === After === ...

...

ACL Data RX: Handle 256 flags 0x02 dlen 12 #22

...

HCI Event: Number of Completed Packets (0x13) plen 5 #25

Num handles: 1 Handle: 256 Count: 1

...

HCI Event: Number of Completed Packets (0x13) plen 5 #26

Num handles: 1 Handle: 256 Count: 1

...

ACL Data RX: Handle 256 flags 0x02 dlen 16 #27

...

HCI Event: Number of Completed Packets (0x13) plen 5 #29

Num handles: 1 Handle: 256 Count: 1

...

ACL Data RX: Handle 256 flags 0x02 dlen 14 #30

L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000)

...

ACL Data RX: Handle 256 flags 0x02 dlen 20 #31

L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 #32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000

...

HCI Event: Number of Completed Packets (0x13) plen 5 #34

Num handles: 1 Handle: 256 Count: 1

...

HCI Event: Number of Completed Packets (0x13) plen 5 #35

Num handles: 1 Handle: 256 Count: 1 ...

Signed-off-by: Howard Chung howardchung@google.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 11012a5090708..a9f74bf367f12 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4104,7 +4104,8 @@ static inline int l2cap_config_req(struct l2cap_conn *conn, return 0; }

- if (chan->state != BT_CONFIG && chan->state != BT_CONNECT2) { + if (chan->state != BT_CONFIG && chan->state != BT_CONNECT2 && + chan->state != BT_CONNECTED) { cmd_reject_invalid_cid(conn, cmd->ident, chan->scid, chan->dcid); goto unlock;

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 08/26] drm/tegra: dc: Release PM and RGB output when client's registration fails

From: Dmitry Osipenko digetx@gmail.com

[ Upstream commit 0411ea89a689531e1829fdf8af3747646c02c721 ]

Runtime PM and RGB output need to be released when host1x client registration fails. The releasing is missed in the code, let's correct it.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Thierry Reding treding@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/tegra/dc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 4010d69cbd084..9cb742c01c28a 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -2019,10 +2019,16 @@ static int tegra_dc_probe(struct platform_device *pdev) if (err < 0) { dev_err(&pdev->dev, "failed to register host1x client: %d\n", err); - return err; + goto disable_pm; }

return 0; + +disable_pm: + pm_runtime_disable(&pdev->dev); + tegra_dc_rgb_remove(dc); + + return err; }

static int tegra_dc_remove(struct platform_device *pdev)

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

From: Vlad Buslov vladbu@mellanox.com

[ Upstream commit 6783e8b29f636383af293a55336f036bc7ad5619 ]

During transition to uplink representors the code responsible for initializing ethtool steering functionality wasn't added to representor init rx routine. This causes NULL pointer dereference during configuration of network flow classification rule with ethtool (only possible to reproduce with next commit in this series which registers necessary ethtool callbacks).

Signed-off-by: Vlad Buslov vladbu@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index b210c171a3806..37702e4e1871a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -358,6 +358,8 @@ static int mlx5e_init_rep_rx(struct mlx5e_priv *priv) if (err) goto err_del_flow_rule;

+ mlx5e_ethtool_init_steering(priv); + return 0;

err_del_flow_rule:

-- 2.20.1

Or Gerlitz

12 Apr 12 Apr

7:10 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Sun, Apr 12, 2020 at 2:16 AM Sasha Levin sashal@kernel.org wrote:

...

[ Upstream commit 6783e8b29f636383af293a55336f036bc7ad5619 ]

Sasha,

This was pushed to net-next without a fixes tag, and there're probably reasons for that. As you can see the possible null deref is not even reproducible without another patch which for itself was also net-next and not net one.

If a team is not pushing patch to net nor putting a fixes that, I don't think it's correct to go and pick that into stable and from there to customer production kernels.

Alsom, I am not sure what's the idea behind the auto-selection concept, e.g for mlx5 the maintainer is specifically pointing which patches should go to stable and to what releases there and this is done with care and thinking ahead, why do we want to add on that? and why this can be something which is just automatic selection?

We have customers running production system with LTS 4.4.x and 4.9.y (along with 4.14.z and 4.19.w) kernels, we put lots of care thinking if/what should go there, I don't see a benefit from adding auto-selection, the converse.

Or.

...

During transition to uplink representors the code responsible for initializing ethtool steering functionality wasn't added to representor init rx routine. This causes NULL pointer dereference during configuration of network flow classification rule with ethtool (only possible to reproduce with next commit in this series which registers necessary ethtool callbacks).

...

Signed-off-by: Vlad Buslov vladbu@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org

Jakub Kicinski

5:59 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Sun, 12 Apr 2020 10:10:22 +0300 Or Gerlitz wrote:

...

On Sun, Apr 12, 2020 at 2:16 AM Sasha Levin sashal@kernel.org wrote:

...
[ Upstream commit 6783e8b29f636383af293a55336f036bc7ad5619 ]

Sasha,

This was pushed to net-next without a fixes tag, and there're probably reasons for that. As you can see the possible null deref is not even reproducible without another patch which for itself was also net-next and not net one.

If a team is not pushing patch to net nor putting a fixes that, I don't think it's correct to go and pick that into stable and from there to customer production kernels.

Alsom, I am not sure what's the idea behind the auto-selection concept, e.g for mlx5 the maintainer is specifically pointing which patches should go to stable and to what releases there and this is done with care and thinking ahead, why do we want to add on that? and why this can be something which is just automatic selection?

We have customers running production system with LTS 4.4.x and 4.9.y (along with 4.14.z and 4.19.w) kernels, we put lots of care thinking if/what should go there, I don't see a benefit from adding auto-selection, the converse.

FWIW I had the same thoughts about the nfp driver, and I indicated to Sasha to skip it in the auto selection, which AFAICT worked nicely.

Maybe we should communicate more clearly that maintainers who carefully select patches for stable should opt out of auto-selection?

Or Gerlitz

6:29 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Sun, Apr 12, 2020 at 8:59 PM Jakub Kicinski kuba@kernel.org wrote:

...

On Sun, 12 Apr 2020 10:10:22 +0300 Or Gerlitz wrote:

...
On Sun, Apr 12, 2020 at 2:16 AM Sasha Levin sashal@kernel.org wrote:

...
[ Upstream commit 6783e8b29f636383af293a55336f036bc7ad5619 ]

Sasha,

This was pushed to net-next without a fixes tag, and there're probably reasons for that. As you can see the possible null deref is not even reproducible without another patch which for itself was also net-next and not net one.

If a team is not pushing patch to net nor putting a fixes that, I don't think it's correct to go and pick that into stable and from there to customer production kernels.

Alsom, I am not sure what's the idea behind the auto-selection concept, e.g for mlx5 the maintainer is specifically pointing which patches should go to stable and to what releases there and this is done with care and thinking ahead, why do we want to add on that? and why this can be something which is just automatic selection?

We have customers running production system with LTS 4.4.x and 4.9.y (along with 4.14.z and 4.19.w) kernels, we put lots of care thinking if/what should go there, I don't see a benefit from adding auto-selection, the converse.

FWIW I had the same thoughts about the nfp driver, and I indicated to Sasha to skip it in the auto selection, which AFAICT worked nicely.

Maybe we should communicate more clearly that maintainers who carefully select patches for stable should opt out of auto-selection?

Sasha Levin

14 Apr 14 Apr

1:56 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Sun, Apr 12, 2020 at 10:59:35AM -0700, Jakub Kicinski wrote:

...

On Sun, 12 Apr 2020 10:10:22 +0300 Or Gerlitz wrote:

...
On Sun, Apr 12, 2020 at 2:16 AM Sasha Levin sashal@kernel.org wrote:

...
[ Upstream commit 6783e8b29f636383af293a55336f036bc7ad5619 ]

Sasha,

This was pushed to net-next without a fixes tag, and there're probably reasons for that. As you can see the possible null deref is not even reproducible without another patch which for itself was also net-next and not net one.

If a team is not pushing patch to net nor putting a fixes that, I don't think it's correct

While it's great that you're putting the effort into adding a fixes tag to your commits, I'm not sure what a fixes tag has to do with inclusion in a stable tree.

It's a great help when we look into queueing something up, but on it's own it doesn't imply anything.

...

...
to go and pick that into stable and from there to customer production kernels.

This mail is your two week warning that this patch might get queued to stable, nothing was actually queued just yet.

...

...
Alsom, I am not sure what's the idea behind the auto-selection concept, e.g for mlx5 the maintainer is specifically pointing which patches should go to stable and

I'm curious, how does this process work? Is it on a mailing list somewhere?

...

...
to what releases there and this is done with care and thinking ahead, why do we want to add on that? and why this can be something which is just automatic selection?

We have customers running production system with LTS 4.4.x and 4.9.y (along with 4.14.z and 4.19.w) kernels, we put lots of care thinking if/what should go there, I don't see a benefit from adding auto-selection, the converse.

FWIW I had the same thoughts about the nfp driver, and I indicated to Sasha to skip it in the auto selection, which AFAICT worked nicely.

Maybe we should communicate more clearly that maintainers who carefully select patches for stable should opt out of auto-selection?

I've added drivers/net/ethernet/mellanox/ to my blacklist for auto selection. It's very easy to opt out, just ask... I've never argued with anyone around this - the maintainers of any given subsystem know about it way better than me.

-- Thanks, Sasha

Leon Romanovsky

7:16 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Mon, Apr 13, 2020 at 09:56:28PM -0400, Sasha Levin wrote:

...

On Sun, Apr 12, 2020 at 10:59:35AM -0700, Jakub Kicinski wrote:

...
On Sun, 12 Apr 2020 10:10:22 +0300 Or Gerlitz wrote:

...
On Sun, Apr 12, 2020 at 2:16 AM Sasha Levin sashal@kernel.org wrote:

...
[ Upstream commit 6783e8b29f636383af293a55336f036bc7ad5619 ]

Sasha,

This was pushed to net-next without a fixes tag, and there're probably reasons for that. As you can see the possible null deref is not even reproducible without another patch which for itself was also net-next and not net one.

If a team is not pushing patch to net nor putting a fixes that, I don't think it's correct

While it's great that you're putting the effort into adding a fixes tag to your commits, I'm not sure what a fixes tag has to do with inclusion in a stable tree.

It's a great help when we look into queueing something up, but on it's own it doesn't imply anything.

...
...
to go and pick that into stable and from there to customer production kernels.

This mail is your two week warning that this patch might get queued to stable, nothing was actually queued just yet.

...
...
Alsom, I am not sure what's the idea behind the auto-selection concept, e.g for mlx5 the maintainer is specifically pointing which patches should go to stable and

I'm curious, how does this process work? Is it on a mailing list somewhere?

Saeed asks from Dave explicitly to pick commits to stable@. https://lore.kernel.org/netdev/20200408225124.883292-1-saeedm@mellanox.com/

...

...
...
to what releases there and this is done with care and thinking ahead, why do we want to add on that? and why this can be something which is just automatic selection?

We have customers running production system with LTS 4.4.x and 4.9.y (along with 4.14.z and 4.19.w) kernels, we put lots of care thinking if/what should go there, I don't see a benefit from adding auto-selection, the converse.

FWIW I had the same thoughts about the nfp driver, and I indicated to Sasha to skip it in the auto selection, which AFAICT worked nicely.

Maybe we should communicate more clearly that maintainers who carefully select patches for stable should opt out of auto-selection?

I've added drivers/net/ethernet/mellanox/ to my blacklist for auto selection. It's very easy to opt out, just ask... I've never argued with anyone around this - the maintainers of any given subsystem know about it way better than me.

I was under impression that all netdev commits are excluded.

Thanks

...

-- Thanks, Sasha

Or Gerlitz

10:22 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 4:56 AM Sasha Levin sashal@kernel.org wrote:

...

On Sun, Apr 12, 2020 at 10:59:35AM -0700, Jakub Kicinski wrote:

...
On Sun, 12 Apr 2020 10:10:22 +0300 Or Gerlitz wrote:

...
On Sun, Apr 12, 2020 at 2:16 AM Sasha Levin sashal@kernel.org wrote:

...
[ Upstream commit 6783e8b29f636383af293a55336f036bc7ad5619 ]

Sasha,

This was pushed to net-next without a fixes tag, and there're probably reasons for that. As you can see the possible null deref is not even reproducible without another patch which for itself was also net-next and not net one.

If a team is not pushing patch to net nor putting a fixes that, I don't think it's correct

While it's great that you're putting the effort into adding a fixes tag to your commits, I'm not sure what a fixes tag has to do with inclusion in a stable tree.

It's a great help when we look into queueing something up, but on it's own it doesn't imply anything.

...
...
to go and pick that into stable and from there to customer production kernels.

This mail is your two week warning that this patch might get queued to stable, nothing was actually queued just yet.

...
...
Alsom, I am not sure what's the idea behind the auto-selection concept, e.g for mlx5 the maintainer is specifically pointing which patches should go to stable and

I'm curious, how does this process work? Is it on a mailing list somewhere?

...
...
to what releases there and this is done with care and thinking ahead, why do we want to add on that? and why this can be something which is just automatic selection?

We have customers running production system with LTS 4.4.x and 4.9.y (along with 4.14.z and 4.19.w) kernels, we put lots of care thinking if/what should go there, I don't see a benefit from adding auto-selection, the converse.

FWIW I had the same thoughts about the nfp driver, and I indicated to Sasha to skip it in the auto selection, which AFAICT worked nicely.

Maybe we should communicate more clearly that maintainers who carefully select patches for stable should opt out of auto-selection?

I've added drivers/net/ethernet/mellanox/ to my blacklist for auto selection. It's very easy to opt out, just ask... I've never argued with anyone around this - the maintainers of any given subsystem know about it way better than me.

Just to make sure, does this excluding of mlx5 happens immediately, that is, applies also to all non committed patches that you already posted?

IMHO - I think it should be the other way around, you should get approval from sub-system maintainers to put their code in charge into auto-selection, unless there's kernel summit decision that says otherwise, is this documented anywhere?

Greg KH

11:09 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 01:22:59PM +0300, Or Gerlitz wrote:

...

IMHO - I think it should be the other way around, you should get approval from sub-system maintainers to put their code in charge into auto-selection, unless there's kernel summit decision that says otherwise, is this documented anywhere?

No, we can't get make this a "only take if I agree" as there are _many_ subsystem maintainers who today never mark anything for stable trees, as they just can't be bothered. And that's fine, stable trees should not take up any extra maintainer time if they do not want to do so. So it's simpler to do an opt-out when asked for.

thanks,

greg k-h

Or Gerlitz

2:38 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 2:09 PM Greg KH gregkh@linuxfoundation.org wrote:

...

On Tue, Apr 14, 2020 at 01:22:59PM +0300, Or Gerlitz wrote:

...
IMHO - I think it should be the other way around, you should get approval from sub-system maintainers to put their code in charge into auto-selection, unless there's kernel summit decision that says otherwise, is this documented anywhere?

No, we can't get make this a "only take if I agree" as there are _many_ subsystem maintainers who today never mark anything for stable trees, as they just can't be bothered. And that's fine, stable trees should not take up any extra maintainer time if they do not want to do so. So it's simpler to do an opt-out when asked for.

OK, but I must say I am worried from the comment made here:

"I'm not sure what a fixes tag has to do with inclusion in a stable tree"

This patch

(A) was pushed to -next and not -rc kernel

(B) doesn't have fixes tag

(C) the change log state clearly that what's being "fixed" can't be reproduced on any earlier kernel [..] "only possible to reproduce with next commit in this series"

but it was selected for -stable -- at least if the fixes tag was used as gating criteria, this wrong stable inclusion could have been eliminated

Sasha Levin

3:16 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 05:38:32PM +0300, Or Gerlitz wrote:

...

On Tue, Apr 14, 2020 at 2:09 PM Greg KH gregkh@linuxfoundation.org wrote:

...
On Tue, Apr 14, 2020 at 01:22:59PM +0300, Or Gerlitz wrote:

...
IMHO - I think it should be the other way around, you should get approval from sub-system maintainers to put their code in charge into auto-selection, unless there's kernel summit decision that says otherwise, is this documented anywhere?

No, we can't get make this a "only take if I agree" as there are _many_ subsystem maintainers who today never mark anything for stable trees, as they just can't be bothered. And that's fine, stable trees should not take up any extra maintainer time if they do not want to do so. So it's simpler to do an opt-out when asked for.

OK, but I must say I am worried from the comment made here:

"I'm not sure what a fixes tag has to do with inclusion in a stable tree"

This patch

(A) was pushed to -next and not -rc kernel

Fixes can (and should) come in during a merge window as well. They are not put on hold until the -rc releases.

...

(B) doesn't have fixes tag

In the 4.19 stable tree there are 3962 commits explicitly tagged for stable, only 2382 of them have a fixes tag.

...

(C) the change log state clearly that what's being "fixed" can't be reproduced on any earlier kernel [..] "only possible to reproduce with next commit in this series"

but it was selected for -stable -- at least if the fixes tag was used as gating criteria, this wrong stable inclusion could have been eliminated

Are you suggesting that a commit without a fixes tag is never a fix?

-- Thanks, Sasha

Edward Cree

3:49 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On 14/04/2020 16:16, Sasha Levin wrote:

...

Are you suggesting that a commit without a fixes tag is never a fix?

Because fixes are much more likely than non-fixes to have a Fixes tag, the absence of a fixes tag is Bayesian evidence that a commit is not a fix. It's of course not incontrovertible evidence, since (as you note) some fixes do not have a Fixes tag, but it does increase the amount of countervailing evidence needed to conclude a commit is a fix. In this case it looks as if the only such evidence was that the commit message included the phrase "NULL pointer dereference".

...

Fixes can (and should) come in during a merge window as well. They are not put on hold until the -rc releases.

In networking-land, fixes generally go through David's 'net' tree, rather than 'net-next'; the only times a fix goes to net-next are when a) the code it's fixing is only in net-next; i.e. it's a fix to a previous patch from the same merge window. In this case the fix should not be backported, since the code it's fixing will not appear in stable kernels. b) the code has changed enough between net and net-next that different fixes are appropriate for the two trees. In this case, only the fix that went to 'net' should be backported (since it's the one that's appropriate for net, it's probably more appropriate for stable trees too); the fix that went to 'net-next' should not. Or's original phrasing was that this patch "was pushed to net-next", which is not quite exactly the same thing as -next vs. -rc (though it's similar because of David's system of closing net-next for the duration of the merge window). And this, again, is quite strong Bayesian evidence that the patch should not be selected for stable.

To be honest, that this needs to be explained to you does not inspire confidence in the quality of your autoselection process...

GregKH wrote:

...

we can't get make this a "only take if I agree" as there are _many_ subsystem maintainers who today never mark anything for stable trees

The complaint seems to be that Sasha's threshold for "this looks stable- worthy" is too low; maybe setting it that low should be opt-in, while patches for subsystems that haven't opted in should need to be "obviously" fixes for them to get autoselected, on the grounds that having to watch the autosel messages and pounce on the ones to which the answer is "no no no, applying that to stable kernels will break them horribly" is _also_ extra work for a maintainer. (Especially if, a month after NACKing them, the same patch shows up again in another autosel batch, which — unless my memory is playing tricks on me — has happened to me at least once.)

-ed

Leon Romanovsky

5:37 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 04:49:20PM +0100, Edward Cree wrote:

...

On 14/04/2020 16:16, Sasha Levin wrote:

...
Are you suggesting that a commit without a fixes tag is never a fix?

Because fixes are much more likely than non-fixes to have a Fixes tag, the absence of a fixes tag is Bayesian evidence that a commit is not a fix. It's of course not incontrovertible evidence, since (as you note) some fixes do not have a Fixes tag, but it does increase the amount of countervailing evidence needed to conclude a commit is a fix. In this case it looks as if the only such evidence was that the commit message included the phrase "NULL pointer dereference".

...
Fixes can (and should) come in during a merge window as well. They are not put on hold until the -rc releases.

In networking-land, fixes generally go through David's 'net' tree, rather than 'net-next'; the only times a fix goes to net-next are when a) the code it's fixing is only in net-next; i.e. it's a fix to a previous patch from the same merge window. In this case the fix should not be backported, since the code it's fixing will not appear in stable kernels. b) the code has changed enough between net and net-next that different fixes are appropriate for the two trees. In this case, only the fix that went to 'net' should be backported (since it's the one that's appropriate for net, it's probably more appropriate for stable trees too); the fix that went to 'net-next' should not. Or's original phrasing was that this patch "was pushed to net-next", which is not quite exactly the same thing as -next vs. -rc (though it's similar because of David's system of closing net-next for the duration of the merge window). And this, again, is quite strong Bayesian evidence that the patch should not be selected for stable.

To be honest, that this needs to be explained to you does not inspire confidence in the quality of your autoselection process...

It is a little bit harsh to say that.

The autoselection process works good enough for everything outside of netdev community. The amount of bugs in those stable@ trees is not such high if you take into account the amount of fixes automatically brought in.

I think that all Fedora users are indirectly use those stable@ trees.

Thanks

Jakub Kicinski

7:03 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, 14 Apr 2020 20:37:18 +0300 Leon Romanovsky wrote:

...

On Tue, Apr 14, 2020 at 04:49:20PM +0100, Edward Cree wrote:

...
On 14/04/2020 16:16, Sasha Levin wrote:

...
Are you suggesting that a commit without a fixes tag is never a fix?

Because fixes are much more likely than non-fixes to have a Fixes tag, the absence of a fixes tag is Bayesian evidence that a commit is not a fix. It's of course not incontrovertible evidence, since (as you note) some fixes do not have a Fixes tag, but it does increase the amount of countervailing evidence needed to conclude a commit is a fix. In this case it looks as if the only such evidence was that the commit message included the phrase "NULL pointer dereference".

...
Fixes can (and should) come in during a merge window as well. They are not put on hold until the -rc releases.

In networking-land, fixes generally go through David's 'net' tree, rather than 'net-next'; the only times a fix goes to net-next are when a) the code it's fixing is only in net-next; i.e. it's a fix to a previous patch from the same merge window. In this case the fix should not be backported, since the code it's fixing will not appear in stable kernels. b) the code has changed enough between net and net-next that different fixes are appropriate for the two trees. In this case, only the fix that went to 'net' should be backported (since it's the one that's appropriate for net, it's probably more appropriate for stable trees too); the fix that went to 'net-next' should not. Or's original phrasing was that this patch "was pushed to net-next", which is not quite exactly the same thing as -next vs. -rc (though it's similar because of David's system of closing net-next for the duration of the merge window). And this, again, is quite strong Bayesian evidence that the patch should not be selected for stable.

To be honest, that this needs to be explained to you does not inspire confidence in the quality of your autoselection process...

It is a little bit harsh to say that.

The autoselection process works good enough for everything outside of netdev community. The amount of bugs in those stable@ trees is not such high if you take into account the amount of fixes automatically brought in.

I think that all Fedora users are indirectly use those stable@ trees.

I think folks how mark things for stable explicitly and carefully have an obvious bias because they only see the false positives of auto-sel and never the benefits.

Sasha Levin

9 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 08:37:18PM +0300, Leon Romanovsky wrote:

...

On Tue, Apr 14, 2020 at 04:49:20PM +0100, Edward Cree wrote:

...
On 14/04/2020 16:16, Sasha Levin wrote:

...
Are you suggesting that a commit without a fixes tag is never a fix?

Because fixes are much more likely than non-fixes to have a Fixes tag, the absence of a fixes tag is Bayesian evidence that a commit is not a fix. It's of course not incontrovertible evidence, since (as you note) some fixes do not have a Fixes tag, but it does increase the amount of countervailing evidence needed to conclude a commit is a fix. In this case it looks as if the only such evidence was that the commit message included the phrase "NULL pointer dereference".

...
Fixes can (and should) come in during a merge window as well. They are not put on hold until the -rc releases.

In networking-land, fixes generally go through David's 'net' tree, rather than 'net-next'; the only times a fix goes to net-next are when a) the code it's fixing is only in net-next; i.e. it's a fix to a previous patch from the same merge window. In this case the fix should not be backported, since the code it's fixing will not appear in stable kernels. b) the code has changed enough between net and net-next that different fixes are appropriate for the two trees. In this case, only the fix that went to 'net' should be backported (since it's the one that's appropriate for net, it's probably more appropriate for stable trees too); the fix that went to 'net-next' should not. Or's original phrasing was that this patch "was pushed to net-next", which is not quite exactly the same thing as -next vs. -rc (though it's similar because of David's system of closing net-next for the duration of the merge window). And this, again, is quite strong Bayesian evidence that the patch should not be selected for stable.

To be honest, that this needs to be explained to you does not inspire confidence in the quality of your autoselection process...

It is a little bit harsh to say that.

The autoselection process works good enough for everything outside of netdev community. The amount of bugs in those stable@ trees is not such high if you take into account the amount of fixes automatically brought in.

I'll add that it's funny that we're discussing AUTOSEL in this context given that a conversation I had with Leon quite a few years back around issues with Mellanox patches not going to -stable was one of the triggers for AUTOSEL :)

-- Thanks, Sasha

Michal Kubecek

10:50 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 08:37:18PM +0300, Leon Romanovsky wrote:

...

The autoselection process works good enough for everything outside of netdev community.

That's very far from true. I have seen and heard many complaints about AUTOSEL and inflation of stable trees in general, both in private and in public lists. It was also discussed on Kernel Summit few times - with little success.

Just for fun, I suggest everyone to read first section of Documentation/process/stable-kernel-rules.rst and compare with today's reality. Of course, rules can change over time but keeping that document in kernel tree as a memento is rather sad - I went through the rules now and there are only three which are not broken on a regular basis these days.

Michal Kubecek

Leon Romanovsky

15 Apr 15 Apr

5:31 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Wed, Apr 15, 2020 at 12:50:09AM +0200, Michal Kubecek wrote:

...

On Tue, Apr 14, 2020 at 08:37:18PM +0300, Leon Romanovsky wrote:

...
The autoselection process works good enough for everything outside of netdev community.

That's very far from true. I have seen and heard many complaints about AUTOSEL and inflation of stable trees in general, both in private and in public lists. It was also discussed on Kernel Summit few times - with little success.

I'm aware of the discussions and the cases brought there. From what I saw, Sasha and Greg came with numbers and process that is much better than anything else we had before. While the opponents came with very narrow examples of imperfections in AUTOSEL machinery.

Of course, the AUTOSEL mechanism is not perfect and prone to errors, but it is much better than try to rely on the developers good will to add Fixes and stable@ tag.

It is pretty easy to be biased while receiving those complaints, because people are contacting us only if something is broken. I imagine that no one is approaching you and expressing his happiness with stable@ or anything else.

Just for fun and to get perspective, for one my very popular tool, which I wrote two years ago, I received only ONE "thank you" email that says that this tool works as expected and helped to identify the problem.

And personally, I'm running latest Fedora on all my servers and laptop and it works great - stable and reliable.

Thanks

...

Just for fun, I suggest everyone to read first section of Documentation/process/stable-kernel-rules.rst and compare with today's reality. Of course, rules can change over time but keeping that document in kernel tree as a memento is rather sad - I went through the rules now and there are only three which are not broken on a regular basis these days.

Michal Kubecek

Sasha Levin

2:07 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Wed, Apr 15, 2020 at 12:50:09AM +0200, Michal Kubecek wrote:

...

On Tue, Apr 14, 2020 at 08:37:18PM +0300, Leon Romanovsky wrote:

...
The autoselection process works good enough for everything outside of netdev community.

That's very far from true. I have seen and heard many complaints about

How about some actionable data? What do you expect us to do with "seen and heard"?

Do you have evidence that AUTOSEL introduces more regressions over the stable tagged commits? Do you have evidence showing that AUTOSEL isn't performing properly? Great, let's go over it.

I have always measured the results of this work and compared it to the "regular" flow of stable tagged commits, I'm keeping track of how the quality of it compares to those stable tagged commits, and quite a few times I took action to deal with specific issues that come up with the process based on data.

There's not much I can do with "seen and heard".

...

AUTOSEL and inflation of stable trees in general, both in private and in

I'm puzzled about your objections to "inflation"? The stable tree was missing a bunch of fixes before, and now it doesn't.

In parallel to "inflating" the stable trees I'm also driving efforts to improve the testing of these trees so that we could reliably say that we didn't introduce any regressions into the stable tree. "inflation" on it's own doesn't increase the rate of regressions in stable trees, and that's based on historical data.

...

public lists. It was also discussed on Kernel Summit few times - with little success.

Define success? We're adding a bunch of missing fixes and not causing any more regressions than before, you don't call that a success?

Let me also point out that I've tried to bring up this process (as well as the -stable process in general) at every kernel summit/maintainer's summit (here's the one from last year: https://lwn.net/Articles/799166/) and attempted to address any feedback that came back from it. It seemed to me that there's an agreement that this process was going well (at the very least Linus didn't yell at me), so I'm still puzzled by your own success criteria.

-- Thanks, Sasha

Sasha Levin

14 Apr 14 Apr

8:57 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Tue, Apr 14, 2020 at 04:49:20PM +0100, Edward Cree wrote:

...

On 14/04/2020 16:16, Sasha Levin wrote:

...
Are you suggesting that a commit without a fixes tag is never a fix?

Because fixes are much more likely than non-fixes to have a Fixes tag, the absence of a fixes tag is Bayesian evidence that a commit is not a fix. It's of course not incontrovertible evidence, since (as you note) some fixes do not have a Fixes tag, but it does increase the amount of countervailing evidence needed to conclude a commit is a fix. In this case it looks as if the only such evidence was that the commit message included the phrase "NULL pointer dereference".

I've pointed out that almost 50% of commits tagged for stable do not have a fixes tag, and yet they are fixes. You really deduce things based on coin flip probability?

$ git log --oneline -i --grep "fixes:" v4.19..stable/linux-4.19.y | wc -l 6235 $ git log --oneline v4.19..stable/linux-4.19.y | wc -l 12877

Look at that, most fixes in -stable *don't* have a fixes tag. Shouldn't your argument be the opposite? If a patch has a fixes tag, it's probably not a fix?

"it does increase the amount of countervailing evidence needed to conclude a commit is a fix" - Please explain this argument given the above.

...

...
Fixes can (and should) come in during a merge window as well. They are not put on hold until the -rc releases.

In networking-land, fixes generally go through David's 'net' tree, rather than 'net-next'; the only times a fix goes to net-next are when

This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

...

a) the code it's fixing is only in net-next; i.e. it's a fix to a previous patch from the same merge window. In this case the fix should not be backported, since the code it's fixing will not appear in stable kernels. b) the code has changed enough between net and net-next that different fixes are appropriate for the two trees. In this case, only the fix that went to 'net' should be backported (since it's the one that's appropriate for net, it's probably more appropriate for stable trees too); the fix that went to 'net-next' should not. Or's original phrasing was that this patch "was pushed to net-next", which is not quite exactly the same thing as -next vs. -rc (though it's similar because of David's system of closing net-next for the duration of the merge window). And this, again, is quite strong Bayesian evidence that the patch should not be selected for stable.

To be honest, that this needs to be explained to you does not inspire confidence in the quality of your autoselection process...

Nothing like a personal attack or two to try and make a point?

-- Thanks, Sasha

Edward Cree

15 Apr 15 Apr

4:18 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

Firstly, let me apologise: my previous email was too harsh and too assertiveabout things that were really more uncertain and unclear.

On 14/04/2020 21:57, Sasha Levin wrote:

...

I've pointed out that almost 50% of commits tagged for stable do not have a fixes tag, and yet they are fixes. You really deduce things based on coin flip probability?

Yes, but far less than 50% of commits *not* tagged for stable have a fixes tag. It's not about hard-and-fast Aristotelian "deductions", like "this doesn't have Fixes:, therefore it is not a stable candidate", it's about probabilistic "induction".

...

"it does increase the amount of countervailing evidence needed to conclude a commit is a fix" - Please explain this argument given the above.

Are you familiar with Bayesian statistics? If not, I'd suggest reading something like http://yudkowsky.net/rational/bayes/ which explains it. There's a big difference between a coin flip and a _correlated_ coin flip.

...

This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

drivers/net/ goes through the same DaveM net/net-next trees, with the same rules.

...

...
To be honest, that this needs to be explained to you does not inspire confidence in the quality of your autoselection process...

Nothing like a personal attack or two to try and make a point?

It wasn't meant as a personal attack, more as an "it's worrying that this is not known to the people doing the stable selection". But I did agonise over whether to say that and now wish I hadn't; sorry. (I'm not exactly my best self right now, what with all the lockdown cabin fever.)

-ed

Sasha Levin

16 Apr 16 Apr

midnight

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree wrote:

...

Firstly, let me apologise: my previous email was too harsh and too assertiveabout things that were really more uncertain and unclear.

On 14/04/2020 21:57, Sasha Levin wrote:

...
I've pointed out that almost 50% of commits tagged for stable do not have a fixes tag, and yet they are fixes. You really deduce things based on coin flip probability?

Yes, but far less than 50% of commits *not* tagged for stable have a fixes tag. It's not about hard-and-fast Aristotelian "deductions", like "this doesn't have Fixes:, therefore it is not a stable candidate", it's about probabilistic "induction".

...
"it does increase the amount of countervailing evidence needed to conclude a commit is a fix" - Please explain this argument given the above.

Are you familiar with Bayesian statistics? If not, I'd suggest reading something like http://yudkowsky.net/rational/bayes/ which explains it. There's a big difference between a coin flip and a _correlated_ coin flip.

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

...

...
This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

drivers/net/ goes through the same DaveM net/net-next trees, with the same rules.

Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

I really have no objection to not looking in drivers/net/, it's just that the experience I had with the process suggests that it's not following the same process as net/.

-- Thanks, Sasha

Saeed Mahameed

4:08 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:

...

On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree wrote:

...
Firstly, let me apologise: my previous email was too harsh and too assertiveabout things that were really more uncertain and unclear.

On 14/04/2020 21:57, Sasha Levin wrote:

...
I've pointed out that almost 50% of commits tagged for stable do not have a fixes tag, and yet they are fixes. You really deduce things based on coin flip probability?

Yes, but far less than 50% of commits *not* tagged for stable have a fixes tag. It's not about hard-and-fast Aristotelian "deductions", like "this doesn't have Fixes:, therefore it is not a stable candidate", it's about probabilistic "induction".

...
"it does increase the amount of countervailing evidence needed to conclude a commit is a fix" - Please explain this argument given the above.

Are you familiar with Bayesian statistics? If not, I'd suggest reading something like http://yudkowsky.net/rational/bayes/ which explains it. There's a big difference between a coin flip and a _correlated_ coin flip.

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

I am not against AUTOSEL in general, as long as the decision to know how far back it is allowed to take a patch is made deterministically and not statistically based on some AI hunch.

Any auto selection for a patch without a Fixes tags can be catastrophic .. imagine a patch without a Fixes Tag with a single line that is fixing some "oops", such patch can be easily applied cleanly to stable- v.x and stable-v.y .. while it fixes the issue on v.x it might have catastrophic results on v.y ..

if you want these factors to keep playing a role in the autosel process, then a human factor or some deterministic testing/code coverage step must take action before backporting such patch on the blind.

What i would suggest here: For patches that are missing a Fixes tag, they should be considered as a "candidate" for autosel, and don't actually apply them until an explicit ACK from some human/regression test factor is received.

...

...
...
This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

drivers/net/ goes through the same DaveM net/net-next trees, with the same rules.

Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

until some patch which shouldn't get backported slips through, believe me this will happen, just give it some time ..

...

While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

both net and drivers/net are managed by the same maitainer and follow the same rules, can you elaborate on the difference ?

...

I really have no objection to not looking in drivers/net/, it's just that the experience I had with the process suggests that it's not following the same process as net/.

Leon Romanovsky

5:24 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote:

...

On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:

...
On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree wrote:

...
Firstly, let me apologise: my previous email was too harsh and too assertiveabout things that were really more uncertain and unclear.

On 14/04/2020 21:57, Sasha Levin wrote:

...
I've pointed out that almost 50% of commits tagged for stable do not have a fixes tag, and yet they are fixes. You really deduce things based on coin flip probability?

Yes, but far less than 50% of commits *not* tagged for stable have a fixes tag. It's not about hard-and-fast Aristotelian "deductions", like "this doesn't have Fixes:, therefore it is not a stable candidate", it's about probabilistic "induction".

...
"it does increase the amount of countervailing evidence needed to conclude a commit is a fix" - Please explain this argument given the above.

Are you familiar with Bayesian statistics? If not, I'd suggest reading something like http://yudkowsky.net/rational/bayes/ which explains it. There's a big difference between a coin flip and a _correlated_ coin flip.

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

I am not against AUTOSEL in general, as long as the decision to know how far back it is allowed to take a patch is made deterministically and not statistically based on some AI hunch.

Any auto selection for a patch without a Fixes tags can be catastrophic .. imagine a patch without a Fixes Tag with a single line that is fixing some "oops", such patch can be easily applied cleanly to stable- v.x and stable-v.y .. while it fixes the issue on v.x it might have catastrophic results on v.y ..

I tried to imagine such flow and failed to do so. Are you talking about anything specific or imaginary case?

<...>

...

...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

until some patch which shouldn't get backported slips through, believe me this will happen, just give it some time ..

Bugs are inevitable, I don't see many differences between bugs introduced by manually cherry-picking or automatically one.

Of course, it is true if this automatically cherry-picking works as expected and evolving.

...

...
While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

both net and drivers/net are managed by the same maitainer and follow the same rules, can you elaborate on the difference ?

The main reason is a difference in a volume between net and drivers/net. While net/* patches are watched by many eyes and carefully selected to be ported to stable@, most of the drivers/net patches are not.

Except 3-5 the most active drivers, rest of the driver patches almost never asked to be backported.

Thanks

Sasha Levin

1:30 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky wrote:

...

On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote:

...
On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:

...
On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree wrote:

...
Firstly, let me apologise: my previous email was too harsh and too assertiveabout things that were really more uncertain and unclear.

On 14/04/2020 21:57, Sasha Levin wrote:

...
I've pointed out that almost 50% of commits tagged for stable do not have a fixes tag, and yet they are fixes. You really deduce things based on coin flip probability?

Yes, but far less than 50% of commits *not* tagged for stable have a fixes tag. It's not about hard-and-fast Aristotelian "deductions", like "this doesn't have Fixes:, therefore it is not a stable candidate", it's about probabilistic "induction".

...
"it does increase the amount of countervailing evidence needed to conclude a commit is a fix" - Please explain this argument given the above.

Are you familiar with Bayesian statistics? If not, I'd suggest reading something like http://yudkowsky.net/rational/bayes/ which explains it. There's a big difference between a coin flip and a _correlated_ coin flip.

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

I am not against AUTOSEL in general, as long as the decision to know how far back it is allowed to take a patch is made deterministically and not statistically based on some AI hunch.

Any auto selection for a patch without a Fixes tags can be catastrophic .. imagine a patch without a Fixes Tag with a single line that is fixing some "oops", such patch can be easily applied cleanly to stable- v.x and stable-v.y .. while it fixes the issue on v.x it might have catastrophic results on v.y ..

I tried to imagine such flow and failed to do so. Are you talking about anything specific or imaginary case?

It happens, rarely, but it does. However, all the cases I can think of happened with a stable tagged commit without a fixes where it's backport to an older tree caused unintended behavior (local denial of service in one case).

The scenario you have in mind is true for both stable and non-stable tagged patches, so it you want to restrict how we deal with commits that don't have a fixes tag shouldn't it be true for *all* commits?

...

<...>

...
...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

until some patch which shouldn't get backported slips through, believe me this will happen, just give it some time ..

Bugs are inevitable, I don't see many differences between bugs introduced by manually cherry-picking or automatically one.

Oh bugs slip in, that's why I track how many bugs slipped via stable tagged commits vs non-stable tagged ones, and the statistic may surprise you.

The solution here is to beef up your testing infrastructure rather than taking less patches; we still want to have *all* the fixes, right?

...

Of course, it is true if this automatically cherry-picking works as expected and evolving.

...
...
While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

both net and drivers/net are managed by the same maitainer and follow the same rules, can you elaborate on the difference ?

The main reason is a difference in a volume between net and drivers/net. While net/* patches are watched by many eyes and carefully selected to be ported to stable@, most of the drivers/net patches are not.

Except 3-5 the most active drivers, rest of the driver patches almost never asked to be backported.

Right, that's exactly my point: If you're not Mellanox, e1000*, etc you won't see it, but the smaller drivers aren't getting the same handling as the big ones.

I think that we all love the work DaveM does with net/ - it makes our lives a lot easier, and if the same thing would happen with drivers/net/ I'll happily go away and never AUTOSEL a *net* commit, but looking at how our Hyper-V drivers look like it's clearly not there yet.

-- Thanks, Sasha

Saeed Mahameed

7:07 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, 2020-04-16 at 09:30 -0400, Sasha Levin wrote:

...

On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky wrote:

...
On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote:

...
On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:

...
On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree wrote:

...
Firstly, let me apologise: my previous email was too harsh and too assertiveabout things that were really more uncertain and unclear.

On 14/04/2020 21:57, Sasha Levin wrote:

...
I've pointed out that almost 50% of commits tagged for stable do not have a fixes tag, and yet they are fixes. You really deduce things based on coin flip probability?

Yes, but far less than 50% of commits *not* tagged for stable have a fixes tag. It's not about hard-and-fast Aristotelian "deductions", like "this doesn't have Fixes:, therefore it is not a stable candidate", it's about probabilistic "induction".

...
"it does increase the amount of countervailing evidence needed to conclude a commit is a fix" - Please explain this argument given the above.

Are you familiar with Bayesian statistics? If not, I'd suggest reading something like http://yudkowsky.net/rational/bayes/ which explains it. There's a big difference between a coin flip and a _correlated_ coin flip.

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

I am not against AUTOSEL in general, as long as the decision to know how far back it is allowed to take a patch is made deterministically and not statistically based on some AI hunch.

Any auto selection for a patch without a Fixes tags can be catastrophic .. imagine a patch without a Fixes Tag with a single line that is fixing some "oops", such patch can be easily applied cleanly to stable- v.x and stable-v.y .. while it fixes the issue on v.x it might have catastrophic results on v.y ..

I tried to imagine such flow and failed to do so. Are you talking about anything specific or imaginary case?

It happens, rarely, but it does. However, all the cases I can think of happened with a stable tagged commit without a fixes where it's backport to an older tree caused unintended behavior (local denial of service in one case).

The scenario you have in mind is true for both stable and non-stable tagged patches, so it you want to restrict how we deal with commits that don't have a fixes tag shouldn't it be true for *all* commits?

All commits? even the ones without "oops" in them ? where does this stop ? :) We _must_ have a hard and deterministic cut for how far back to take a patch based on a human decision.. unless we are 100% positive autoselection AI can never make a mistake.

Humans are allowed to make mistakes, AI is not.

If a Fixes tag is wrong, then a human will be blamed, and that is perfectly fine, but if we have some statistical model that we know it is going to be wrong 0.001% of the time.. and we still let it run.. then something needs to be done about this.

I know there are benefits to autosel, but overtime, if this is not being audited, many pieces of the kernel will get broken unnoticed until some poor distro decides to upgrade their kernel version.

...

...
<...>

...
...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

until some patch which shouldn't get backported slips through, believe me this will happen, just give it some time ..

Bugs are inevitable, I don't see many differences between bugs introduced by manually cherry-picking or automatically one.

Oh bugs slip in, that's why I track how many bugs slipped via stable tagged commits vs non-stable tagged ones, and the statistic may surprise you.

Statistics do not matter here, what really matters is that there is a possibility of a non-human induced error, this should be a no no. or at least make it an opt-in thing for those who want to take their chances and keep a close eye on it..

...

The solution here is to beef up your testing infrastructure rather than

So please let me opt-in until I beef up my testing infra.

...

taking less patches; we still want to have *all* the fixes, right?

if you can be sure 100% it is the right thing to do, then yes, please don't hesitate to take that patch, even without asking anyone !!

Again, Humans are allowed to make mistakes.. AI is not.

...

...
Of course, it is true if this automatically cherry-picking works as expected and evolving.

...
...
While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

both net and drivers/net are managed by the same maitainer and follow the same rules, can you elaborate on the difference ?

The main reason is a difference in a volume between net and drivers/net. While net/* patches are watched by many eyes and carefully selected to be ported to stable@, most of the drivers/net patches are not.

Except 3-5 the most active drivers, rest of the driver patches almost never asked to be backported.

Right, that's exactly my point: If you're not Mellanox, e1000*, etc you won't see it, but the smaller drivers aren't getting the same handling as the big ones.

I think that we all love the work DaveM does with net/ - it makes our lives a lot easier, and if the same thing would happen with drivers/net/ I'll happily go away and never AUTOSEL a *net* commit, but looking at how our Hyper-V drivers look like it's clearly not there yet.

Sasha Levin

7:58 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 07:07:13PM +0000, Saeed Mahameed wrote:

...

On Thu, 2020-04-16 at 09:30 -0400, Sasha Levin wrote:

...
On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky wrote:

...
On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote:

...
On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:

...
On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree wrote:

...
Firstly, let me apologise: my previous email was too harsh and too assertiveabout things that were really more uncertain and unclear.

On 14/04/2020 21:57, Sasha Levin wrote: > I've pointed out that almost 50% of commits tagged for > stable do > not > have a fixes tag, and yet they are fixes. You really deduce > things based > on coin flip probability? Yes, but far less than 50% of commits *not* tagged for stable have a fixes tag. It's not about hard-and-fast Aristotelian "deductions", like "this doesn't have Fixes:, therefore it is not a stable candidate", it's about probabilistic "induction".

> "it does increase the amount of countervailing evidence > needed to > conclude a commit is a fix" - Please explain this argument > given > the > above. Are you familiar with Bayesian statistics? If not, I'd suggest reading something like http://yudkowsky.net/rational/bayes/ which explains it. There's a big difference between a coin flip and a _correlated_ coin flip.

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

I am not against AUTOSEL in general, as long as the decision to know how far back it is allowed to take a patch is made deterministically and not statistically based on some AI hunch.

Any auto selection for a patch without a Fixes tags can be catastrophic .. imagine a patch without a Fixes Tag with a single line that is fixing some "oops", such patch can be easily applied cleanly to stable- v.x and stable-v.y .. while it fixes the issue on v.x it might have catastrophic results on v.y ..

I tried to imagine such flow and failed to do so. Are you talking about anything specific or imaginary case?

It happens, rarely, but it does. However, all the cases I can think of happened with a stable tagged commit without a fixes where it's backport to an older tree caused unintended behavior (local denial of service in one case).

The scenario you have in mind is true for both stable and non-stable tagged patches, so it you want to restrict how we deal with commits that don't have a fixes tag shouldn't it be true for *all* commits?

All commits? even the ones without "oops" in them ? where does this stop ? :) We _must_ have a hard and deterministic cut for how far back to take a patch based on a human decision.. unless we are 100% positive autoselection AI can never make a mistake.

Humans are allowed to make mistakes, AI is not.

Oh I'm reviewing all patches myself after the bot does it's selection, you can blame me for these screw ups.

...

If a Fixes tag is wrong, then a human will be blamed, and that is perfectly fine, but if we have some statistical model that we know it is going to be wrong 0.001% of the time.. and we still let it run.. then something needs to be done about this.

I know there are benefits to autosel, but overtime, if this is not being audited, many pieces of the kernel will get broken unnoticed until some poor distro decides to upgrade their kernel version.

Quite a few distros are always running on the latest LTS releases, Android isn't that far behind either at this point.

There are actually very few non-LTS users at this point...

...

...
...
<...>

...
...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

until some patch which shouldn't get backported slips through, believe me this will happen, just give it some time ..

Bugs are inevitable, I don't see many differences between bugs introduced by manually cherry-picking or automatically one.

Oh bugs slip in, that's why I track how many bugs slipped via stable tagged commits vs non-stable tagged ones, and the statistic may surprise you.

Statistics do not matter here, what really matters is that there is a possibility of a non-human induced error, this should be a no no. or at least make it an opt-in thing for those who want to take their chances and keep a close eye on it..

Hrm, why? Pretend that the bot is a human sitting somewhere sending mails out, how does it change anything?

...

...
The solution here is to beef up your testing infrastructure rather than

So please let me opt-in until I beef up my testing infra.

Already did :)

...

...
taking less patches; we still want to have *all* the fixes, right?

if you can be sure 100% it is the right thing to do, then yes, please don't hesitate to take that patch, even without asking anyone !!

Again, Humans are allowed to make mistakes.. AI is not.

Again, why?

-- Thanks, Sasha

Saeed Mahameed

9:08 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, 2020-04-16 at 15:58 -0400, Sasha Levin wrote:

...

On Thu, Apr 16, 2020 at 07:07:13PM +0000, Saeed Mahameed wrote:

...
On Thu, 2020-04-16 at 09:30 -0400, Sasha Levin wrote:

...
On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky wrote:

...
On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote:

...
On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:

...
On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree wrote: > Firstly, let me apologise: my previous email was too > harsh > and too > assertiveabout things that were really more uncertain > and > unclear. > > On 14/04/2020 21:57, Sasha Levin wrote: > > I've pointed out that almost 50% of commits tagged for > > stable do > > not > > have a fixes tag, and yet they are fixes. You really > > deduce > > things based > > on coin flip probability? > Yes, but far less than 50% of commits *not* tagged for > stable > have > a fixes > tag. It's not about hard-and-fast Aristotelian > "deductions", like > "this > doesn't have Fixes:, therefore it is not a stable > candidate", it's > about > probabilistic "induction". > > > "it does increase the amount of countervailing evidence > > needed to > > conclude a commit is a fix" - Please explain this > > argument > > given > > the > > above. > Are you familiar with Bayesian statistics? If not, I'd > suggest > reading > something like http://yudkowsky.net/rational/bayes/ > which > explains > it. > There's a big difference between a coin flip and a > _correlated_ > coin flip.

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

I am not against AUTOSEL in general, as long as the decision to know how far back it is allowed to take a patch is made deterministically and not statistically based on some AI hunch.

Any auto selection for a patch without a Fixes tags can be catastrophic .. imagine a patch without a Fixes Tag with a single line that is fixing some "oops", such patch can be easily applied cleanly to stable- v.x and stable-v.y .. while it fixes the issue on v.x it might have catastrophic results on v.y ..

I tried to imagine such flow and failed to do so. Are you talking about anything specific or imaginary case?

It happens, rarely, but it does. However, all the cases I can think of happened with a stable tagged commit without a fixes where it's backport to an older tree caused unintended behavior (local denial of service in one case).

The scenario you have in mind is true for both stable and non- stable tagged patches, so it you want to restrict how we deal with commits that don't have a fixes tag shouldn't it be true for *all* commits?

All commits? even the ones without "oops" in them ? where does this stop ? :) We _must_ have a hard and deterministic cut for how far back to take a patch based on a human decision.. unless we are 100% positive autoselection AI can never make a mistake.

Humans are allowed to make mistakes, AI is not.

Oh I'm reviewing all patches myself after the bot does it's selection, you can blame me for these screw ups.

...
If a Fixes tag is wrong, then a human will be blamed, and that is perfectly fine, but if we have some statistical model that we know it is going to be wrong 0.001% of the time.. and we still let it run.. then something needs to be done about this.

I know there are benefits to autosel, but overtime, if this is not being audited, many pieces of the kernel will get broken unnoticed until some poor distro decides to upgrade their kernel version.

Quite a few distros are always running on the latest LTS releases, Android isn't that far behind either at this point.

There are actually very few non-LTS users at this point...

...
...
...
<...>

...
...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

until some patch which shouldn't get backported slips through, believe me this will happen, just give it some time ..

Bugs are inevitable, I don't see many differences between bugs introduced by manually cherry-picking or automatically one.

Oh bugs slip in, that's why I track how many bugs slipped via stable tagged commits vs non-stable tagged ones, and the statistic may surprise you.

Statistics do not matter here, what really matters is that there is a possibility of a non-human induced error, this should be a no no. or at least make it an opt-in thing for those who want to take their chances and keep a close eye on it..

Hrm, why? Pretend that the bot is a human sitting somewhere sending mails out, how does it change anything?

If i know a bot might do something wrong, i Fix it and make sure it will never do it again. For humans i just can't do that, can I ? :) so this is the difference and why we all have jobs ..

...

...
...
The solution here is to beef up your testing infrastructure rather than

So please let me opt-in until I beef up my testing infra.

Already did :)

No you didn't :), I received more than 5 AUTOSEL emails only today and yesterday.

Please don't opt mlx5 out just yet ;-), i need to do some more research and make up my mind..

...

...
...
taking less patches; we still want to have *all* the fixes, right?

if you can be sure 100% it is the right thing to do, then yes, please don't hesitate to take that patch, even without asking anyone !!

Again, Humans are allowed to make mistakes.. AI is not.

Again, why?

Because AI is not there yet.. and this is a very big philosophical question.

Let me simplify: there is a bug in the AI, where it can choose a wrong patch, let's fix it.

gregkh＠linuxfoundation.org

17 Apr 17 Apr

8:28 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 09:08:06PM +0000, Saeed Mahameed wrote:

...

On Thu, 2020-04-16 at 15:58 -0400, Sasha Levin wrote:

...
On Thu, Apr 16, 2020 at 07:07:13PM +0000, Saeed Mahameed wrote:

...
On Thu, 2020-04-16 at 09:30 -0400, Sasha Levin wrote:

...
On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky wrote:

...
On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote:

...
On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote: > On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree > wrote: > > Firstly, let me apologise: my previous email was too > > harsh > > and too > > assertiveabout things that were really more uncertain > > and > > unclear. > > > > On 14/04/2020 21:57, Sasha Levin wrote: > > > I've pointed out that almost 50% of commits tagged for > > > stable do > > > not > > > have a fixes tag, and yet they are fixes. You really > > > deduce > > > things based > > > on coin flip probability? > > Yes, but far less than 50% of commits *not* tagged for > > stable > > have > > a fixes > > tag. It's not about hard-and-fast Aristotelian > > "deductions", like > > "this > > doesn't have Fixes:, therefore it is not a stable > > candidate", it's > > about > > probabilistic "induction". > > > > > "it does increase the amount of countervailing evidence > > > needed to > > > conclude a commit is a fix" - Please explain this > > > argument > > > given > > > the > > > above. > > Are you familiar with Bayesian statistics? If not, I'd > > suggest > > reading > > something like http://yudkowsky.net/rational/bayes/ > > which > > explains > > it. > > There's a big difference between a coin flip and a > > _correlated_ > > coin flip. > > I'd maybe point out that the selection process is based on > a > neural > network which knows about the existence of a Fixes tag in a > commit. > > It does exactly what you're describing, but also taking a > bunch > more > factors into it's desicion process ("panic"? "oops"? > "overflow"? > etc). >

I am not against AUTOSEL in general, as long as the decision to know how far back it is allowed to take a patch is made deterministically and not statistically based on some AI hunch.

Any auto selection for a patch without a Fixes tags can be catastrophic .. imagine a patch without a Fixes Tag with a single line that is fixing some "oops", such patch can be easily applied cleanly to stable- v.x and stable-v.y .. while it fixes the issue on v.x it might have catastrophic results on v.y ..

I tried to imagine such flow and failed to do so. Are you talking about anything specific or imaginary case?

It happens, rarely, but it does. However, all the cases I can think of happened with a stable tagged commit without a fixes where it's backport to an older tree caused unintended behavior (local denial of service in one case).

The scenario you have in mind is true for both stable and non- stable tagged patches, so it you want to restrict how we deal with commits that don't have a fixes tag shouldn't it be true for *all* commits?

All commits? even the ones without "oops" in them ? where does this stop ? :) We _must_ have a hard and deterministic cut for how far back to take a patch based on a human decision.. unless we are 100% positive autoselection AI can never make a mistake.

Humans are allowed to make mistakes, AI is not.

Oh I'm reviewing all patches myself after the bot does it's selection, you can blame me for these screw ups.

...
If a Fixes tag is wrong, then a human will be blamed, and that is perfectly fine, but if we have some statistical model that we know it is going to be wrong 0.001% of the time.. and we still let it run.. then something needs to be done about this.

I know there are benefits to autosel, but overtime, if this is not being audited, many pieces of the kernel will get broken unnoticed until some poor distro decides to upgrade their kernel version.

Quite a few distros are always running on the latest LTS releases, Android isn't that far behind either at this point.

There are actually very few non-LTS users at this point...

...
...
...
<...>

...
> Let me put my Microsoft employee hat on here. We have > driver/net/hyperv/ > which definitely wasn't getting all the fixes it should > have > been > getting without AUTOSEL. >

until some patch which shouldn't get backported slips through, believe me this will happen, just give it some time ..

Bugs are inevitable, I don't see many differences between bugs introduced by manually cherry-picking or automatically one.

Oh bugs slip in, that's why I track how many bugs slipped via stable tagged commits vs non-stable tagged ones, and the statistic may surprise you.

Statistics do not matter here, what really matters is that there is a possibility of a non-human induced error, this should be a no no. or at least make it an opt-in thing for those who want to take their chances and keep a close eye on it..

Hrm, why? Pretend that the bot is a human sitting somewhere sending mails out, how does it change anything?

If i know a bot might do something wrong, i Fix it and make sure it will never do it again. For humans i just can't do that, can I ? :) so this is the difference and why we all have jobs ..

...
...
...
The solution here is to beef up your testing infrastructure rather than

So please let me opt-in until I beef up my testing infra.

Already did :)

No you didn't :), I received more than 5 AUTOSEL emails only today and yesterday.

Please don't opt mlx5 out just yet ;-), i need to do some more research and make up my mind..

...
...
...
taking less patches; we still want to have *all* the fixes, right?

if you can be sure 100% it is the right thing to do, then yes, please don't hesitate to take that patch, even without asking anyone !!

Again, Humans are allowed to make mistakes.. AI is not.

Again, why?

Because AI is not there yet.. and this is a very big philosophical question.

Let me simplify: there is a bug in the AI, where it can choose a wrong patch, let's fix it.

You do realize that there are at least 2 steps in this "AI" where people are involved. The first is when Sasha goes thorough the patches and weeds out all of the "bad ones".

The second is when you, the maintainer, is asked if you think there is a problem if the patch is to be merged.

Then there's also the third, when again, I send out emails for the -rc process with the patches involved, and you are cc:ed on it.

This isn't an unchecked process here running with no human checks at all in it, so please don't speak of it like it is.

thanks,

greg k-h

Saeed Mahameed

10:23 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Fri, 2020-04-17 at 10:28 +0200, gregkh@linuxfoundation.org wrote:

...

On Thu, Apr 16, 2020 at 09:08:06PM +0000, Saeed Mahameed wrote:

...
On Thu, 2020-04-16 at 15:58 -0400, Sasha Levin wrote:

...
On Thu, Apr 16, 2020 at 07:07:13PM +0000, Saeed Mahameed wrote:

...
On Thu, 2020-04-16 at 09:30 -0400, Sasha Levin wrote:

...
On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky wrote:

...
On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote: > On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote: > > On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree > > wrote: > > > Firstly, let me apologise: my previous email was too > > > harsh > > > and too > > > assertiveabout things that were really more > > > uncertain > > > and > > > unclear. > > > > > > On 14/04/2020 21:57, Sasha Levin wrote: > > > > I've pointed out that almost 50% of commits tagged > > > > for > > > > stable do > > > > not > > > > have a fixes tag, and yet they are fixes. You > > > > really > > > > deduce > > > > things based > > > > on coin flip probability? > > > Yes, but far less than 50% of commits *not* tagged > > > for > > > stable > > > have > > > a fixes > > > tag. It's not about hard-and-fast Aristotelian > > > "deductions", like > > > "this > > > doesn't have Fixes:, therefore it is not a stable > > > candidate", it's > > > about > > > probabilistic "induction". > > > > > > > "it does increase the amount of countervailing > > > > evidence > > > > needed to > > > > conclude a commit is a fix" - Please explain this > > > > argument > > > > given > > > > the > > > > above. > > > Are you familiar with Bayesian statistics? If not, > > > I'd > > > suggest > > > reading > > > something like http://yudkowsky.net/rational/bayes/ > > > which > > > explains > > > it. > > > There's a big difference between a coin flip and a > > > _correlated_ > > > coin flip. > > > > I'd maybe point out that the selection process is based > > on > > a > > neural > > network which knows about the existence of a Fixes tag > > in a > > commit. > > > > It does exactly what you're describing, but also taking > > a > > bunch > > more > > factors into it's desicion process ("panic"? "oops"? > > "overflow"? > > etc). > > > > I am not against AUTOSEL in general, as long as the > decision > to > know > how far back it is allowed to take a patch is made > deterministically > and not statistically based on some AI hunch. > > Any auto selection for a patch without a Fixes tags can > be > catastrophic > .. imagine a patch without a Fixes Tag with a single line > that is > fixing some "oops", such patch can be easily applied > cleanly > to > stable- > v.x and stable-v.y .. while it fixes the issue on v.x it > might > have > catastrophic results on v.y ..

I tried to imagine such flow and failed to do so. Are you talking about anything specific or imaginary case?

It happens, rarely, but it does. However, all the cases I can think of happened with a stable tagged commit without a fixes where it's backport to an older tree caused unintended behavior (local denial of service in one case).

The scenario you have in mind is true for both stable and non- stable tagged patches, so it you want to restrict how we deal with commits that don't have a fixes tag shouldn't it be true for *all* commits?

All commits? even the ones without "oops" in them ? where does this stop ? :) We _must_ have a hard and deterministic cut for how far back to take a patch based on a human decision.. unless we are 100% positive autoselection AI can never make a mistake.

Humans are allowed to make mistakes, AI is not.

Oh I'm reviewing all patches myself after the bot does it's selection, you can blame me for these screw ups.

...
If a Fixes tag is wrong, then a human will be blamed, and that is perfectly fine, but if we have some statistical model that we know it is going to be wrong 0.001% of the time.. and we still let it run.. then something needs to be done about this.

I know there are benefits to autosel, but overtime, if this is not being audited, many pieces of the kernel will get broken unnoticed until some poor distro decides to upgrade their kernel version.

Quite a few distros are always running on the latest LTS releases, Android isn't that far behind either at this point.

There are actually very few non-LTS users at this point...

...
...
...
<...> > > Let me put my Microsoft employee hat on here. We have > > driver/net/hyperv/ > > which definitely wasn't getting all the fixes it should > > have > > been > > getting without AUTOSEL. > > > > until some patch which shouldn't get backported slips > through, > believe > me this will happen, just give it some time ..

Bugs are inevitable, I don't see many differences between bugs introduced by manually cherry-picking or automatically one.

Oh bugs slip in, that's why I track how many bugs slipped via stable tagged commits vs non-stable tagged ones, and the statistic may surprise you.

Statistics do not matter here, what really matters is that there is a possibility of a non-human induced error, this should be a no no. or at least make it an opt-in thing for those who want to take their chances and keep a close eye on it..

Hrm, why? Pretend that the bot is a human sitting somewhere sending mails out, how does it change anything?

If i know a bot might do something wrong, i Fix it and make sure it will never do it again. For humans i just can't do that, can I ? :) so this is the difference and why we all have jobs ..

...
...
...
The solution here is to beef up your testing infrastructure rather than

So please let me opt-in until I beef up my testing infra.

Already did :)

No you didn't :), I received more than 5 AUTOSEL emails only today and yesterday.

Please don't opt mlx5 out just yet ;-), i need to do some more research and make up my mind..

...
...
...
taking less patches; we still want to have *all* the fixes, right?

if you can be sure 100% it is the right thing to do, then yes, please don't hesitate to take that patch, even without asking anyone !!

Again, Humans are allowed to make mistakes.. AI is not.

Again, why?

Because AI is not there yet.. and this is a very big philosophical question.

Let me simplify: there is a bug in the AI, where it can choose a wrong patch, let's fix it.

You do realize that there are at least 2 steps in this "AI" where people are involved. The first is when Sasha goes thorough the patches and weeds out all of the "bad ones".

The second is when you, the maintainer, is asked if you think there is a problem if the patch is to be merged.

Then there's also the third, when again, I send out emails for the -rc process with the patches involved, and you are cc:ed on it.

This isn't an unchecked process here running with no human checks at all in it, so please don't speak of it like it is.

Sure I understand,

But with all do respect to Sasha and i know he is doing a great job, he just can't sign-off on all of the patches on all of the linux kernel and determine just by himself if a patch is good or not.. and the maintainer review is what actually matters here.

But the maintainer ack is an optional thing, and I bet that the vast majority don't even look at these e-mails.

My vision is that we make this an opt-in thing, and we somehow force all active and important kernel subsystems to opt-in, and make it the maintainer responsibility if something goes wrong.

I understand from your statistics that this system is working very well, so i believe eventually every maintainer with a code that matters will come on board.

this way we don't risk it for inactive and less important subsystems/drivers.. and we guarantee the whole thing is properly audited with the maintainers on-board..

gregkh＠linuxfoundation.org

18 Apr 18 Apr

10:51 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Fri, Apr 17, 2020 at 10:23:37PM +0000, Saeed Mahameed wrote:

...

On Fri, 2020-04-17 at 10:28 +0200, gregkh@linuxfoundation.org wrote:

...
...
Let me simplify: there is a bug in the AI, where it can choose a wrong patch, let's fix it.

You do realize that there are at least 2 steps in this "AI" where people are involved. The first is when Sasha goes thorough the patches and weeds out all of the "bad ones".

The second is when you, the maintainer, is asked if you think there is a problem if the patch is to be merged.

Then there's also the third, when again, I send out emails for the -rc process with the patches involved, and you are cc:ed on it.

This isn't an unchecked process here running with no human checks at all in it, so please don't speak of it like it is.

Sure I understand,

But with all do respect to Sasha and i know he is doing a great job, he just can't sign-off on all of the patches on all of the linux kernel and determine just by himself if a patch is good or not.. and the maintainer review is what actually matters here.

The maintainer review already happened when the patch went into Linus's tree.

...

But the maintainer ack is an optional thing, and I bet that the vast majority don't even look at these e-mails.

That is true.

...

My vision is that we make this an opt-in thing, and we somehow force all active and important kernel subsystems to opt-in, and make it the maintainer responsibility if something goes wrong.

That's a nice vision, and I too want a pony :)

Seriously, the first rule of the stable trees being created was that it was not going to cause any extra work for a maintainer to do, given that even 18 years ago, our maintainers were overloaded. Now that didn't totally happen, as I do ask for a cc: stable line to be added to patches to give me a hint as to what to apply.

Now some maintainers really don't care about stable trees, and don't even put those lines, which is fine for them, but not fine for me in wanting to make stable trees that contain the needed fixes for them that are going into Linus's tree. So over the years we have come up with tools to dig these patches out of Linus's tree. The latest version of that is this AUTOSEL tool, and after a number of maintainers complained that being notified at the last possible second (i.e. during a -rc cycle) was not early enough to stop some AUTOSEL patches from being merged, Sasha started up the process you see now, giving maintainers a few _weeks_ to object.

For the maintainers that don't care, fine, they just write a nice procmail rule and send the email to the round filing cabinet. For the maintainers that do care, they get a chance to object. For the maintainers that insist they are doing this all right and marking things correctly and don't want AUTOSEL running on their subsystems, they too have that option, which a few have already taken.

So that's where we are today, a process that has evolved over the decades into the one that at the moment, is producing pretty solid and good stable kernels as per the review of external parties. Yes, we can do better and find more patches that need to be backported, and are working on that. But to stop and try to go to an opt-in-only process would cause us to go backwards in the ability for us to provide kernels with useful bugfixes to users.

...

I understand from your statistics that this system is working very well, so i believe eventually every maintainer with a code that matters will come on board.

Sorry, that just will not happen. As proof of that, look at all of the maintainers today that are not "on board" with just a simple "cc: stable". And as I say above, that's fine, I am not going to ask them to do extra work that they do not want to do. And because of that, I, and others like Sasha, are going to have to do _extra_ work to make a better stable kernel release, and that's fine, we are the crazy ones here.

...

this way we don't risk it for inactive and less important subsystems/drivers.. and we guarantee the whole thing is properly audited with the maintainers on-board..

Have you met these maintainers that you can tell what to do despite not being their manager? If you think you can get them on board, then please, try to get them to do the simple thing first (cc: stable) and then we can talk :)

good luck!

greg k-h

Sasha Levin

17 Apr 17 Apr

1:21 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 09:08:06PM +0000, Saeed Mahameed wrote:

...

On Thu, 2020-04-16 at 15:58 -0400, Sasha Levin wrote:

...
Hrm, why? Pretend that the bot is a human sitting somewhere sending mails out, how does it change anything?

If i know a bot might do something wrong, i Fix it and make sure it will never do it again. For humans i just can't do that, can I ? :) so this is the difference and why we all have jobs ..

It's tricky because there's no one true value here. Humans are constantly wrong about whether a patch is a fix or not, so how can I train my bot to be 100% right?

...

...
...
...
The solution here is to beef up your testing infrastructure rather than

So please let me opt-in until I beef up my testing infra.

Already did :)

No you didn't :), I received more than 5 AUTOSEL emails only today and yesterday.

Appologies, this is just a result of how my process goes - patch selection happened a few days ago (which is when blacklists are applied), it's been running through my tests since, and mails get sent out only after tests.

...

Please don't opt mlx5 out just yet ;-), i need to do some more research and make up my mind..

Alrighty. Keep in mind you can always reply with just a "no" to AUTOSEL mails, you don't have to explain why you don't want it included to keep it easy.

...

...
...
...
taking less patches; we still want to have *all* the fixes, right?

if you can be sure 100% it is the right thing to do, then yes, please don't hesitate to take that patch, even without asking anyone !!

Again, Humans are allowed to make mistakes.. AI is not.

Again, why?

Because AI is not there yet.. and this is a very big philosophical question.

Let me simplify: there is a bug in the AI, where it can choose a wrong patch, let's fix it.

But we don't know if it's wrong or not, so how can we teach it to be 100% right?

I keep retraining the NN based on previous results which improves it's accuracy, but it'll never be 100%.

The NN claims we're at ~95% with regards to past results.

-- Thanks, Sasha

Saeed Mahameed

10:38 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Fri, 2020-04-17 at 09:21 -0400, Sasha Levin wrote:

...

On Thu, Apr 16, 2020 at 09:08:06PM +0000, Saeed Mahameed wrote:

...
On Thu, 2020-04-16 at 15:58 -0400, Sasha Levin wrote:

...
Hrm, why? Pretend that the bot is a human sitting somewhere sending mails out, how does it change anything?

If i know a bot might do something wrong, i Fix it and make sure it will never do it again. For humans i just can't do that, can I ? :) so this is the difference and why we all have jobs ..

It's tricky because there's no one true value here. Humans are constantly wrong about whether a patch is a fix or not, so how can I train my bot to be 100% right?

...
...
...
...
The solution here is to beef up your testing infrastructure rather than

So please let me opt-in until I beef up my testing infra.

Already did :)

No you didn't :), I received more than 5 AUTOSEL emails only today and yesterday.

Appologies, this is just a result of how my process goes - patch selection happened a few days ago (which is when blacklists are applied), it's been running through my tests since, and mails get sent out only after tests.

No worries, as you see i am not really against this AI .. i am just worried about it being an opt-out thing :)

...

...
Please don't opt mlx5 out just yet ;-), i need to do some more research and make up my mind..

Alrighty. Keep in mind you can always reply with just a "no" to AUTOSEL mails, you don't have to explain why you don't want it included to keep it easy.

Sure ! thanks .

...

...
...
...
...
taking less patches; we still want to have *all* the fixes, right?

if you can be sure 100% it is the right thing to do, then yes, please don't hesitate to take that patch, even without asking anyone !!

Again, Humans are allowed to make mistakes.. AI is not.

Again, why?

Because AI is not there yet.. and this is a very big philosophical question.

Let me simplify: there is a bug in the AI, where it can choose a wrong patch, let's fix it.

But we don't know if it's wrong or not, so how can we teach it to be 100% right?

I keep retraining the NN based on previous results which improves it's accuracy, but it'll never be 100%.

The NN claims we're at ~95% with regards to past results.

I didn't really mean for you to fix it..

I am just against using un-audited AI. because i know it can never reach 100%.

Just out of curiosity : what are these 5% failure rate, what types of failures ? how are they identified and how are they feedback into the NN re-training ?

Or Gerlitz

16 Apr 16 Apr

1:40 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 3:00 AM Sasha Levin sashal@kernel.org wrote:

...

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

As Saeed commented, every extra line in stable / production kernel is wrong. IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

We've been coaching new comers for years during internal and on-list code reviews to put proper fixes tag. This serves (A) for the upstream human review of the patch and (B) reasonable human stable considerations.

You are practically saying that for cases we screwed up stage (A) you can somehow still get away with good results on stage (B) - I don't accept it. BTW - during my reviews I tend to ask/require developers to skip the word panic, and instead better explain the nature of the problem / result.

...

...
...
This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

...

...
drivers/net/ goes through the same DaveM net/net-next trees, with the same rules.

you ignored this comment, any more specific complaints?

...

Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

...

While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

I never [1] saw -stable push requests being ignored here in netdev. Your drivers have four listed maintainers and it's common habit by commercial companies to have paid && human (non autosel robots) maintainers that take care of their open source drivers. As in commercial SW products, Linux has a current, next and past (stable) releases, so something sounds as missing to me in your care matrix.

[1] actually I do remember that once or twice out of the 2020 times we asked, a patch was not sent to -stable by the sub-system maintainer mistake which he fixed(..) later

Sasha Levin

2:04 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 04:40:31PM +0300, Or Gerlitz wrote:

...

On Thu, Apr 16, 2020 at 3:00 AM Sasha Levin sashal@kernel.org wrote:

...
I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

As Saeed commented, every extra line in stable / production kernel is wrong. IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Sure, here are 5 from the past few months:

e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss descriptor") 3d94a4a8373b ("mwifiex: fix possible heap overflow in mwifiex_process_country_ie()") 39d170b3cb62 ("ath6kl: fix a NULL-ptr-deref bug in ath6kl_usb_alloc_urb_from_pipe()") 8b51dc729147 ("rsi: fix a double free bug in rsi_91x_deinit()") 5146f95df782 ("USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data")

5 Different drivers, neither has a stable tag or a Fixes: tag, all 5 have a CVE number assigned to them.

...

We've been coaching new comers for years during internal and on-list code reviews to put proper fixes tag. This serves (A) for the upstream human review of the patch and (B) reasonable human stable considerations.

Thanks for doing it - we do see more and more fixes tags.

...

You are practically saying that for cases we screwed up stage (A) you can somehow still get away with good results on stage (B) - I don't accept it. BTW - during my reviews I tend to ask/require developers to skip the word panic, and instead better explain the nature of the problem / result.

Humans are still humans, and humans make mistakes. Fixes tags get forgotten, stable tags get forgotten.

I very much belive you that the mellanox stuff are in good shape thanks to your efforts, but the kernel world is bigger than a few select drivers.

...

...
...
...
This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

...
...
drivers/net/ goes through the same DaveM net/net-next trees, with the same rules.

you ignored this comment, any more specific complaints?

See above (the example commits). The drivers/net/ stuff don't work as well as net/ sadly.

...

...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

...
While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

I never [1] saw -stable push requests being ignored here in netdev. Your drivers have four listed maintainers and it's common habit by commercial companies to have paid && human (non autosel robots) maintainers that take care of their open source drivers. As in commercial SW products, Linux has a current, next and past (stable) releases, so something sounds as missing to me in your care matrix.

How come? DaveM is specifically asking not to add stable tags because he will do the selection himself, right? So the hyperv stuff indeed don't include a stable tag, but all fixes should have a proper Fixes: tag.

So why don't they end up in a stable tree?

If we need to send a mail saying which commits should go to stable, we might as well tag them for stable to begin with, right?

...

[1] actually I do remember that once or twice out of the 2020 times we asked, a patch was not sent to -stable by the sub-system maintainer mistake which he fixed(..) later

-- Thanks, Sasha

Or Gerlitz

2:17 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 5:04 PM Sasha Levin sashal@kernel.org wrote:

...

On Thu, Apr 16, 2020 at 04:40:31PM +0300, Or Gerlitz wrote:

...
On Thu, Apr 16, 2020 at 3:00 AM Sasha Levin sashal@kernel.org wrote:

...
I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

As Saeed commented, every extra line in stable / production kernel is wrong. IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Sure, here are 5 from the past few months:

e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss descriptor") 3d94a4a8373b ("mwifiex: fix possible heap overflow in mwifiex_process_country_ie()") 39d170b3cb62 ("ath6kl: fix a NULL-ptr-deref bug in ath6kl_usb_alloc_urb_from_pipe()") 8b51dc729147 ("rsi: fix a double free bug in rsi_91x_deinit()") 5146f95df782 ("USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data")

5 Different drivers, neither has a stable tag or a Fixes: tag, all 5 have a CVE number assigned to them.

CVE number sounds good enough to me to AI around and pull to stable, BUT only to where relevant, and nothing beyond (see below).

...

...
We've been coaching new comers for years during internal and on-list code reviews to put proper fixes tag. This serves (A) for the upstream human review of the patch and (B) reasonable human stable considerations.

Thanks for doing it - we do see more and more fixes tags.

...
You are practically saying that for cases we screwed up stage (A) you can somehow still get away with good results on stage (B) - I don't accept it. BTW - during my reviews I tend to ask/require developers to skip the word panic, and instead better explain the nature of the problem / result.

Humans are still humans, and humans make mistakes. Fixes tags get forgotten

In the netdev land, the very much usual habit is for -rc patches to have Fixes tag, so maybe some RCA comment here would be to enforce that better across the kermel land?

...

stable tags get forgotten.

An easy AI would be to deduce the -stable tag from the -fixes tag, just find out where the offending patch was accepted and never/ever (please!) push a fix beyond that kernel.

...

I very much belive you that the mellanox stuff are in good shape thanks to your efforts, but the kernel world is bigger than a few select drivers.

...

...
...
...
...
This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

...

...
...
...
drivers/net/ goes through the same DaveM net/net-next trees, with the same rules.

...

...
you ignored this comment, any more specific complaints?

...

See above (the example commits). The drivers/net/ stuff don't work as well as net/ sadly.

Disagree.

If there is a problem it is due to some driver/net/ driver maintainers not doing their job good enough.

...

...
...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

...
While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

I never [1] saw -stable push requests being ignored here in netdev. Your drivers have four listed maintainers and it's common habit by commercial companies to have paid && human (non autosel robots) maintainers that take care of their open source drivers. As in commercial SW products, Linux has a current, next and past (stable) releases, so something sounds as missing to me in your care matrix.

How come? DaveM is specifically asking not to add stable tags because he will do the selection himself, right? So the hyperv stuff indeed don't include a stable tag, but all fixes should have a proper Fixes: tag.

Ask/tell your maintainer to note that in their cover letters, it will be taken.

...

So why don't they end up in a stable tree?

...

If we need to send a mail saying which commits should go to stable, we might as well tag them for stable to begin with, right?

so how about you go and argue with the netdev maintainer and settle your complaints instead of WA your disagreements using autosel?

Sasha Levin

2:36 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 05:17:09PM +0300, Or Gerlitz wrote:

...

On Thu, Apr 16, 2020 at 5:04 PM Sasha Levin sashal@kernel.org wrote:

...
On Thu, Apr 16, 2020 at 04:40:31PM +0300, Or Gerlitz wrote:

...
On Thu, Apr 16, 2020 at 3:00 AM Sasha Levin sashal@kernel.org wrote:

...
I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

As Saeed commented, every extra line in stable / production kernel is wrong. IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Sure, here are 5 from the past few months:

e5e884b42639 ("libertas: Fix two buffer overflows at parsing bss descriptor") 3d94a4a8373b ("mwifiex: fix possible heap overflow in mwifiex_process_country_ie()") 39d170b3cb62 ("ath6kl: fix a NULL-ptr-deref bug in ath6kl_usb_alloc_urb_from_pipe()") 8b51dc729147 ("rsi: fix a double free bug in rsi_91x_deinit()") 5146f95df782 ("USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data")

5 Different drivers, neither has a stable tag or a Fixes: tag, all 5 have a CVE number assigned to them.

CVE number sounds good enough to me to AI around and pull to stable, BUT only to where relevant, and nothing beyond (see below).

But this is true with so many other things... This example was easy because I could grep for CVE, but what about commits that don't get assigned a CVE number but are just as severe?

Here are few without a CVE ID:

f700546682a6 ("rsi: fix nommu_map_sg overflow kernel panic") 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385 workaround") 536dc5df2808 ("hv_netvsc: Fix memory leak when removing rndis device")

Could we agree here that important commits from drivers/net/ don't have a fixes tag but still should end up in stable?

...

...
...
We've been coaching new comers for years during internal and on-list code reviews to put proper fixes tag. This serves (A) for the upstream human review of the patch and (B) reasonable human stable considerations.

Thanks for doing it - we do see more and more fixes tags.

...
You are practically saying that for cases we screwed up stage (A) you can somehow still get away with good results on stage (B) - I don't accept it. BTW - during my reviews I tend to ask/require developers to skip the word panic, and instead better explain the nature of the problem / result.

Humans are still humans, and humans make mistakes. Fixes tags get forgotten

In the netdev land, the very much usual habit is for -rc patches to have Fixes tag, so maybe some RCA comment here would be to enforce that better across the kermel land?

I'd be very happy to see better tagging (not just fixes, but also reported-by/tested-by/etc).

...

...
stable tags get forgotten.

An easy AI would be to deduce the -stable tag from the -fixes tag, just find out where the offending patch was accepted and never/ever (please!) push a fix beyond that kernel.

This is tricky because not all commits with a fixes tag need to go to stable, right?

We have a bunch of commits that "fix" a documentation typo for example.

...

...
I very much belive you that the mellanox stuff are in good shape thanks to your efforts, but the kernel world is bigger than a few select drivers.

...
...
...
...
...
This is great, but the kernel is more than just net/. Note that I also do not look at net/ itself, but rather drivers/net/ as those end up with a bunch of missed fixes.

...
...
...
...
drivers/net/ goes through the same DaveM net/net-next trees, with the same rules.

...
...
you ignored this comment, any more specific complaints?

...
See above (the example commits). The drivers/net/ stuff don't work as well as net/ sadly.

Disagree.

If there is a problem it is due to some driver/net/ driver maintainers not doing their job good enough.

Maybe it's just me missing something here, but I thought that drivers/net/ aren't supposed to add a stable tag, they're just supposed to write good commit message, add a fixes tag if necessary, and the netdev maintainers will do the rest?

""" Q: How can I tell what patches are queued up for backporting to the various stable releases? A: Normally Greg Kroah-Hartman collects stable commits himself, but for networking, Dave collects up patches he deems critical for the networking subsystem, and then hands them off to Greg. """

There are 2 things I understood from this:

1. "networking subsystem" == net/ 2. Dave does patch selection.

...

...
...
...
Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

...
While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

I never [1] saw -stable push requests being ignored here in netdev. Your drivers have four listed maintainers and it's common habit by commercial companies to have paid && human (non autosel robots) maintainers that take care of their open source drivers. As in commercial SW products, Linux has a current, next and past (stable) releases, so something sounds as missing to me in your care matrix.

How come? DaveM is specifically asking not to add stable tags because he will do the selection himself, right? So the hyperv stuff indeed don't include a stable tag, but all fixes should have a proper Fixes: tag.

Ask/tell your maintainer to note that in their cover letters, it will be taken.

Or allow them to tag stuff for stable? :)

...

...
So why don't they end up in a stable tree?

...
If we need to send a mail saying which commits should go to stable, we might as well tag them for stable to begin with, right?

so how about you go and argue with the netdev maintainer and settle your complaints instead of WA your disagreements using autosel?

So we had this discussion a while back and what I thought the outcome was that I'll ignore net/ and so I did.

I've mentioned it before - I'm not stuck on doing what I'm doing now for no reason. If the netdev maintainers want this done in a different way, or if you can show me data that what I'm doing is wrong, I'll be happy to change my workflow.

-- Thanks, Sasha

Greg KH

5:20 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 04:40:31PM +0300, Or Gerlitz wrote:

...

On Thu, Apr 16, 2020 at 3:00 AM Sasha Levin sashal@kernel.org wrote:

...
I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

As Saeed commented, every extra line in stable / production kernel is wrong.

What? On what do you base that crazy statement on? I have 18+ years of direct experience of that being the exact opposite.

...

IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Oh wow, where do you want me to start. I have zillions of these.

But wait, don't trust me, trust a 3rd party. Here's what Google's security team said about the last 9 months of 2019: - 209 known vulnerabilities patched in LTS kernels, most without CVEs - 950+ criticial non-security bugs fixes for device XXXX alone with LTS releases

...

We've been coaching new comers for years during internal and on-list code reviews to put proper fixes tag. This serves (A) for the upstream human review of the patch and (B) reasonable human stable considerations.

If your driver/subsystem is doing this, wonderful, just opt-out of the autosel process and you will never be bothered again.

But, trust me, I think I know a bit about tagging stuff for stable kernels, and yet the AUTOSEL tool keeps finding patches that _I_ forgot to tag as such. So, don't be so sure of yourself, it's humbling :)

Let the AUTOSEL tool run, and if it finds things you don't agree with, a simple "No, please do not include this" email is all you need to do to keep it out of a stable kernel.

So far the AUTOSEL tool has found so many real bugfixes that it isn't funny. If you don't like it, fine, but it has proven itself _way_ beyond my wildest hopes already, and it just keeps getting better.

greg k-h

Saeed Mahameed

7:31 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, 2020-04-16 at 19:20 +0200, Greg KH wrote:

...

On Thu, Apr 16, 2020 at 04:40:31PM +0300, Or Gerlitz wrote:

...
On Thu, Apr 16, 2020 at 3:00 AM Sasha Levin sashal@kernel.org wrote:

...
I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

As Saeed commented, every extra line in stable / production kernel is wrong.

What? On what do you base that crazy statement on? I have 18+ years of direct experience of that being the exact opposite.

Oh, I never said such a thing .. :(

I think Or meant to say: every extra line that no one asked for.

And all i wanted to say is that it can have a catastrophic result.. I know in many cases it is working well, and i didn't say it is wrong, I am just worried about it and wanted to show an example of how it can screw up under the radar with a simple single liner patch..

...

...
IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Oh wow, where do you want me to start. I have zillions of these.

But wait, don't trust me, trust a 3rd party. Here's what Google's security team said about the last 9 months of 2019:

209 known vulnerabilities patched in LTS kernels, most

without CVEs

950+ criticial non-security bugs fixes for device XXXX alone with LTS releases

So opt-in for these critical or _always_ in use basic kernel sections. but make the default opt-out..

...

...
We've been coaching new comers for years during internal and on- list code reviews to put proper fixes tag. This serves (A) for the upstream human review of the patch and (B) reasonable human stable considerations.

If your driver/subsystem is doing this, wonderful, just opt-out of the autosel process and you will never be bothered again.

There are many legacy devices in the kernel that are not well maintained and being rarely fixed from random users.. if a fix will be picked up to the wrong kernel, it can go unnoticed for years..

...

But, trust me, I think I know a bit about tagging stuff for stable kernels, and yet the AUTOSEL tool keeps finding patches that _I_ forgot to tag as such. So, don't be so sure of yourself, it's humbling :)

Let the AUTOSEL tool run, and if it finds things you don't agree with, a simple "No, please do not include this" email is all you need to do to keep it out of a stable kernel.

So far the AUTOSEL tool has found so many real bugfixes that it isn't funny. If you don't like it, fine, but it has proven itself _way_ beyond my wildest hopes already, and it just keeps getting better.

Now i really don't know what the right balance here, in on one hand, autosel is doing a great job, on the other hand we know it can screw up in some cases, and we know it will.

So we decided to make sacrifices for the greater good ? :)

...

greg k-h

Sasha Levin

7:53 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 07:31:25PM +0000, Saeed Mahameed wrote:

...

On Thu, 2020-04-16 at 19:20 +0200, Greg KH wrote:

...
So far the AUTOSEL tool has found so many real bugfixes that it isn't funny. If you don't like it, fine, but it has proven itself _way_ beyond my wildest hopes already, and it just keeps getting better.

Now i really don't know what the right balance here, in on one hand, autosel is doing a great job, on the other hand we know it can screw up in some cases, and we know it will.

So we decided to make sacrifices for the greater good ? :)

autosel is going to screw up, I'm going to screw up, you're going to screw up, and Linus is going screw up. The existence of the stable trees and a "Fixes:" tag is an admission we all screw up, right?

If you're willing to accept that we all make mistakes, you should also accept that we're making mistakes everywhere: we write buggy code, we fail at reviews, we forget tags, and we suck at backporting patches.

If we agree so far, then why do you assume that the same people who do the above also perfectly tag their commits, and do perfect selection of patches for stable? "I'm always right except when I'm wrong".

My view of the the path forward with stable trees is that we have to beef up our validation and testing story to be able to catch these issues better, rather than place arbitrary limitations on parts of the process. To me your suggestions around the Fixes: tag sound like "Never use kmalloc() because people often forget to free memory!" will it prevent memory leaks? sure, but it'll also prevent useful patches from coming it...

Here's my suggestion: give us a test rig we can run our stable release candidates through. Something that simulates "real" load that customers are using. We promise that we won't release a stable kernel if your tests are failing.

-- Thanks, Sasha

Saeed Mahameed

9:32 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, 2020-04-16 at 15:53 -0400, Sasha Levin wrote:

...

On Thu, Apr 16, 2020 at 07:31:25PM +0000, Saeed Mahameed wrote:

...
On Thu, 2020-04-16 at 19:20 +0200, Greg KH wrote:

...
So far the AUTOSEL tool has found so many real bugfixes that it isn't funny. If you don't like it, fine, but it has proven itself _way_ beyond my wildest hopes already, and it just keeps getting better.

Now i really don't know what the right balance here, in on one hand, autosel is doing a great job, on the other hand we know it can screw up in some cases, and we know it will.

So we decided to make sacrifices for the greater good ? :)

autosel is going to screw up, I'm going to screw up, you're going to screw up, and Linus is going screw up. The existence of the stable trees and a "Fixes:" tag is an admission we all screw up, right?

Right, so fix this AI and we get one less reason to screw up ?

...

If you're willing to accept that we all make mistakes, you should also accept that we're making mistakes everywhere: we write buggy code, we fail at reviews, we forget tags, and we suck at backporting patches.

If we agree so far, then why do you assume that the same people who do the above also perfectly tag their commits, and do perfect selection of patches for stable? "I'm always right except when I'm wrong".

I am welling to accept people making mistakes, but not the AI..

I am not saying people should be 100% flawless, but i am assuming AI is 100% flawless. if I find a bug in AI I fix. same goes for autosel AI, it must get fixed.

What you are really saying: I don't like bugs, so i wrote an AI that fixes bugs but also can make bugs itself.

...

My view of the the path forward with stable trees is that we have to beef up our validation and testing story to be able to catch these issues better, rather than place arbitrary limitations on parts of the process. To me your suggestions around the Fixes: tag sound like "Never use kmalloc() because people often forget to free memory!" will it prevent memory leaks? sure, but it'll also prevent useful patches from coming it...

No, I will let people do what people do best (make more bugs) this is more than fine.

if it is necessary and we have a magical solution, i will write good AI with no false positives to fix or help avoid memleacks.

BUT if i can't achieve 100% success rate, and i might end up introducing memleack with my AI, then I wouldn't use AI at all.

We have different views on things.. if i know AI is using kmalloc wrongly, I fix it, end of story :).

fact: Your AI is broken, can introduce _new_ un-called for bugs, even it is very very very good 99.99% of the cases.

You are welling to roll the dice, i am not ..

...

Here's my suggestion: give us a test rig we can run our stable release candidates through. Something that simulates "real" load that customers are using. We promise that we won't release a stable kernel if your tests are failing.

I will be more than glad to do so, is there a formal process for such thing ?

Sasha Levin

11:23 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 09:32:47PM +0000, Saeed Mahameed wrote:

...

On Thu, 2020-04-16 at 15:53 -0400, Sasha Levin wrote:

...
If we agree so far, then why do you assume that the same people who do the above also perfectly tag their commits, and do perfect selection of patches for stable? "I'm always right except when I'm wrong".

I am welling to accept people making mistakes, but not the AI..

This is where we disagree. If I can have an AI that performs on par with an "average" kernel engineer - I'm happy with it.

The way I see AUTOSEL now is an "average" kernel engineer who does patch sorting for me to review.

Given I review everything that it spits out at me, it's technically a human error (mine), rather than a problem with the AI, right?

...

if it is necessary and we have a magical solution, i will write good AI with no false positives to fix or help avoid memleacks.

Easier said than done :)

I think that the "Intelligence" in AI suggests that it can be making mistakes.

...

BUT if i can't achieve 100% success rate, and i might end up introducing memleack with my AI, then I wouldn't use AI at all.

We have different views on things.. if i know AI is using kmalloc wrongly, I fix it, end of story :).

fact: Your AI is broken, can introduce _new_ un-called for bugs, even it is very very very good 99.99% of the cases.

People are broken too, they introduce new bugs, so why are we accepting new commits into the kernel?

My point is that everything is broken, you can't have 100% perfect anything.

...

...
Here's my suggestion: give us a test rig we can run our stable release candidates through. Something that simulates "real" load that customers are using. We promise that we won't release a stable kernel if your tests are failing.

I will be more than glad to do so, is there a formal process for such thing ?

I'd love to work with you on this if you're interested. There are a few options:

1. Send us a mail when you detect a push to a stable-rc branch. Most people/bots reply to Greg's announce mail with pass/fail.

2. Integrate your tests into kernelci (kernelci.org) - this means that you'll run a "lab" on prem, and kernelci will schedule builds and tests on it's own, sending reports to us.

3. We're open to other solutions if you had something in mind, the first two usually work for people but if you have a different requirement we'll be happy to figure it out.

-- Thanks, Sasha

Saeed Mahameed

21 Apr 21 Apr

3:07 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, 2020-04-16 at 19:23 -0400, Sasha Levin wrote:

...

On Thu, Apr 16, 2020 at 09:32:47PM +0000, Saeed Mahameed wrote:

...
On Thu, 2020-04-16 at 15:53 -0400, Sasha Levin wrote:

...
If we agree so far, then why do you assume that the same people who do the above also perfectly tag their commits, and do perfect selection of patches for stable? "I'm always right except when I'm wrong".

I am welling to accept people making mistakes, but not the AI..

This is where we disagree. If I can have an AI that performs on par with an "average" kernel engineer - I'm happy with it.

The way I see AUTOSEL now is an "average" kernel engineer who does patch sorting for me to review.

Given I review everything that it spits out at me, it's technically a human error (mine), rather than a problem with the AI, right?

...
if it is necessary and we have a magical solution, i will write good AI with no false positives to fix or help avoid memleacks.

Easier said than done :)

I think that the "Intelligence" in AI suggests that it can be making mistakes.

...
BUT if i can't achieve 100% success rate, and i might end up introducing memleack with my AI, then I wouldn't use AI at all.

We have different views on things.. if i know AI is using kmalloc wrongly, I fix it, end of story :).

fact: Your AI is broken, can introduce _new_ un-called for bugs, even it is very very very good 99.99% of the cases.

People are broken too, they introduce new bugs, so why are we accepting new commits into the kernel?

My point is that everything is broken, you can't have 100% perfect anything.

...
...
Here's my suggestion: give us a test rig we can run our stable release candidates through. Something that simulates "real" load that customers are using. We promise that we won't release a stable kernel if your tests are failing.

I will be more than glad to do so, is there a formal process for such thing ?

I'd love to work with you on this if you're interested. There are a few options:

Send us a mail when you detect a push to a stable-rc branch. Most

people/bots reply to Greg's announce mail with pass/fail.

Sounds like our best option for now, as we already have our own testing infra that knows how to watch for external changes in mailing lists.

...

Integrate your tests into kernelci (kernelci.org) - this means

that you'll run a "lab" on prem, and kernelci will schedule builds and tests on it's own, sending reports to us.

We're open to other solutions if you had something in mind, the

first two usually work for people but if you have a different requirement we'll be happy to figure it out.

Thanks,

I will have to discuss this with our CI maintainers and see what we prefer.

i will let you know.

Jakub Kicinski

16 Apr 16 Apr

8:08 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, 16 Apr 2020 19:31:25 +0000 Saeed Mahameed wrote:

...

...
...
IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Oh wow, where do you want me to start. I have zillions of these.

But wait, don't trust me, trust a 3rd party. Here's what Google's security team said about the last 9 months of 2019:

209 known vulnerabilities patched in LTS kernels, most

without CVEs

950+ criticial non-security bugs fixes for device XXXX alone with LTS releases

So opt-in for these critical or _always_ in use basic kernel sections. but make the default opt-out..

But the less attentive/weaker the maintainers the more benefit from autosel they get. The default has to be correct for the group which is more likely to take no action.

Saeed Mahameed

9:11 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, 2020-04-16 at 13:08 -0700, Jakub Kicinski wrote:

...

On Thu, 16 Apr 2020 19:31:25 +0000 Saeed Mahameed wrote:

...
...
...
IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Oh wow, where do you want me to start. I have zillions of these.

But wait, don't trust me, trust a 3rd party. Here's what Google's security team said about the last 9 months of 2019:

209 known vulnerabilities patched in LTS kernels, most

without CVEs

950+ criticial non-security bugs fixes for device XXXX alone with LTS releases

So opt-in for these critical or _always_ in use basic kernel sections. but make the default opt-out..

But the less attentive/weaker the maintainers the more benefit from autosel they get. The default has to be correct for the group which is more likely to take no action.

or the more exposed they are to false positives :), unnoticed bugs due to wrong patches getting backported.. this could go for years for less attentive weaker modules, until someone steps on it.

gregkh＠linuxfoundation.org

17 Apr 17 Apr

8:25 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 09:11:38PM +0000, Saeed Mahameed wrote:

...

On Thu, 2020-04-16 at 13:08 -0700, Jakub Kicinski wrote:

...
On Thu, 16 Apr 2020 19:31:25 +0000 Saeed Mahameed wrote:

...
...
...
IMHO it doesn't make any sense to take into stable automatically any patch that doesn't have fixes line. Do you have 1/2/3/4/5 concrete examples from your (referring to your Microsoft employee hat comment below) or other's people production environment where patches proved to be necessary but they lacked the fixes tag - would love to see them.

Oh wow, where do you want me to start. I have zillions of these.

But wait, don't trust me, trust a 3rd party. Here's what Google's security team said about the last 9 months of 2019:

209 known vulnerabilities patched in LTS kernels, most

without CVEs

950+ criticial non-security bugs fixes for device XXXX alone with LTS releases

So opt-in for these critical or _always_ in use basic kernel sections. but make the default opt-out..

But the less attentive/weaker the maintainers the more benefit from autosel they get. The default has to be correct for the group which is more likely to take no action.

or the more exposed they are to false positives :), unnoticed bugs due to wrong patches getting backported.. this could go for years for less attentive weaker modules, until someone steps on it.

Bugs due to only a limited set of patches being backported are generally very rare compared to the known bugs being present that are not fixed by not backporting patches.

Play the odds, they are not in your favor at the moment :)

thanks,

greg k-h

Edward Cree

16 Apr 16 Apr

4:06 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On 16/04/2020 01:00, Sasha Levin wrote:

...

I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

Yeah, that's why I found it odd that you were responding in a way that _looked like_ classic confusion of P(A|B) and P(B|A). I just wanted to make sure we had that common ground before launching into a long Bayesian explanation.

So, let's go: Let's imagine that 10% of all commits are stable-worthy, and we have a threshold that says we autosel a patch if we think there's better than 50% chance that it's stable-worthy. Then 50% of stable-worthy commits have a Fixes: tag (whose referent exists in the stable tree already), whereas some unknown fraction, let's say 5%, of non-stable-worthy commits have a Fixes: tag. Then P(S|F) = P(F|S)P(S) / (P(F|S)P(S) + P(F|¬S)P(¬S)) = 0.05 / (0.05 + 0.045) = 0.526... That is, a patch with a Fixes: tag is 52.6% likely to be stable-worthy, *not* 50%. (The disparity would be bigger if P(F|¬S) were smaller; conversely, if P(F|¬S) were larger, P(S|F) could be _less than_ 50%.) But also, P(S|¬F) = P(¬F|S)P(S) / (P(¬F|S)P(S) + P(¬F|¬S)P(¬S)) = 0.05 / (0.05 + 0.855) = 0.055... That is, a patch without a Fixes: tag is only 5.5% likely to be stable- worthy, which is *less* than the 10% base rate for all patches. So now you need to get *more* of the positive evidence (panic/oops/overflow etc.) before you get pushed over the 40% threshold. Thus "increase the amount of countervailing evidence needed".

...

most fixes in -stable *don't* have a fixes tag. Shouldn't your argument be the opposite? If a patch has a fixes tag, it's probably not a fix?

I hope it's now clear that this statement confuses P(S|F) with P(F|S).

...

Let me put my Microsoft employee hat on here. We have driver/net/hyperv/ which definitely wasn't getting all the fixes it should have been getting without AUTOSEL.

While net/ is doing great, drivers/net/ is not. If it's indeed following the same rules then we need to talk about how we get done right.

I really have no objection to not looking in drivers/net/, it's just that the experience I had with the process suggests that it's not following the same process as net/.

Again, I'm not saying "don't look in drivers/net/", I'm saying increase the probability threshold there: because _some_ of the stable candidates have already been picked up by our process, the pickings in what's left are thinner, i.e. the base rate P(S) is lower, so you need _more_ evidence before deciding to autosel something. (I don't know exactly how your NN is set up; is it able to use information like "is in drivers/net/" as an input node?) Part of the trouble is that the NN is trained on "did this go to stable eventually", whereas being in drivers/net/ is (on this theory) only a signal in the case where it didn't go to stable _initially_ and had to be caught later; is that information also present in your training data? The NN would only be expected to learn about drivers/net/ for itself if that were the case, otherwise it would have no way of knowing about the lowered base rate. Conversely, if it *did* have that information (was this sent to stable by maintainer's own processes, or was it found later to have been missed) in the training data, it could learn these things by itself and there'd be no need to do anything special for drivers/net/ (or, arguably, even for net/).

...

How come? DaveM is specifically asking not to add stable tags because he will do the selection himself, right?

Driver maintainers sending patch series to Dave often include in the cover letter "please consider patches 4, 7, 8 for stable". It's *directly* CCing stable on patch submissions that Dave asks people not to do. And it sounds from your Microsoft-hat like the HyperV maintainers might be under that same misapprehension, if their stuff isn't making it to stable as much as it should be. But I haven't checked.

-ed

Sasha Levin

6:49 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Thu, Apr 16, 2020 at 05:06:46PM +0100, Edward Cree wrote:

...

On 16/04/2020 01:00, Sasha Levin wrote:

...
I'd maybe point out that the selection process is based on a neural network which knows about the existence of a Fixes tag in a commit.

It does exactly what you're describing, but also taking a bunch more factors into it's desicion process ("panic"? "oops"? "overflow"? etc).

Yeah, that's why I found it odd that you were responding in a way that _looked like_ classic confusion of P(A|B) and P(B|A). I just wanted to make sure we had that common ground before launching into a long Bayesian explanation.

Just a question while I process your explanation (thanks for doing it!): wouldn't this be done by the neural network?

It learns what a stable worthy commit is (and what isn't), and applies weights based on these findings, right? So if it learns that most non-stable commits don't have a fixes tag, it's likely to use that and "require" other inputs to have enough weight to compensate over a missing fixes tag so that it'll pass the threshold, no?

-- Thanks, Sasha

Edward Cree

20 Apr 20 Apr

11:45 a.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On 16/04/2020 19:49, Sasha Levin wrote:

...

Just a question while I process your explanation (thanks for doing it!): wouldn't this be done by the neural network?

Yes, in the basic case. (Hopefully we're agreed that this is a long way from "I'm not sure what a fixes tag has to do with inclusion in a stable tree.", which is how this whole brouhaha started.)

...

It learns what a stable worthy commit is (and what isn't), and applies weights based on these findings, right? So if it learns that most non-stable commits don't have a fixes tag, it's likely to use that and "require" other inputs to have enough weight to compensate over a missing fixes tag so that it'll pass the threshold, no?

Yes. The problem comes when there are other inputs the NN doesn't have, that ought to screen off some of the information it's using. This is probably best illustrated by an unrealistic extreme case... Let's imagine hypothetically that the maintainer of drivers/blub is an absolutely perfect judge of which patches should go to -stable, and that the transmission path from him to the stable trees never loses a patch. This would mean that every autosel patch in drivers/blub is necessarily a false positive, because all the 'true positives' it might have found have already been taken out of the pool, so to speak. But if the NN is just trained to discriminate patches on whether they end up going to stable, it won't see any difference between a drivers/blub patch that the maintainer sent to stable straight away and a drivers/wibble patch that the latter's less diligent maintainer didn't forward and that only got picked up later when a stable kernel user encountered the bug it was fixing. As long as the NN doesn't have that piece of information, it's going to either generate lots of false positives in drivers/blub or lots of false negatives in drivers/wibble. Now obviously drivers/blub doesn't exist, no maintainer is 100% perfect at -stable submissions; but any difference will produce the same effect on a smaller scale, with the 'blubbish' maintainers seeing a high false positive fraction while from the 'wibblesome' maintainer's point of view autosel is working great. And since the 'blubs' are the ones who're putting effort of their own into stable selection already, they get aggrieved at having to also put effort into catching the false positives from a system that doesn't seem to be doing much for them, and everyone ends up shouting at each other as we're seeing here.

(Do you want me to do another worked numerical example demonstrating the above, or does it make enough sense in words not to need one?)

-ed

Sasha Levin

12:53 p.m.

New subject: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

On Mon, Apr 20, 2020 at 12:45:36PM +0100, Edward Cree wrote:

...

On 16/04/2020 19:49, Sasha Levin wrote:

...
Just a question while I process your explanation (thanks for doing it!): wouldn't this be done by the neural network?

Yes, in the basic case. (Hopefully we're agreed that this is a long way from "I'm not sure what a fixes tag has to do with inclusion in a stable tree.", which is how this whole brouhaha started.)

My point was more that having or not having a fixes tag on it's own doesn't guarantee inclusion in the stable trees - that's why we have and explicit stable tag. What was said was (for me) the equivalent of "my commit message contains the word 'panic', why wasn't it picked?"

A Fixes tag affects the probability of a commit being picked up by AUTOSEL, yes, but it's not a reliable way to include or exclude patches from the stable tree.

It may also sound counter-intuitive but my long term plan (hope) is for AUTOSEL to die because maintainers got better at tagging patches. I don't want to keep doing this forever :)

...

...
It learns what a stable worthy commit is (and what isn't), and applies weights based on these findings, right? So if it learns that most non-stable commits don't have a fixes tag, it's likely to use that and "require" other inputs to have enough weight to compensate over a missing fixes tag so that it'll pass the threshold, no?

Yes. The problem comes when there are other inputs the NN doesn't have, that ought to screen off some of the information it's using. This is probably best illustrated by an unrealistic extreme case...

It's actually not that unrealistic. We have a few subsystems that do a great job with patch selection, and I usually don't find any other patches to pick up from there, while some other subsystems in the kernel require us to pick almost every patch that flows in there (think files that contain device quirks for example).

I've tried to address that by also including the modified filename into the inputs of the NN, so that the NN is better at acting differently based on the subsystem/filename being patched.

For mlx5, for example, there are two ways it would differentiate it from everything else:

- Commit subject lines usually start with net/mlx5, which is used as input to the NN. - Filenames touch drivers/net/ethernet/mellanox/mlx5/*

Anyway, yes - I understand your bigger point here around missing information from the NN. I'd like to think that based on previous experience it does a good job of balancing everything, but I might be mistaken.

...

Let's imagine hypothetically that the maintainer of drivers/blub is an absolutely perfect judge of which patches should go to -stable, and that the transmission path from him to the stable trees never loses a patch. This would mean that every autosel patch in drivers/blub is necessarily a false positive, because all the 'true positives' it might have found have already been taken out of the pool, so to speak. But if the NN is just trained to discriminate patches on whether they end up going to stable, it won't see any difference between a drivers/blub patch that the maintainer sent to stable straight away and a drivers/wibble patch that the latter's less diligent maintainer didn't forward and that only got picked up later when a stable kernel user encountered the bug it was fixing. As long as the NN doesn't have that piece of information, it's going to either generate lots of false positives in drivers/blub or lots of false negatives in drivers/wibble. Now obviously drivers/blub doesn't exist, no maintainer is 100% perfect at -stable submissions; but any difference will produce the same effect on a smaller scale, with the 'blubbish' maintainers seeing a high false positive fraction while from the 'wibblesome' maintainer's point of view autosel is working great. And since the 'blubs' are the ones who're putting effort of their own into stable selection already, they get aggrieved at having to also put effort into catching the false positives from a system that doesn't seem to be doing much for them, and everyone ends up shouting at each other as we're seeing here.

(Do you want me to do another worked numerical example demonstrating the above, or does it make enough sense in words not to need one?)

Nope, the example above works, thanks!

-- Thanks, Sasha

Sasha Levin

11 Apr 11 Apr

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 10/26] Bluetooth: Fix calculation of SCO handle for packet processing

From: Marcel Holtmann marcel@holtmann.org

[ Upstream commit debdedf2eb5a2d9777cabff40900772be13cd9f9 ]

When processing SCO packets, the handle is wrongly assumed as 16-bit value. The actual size is 12-bits and the other 4-bits are used for packet flags.

Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Johan Hedberg johan.hedberg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index a70b078ceb3ca..0161bdbda71ec 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -4043,13 +4043,16 @@ static void hci_scodata_packet(struct hci_dev *hdev, struct sk_buff *skb) { struct hci_sco_hdr *hdr = (void *) skb->data; struct hci_conn *conn; - __u16 handle; + __u16 handle, flags;

skb_pull(skb, HCI_SCO_HDR_SIZE);

handle = __le16_to_cpu(hdr->handle); + flags = hci_flags(handle); + handle = hci_handle(handle);

- BT_DBG("%s len %d handle 0x%4.4x", hdev->name, skb->len, handle); + BT_DBG("%s len %d handle 0x%4.4x flags 0x%4.4x", hdev->name, skb->len, + handle, flags);

hdev->stat.sco_rx++;

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 11/26] Bluetooth: guard against controllers sending zero'd events

From: Alain Michaud alainm@chromium.org

[ Upstream commit 08bb4da90150e2a225f35e0f642cdc463958d696 ]

Some controllers have been observed to send zero'd events under some conditions. This change guards against this condition as well as adding a trace to facilitate diagnosability of this condition.

Signed-off-by: Alain Michaud alainm@chromium.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 6 ++++++ 1 file changed, 6 insertions(+)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 6f78489fdb132..227b2e14eaeac 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -5249,6 +5249,11 @@ void hci_event_packet(struct hci_dev *hdev, struct sk_buff *skb) u8 status = 0, event = hdr->evt, req_evt = 0; u16 opcode = HCI_OP_NOP;

+ if (!event) { + bt_dev_warn(hdev, "Received unexpected HCI Event 00000000"); + goto done; + } + if (hdev->sent_cmd && bt_cb(hdev->sent_cmd)->hci.req_event == event) { struct hci_command_hdr *cmd_hdr = (void *) hdev->sent_cmd->data; opcode = __le16_to_cpu(cmd_hdr->opcode); @@ -5460,6 +5465,7 @@ void hci_event_packet(struct hci_dev *hdev, struct sk_buff *skb) req_complete_skb(hdev, status, opcode, orig_skb); }

+done: kfree_skb(orig_skb); kfree_skb(skb); hdev->stat.evt_rx++;

-- 2.20.1

Sasha Levin

11:13 p.m.

New subject: [PATCH AUTOSEL 4.9 12/26] RDMA/rxe: Fix configuration of atomic queue pair attributes

From: Bart Van Assche bvanassche@acm.org

[ Upstream commit fb3063d31995cc4cf1d47a406bb61d6fb1b1d58d ]

...

From the comment above the definition of the roundup_pow_of_two() macro:

The result is undefined when n == 0.

Hence only pass positive values to roundup_pow_of_two(). This patch fixes the following UBSAN complaint:

UBSAN: Undefined behaviour in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' Call Trace: dump_stack+0xa5/0xe6 ubsan_epilogue+0x9/0x26 __ubsan_handle_shift_out_of_bounds.cold+0x4c/0xf9 rxe_qp_from_attr.cold+0x37/0x5d [rdma_rxe] rxe_modify_qp+0x59/0x70 [rdma_rxe] _ib_modify_qp+0x5aa/0x7c0 [ib_core] ib_modify_qp+0x3b/0x50 [ib_core] cma_modify_qp_rtr+0x234/0x260 [rdma_cm] __rdma_accept+0x1a7/0x650 [rdma_cm] nvmet_rdma_cm_handler+0x1286/0x14cd [nvmet_rdma] cma_cm_event_handler+0x6b/0x330 [rdma_cm] cma_ib_req_handler+0xe60/0x22d0 [rdma_cm] cm_process_work+0x30/0x140 [ib_cm] cm_req_handler+0x11f4/0x1cd0 [ib_cm] cm_work_handler+0xb8/0x344e [ib_cm] process_one_work+0x569/0xb60 worker_thread+0x7a/0x5d0 kthread+0x1e6/0x210 ret_from_fork+0x24/0x30

Link: https://lore.kernel.org/r/20200217205714.26937-1-bvanassche@acm.org Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Bart Van Assche bvanassche@acm.org Reviewed-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/sw/rxe/rxe_qp.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c index d6672127808b7..186da467060cc 100644 --- a/drivers/infiniband/sw/rxe/rxe_qp.c +++ b/drivers/infiniband/sw/rxe/rxe_qp.c @@ -597,15 +597,16 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask, struct ib_gid_attr sgid_attr;

if (mask & IB_QP_MAX_QP_RD_ATOMIC) { - int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic); + int max_rd_atomic = attr->max_rd_atomic ? + roundup_pow_of_two(attr->max_rd_atomic) : 0;

qp->attr.max_rd_atomic = max_rd_atomic; atomic_set(&qp->req.rd_atomic, max_rd_atomic); }

if (mask & IB_QP_MAX_DEST_RD_ATOMIC) { - int max_dest_rd_atomic = - __roundup_pow_of_two(attr->max_dest_rd_atomic); + int max_dest_rd_atomic = attr->max_dest_rd_atomic ? + roundup_pow_of_two(attr->max_dest_rd_atomic) : 0;

qp->attr.max_dest_rd_atomic = max_dest_rd_atomic;

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 13/26] net: intel: e1000e: fix possible sleep-in-atomic-context bugs in e1000e_get_hw_semaphore()

From: Jia-Ju Bai baijiaju1990@gmail.com

[ Upstream commit 2e05f756c7099c8991142382648a37b0d4c85943 ]

The driver may sleep while holding a spinlock. The function call path (from bottom to top) in Linux 4.19 is:

drivers/net/ethernet/intel/e1000e/mac.c, 1366: usleep_range in e1000e_get_hw_semaphore drivers/net/ethernet/intel/e1000e/80003es2lan.c, 322: e1000e_get_hw_semaphore in e1000_release_swfw_sync_80003es2lan drivers/net/ethernet/intel/e1000e/80003es2lan.c, 197: e1000_release_swfw_sync_80003es2lan in e1000_release_phy_80003es2lan drivers/net/ethernet/intel/e1000e/netdev.c, 4883: (FUNC_PTR) e1000_release_phy_80003es2lan in e1000e_update_phy_stats drivers/net/ethernet/intel/e1000e/netdev.c, 4917: e1000e_update_phy_stats in e1000e_update_stats drivers/net/ethernet/intel/e1000e/netdev.c, 5945: e1000e_update_stats in e1000e_get_stats64 drivers/net/ethernet/intel/e1000e/netdev.c, 5944: spin_lock in e1000e_get_stats64

drivers/net/ethernet/intel/e1000e/mac.c, 1384: usleep_range in e1000e_get_hw_semaphore drivers/net/ethernet/intel/e1000e/80003es2lan.c, 322: e1000e_get_hw_semaphore in e1000_release_swfw_sync_80003es2lan drivers/net/ethernet/intel/e1000e/80003es2lan.c, 197: e1000_release_swfw_sync_80003es2lan in e1000_release_phy_80003es2lan drivers/net/ethernet/intel/e1000e/netdev.c, 4883: (FUNC_PTR) e1000_release_phy_80003es2lan in e1000e_update_phy_stats drivers/net/ethernet/intel/e1000e/netdev.c, 4917: e1000e_update_phy_stats in e1000e_update_stats drivers/net/ethernet/intel/e1000e/netdev.c, 5945: e1000e_update_stats in e1000e_get_stats64 drivers/net/ethernet/intel/e1000e/netdev.c, 5944: spin_lock in e1000e_get_stats64

(FUNC_PTR) means a function pointer is called.

To fix these bugs, usleep_range() is replaced with udelay().

These bugs are found by a static analysis tool STCheck written by myself.

Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Tested-by: Aaron Brown aaron.f.brown@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/mac.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c index 5bdc3a2d4fd70..0fc9f49844632 100644 --- a/drivers/net/ethernet/intel/e1000e/mac.c +++ b/drivers/net/ethernet/intel/e1000e/mac.c @@ -1381,7 +1381,7 @@ s32 e1000e_get_hw_semaphore(struct e1000_hw *hw) if (!(swsm & E1000_SWSM_SMBI)) break;

- usleep_range(50, 100); + udelay(100); i++; }

@@ -1399,7 +1399,7 @@ s32 e1000e_get_hw_semaphore(struct e1000_hw *hw) if (er32(SWSM) & E1000_SWSM_SWESMBI) break;

- usleep_range(50, 100); + udelay(100); }

if (i == timeout) {

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 14/26] crypto: tcrypt - fix printed skcipher [a]sync mode

From: Horia Geantă horia.geanta@nxp.com

[ Upstream commit 8e3b7fd7ea554ccb1bdc596bfbcdaf56f7ab017c ]

When running tcrypt skcipher speed tests, logs contain things like: testing speed of async ecb(des3_ede) (ecb(des3_ede-generic)) encryption or: testing speed of async ecb(aes) (ecb(aes-ce)) encryption

The algorithm implementations are sync, not async. Fix this inaccuracy.

Fixes: 7166e589da5b6 ("crypto: tcrypt - Use skcipher") Signed-off-by: Horia Geantă horia.geanta@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Sasha Levin sashal@kernel.org --- crypto/tcrypt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c index babbda230c07b..42fe4fd0f4ca5 100644 --- a/crypto/tcrypt.c +++ b/crypto/tcrypt.c @@ -885,8 +885,8 @@ static void test_skcipher_speed(const char *algo, int enc, unsigned int secs, return; }

- pr_info("\ntesting speed of async %s (%s) %s\n", algo, - get_driver_name(crypto_skcipher, tfm), e); + pr_info("\ntesting speed of %s %s (%s) %s\n", async ? "async" : "sync", + algo, get_driver_name(crypto_skcipher, tfm), e);

req = skcipher_request_alloc(tfm, GFP_KERNEL); if (!req) {

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 15/26] drm/omap: fix possible object reference leak

From: Wen Yang wen.yang99@zte.com.cn

[ Upstream commit 47340e46f34a3b1d80e40b43ae3d7a8da34a3541 ]

The call to of_find_matching_node returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage.

Detected by coccinelle with the following warnings: drivers/gpu/drm/omapdrm/dss/omapdss-boot-init.c:212:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 209, but without a corresponding object release within this function. drivers/gpu/drm/omapdrm/dss/omapdss-boot-init.c:237:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 209, but without a corresponding object release within this function.

Signed-off-by: Wen Yang wen.yang99@zte.com.cn Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Reviewed-by: Mukesh Ojha mojha@codeaurora.org Cc: Tomi Valkeinen tomi.valkeinen@ti.com Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Sebastian Reichel sebastian.reichel@collabora.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Cc: Markus Elfring Markus.Elfring@web.de Signed-off-by: Tomi Valkeinen tomi.valkeinen@ti.com Link: https://patchwork.freedesktop.org/patch/msgid/1554692313-28882-2-git-send-em... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/omapdrm/dss/omapdss-boot-init.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/omapdrm/dss/omapdss-boot-init.c b/drivers/gpu/drm/omapdrm/dss/omapdss-boot-init.c index 136d30484d023..46111e9ee9a25 100644 --- a/drivers/gpu/drm/omapdrm/dss/omapdss-boot-init.c +++ b/drivers/gpu/drm/omapdrm/dss/omapdss-boot-init.c @@ -194,7 +194,7 @@ static int __init omapdss_boot_init(void) dss = of_find_matching_node(NULL, omapdss_of_match);

if (dss == NULL || !of_device_is_available(dss)) - return 0; + goto put_node;

omapdss_walk_device(dss, true);

@@ -219,6 +219,8 @@ static int __init omapdss_boot_init(void) kfree(n); }

+put_node: + of_node_put(dss); return 0; }

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 16/26] audit: CONFIG_CHANGE don't log internal bookkeeping as an event

From: Steve Grubb sgrubb@redhat.com

[ Upstream commit 70b3eeed49e8190d97139806f6fbaf8964306cdb ]

Common Criteria calls out for any action that modifies the audit trail to be recorded. That usually is interpreted to mean insertion or removal of rules. It is not required to log modification of the inode information since the watch is still in effect. Additionally, if the rule is a never rule and the underlying file is one they do not want events for, they get an event for this bookkeeping update against their wishes.

Since no device/inode info is logged at insertion and no device/inode information is logged on update, there is nothing meaningful being communicated to the admin by the CONFIG_CHANGE updated_rules event. One can assume that the rule was not "modified" because it is still watching the intended target. If the device or inode cannot be resolved, then audit_panic is called which is sufficient.

The correct resolution is to drop logging config_update events since the watch is still in effect but just on another unknown inode.

Signed-off-by: Steve Grubb sgrubb@redhat.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/audit_watch.c | 2 -- 1 file changed, 2 deletions(-)

diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index 712469a3103ac..54b30c9bd8b13 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -316,8 +316,6 @@ static void audit_update_watch(struct audit_parent *parent, if (oentry->rule.exe) audit_remove_mark(oentry->rule.exe);

- audit_watch_log_rule_change(r, owatch, "updated_rules"); - call_rcu(&oentry->rcu, audit_free_rule_rcu); }

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 17/26] Bluetooth: btusb: Add support for 13d3:3548 Realtek 8822CE device

From: Sergey Shatunov me@prok.pw

[ Upstream commit eb3939e386ec8df6049697d388298590231ac79c ]

The ASUS FX505DV laptop contains RTL8822CE device with an associated BT chip using a USB ID of 13d3:3548. This patch add fw download support for it.

T: Bus=03 Lev=01 Prnt=01 Port=03 Cnt=03 Dev#= 4 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=13d3 ProdID=3548 Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms

Signed-off-by: Sergey Shatunov me@prok.pw Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 4e3b24a0511ff..7ae86f676b518 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -364,6 +364,7 @@ static const struct usb_device_id blacklist_table[] = {

/* Additional Realtek 8822CE Bluetooth devices */ { USB_DEVICE(0x04ca, 0x4005), .driver_info = BTUSB_REALTEK }, + { USB_DEVICE(0x13d3, 0x3548), .driver_info = BTUSB_REALTEK },

/* Silicon Wave based devices */ { USB_DEVICE(0x0c10, 0x0000), .driver_info = BTUSB_SWAVE },

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 18/26] scsi: lpfc: Fix RQ buffer leakage when no IOCBs available

From: James Smart jsmart2021@gmail.com

[ Upstream commit 39c4f1a965a9244c3ba60695e8ff8da065ec6ac4 ]

The driver is occasionally seeing the following SLI Port error, requiring reset and reinit:

Port Status Event: ... error 1=0x52004a01, error 2=0x218

The failure means an RQ timeout. That is, the adapter had received asynchronous receive frames, ran out of buffer slots to place the frames, and the driver did not replenish the buffer slots before a timeout occurred. The driver should not be so slow in replenishing buffers that a timeout can occur.

When the driver received all the frames of a sequence, it allocates an IOCB to put the frames in. In a situation where there was no IOCB available for the frame of a sequence, the RQ buffer corresponding to the first frame of the sequence was not returned to the FW. Eventually, with enough traffic encountering the situation, the timeout occurred.

Fix by releasing the buffer back to firmware whenever there is no IOCB for the first frame.

[mkp: typo]

Link: https://lore.kernel.org/r/20200128002312.16346-2-jsmart2021@gmail.com Signed-off-by: Dick Kennedy dick.kennedy@broadcom.com Signed-off-by: James Smart jsmart2021@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc_sli.c | 4 ++++ 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index cbe808e83f477..f441d95b2f129 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -15646,6 +15646,10 @@ lpfc_prep_seq(struct lpfc_vport *vport, struct hbq_dmabuf *seq_dmabuf) list_add_tail(&iocbq->list, &first_iocbq->list); } } + /* Free the sequence's header buffer */ + if (!first_iocbq) + lpfc_in_buf_free(vport->phba, &seq_dmabuf->dbuf); + return first_iocbq; }

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 19/26] Bluetooth: RFCOMM: fix ODEBUG bug in rfcomm_dev_ioctl

From: Qiujun Huang hqjagain@gmail.com

[ Upstream commit 71811cac8532b2387b3414f7cd8fe9e497482864 ]

Needn't call 'rfcomm_dlc_put' here, because 'rfcomm_dlc_exists' didn't increase dlc->refcnt.

Reported-by: syzbot+4496e82090657320efc6@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang hqjagain@gmail.com Suggested-by: Hillf Danton hdanton@sina.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/rfcomm/tty.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c index 2f2cb5e27cdd4..a8c63ef75f73c 100644 --- a/net/bluetooth/rfcomm/tty.c +++ b/net/bluetooth/rfcomm/tty.c @@ -413,10 +413,8 @@ static int __rfcomm_create_dev(struct sock *sk, void __user *arg) dlc = rfcomm_dlc_exists(&req.src, &req.dst, req.channel); if (IS_ERR(dlc)) return PTR_ERR(dlc); - else if (dlc) { - rfcomm_dlc_put(dlc); + if (dlc) return -EBUSY; - } dlc = rfcomm_dlc_alloc(GFP_KERNEL); if (!dlc) return -ENOMEM;

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 20/26] brcmfmac: Fix driver crash on USB control transfer timeout

From: Raveendran Somu raveendran.somu@cypress.com

[ Upstream commit 93a5bfbc7cad8bf3dea81c9bc07761c1226a0860 ]

When the control transfer gets timed out, the error status was returned without killing that urb, this leads to using the same urb. This issue causes the kernel crash as the same urb is sumbitted multiple times. The fix is to kill the urb for timeout transfer before returning error

Signed-off-by: Raveendran Somu raveendran.somu@cypress.com Signed-off-by: Chi-hsien Lin chi-hsien.lin@cypress.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Link: https://lore.kernel.org/r/1585124429-97371-2-git-send-email-chi-hsien.lin@cy... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c index 31727f34381fe..6a87681b52abf 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c @@ -336,11 +336,12 @@ static int brcmf_usb_tx_ctlpkt(struct device *dev, u8 *buf, u32 len) return err; } timeout = brcmf_usb_ioctl_resp_wait(devinfo); - clear_bit(0, &devinfo->ctl_op); if (!timeout) { brcmf_err("Txctl wait timed out\n"); + usb_kill_urb(devinfo->ctl_urb); err = -EIO; } + clear_bit(0, &devinfo->ctl_op); return err; }

@@ -366,11 +367,12 @@ static int brcmf_usb_rx_ctlpkt(struct device *dev, u8 *buf, u32 len) } timeout = brcmf_usb_ioctl_resp_wait(devinfo); err = devinfo->ctl_urb_status; - clear_bit(0, &devinfo->ctl_op); if (!timeout) { brcmf_err("rxctl wait timed out\n"); + usb_kill_urb(devinfo->ctl_urb); err = -EIO; } + clear_bit(0, &devinfo->ctl_op); if (!err) return devinfo->ctl_urb_actual_length; else

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 21/26] RDMA/cm: Update num_paths in cma_resolve_iboe_route error flow

From: Avihai Horon avihaih@mellanox.com

[ Upstream commit 987914ab841e2ec281a35b54348ab109b4c0bb4e ]

After a successful allocation of path_rec, num_paths is set to 1, but any error after such allocation will leave num_paths uncleared.

This causes to de-referencing a NULL pointer later on. Hence, num_paths needs to be set back to 0 if such an error occurs.

The following crash from syzkaller revealed it.

kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI CPU: 0 PID: 357 Comm: syz-executor060 Not tainted 4.18.0+ #311 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:ib_copy_path_rec_to_user+0x94/0x3e0 Code: f1 f1 f1 f1 c7 40 0c 00 00 f4 f4 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 e8 d7 60 24 ff 48 8d 7b 4c 48 89 f8 48 c1 e8 03 <42> 0f b6 14 30 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 RSP: 0018:ffff88006586f980 EFLAGS: 00010207 RAX: 0000000000000009 RBX: 0000000000000000 RCX: 1ffff1000d5fe475 RDX: ffff8800621e17c0 RSI: ffffffff820d45f9 RDI: 000000000000004c RBP: ffff88006586fa50 R08: ffffed000cb0df73 R09: ffffed000cb0df72 R10: ffff88006586fa70 R11: ffffed000cb0df73 R12: 1ffff1000cb0df30 R13: ffff88006586fae8 R14: dffffc0000000000 R15: ffff88006aff2200 FS: 00000000016fc880(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000040 CR3: 0000000063fec000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? ib_copy_path_rec_from_user+0xcc0/0xcc0 ? __mutex_unlock_slowpath+0xfc/0x670 ? wait_for_completion+0x3b0/0x3b0 ? ucma_query_route+0x818/0xc60 ucma_query_route+0x818/0xc60 ? ucma_listen+0x1b0/0x1b0 ? sched_clock_cpu+0x18/0x1d0 ? sched_clock_cpu+0x18/0x1d0 ? ucma_listen+0x1b0/0x1b0 ? ucma_write+0x292/0x460 ucma_write+0x292/0x460 ? ucma_close_id+0x60/0x60 ? sched_clock_cpu+0x18/0x1d0 ? sched_clock_cpu+0x18/0x1d0 __vfs_write+0xf7/0x620 ? ucma_close_id+0x60/0x60 ? kernel_read+0x110/0x110 ? time_hardirqs_on+0x19/0x580 ? lock_acquire+0x18b/0x3a0 ? finish_task_switch+0xf3/0x5d0 ? _raw_spin_unlock_irq+0x29/0x40 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x1be/0x5d0 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? security_file_permission+0x172/0x1e0 vfs_write+0x192/0x460 ksys_write+0xc6/0x1a0 ? __ia32_sys_read+0xb0/0xb0 ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe ? do_syscall_64+0x1d/0x470 do_syscall_64+0x9e/0x470 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices") Link: https://lore.kernel.org/r/20200318101741.47211-1-leon@kernel.org Signed-off-by: Avihai Horon avihaih@mellanox.com Reviewed-by: Maor Gottlieb maorg@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cma.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 27653aad8f218..0a6cc78ebcf74 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2568,6 +2568,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) err2: kfree(route->path_rec); route->path_rec = NULL; + route->num_paths = 0; err1: kfree(work); return ret;

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 22/26] ASoC: Intel: Skylake: Enable codec wakeup during chip init

From: Cezary Rojewski cezary.rojewski@intel.com

[ Upstream commit e603f11d5df8997d104ab405ff27640b90baffaa ]

Follow the recommendation set by hda_intel.c and enable HDMI/DP codec wakeup during bus initialization procedure. Disable wakeup once init completes.

Signed-off-by: Cezary Rojewski cezary.rojewski@intel.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20200305145314.32579-4-cezary.rojewski@intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/skylake/skl.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/sound/soc/intel/skylake/skl.c b/sound/soc/intel/skylake/skl.c index 3e526fbd267e6..223958c49bf8d 100644 --- a/sound/soc/intel/skylake/skl.c +++ b/sound/soc/intel/skylake/skl.c @@ -102,9 +102,11 @@ static int skl_init_chip(struct hdac_bus *bus, bool full_reset) { int ret;

+ snd_hdac_set_codec_wakeup(bus, true); skl_enable_miscbdcge(bus->dev, false); ret = snd_hdac_bus_init_chip(bus, full_reset); skl_enable_miscbdcge(bus->dev, true); + snd_hdac_set_codec_wakeup(bus, false);

return ret; }

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 23/26] dmaengine: stm32-dma: use reset controller only at probe time

From: Etienne Carriere etienne.carriere@st.com

[ Upstream commit 8cf1e0fc50fcc25021567bb2755580504c57c83a ]

Remove reset controller reference from device instance since it is used only at probe time.

Signed-off-by: Etienne Carriere etienne.carriere@st.com Signed-off-by: Amelie Delaunay amelie.delaunay@st.com Link: https://lore.kernel.org/r/20200129153628.29329-3-amelie.delaunay@st.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/stm32-dma.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c index ae3f60be7759e..f0f82f466ec62 100644 --- a/drivers/dma/stm32-dma.c +++ b/drivers/dma/stm32-dma.c @@ -176,7 +176,6 @@ struct stm32_dma_device { struct dma_device ddev; void __iomem *base; struct clk *clk; - struct reset_control *rst; bool mem2mem; struct stm32_dma_chan chan[STM32_DMA_MAX_CHANNELS]; }; @@ -1006,6 +1005,7 @@ static int stm32_dma_probe(struct platform_device *pdev) struct dma_device *dd; const struct of_device_id *match; struct resource *res; + struct reset_control *rst; int i, ret;

match = of_match_device(stm32_dma_of_match, &pdev->dev); @@ -1034,11 +1034,11 @@ static int stm32_dma_probe(struct platform_device *pdev) dmadev->mem2mem = of_property_read_bool(pdev->dev.of_node, "st,mem2mem");

- dmadev->rst = devm_reset_control_get(&pdev->dev, NULL); - if (!IS_ERR(dmadev->rst)) { - reset_control_assert(dmadev->rst); + rst = devm_reset_control_get(&pdev->dev, NULL); + if (!IS_ERR(rst)) { + reset_control_assert(rst); udelay(2); - reset_control_deassert(dmadev->rst); + reset_control_deassert(rst); }

dma_cap_set(DMA_SLAVE, dd->cap_mask);

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 24/26] scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic

From: Can Guo cang@codeaurora.org

[ Upstream commit c63d6099a7959ecc919b2549dc6b71f53521f819 ]

The async version of ufshcd_hold(async == true), which is only called in queuecommand path as for now, is expected to work in atomic context, thus it should not sleep or schedule out. When it runs into the condition that clocks are ON but link is still in hibern8 state, it should bail out without flushing the clock ungate work.

Fixes: f2a785ac2312 ("scsi: ufshcd: Fix race between clk scaling and ungate work") Link: https://lore.kernel.org/r/1581392451-28743-6-git-send-email-cang@codeaurora.... Reviewed-by: Hongwu Su hongwus@codeaurora.org Reviewed-by: Asutosh Das asutoshd@codeaurora.org Reviewed-by: Bean Huo beanhuo@micron.com Reviewed-by: Stanley Chu stanley.chu@mediatek.com Signed-off-by: Can Guo cang@codeaurora.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/ufs/ufshcd.c | 5 +++++ 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 394df57894e6b..9bd26efcc1a10 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -682,6 +682,11 @@ int ufshcd_hold(struct ufs_hba *hba, bool async) */ if (ufshcd_can_hibern8_during_gating(hba) && ufshcd_is_link_hibern8(hba)) { + if (async) { + rc = -EAGAIN; + hba->clk_gating.active_reqs--; + break; + } spin_unlock_irqrestore(hba->host->host_lock, flags); flush_work(&hba->clk_gating.ungate_work); spin_lock_irqsave(hba->host->host_lock, flags);

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 25/26] ext4: check for non-zero journal inum in ext4_calculate_overhead

From: Ritesh Harjani riteshh@linux.ibm.com

[ Upstream commit f1eec3b0d0a849996ebee733b053efa71803dad5 ]

While calculating overhead for internal journal, also check that j_inum shouldn't be 0. Otherwise we get below error with xfstests generic/050 with external journal (XXX_LOGDEV config) enabled.

It could be simply reproduced with loop device with an external journal and marking blockdev as RO before mounting.

[ 3337.146838] EXT4-fs error (device pmem1p2): ext4_get_journal_inode:4634: comm mount: inode #0: comm mount: iget: illegal inode # ------------[ cut here ]------------ generic_make_request: Trying to write to read-only block-device pmem1p2 (partno 2) WARNING: CPU: 107 PID: 115347 at block/blk-core.c:788 generic_make_request_checks+0x6b4/0x7d0 CPU: 107 PID: 115347 Comm: mount Tainted: G L --------- -t - 4.18.0-167.el8.ppc64le #1 NIP: c0000000006f6d44 LR: c0000000006f6d40 CTR: 0000000030041dd4 <...> NIP [c0000000006f6d44] generic_make_request_checks+0x6b4/0x7d0 LR [c0000000006f6d40] generic_make_request_checks+0x6b0/0x7d0 <...> Call Trace: generic_make_request_checks+0x6b0/0x7d0 (unreliable) generic_make_request+0x3c/0x420 submit_bio+0xd8/0x200 submit_bh_wbc+0x1e8/0x250 __sync_dirty_buffer+0xd0/0x210 ext4_commit_super+0x310/0x420 [ext4] __ext4_error+0xa4/0x1e0 [ext4] __ext4_iget+0x388/0xe10 [ext4] ext4_get_journal_inode+0x40/0x150 [ext4] ext4_calculate_overhead+0x5a8/0x610 [ext4] ext4_fill_super+0x3188/0x3260 [ext4] mount_bdev+0x778/0x8f0 ext4_mount+0x28/0x50 [ext4] mount_fs+0x74/0x230 vfs_kern_mount.part.6+0x6c/0x250 do_mount+0x2fc/0x1280 sys_mount+0x158/0x180 system_call+0x5c/0x70 EXT4-fs (pmem1p2): no journal found EXT4-fs (pmem1p2): can't get journal size EXT4-fs (pmem1p2): mounted filesystem without journal. Opts: dax,norecovery

Fixes: 3c816ded78bb ("ext4: use journal inode to determine journal overhead") Reported-by: Harish Sriram harish@linux.ibm.com Signed-off-by: Ritesh Harjani riteshh@linux.ibm.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20200316093038.25485-1-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 75f71e52ffc77..66daa3bc40a6b 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3345,7 +3345,8 @@ int ext4_calculate_overhead(struct super_block *sb) */ if (sbi->s_journal && !sbi->journal_bdev) overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_maxlen); - else if (ext4_has_feature_journal(sb) && !sbi->s_journal) { + else if (ext4_has_feature_journal(sb) && !sbi->s_journal && j_inum) { + /* j_inum for internal journal is non-zero */ j_inode = ext4_get_journal_inode(sb, j_inum); if (j_inode) { j_blocks = j_inode->i_size >> sb->s_blocksize_bits;

-- 2.20.1

Sasha Levin

11:14 p.m.

New subject: [PATCH AUTOSEL 4.9 26/26] svcrdma: Fix leak of transport addresses

From: Chuck Lever chuck.lever@oracle.com

[ Upstream commit 1a33d8a284b1e85e03b8c7b1ea8fb985fccd1d71 ]

Kernel memory leak detected:

unreferenced object 0xffff888849cdf480 (size 8): comm "kworker/u8:3", pid 2086, jiffies 4297898756 (age 4269.856s) hex dump (first 8 bytes): 30 00 cd 49 88 88 ff ff 0..I.... backtrace: [<00000000acfc370b>] __kmalloc_track_caller+0x137/0x183 [<00000000a2724354>] kstrdup+0x2b/0x43 [<0000000082964f84>] xprt_rdma_format_addresses+0x114/0x17d [rpcrdma] [<00000000dfa6ed00>] xprt_setup_rdma_bc+0xc0/0x10c [rpcrdma] [<0000000073051a83>] xprt_create_transport+0x3f/0x1a0 [sunrpc] [<0000000053531a8e>] rpc_create+0x118/0x1cd [sunrpc] [<000000003a51b5f8>] setup_callback_client+0x1a5/0x27d [nfsd] [<000000001bd410af>] nfsd4_process_cb_update.isra.7+0x16c/0x1ac [nfsd] [<000000007f4bbd56>] nfsd4_run_cb_work+0x4c/0xbd [nfsd] [<0000000055c5586b>] process_one_work+0x1b2/0x2fe [<00000000b1e3e8ef>] worker_thread+0x1a6/0x25a [<000000005205fb78>] kthread+0xf6/0xfb [<000000006d2dc057>] ret_from_fork+0x3a/0x50

Introduce a call to xprt_rdma_free_addresses() similar to the way that the TCP backchannel releases a transport's peer address strings.

Fixes: 5d252f90a800 ("svcrdma: Add class for RDMA backwards direction transport") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index 6035c5a380a6b..b3d48c6243c80 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -277,6 +277,7 @@ xprt_rdma_bc_put(struct rpc_xprt *xprt) { dprintk("svcrdma: %s: xprt %p\n", __func__, xprt);

+ xprt_rdma_free_addresses(xprt); xprt_free(xprt); module_put(THIS_MODULE); }

-- 2.20.1

1914

days inactive

1924

days old

linux-stable-mirror@lists.linaro.org

72 comments

participants

tags (0)

participants (9)

Edward Cree
Greg KH
gregkh＠linuxfoundation.org
Jakub Kicinski
Leon Romanovsky
Michal Kubecek
Or Gerlitz
Saeed Mahameed
Sasha Levin