September 2021 - Linux-stable-mirror

FAILED: patch "[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled" failed to apply to 5.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 90a9266269eb9f71af1f323c33e1dca53527bd22 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= <mdaenzer(a)redhat.com> Date: Tue, 17 Aug 2021 10:23:25 +0200 Subject: [PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit schedule_delayed_work does not push back the work if it was already scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms after the first time GFXOFF was disabled and re-enabled, even if GFXOFF was disabled and re-enabled again during those 100 ms. This resulted in frame drops / stutter with the upcoming mutter 41 release on Navi 14, due to constantly enabling GFXOFF in the HW and disabling it again (for getting the GPU clock counter). To fix this, call cancel_delayed_work_sync when the disable count transitions from 0 to 1, and only schedule the delayed work on the reverse transition, not if the disable count was already 0. This makes sure the delayed work doesn't run at unexpected times, and allows it to be lock-free. v2: * Use cancel_delayed_work_sync & mutex_trylock instead of mod_delayed_work. v3: * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König) v4: * Fix race condition between amdgpu_gfx_off_ctrl incrementing adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off checking for it to be 0 (Evan Quan) Cc: stable(a)vger.kernel.org Reviewed-by: Evan Quan <evan.quan(a)amd.com> Reviewed-by: Lijo Lazar <lijo.lazar(a)amd.com> # v3 Acked-by: Christian König <christian.koenig(a)amd.com> # v3 Signed-off-by: Michel Dänzer <mdaenzer(a)redhat.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41cc00e489ac..41c6b3aacd37 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2829,12 +2829,11 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work) struct amdgpu_device *adev = container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work); - mutex_lock(&adev->gfx.gfx_off_mutex); - if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true)) - adev->gfx.gfx_off_state = true; - } - mutex_unlock(&adev->gfx.gfx_off_mutex); + WARN_ON_ONCE(adev->gfx.gfx_off_state); + WARN_ON_ONCE(adev->gfx.gfx_off_req_count); + + if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true)) + adev->gfx.gfx_off_state = true; } /** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index e7e9655c5623..e7f06bd0f0cd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -563,24 +563,38 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable) mutex_lock(&adev->gfx.gfx_off_mutex); - if (!enable) - adev->gfx.gfx_off_req_count++; - else if (adev->gfx.gfx_off_req_count > 0) + if (enable) { + /* If the count is already 0, it means there's an imbalance bug somewhere. + * Note that the bug may be in a different caller than the one which triggers the + * WARN_ON_ONCE. + */ + if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0)) + goto unlock; + adev->gfx.gfx_off_req_count--; - if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { - schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE); - } else if (!enable && adev->gfx.gfx_off_state) { - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { - adev->gfx.gfx_off_state = false; + if (adev->gfx.gfx_off_req_count == 0 && !adev->gfx.gfx_off_state) + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE); + } else { + if (adev->gfx.gfx_off_req_count == 0) { + cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work); + + if (adev->gfx.gfx_off_state && + !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { + adev->gfx.gfx_off_state = false; - if (adev->gfx.funcs->init_spm_golden) { - dev_dbg(adev->dev, "GFXOFF is disabled, re-init SPM golden settings\n"); - amdgpu_gfx_init_spm_golden(adev); + if (adev->gfx.funcs->init_spm_golden) { + dev_dbg(adev->dev, + "GFXOFF is disabled, re-init SPM golden settings\n"); + amdgpu_gfx_init_spm_golden(adev); + } } } + + adev->gfx.gfx_off_req_count++; } +unlock: mutex_unlock(&adev->gfx.gfx_off_mutex); }

3 years, 10 months

4
7
0 0

[PATCH] drm/msm: Do not run snapshot on non-DPU devices

by Fabio Estevam

Since commit 98659487b845 ("drm/msm: add support to take dpu snapshot") the following NULL pointer dereference is seen on i.MX53: [ 3.275493] msm msm: bound 30000000.gpu (ops a3xx_ops) [ 3.287174] [drm] Initialized msm 1.8.0 20130625 for msm on minor 0 [ 3.293915] 8<--- cut here --- [ 3.297012] Unable to handle kernel NULL pointer dereference at virtual address 00000028 [ 3.305244] pgd = (ptrval) [ 3.307989] [00000028] *pgd=00000000 [ 3.311624] Internal error: Oops: 805 [#1] SMP ARM [ 3.316430] Modules linked in: [ 3.319503] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0+g682d702b426b #1 [ 3.326652] Hardware name: Freescale i.MX53 (Device Tree Support) [ 3.332754] PC is at __mutex_init+0x14/0x54 [ 3.336969] LR is at msm_disp_snapshot_init+0x24/0xa0 i.MX53 does not use the DPU controller. Fix the problem by only calling msm_disp_snapshot_init() on platforms that use the DPU controller. Cc: stable(a)vger.kernel.org Fixes: 98659487b845 ("drm/msm: add support to take dpu snapshot") Signed-off-by: Fabio Estevam <festevam(a)gmail.com> --- drivers/gpu/drm/msm/msm_drv.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 2e6fc185e54d..2aa2266454b7 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -630,10 +630,11 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv) if (ret) goto err_msm_uninit; - ret = msm_disp_snapshot_init(ddev); - if (ret) - DRM_DEV_ERROR(dev, "msm_disp_snapshot_init failed ret = %d\n", ret); - + if (kms) { + ret = msm_disp_snapshot_init(ddev); + if (ret) + DRM_DEV_ERROR(dev, "msm_disp_snapshot_init failed ret = %d\n", ret); + } drm_mode_config_reset(ddev); #ifdef CONFIG_DRM_FBDEV_EMULATION -- 2.25.1

3 years, 10 months

2
5
0 0

FAILED: patch "[PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state" failed to apply to 5.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a47f6a5806da4f24fbb66148a1519bf72fe060db Mon Sep 17 00:00:00 2001 From: Borislav Petkov <bp(a)suse.de> Date: Tue, 24 Aug 2021 11:42:47 +0200 Subject: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export Building a randconfig here triggered: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! because the module export of that symbol happens in kernel/power/suspend.c which is enabled with CONFIG_SUSPEND. The ifdef guards in amdgpu_acpi_is_s0ix_supported(), however, test for CONFIG_PM_SLEEP which is defined like this: config PM_SLEEP def_bool y depends on SUSPEND || HIBERNATE_CALLBACKS and that randconfig has: # CONFIG_SUSPEND is not set CONFIG_HIBERNATE_CALLBACKS=y leading to the module export missing. Change the ifdeffery to depend directly on CONFIG_SUSPEND. Fixes: 5706cb3c910c ("drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled") Reviewed-by: Lijo Lazar <lijo.lazar(a)amd.com> Signed-off-by: Borislav Petkov <bp(a)suse.de> Link: https://lkml.kernel.org/r/YSP6Lv53QV0cOAsd@zn.tnic Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> Cc: stable(a)vger.kernel.org diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c index 260ba01d303e..4811b0faafd9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c @@ -1040,7 +1040,7 @@ void amdgpu_acpi_detect(void) */ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) { -#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_PM_SLEEP) +#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_SUSPEND) if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) { if (adev->flags & AMD_IS_APU) return pm_suspend_target_state == PM_SUSPEND_TO_IDLE;

3 years, 10 months

2
1
0 0

[PATCH] comedi: Fix memory leak in compat_insnlist()

by Ian Abbott

`compat_insnlist()` handles the 32-bit version of the `COMEDI_INSNLIST` ioctl (whenwhen `CONFIG_COMPAT` is enabled). It allocates memory to temporarily hold an array of `struct comedi_insn` converted from the 32-bit version in user space. This memory is only being freed if there is a fault while filling the array, otherwise it is leaked. Add a call to `kfree()` to fix the leak. Fixes: b8d47d881305 ("comedi: get rid of compat_alloc_user_space() mess in COMEDI_INSNLIST compat" Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <greglh(a)linuxfoundation.org> Cc: linux-staging(a)lists.linux.dev Cc: <stable(a)vger.kernel.org> # 5.13+ Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk> --- N.B. Also need patches for 5.8+ from before comedi moved out of staging. --- drivers/comedi/comedi_fops.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/comedi/comedi_fops.c b/drivers/comedi/comedi_fops.c index df77b6bf5c64..763cea8418f8 100644 --- a/drivers/comedi/comedi_fops.c +++ b/drivers/comedi/comedi_fops.c @@ -3090,6 +3090,7 @@ static int compat_insnlist(struct file *file, unsigned long arg) mutex_lock(&dev->mutex); rc = do_insnlist_ioctl(dev, insns, insnlist32.n_insns, file); mutex_unlock(&dev->mutex); + kfree(insns); return rc; } -- 2.33.0

3 years, 10 months

1
2
0 0

[PATCH] rsi: Fix module dev_oper_mode parameter description

by Marek Vasut

The module parameters are missing dev_oper_mode 12, BT classic alone, add it. Moreover, the parameters encode newlines, which ends up being printed malformed e.g. by modinfo, so fix that too. However, the module parameter string is duplicated in both USB and SDIO modules and the dev_oper_mode mode enumeration in those module parameters is a duplicate of macros used by the driver. Furthermore, the enumeration is confusing. So, deduplicate the module parameter string and use __stringify() to encode the correct mode enumeration values into the module parameter string. Finally, replace 'Wi-Fi' with 'Wi-Fi alone' and 'BT' with 'BT classic alone' to clarify what those modes really mean. Fixes: 898b255339310 ("rsi: add module parameter operating mode") Signed-off-by: Marek Vasut <marex(a)denx.de> Cc: Amitkumar Karwar <amit.karwar(a)redpinesignals.com> Cc: Angus Ainslie <angus(a)akkea.ca> Cc: David S. Miller <davem(a)davemloft.net> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: Kalle Valo <kvalo(a)codeaurora.org> Cc: Karun Eagalapati <karun256(a)gmail.com> Cc: Martin Fuzzey <martin.fuzzey(a)flowbird.group> Cc: Martin Kepplinger <martink(a)posteo.de> Cc: Prameela Rani Garnepudi <prameela.j04cs(a)gmail.com> Cc: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm> Cc: Siva Rebbagondla <siva8118(a)gmail.com> To: netdev(a)vger.kernel.org Cc: <stable(a)vger.kernel.org> # 4.17+ --- drivers/net/wireless/rsi/rsi_91x_sdio.c | 5 +---- drivers/net/wireless/rsi/rsi_91x_usb.c | 5 +---- drivers/net/wireless/rsi/rsi_hal.h | 11 +++++++++++ 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c index e0c502bc42707..9f16128e4ffab 100644 --- a/drivers/net/wireless/rsi/rsi_91x_sdio.c +++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c @@ -24,10 +24,7 @@ /* Default operating mode is wlan STA + BT */ static u16 dev_oper_mode = DEV_OPMODE_STA_BT_DUAL; module_param(dev_oper_mode, ushort, 0444); -MODULE_PARM_DESC(dev_oper_mode, - "1[Wi-Fi], 4[BT], 8[BT LE], 5[Wi-Fi STA + BT classic]\n" - "9[Wi-Fi STA + BT LE], 13[Wi-Fi STA + BT classic + BT LE]\n" - "6[AP + BT classic], 14[AP + BT classic + BT LE]"); +MODULE_PARM_DESC(dev_oper_mode, DEV_OPMODE_PARAM_DESC); /** * rsi_sdio_set_cmd52_arg() - This function prepares cmd 52 read/write arg. diff --git a/drivers/net/wireless/rsi/rsi_91x_usb.c b/drivers/net/wireless/rsi/rsi_91x_usb.c index 416976f098882..6a120211800db 100644 --- a/drivers/net/wireless/rsi/rsi_91x_usb.c +++ b/drivers/net/wireless/rsi/rsi_91x_usb.c @@ -25,10 +25,7 @@ /* Default operating mode is wlan STA + BT */ static u16 dev_oper_mode = DEV_OPMODE_STA_BT_DUAL; module_param(dev_oper_mode, ushort, 0444); -MODULE_PARM_DESC(dev_oper_mode, - "1[Wi-Fi], 4[BT], 8[BT LE], 5[Wi-Fi STA + BT classic]\n" - "9[Wi-Fi STA + BT LE], 13[Wi-Fi STA + BT classic + BT LE]\n" - "6[AP + BT classic], 14[AP + BT classic + BT LE]"); +MODULE_PARM_DESC(dev_oper_mode, DEV_OPMODE_PARAM_DESC); static int rsi_rx_urb_submit(struct rsi_hw *adapter, u8 ep_num, gfp_t flags); diff --git a/drivers/net/wireless/rsi/rsi_hal.h b/drivers/net/wireless/rsi/rsi_hal.h index d044a440fa080..e435df73ccce3 100644 --- a/drivers/net/wireless/rsi/rsi_hal.h +++ b/drivers/net/wireless/rsi/rsi_hal.h @@ -28,6 +28,17 @@ #define DEV_OPMODE_AP_BT 6 #define DEV_OPMODE_AP_BT_DUAL 14 +#define DEV_OPMODE_PARAM_DESC \ + __stringify(DEV_OPMODE_WIFI_ALONE) "[Wi-Fi alone], " \ + __stringify(DEV_OPMODE_BT_ALONE) "[BT classic alone], " \ + __stringify(DEV_OPMODE_BT_LE_ALONE) "[BT LE], " \ + __stringify(DEV_OPMODE_BT_DUAL) "[BT Dual], " \ + __stringify(DEV_OPMODE_STA_BT) "[Wi-Fi STA + BT classic], " \ + __stringify(DEV_OPMODE_STA_BT_LE) "[Wi-Fi STA + BT LE], " \ + __stringify(DEV_OPMODE_STA_BT_DUAL) "[Wi-Fi STA + BT classic + BT LE], " \ + __stringify(DEV_OPMODE_AP_BT) "[AP + BT classic], " \ + __stringify(DEV_OPMODE_AP_BT_DUAL) "[AP + BT classic + BT LE]" + #define FLASH_WRITE_CHUNK_SIZE (4 * 1024) #define FLASH_SECTOR_SIZE (4 * 1024) -- 2.33.0

3 years, 10 months

3
3
0 0

FAILED: patch "[PATCH] tracing/osnoise: Fix missed cpus_read_unlock() in" failed to apply to 5.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4b6b08f2e45edda4c067ac40833e3c1f84383c0b Mon Sep 17 00:00:00 2001 From: "Qiang.Zhang" <qiang.zhang(a)windriver.com> Date: Tue, 31 Aug 2021 10:29:19 +0800 Subject: [PATCH] tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads() When start_kthread() return error, the cpus_read_unlock() need to be called. Link: https://lkml.kernel.org/r/20210831022919.27630-1-qiang.zhang@windriver.com Cc: <stable(a)vger.kernel.org> Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Acked-by: Daniel Bristot de Oliveira <bristot(a)kernel.org> Signed-off-by: Qiang.Zhang <qiang.zhang(a)windriver.com> Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index 65b08b8e5bf8..ce053619f289 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -1548,7 +1548,7 @@ static int start_kthread(unsigned int cpu) static int start_per_cpu_kthreads(struct trace_array *tr) { struct cpumask *current_mask = &save_cpumask; - int retval; + int retval = 0; int cpu; cpus_read_lock(); @@ -1568,13 +1568,13 @@ static int start_per_cpu_kthreads(struct trace_array *tr) retval = start_kthread(cpu); if (retval) { stop_per_cpu_kthreads(); - return retval; + break; } } cpus_read_unlock(); - return 0; + return retval; } #ifdef CONFIG_HOTPLUG_CPU

3 years, 10 months

3
2
0 0

[PATCH 5.10] fanotify: limit number of event merge attempts

by Amir Goldstein

commit b8cd0ee8cda68a888a317991c1e918a8cba1a568 upstream. Event merges are expensive when event queue size is large, so limit the linear search to 128 merge tests. [Stable backport notes] The following statement from upstream commit is irrelevant for backport: - -In combination with 128 size hash table, there is a potential to merge -with up to 16K events in the hashed queue. - [Stable backport notes] The problem is as old as fanotify and described in the linked cover letter "Performance improvement for fanotify merge". This backported patch fixes the performance issue at the cost of merging fewer potential events. Fixing the performance issue is more important than preserving the "event merge" behavior, which was not predictable in any way that applications could rely on. Link: https://lore.kernel.org/r/20210304104826.3993892-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il(a)gmail.com> Signed-off-by: Jan Kara <jack(a)suse.cz> Cc: <stable(a)vger.kernel.org> Link: https://lore.kernel.org/linux-fsdevel/20210202162010.305971-1-amir73il@gmai… Link: https://lore.kernel.org/linux-fsdevel/20210915163334.GD6166@quack2.suse.cz/ Signed-off-by: Amir Goldstein <amir73il(a)gmail.com> --- fs/notify/fanotify/fanotify.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 1192c9953620..c3af99e94f1d 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -129,11 +129,15 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn, return false; } +/* Limit event merges to limit CPU overhead per event */ +#define FANOTIFY_MAX_MERGE_EVENTS 128 + /* and the list better be locked by something too! */ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) { struct fsnotify_event *test_event; struct fanotify_event *new; + int i = 0; pr_debug("%s: list=%p event=%p\n", __func__, list, event); new = FANOTIFY_E(event); @@ -147,6 +151,8 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) return 0; list_for_each_entry_reverse(test_event, list, list) { + if (++i > FANOTIFY_MAX_MERGE_EVENTS) + break; if (fanotify_should_merge(test_event, event)) { FANOTIFY_E(test_event)->mask |= new->mask; return 1; -- 2.16.5

3 years, 10 months

2
1
0 0

FAILED: patch "[PATCH] drm/i915/gtt: drop the page table optimisation" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 8f88ca76b3942d82e2c1cea8735ec368d89ecc15 Mon Sep 17 00:00:00 2001 From: Matthew Auld <matthew.auld(a)intel.com> Date: Tue, 13 Jul 2021 14:04:31 +0100 Subject: [PATCH] drm/i915/gtt: drop the page table optimisation We skip filling out the pt with scratch entries if the va range covers the entire pt, since we later have to fill it with the PTEs for the object pages anyway. However this might leave open a small window where the PTEs don't point to anything valid for the HW to consume. When for example using 2M GTT pages this fill_px() showed up as being quite significant in perf measurements, and ends up being completely wasted since we ignore the pt and just use the pde directly. Anyway, currently we have our PTE construction split between alloc and insert, which is probably slightly iffy nowadays, since the alloc doesn't actually allocate anything anymore, instead it just sets up the page directories and points the PTEs at the scratch page. Later when we do the insert step we re-program the PTEs again. Better might be to squash the alloc and insert into a single step, then bringing back this optimisation(along with some others) should be possible. Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables") Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> Cc: Jon Bloomfield <jon.bloomfield(a)intel.com> Cc: Chris Wilson <chris.p.wilson(a)intel.com> Cc: Daniel Vetter <daniel(a)ffwll.ch> Cc: <stable(a)vger.kernel.org> # v4.15+ Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matt… diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 3d02c726c746..6e0e52eeb87a 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm, __i915_gem_object_pin_pages(pt->base); i915_gem_object_make_unshrinkable(pt->base); - if (lvl || - gen8_pt_count(*start, end) < I915_PDES || - intel_vgpu_active(vm->i915)) - fill_px(pt, vm->scratch[lvl]->encode); + fill_px(pt, vm->scratch[lvl]->encode); spin_lock(&pd->lock); if (likely(!pd->entry[idx])) {

3 years, 10 months

1
0
0 0

FAILED: patch "[PATCH] drm/i915/gtt: drop the page table optimisation" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 8f88ca76b3942d82e2c1cea8735ec368d89ecc15 Mon Sep 17 00:00:00 2001 From: Matthew Auld <matthew.auld(a)intel.com> Date: Tue, 13 Jul 2021 14:04:31 +0100 Subject: [PATCH] drm/i915/gtt: drop the page table optimisation We skip filling out the pt with scratch entries if the va range covers the entire pt, since we later have to fill it with the PTEs for the object pages anyway. However this might leave open a small window where the PTEs don't point to anything valid for the HW to consume. When for example using 2M GTT pages this fill_px() showed up as being quite significant in perf measurements, and ends up being completely wasted since we ignore the pt and just use the pde directly. Anyway, currently we have our PTE construction split between alloc and insert, which is probably slightly iffy nowadays, since the alloc doesn't actually allocate anything anymore, instead it just sets up the page directories and points the PTEs at the scratch page. Later when we do the insert step we re-program the PTEs again. Better might be to squash the alloc and insert into a single step, then bringing back this optimisation(along with some others) should be possible. Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables") Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> Cc: Jon Bloomfield <jon.bloomfield(a)intel.com> Cc: Chris Wilson <chris.p.wilson(a)intel.com> Cc: Daniel Vetter <daniel(a)ffwll.ch> Cc: <stable(a)vger.kernel.org> # v4.15+ Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matt… diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 3d02c726c746..6e0e52eeb87a 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm, __i915_gem_object_pin_pages(pt->base); i915_gem_object_make_unshrinkable(pt->base); - if (lvl || - gen8_pt_count(*start, end) < I915_PDES || - intel_vgpu_active(vm->i915)) - fill_px(pt, vm->scratch[lvl]->encode); + fill_px(pt, vm->scratch[lvl]->encode); spin_lock(&pd->lock); if (likely(!pd->entry[idx])) {

3 years, 10 months

1
0
0 0

FAILED: patch "[PATCH] drm/i915/gtt: drop the page table optimisation" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 8f88ca76b3942d82e2c1cea8735ec368d89ecc15 Mon Sep 17 00:00:00 2001 From: Matthew Auld <matthew.auld(a)intel.com> Date: Tue, 13 Jul 2021 14:04:31 +0100 Subject: [PATCH] drm/i915/gtt: drop the page table optimisation We skip filling out the pt with scratch entries if the va range covers the entire pt, since we later have to fill it with the PTEs for the object pages anyway. However this might leave open a small window where the PTEs don't point to anything valid for the HW to consume. When for example using 2M GTT pages this fill_px() showed up as being quite significant in perf measurements, and ends up being completely wasted since we ignore the pt and just use the pde directly. Anyway, currently we have our PTE construction split between alloc and insert, which is probably slightly iffy nowadays, since the alloc doesn't actually allocate anything anymore, instead it just sets up the page directories and points the PTEs at the scratch page. Later when we do the insert step we re-program the PTEs again. Better might be to squash the alloc and insert into a single step, then bringing back this optimisation(along with some others) should be possible. Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables") Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> Cc: Jon Bloomfield <jon.bloomfield(a)intel.com> Cc: Chris Wilson <chris.p.wilson(a)intel.com> Cc: Daniel Vetter <daniel(a)ffwll.ch> Cc: <stable(a)vger.kernel.org> # v4.15+ Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matt… diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 3d02c726c746..6e0e52eeb87a 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm, __i915_gem_object_pin_pages(pt->base); i915_gem_object_make_unshrinkable(pt->base); - if (lvl || - gen8_pt_count(*start, end) < I915_PDES || - intel_vgpu_active(vm->i915)) - fill_px(pt, vm->scratch[lvl]->encode); + fill_px(pt, vm->scratch[lvl]->encode); spin_lock(&pd->lock); if (likely(!pd->entry[idx])) {

3 years, 10 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror September 2021