The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 02ce6ce2e1d07e31e8314c761a2caa087ea094ce Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Thu, 12 Jul 2018 08:38:09 -0500
Subject: [PATCH] drm/amdgpu/pp/smu7: use a local variable for toc indexing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Use a local variable for TOC indexing rather than the index variable
stored in vram. If the device fails to come back online after a resume
cycle, reads from vram will return all 1s, which will cause a segfault.
Based on a patch from Thomas Martitz <kugel(a)rockbox.org>.
This avoids the segfault, but we still need to sort out
why the GPU does not come back online after a resume.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105760
Acked-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
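For illustration, here is a minimal standalone sketch (simplified
structures, not the real driver types) of the pattern the diff below
applies: index the TOC with a CPU-local counter and write the final count
back to the VRAM-backed structure once, instead of doing read-modify-write
cycles on toc->num_entries, which lives in vram and reads back as all 1s
when the device never comes back after resume.

struct toc_entry { unsigned int id; };

struct toc {
	unsigned int structure_version;
	unsigned int num_entries;          /* VRAM-backed in the real driver */
	struct toc_entry entry[16];
};

static void build_toc(struct toc *toc)
{
	unsigned int num_entries = 0;      /* local index, never read from vram */

	toc->structure_version = 1;
	toc->entry[num_entries++].id = 1;  /* stands in for UCODE_ID_RLC_G */
	toc->entry[num_entries++].id = 2;  /* stands in for UCODE_ID_CP_CE */
	toc->num_entries = num_entries;    /* single write-back at the end */
}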
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
index d644a9bb9078..9f407c48d4f0 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
@@ -381,6 +381,7 @@ int smu7_request_smu_load_fw(struct pp_hwmgr *hwmgr)
uint32_t fw_to_load;
int result = 0;
struct SMU_DRAMData_TOC *toc;
+ uint32_t num_entries = 0;
if (!hwmgr->reload_fw) {
pr_info("skip reloading...\n");
@@ -422,41 +423,41 @@ int smu7_request_smu_load_fw(struct pp_hwmgr *hwmgr)
}
toc = (struct SMU_DRAMData_TOC *)smu_data->header;
- toc->num_entries = 0;
toc->structure_version = 1;
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_RLC_G, &toc->entry[toc->num_entries++]),
+ UCODE_ID_RLC_G, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_CE, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_CE, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_PFP, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_PFP, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_ME, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_ME, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC_JT1, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC_JT1, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC_JT2, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC_JT2, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_SDMA0, &toc->entry[toc->num_entries++]),
+ UCODE_ID_SDMA0, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_SDMA1, &toc->entry[toc->num_entries++]),
+ UCODE_ID_SDMA1, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
if (!hwmgr->not_vf)
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_MEC_STORAGE, &toc->entry[toc->num_entries++]),
+ UCODE_ID_MEC_STORAGE, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
+ toc->num_entries = num_entries;
smu7_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_DRV_DRAM_ADDR_HI, upper_32_bits(smu_data->header_buffer.mc_addr));
smu7_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_DRV_DRAM_ADDR_LO, lower_32_bits(smu_data->header_buffer.mc_addr));
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 02ce6ce2e1d07e31e8314c761a2caa087ea094ce Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Thu, 12 Jul 2018 08:38:09 -0500
Subject: [PATCH] drm/amdgpu/pp/smu7: use a local variable for toc indexing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Use a local variable for TOC indexing rather than the index variable
stored in vram. If the device fails to come back online after a resume
cycle, reads from vram will return all 1s, which will cause a segfault.
Based on a patch from Thomas Martitz <kugel(a)rockbox.org>.
This avoids the segfault, but we still need to sort out
why the GPU does not come back online after a resume.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105760
Acked-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
index d644a9bb9078..9f407c48d4f0 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
@@ -381,6 +381,7 @@ int smu7_request_smu_load_fw(struct pp_hwmgr *hwmgr)
uint32_t fw_to_load;
int result = 0;
struct SMU_DRAMData_TOC *toc;
+ uint32_t num_entries = 0;
if (!hwmgr->reload_fw) {
pr_info("skip reloading...\n");
@@ -422,41 +423,41 @@ int smu7_request_smu_load_fw(struct pp_hwmgr *hwmgr)
}
toc = (struct SMU_DRAMData_TOC *)smu_data->header;
- toc->num_entries = 0;
toc->structure_version = 1;
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_RLC_G, &toc->entry[toc->num_entries++]),
+ UCODE_ID_RLC_G, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_CE, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_CE, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_PFP, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_PFP, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_ME, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_ME, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC_JT1, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC_JT1, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC_JT2, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC_JT2, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_SDMA0, &toc->entry[toc->num_entries++]),
+ UCODE_ID_SDMA0, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_SDMA1, &toc->entry[toc->num_entries++]),
+ UCODE_ID_SDMA1, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
if (!hwmgr->not_vf)
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_MEC_STORAGE, &toc->entry[toc->num_entries++]),
+ UCODE_ID_MEC_STORAGE, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
+ toc->num_entries = num_entries;
smu7_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_DRV_DRAM_ADDR_HI, upper_32_bits(smu_data->header_buffer.mc_addr));
smu7_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_DRV_DRAM_ADDR_LO, lower_32_bits(smu_data->header_buffer.mc_addr));
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d4a0d4cdc8b5904ec7c9b9e04bab3e9e60d7a74 Mon Sep 17 00:00:00 2001
From: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Date: Thu, 5 Jul 2018 14:49:34 -0400
Subject: [PATCH] drm/amdgpu: Verify root PD is mapped into kernel address
space (v4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Problem: When PD/PT updates are made by the CPU, the root PD was not yet
mapped, causing a page fault.
Fix: Verify root PD is mapped into CPU address space.
v2:
Make sure that we add the root PD to the relocated list, since it then
gets mapped into the CPU address space by default in
amdgpu_vm_update_directories.
v3:
Drop change to not move kernel type BOs to evicted list.
v4:
Remove redundant bo move to relocated list.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=107065
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index edf16b2b957a..fdcb498f6d19 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -107,6 +107,9 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
return;
list_add_tail(&base->bo_list, &bo->va);
+ if (bo->tbo.type == ttm_bo_type_kernel)
+ list_move(&base->vm_status, &vm->relocated);
+
if (bo->tbo.resv != vm->root.base.bo->tbo.resv)
return;
@@ -468,7 +471,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
pt->parent = amdgpu_bo_ref(parent->base.bo);
amdgpu_vm_bo_base_init(&entry->base, vm, pt);
- list_move(&entry->base.vm_status, &vm->relocated);
}
if (level < AMDGPU_VM_PTB) {
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d4a0d4cdc8b5904ec7c9b9e04bab3e9e60d7a74 Mon Sep 17 00:00:00 2001
From: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Date: Thu, 5 Jul 2018 14:49:34 -0400
Subject: [PATCH] drm/amdgpu: Verify root PD is mapped into kernel address
space (v4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Problem: When PD/PT updates are made by the CPU, the root PD was not yet
mapped, causing a page fault.
Fix: Verify root PD is mapped into CPU address space.
v2:
Make sure that we add the root PD to the relocated list, since it then
gets mapped into the CPU address space by default in
amdgpu_vm_update_directories.
v3:
Drop change to not move kernel type BOs to evicted list.
v4:
Remove redundant bo move to relocated list.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=107065
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index edf16b2b957a..fdcb498f6d19 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -107,6 +107,9 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
return;
list_add_tail(&base->bo_list, &bo->va);
+ if (bo->tbo.type == ttm_bo_type_kernel)
+ list_move(&base->vm_status, &vm->relocated);
+
if (bo->tbo.resv != vm->root.base.bo->tbo.resv)
return;
@@ -468,7 +471,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
pt->parent = amdgpu_bo_ref(parent->base.bo);
amdgpu_vm_bo_base_init(&entry->base, vm, pt);
- list_move(&entry->base.vm_status, &vm->relocated);
}
if (level < AMDGPU_VM_PTB) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76fa4975f3ed12d15762bc979ca44078598ed8ee Mon Sep 17 00:00:00 2001
From: Alexey Kardashevskiy <aik(a)ozlabs.ru>
Date: Tue, 17 Jul 2018 17:19:13 +1000
Subject: [PATCH] KVM: PPC: Check if IOMMU page is contained in the pinned
physical page
A VM which has:
- a DMA capable device passed through to it (e.g. a network card);
- a malicious kernel running that ignores H_PUT_TCE failures;
- the capability of using IOMMU pages bigger than physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.
The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.
The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.
We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet, because the very first time the mapping happens we
do not have tbl::it_userspace allocated yet and fall back to userspace,
which in turn calls the VFIO IOMMU driver; this fails and the guest does
not retry.
This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.
This calculates the maximum page size as the minimum of the natural
region alignment and the compound page size. For the page shift this uses
the shift returned by find_linux_pte(), which indicates how the page is
mapped into the current userspace: if the page is huge and the shift is
non-zero, then it is a leaf pte and the page is mapped within the range.
Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: stable(a)vger.kernel.org # v4.12+
Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
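As a worked example of the starting-point calculation introduced below (a
standalone userspace sketch, not kernel code: __builtin_ctzll stands in
for the kernel's __ffs, and the 64K PAGE_SHIFT is an assumed ppc64
configuration):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 16	/* assumed 64K base page */

/* mirrors: mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT)); */
static unsigned int region_max_pageshift(uint64_t ua, uint64_t entries)
{
	/* lowest set bit of (address | size) bounds the largest aligned IOMMU page */
	return (unsigned int)__builtin_ctzll(ua | (entries << PAGE_SHIFT));
}

int main(void)
{
	/* one 64K page at a 16MB-aligned address: the 64K size is the limit */
	printf("%u\n", region_max_pageshift(0x1000000, 1));	/* prints 16 */

	/* 256 pages (16MB) at the same address: a 16MB IOMMU page would fit */
	printf("%u\n", region_max_pageshift(0x1000000, 256));	/* prints 24 */

	return 0;
}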
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 896efa559996..79d570cbf332 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(
extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
unsigned long ua, unsigned long entries);
extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
#endif
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index d066e37551ec..8c456fa691a5 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -449,7 +449,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
/* This only handles v2 IOMMU type, v1 is handled via ioctl() */
return H_TOO_HARD;
- if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa)))
+ if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
return H_HARDWARE;
if (mm_iommu_mapped_inc(mem))
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 925fc316a104..5b298f5a1a14 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -279,7 +279,8 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
if (!mem)
return H_TOO_HARD;
- if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
+ if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
+ &hpa)))
return H_HARDWARE;
pua = (void *) vmalloc_to_phys(pua);
@@ -469,7 +470,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
if (mem)
- prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0;
+ prereg = mm_iommu_ua_to_hpa_rm(mem, ua,
+ IOMMU_PAGE_SHIFT_4K, &tces) == 0;
}
if (!prereg) {
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index abb43646927a..a4ca57612558 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -19,6 +19,7 @@
#include <linux/hugetlb.h>
#include <linux/swap.h>
#include <asm/mmu_context.h>
+#include <asm/pte-walk.h>
static DEFINE_MUTEX(mem_list_mutex);
@@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t {
struct rcu_head rcu;
unsigned long used;
atomic64_t mapped;
+ unsigned int pageshift;
u64 ua; /* userspace address */
u64 entries; /* number of entries in hpas[] */
u64 *hpas; /* vmalloc'ed */
@@ -125,6 +127,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
{
struct mm_iommu_table_group_mem_t *mem;
long i, j, ret = 0, locked_entries = 0;
+ unsigned int pageshift;
+ unsigned long flags;
struct page *page = NULL;
mutex_lock(&mem_list_mutex);
@@ -159,6 +163,12 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
goto unlock_exit;
}
+ /*
+ * For a starting point for a maximum page size calculation
+ * we use @ua and @entries natural alignment to allow IOMMU pages
+ * smaller than huge pages but still bigger than PAGE_SIZE.
+ */
+ mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT));
mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
if (!mem->hpas) {
kfree(mem);
@@ -199,6 +209,23 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
}
}
populate:
+ pageshift = PAGE_SHIFT;
+ if (PageCompound(page)) {
+ pte_t *pte;
+ struct page *head = compound_head(page);
+ unsigned int compshift = compound_order(head);
+
+ local_irq_save(flags); /* disables as well */
+ pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
+ local_irq_restore(flags);
+
+ /* Double check it is still the same pinned page */
+ if (pte && pte_page(*pte) == head &&
+ pageshift == compshift)
+ pageshift = max_t(unsigned int, pageshift,
+ PAGE_SHIFT);
+ }
+ mem->pageshift = min(mem->pageshift, pageshift);
mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
}
@@ -349,7 +376,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
EXPORT_SYMBOL_GPL(mm_iommu_find);
long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
u64 *va = &mem->hpas[entry];
@@ -357,6 +384,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
if (entry >= mem->entries)
return -EFAULT;
+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
*hpa = *va | (ua & ~PAGE_MASK);
return 0;
@@ -364,7 +394,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
void *va = &mem->hpas[entry];
@@ -373,6 +403,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
if (entry >= mem->entries)
return -EFAULT;
+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
pa = (void *) vmalloc_to_phys(va);
if (!pa)
return -EFAULT;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 2da5f054257a..7cd63b0c1a46 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
if (!mem)
return -EINVAL;
- ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
+ ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
if (ret)
return -EINVAL;
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76fa4975f3ed12d15762bc979ca44078598ed8ee Mon Sep 17 00:00:00 2001
From: Alexey Kardashevskiy <aik(a)ozlabs.ru>
Date: Tue, 17 Jul 2018 17:19:13 +1000
Subject: [PATCH] KVM: PPC: Check if IOMMU page is contained in the pinned
physical page
A VM which has:
- a DMA capable device passed through to it (e.g. a network card);
- a malicious kernel running that ignores H_PUT_TCE failures;
- the capability of using IOMMU pages bigger than physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.
The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.
The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.
We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet, because the very first time the mapping happens we
do not have tbl::it_userspace allocated yet and fall back to userspace,
which in turn calls the VFIO IOMMU driver; this fails and the guest does
not retry.
This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.
This calculates the maximum page size as the minimum of the natural
region alignment and the compound page size. For the page shift this uses
the shift returned by find_linux_pte(), which indicates how the page is
mapped into the current userspace: if the page is huge and the shift is
non-zero, then it is a leaf pte and the page is mapped within the range.
Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: stable(a)vger.kernel.org # v4.12+
Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 896efa559996..79d570cbf332 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(
extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
unsigned long ua, unsigned long entries);
extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
#endif
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index d066e37551ec..8c456fa691a5 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -449,7 +449,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
/* This only handles v2 IOMMU type, v1 is handled via ioctl() */
return H_TOO_HARD;
- if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa)))
+ if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
return H_HARDWARE;
if (mm_iommu_mapped_inc(mem))
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 925fc316a104..5b298f5a1a14 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -279,7 +279,8 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
if (!mem)
return H_TOO_HARD;
- if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
+ if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
+ &hpa)))
return H_HARDWARE;
pua = (void *) vmalloc_to_phys(pua);
@@ -469,7 +470,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
if (mem)
- prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0;
+ prereg = mm_iommu_ua_to_hpa_rm(mem, ua,
+ IOMMU_PAGE_SHIFT_4K, &tces) == 0;
}
if (!prereg) {
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index abb43646927a..a4ca57612558 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -19,6 +19,7 @@
#include <linux/hugetlb.h>
#include <linux/swap.h>
#include <asm/mmu_context.h>
+#include <asm/pte-walk.h>
static DEFINE_MUTEX(mem_list_mutex);
@@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t {
struct rcu_head rcu;
unsigned long used;
atomic64_t mapped;
+ unsigned int pageshift;
u64 ua; /* userspace address */
u64 entries; /* number of entries in hpas[] */
u64 *hpas; /* vmalloc'ed */
@@ -125,6 +127,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
{
struct mm_iommu_table_group_mem_t *mem;
long i, j, ret = 0, locked_entries = 0;
+ unsigned int pageshift;
+ unsigned long flags;
struct page *page = NULL;
mutex_lock(&mem_list_mutex);
@@ -159,6 +163,12 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
goto unlock_exit;
}
+ /*
+ * For a starting point for a maximum page size calculation
+ * we use @ua and @entries natural alignment to allow IOMMU pages
+ * smaller than huge pages but still bigger than PAGE_SIZE.
+ */
+ mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT));
mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
if (!mem->hpas) {
kfree(mem);
@@ -199,6 +209,23 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
}
}
populate:
+ pageshift = PAGE_SHIFT;
+ if (PageCompound(page)) {
+ pte_t *pte;
+ struct page *head = compound_head(page);
+ unsigned int compshift = compound_order(head);
+
+ local_irq_save(flags); /* disables as well */
+ pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
+ local_irq_restore(flags);
+
+ /* Double check it is still the same pinned page */
+ if (pte && pte_page(*pte) == head &&
+ pageshift == compshift)
+ pageshift = max_t(unsigned int, pageshift,
+ PAGE_SHIFT);
+ }
+ mem->pageshift = min(mem->pageshift, pageshift);
mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
}
@@ -349,7 +376,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
EXPORT_SYMBOL_GPL(mm_iommu_find);
long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
u64 *va = &mem->hpas[entry];
@@ -357,6 +384,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
if (entry >= mem->entries)
return -EFAULT;
+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
*hpa = *va | (ua & ~PAGE_MASK);
return 0;
@@ -364,7 +394,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
void *va = &mem->hpas[entry];
@@ -373,6 +403,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
if (entry >= mem->entries)
return -EFAULT;
+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
pa = (void *) vmalloc_to_phys(va);
if (!pa)
return -EFAULT;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 2da5f054257a..7cd63b0c1a46 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
if (!mem)
return -EINVAL;
- ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
+ ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
if (ret)
return -EINVAL;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bd3599a0e142cd73edd3b6801068ac3f48ac771a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 12 Jul 2018 01:36:43 +0100
Subject: [PATCH] Btrfs: fix file data corruption after cloning a range and
fsync
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
latter ends up calling try_release_extent_mapping(), which will release
the extent map if some conditions are met, like the file size being
greater than 16Mb, the gfp flags allowing blocking, and the range being
neither locked (which is the case during the clone operation) nor the
extent map being flagged as pinned (also the case for cloning).
The following example, turned into a test for fstests, reproduces the
issue:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
$ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
# reflink destination offset corresponds to the size of file bar,
# 2728Kb minus 4Kb.
$ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
$ md5sum /mnt/bar
95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ md5sum /mnt/bar
207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar
# digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
# power failure
In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.
Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger than 16Mb or gfp flags do not allow blocking).
CC: stable(a)vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1aa91d57404a..8fd86e29085f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4241,8 +4241,9 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
struct extent_map *em;
u64 start = page_offset(page);
u64 end = start + PAGE_SIZE - 1;
- struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
- struct extent_map_tree *map = &BTRFS_I(page->mapping->host)->extent_tree;
+ struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
+ struct extent_io_tree *tree = &btrfs_inode->io_tree;
+ struct extent_map_tree *map = &btrfs_inode->extent_tree;
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
@@ -4265,6 +4266,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
extent_map_end(em) - 1,
EXTENT_LOCKED | EXTENT_WRITEBACK,
0, NULL)) {
+ set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
+ &btrfs_inode->runtime_flags);
remove_extent_mapping(map, em);
/* once for the rb tree */
free_extent_map(em);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bd3599a0e142cd73edd3b6801068ac3f48ac771a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 12 Jul 2018 01:36:43 +0100
Subject: [PATCH] Btrfs: fix file data corruption after cloning a range and
fsync
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
latter ends up calling try_release_extent_mapping(), which will release
the extent map if some conditions are met, like the file size being
greater than 16Mb, the gfp flags allowing blocking, and the range being
neither locked (which is the case during the clone operation) nor the
extent map being flagged as pinned (also the case for cloning).
The following example, turned into a test for fstests, reproduces the
issue:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
$ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
# reflink destination offset corresponds to the size of file bar,
# 2728Kb minus 4Kb.
$ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
$ md5sum /mnt/bar
95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ md5sum /mnt/bar
207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar
# digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
# power failure
In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.
Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger than 16Mb or gfp flags do not allow blocking).
CC: stable(a)vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1aa91d57404a..8fd86e29085f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4241,8 +4241,9 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
struct extent_map *em;
u64 start = page_offset(page);
u64 end = start + PAGE_SIZE - 1;
- struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
- struct extent_map_tree *map = &BTRFS_I(page->mapping->host)->extent_tree;
+ struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
+ struct extent_io_tree *tree = &btrfs_inode->io_tree;
+ struct extent_map_tree *map = &btrfs_inode->extent_tree;
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
@@ -4265,6 +4266,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
extent_map_end(em) - 1,
EXTENT_LOCKED | EXTENT_WRITEBACK,
0, NULL)) {
+ set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
+ &btrfs_inode->runtime_flags);
remove_extent_mapping(map, em);
/* once for the rb tree */
free_extent_map(em);