Hub driver will try to disable a USB3 device twice at logical disconnect,
racing with xhci_free_dev() callback from the first port disable.
This can be triggered with "udisksctl power-off --block-device <disk>"
or by writing "1" to the "remove" sysfs file for a USB3 device
in 4.17-rc4.
USB3 devices don't have a similar disabled link state as USB2 devices,
and use a U3 suspended link state instead. In this state the port
is still enabled and connected.
hub_port_connect() first disconnects the device, then later it notices
that device is still enabled (due to U3 states) it will try to disable
the port again (set to U3).
The xhci_free_dev() called during device disable is async, so checking
for existing xhci->devs[i] when setting link state to U3 the second time
was successful, even if device was being freed.
The regression was caused by, and whole thing revealed by,
Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned.
and causes a NULL pointer dereference the second time we try to set U3.
Fix this by checking xhci->devs[i]->udev exists before setting link state.
The original patch went to stable so this fix needs to be applied there as
well.
Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
Cc: <stable(a)vger.kernel.org>
Reported-by: Jordan Glover <Golden_Miller83(a)protonmail.ch>
Tested-by: Jordan Glover <Golden_Miller83(a)protonmail.ch>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-hub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 72ebbc9..32cd52c 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -354,7 +354,7 @@ int xhci_find_slot_id_by_port(struct usb_hcd *hcd, struct xhci_hcd *xhci,
slot_id = 0;
for (i = 0; i < MAX_HC_SLOTS; i++) {
- if (!xhci->devs[i])
+ if (!xhci->devs[i] || !xhci->devs[i]->udev)
continue;
speed = xhci->devs[i]->udev->speed;
if (((speed >= USB_SPEED_SUPER) == (hcd->speed >= HCD_USB3))
--
2.7.4
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ecf08dad723d3e000aecff6c396f54772d124733 Mon Sep 17 00:00:00 2001
From: Anthoine Bourgeois <anthoine.bourgeois(a)blade-group.com>
Date: Sun, 29 Apr 2018 22:05:58 +0000
Subject: [PATCH] KVM: x86: remove APIC Timer periodic/oneshot spikes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since the commit "8003c9ae204e: add APIC Timer periodic/oneshot mode VMX
preemption timer support", a Windows 10 guest has some erratic timer
spikes.
Here the results on a 150000 times 1ms timer without any load:
Before 8003c9ae204e | After 8003c9ae204e
Max 1834us | 86000us
Mean 1100us | 1021us
Deviation 59us | 149us
Here the results on a 150000 times 1ms timer with a cpu-z stress test:
Before 8003c9ae204e | After 8003c9ae204e
Max 32000us | 140000us
Mean 1006us | 1997us
Deviation 140us | 11095us
The root cause of the problem is starting hrtimer with an expiry time
already in the past can take more than 20 milliseconds to trigger the
timer function. It can be solved by forward such past timers
immediately, rather than submitting them to hrtimer_start().
In case the timer is periodic, update the target expiration and call
hrtimer_start with it.
v2: Check if the tsc deadline is already expired. Thank you Mika.
v3: Execute the past timers immediately rather than submitting them to
hrtimer_start().
v4: Rearm the periodic timer with advance_periodic_target_expiration() a
simpler version of set_target_expiration(). Thank you Paolo.
Cc: Mika Penttilä <mika.penttila(a)nextfour.com>
Cc: Wanpeng Li <kernellwp(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois(a)blade-group.com>
8003c9ae204e ("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support")
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 70dcb5548022..b74c9c1405b9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1463,23 +1463,6 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
local_irq_restore(flags);
}
-static void start_sw_period(struct kvm_lapic *apic)
-{
- if (!apic->lapic_timer.period)
- return;
-
- if (apic_lvtt_oneshot(apic) &&
- ktime_after(ktime_get(),
- apic->lapic_timer.target_expiration)) {
- apic_timer_expired(apic);
- return;
- }
-
- hrtimer_start(&apic->lapic_timer.timer,
- apic->lapic_timer.target_expiration,
- HRTIMER_MODE_ABS_PINNED);
-}
-
static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
{
ktime_t now, remaining;
@@ -1546,6 +1529,26 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic)
apic->lapic_timer.period);
}
+static void start_sw_period(struct kvm_lapic *apic)
+{
+ if (!apic->lapic_timer.period)
+ return;
+
+ if (ktime_after(ktime_get(),
+ apic->lapic_timer.target_expiration)) {
+ apic_timer_expired(apic);
+
+ if (apic_lvtt_oneshot(apic))
+ return;
+
+ advance_periodic_target_expiration(apic);
+ }
+
+ hrtimer_start(&apic->lapic_timer.timer,
+ apic->lapic_timer.target_expiration,
+ HRTIMER_MODE_ABS_PINNED);
+}
+
bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
{
if (!lapic_in_kernel(vcpu))
From: Theodore Ts'o <tytso(a)mit.edu>
CVE-2018-1092
If the root directory has an i_links_count of zero, then when the file
system is mounted, then when ext4_fill_super() notices the problem and
tries to call iput() the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk, before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.
This issue has been assigned CVE-2018-1092.
https://bugzilla.kernel.org/show_bug.cgi?id=199179https://bugzilla.redhat.com/show_bug.cgi?id=1560777
Reported-by: Wen Xu <wen.xu(a)gatech.edu>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)vger.kernel.org
(cherry-picked from 8e4b5eae5decd9dfe5a4ee369c22028f90ab4c44)
Signed-off-by: Khalid Elmously <khalid.elmously(a)canonical.com>
---
fs/ext4/inode.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3d4d1dccc8a1..398faeff1938 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4745,6 +4745,12 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
goto bad_inode;
raw_inode = ext4_raw_inode(&iloc);
+ if ((ino == EXT4_ROOT_INO) && (raw_inode->i_links_count == 0)) {
+ EXT4_ERROR_INODE(inode, "root inode unallocated");
+ ret = -EFSCORRUPTED;
+ goto bad_inode;
+ }
+
if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
--
2.17.0
commit c3856aeb29402e94ad9b3879030165cc6a4fdc56 upstream.
This fixes several bugs in the radix page fault handler relating to
the way large pages in the memory backing the guest were handled.
First, the check for large pages only checked for explicit huge pages
and missed transparent huge pages. Then the check that the addresses
(host virtual vs. guest physical) had appropriate alignment was
wrong, meaning that the code never put a large page in the partition
scoped radix tree; it was always demoted to a small page.
Fixing this exposed bugs in kvmppc_create_pte(). We were never
invalidating a 2MB PTE, which meant that if a page was initially
faulted in without write permission and the guest then attempted
to store to it, we would never update the PTE to have write permission.
If we find a valid 2MB PTE in the PMD, we need to clear it and
do a TLB invalidation before installing either the new 2MB PTE or
a pointer to a page table page.
This also corrects an assumption that get_user_pages_fast would set
the _PAGE_DIRTY bit if we are writing, which is not true. Instead we
mark the page dirty explicitly with set_page_dirty_lock(). This
also means we don't need the dirty bit set on the host PTE when
providing write access on a read fault.
[paulus(a)ozlabs.org - use mark_pages_dirty instead of
kvmppc_update_dirty_map]
Signed-off-by: Paul Mackerras <paulus(a)ozlabs.org>
---
arch/powerpc/kvm/book3s_64_mmu_radix.c | 72 ++++++++++++++++++++++------------
1 file changed, 46 insertions(+), 26 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index c5d7435455f1..27a41695fcfd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -19,6 +19,9 @@
#include <asm/pgalloc.h>
#include <asm/pte-walk.h>
+static void mark_pages_dirty(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ unsigned long gfn, unsigned int order);
+
/*
* Supported radix tree geometry.
* Like p9, we support either 5 or 9 bits at the first (lowest) level,
@@ -195,6 +198,12 @@ static void kvmppc_pte_free(pte_t *ptep)
kmem_cache_free(kvm_pte_cache, ptep);
}
+/* Like pmd_huge() and pmd_large(), but works regardless of config options */
+static inline int pmd_is_leaf(pmd_t pmd)
+{
+ return !!(pmd_val(pmd) & _PAGE_PTE);
+}
+
static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
unsigned int level, unsigned long mmu_seq)
{
@@ -219,7 +228,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
else
new_pmd = pmd_alloc_one(kvm->mm, gpa);
- if (level == 0 && !(pmd && pmd_present(*pmd)))
+ if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
/* Check if we might have been invalidated; let the guest retry if so */
@@ -244,12 +253,30 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
- if (pmd_large(*pmd)) {
- /* Someone else has instantiated a large page here; retry */
- ret = -EAGAIN;
- goto out_unlock;
- }
- if (level == 1 && !pmd_none(*pmd)) {
+ if (pmd_is_leaf(*pmd)) {
+ unsigned long lgpa = gpa & PMD_MASK;
+
+ /*
+ * If we raced with another CPU which has just put
+ * a 2MB pte in after we saw a pte page, try again.
+ */
+ if (level == 0 && !new_ptep) {
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+ /* Valid 2MB page here already, remove it */
+ old = kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd),
+ ~0UL, 0, lgpa, PMD_SHIFT);
+ kvmppc_radix_tlbie_page(kvm, lgpa, PMD_SHIFT);
+ if (old & _PAGE_DIRTY) {
+ unsigned long gfn = lgpa >> PAGE_SHIFT;
+ struct kvm_memory_slot *memslot;
+ memslot = gfn_to_memslot(kvm, gfn);
+ if (memslot)
+ mark_pages_dirty(kvm, memslot, gfn,
+ PMD_SHIFT - PAGE_SHIFT);
+ }
+ } else if (level == 1 && !pmd_none(*pmd)) {
/*
* There's a page table page here, but we wanted
* to install a large page. Tell the caller and let
@@ -412,28 +439,24 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
} else {
page = pages[0];
pfn = page_to_pfn(page);
- if (PageHuge(page)) {
- page = compound_head(page);
- pte_size <<= compound_order(page);
+ if (PageCompound(page)) {
+ pte_size <<= compound_order(compound_head(page));
/* See if we can insert a 2MB large-page PTE here */
if (pte_size >= PMD_SIZE &&
- (gpa & PMD_MASK & PAGE_MASK) ==
- (hva & PMD_MASK & PAGE_MASK)) {
+ (gpa & (PMD_SIZE - PAGE_SIZE)) ==
+ (hva & (PMD_SIZE - PAGE_SIZE))) {
level = 1;
pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
}
}
/* See if we can provide write access */
if (writing) {
- /*
- * We assume gup_fast has set dirty on the host PTE.
- */
pgflags |= _PAGE_WRITE;
} else {
local_irq_save(flags);
ptep = find_current_mm_pte(current->mm->pgd,
hva, NULL, NULL);
- if (ptep && pte_write(*ptep) && pte_dirty(*ptep))
+ if (ptep && pte_write(*ptep))
pgflags |= _PAGE_WRITE;
local_irq_restore(flags);
}
@@ -459,18 +482,15 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
pte = pfn_pte(pfn, __pgprot(pgflags));
ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
}
- if (ret == 0 || ret == -EAGAIN)
- ret = RESUME_GUEST;
if (page) {
- /*
- * We drop pages[0] here, not page because page might
- * have been set to the head page of a compound, but
- * we have to drop the reference on the correct tail
- * page to match the get inside gup()
- */
- put_page(pages[0]);
+ if (!ret && (pgflags & _PAGE_WRITE))
+ set_page_dirty_lock(page);
+ put_page(page);
}
+
+ if (ret == 0 || ret == -EAGAIN)
+ ret = RESUME_GUEST;
return ret;
}
@@ -676,7 +696,7 @@ void kvmppc_free_radix(struct kvm *kvm)
continue;
pmd = pmd_offset(pud, 0);
for (im = 0; im < PTRS_PER_PMD; ++im, ++pmd) {
- if (pmd_huge(*pmd)) {
+ if (pmd_is_leaf(*pmd)) {
pmd_clear(pmd);
continue;
}
--
2.11.0
From: Finley Xiao <finley.xiao(a)rock-chips.com>
Solve the pd could only ever turn off but never turn them on again,
If the pd registers have the writemask bits.
Fix up the code error for commit:
commit 79bb17ce8edb3141339b5882e372d0ec7346217c
Author: Elaine Zhang <zhangqing(a)rock-chips.com>
Date: Fri Dec 23 11:47:52 2016 +0800
soc: rockchip: power-domain: Support domain control in hiword-registers
New Rockchips SoCs may have their power-domain control in registers
using a writemask-based access scheme (upper 16bit being the write
mask). So add a DOMAIN_M type and handle this case accordingly.
Signed-off-by: Elaine Zhang <zhangqing(a)rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Finley Xiao <finley.xiao(a)rock-chips.com>
Signed-off-by: Elaine Zhang <zhangqing(a)rock-chips.com>
---
drivers/soc/rockchip/pm_domains.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/rockchip/pm_domains.c b/drivers/soc/rockchip/pm_domains.c
index ebd7c41898c0..01d4ba26a054 100644
--- a/drivers/soc/rockchip/pm_domains.c
+++ b/drivers/soc/rockchip/pm_domains.c
@@ -264,7 +264,7 @@ static void rockchip_do_pmu_set_power_domain(struct rockchip_pm_domain *pd,
return;
else if (pd->info->pwr_w_mask)
regmap_write(pmu->regmap, pmu->info->pwr_offset,
- on ? pd->info->pwr_mask :
+ on ? pd->info->pwr_w_mask :
(pd->info->pwr_mask | pd->info->pwr_w_mask));
else
regmap_update_bits(pmu->regmap, pmu->info->pwr_offset,
--
1.9.1
commit a8b48a4dccea77e29462e59f1dbf0d5aa1ff167c upstream.
This fixes a bug where the trap number that is returned by
__kvmppc_vcore_entry gets corrupted. The effect of the corruption
is that IPIs get ignored on POWER9 systems when the IPI is sent via
a doorbell interrupt to a CPU which is executing in a KVM guest.
The effect of the IPI being ignored is often that another CPU locks
up inside smp_call_function_many() (and if that CPU is holding a
spinlock, other CPUs then lock up inside raw_spin_lock()).
The trap number is currently held in register r12 for most of the
assembly-language part of the guest exit path. In that path, we
call kvmppc_subcore_exit_guest(), which is a C function, without
restoring r12 afterwards. Depending on the kernel config and the
compiler, it may modify r12 or it may not, so some config/compiler
combinations see the bug and others don't.
To fix this, we arrange for the trap number to be stored on the
stack from the point where kvmhv_commence_exit is called until the
end of the function, then the trap number is loaded and returned in
r12 as before.
Cc: stable(a)vger.kernel.org # v4.8+
Fixes: fd7bacbca47a ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")
Signed-off-by: Paul Mackerras <paulus(a)ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 2b3194b9608f..663a398449b7 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -308,7 +308,6 @@ kvm_novcpu_exit:
stw r12, STACK_SLOT_TRAP(r1)
bl kvmhv_commence_exit
nop
- lwz r12, STACK_SLOT_TRAP(r1)
b kvmhv_switch_to_host
/*
@@ -1136,6 +1135,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
secondary_too_late:
li r12, 0
+ stw r12, STACK_SLOT_TRAP(r1)
cmpdi r4, 0
beq 11f
stw r12, VCPU_TRAP(r4)
@@ -1445,12 +1445,12 @@ mc_cont:
1:
#endif /* CONFIG_KVM_XICS */
+ stw r12, STACK_SLOT_TRAP(r1)
mr r3, r12
/* Increment exit count, poke other threads to exit */
bl kvmhv_commence_exit
nop
ld r9, HSTATE_KVM_VCPU(r13)
- lwz r12, VCPU_TRAP(r9)
/* Stop others sending VCPU interrupts to this physical CPU */
li r0, -1
@@ -1816,6 +1816,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1)
* POWER7/POWER8 guest -> host partition switch code.
* We don't have to lock against tlbies but we do
* have to coordinate the hardware threads.
+ * Here STACK_SLOT_TRAP(r1) contains the trap number.
*/
kvmhv_switch_to_host:
/* Secondary threads wait for primary to do partition switch */
@@ -1868,11 +1869,11 @@ BEGIN_FTR_SECTION
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/* If HMI, call kvmppc_realmode_hmi_handler() */
+ lwz r12, STACK_SLOT_TRAP(r1)
cmpwi r12, BOOK3S_INTERRUPT_HMI
bne 27f
bl kvmppc_realmode_hmi_handler
nop
- li r12, BOOK3S_INTERRUPT_HMI
/*
* At this point kvmppc_realmode_hmi_handler would have resync-ed
* the TB. Hence it is not required to subtract guest timebase
@@ -1950,6 +1951,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
li r0, KVM_GUEST_MODE_NONE
stb r0, HSTATE_IN_GUEST(r13)
+ lwz r12, STACK_SLOT_TRAP(r1) /* return trap # in r12 */
ld r0, SFS+PPC_LR_STKOFF(r1)
addi r1, r1, SFS
mtlr r0
--
2.11.0
Currently in ext4_punch_hole we're going to skip the mtime update if
there are no actual blocks to release. However we've actually modified
the file by zeroing the partial block so the mtime should be updated.
Moreover the sync and datasync handling is skipped as well, which is
also wrong. Fix it.
Signed-off-by: Lukas Czerner <lczerner(a)redhat.com>
Reported-by: Joe Habermann <joe.habermann(a)quantum.com>
Cc: <stable(a)vger.kernel.org>
---
fs/ext4/inode.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5e..6b4c5c0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4298,28 +4298,28 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
EXT4_BLOCK_SIZE_BITS(sb);
stop_block = (offset + length) >> EXT4_BLOCK_SIZE_BITS(sb);
- /* If there are no blocks to remove, return now */
- if (first_block >= stop_block)
- goto out_stop;
+ /* If there are blocks to remove, do it */
+ if (stop_block > first_block) {
- down_write(&EXT4_I(inode)->i_data_sem);
- ext4_discard_preallocations(inode);
+ down_write(&EXT4_I(inode)->i_data_sem);
+ ext4_discard_preallocations(inode);
- ret = ext4_es_remove_extent(inode, first_block,
- stop_block - first_block);
- if (ret) {
- up_write(&EXT4_I(inode)->i_data_sem);
- goto out_stop;
- }
+ ret = ext4_es_remove_extent(inode, first_block,
+ stop_block - first_block);
+ if (ret) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ goto out_stop;
+ }
- if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
- ret = ext4_ext_remove_space(inode, first_block,
- stop_block - 1);
- else
- ret = ext4_ind_remove_space(handle, inode, first_block,
- stop_block);
+ if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+ ret = ext4_ext_remove_space(inode, first_block,
+ stop_block - 1);
+ else
+ ret = ext4_ind_remove_space(handle, inode, first_block,
+ stop_block);
- up_write(&EXT4_I(inode)->i_data_sem);
+ up_write(&EXT4_I(inode)->i_data_sem);
+ }
if (IS_SYNC(inode))
ext4_handle_sync(handle);
--
2.7.5
When ext4_ind_map_blocks() computes a length of a hole, it doesn't count
with the fact that mapped offset may be somewhere in the middle of the
completely empty subtree. In such case it will return too large length
of the hole which then results in lseek(SEEK_DATA) to end up returning
an incorrect offset beyond the end of the hole.
Fix the problem by correctly taking offset within a subtree into account
when computing a length of a hole.
Fixes: facab4d9711e7aa3532cb82643803e8f1b9518e8
CC: stable(a)vger.kernel.org
Reported-by: Jeff Mahoney <jeffm(a)suse.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/indirect.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
I'll submit corresponding fstest shortly.
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index c32802c956d5..bf7fa1507e81 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -561,10 +561,16 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
unsigned epb = inode->i_sb->s_blocksize / sizeof(u32);
int i;
- /* Count number blocks in a subtree under 'partial' */
- count = 1;
- for (i = 0; partial + i != chain + depth - 1; i++)
- count *= epb;
+ /*
+ * Count number blocks in a subtree under 'partial'. At each
+ * level we count number of complete empty subtrees beyond
+ * current offset and then descend into the subtree only
+ * partially beyond current offset.
+ */
+ count = 0;
+ for (i = partial - chain + 1; i < depth; i++)
+ count = count * epb + (epb - offsets[i] - 1);
+ count++;
/* Fill in size of a hole we found */
map->m_pblk = 0;
map->m_len = min_t(unsigned int, map->m_len, count);
--
2.13.6