Some RISC-V systems (such as QEMU) do not make the VMID part of the
TLB entry tag, so they have to flush all TLB entries upon any change
in hgatp.VMID.
Currently, we zero out the hgatp CSR in kvm_arch_vcpu_put() and
re-program it in kvm_arch_vcpu_load(). On such systems, this flushes
all TLB entries whenever the VCPU exits to user space, which reduces
performance.
This patch fixes the performance issue by not clearing the hgatp CSR
in kvm_arch_vcpu_put().
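To make the cost concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy hart that does a full TLB flush whenever the VMID field of hgatp changes, and it places that field at bits 57:44 as in the RV64 hgatp layout. All toy_* names and the VMID value 5 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed RV64 layout: hgatp.VMID occupies bits 57:44. */
#define TOY_VMID_SHIFT 44
#define TOY_VMID_MASK  (UINT64_C(0x3fff) << TOY_VMID_SHIFT)

/* Toy hart: remembers the last hgatp value and counts full TLB flushes. */
struct toy_hart {
	uint64_t hgatp;
	unsigned long full_flushes;
};

/* Model of a system without VMID-tagged TLB entries:
 * any change of the VMID field costs one full flush. */
static void toy_write_hgatp(struct toy_hart *h, uint64_t val)
{
	if ((h->hgatp ^ val) & TOY_VMID_MASK)
		h->full_flushes++;
	h->hgatp = val;
}

/* Run `cycles` load/put pairs for one guest and return the flush count. */
static unsigned long toy_count_flushes(unsigned int cycles, bool clear_on_put)
{
	struct toy_hart h = { 0, 0 };
	uint64_t guest = UINT64_C(5) << TOY_VMID_SHIFT; /* arbitrary VMID */
	unsigned int i;

	for (i = 0; i < cycles; i++) {
		toy_write_hgatp(&h, guest);     /* vcpu_load: program hgatp */
		if (clear_on_put)
			toy_write_hgatp(&h, 0); /* old vcpu_put: zero hgatp */
	}
	return h.full_flushes;
}
```

In this model, ten load/put cycles with clearing cost two flushes per cycle, while keeping hgatp across put costs a single flush on the first load.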
Fixes: 34bde9d8b9e6 ("RISC-V: KVM: Implement VCPU world-switch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Anup Patel <apatel(a)ventanamicro.com>
---
arch/riscv/kvm/vcpu.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 624166004e36..6785aef4cbd4 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -653,8 +653,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
vcpu->arch.isa);
kvm_riscv_vcpu_host_fp_restore(&vcpu->arch.host_context);
- csr_write(CSR_HGATP, 0);
-
csr->vsstatus = csr_read(CSR_VSSTATUS);
csr->vsie = csr_read(CSR_VSIE);
csr->vstvec = csr_read(CSR_VSTVEC);
--
2.25.1
Syzbot found an issue [1] in ext4_fallocate().
The C reproducer [2] calls fallocate() with size 0xffeffeff000ul
and offset 0x1000000ul, which, when added together, exceed the
bitmap_maxbytes for the inode. This triggers a BUG in
ext4_ind_remove_space(). According to the comments in this function,
the 'end' parameter needs to be one block after the last block to be
removed. When the BUG is triggered, it points to the last block.
Modify ext4_punch_hole() to add a constraint that caps the length so
that the range ends at least one block before the limit.
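The capping rule can be sketched in isolation as follows. This is a standalone illustration, not the ext4 code: toy_clamp_punch_length() is a hypothetical helper, the limits are passed in as plain integers, and the caller is assumed to pass an offset that does not exceed the computed maximum.

```c
#include <stdint.h>

/* Cap `length` so that offset + length stays at least one block below
 * `bitmap_maxbytes`. Assumes 0 <= offset <= bitmap_maxbytes - blocksize. */
static int64_t toy_clamp_punch_length(int64_t offset, int64_t length,
				      int64_t bitmap_maxbytes,
				      int64_t blocksize)
{
	int64_t max_length = bitmap_maxbytes - blocksize;

	if (offset + length > max_length)
		length = max_length - offset;
	return length;
}
```

With a hypothetical 1 MiB limit and 4 KiB blocks, a request that starts at 4096 and runs past the limit is shortened to end at 1 MiB minus one block. A request that is already in range is returned unchanged.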
LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d7213…
LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000
Cc: Theodore Ts'o <tytso(a)mit.edu>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Ritesh Harjani <riteshh(a)linux.ibm.com>
Cc: <linux-ext4(a)vger.kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality")
Reported-by: syzbot+7a806094edd5d07ba029(a)syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
--
v3: Modify the length instead of returning an error.
v2: Change sbi->s_blocksize to inode->i_sb->s_blocksize in maxlength
computation.
---
fs/ext4/inode.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1ce13f69fbec..60bf31765d07 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3958,7 +3958,8 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
- loff_t first_block_offset, last_block_offset;
+ loff_t first_block_offset, last_block_offset, max_length;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
handle_t *handle;
unsigned int credits;
int ret = 0, ret2 = 0;
@@ -4001,6 +4002,14 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
offset;
}
+ /*
+ * For punch hole the length + offset needs to be within one block
+ * before last range. Adjust the length if it goes beyond that limit.
+ */
+ max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize;
+ if (offset + length > max_length)
+ length = max_length - offset;
+
if (offset & (sb->s_blocksize - 1) ||
(offset + length) & (sb->s_blocksize - 1)) {
/*
--
2.35.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3cbf0e392f173ba0ce425968c8374a6aa3e90f2e Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Fri, 5 Nov 2021 17:30:22 +0800
Subject: [PATCH] ubi: Fix race condition between ctrl_cdev_ioctl and
ubi_cdev_ioctl
Hulk Robot reported a KASAN report about use-after-free:
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0x13d/0x160
Read of size 8 at addr ffff888035e37d98 by task ubiattach/1385
[...]
Call Trace:
klist_dec_and_del+0xa7/0x4a0
klist_put+0xc7/0x1a0
device_del+0x4d4/0xed0
cdev_device_del+0x1a/0x80
ubi_attach_mtd_dev+0x2951/0x34b0 [ubi]
ctrl_cdev_ioctl+0x286/0x2f0 [ubi]
Allocated by task 1414:
device_add+0x60a/0x18b0
cdev_device_add+0x103/0x170
ubi_create_volume+0x1118/0x1a10 [ubi]
ubi_cdev_ioctl+0xb7f/0x1ba0 [ubi]
Freed by task 1385:
cdev_device_del+0x1a/0x80
ubi_remove_volume+0x438/0x6c0 [ubi]
ubi_cdev_ioctl+0xbf4/0x1ba0 [ubi]
[...]
==================================================================
The lock held by ctrl_cdev_ioctl is ubi_devices_mutex, but the lock held
by ubi_cdev_ioctl is ubi->device_mutex. Therefore, the two ioctls can
run concurrently.
ctrl_cdev_ioctl contains two operations: ubi_attach and ubi_detach.
ubi_detach is bug-free because it uses reference counting to prevent
concurrency. However, uif_init and uif_close in ubi_attach may race with
ubi_cdev_ioctl.
uif_init will race with ubi_cdev_ioctl as in the following stack.
           cpu1                   cpu2                  cpu3
_______________________|________________________|______________________
ctrl_cdev_ioctl
 ubi_attach_mtd_dev
  uif_init
                         ubi_cdev_ioctl
                          ubi_create_volume
                           cdev_device_add
   ubi_add_volume
   // sysfs exist
   kill_volumes
                                                  ubi_cdev_ioctl
                                                   ubi_remove_volume
                                                    cdev_device_del
                                                     // first free
    ubi_free_volume
     cdev_del
     // double free
   cdev_device_del
And uif_close will race with ubi_cdev_ioctl as in the following stack.
           cpu1                   cpu2                  cpu3
_______________________|________________________|______________________
ctrl_cdev_ioctl
 ubi_attach_mtd_dev
  uif_init
                         ubi_cdev_ioctl
                          ubi_create_volume
                           cdev_device_add
  ubi_debugfs_init_dev
  //error goto out_uif;
  uif_close
   kill_volumes
                                                  ubi_cdev_ioctl
                                                   ubi_remove_volume
                                                    cdev_device_del
                                                     // first free
    ubi_free_volume
    // double free
The cause of this problem is that commit 714fb87e8bc0 made the device
"available" before it becomes accessible via sysfs. Therefore, we
roll back that modification. We fix the race condition between
ubi device creation and udev by removing ubi_get_device in
vol_attribute_show and dev_attribute_show. This avoids accessing
uninitialized ubi_devices[ubi_num].
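The ordering this restores, publishing the device pointer only after initialization has succeeded, can be illustrated with a small single-threaded sketch. It is a toy model and not the UBI code: toy_registry, toy_attach() and toy_lookup() are hypothetical names, and the fail_init flag stands in for any error during setup.

```c
#include <stddef.h>

#define TOY_MAX_DEVICES 4

struct toy_device {
	int num;
	int initialized;
};

/* Stand-in for a global device table such as ubi_devices[]. */
static struct toy_device *toy_registry[TOY_MAX_DEVICES];

/* Return the published device, or NULL if none is published. */
static struct toy_device *toy_lookup(int num)
{
	if (num < 0 || num >= TOY_MAX_DEVICES)
		return NULL;
	return toy_registry[num];
}

/* Initialize first, publish last. On failure nothing was published,
 * so the error path has no table entry to clear. */
static int toy_attach(struct toy_device *dev, int num, int fail_init)
{
	if (num < 0 || num >= TOY_MAX_DEVICES)
		return -1;

	dev->num = num;
	dev->initialized = 0;

	if (fail_init)
		return -1;          /* error path: table untouched */

	dev->initialized = 1;
	toy_registry[num] = dev;    /* publish only when fully set up */
	return num;
}
```

Anyone who looks the device up through the table either gets NULL or a fully initialized object, which is the property the reverted commit gave up.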
ubi_get_device was used to prevent devices from being deleted during
sysfs execution. However, kernfs now ensures that devices will not
be deleted before all references are released.
The key process is shown in the following stack.
device_del
  device_remove_attrs
    device_remove_groups
      sysfs_remove_groups
        sysfs_remove_group
          remove_files
            kernfs_remove_by_name
              kernfs_remove_by_name_ns
                __kernfs_remove
                  kernfs_drain
Fixes: 714fb87e8bc0 ("ubi: Fix race condition between ubi device creation and udev")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index a7e3eb9befb6..a32050fecabf 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -351,9 +351,6 @@ static ssize_t dev_attribute_show(struct device *dev,
* we still can use 'ubi->ubi_num'.
*/
ubi = container_of(dev, struct ubi_device, dev);
- ubi = ubi_get_device(ubi->ubi_num);
- if (!ubi)
- return -ENODEV;
if (attr == &dev_eraseblock_size)
ret = sprintf(buf, "%d\n", ubi->leb_size);
@@ -382,7 +379,6 @@ static ssize_t dev_attribute_show(struct device *dev,
else
ret = -EINVAL;
- ubi_put_device(ubi);
return ret;
}
@@ -979,9 +975,6 @@ int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num,
goto out_detach;
}
- /* Make device "available" before it becomes accessible via sysfs */
- ubi_devices[ubi_num] = ubi;
-
err = uif_init(ubi);
if (err)
goto out_detach;
@@ -1026,6 +1019,7 @@ int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num,
wake_up_process(ubi->bgt_thread);
spin_unlock(&ubi->wl_lock);
+ ubi_devices[ubi_num] = ubi;
ubi_notify_all(ubi, UBI_VOLUME_ADDED, NULL);
return ubi_num;
@@ -1034,7 +1028,6 @@ int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num,
out_uif:
uif_close(ubi);
out_detach:
- ubi_devices[ubi_num] = NULL;
ubi_wl_close(ubi);
ubi_free_all_volumes(ubi);
vfree(ubi->vtbl);
diff --git a/drivers/mtd/ubi/vmt.c b/drivers/mtd/ubi/vmt.c
index 139ee132bfbc..1bc7b3a05604 100644
--- a/drivers/mtd/ubi/vmt.c
+++ b/drivers/mtd/ubi/vmt.c
@@ -56,16 +56,11 @@ static ssize_t vol_attribute_show(struct device *dev,
{
int ret;
struct ubi_volume *vol = container_of(dev, struct ubi_volume, dev);
- struct ubi_device *ubi;
-
- ubi = ubi_get_device(vol->ubi->ubi_num);
- if (!ubi)
- return -ENODEV;
+ struct ubi_device *ubi = vol->ubi;
spin_lock(&ubi->volumes_lock);
if (!ubi->volumes[vol->vol_id]) {
spin_unlock(&ubi->volumes_lock);
- ubi_put_device(ubi);
return -ENODEV;
}
/* Take a reference to prevent volume removal */
@@ -103,7 +98,6 @@ static ssize_t vol_attribute_show(struct device *dev,
vol->ref_count -= 1;
ubi_assert(vol->ref_count >= 0);
spin_unlock(&ubi->volumes_lock);
- ubi_put_device(ubi);
return ret;
}
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 85643396c71241aa8b5afc4e23e1099b170f6517 Mon Sep 17 00:00:00 2001
From: Abhishek Naik <abhishek.naik(a)intel.com>
Date: Sun, 30 Jan 2022 11:53:06 +0200
Subject: [PATCH] iwlwifi: nvm: Correct HE capability
The HE PHY capability - Tx 1024-QAM < 242-tone RU support
was not handled for MS RFs. Add the relevant code for it.
Signed-off-by: Abhishek Naik <abhishek.naik(a)intel.com>
Fixes: 1381eb5c8ed5 ("iwlwifi: correct HE capabilities")
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20220130115024.01e232ce98ca.I765d26e9eb6a…
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index 0693dfda43a3..0dfd69fcd5d7 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -784,6 +784,7 @@ iwl_nvm_fixup_sband_iftd(struct iwl_trans *trans,
switch (CSR_HW_RFID_TYPE(trans->hw_rf_id)) {
case IWL_CFG_RF_TYPE_GF:
case IWL_CFG_RF_TYPE_MR:
+ case IWL_CFG_RF_TYPE_MS:
iftype_data->he_cap.he_cap_elem.phy_cap_info[9] |=
IEEE80211_HE_PHY_CAP9_TX_1024_QAM_LESS_THAN_242_TONE_RU;
if (!is_ap)
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 85643396c71241aa8b5afc4e23e1099b170f6517 Mon Sep 17 00:00:00 2001
From: Abhishek Naik <abhishek.naik(a)intel.com>
Date: Sun, 30 Jan 2022 11:53:06 +0200
Subject: [PATCH] iwlwifi: nvm: Correct HE capability
The HE PHY capability - Tx 1024-QAM < 242-tone RU support
was not handled for MS RFs. Add the relevant code for it.
Signed-off-by: Abhishek Naik <abhishek.naik(a)intel.com>
Fixes: 1381eb5c8ed5 ("iwlwifi: correct HE capabilities")
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20220130115024.01e232ce98ca.I765d26e9eb6a…
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index 0693dfda43a3..0dfd69fcd5d7 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -784,6 +784,7 @@ iwl_nvm_fixup_sband_iftd(struct iwl_trans *trans,
switch (CSR_HW_RFID_TYPE(trans->hw_rf_id)) {
case IWL_CFG_RF_TYPE_GF:
case IWL_CFG_RF_TYPE_MR:
+ case IWL_CFG_RF_TYPE_MS:
iftype_data->he_cap.he_cap_elem.phy_cap_info[9] |=
IEEE80211_HE_PHY_CAP9_TX_1024_QAM_LESS_THAN_242_TONE_RU;
if (!is_ap)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 85643396c71241aa8b5afc4e23e1099b170f6517 Mon Sep 17 00:00:00 2001
From: Abhishek Naik <abhishek.naik(a)intel.com>
Date: Sun, 30 Jan 2022 11:53:06 +0200
Subject: [PATCH] iwlwifi: nvm: Correct HE capability
The HE PHY capability - Tx 1024-QAM < 242-tone RU support
was not handled for MS RFs. Add the relevant code for it.
Signed-off-by: Abhishek Naik <abhishek.naik(a)intel.com>
Fixes: 1381eb5c8ed5 ("iwlwifi: correct HE capabilities")
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20220130115024.01e232ce98ca.I765d26e9eb6a…
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index 0693dfda43a3..0dfd69fcd5d7 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -784,6 +784,7 @@ iwl_nvm_fixup_sband_iftd(struct iwl_trans *trans,
switch (CSR_HW_RFID_TYPE(trans->hw_rf_id)) {
case IWL_CFG_RF_TYPE_GF:
case IWL_CFG_RF_TYPE_MR:
+ case IWL_CFG_RF_TYPE_MS:
iftype_data->he_cap.he_cap_elem.phy_cap_info[9] |=
IEEE80211_HE_PHY_CAP9_TX_1024_QAM_LESS_THAN_242_TONE_RU;
if (!is_ap)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b88cba55883eaafbc9b7cbff0b2c7cdba71ed01 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Mon, 21 Feb 2022 19:21:13 -0800
Subject: [PATCH] net: preserve skb_end_offset() in skb_unclone_keeptruesize()
syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len)
in skb_try_coalesce() [1]
I was able to root cause the issue to kfence.
When kfence is in action, the following assertion is no longer true:
int size = xxxx;
void *ptr1 = kmalloc(size, gfp);
void *ptr2 = kmalloc(size, gfp);
if (ptr1 && ptr2)
ASSERT(ksize(ptr1) == ksize(ptr2));
We attempted to fix these issues in the blamed commits, but forgot
that TCP was possibly shifting data after skb_unclone_keeptruesize()
has been used, notably from tcp_retrans_try_collapse().
So we not only need to keep the same skb->truesize value,
we also need to make sure TCP won't fill the new tailroom
that pskb_expand_head() was able to get from an
addr = kmalloc(...) followed by ksize(addr).
Split skb_unclone_keeptruesize() into two parts:
1) Inline skb_unclone_keeptruesize() for the common case,
when skb is not cloned.
2) Out of line __skb_unclone_keeptruesize() for the 'slow path'.
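The slow path can be modeled in userspace as below. This is only a sketch under stated assumptions: toy_buf stands in for the skb head, a fixed-size trailer stands in for the used part of skb_shared_info, and toy_expand() imitates a reallocation that ends up with `slack` extra usable bytes. None of these names exist in the kernel.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_TRAILER_SIZE 16 /* stand-in for the used part of shared info */

struct toy_buf {
	unsigned char *head; /* start of the allocation */
	size_t end;          /* offset of the trailer from head */
};

/* Allocate `end` data bytes followed by the trailer. */
static int toy_init(struct toy_buf *b, size_t end)
{
	b->head = calloc(1, end + TOY_TRAILER_SIZE);
	if (!b->head)
		return -1;
	b->end = end;
	return 0;
}

/* Imitates a head reallocation where the allocator hands back `slack`
 * more usable bytes, so the trailer lands further out than before. */
static int toy_expand(struct toy_buf *b, size_t slack)
{
	unsigned char *n = calloc(1, b->end + slack + TOY_TRAILER_SIZE);

	if (!n)
		return -1;
	memcpy(n, b->head, b->end);
	memcpy(n + b->end + slack, b->head + b->end, TOY_TRAILER_SIZE);
	free(b->head);
	b->head = n;
	b->end += slack;
	return 0;
}

/* Reallocate, then move the trailer back so `end` is unchanged and the
 * caller never sees the extra tailroom. */
static int toy_unclone_keep_end(struct toy_buf *b, size_t slack)
{
	size_t saved_end = b->end;

	if (toy_expand(b, slack))
		return -1;
	if (b->end == saved_end)
		return 0;

	/* Source and destination may overlap, hence memmove(). */
	memmove(b->head + saved_end, b->head + b->end, TOY_TRAILER_SIZE);
	b->end = saved_end;
	return 0;
}
```

After toy_unclone_keep_end() the trailer sits at the same offset as before the reallocation, so code that computes tailroom from `end` gets the same answer it would have gotten on the original buffer.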
[1]
WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
Modules linked in:
CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742eb2b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa
RSP: 0018:ffffc900063af268 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000
RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003
RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000
R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8
R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155
FS: 00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline]
tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630
tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914
tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025
tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947
tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719
sk_backlog_rcv include/net/sock.h:1037 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2779
release_sock+0x54/0x1b0 net/core/sock.c:3311
sk_wait_data+0x177/0x450 net/core/sock.c:2821
tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457
tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572
inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_recvmsg net/socket.c:962 [inline]
____sys_recvmsg+0x2c4/0x600 net/socket.c:2632
___sys_recvmsg+0x127/0x200 net/socket.c:2674
__sys_recvmsg+0xe2/0x1a0 net/socket.c:2704
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: c4777efa751d ("net: add and use skb_unclone_keeptruesize() helper")
Fixes: 097b9146c0e2 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()")
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 115be7f73487..31be38078918 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1795,19 +1795,19 @@ static inline int skb_unclone(struct sk_buff *skb, gfp_t pri)
return 0;
}
-/* This variant of skb_unclone() makes sure skb->truesize is not changed */
+/* This variant of skb_unclone() makes sure skb->truesize
+ * and skb_end_offset() are not changed, whenever a new skb->head is needed.
+ *
+ * Indeed there is no guarantee that ksize(kmalloc(X)) == ksize(kmalloc(X))
+ * when various debugging features are in place.
+ */
+int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri);
static inline int skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri)
{
might_sleep_if(gfpflags_allow_blocking(pri));
- if (skb_cloned(skb)) {
- unsigned int save = skb->truesize;
- int res;
-
- res = pskb_expand_head(skb, 0, 0, pri);
- skb->truesize = save;
- return res;
- }
+ if (skb_cloned(skb))
+ return __skb_unclone_keeptruesize(skb, pri);
return 0;
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 27a2296241c9..725f2b356769 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1787,6 +1787,38 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb, unsigned int headroom)
}
EXPORT_SYMBOL(skb_realloc_headroom);
+int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri)
+{
+ unsigned int saved_end_offset, saved_truesize;
+ struct skb_shared_info *shinfo;
+ int res;
+
+ saved_end_offset = skb_end_offset(skb);
+ saved_truesize = skb->truesize;
+
+ res = pskb_expand_head(skb, 0, 0, pri);
+ if (res)
+ return res;
+
+ skb->truesize = saved_truesize;
+
+ if (likely(skb_end_offset(skb) == saved_end_offset))
+ return 0;
+
+ shinfo = skb_shinfo(skb);
+
+ /* We are about to change back skb->end,
+ * we need to move skb_shinfo() to its new location.
+ */
+ memmove(skb->head + saved_end_offset,
+ shinfo,
+ offsetof(struct skb_shared_info, frags[shinfo->nr_frags]));
+
+ skb_set_end_offset(skb, saved_end_offset);
+
+ return 0;
+}
+
/**
* skb_expand_head - reallocate header of &sk_buff
* @skb: buffer to reallocate