The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a7542b87607560d0b89e7ff81d870bd6ff8835cb Mon Sep 17 00:00:00 2001
From: Stefan Assmann <sassmann(a)kpanic.de>
Date: Wed, 21 Aug 2019 16:09:29 +0200
Subject: [PATCH] i40e: check __I40E_VF_DISABLE bit in
i40e_sync_filters_subtask
While testing VF spawn/destroy the following panic occurred.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? _cond_resched+0x15/0x30
i40e_sync_filters_subtask+0x56/0x70 [i40e]
i40e_service_task+0x382/0x11b0 [i40e]
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x41/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_bind+0x30/0x30
ret_from_fork+0x35/0x40
Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().
Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.
CC: stable(a)vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann(a)kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e9f2f276bf27..3e2e465f43f9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2592,6 +2592,10 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
return;
if (!test_and_clear_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state))
return;
+ if (test_and_set_bit(__I40E_VF_DISABLE, pf->state)) {
+ set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
+ return;
+ }
for (v = 0; v < pf->num_alloc_vsi; v++) {
if (pf->vsi[v] &&
@@ -2606,6 +2610,7 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
}
}
}
+ clear_bit(__I40E_VF_DISABLE, pf->state);
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a7542b87607560d0b89e7ff81d870bd6ff8835cb Mon Sep 17 00:00:00 2001
From: Stefan Assmann <sassmann(a)kpanic.de>
Date: Wed, 21 Aug 2019 16:09:29 +0200
Subject: [PATCH] i40e: check __I40E_VF_DISABLE bit in
i40e_sync_filters_subtask
While testing VF spawn/destroy the following panic occurred.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? _cond_resched+0x15/0x30
i40e_sync_filters_subtask+0x56/0x70 [i40e]
i40e_service_task+0x382/0x11b0 [i40e]
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x41/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_bind+0x30/0x30
ret_from_fork+0x35/0x40
Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().
Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.
CC: stable(a)vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann(a)kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e9f2f276bf27..3e2e465f43f9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2592,6 +2592,10 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
return;
if (!test_and_clear_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state))
return;
+ if (test_and_set_bit(__I40E_VF_DISABLE, pf->state)) {
+ set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
+ return;
+ }
for (v = 0; v < pf->num_alloc_vsi; v++) {
if (pf->vsi[v] &&
@@ -2606,6 +2610,7 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
}
}
}
+ clear_bit(__I40E_VF_DISABLE, pf->state);
}
/**
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a7542b87607560d0b89e7ff81d870bd6ff8835cb Mon Sep 17 00:00:00 2001
From: Stefan Assmann <sassmann(a)kpanic.de>
Date: Wed, 21 Aug 2019 16:09:29 +0200
Subject: [PATCH] i40e: check __I40E_VF_DISABLE bit in
i40e_sync_filters_subtask
While testing VF spawn/destroy the following panic occurred.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? _cond_resched+0x15/0x30
i40e_sync_filters_subtask+0x56/0x70 [i40e]
i40e_service_task+0x382/0x11b0 [i40e]
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x41/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_bind+0x30/0x30
ret_from_fork+0x35/0x40
Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().
Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.
CC: stable(a)vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann(a)kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e9f2f276bf27..3e2e465f43f9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2592,6 +2592,10 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
return;
if (!test_and_clear_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state))
return;
+ if (test_and_set_bit(__I40E_VF_DISABLE, pf->state)) {
+ set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
+ return;
+ }
for (v = 0; v < pf->num_alloc_vsi; v++) {
if (pf->vsi[v] &&
@@ -2606,6 +2610,7 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
}
}
}
+ clear_bit(__I40E_VF_DISABLE, pf->state);
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 13fc1d271a2e3ab8a02071e711add01fab9271f6 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Tue, 24 Sep 2019 10:49:54 +0100
Subject: [PATCH] Btrfs: fix race setting up and completing qgroup rescan
workers
There is a race between setting up a qgroup rescan worker and completing
a qgroup rescan worker that can lead to callers of the qgroup rescan wait
ioctl to either not wait for the rescan worker to complete or to hang
forever due to missing wake ups. The following diagram shows a sequence
of steps that illustrates the race.
CPU 1 CPU 2 CPU 3
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts the worker
btrfs_qgroup_rescan_worker()
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_flags &=
~BTRFS_QGROUP_STATUS_FLAG_RESCAN
mutex_unlock(&fs_info->qgroup_rescan_lock)
starts transaction, updates qgroup status
item, etc
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts another worker
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_rescan_running = false
mutex_unlock(&fs_info->qgroup_rescan_lock)
complete_all(&fs_info->qgroup_rescan_completion)
Before the rescan worker started by the task at CPU 3 completes, if
another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
fs_info->qgroup_flags, which is expected and correct behaviour.
However if other task calls btrfs_ioctl_quota_rescan_wait() before the
rescan worker started by the task at CPU 3 completes, it will return
immediately without waiting for the new rescan worker to complete,
because fs_info->qgroup_rescan_running is set to false by CPU 2.
This race is making test case btrfs/171 (from fstests) to fail often:
btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
--- tests/btrfs/171.out 2018-09-16 21:30:48.505104287 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.938486039 +0100
@@ -1,2 +1,3 @@
QA output created by 171
+ERROR: quota rescan failed: Operation now in progress
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff)
That is because the test calls the btrfs-progs commands "qgroup quota
rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
commands 'qgroup assign' and 'qgroup remove' often call the rescan start
ioctl after calling the qgroup assign ioctl,
btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
for a rescan worker to complete.
Another problem the race can cause is missing wake ups for waiters,
since the call to complete_all() happens outside a critical section and
after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
diagram above, if we have a waiter for the first rescan task (executed
by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
before it calls complete_all() against
fs_info->qgroup_rescan_completion, the task at CPU 3 calls
init_completion() against fs_info->qgroup_rescan_completion which
re-initilizes its wait queue to an empty queue, therefore causing the
rescan worker at CPU 2 to call complete_all() against an empty queue,
never waking up the task waiting for that rescan worker.
Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
fs_info->qgroup_rescan_running to false in the same critical section,
delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
call to complete_all() in that same critical section. This gives the
protection needed to avoid rescan wait ioctl callers not waiting for a
running rescan worker and the lost wake ups problem, since setting that
rescan flag and boolean as well as initializing the wait queue is done
already in a critical section delimited by that mutex (at
qgroup_rescan_init()).
Fixes: 57254b6ebce4ce ("Btrfs: add ioctl to wait for qgroup rescan completion")
Fixes: d2c609b834d62f ("btrfs: properly track when rescan worker is running")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8d3bd799ac7d..52701c1be109 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3166,9 +3166,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
btrfs_free_path(path);
mutex_lock(&fs_info->qgroup_rescan_lock);
- if (!btrfs_fs_closing(fs_info))
- fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
-
if (err > 0 &&
fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
@@ -3184,16 +3181,30 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
trans = btrfs_start_transaction(fs_info->quota_root, 1);
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
+ trans = NULL;
btrfs_err(fs_info,
"fail to start transaction for status update: %d",
err);
- goto done;
}
- ret = update_qgroup_status_item(trans);
- if (ret < 0) {
- err = ret;
- btrfs_err(fs_info, "fail to update qgroup status: %d", err);
+
+ mutex_lock(&fs_info->qgroup_rescan_lock);
+ if (!btrfs_fs_closing(fs_info))
+ fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+ if (trans) {
+ ret = update_qgroup_status_item(trans);
+ if (ret < 0) {
+ err = ret;
+ btrfs_err(fs_info, "fail to update qgroup status: %d",
+ err);
+ }
}
+ fs_info->qgroup_rescan_running = false;
+ complete_all(&fs_info->qgroup_rescan_completion);
+ mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+ if (!trans)
+ return;
+
btrfs_end_transaction(trans);
if (btrfs_fs_closing(fs_info)) {
@@ -3204,12 +3215,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
} else {
btrfs_err(fs_info, "qgroup scan failed with %d", err);
}
-
-done:
- mutex_lock(&fs_info->qgroup_rescan_lock);
- fs_info->qgroup_rescan_running = false;
- mutex_unlock(&fs_info->qgroup_rescan_lock);
- complete_all(&fs_info->qgroup_rescan_completion);
}
/*
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 13fc1d271a2e3ab8a02071e711add01fab9271f6 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Tue, 24 Sep 2019 10:49:54 +0100
Subject: [PATCH] Btrfs: fix race setting up and completing qgroup rescan
workers
There is a race between setting up a qgroup rescan worker and completing
a qgroup rescan worker that can lead to callers of the qgroup rescan wait
ioctl to either not wait for the rescan worker to complete or to hang
forever due to missing wake ups. The following diagram shows a sequence
of steps that illustrates the race.
CPU 1 CPU 2 CPU 3
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts the worker
btrfs_qgroup_rescan_worker()
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_flags &=
~BTRFS_QGROUP_STATUS_FLAG_RESCAN
mutex_unlock(&fs_info->qgroup_rescan_lock)
starts transaction, updates qgroup status
item, etc
btrfs_ioctl_quota_rescan()
btrfs_qgroup_rescan()
qgroup_rescan_init()
mutex_lock(&fs_info->qgroup_rescan_lock)
spin_lock(&fs_info->qgroup_lock)
fs_info->qgroup_flags |=
BTRFS_QGROUP_STATUS_FLAG_RESCAN
init_completion(
&fs_info->qgroup_rescan_completion)
fs_info->qgroup_rescan_running = true
mutex_unlock(&fs_info->qgroup_rescan_lock)
spin_unlock(&fs_info->qgroup_lock)
btrfs_init_work()
--> starts another worker
mutex_lock(&fs_info->qgroup_rescan_lock)
fs_info->qgroup_rescan_running = false
mutex_unlock(&fs_info->qgroup_rescan_lock)
complete_all(&fs_info->qgroup_rescan_completion)
Before the rescan worker started by the task at CPU 3 completes, if
another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
fs_info->qgroup_flags, which is expected and correct behaviour.
However if other task calls btrfs_ioctl_quota_rescan_wait() before the
rescan worker started by the task at CPU 3 completes, it will return
immediately without waiting for the new rescan worker to complete,
because fs_info->qgroup_rescan_running is set to false by CPU 2.
This race is making test case btrfs/171 (from fstests) to fail often:
btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
--- tests/btrfs/171.out 2018-09-16 21:30:48.505104287 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.938486039 +0100
@@ -1,2 +1,3 @@
QA output created by 171
+ERROR: quota rescan failed: Operation now in progress
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff)
That is because the test calls the btrfs-progs commands "qgroup quota
rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
commands 'qgroup assign' and 'qgroup remove' often call the rescan start
ioctl after calling the qgroup assign ioctl,
btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
for a rescan worker to complete.
Another problem the race can cause is missing wake ups for waiters,
since the call to complete_all() happens outside a critical section and
after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
diagram above, if we have a waiter for the first rescan task (executed
by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
before it calls complete_all() against
fs_info->qgroup_rescan_completion, the task at CPU 3 calls
init_completion() against fs_info->qgroup_rescan_completion which
re-initilizes its wait queue to an empty queue, therefore causing the
rescan worker at CPU 2 to call complete_all() against an empty queue,
never waking up the task waiting for that rescan worker.
Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
fs_info->qgroup_rescan_running to false in the same critical section,
delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
call to complete_all() in that same critical section. This gives the
protection needed to avoid rescan wait ioctl callers not waiting for a
running rescan worker and the lost wake ups problem, since setting that
rescan flag and boolean as well as initializing the wait queue is done
already in a critical section delimited by that mutex (at
qgroup_rescan_init()).
Fixes: 57254b6ebce4ce ("Btrfs: add ioctl to wait for qgroup rescan completion")
Fixes: d2c609b834d62f ("btrfs: properly track when rescan worker is running")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8d3bd799ac7d..52701c1be109 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3166,9 +3166,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
btrfs_free_path(path);
mutex_lock(&fs_info->qgroup_rescan_lock);
- if (!btrfs_fs_closing(fs_info))
- fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
-
if (err > 0 &&
fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
@@ -3184,16 +3181,30 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
trans = btrfs_start_transaction(fs_info->quota_root, 1);
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
+ trans = NULL;
btrfs_err(fs_info,
"fail to start transaction for status update: %d",
err);
- goto done;
}
- ret = update_qgroup_status_item(trans);
- if (ret < 0) {
- err = ret;
- btrfs_err(fs_info, "fail to update qgroup status: %d", err);
+
+ mutex_lock(&fs_info->qgroup_rescan_lock);
+ if (!btrfs_fs_closing(fs_info))
+ fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+ if (trans) {
+ ret = update_qgroup_status_item(trans);
+ if (ret < 0) {
+ err = ret;
+ btrfs_err(fs_info, "fail to update qgroup status: %d",
+ err);
+ }
}
+ fs_info->qgroup_rescan_running = false;
+ complete_all(&fs_info->qgroup_rescan_completion);
+ mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+ if (!trans)
+ return;
+
btrfs_end_transaction(trans);
if (btrfs_fs_closing(fs_info)) {
@@ -3204,12 +3215,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
} else {
btrfs_err(fs_info, "qgroup scan failed with %d", err);
}
-
-done:
- mutex_lock(&fs_info->qgroup_rescan_lock);
- fs_info->qgroup_rescan_running = false;
- mutex_unlock(&fs_info->qgroup_rescan_lock);
- complete_all(&fs_info->qgroup_rescan_completion);
}
/*
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 16 Sep 2019 20:02:39 +0800
Subject: [PATCH] btrfs: qgroup: Fix reserved data space leak if we have
multiple reserve calls
[BUG]
The following script can cause btrfs qgroup data space leak:
mkfs.btrfs -f $dev
mount $dev -o nospace_cache $mnt
btrfs subv create $mnt/subv
btrfs quota en $mnt
btrfs quota rescan -w $mnt
btrfs qgroup limit 128m $mnt/subv
for (( i = 0; i < 3; i++)); do
# Create 3 64M holes for latter fallocate to fail
truncate -s 192m $mnt/subv/file
xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
sync
# it's supposed to fail, and each failure will leak at least 64M
# data space
xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
rm $mnt/subv/file
sync
done
# Shouldn't fail after we removed the file
xfs_io -f -c "falloc 0 64m" $mnt/subv/file
[CAUSE]
Btrfs qgroup data reserve code allow multiple reservations to happen on
a single extent_changeset:
E.g:
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M);
btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M);
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M);
Btrfs qgroup code has its internal tracking to make sure we don't
double-reserve in above example.
The only pattern utilizing this feature is in the main while loop of
btrfs_fallocate() function.
However btrfs_qgroup_reserve_data()'s error handling has a bug in that
on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
flag but doesn't free previously reserved bytes.
This bug has a two fold effect:
- Clearing EXTENT_QGROUP_RESERVED ranges
This is the correct behavior, but it prevents
btrfs_qgroup_check_reserved_leak() to catch the leakage as the
detector is purely EXTENT_QGROUP_RESERVED flag based.
- Leak the previously reserved data bytes.
The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space reserved in the previous ones.
[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.
Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 4ab85555a947..c4bb69941c77 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3442,6 +3442,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
while ((unode = ulist_next(&reserved->range_changed, &uiter)))
clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
+ /* Also free data bytes of already reserved one */
+ btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
+ orig_reserved, BTRFS_QGROUP_RSV_DATA);
extent_changeset_release(reserved);
return ret;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 16 Sep 2019 20:02:39 +0800
Subject: [PATCH] btrfs: qgroup: Fix reserved data space leak if we have
multiple reserve calls
[BUG]
The following script can cause btrfs qgroup data space leak:
mkfs.btrfs -f $dev
mount $dev -o nospace_cache $mnt
btrfs subv create $mnt/subv
btrfs quota en $mnt
btrfs quota rescan -w $mnt
btrfs qgroup limit 128m $mnt/subv
for (( i = 0; i < 3; i++)); do
# Create 3 64M holes for latter fallocate to fail
truncate -s 192m $mnt/subv/file
xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
sync
# it's supposed to fail, and each failure will leak at least 64M
# data space
xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
rm $mnt/subv/file
sync
done
# Shouldn't fail after we removed the file
xfs_io -f -c "falloc 0 64m" $mnt/subv/file
[CAUSE]
Btrfs qgroup data reserve code allow multiple reservations to happen on
a single extent_changeset:
E.g:
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M);
btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M);
btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M);
Btrfs qgroup code has its internal tracking to make sure we don't
double-reserve in above example.
The only pattern utilizing this feature is in the main while loop of
btrfs_fallocate() function.
However btrfs_qgroup_reserve_data()'s error handling has a bug in that
on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
flag but doesn't free previously reserved bytes.
This bug has a two fold effect:
- Clearing EXTENT_QGROUP_RESERVED ranges
This is the correct behavior, but it prevents
btrfs_qgroup_check_reserved_leak() to catch the leakage as the
detector is purely EXTENT_QGROUP_RESERVED flag based.
- Leak the previously reserved data bytes.
The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space reserved in the previous ones.
[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.
Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 4ab85555a947..c4bb69941c77 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3442,6 +3442,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
while ((unode = ulist_next(&reserved->range_changed, &uiter)))
clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
+ /* Also free data bytes of already reserved one */
+ btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
+ orig_reserved, BTRFS_QGROUP_RSV_DATA);
extent_changeset_release(reserved);
return ret;
}
Dear Sasha,
On 03.10.19 00:49, Sasha Levin wrote:
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: 4.18+
I’ll still need to test it.
> The bot has tested the following trees: v5.3.1, v5.2.17, v4.19.75.
>
> v5.3.1: Build OK!
> v5.2.17: Build OK!
> v4.19.75: Failed to apply! Possible dependencies:
> ce0925e8c2f8 ("mei: dma ring buffers allocation")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
> --
> Thanks,
> Sasha
Could you please add a space after the two minuses: `-- `? That would
make it a valid signature separator, and mail clients can strip it.
Kind regards,
Paul