The current journal_max_cmp() and journal_min_cmp() assume that a smaller fifo index indicates an older journal entry, but this is only true when the fifo index has not wrapped around (been swapped back to 0).
The fifo structure journal.pin is implemented as a circular buffer: when the head index reaches the highest location of the buffer, it is swapped back to 0. Once that happens, a smaller fifo index may be associated with a newer journal entry, so the btree node with the oldest journal entry won't be selected by btree_flush_write() to be flushed out to the cache device. As a result, the oldest journal entries may never get a chance to be written to the cache device, and after a reboot bch_journal_replay() may complain that some journal entries are missing.
This patch handles the fifo index swapping conditions properly, so that btree_flush_write() can correctly select the btree node with the oldest journal entry from c->flush_btree.
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/journal.c | 47 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index bdb6f9cefe48..bc0e01151155 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -464,12 +464,47 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
 }

 /* Journalling */
-#define journal_max_cmp(l, r) \
-	(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) < \
-	 fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
-#define journal_min_cmp(l, r) \
-	(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) > \
-	 fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
+#define journal_max_cmp(l, r) \
+({ \
+	int l_idx, r_idx, f_idx, b_idx; \
+	bool _ret = true; \
+ \
+	l_idx = fifo_idx(&c->journal.pin, btree_current_write(l)->journal); \
+	r_idx = fifo_idx(&c->journal.pin, btree_current_write(r)->journal); \
+	f_idx = c->journal.pin.front; \
+	b_idx = c->journal.pin.back; \
+ \
+	_ret = (l_idx < r_idx); \
+	/* in case fifo back pointer is swapped */ \
+	if (b_idx < f_idx) { \
+		if (l_idx <= b_idx && r_idx >= f_idx) \
+			_ret = false; \
+		else if (l_idx >= f_idx && r_idx <= b_idx) \
+			_ret = true; \
+	} \
+	_ret; \
+})
+
+#define journal_min_cmp(l, r) \
+({ \
+	int l_idx, r_idx, f_idx, b_idx; \
+	bool _ret = true; \
+ \
+	l_idx = fifo_idx(&c->journal.pin, btree_current_write(l)->journal); \
+	r_idx = fifo_idx(&c->journal.pin, btree_current_write(r)->journal); \
+	f_idx = c->journal.pin.front; \
+	b_idx = c->journal.pin.back; \
+ \
+	_ret = (l_idx > r_idx); \
+	/* in case fifo back pointer is swapped */ \
+	if (b_idx < f_idx) { \
+		if (l_idx <= b_idx && r_idx >= f_idx) \
+			_ret = true; \
+		else if (l_idx >= f_idx && r_idx <= b_idx) \
+			_ret = false; \
+	} \
+	_ret; \
+})

 static void btree_flush_write(struct cache_set *c)
 {
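
For illustration, the wrap-around problem described in the commit message can be reproduced in a small standalone model. This is only a sketch under stated assumptions: struct ring, ring_age_rank(), ring_older() and ring_selfcheck() are hypothetical names, not bcache code, and the model assumes the compared values are raw slot positions in a buffer whose size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not bcache code: a ring with a power-of-two number
 * of slots. Live entries sit in slots front, front + 1, ..., back - 1,
 * all taken modulo the ring size.
 */
struct ring {
	size_t front;	/* slot holding the oldest live entry */
	size_t back;	/* slot where the next entry will be stored */
	size_t mask;	/* number of slots minus one */
};

/* Distance of a slot from front: 0 is the oldest entry. */
static size_t ring_age_rank(const struct ring *r, size_t slot)
{
	return (slot - r->front) & r->mask;
}

/* True when the entry in slot a is older than the entry in slot b. */
static bool ring_older(const struct ring *r, size_t a, size_t b)
{
	return ring_age_rank(r, a) < ring_age_rank(r, b);
}

/* Small self-check: 8 slots, live entries in slots 6, 7, 0 and 1. */
static void ring_selfcheck(void)
{
	struct ring r = { .front = 6, .back = 2, .mask = 7 };

	/* Slot 6 holds the oldest entry although 6 > 0 as raw numbers. */
	assert(ring_older(&r, 6, 0));
	assert(!ring_older(&r, 0, 6));
	/* Slots on the same side of the wrap keep their raw order. */
	assert(ring_older(&r, 6, 7));
	assert(ring_older(&r, 0, 1));
}
```

In this model a plain comparison of slot numbers would rank slot 0 ahead of slot 6, which is the misordering the commit message describes; measuring the distance from front avoids it.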
On 4/19/19 6:05 PM, Coly Li wrote:
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index bdb6f9cefe48..bc0e01151155 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -464,12 +464,47 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
 }

 /* Journalling */
-#define journal_max_cmp(l, r) \
-	(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) < \
-	 fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
-#define journal_min_cmp(l, r) \
-	(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) > \
-	 fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
+#define journal_max_cmp(l, r) \
+({ \
+	int l_idx, r_idx, f_idx, b_idx; \
+	bool _ret = true; \
+ \
+	l_idx = fifo_idx(&c->journal.pin, btree_current_write(l)->journal); \
+	r_idx = fifo_idx(&c->journal.pin, btree_current_write(r)->journal); \
+	f_idx = c->journal.pin.front; \
+	b_idx = c->journal.pin.back; \
+ \
+	_ret = (l_idx < r_idx); \
+	/* in case fifo back pointer is swapped */ \
+	if (b_idx < f_idx) { \
+		if (l_idx <= b_idx && r_idx >= f_idx) \
+			_ret = false; \
+		else if (l_idx >= f_idx && r_idx <= b_idx) \
+			_ret = true; \
+	} \
+	_ret; \
+})
+
+#define journal_min_cmp(l, r) \
+({ \
+	int l_idx, r_idx, f_idx, b_idx; \
+	bool _ret = true; \
+ \
+	l_idx = fifo_idx(&c->journal.pin, btree_current_write(l)->journal); \
+	r_idx = fifo_idx(&c->journal.pin, btree_current_write(r)->journal); \
+	f_idx = c->journal.pin.front; \
+	b_idx = c->journal.pin.back; \
+ \
+	_ret = (l_idx > r_idx); \
+	/* in case fifo back pointer is swapped */ \
+	if (b_idx < f_idx) { \
+		if (l_idx <= b_idx && r_idx >= f_idx) \
+			_ret = true; \
+		else if (l_idx >= f_idx && r_idx <= b_idx) \
+			_ret = false; \
+	} \
+	_ret; \
+})

 static void btree_flush_write(struct cache_set *c)
 {
Please make it a proper function. This is far too convoluted for being handled via #define, and it would avoid cluttering the function namespace with hidden variables.
Cheers,
Hannes
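
As a rough illustration of the function form requested above, the two macros could be folded into one helper. This is a sketch only, not the version posted later in the thread: pin_idx_older(), journal_max_cmp_fn(), journal_min_cmp_fn() and pin_idx_selfcheck() are hypothetical names, and the four indices are passed as plain integers so that the example compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from any posted patch: true when fifo
 * slot l_idx is treated as holding an older entry than r_idx, given the
 * fifo front f_idx and back b_idx. The branches mirror the macros above.
 */
static bool pin_idx_older(int l_idx, int r_idx, int f_idx, int b_idx)
{
	/* Back has wrapped to the start of the buffer. */
	if (b_idx < f_idx) {
		if (l_idx <= b_idx && r_idx >= f_idx)
			return false;	/* l after the wrap, r before it */
		if (l_idx >= f_idx && r_idx <= b_idx)
			return true;	/* l before the wrap, r after it */
	}
	return l_idx < r_idx;
}

static bool journal_max_cmp_fn(int l_idx, int r_idx, int f_idx, int b_idx)
{
	return pin_idx_older(l_idx, r_idx, f_idx, b_idx);
}

/* The min comparison is the same test with the operands exchanged. */
static bool journal_min_cmp_fn(int l_idx, int r_idx, int f_idx, int b_idx)
{
	return pin_idx_older(r_idx, l_idx, f_idx, b_idx);
}

/* Quick check with front = 6 and back = 2: slot 6 is older than slot 0. */
static void pin_idx_selfcheck(void)
{
	assert(journal_max_cmp_fn(6, 0, 6, 2));
	assert(journal_min_cmp_fn(0, 6, 6, 2));
}
```

Because the macros read a variable named c from the caller's scope, a real conversion would presumably also have to pass the cache set explicitly; that wiring is left out here.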
On 2019/4/23 3:09 PM, Hannes Reinecke wrote:
Please make it a proper function. This is far too convoluted for being handled via #define, and it would avoid cluttering the function namespace with hidden variables.
Hi Hannes,
Sure, let me do it in the next version. Thanks.
linux-stable-mirror@lists.linaro.org