Hi Greg and everyone,
On all kernels, SEV enabled guests hit might_sleep() warnings when a driver (nvme in this case) allocates through the DMA API in a non-blockable context:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1710 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio 2 locks held by fio/3383: #0: ffff93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: ext4_file_write_iter+0xa2/0x5d0 #1: ffffffffa52a61a0 (rcu_read_lock){....}, at: hctx_lock+0x1a/0xe0 CPU: 0 PID: 3383 Comm: fio Tainted: G W 5.5.10 #14 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: dump_stack+0x98/0xd5 ___might_sleep+0x175/0x260 __might_sleep+0x4a/0x80 _vm_unmap_aliases+0x45/0x250 vm_unmap_aliases+0x19/0x20 __set_memory_enc_dec+0xa4/0x130 set_memory_decrypted+0x10/0x20 dma_direct_alloc_pages+0x148/0x150 dma_direct_alloc+0xe/0x10 dma_alloc_attrs+0x86/0xc0 dma_pool_alloc+0x16f/0x2b0 nvme_queue_rq+0x878/0xc30 [nvme] __blk_mq_try_issue_directly+0x135/0x200 blk_mq_request_issue_directly+0x4f/0x80 blk_mq_try_issue_list_directly+0x46/0xb0 blk_mq_sched_insert_requests+0x19b/0x2b0 blk_mq_flush_plug_list+0x22f/0x3b0 blk_flush_plug_list+0xd1/0x100 blk_finish_plug+0x2c/0x40 iomap_dio_rw+0x427/0x490 ext4_file_write_iter+0x181/0x5d0 aio_write+0x109/0x1b0 io_submit_one+0x7d0/0xfa0 __x64_sys_io_submit+0xa2/0x280 do_syscall_64+0x5f/0x250 entry_SYSCALL_64_after_hwframe+0x49/0xbe
There is a series of patches in Christoph's dma-mapping.git repo in the for-next branch on track for 5.8:
1d659236fb43 dma-pool: scale the default DMA coherent pool size with memory capacity 82fef0ad811f x86/mm: unencrypted non-blocking DMA allocations use coherent pools 2edc5bb3c5cc dma-pool: add pool sizes to debugfs 76a19940bd62 dma-direct: atomic allocations must come from atomic coherent pools 54adadf9b085 dma-pool: dynamically expanding atomic pools c84dc6e68a1d dma-pool: add additional coherent pools to map to gfp mask e860c299ac0d dma-remap: separate DMA atomic pools from direct remap code
We'd like to prepare backports to LTS kernels so that our guest images are not modified by us and don't exhibit this issue.
They are bigger than we'd like:
arch/x86/Kconfig | 1 + drivers/iommu/dma-iommu.c | 5 +- include/linux/dma-direct.h | 2 + include/linux/dma-mapping.h | 6 +- kernel/dma/Kconfig | 6 +- kernel/dma/Makefile | 1 + kernel/dma/direct.c | 56 ++++++-- kernel/dma/pool.c | 264 ++++++++++++++++++++++++++++++++++++ kernel/dma/remap.c | 121 +---------------- 9 files changed, 324 insertions(+), 138 deletions(-) create mode 100644 kernel/dma/pool.c
But they apply relatively cleanly to more modern kernels like 5.4. We'd like to backport these all the way to 4.19, however, otherwise guests encounter these bugs.
The changes to kernel/dma/remap.c, for example, simply moves code to the new pool.c. But that original code is actually in arch/arm64 in 4.19 and was moved in 5.0:
commit 0c3b3171ceccb8830c2bb5adff1b4e9b204c1450 Author: Christoph Hellwig hch@lst.de Date: Sun Nov 4 20:29:28 2018 +0100
dma-mapping: move the arm64 noncoherent alloc/free support to common code
commit f0edfea8ef93ed6cc5f747c46c85c8e53e0798a0 Author: Christoph Hellwig hch@lst.de Date: Fri Aug 24 10:31:08 2018 +0200
dma-mapping: move the remap helpers to a separate file
And there are most certainly more dependencies to get a cleanly applying series to 4.19.123. So the backports could be quite extensive.
Peter Gonda pgonda@google.com is currently handling these and we're looking for advice: should we compile a full list of required backports that would be needed to get a series that would only consist of minor conflicts or is this going to be a non-starter?
Thanks!
On Tue, May 19, 2020 at 05:41:15PM -0700, David Rientjes wrote:
Hi Greg and everyone,
On all kernels, SEV enabled guests hit might_sleep() warnings when a driver (nvme in this case) allocates through the DMA API in a non-blockable context:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1710 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio 2 locks held by fio/3383: #0: ffff93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: ext4_file_write_iter+0xa2/0x5d0 #1: ffffffffa52a61a0 (rcu_read_lock){....}, at: hctx_lock+0x1a/0xe0 CPU: 0 PID: 3383 Comm: fio Tainted: G W 5.5.10 #14 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: dump_stack+0x98/0xd5 ___might_sleep+0x175/0x260 __might_sleep+0x4a/0x80 _vm_unmap_aliases+0x45/0x250 vm_unmap_aliases+0x19/0x20 __set_memory_enc_dec+0xa4/0x130 set_memory_decrypted+0x10/0x20 dma_direct_alloc_pages+0x148/0x150 dma_direct_alloc+0xe/0x10 dma_alloc_attrs+0x86/0xc0 dma_pool_alloc+0x16f/0x2b0 nvme_queue_rq+0x878/0xc30 [nvme] __blk_mq_try_issue_directly+0x135/0x200 blk_mq_request_issue_directly+0x4f/0x80 blk_mq_try_issue_list_directly+0x46/0xb0 blk_mq_sched_insert_requests+0x19b/0x2b0 blk_mq_flush_plug_list+0x22f/0x3b0 blk_flush_plug_list+0xd1/0x100 blk_finish_plug+0x2c/0x40 iomap_dio_rw+0x427/0x490 ext4_file_write_iter+0x181/0x5d0 aio_write+0x109/0x1b0 io_submit_one+0x7d0/0xfa0 __x64_sys_io_submit+0xa2/0x280 do_syscall_64+0x5f/0x250 entry_SYSCALL_64_after_hwframe+0x49/0xbe
There is a series of patches in Christoph's dma-mapping.git repo in the for-next branch on track for 5.8:
1d659236fb43 dma-pool: scale the default DMA coherent pool size with memory capacity 82fef0ad811f x86/mm: unencrypted non-blocking DMA allocations use coherent pools 2edc5bb3c5cc dma-pool: add pool sizes to debugfs 76a19940bd62 dma-direct: atomic allocations must come from atomic coherent pools 54adadf9b085 dma-pool: dynamically expanding atomic pools c84dc6e68a1d dma-pool: add additional coherent pools to map to gfp mask e860c299ac0d dma-remap: separate DMA atomic pools from direct remap code
We'd like to prepare backports to LTS kernels so that our guest images are not modified by us and don't exhibit this issue.
They are bigger than we'd like:
arch/x86/Kconfig | 1 + drivers/iommu/dma-iommu.c | 5 +- include/linux/dma-direct.h | 2 + include/linux/dma-mapping.h | 6 +- kernel/dma/Kconfig | 6 +- kernel/dma/Makefile | 1 + kernel/dma/direct.c | 56 ++++++-- kernel/dma/pool.c | 264 ++++++++++++++++++++++++++++++++++++ kernel/dma/remap.c | 121 +---------------- 9 files changed, 324 insertions(+), 138 deletions(-) create mode 100644 kernel/dma/pool.c
But they apply relatively cleanly to more modern kernels like 5.4. We'd like to backport these all the way to 4.19, however, otherwise guests encounter these bugs.
The changes to kernel/dma/remap.c, for example, simply moves code to the new pool.c. But that original code is actually in arch/arm64 in 4.19 and was moved in 5.0:
commit 0c3b3171ceccb8830c2bb5adff1b4e9b204c1450 Author: Christoph Hellwig hch@lst.de Date: Sun Nov 4 20:29:28 2018 +0100
dma-mapping: move the arm64 noncoherent alloc/free support to common code
commit f0edfea8ef93ed6cc5f747c46c85c8e53e0798a0 Author: Christoph Hellwig hch@lst.de Date: Fri Aug 24 10:31:08 2018 +0200
dma-mapping: move the remap helpers to a separate file
And there are most certainly more dependencies to get a cleanly applying series to 4.19.123. So the backports could be quite extensive.
Peter Gonda pgonda@google.com is currently handling these and we're looking for advice: should we compile a full list of required backports that would be needed to get a series that would only consist of minor conflicts or is this going to be a non-starter?
A full series would be good. Once these hit Linus's tree and show up in a -rc or two, feel free to send on the backports and we can look at them then.
thanks,
greg k-h
On Wed, 20 May 2020, Greg Kroah-Hartman wrote:
On Tue, May 19, 2020 at 05:41:15PM -0700, David Rientjes wrote:
Hi Greg and everyone,
On all kernels, SEV enabled guests hit might_sleep() warnings when a driver (nvme in this case) allocates through the DMA API in a non-blockable context:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1710 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio 2 locks held by fio/3383: #0: ffff93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: ext4_file_write_iter+0xa2/0x5d0 #1: ffffffffa52a61a0 (rcu_read_lock){....}, at: hctx_lock+0x1a/0xe0 CPU: 0 PID: 3383 Comm: fio Tainted: G W 5.5.10 #14 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: dump_stack+0x98/0xd5 ___might_sleep+0x175/0x260 __might_sleep+0x4a/0x80 _vm_unmap_aliases+0x45/0x250 vm_unmap_aliases+0x19/0x20 __set_memory_enc_dec+0xa4/0x130 set_memory_decrypted+0x10/0x20 dma_direct_alloc_pages+0x148/0x150 dma_direct_alloc+0xe/0x10 dma_alloc_attrs+0x86/0xc0 dma_pool_alloc+0x16f/0x2b0 nvme_queue_rq+0x878/0xc30 [nvme] __blk_mq_try_issue_directly+0x135/0x200 blk_mq_request_issue_directly+0x4f/0x80 blk_mq_try_issue_list_directly+0x46/0xb0 blk_mq_sched_insert_requests+0x19b/0x2b0 blk_mq_flush_plug_list+0x22f/0x3b0 blk_flush_plug_list+0xd1/0x100 blk_finish_plug+0x2c/0x40 iomap_dio_rw+0x427/0x490 ext4_file_write_iter+0x181/0x5d0 aio_write+0x109/0x1b0 io_submit_one+0x7d0/0xfa0 __x64_sys_io_submit+0xa2/0x280 do_syscall_64+0x5f/0x250 entry_SYSCALL_64_after_hwframe+0x49/0xbe
There is a series of patches in Christoph's dma-mapping.git repo in the for-next branch on track for 5.8:
1d659236fb43 dma-pool: scale the default DMA coherent pool size with memory capacity 82fef0ad811f x86/mm: unencrypted non-blocking DMA allocations use coherent pools 2edc5bb3c5cc dma-pool: add pool sizes to debugfs 76a19940bd62 dma-direct: atomic allocations must come from atomic coherent pools 54adadf9b085 dma-pool: dynamically expanding atomic pools c84dc6e68a1d dma-pool: add additional coherent pools to map to gfp mask e860c299ac0d dma-remap: separate DMA atomic pools from direct remap code
We'd like to prepare backports to LTS kernels so that our guest images are not modified by us and don't exhibit this issue.
They are bigger than we'd like:
arch/x86/Kconfig | 1 + drivers/iommu/dma-iommu.c | 5 +- include/linux/dma-direct.h | 2 + include/linux/dma-mapping.h | 6 +- kernel/dma/Kconfig | 6 +- kernel/dma/Makefile | 1 + kernel/dma/direct.c | 56 ++++++-- kernel/dma/pool.c | 264 ++++++++++++++++++++++++++++++++++++ kernel/dma/remap.c | 121 +---------------- 9 files changed, 324 insertions(+), 138 deletions(-) create mode 100644 kernel/dma/pool.c
But they apply relatively cleanly to more modern kernels like 5.4. We'd like to backport these all the way to 4.19, however, otherwise guests encounter these bugs.
The changes to kernel/dma/remap.c, for example, simply moves code to the new pool.c. But that original code is actually in arch/arm64 in 4.19 and was moved in 5.0:
commit 0c3b3171ceccb8830c2bb5adff1b4e9b204c1450 Author: Christoph Hellwig hch@lst.de Date: Sun Nov 4 20:29:28 2018 +0100
dma-mapping: move the arm64 noncoherent alloc/free support to common code
commit f0edfea8ef93ed6cc5f747c46c85c8e53e0798a0 Author: Christoph Hellwig hch@lst.de Date: Fri Aug 24 10:31:08 2018 +0200
dma-mapping: move the remap helpers to a separate file
And there are most certainly more dependencies to get a cleanly applying series to 4.19.123. So the backports could be quite extensive.
Peter Gonda pgonda@google.com is currently handling these and we're looking for advice: should we compile a full list of required backports that would be needed to get a series that would only consist of minor conflicts or is this going to be a non-starter?
A full series would be good. Once these hit Linus's tree and show up in a -rc or two, feel free to send on the backports and we can look at them then.
Thanks Greg. I'll let Peter follow-up when he has the full list of commits that we'll need with minimal conflicts to apply this series cleanly as a kind of preview of what to expect the last week of June for 4.19 LTS :)
linux-stable-mirror@lists.linaro.org