On Mon, Aug 10, 2020 at 04:54:50PM +0200, Greg KH wrote:
On Mon, Aug 10, 2020 at 05:19:42PM +0300, Dima Stepanov wrote:
From: Bjorn Helgaas <bhelgaas@google.com>
commit 51c48b310183ab6ba5419edfc6a8de889cc04521 upstream.
pci_bridge_check_ranges() determines whether a bridge supports the optional I/O and prefetchable memory windows and sets the flag bits in the bridge resources. This *could* be done once during enumeration except that the resource allocation code completely clears the flag bits, e.g., in the pci_assign_unassigned_bridge_resources() path.
The problem with pci_bridge_check_ranges() in the resource allocation path is that we may allocate resources after devices have been claimed by drivers, and pci_bridge_check_ranges() *changes* the window registers to determine whether they're writable. This may break concurrent accesses to devices behind the bridge.
Add a new pci_read_bridge_windows() to determine whether a bridge supports the optional windows, call it once during enumeration, remember the results, and change pci_bridge_check_ranges() so it doesn't touch the bridge windows but sets the flag bits based on those remembered results.
Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzho...
Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
Reported-by: Yandong Xu <xuyandong2@huawei.com>
Tested-by: Yandong Xu <xuyandong2@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Ofer Hayut <ofer@lightbitslabs.com>
Cc: Roy Shterman <roys@lightbitslabs.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208371
Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
 drivers/pci/probe.c     | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/setup-bus.c | 45 ++++--------------------------------------
 include/linux/pci.h     |  3 +++
 3 files changed, 59 insertions(+), 41 deletions(-)
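The "probe once, remember, reuse" idea in the patch description above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the struct, the `sketch_*` function names, the flag values, and the simulated registers are all hypothetical stand-ins. It assumes an unimplemented window register reads back as zero after a write, and it counts register writes so the absence of writes in the allocation path can be observed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not the kernel's IORESOURCE_* values. */
#define SKETCH_FLAG_IO       0x1u
#define SKETCH_FLAG_PREFETCH 0x2u

/* Toy model of a bridge: simulated registers plus the remembered results. */
struct sketch_bridge {
	bool hw_has_io;          /* simulated hardware: I/O window implemented */
	bool hw_has_pref;        /* simulated hardware: prefetchable window implemented */
	uint16_t io_base;        /* simulated I/O base register */
	uint32_t pref_base;      /* simulated prefetchable base register */
	unsigned int reg_writes; /* number of window register writes so far */

	bool io_window;          /* remembered: I/O window is supported */
	bool pref_window;        /* remembered: prefetchable window is supported */

	unsigned int io_flags;   /* resource flags; allocation code may clear these */
	unsigned int pref_flags;
};

static void sketch_write_io_base(struct sketch_bridge *b, uint16_t val)
{
	b->reg_writes++;
	if (b->hw_has_io)
		b->io_base = val; /* an unimplemented register stays zero */
}

static void sketch_write_pref_base(struct sketch_bridge *b, uint32_t val)
{
	b->reg_writes++;
	if (b->hw_has_pref)
		b->pref_base = val;
}

/* Enumeration time: probe writability once and remember the result. */
void sketch_read_bridge_windows(struct sketch_bridge *b)
{
	if (b->io_base == 0) {
		/* Any nonzero test pattern works in this model. */
		sketch_write_io_base(b, 0xe0f0);
		b->io_window = (b->io_base != 0);
		sketch_write_io_base(b, 0);
	} else {
		b->io_window = true;
	}

	if (b->pref_base == 0) {
		sketch_write_pref_base(b, 0xffe0fff0u);
		b->pref_window = (b->pref_base != 0);
		sketch_write_pref_base(b, 0);
	} else {
		b->pref_window = true;
	}
}

/* Allocation time: restore flags from the remembered results only. */
void sketch_bridge_check_ranges(struct sketch_bridge *b)
{
	if (b->io_window)
		b->io_flags |= SKETCH_FLAG_IO;
	if (b->pref_window)
		b->pref_flags |= SKETCH_FLAG_PREFETCH;
}
```

In this model, sketch_read_bridge_windows() is the only function that writes the window registers, so calling sketch_bridge_check_ranges() any number of times after the flags have been cleared leaves reg_writes unchanged. That is the property that matters when devices behind the bridge are already in use.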
Why is this now needed in 4.19.y? What changed to require it and what prevents the users from just using 5.4.y instead?
A bit of an explanation when backporting patches that are not obvious "fixes" to much older kernels is always appreciated :)
thanks,
greg k-h
Hi Greg,
Sorry, I was not sure how to do this properly. I'll try to describe the history of this issue:
- in 2017: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzho...
- in 2018: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
- in 2019 it was fixed by commit 51c48b310183ab6ba5419edfc6a8de889cc04521.
  There was also a suggestion to add this patch to stable if a bugzilla report were filed: https://lkml.org/lkml/2019/2/5/600. But as I understand it, there were some problems reproducing the issue.
- in 2020 we hit it again and filed a bug with the steps to reproduce: https://bugzilla.kernel.org/show_bug.cgi?id=208371

Because of this, I think it really is an issue that gets triggered from time to time.

Some words about motivation:
- What changed to require it? We filed a bugzilla bug and tried to show that it is a real issue (not just a possibility).
- In general, nothing prevents users from using 5.4.y. But in big, complicated environments (clouds) it is not obvious that this particular issue is what leads to such behaviour. Also, users may rely on default distribution kernels.
Sorry again for the confusion. I am not very familiar with the process, but I hope this description helps. What do you think about it?
Thanks, Dima.