All fields in the PE are big-endian. Use cpu_to_be32() like everywhere else something is written to the PE. Otherwise a wrong TID will be used by the NPU. If this TID happens to point to an existing thread sharing the same mm, it could be woken up by error. This is highly improbable though. The likely outcome of this is the NPU not finding the target thread and forcing the AFU into sending an interrupt, which userspace is supposed to handle anyway.
Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9") Cc: stable@vger.kernel.org # v4.18 Signed-off-by: Greg Kurz groug@kaod.org ---
This bug remained unnoticed so far because the current OCXL test suite happens to call OCXL_IOCTL_ENABLE_P9_WAIT before attaching a context. This causes ocxl_link_update_pe() to be called before ocxl_link_add_pe() which re-writes the TID in the PE with the appropriate endianness.
I have some patches that change the behavior of the OCXL test suite so that it can catch the issue:
https://github.com/gkurz/libocxl/commits/wake-host-thread-rework --- drivers/misc/ocxl/link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..646d16450066 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -566,7 +566,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
mutex_lock(&spa->spa_lock);
- pe->tid = tid; + pe->tid = cpu_to_be32(tid);
/* * The barrier makes sure the PE is updated
On 17/12/18 8:28 am, Greg Kurz wrote:
All fields in the PE are big-endian. Use cpu_to_be32() like everywhere else something is written to the PE. Otherwise a wrong TID will be used by the NPU. If this TID happens to point to an existing thread sharing the same mm, it could be woken up by error. This is highly improbable though. The likely outcome of this is the NPU not finding the target thread and forcing the AFU into sending an interrupt, which userspace is supposed to handle anyway.
Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9") Cc: stable@vger.kernel.org # v4.18 Signed-off-by: Greg Kurz groug@kaod.org
If only we read our sparse warnings which would have told us this. One warning down :)
Thanks for finding this!
Acked-by: Andrew Donnellan andrew.donnellan@au1.ibm.com
This bug remained unnoticed so far because the current OCXL test suite happens to call OCXL_IOCTL_ENABLE_P9_WAIT before attaching a context. This causes ocxl_link_update_pe() to be called before ocxl_link_add_pe() which re-writes the TID in the PE with the appropriate endianness.
I have some patches that change the behavior of the OCXL test suite so that it can catch the issue:
https://github.com/gkurz/libocxl/commits/wake-host-thread-rework
drivers/misc/ocxl/link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..646d16450066 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -566,7 +566,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid) mutex_lock(&spa->spa_lock);
- pe->tid = tid;
- pe->tid = cpu_to_be32(tid);
/* * The barrier makes sure the PE is updated
On Sun, 2018-12-16 at 22:28 +0100, Greg Kurz wrote:
All fields in the PE are big-endian. Use cpu_to_be32() like everywhere else something is written to the PE. Otherwise a wrong TID will be used by the NPU. If this TID happens to point to an existing thread sharing the same mm, it could be woken up by error. This is highly improbable though. The likely outcome of this is the NPU not finding the target thread and forcing the AFU into sending an interrupt, which userspace is supposed to handle anyway.
Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9") Cc: stable@vger.kernel.org # v4.18 Signed-off-by: Greg Kurz groug@kaod.org
This bug remained unnoticed so far because the current OCXL test suite happens to call OCXL_IOCTL_ENABLE_P9_WAIT before attaching a context. This causes ocxl_link_update_pe() to be called before ocxl_link_add_pe() which re-writes the TID in the PE with the appropriate endianness.
I have some patches that change the behavior of the OCXL test suite so that it can catch the issue:
https://github.com/gkurz/libocxl/commits/wake-host-thread-rework
drivers/misc/ocxl/link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..646d16450066 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -566,7 +566,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid) mutex_lock(&spa->spa_lock);
- pe->tid = tid;
- pe->tid = cpu_to_be32(tid);
/* * The barrier makes sure the PE is updated
Good catch, thanks.
Reviewed-by: Alastair D'Silva alastair@d-silva.org
On Mon, 17 Dec 2018 11:38:51 +1100 "Alastair D'Silva" alastair@au1.ibm.com wrote:
On Sun, 2018-12-16 at 22:28 +0100, Greg Kurz wrote:
All fields in the PE are big-endian. Use cpu_to_be32() like everywhere else something is written to the PE. Otherwise a wrong TID will be used by the NPU. If this TID happens to point to an existing thread sharing the same mm, it could be woken up by error. This is highly improbable though. The likely outcome of this is the NPU not finding the target thread and forcing the AFU into sending an interrupt, which userspace is supposed to handle anyway.
Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9") Cc: stable@vger.kernel.org # v4.18 Signed-off-by: Greg Kurz groug@kaod.org
This bug remained unnoticed so far because the current OCXL test suite happens to call OCXL_IOCTL_ENABLE_P9_WAIT before attaching a context. This causes ocxl_link_update_pe() to be called before ocxl_link_add_pe() which re-writes the TID in the PE with the appropriate endianness.
I have some patches that change the behavior of the OCXL test suite so that it can catch the issue:
https://github.com/gkurz/libocxl/commits/wake-host-thread-rework
drivers/misc/ocxl/link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..646d16450066 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -566,7 +566,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid) mutex_lock(&spa->spa_lock);
- pe->tid = tid;
- pe->tid = cpu_to_be32(tid);
/* * The barrier makes sure the PE is updated
Good catch, thanks.
Reviewed-by: Alastair D'Silva alastair@d-silva.org
Friendly ping before Xmas break :)
On Sun, 2018-12-16 at 21:28:50 UTC, Greg Kurz wrote:
All fields in the PE are big-endian. Use cpu_to_be32() like everywhere else something is written to the PE. Otherwise a wrong TID will be used by the NPU. If this TID happens to point to an existing thread sharing the same mm, it could be woken up by error. This is highly improbable though. The likely outcome of this is the NPU not finding the target thread and forcing the AFU into sending an interrupt, which userspace is supposed to handle anyway.
Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9") Cc: stable@vger.kernel.org # v4.18 Signed-off-by: Greg Kurz groug@kaod.org Acked-by: Andrew Donnellan andrew.donnellan@au1.ibm.com Reviewed-by: Alastair D'Silva alastair@d-silva.org
Applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/e1e71e201703500f708bdeaf64660a
cheers
linux-stable-mirror@lists.linaro.org