On Thu, Jul 7, 2022 at 3:48 AM chenjun (AM) <chenjun102@huawei.com> wrote:
On 2022/7/7 0:41, Jason Andryuk wrote:
WEC TPMs (in 1.2 mode) and NTC (in 2.0 mode) have been observed to frequently, but intermittently, fail probe with: tpm_tis: probe of 00:09 failed with error -1
Added debugging output showed that the request_locality in tpm_tis_core_init succeeds, but tpm_chip_start then fails when its call to tpm_request_locality -> request_locality fails.
The access register in check_locality would show:
    0x80 TPM_ACCESS_VALID
    0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE
    0x80 TPM_ACCESS_VALID
continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) doesn't get set, which would end the wait.
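To make those register values easier to read, here is a minimal standalone sketch (not driver code) that decodes a TPM_ACCESS value and applies the "locality granted" test. The bit values are the ones named in the kernel's tpm_tis_core.h; the grant condition is my reading of check_locality() and may differ slightly between kernel versions. locality_granted() and describe_access() are hypothetical helpers for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* TPM_ACCESS bit values, as named in the kernel's tpm_tis_core.h. */
#define TPM_ACCESS_VALID           0x80u
#define TPM_ACCESS_ACTIVE_LOCALITY 0x20u
#define TPM_ACCESS_REQUEST_PENDING 0x04u
#define TPM_ACCESS_REQUEST_USE     0x02u

/* Hypothetical helper. Assumed grant condition (my reading of check_locality()):
 * VALID and ACTIVE_LOCALITY are set and no REQUEST_USE is outstanding. */
static bool locality_granted(uint8_t access)
{
    unsigned mask = TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY |
                    TPM_ACCESS_REQUEST_USE;
    return (access & mask) == (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY);
}

/* Hypothetical helper: render a value as e.g. "0x82: VALID REQUEST_USE". */
static void describe_access(char *buf, size_t len, uint8_t access)
{
    snprintf(buf, len, "0x%02x:%s%s%s%s", (unsigned)access,
             (access & TPM_ACCESS_VALID) ? " VALID" : "",
             (access & TPM_ACCESS_ACTIVE_LOCALITY) ? " ACTIVE_LOCALITY" : "",
             (access & TPM_ACCESS_REQUEST_PENDING) ? " REQUEST_PENDING" : "",
             (access & TPM_ACCESS_REQUEST_USE) ? " REQUEST_USE" : "");
}
```

Under this check, 0x80 and 0x82 both mean "keep waiting"; only a value such as 0xa0 (VALID | ACTIVE_LOCALITY) would end the loop, and that is the value that never showed up.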
My best guess is something racy was going on between release_locality's write and request_locality's write. There is no wait in release_locality to ensure that the locality is released, so the subsequent request_locality could confuse the TPM?
tpm_chip_start grabs locality 0, and updates chip->locality. Call that before the TPM_INT_ENABLE write, and drop the explicit request/release calls. tpm_chip_stop performs the release. With this, we switch to using chip->locality instead of priv->locality. The probe failure is not seen after this.
commit 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE") added a request_locality/release_locality pair around tpm_tis_write32 TPM_INT_ENABLE, but there is a read of TPM_INT_ENABLE for the intmask which should also have the locality grabbed. tpm_chip_start is moved before that to have the locality open during the read.
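The ordering change above can be modelled in userspace as a rough sketch. None of these functions are the kernel's: the stubs only log which TPM_INT_ENABLE accesses happen while locality 0 is held, and INT_BITS is a hypothetical stand-in for the interrupt bits the driver ORs into intmask. old_order() follows my reading of the pre-patch tpm_tis_core_init(), including the tpm_chip_start() that runs right after release_locality().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not kernel code: records the order of operations. */
static char trace[256];
static int held_locality = -1; /* models chip->locality; -1 means none held */

static void log_op(const char *op)
{
    strncat(trace, op, sizeof(trace) - strlen(trace) - 1);
    strncat(trace, ";", sizeof(trace) - strlen(trace) - 1);
}

/* Tag TPM_INT_ENABLE accesses made without locality 0. */
static void log_int_access(const char *what)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%s%s", what,
             held_locality == 0 ? "" : "(no locality)");
    log_op(buf);
}

/* Stand-ins for the driver calls discussed in the patch. */
static void chip_start(void)   { held_locality = 0; log_op("chip_start"); }
static void chip_stop(void)    { log_op("chip_stop"); held_locality = -1; }
static void request_loc0(void) { held_locality = 0; log_op("request_locality"); }
static void release_loc(void)  { log_op("release_locality"); held_locality = -1; }
static unsigned read_int_enable(void) { log_int_access("read_int_enable"); return 0; }
static void write_int_enable(unsigned v) { (void)v; log_int_access("write_int_enable"); }

#define INT_BITS 0x1u /* hypothetical placeholder for the OR'ed interrupt bits */

static void reset_model(void) { trace[0] = '\0'; held_locality = -1; }

/* Before: read unguarded, write bracketed by request/release, then an
 * immediate re-request via chip_start. */
static void old_order(void)
{
    reset_model();
    unsigned intmask = read_int_enable() | INT_BITS;
    request_loc0();
    write_int_enable(intmask);
    release_loc();
    chip_start(); /* back-to-back release/request suspected to be racy */
}

/* After: one chip_start/chip_stop pair covers both accesses. In the real
 * driver other init steps run before tpm_chip_stop(). */
static void new_order(void)
{
    reset_model();
    chip_start();
    unsigned intmask = read_int_enable() | INT_BITS;
    write_int_enable(intmask);
    chip_stop();
}
```

Calling old_order() leaves trace as "read_int_enable(no locality);request_locality;write_int_enable;release_locality;chip_start;", while new_order() gives "chip_start;read_int_enable;write_int_enable;chip_stop;".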
Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE")
Maybe 0ef333f5ba7f is not the commit that introduced the problem? As you said, the problem was seen in 5.4, but the commit was merged in 5.16.
I was imprecise with my versions. 0ef333f5ba7f was backported to stable-5.4.y as 13af3a9b1ba6 in 5.4.174. I was running 5.4.163 on some systems without problem, but the probe failures started after jumping to 5.4.200.
Other systems showing the probe failures were 5.15.29 (which has 0ef333f5ba7f backported as ea1fd8364c9f 5.15.17) and 5.17.y (5.17.5 I think) from a Fedora 36 USB drive.
Other machines run fine with 0ef333f5ba7f, which is part of why I didn't notice it earlier between 5.4.163 and 5.4.200.
At the top I wrote: "frequently, but intermittently, fail probe". To expand on that: Basically on the affected machines, the probe during boot fails. In one instance, I repeatedly ran `echo 00:05 > /sys/bus/pnp/drivers/tpm_tis/bind`, and it successfully probed on the 7th try. So something racy seems to be going on.
Regards, Jason