It looks like
b4a34aa6d "ipmi: Fix how the lower layers are told to watch for messages"
was backported to fulfill a dependency for another backport, but there was another change:
e1891cffd4c4 "ipmi: Make the smi watcher be disabled immediately when not needed"
That is needed to avoid calling a lower-layer function with xmit_msgs_lock held. It doesn't apply completely cleanly because of other changes, but to resolve the conflict you just need to keep the free_user_work() function and delete the other function. In addition, you will also need:
383035211c79 "ipmi: move message error checking to avoid deadlock"
to fix a bug in that change.
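For anyone following along, the general pattern behind "don't call a lower layer function with xmit_msgs_lock held" can be sketched in userspace roughly as below. This is only an illustration and not the code from e1891cffd4c4: the structs, upper_adjust_pending() and lower_set_watch() are hypothetical names, and pthread mutexes stand in for the kernel locks. The decision is made under the upper lock, and the lower layer is called only after that lock has been dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, NOT the real ipmi_msghandler structures. */
struct lower_layer {
    pthread_mutex_t lock;       /* the lower layer's own lock */
    bool watching;              /* is it currently watching for messages? */
};

struct upper_layer {
    pthread_mutex_t xmit_lock;  /* plays the role of xmit_msgs_lock */
    int pending;                /* messages still waiting for a response */
    bool watch_requested;       /* last state requested from the lower layer */
    struct lower_layer *lower;
};

/* Lower-layer entry point; it takes its own lock. */
void lower_set_watch(struct lower_layer *l, bool enable)
{
    pthread_mutex_lock(&l->lock);
    l->watching = enable;
    pthread_mutex_unlock(&l->lock);
}

/*
 * Adjust the pending count. The state change is decided under xmit_lock,
 * but the lower layer is called only after xmit_lock is released, so the
 * two locks are never held at the same time on this path.
 */
void upper_adjust_pending(struct upper_layer *u, int delta)
{
    bool notify = false;
    bool want;

    pthread_mutex_lock(&u->xmit_lock);
    u->pending += delta;
    assert(u->pending >= 0);
    want = u->pending > 0;
    if (want != u->watch_requested) {
        u->watch_requested = want;
        notify = true;
    }
    pthread_mutex_unlock(&u->xmit_lock);

    if (notify)
        lower_set_watch(u->lower, want);
}
```

Note that in this sketch two concurrent callers could reach lower_set_watch() in the opposite order from the one in which they updated the state, so real code has to handle that ordering somehow; the sketch only shows the locking shape.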
Can you try this out?
Yes, sorry for the delay; I had some technical problems testing your proposed patches. In the meantime, we found out that over a dozen of our test servers have had the same crash since the kernel update, some of them multiple times.
Anyway, with your proposed patches on top of 4.19.286, I couldn't trigger the lockdep warning anymore, even on a server that without the fixes triggers it very reliably right after boot. I also saw on another very similar server (without the fixes) that it took almost 17 hours to get even the lockdep warning. Maybe some specific BMC behavior affects this? Sadly, that diminishes the value of short-duration tests, but at least there have so far been zero lockdep warnings with the fixes applied. The actual lockups are far too unpredictable to test reliably in any kind of short time frame.
Anyway, looking at e1891cffd4c4, it's right where the issue seems to originate, so it makes total sense to me that it fixes it. I was already looking at it when you confirmed it. Thanks also for pointing out the 383035211c79 patch; it could easily have been missed.