I've cc'ed some folks in the hope of getting this resolved upstream.
Either way, 4.1's EoL was previously moved to about 6 months from now, so hopefully we'll have more than enough time to get this resolved.
On Sat, Nov 11, 2017 at 10:13:55PM +0000, Tuncer Ayaz wrote:
The predicament I'm in on my machines is that ever since drm-intel implemented atomic modesetting, there's been a list of regressions caused by those fundamental architecture changes and the code churn they implied. This means 4.1 is (from what I can tell) the last kernel before atomic modesetting was added and the only kernel free of all those issues, which otherwise necessitate trying out various combinations of flags on the kernel cmdline.
For instance, right now I'm trying 4.13.12 with these flags: video=SVIDEO-1:d i915.semaphores=1 i915.enable_rc6=0 i915.enable_psr=0 intel_iommu=igfx_off
PS: I'm kinda confused how anyone uses DMAR with VT-d when it's known to be buggy.
The flags seem to decrease the chances of provoking the bugs, but after a day of running Xorg, it's still possible to hit the RCS0 GPU hangs.
If you don't pass video=SVIDEO-1:d, then atomic's flip_done times out on boot or exit to VT console. It's good that other people have the same issues and have been following the bugzilla tickets, and can confirm the results.
I'm kinda glad I don't have a machine that's newer than Sandybridge, since that means I can use 4.1. It's not a long-term solution, though: the plan is for the reported bugzilla tickets to be resolved at some point, or for me to switch away from Intel GPUs, which might be doable if I save money, get an AMD APU laptop next summer and switch my desktop to a discrete GPU.
For example:
https://bugs.freedesktop.org/show_bug.cgi?id=101237
https://bugs.freedesktop.org/show_bug.cgi?id=103076
https://bbs.archlinux.org/viewtopic.php?id=218581&p=3
https://bugs.archlinux.org/task/51703
So, since 4.4, 4.9, 4.12 and drm-tip are all still affected by these regressions, I wanted to ask if you've considered pushing back 4.1's EOL.
From a look at bugzilla, I have the impression that those issues will need at least another year before they're fixed, since most of them have been sitting there for many, many months. I suspect the Intel DRM team doesn't have the bandwidth to address the issues in a timely fashion while also handling bring-up for new GPUs and new features (fences, etc.).
The generic modesetting DDX and Wayland are less susceptible to the GPU hangs, but can be made to provoke them if you try long enough. However, the modesetting DDX tears heavily and is about to gain atomic modesetting in the next Xorg release, so it will become just as prone to GPU hangs.
Prior to SandyBridge there was zero tearing, but beginning with SandyBridge, xf86-video-intel's TearFree=TRUE is the only reliable way to fix Xorg tearing.
I do appreciate you maintaining 4.1 so far and hate to admit that I'm reliant on it on more than two machines, before and after Sandybridge, excluding those machines which need a newer kernel. I also understand how much work this is, and since I'm not using Linux professionally for a product, I can't offer compensation for your time. I can only offer to collect and point you at a list of DRM bugs to validate my claims.