Am 24.07.24 um 10:16 schrieb Tvrtko Ursulin:
[SNIP]
Absolutely.
Absolutely good and absolutely me, or absolutely you? :)
You, I don't even have time to finish all the stuff I already started :/
These are the TODO points and their opens:
- Adjust amdgpu_ctx_set_entity_priority() to call
drm_sched_entity_modify_sched() regardless of the hw_type - to fix priority changes on a single sched other than gfx or compute.
Either that or to stop using the scheduler priority to implement userspace priorities and always use different HW queues for that.
- Document sched_list array lifetime must align with the entity and
adjust the callers.
Open:
Do you still oppose keeping sched_list for num_scheds == 1?
Not if you can fix up all callers.
If so do you propose drm_sched_entity_modify_sched() keeps disagreeing with drm_sched_entity_init() on this detail? And keep the "one shot single sched_list" quirk in? Why is that nicer than simply keeping the list and remove that quirk? Once lifetime rules are clear it IMO is okay to always keep the list.
Yeah if every caller of drm_sched_entity_init() can be fixed I'm fine with that as well.
- Remove drm_sched_entity_set_priority().
Open:
Should we at this point also modify amdgpu_device_init_schedulers() to stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?
One step at a time.
Once drm_sched_entity_set_priority() is gone there is little point. Unless there are some non-obvious games with the kernel priority or something.
In general scheduler priorities were meant to be used for things like kernel queues which would always have higher priority than user space submissions and using them for userspace turned out to be not such a good idea.
Out of curiousity what were the problems? I cannot think of anything fundamental since there are priorities at the backend level after all, just fewer levels.
A higher level queue can starve lower level queues to the point that they never get a chance to get anything running.
Oh that.. well I call that implementation details. :) Because nowhere in the uapi is I think guaranteed execution ordering needs to be strictly in descending priority. This potentially goes back to what you said above that a potential larger rewrite might be beneficial. Implementing some smarter scheduling. Although the issue will be it is just frontend scheduling after all. So a bit questionable to invest in making it too smart.
+1 and we have a bug report complaining that RR is in at least one situation better than FIFO. So it is actually quite hard to remove.
On the other hand FIFO has some really nice properties as well. E.g. try to run >100 glxgears instances (on weaker hw) and just visually compare the result differences between RR and FIFO. FIFO is baby smooth and RR is basically stuttering all the time.
I think this goes more back to what I was suggesting during early xe days, that potentially drm scheduler should be split into dependency handling part and the scheduler part. Drivers with 1:1 entity:scheduler and full hardware/firmware scheduling do not really need neither fifo or rr.
Yeah that's my thinking as well and I also suggested that multiple times in discussions with Sima and others.
This basically means that userspace gets a chance to submit infinity fences with all the bad consequences.
I mean one problem unrelated to this discussion is this:
void drm_sched_entity_select_rq(struct drm_sched_entity *entity) { struct dma_fence *fence; struct drm_gpu_scheduler *sched; struct drm_sched_rq *rq;
/* queue non-empty, stay on the same engine */ if (spsc_queue_count(&entity->job_queue)) return;
Which makes it look like the entity with a constant trickle of jobs will never actually manage to change it's run queue. Neither today, nor after the potential drm_sched_entity_set_priority() removal.
That's intentional and based on how the scheduler load balancing works.
I see that it is intentional but if it can silently prevent priority changes (even hw priority) it is not very solid. Unless I am missing something here.
IIRC the GSoC student who implemented this (with me as mentor) actually documented that behavior somewhere.
And to be honest it kind of makes sense because switching priorities would otherwise be disruptive, e.g. you have a moment were you need to drain all previous submissions with the old priority before you can do new ones with the new priority.
Hmmm I don't see how it makes sense. Perhaps a test case for AMDGPU_SCHED_OP_*_PRIORITY_OVERRIDE is missing to show how it doesn't work. Or at least how easy it can be defeated with callers none the wiser.
Ok, that needs a bit longer explanation. You don't by any chance have teams?
For context I am kind of interested because I wired up amdgpu to the DRM cgroup controller and use priority override to de-prioritize certain cgroups and it kind of works. But again, it will not be great if a client with a constant trickle of submissions can just defeat it.
Yeah, exactly that use case is currently not possible :(
Regards, Christian.
Regards,
Tvrtko