On Fri, Jan 24, 2020 at 09:48:30AM -0800, Yang Shi wrote:
On 1/24/20 7:26 AM, Wei Yang wrote:
On Fri, Jan 24, 2020 at 07:46:49AM +0100, Michal Hocko wrote:
On Fri 24-01-20 06:56:47, Wei Yang wrote:
On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
On Thu 23-01-20 11:27:36, Wei Yang wrote:
On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
> Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
> the semantic of move_pages() was changed to return the number of
> non-migrated pages (failed to migration) and the call would be aborted
> immediately if migrate_pages() returns positive value. But it didn't
> report the number of pages that we even haven't attempted to migrate.
> So, fix it by including non-attempted pages in the return value.

First, do we want to change the semantics of move_pages(2) so that the return value indicates the number of pages we didn't manage to migrate?
Second, the return value from migrate_pages() doesn't mean the number of pages we failed to migrate. For example, if -ENOMEM is returned for the first page, migrate_pages() would return 1, but actually no page was successfully migrated.
ENOMEM is considered a permanent failure and as such it is returned by migrate_pages() (see goto out).
Third, even if migrate_pages() returns the exact number of non-migrated pages, we cannot be sure those non-migrated pages are at the tail of the list, because in the last case in migrate_pages() it just removes the page from the list. It could be a page in the middle of the list. Then, in userspace, how can the return value be used to determine which status entries are valid? Any page in the list could be the victim.
Yes, I was wrong when stating that the caller would know better which status to check. I misremembered the original patch as it was quite some time ago. While storing the error code would be possible after some massaging of migrate_pages(), is this really something we deeply care about? The caller can achieve the same by initializing the status array to a non-node number, e.g. -1, and checking based on that.
So for a user, the best practice is to initialize the status array to -1 and check each status to see whether the page was migrated successfully?
Yes IMO. Just consider the -errno return value. You have no way to find out which pages have been migrated until we reached that error. The positive return value would fall into the same case.
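A minimal sketch of that practice, for illustration only. The helper names and the STATUS_UNTOUCHED macro are hypothetical (they are not part of the kernel or libnuma), and the sketch assumes that a successfully migrated page ends up with its requested node in the status array:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel, following the -1 suggestion above. */
#define STATUS_UNTOUCHED (-1)

/* Fill the status array with the sentinel before calling move_pages(2). */
void status_prefill(int *status, size_t count)
{
    assert(status != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        status[i] = STATUS_UNTOUCHED;
}

/*
 * After the call, count the entries that do not hold the requested node.
 * This covers both pages that failed (negative status) and pages that
 * were never attempted (still the sentinel).
 */
size_t status_count_not_moved(const int *status, const int *nodes,
                              size_t count)
{
    size_t not_moved = 0;

    assert((status != NULL && nodes != NULL) || count == 0);
    for (size_t i = 0; i < count; i++)
        if (status[i] != nodes[i])
            not_moved++;
    return not_moved;
}
```

The caller would run status_prefill() first, call move_pages(2), and then walk the array regardless of whether the call returned 0, a positive value, or an error.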
Then do we need to return the number of non-migrated pages? What benefit could the user get from that number? How about just returning an error code to indicate the failure? I may be missing some point, would you mind giving me a hint?
This is certainly possible. We can return -EAGAIN if some pages couldn't be migrated because they are pinned. But please read my previous email to the very end for arguments why this might cause more problems than it actually solves.
Let me put your comment here:
Because new users could have started depending on it. It is not all that unlikely that the current implementation would just work for them because they are migrating a set of pages on to the same node so the batch would be a single list throughout the whole given page set.
Your idea is to preserve the current semantics and return the number of non-migrated pages to userspace.
And the reason is:
1. Users have started depending on it.
2. No real bug reported yet.
3. Users always migrate pages to the same node. (If my understanding is correct)
I think this is reasonable, since we want to minimize the impact on userland.
But let's look at how users probably use this syscall. Since the man page never said that the return value could be positive (the number of non-migrated pages), users would think that only 0 means a successful migration and that all other cases are failures. Users then probably handle negative and positive return values the same way, like (!err).
If my guess is true, returning a negative error value for this case could minimize the impact on userland:
1. Preserve the semantics of move_pages(2): 0 means success, negative means some error that needs extra handling.
2. Trivial change to the man page.
3. Presumably no change for users.
Well, in case I missed your point, sorry about that.
I think we should compare the new semantics with the old ones. With the old semantics, move_pages() returns 0 for both success *and* migration failure. So I suppose (I don't have any real use case) the user may do the following with the old semantics:
- Just check whether the call failed (ignoring migration failure): "!err" is good enough. This use case is fine with the new semantics as well, since migration failure is also a kind of error case.
- Care about migration failure: the user needs to traverse all entries in the status array. With the new semantics they just need to check "err > 0"; if they want to know which specific pages failed to migrate, they then traverse the status array (initialized to -1, as Michal suggested in an earlier email).
So, if we return an errno for migration failure and userspace wants to see whether migration failed, it needs to:
1. Check "!err"
2. Read errno if #1 is false
3. Traverse the status array to see how many pages failed to migrate
You are right. I misunderstood the mechanism of error handling with err and errno.
But with the new semantics they just need to check "err > 0"; one step is fine for most cases. So I said this approach seems more straightforward for userspace and makes more sense IMHO.
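For illustration, the one-step check could be sketched as below. The enum and function names are hypothetical, and the sketch assumes the usual libc wrapper convention of returning -1 with errno set when the syscall itself fails:

```c
#include <assert.h>

/* Hypothetical classification of a move_pages(2) return value. */
enum move_result {
    MOVE_ALL_MIGRATED,      /* ret == 0 */
    MOVE_SOME_NOT_MIGRATED, /* ret > 0: number of non-migrated pages */
    MOVE_SYSCALL_ERROR      /* ret == -1: details are in errno */
};

enum move_result classify_move_pages_ret(long ret)
{
    /* Assumption: the wrapper never returns anything below -1. */
    assert(ret >= -1);

    if (ret == 0)
        return MOVE_ALL_MIGRATED;
    if (ret > 0)
        return MOVE_SOME_NOT_MIGRATED;
    return MOVE_SYSCALL_ERROR;
}
```

A caller that only tests "!err" still treats both non-zero cases as failure, while a caller that cares about migration failure can branch on MOVE_SOME_NOT_MIGRATED and only then walk the status array.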
This system call has quite a complex semantic and I am not 100% sure what is the right thing to do here. Maybe we do want to continue and try to migrate as much as possible on non-fatal migration failures and accumulate the number of failed pages while doing so.
The main problem is that we can have an academic discussion but the primary question is what do actual users want. A lack of real bug reports suggests that nobody has actually noticed this. So I would rather keep returning the correct number of non-migrated pages. Why? Because new users could have started depending on it. It is not all that unlikely that the current implementation would just work for them because they are migrating a set of pages on to the same node so the batch would be a single list throughout the whole given page set.
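To make the "single list" remark concrete, here is a rough userspace model. It assumes that do_pages_move() starts a new batch whenever the target node differs from the previous page's node; count_node_batches() is a hypothetical helper for illustration, not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many batches a nodes[] array would be split into, under the
 * assumption that a new batch starts every time the target node changes
 * from one page to the next.
 */
size_t count_node_batches(const int *nodes, size_t count)
{
    size_t batches = 0;

    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (i == 0 || nodes[i] != nodes[i - 1])
            batches++;
    return batches;
}
```

Under that assumption, a caller passing the same node for every page gets exactly one batch, which is the case where the current return value would already look right.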
-- Michal Hocko SUSE Labs