On Thu, Mar 28, 2024 at 07:06:21PM -0500, Elizabeth Figura wrote:
diff --git a/Documentation/userspace-api/ntsync.rst b/Documentation/userspace-api/ntsync.rst new file mode 100644 index 000000000000..202c2350d3af --- /dev/null +++ b/Documentation/userspace-api/ntsync.rst @@ -0,0 +1,399 @@ +=================================== +NT synchronization primitive driver +===================================
+This page documents the user-space API for the ntsync driver.
+ntsync is a support driver for emulation of NT synchronization +primitives by user-space NT emulators. It exists because implementation +in user-space, using existing tools, cannot match Windows performance +while offering accurate semantics. It is implemented entirely in +software, and does not drive any hardware device.
+This interface is meant as a compatibility tool only, and should not +be used for general synchronization. Instead use generic, versatile +interfaces such as futex(2) and poll(2).
+Synchronization primitives +==========================
+The ntsync driver exposes three types of synchronization primitives: +semaphores, mutexes, and events.
+A semaphore holds a single volatile 32-bit counter, and a static 32-bit +integer denoting the maximum value. It is considered signaled when the +counter is nonzero. The counter is decremented by one when a wait is +satisfied. Both the initial and maximum count are established when the +semaphore is created.
+A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit +identifier denoting its owner. A mutex is considered signaled when its +owner is zero (indicating that it is not owned). The recursion count is +incremented when a wait is satisfied, and ownership is set to the given +identifier.
+A mutex also holds an internal flag denoting whether its previous owner +has died; such a mutex is said to be abandoned. Owner death is not +tracked automatically based on thread death, but rather must be +communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is +inherently considered unowned.
+Except for the "unowned" semantics of zero, the actual value of the +owner identifier is not interpreted by the ntsync driver at all. The +intended use is to store a thread identifier; however, the ntsync +driver does not actually validate that a calling thread provides +consistent or unique identifiers.
+An event holds a volatile boolean state denoting whether it is signaled +or not. There are two types of events, auto-reset and manual-reset. An +auto-reset event is designaled when a wait is satisfied; a manual-reset +event is not. The event type is specified when the event is created.
+Unless specified otherwise, all operations on an object are atomic and +totally ordered with respect to other operations on the same object.
+Objects are represented by files. When all file descriptors to an +object are closed, that object is deleted.
+Char device +===========
+The ntsync driver creates a single char device /dev/ntsync. Each file +description opened on the device represents a unique instance intended +to back an individual NT virtual machine. Objects created by one ntsync +instance may only be used with other objects created by the same +instance.
+ioctl reference +===============
+All operations on the device are done through ioctls. There are four +structures used in ioctl calls::
- struct ntsync_sem_args {
- __u32 sem;
- __u32 count;
- __u32 max;
- };
- struct ntsync_mutex_args {
- __u32 mutex;
- __u32 owner;
- __u32 count;
- };
- struct ntsync_event_args {
- __u32 event;
- __u32 signaled;
- __u32 manual;
- };
- struct ntsync_wait_args {
- __u64 timeout;
- __u64 objs;
- __u32 count;
- __u32 owner;
- __u32 index;
- __u32 alert;
- __u32 flags;
- __u32 pad;
- };
+Depending on the ioctl, members of the structure may be used as input, +output, or not at all. All ioctls return 0 on success.
+The ioctls on the device file are as follows:
+.. c:macro:: NTSYNC_IOC_CREATE_SEM
- Create a semaphore object. Takes a pointer to struct
- :c:type:`ntsync_sem_args`, which is used as follows:
- .. list-table::
* - ``sem``
- On output, contains a file descriptor to the created semaphore.
* - ``count``
- Initial count of the semaphore.
* - ``max``
- Maximum count of the semaphore.
- Fails with ``EINVAL`` if ``count`` is greater than ``max``.
+.. c:macro:: NTSYNC_IOC_CREATE_MUTEX
- Create a mutex object. Takes a pointer to struct
- :c:type:`ntsync_mutex_args`, which is used as follows:
- .. list-table::
* - ``mutex``
- On output, contains a file descriptor to the created mutex.
* - ``count``
- Initial recursion count of the mutex.
* - ``owner``
- Initial owner of the mutex.
- If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is
- zero and ``count`` is nonzero, the function fails with ``EINVAL``.
+.. c:macro:: NTSYNC_IOC_CREATE_EVENT
- Create an event object. Takes a pointer to struct
- :c:type:`ntsync_event_args`, which is used as follows:
- .. list-table::
* - ``event``
- On output, contains a file descriptor to the created event.
* - ``signaled``
- If nonzero, the event is initially signaled, otherwise
nonsignaled.
* - ``manual``
- If nonzero, the event is a manual-reset event, otherwise
auto-reset.
+The ioctls on the individual objects are as follows:
+.. c:macro:: NTSYNC_IOC_SEM_POST
- Post to a semaphore object. Takes a pointer to a 32-bit integer,
- which on input holds the count to be added to the semaphore, and on
- output contains its previous count.
- If adding to the semaphore's current count would raise the latter
- past the semaphore's maximum count, the ioctl fails with
- ``EOVERFLOW`` and the semaphore is not affected. If raising the
- semaphore's count causes it to become signaled, eligible threads
- waiting on this semaphore will be woken and the semaphore's count
- decremented appropriately.
+.. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK
- Release a mutex object. Takes a pointer to struct
- :c:type:`ntsync_mutex_args`, which is used as follows:
- .. list-table::
* - ``mutex``
- Ignored.
* - ``owner``
- Specifies the owner trying to release this mutex.
* - ``count``
- On output, contains the previous recursion count.
- If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner``
- is not the current owner of the mutex, the ioctl fails with
- ``EPERM``.
- The mutex's count will be decremented by one. If decrementing the
- mutex's count causes it to become zero, the mutex is marked as
- unowned and signaled, and eligible threads waiting on it will be
- woken as appropriate.
+.. c:macro:: NTSYNC_IOC_SET_EVENT
- Signal an event object. Takes a pointer to a 32-bit integer, which on
- output contains the previous state of the event.
- Eligible threads will be woken, and auto-reset events will be
- designaled appropriately.
+.. c:macro:: NTSYNC_IOC_RESET_EVENT
- Designal an event object. Takes a pointer to a 32-bit integer, which
- on output contains the previous state of the event.
+.. c:macro:: NTSYNC_IOC_PULSE_EVENT
- Wake threads waiting on an event object while leaving it in an
- unsignaled state. Takes a pointer to a 32-bit integer, which on
- output contains the previous state of the event.
- A pulse operation can be thought of as a set followed by a reset,
- performed as a single atomic operation. If two threads are waiting on
- an auto-reset event which is pulsed, only one will be woken. If two
- threads are waiting a manual-reset event which is pulsed, both will
- be woken. However, in both cases, the event will be unsignaled
- afterwards, and a simultaneous read operation will always report the
- event as unsignaled.
+.. c:macro:: NTSYNC_IOC_READ_SEM
- Read the current state of a semaphore object. Takes a pointer to
- struct :c:type:`ntsync_sem_args`, which is used as follows:
- .. list-table::
* - ``sem``
- Ignored.
* - ``count``
- On output, contains the current count of the semaphore.
* - ``max``
- On output, contains the maximum count of the semaphore.
+.. c:macro:: NTSYNC_IOC_READ_MUTEX
- Read the current state of a mutex object. Takes a pointer to struct
- :c:type:`ntsync_mutex_args`, which is used as follows:
- .. list-table::
* - ``mutex``
- Ignored.
* - ``owner``
- On output, contains the current owner of the mutex, or zero
if the mutex is not currently owned.
* - ``count``
- On output, contains the current recursion count of the mutex.
- If the mutex is marked as abandoned, the function fails with
- ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to
- zero.
+.. c:macro:: NTSYNC_IOC_READ_EVENT
- Read the current state of an event object. Takes a pointer to struct
- :c:type:`ntsync_event_args`, which is used as follows:
- .. list-table::
* - ``event``
- Ignored.
* - ``signaled``
- On output, contains the current state of the event.
* - ``manual``
- On output, contains 1 if the event is a manual-reset event,
and 0 otherwise.
+.. c:macro:: NTSYNC_IOC_KILL_OWNER
- Mark a mutex as unowned and abandoned if it is owned by the given
- owner. Takes an input-only pointer to a 32-bit integer denoting the
- owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the
- owner does not own the mutex, the function fails with ``EPERM``.
- Eligible threads waiting on the mutex will be woken as appropriate
- (and such waits will fail with ``EOWNERDEAD``, as described below).
+.. c:macro:: NTSYNC_IOC_WAIT_ANY
- Poll on any of a list of objects, atomically acquiring at most one.
- Takes a pointer to struct :c:type:`ntsync_wait_args`, which is
- used as follows:
- .. list-table::
* - ``timeout``
- Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME``
is set, the timeout is measured against the REALTIME clock;
otherwise it is measured against the MONOTONIC clock. If the
timeout is equal to or earlier than the current time, the
function returns immediately without sleeping. If ``timeout``
is U64_MAX, the function will sleep until an object is
signaled, and will not fail with ``ETIMEDOUT``.
* - ``objs``
- Pointer to an array of ``count`` file descriptors
(specified as an integer so that the structure has the same
size regardless of architecture). If any object is
invalid, the function fails with ``EINVAL``.
* - ``count``
- Number of objects specified in the ``objs`` array.
If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails
with ``EINVAL``.
* - ``owner``
- Mutex owner identifier. If any object in ``objs`` is a mutex,
the ioctl will attempt to acquire that mutex on behalf of
``owner``. If ``owner`` is zero, the ioctl fails with
``EINVAL``.
* - ``index``
- On success, contains the index (into ``objs``) of the object
which was signaled. If ``alert`` was signaled instead,
this contains ``count``.
* - ``alert``
- Optional event object file descriptor. If nonzero, this
specifies an "alert" event object which, if signaled, will
terminate the wait. If nonzero, the identifier must point to a
valid event.
* - ``flags``
- Zero or more flags. Currently the only flag is
``NTSYNC_WAIT_REALTIME``, which causes the timeout to be
measured against the REALTIME clock instead of MONOTONIC.
* - ``pad``
- Unused, must be set to zero.
- This function attempts to acquire one of the given objects. If unable
- to do so, it sleeps until an object becomes signaled, subsequently
- acquiring it, or the timeout expires. In the latter case the ioctl
- fails with ``ETIMEDOUT``. The function only acquires one object, even
- if multiple objects are signaled.
- A semaphore is considered to be signaled if its count is nonzero, and
- is acquired by decrementing its count by one. A mutex is considered
- to be signaled if it is unowned or if its owner matches the ``owner``
- argument, and is acquired by incrementing its recursion count by one
- and setting its owner to the ``owner`` argument. An auto-reset event
- is acquired by designaling it; a manual-reset event is not affected
- by acquisition.
- Acquisition is atomic and totally ordered with respect to other
- operations on the same object. If two wait operations (with different
- ``owner`` identifiers) are queued on the same mutex, only one is
- signaled. If two wait operations are queued on the same semaphore,
- and a value of one is posted to it, only one is signaled. The order
- in which threads are signaled is not specified.
- If an abandoned mutex is acquired, the ioctl fails with
- ``EOWNERDEAD``. Although this is a failure return, the function may
- otherwise be considered successful. The mutex is marked as owned by
- the given owner (with a recursion count of 1) and as no longer
- abandoned, and ``index`` is still set to the index of the mutex.
- The ``alert`` argument is an "extra" event which can terminate the
- wait, independently of all other objects. If members of ``objs`` and
- ``alert`` are both simultaneously signaled, a member of ``objs`` will
- always be given priority and acquired first.
- It is valid to pass the same object more than once, including by
- passing the same event in the ``objs`` array and in ``alert``. If a
- wakeup occurs due to that object being signaled, ``index`` is set to
- the lowest index corresponding to that object.
- The function may fail with ``EINTR`` if a signal is received.
+.. c:macro:: NTSYNC_IOC_WAIT_ALL
- Poll on a list of objects, atomically acquiring all of them. Takes a
- pointer to struct :c:type:`ntsync_wait_args`, which is used
- identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is
- always filled with zero on success if not woken via alert.
- This function attempts to simultaneously acquire all of the given
- objects. If unable to do so, it sleeps until all objects become
- simultaneously signaled, subsequently acquiring them, or the timeout
- expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no
- objects are modified.
- Objects may become signaled and subsequently designaled (through
- acquisition by other threads) while this thread is sleeping. Only
- once all objects are simultaneously signaled does the ioctl acquire
- them and return. The entire acquisition is atomic and totally ordered
- with respect to other operations on any of the given objects.
- If an abandoned mutex is acquired, the ioctl fails with
- ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are
- nevertheless marked as acquired. Note that if multiple mutex objects
- are specified, there is no way to know which were marked as
- abandoned.
- As with "any" waits, the ``alert`` argument is an "extra" event which
- can terminate the wait. Critically, however, an "all" wait will
- succeed if all members in ``objs`` are signaled, *or* if ``alert`` is
- signaled. In the latter case ``index`` will be set to ``count``. As
- with "any" waits, if both conditions are filled, the former takes
- priority, and objects in ``objs`` will be acquired.
- Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same
- object more than once, nor is it valid to pass the same object in
- ``objs`` and in ``alert``. If this is attempted, the function fails
- with ``EINVAL``.
The doc LGTM, thanks!
Reviewed-by: Bagas Sanjaya bagasdotme@gmail.com