The docs, implementations and use of arch_[enter|leave]_lazy_mmu_mode()
is a bit of a mess (to put it politely). There are a number of issues
related to nesting of lazy mmu regions and confusion over whether the
task, when in a lazy mmu region, is preemptible or not. Fix all the
issues relating to the core-mm. Follow up commits will fix the
arch-specific implementations. 3 arches implement lazy mmu; powerpc,
sparc and x86.
When arch_[enter|leave]_lazy_mmu_mode() was first introduced by commit
6606c3e0da53 ("[PATCH] paravirt: lazy mmu mode hooks.patch"), it was
expected that lazy mmu regions would never nest and that the appropriate
page table lock(s) would be held while in the region, thus ensuring the
region is non-preemptible. Additionally lazy mmu regions were only used
during manipulation of user mappings.
Commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
updates") started invoking the lazy mmu mode in apply_to_pte_range(),
which is used for both user and kernel mappings. For kernel mappings the
region is no longer protected by any lock so there is no longer any
guarantee about non-preemptibility. Additionally, for RT configs, the
holding the PTL only implies no CPU migration, it doesn't prevent
preemption.
Commit bcc6cc832573 ("mm: add default definition of set_ptes()") added
arch_[enter|leave]_lazy_mmu_mode() to the default implementation of
set_ptes(), used by x86. So after this commit, lazy mmu regions can be
nested. Additionally commit 1a10a44dfc1d ("sparc64: implement the new
page table range API") and commit 9fee28baa601 ("powerpc: implement the
new page table range API") did the same for the sparc and powerpc
set_ptes() overrides.
powerpc couldn't deal with preemption so avoids it in commit
b9ef323ea168 ("powerpc/64s: Disable preemption in hash lazy mmu mode"),
which explicitly disables preemption for the whole region in its
implementation. x86 can support preemption (or at least it could until
it tried to add support nesting; more on this below). Sparc looks to be
totally broken in the face of preemption, as far as I can tell.
powerpc can't deal with nesting, so avoids it in commit 47b8def9358c
("powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes"),
which removes the lazy mmu calls from its implementation of set_ptes().
x86 attempted to support nesting in commit 49147beb0ccb ("x86/xen: allow
nesting of same lazy mode") but as far as I can tell, this breaks its
support for preemption.
In short, it's all a mess; the semantics for
arch_[enter|leave]_lazy_mmu_mode() are not clearly defined and as a
result the implementations all have different expectations, sticking
plasters and bugs.
arm64 is aiming to start using these hooks, so let's clean everything up
before adding an arm64 implementation. Update the documentation to state
that lazy mmu regions can never be nested, must not be called in
interrupt context and preemption may or may not be enabled for the
duration of the region. And fix the generic implementation of set_ptes()
to avoid nesting.
arch-specific fixes to conform to the new spec will proceed this one.
These issues were spotted by code review and I have no evidence of
issues being reported in the wild.
Cc: <stable(a)vger.kernel.org>
Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
include/linux/pgtable.h | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372..787c632ee2c9 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -222,10 +222,14 @@ static inline int pmd_dirty(pmd_t pmd)
* hazard could result in the direct mode hypervisor case, since the actual
* write to the page tables may not yet have taken place, so reads though
* a raw PTE pointer after it has been modified are not guaranteed to be
- * up to date. This mode can only be entered and left under the protection of
- * the page table locks for all page tables which may be modified. In the UP
- * case, this is required so that preemption is disabled, and in the SMP case,
- * it must synchronize the delayed page table writes properly on other CPUs.
+ * up to date.
+ *
+ * In the general case, no lock is guaranteed to be held between entry and exit
+ * of the lazy mode. So the implementation must assume preemption may be enabled
+ * and cpu migration is possible; it must take steps to be robust against this.
+ * (In practice, for user PTE updates, the appropriate page table lock(s) are
+ * held, but for kernel PTE updates, no lock is held). Nesting is not permitted
+ * and the mode cannot be used in interrupt context.
*/
#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
#define arch_enter_lazy_mmu_mode() do {} while (0)
@@ -287,7 +291,6 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
{
page_table_check_ptes_set(mm, ptep, pte, nr);
- arch_enter_lazy_mmu_mode();
for (;;) {
set_pte(ptep, pte);
if (--nr == 0)
@@ -295,7 +298,6 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
ptep++;
pte = pte_next_pfn(pte);
}
- arch_leave_lazy_mmu_mode();
}
#endif
#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
--
2.43.0
Hi
I was doing some research on your industry and I landed on your website.We will see some bugs on your website.
Would you like me to send over a screenshot & Specific report of those bugs?
Best"
[BUG]
A user space program that calls select/poll get always an immediate data
ready-to-read response. As a result the intended use to wait until next
data becomes ready does not work.
User space snippet:
struct pollfd pollfd = {
.fd = open("/dev/pps0", O_RDONLY),
.events = POLLIN|POLLERR,
.revents = 0 };
while(1) {
poll(&pollfd, 1, 2000/*ms*/); // returns immediate, but should wait
if(revents & EPOLLIN) { // always true
struct pps_fdata fdata;
memset(&fdata, 0, sizeof(memdata));
ioctl(PPS_FETCH, &fdata); // currently fetches data at max speed
}
}
[CAUSE]
pps_cdev_poll() returns unconditionally EPOLLIN.
[FIX]
Remember the last fetch event counter and compare this value in
pps_cdev_poll() with most recent event counter
and return 0 if they are equal.
Signed-off-by: Denis OSTERLAND-HEIM <denis.osterland(a)diehl.com>
Co-developed-by: Rodolfo Giometti <giometti(a)enneenne.com>
Signed-off-by: Rodolfo Giometti <giometti(a)enneenne.com>
Fixes: eae9d2ba0cfc ("LinuxPPS: core support")
CC: stable(a)vger.kernel.org # 5.4+
---
drivers/pps/pps.c | 11 +++++++++--
include/linux/pps_kernel.h | 1 +
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
index 6a02245ea35f..9463232af8d2 100644
--- a/drivers/pps/pps.c
+++ b/drivers/pps/pps.c
@@ -41,6 +41,9 @@ static __poll_t pps_cdev_poll(struct file *file, poll_table *wait)
poll_wait(file, &pps->queue, wait);
+if (pps->last_fetched_ev == pps->last_ev)
+return 0;
+
return EPOLLIN | EPOLLRDNORM;
}
@@ -186,9 +189,11 @@ static long pps_cdev_ioctl(struct file *file,
if (err)
return err;
-/* Return the fetched timestamp */
+/* Return the fetched timestamp and save last fetched event */
spin_lock_irq(&pps->lock);
+pps->last_fetched_ev = pps->last_ev;
+
fdata.info.assert_sequence = pps->assert_sequence;
fdata.info.clear_sequence = pps->clear_sequence;
fdata.info.assert_tu = pps->assert_tu;
@@ -272,9 +277,11 @@ static long pps_cdev_compat_ioctl(struct file *file,
if (err)
return err;
-/* Return the fetched timestamp */
+/* Return the fetched timestamp and save last fetched event */
spin_lock_irq(&pps->lock);
+pps->last_fetched_ev = pps->last_ev;
+
compat.info.assert_sequence = pps->assert_sequence;
compat.info.clear_sequence = pps->clear_sequence;
compat.info.current_mode = pps->current_mode;
diff --git a/include/linux/pps_kernel.h b/include/linux/pps_kernel.h
index c7abce28ed29..aab0aebb529e 100644
--- a/include/linux/pps_kernel.h
+++ b/include/linux/pps_kernel.h
@@ -52,6 +52,7 @@ struct pps_device {
int current_mode;/* PPS mode at event time */
unsigned int last_ev;/* last PPS event id */
+unsigned int last_fetched_ev;/* last fetched PPS event id */
wait_queue_head_t queue;/* PPS event queue */
unsigned int id;/* PPS source unique ID */
--
2.47.2
Diehl Metering GmbH, Donaustrasse 120, 90451 Nuernberg
Sitz der Gesellschaft: Ansbach, Registergericht: Ansbach HRB 69
Geschaeftsfuehrer: Dr. Christof Bosbach (Sprecher), Dipl.-Dolm. Annette Geuther, Dipl.-Kfm. Reiner Edel, Jean-Claude Luttringer
Bitte denken Sie an die Umwelt, bevor Sie diese E-Mail drucken. Diese E-Mail kann vertrauliche Informationen enthalten. Sollten die in dieser E-Mail enthaltenen Informationen nicht für Sie bestimmt sein, informieren Sie bitte unverzueglich den Absender per E-Mail und loeschen Sie diese E-Mail in Ihrem System. Jede unberechtigte Form der Reproduktion, Bekanntgabe, Aenderung, Verteilung und/oder Publikation dieser E-Mail ist strengstens untersagt. Informationen zum Datenschutz finden Sie auf unserer Homepage<https://www.diehl.com/metering/de/impressum-und-rechtliche-hinweise/>.
Before printing, think about environmental responsibility.This message may contain confidential information. If you are not authorized to receive this information please advise the sender immediately by reply e-mail and delete this message without making any copies. Any form of unauthorized use, publication, reproduction, copying or disclosure of the e-mail is not permitted. Information about data protection can be found on our homepage<https://www.diehl.com/metering/en/data-protection/>.