From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, September 19, 2025 2:07 AM
On Thu, Sep 18, 2025 at 06:49:08AM +0000, Tian, Kevin wrote:
This is enough to implement the 8 initial format variations with all of their features:
- Entries comprised of contiguous blocks of IO PTEs for larger page sizes (AMDv1, ARMv8)
- Multi-level tables, up to 6 levels. Runtime selected top level
- Runtime variable table level size (ARM's concatenated tables)
- Expandable top level (AMDv1)
any more context about this one? how is it different from the earlier "runtime selected top level"?
How about:
- The size of the top table level can be selected at runtime (ARM's concatenated tables)
- The number of levels in the table can optionally increase dynamically during map (AMDv1)
clearer.
- item/entry_size
The number of bytes of VA the table translates for.
If the item is a table entry then the next table covers
this size. If the entry is an output address then the
s/is/translates/
Don't follow?
entry is not an address. So I meant:
"If the entry translates to an output address"
+/*
- PT_WARN_ON is used for invariants that the kunit should be checking
can't
- happen.
- */
+#if IS_ENABLED(CONFIG_DEBUG_GENERIC_PT) +#define PT_WARN_ON WARN_ON +#else +static inline bool PT_WARN_ON(bool condition) +{
- return false;
+} +#endif
Then call it PT_DBG_WARN_ON() to be more explicit?
Ah, I'd rather leave it..
btw looks there is no plain WARN_ON() used in generic-pt. Just be curious about the rationale behind. Is it a new trend to contain all warnings under a debug option?
It is sort of like VM_WARN_ON(), the places that got put are largely performance path so you don't want it enabled unless doing debugging.
Generally the idea is to use PT_WARN_ON() on performance path cases only, and leave normal stuff to normal WARN_ON.
Good to know
+/* If not supplied by the format then contiguous pages are not
supported */
+#ifndef pt_entry_num_contig_lg2 +static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts) +{
- return ilog2(1);
+}
+static inline unsigned short pt_contig_count_lg2(const struct pt_state
*pts)
+{
- return ilog2(1);
+}
what is the difference between above two helpers?
Oh, pt_contig_count_lg2 didn't get kdocs because they are internal helpers to build other functions..
Like this:
/*
- If not supplied by the format then contiguous pages are not supported.
- If contiguous pages are supported then the format must also provide
- pt_contig_count_lg2() if it supports a single contiguous size per level,
- or pt_possible_sizes() if it supports multiple sizes per level.
could be simplified to require the format to always support pt_possible_sizes() if contiguous sizes are supported, no matter being a single size or multiple.
*/ #ifndef pt_entry_num_contig_lg2 static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts) { return ilog2(1); }
/*
- Return the number of contiguous OA items forming an entry at this table
level */ static inline unsigned short pt_contig_count_lg2(const struct pt_state *pts) { return ilog2(1); } #endif
It's currently not implemented by any driver so will have the default version returning 0. and it is only used by default pt_possible_sizes(), which then returns only one page size accordingly.
ARM and RISCV use it. AMD is the only format that support more than one size of contiguous per level.
hmm I didn't find ARM/RISCV defining pt_contig_count_lg2(). So in all cases it's the default version returning ilog2(1). Then what's the point of keeping it instead of directly using ilog2(1) in default pt_possible_sizes? sorry that I still didn't connect the dots here.
+/**
- pt_item_fully_covered() - Check if the item or entry is entirely
contained
within pts->range
when using pts it's more accurate to call it pt_entry_fully_covered()
Not so much related to PTS, as pts could be either, but it makes more sense to refer to table poitner as an entry than does it to refer to a contiguous entry an item.
make sense.
- PT_FEAT_FLUSH_RANGE,
- /**
* @PT_FEAT_FLUSH_RANGE_NO_GAPS: Like PT_FEAT_FLUSH_RANGE
except that
* the optimization objective is to only flush IOVA that has been
* changed. This mode is suitable for cases like hypervisor shadowing
* where flushing unchanged ranges may cause the hypervisor to
reparse
* significant amount of page table.
*/
- PT_FEAT_FLUSH_RANGE_NO_GAPS,
FLUSH_RANGE and FLUSH_RANGE_NO_GAPS are mutually exclusive but one format must select one? then we could just keep one flag (NO_GAP) then feature off means FLUSH_RANGE.
I think at least one or two more flushing modes will be needed, and they would be mutually exclusive. It is setup with this one hot encoding because of how the pt_feature function compile optimization works.
clear now.