Hi Lurii,
On Tue, Nov 26, 2019 at 3:12 AM Linux Kernel Mailing List
<linux-kernel(a)vger.kernel.org> wrote:
> Commit: 1cbeab1b242d16fdb22dc3dab6a7d6afe746ae6d
> Parent: d460623c5fa126dc51bb2571dd7714ca75b0116c
> Refname: refs/heads/master
> Web: https://git.kernel.org/torvalds/c/1cbeab1b242d16fdb22dc3dab6a7d6afe746ae6d
> Author: Iurii Zaikin <yzaikin(a)google.com>
> AuthorDate: Thu Oct 17 15:12:33 2019 -0700
> Committer: Shuah Khan <skhan(a)linuxfoundation.org>
> CommitDate: Wed Oct 23 10:28:23 2019 -0600
>
> ext4: add kunit test for decoding extended timestamps
>
> KUnit tests for decoding extended 64 bit timestamps that verify the
> seconds part of [a/c/m] timestamps in ext4 inode structs are decoded
> correctly.
>
> Test data is derived from the table in the Inode Timestamps section of
> Documentation/filesystems/ext4/inodes.rst.
>
> KUnit tests run during boot and output the results to the debug log
> in TAP format (http://testanything.org/). Only useful for kernel devs
> running KUnit test harness and are not for inclusion into a production
> build.
>
> Signed-off-by: Iurii Zaikin <yzaikin(a)google.com>
> Reviewed-by: Theodore Ts'o <tytso(a)mit.edu>
> Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
> Tested-by: Brendan Higgins <brendanhiggins(a)google.com>
> Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
> Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
While this test succeeds on arm64, it fails on m68k and arm32 (presumably
all 32-bit platforms?):
# Subtest: ext4_inode_test
1..1
# inode_test_xtimestamp_decoding: EXPECTATION FAILED at fs/ext4/inode-test.c:250
Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but
test_data[i].expected.tv_sec == -2147483648
timestamp.tv_sec == 2147483648
1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits: msb:1
lower_bound:1 extra_bits: 0
# inode_test_xtimestamp_decoding: EXPECTATION FAILED at fs/ext4/inode-test.c:250
Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but
test_data[i].expected.tv_sec == 2147483648
timestamp.tv_sec == 6442450944
2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on:
msb:1 lower_bound:1 extra_bits: 1
# inode_test_xtimestamp_decoding: EXPECTATION FAILED at fs/ext4/inode-test.c:250
Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but
test_data[i].expected.tv_sec == 6442450944
timestamp.tv_sec == 10737418240
2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on:
msb:1 lower_bound:1 extra_bits: 2
not ok 1 - inode_test_xtimestamp_decoding
not ok 1 - ext4_inode_test
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Previous error message for invalid kunitconfig was vague. Added to it so
that it lists invalid fields and prompts for them to be removed. Added
validate_config function returning whether or not this kconfig is valid.
Signed-off-by: Heidi Fahim <heidifahim(a)google.com>
---
tools/testing/kunit/kunit_kernel.py | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index bf3876835331..010d3f5030d2 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -93,6 +93,19 @@ class LinuxSourceTree(object):
return False
return True
+ def validate_config(self, build_dir):
+ kconfig_path = get_kconfig_path(build_dir)
+ validated_kconfig = kunit_config.Kconfig()
+ validated_kconfig.read_from_file(kconfig_path)
+ if not self._kconfig.is_subset_of(validated_kconfig):
+ invalid = self._kconfig.entries() - validated_kconfig.entries()
+ message = 'Provided Kconfig is not contained in validated .config. Invalid fields found in kunitconfig: %s' % (
+ ', '.join([str(e) for e in invalid])
+ )
+ logging.error(message)
+ return False
+ return True
+
def build_config(self, build_dir):
kconfig_path = get_kconfig_path(build_dir)
if build_dir and not os.path.exists(build_dir):
@@ -103,12 +116,7 @@ class LinuxSourceTree(object):
except ConfigError as e:
logging.error(e)
return False
- validated_kconfig = kunit_config.Kconfig()
- validated_kconfig.read_from_file(kconfig_path)
- if not self._kconfig.is_subset_of(validated_kconfig):
- logging.error('Provided Kconfig is not contained in validated .config!')
- return False
- return True
+ return self.validate_config(build_dir)
def build_reconfig(self, build_dir):
"""Creates a new .config if it is not a subset of the kunitconfig."""
@@ -133,12 +141,7 @@ class LinuxSourceTree(object):
except (ConfigError, BuildError) as e:
logging.error(e)
return False
- used_kconfig = kunit_config.Kconfig()
- used_kconfig.read_from_file(get_kconfig_path(build_dir))
- if not self._kconfig.is_subset_of(used_kconfig):
- logging.error('Provided Kconfig is not contained in final config!')
- return False
- return True
+ return self.validate_config(build_dir)
def run_kernel(self, args=[], timeout=None, build_dir=None):
args.extend(['mem=256M'])
--
2.24.0.432.g9d3f5f5b63-goog
Support for frequency limits in dev_pm_qos was removed when cpufreq was
switched to freq_qos, this series attempts to restore it by
reimplementing on top of freq_qos.
Discussion about removal is here:
https://lore.kernel.org/linux-pm/VI1PR04MB7023DF47D046AEADB4E051EBEE680@VI1…
The cpufreq core switched away because it needs contraints at the level
of a "cpufreq_policy" which cover multiple cpus so dev_pm_qos coupling
to struct device was not useful. Cpufreq could only use dev_pm_qos by
implementing an additional layer of aggregation anyway.
However in the devfreq subsystem scaling is always performed on a per-device
basis so dev_pm_qos is a very good match. Support for dev_pm_qos in devfreq
core is here:
https://patchwork.kernel.org/cover/11252409/
That series is RFC mostly because it needs these PM core patches.
Earlier versions got entangled in some locking cleanups but those are
not strictly necessary to get dev_pm_qos functionality.
In theory if freq_qos is extended to handle conflicting min/max values then
this sharing would be valuable. Right now freq_qos just ties two unrelated
pm_qos aggregations for min and max freq.
---
This is implemented by embeding a freq_qos_request inside dev_pm_qos_request:
the data field was already an union in order to deal with flag requests.
The internal freq_qos_apply is exported so that it can be called from
dev_pm_qos apply_constraints.
The dev_pm_qos_constraints_destroy function has no obvious equivalent in
freq_qos and the whole approach of "removing requests" is somewhat dubios:
request objects should be owned by consumers and the list of qos requests
should be empty when the target device is deleted.
Changes since v2:
* #define PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE FREQ_QOS_MAX_DEFAULT_VALUE
* #define FREQ_QOS_MAX_DEFAULT_VALUE S32_MAX (in new patch)
* Add initial kunit test for freq_qos, validating the MAX_DEFAULT_VALUE found
by Matthias.
Link to v2: https://patchwork.kernel.org/cover/11250413/
First two patches can be applied separated
Changes since v1:
* Don't rename or EXPORT_SYMBOL_GPL the freq_qos_apply function; just
drop the static marker.
Link to v1: https://patchwork.kernel.org/cover/11212887/
Leonard Crestez (4):
PM / QoS: Initial kunit test
PM / QOS: Redefine FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX
PM / QoS: Reorder pm_qos/freq_qos/dev_pm_qos structs
PM / QoS: Restore DEV_PM_QOS_MIN/MAX_FREQUENCY
drivers/base/Kconfig | 4 ++
drivers/base/power/Makefile | 1 +
drivers/base/power/qos-test.c | 116 ++++++++++++++++++++++++++++++++++
drivers/base/power/qos.c | 69 ++++++++++++++++++--
include/linux/pm_qos.h | 86 ++++++++++++++-----------
kernel/power/qos.c | 4 +-
6 files changed, 237 insertions(+), 43 deletions(-)
create mode 100644 drivers/base/power/qos-test.c
--
2.17.1
Hi,
Here is the 3rd version of patches to fix some issues which happens on
the kernel with CONFIG_FUNCTION_TRACER=n or CONFIG_DYNAMIC_FTRACE=n.
In this version and v2, I updated the descriptions of the first 2 patches
according to Steve's comment, added Steve's Reviewed-by to the 3rd patch,
and added the 4th patch which was newly found.
Thank you,
---
Masami Hiramatsu (4):
selftests/ftrace: Fix to check the existence of set_ftrace_filter
selftests/ftrace: Fix ftrace test cases to check unsupported
selftests/ftrace: Do not to use absolute debugfs path
selftests/ftrace: Fix multiple kprobe testcase
.../ftrace/test.d/ftrace/func-filter-stacktrace.tc | 2 ++
.../selftests/ftrace/test.d/ftrace/func_cpumask.tc | 5 +++++
tools/testing/selftests/ftrace/test.d/functions | 4 +++-
.../ftrace/test.d/kprobe/multiple_kprobes.tc | 6 +++---
.../inter-event/trigger-action-hist-xfail.tc | 4 ++--
.../inter-event/trigger-onchange-action-hist.tc | 2 +-
.../inter-event/trigger-snapshot-action-hist.tc | 4 ++--
7 files changed, 18 insertions(+), 9 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>
Hi Linus,
Please pull the following kselftest fixes update for Linux 5.5-rc1.
This kselftest fixes update for Linux 5.5-rc1 consists of several
fixes to tests and framework. Masami Hiramatsu fixed several tests
to build and run correctly on arm and other 32bit architectures.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit ce3a677802121e038d2f062e90f96f84e7351da0:
selftests: watchdog: Add command line option to show watchdog_info
(2019-10-02 13:44:43 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.5-rc1-fixes
for you to fetch changes up to ed2d8fa734e7759ac3788a19f308d3243d0eb164:
selftests: sync: Fix cast warnings on arm (2019-11-07 14:54:37 -0700)
----------------------------------------------------------------
linux-kselftest-5.5-rc1-fixes
This kselftest fixes update for Linux 5.5-rc1 consists of several
fixes to tests and framework. Masami Hiramatsu fixed several tests
to build and run correctly on arm and other 32bit architectures.
----------------------------------------------------------------
Kees Cook (2):
selftests: gen_kselftest_tar.sh: Do not clobber kselftest/
selftests: Move kselftest_module.sh into kselftest/
Masami Hiramatsu (6):
selftests: breakpoints: Fix a typo of function name
selftests: proc: Make va_max 1MB
selftests: vm: Build/Run 64bit tests only on 64bit arch
selftests: net: Use size_t and ssize_t for counting file size
selftests: net: Fix printf format warnings on arm
selftests: sync: Fix cast warnings on arm
Prabhakar Kushwaha (1):
kselftest: Fix NULL INSTALL_PATH for TARGETS runlist
Shuah Khan (1):
selftests: Fix O= and KBUILD_OUTPUT handling for relative paths
tools/testing/selftests/Makefile | 8 +++++---
.../selftests/breakpoints/breakpoint_test_arm64.c | 2 +-
tools/testing/selftests/gen_kselftest_tar.sh | 21
+++++++++++--------
.../{kselftest_module.sh => kselftest/module.sh} | 0
tools/testing/selftests/kselftest_install.sh | 24
+++++++++++-----------
tools/testing/selftests/lib/bitmap.sh | 2 +-
tools/testing/selftests/lib/prime_numbers.sh | 2 +-
tools/testing/selftests/lib/printf.sh | 2 +-
tools/testing/selftests/lib/strscpy.sh | 2 +-
tools/testing/selftests/net/so_txtime.c | 4 ++--
tools/testing/selftests/net/tcp_mmap.c | 8 ++++----
tools/testing/selftests/net/udpgso.c | 3 ++-
tools/testing/selftests/net/udpgso_bench_tx.c | 3 ++-
.../selftests/proc/proc-self-map-files-002.c | 6 +++++-
tools/testing/selftests/sync/sync.c | 6 +++---
tools/testing/selftests/vm/Makefile | 5 +++++
tools/testing/selftests/vm/run_vmtests | 10 +++++++++
17 files changed, 68 insertions(+), 40 deletions(-)
rename tools/testing/selftests/{kselftest_module.sh =>
kselftest/module.sh} (100%)
----------------------------------------------------------------
These counters will track hugetlb reservations rather than hugetlb
memory faulted in. This patch only adds the counter, following patches
add the charging and uncharging of the counter.
Problem:
Currently tasks attempting to allocate more hugetlb memory than is available get
a failure at mmap/shmget time. This is thanks to Hugetlbfs Reservations [1].
However, if a task attempts to allocate hugetlb memory only more than its
hugetlb_cgroup limit allows, the kernel will allow the mmap/shmget call,
but will SIGBUS the task when it attempts to fault the memory in.
We have developers interested in using hugetlb_cgroups, and they have expressed
dissatisfaction regarding this behavior. We'd like to improve this
behavior such that tasks violating the hugetlb_cgroup limits get an error on
mmap/shmget time, rather than getting SIGBUS'd when they try to fault
the excess memory in.
The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time.
Thus, enforcing the hugetlb_cgroup limit only happens at fault time, and
the offending task gets SIGBUS'd.
Proposed Solution:
A new page counter named hugetlb.xMB.reservation_[limit|usage]_in_bytes. This
counter has slightly different semantics than
hugetlb.xMB.[limit|usage]_in_bytes:
- While usage_in_bytes tracks all *faulted* hugetlb memory,
reservation_usage_in_bytes tracks all *reserved* hugetlb memory and
hugetlb memory faulted in without a prior reservation.
- If a task attempts to reserve more memory than limit_in_bytes allows,
the kernel will allow it to do so. But if a task attempts to reserve
more memory than reservation_limit_in_bytes, the kernel will fail this
reservation.
This proposal is implemented in this patch series, with tests to verify
functionality and show the usage. We also added cgroup-v2 support to
hugetlb_cgroup so that the new use cases can be extended to v2.
Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to
the existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
duplication with hugetlb_cgroup. Keeping hugetlb related page counters under
hugetlb_cgroup seemed cleaner as well.
2. Instead of adding a new counter, we considered adding a sysctl that modifies
the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do accounting at
reservation time rather than fault time. Adding a new page_counter seems
better as userspace could, if it wants, choose to enforce different cgroups
differently: one via limit_in_bytes, and another via
reservation_limit_in_bytes. This could be very useful if you're
transitioning how hugetlb memory is partitioned on your system one
cgroup at a time, for example. Also, someone may find usage for both
limit_in_bytes and reservation_limit_in_bytes concurrently, and this
approach gives them the option to do so.
Testing:
- Added tests passing.
- libhugetlbfs tests mostly passing, but some tests have trouble with and
without this patch series. Seems environment issue rather than code:
- Overall results:
********** TEST SUMMARY
* 2M
* 32-bit 64-bit
* Total testcases: 84 0
* Skipped: 0 0
* PASS: 66 0
* FAIL: 14 0
* Killed by signal: 0 0
* Bad configuration: 4 0
* Expected FAIL: 0 0
* Unexpected PASS: 0 0
* Test not present: 0 0
* Strange test result: 0 0
**********
- Failing tests:
- elflink_rw_and_share_test("linkhuge_rw") segfaults with and without this
patch series.
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (2M: 32):
FAIL Address is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:malloc
HUGETLB_MORECORE=yes malloc (2M: 32):
FAIL Address is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (2M: 32):
FAIL Address is not hugepage
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
HUGETLB_MORECORE=yes heapshrink (2M: 32):
FAIL Heap not on hugepages
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
libheapshrink.so HUGETLB_MORECORE=yes heapshrink (2M: 32):
FAIL Heap not on hugepages
- HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32): FAIL small_data is not hugepage
- HUGETLB_ELFMAP=RW HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 32):
FAIL small_data is not hugepage
- alloc-instantiate-race shared (2M: 32):
Bad configuration: sched_setaffinity(cpu1): Invalid argument -
FAIL Child 1 killed by signal Killed
- shmoverride_linked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- HUGETLB_SHM=yes shmoverride_linked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- shmoverride_linked_static (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- HUGETLB_SHM=yes shmoverride_linked_static (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- LD_PRELOAD=libhugetlbfs.so shmoverride_unlinked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- LD_PRELOAD=libhugetlbfs.so HUGETLB_SHM=yes shmoverride_unlinked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
[1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Acked-by: Hillf Danton <hdanton(a)sina.com>
---
include/linux/hugetlb.h | 23 ++++++++-
mm/hugetlb_cgroup.c | 111 ++++++++++++++++++++++++++++++----------
2 files changed, 107 insertions(+), 27 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 53fc34f930d08..9c49a0ba894d3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -320,6 +320,27 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
#ifdef CONFIG_HUGETLB_PAGE
+enum {
+ /* Tracks hugetlb memory faulted in. */
+ HUGETLB_RES_USAGE,
+ /* Tracks hugetlb memory reserved. */
+ HUGETLB_RES_RESERVATION_USAGE,
+ /* Limit for hugetlb memory faulted in. */
+ HUGETLB_RES_LIMIT,
+ /* Limit for hugetlb memory reserved. */
+ HUGETLB_RES_RESERVATION_LIMIT,
+ /* Max usage for hugetlb memory faulted in. */
+ HUGETLB_RES_MAX_USAGE,
+ /* Max usage for hugetlb memory reserved. */
+ HUGETLB_RES_RESERVATION_MAX_USAGE,
+ /* Faulted memory accounting fail count. */
+ HUGETLB_RES_FAILCNT,
+ /* Reserved memory accounting fail count. */
+ HUGETLB_RES_RESERVATION_FAILCNT,
+ HUGETLB_RES_NULL,
+ HUGETLB_RES_MAX,
+};
+
#define HSTATE_NAME_LEN 32
/* Defines one hugetlb page size */
struct hstate {
@@ -340,7 +361,7 @@ struct hstate {
unsigned int surplus_huge_pages_node[MAX_NUMNODES];
#ifdef CONFIG_CGROUP_HUGETLB
/* cgroup control files */
- struct cftype cgroup_files[5];
+ struct cftype cgroup_files[HUGETLB_RES_MAX];
#endif
char name[HSTATE_NAME_LEN];
};
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index f1930fa0b445d..1ed4448ca41d3 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -25,6 +25,10 @@ struct hugetlb_cgroup {
* the counter to account for hugepages from hugetlb.
*/
struct page_counter hugepage[HUGE_MAX_HSTATE];
+ /*
+ * the counter to account for hugepage reservations from hugetlb.
+ */
+ struct page_counter reserved_hugepage[HUGE_MAX_HSTATE];
};
#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val))
@@ -33,6 +37,14 @@ struct hugetlb_cgroup {
static struct hugetlb_cgroup *root_h_cgroup __read_mostly;
+static inline struct page_counter *
+hugetlb_cgroup_get_counter(struct hugetlb_cgroup *h_cg, int idx, bool reserved)
+{
+ if (reserved)
+ return &h_cg->reserved_hugepage[idx];
+ return &h_cg->hugepage[idx];
+}
+
static inline
struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
{
@@ -254,30 +266,33 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
return;
}
-enum {
- RES_USAGE,
- RES_LIMIT,
- RES_MAX_USAGE,
- RES_FAILCNT,
-};
-
static u64 hugetlb_cgroup_read_u64(struct cgroup_subsys_state *css,
struct cftype *cft)
{
struct page_counter *counter;
+ struct page_counter *reserved_counter;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css);
counter = &h_cg->hugepage[MEMFILE_IDX(cft->private)];
+ reserved_counter = &h_cg->reserved_hugepage[MEMFILE_IDX(cft->private)];
switch (MEMFILE_ATTR(cft->private)) {
- case RES_USAGE:
+ case HUGETLB_RES_USAGE:
return (u64)page_counter_read(counter) * PAGE_SIZE;
- case RES_LIMIT:
+ case HUGETLB_RES_RESERVATION_USAGE:
+ return (u64)page_counter_read(reserved_counter) * PAGE_SIZE;
+ case HUGETLB_RES_LIMIT:
return (u64)counter->max * PAGE_SIZE;
- case RES_MAX_USAGE:
+ case HUGETLB_RES_RESERVATION_LIMIT:
+ return (u64)reserved_counter->max * PAGE_SIZE;
+ case HUGETLB_RES_MAX_USAGE:
return (u64)counter->watermark * PAGE_SIZE;
- case RES_FAILCNT:
+ case HUGETLB_RES_RESERVATION_MAX_USAGE:
+ return (u64)reserved_counter->watermark * PAGE_SIZE;
+ case HUGETLB_RES_FAILCNT:
return counter->failcnt;
+ case HUGETLB_RES_RESERVATION_FAILCNT:
+ return reserved_counter->failcnt;
default:
BUG();
}
@@ -291,6 +306,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
int ret, idx;
unsigned long nr_pages;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of));
+ bool reserved = false;
if (hugetlb_cgroup_is_root(h_cg)) /* Can't set limit on root */
return -EINVAL;
@@ -304,9 +320,14 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));
switch (MEMFILE_ATTR(of_cft(of)->private)) {
- case RES_LIMIT:
+ case HUGETLB_RES_RESERVATION_LIMIT:
+ reserved = true;
+ /* Fall through. */
+ case HUGETLB_RES_LIMIT:
mutex_lock(&hugetlb_limit_mutex);
- ret = page_counter_set_max(&h_cg->hugepage[idx], nr_pages);
+ ret = page_counter_set_max(hugetlb_cgroup_get_counter(h_cg, idx,
+ reserved),
+ nr_pages);
mutex_unlock(&hugetlb_limit_mutex);
break;
default:
@@ -320,18 +341,26 @@ static ssize_t hugetlb_cgroup_reset(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
int ret = 0;
- struct page_counter *counter;
+ struct page_counter *counter, *reserved_counter;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of));
counter = &h_cg->hugepage[MEMFILE_IDX(of_cft(of)->private)];
+ reserved_counter =
+ &h_cg->reserved_hugepage[MEMFILE_IDX(of_cft(of)->private)];
switch (MEMFILE_ATTR(of_cft(of)->private)) {
- case RES_MAX_USAGE:
+ case HUGETLB_RES_MAX_USAGE:
page_counter_reset_watermark(counter);
break;
- case RES_FAILCNT:
+ case HUGETLB_RES_RESERVATION_MAX_USAGE:
+ page_counter_reset_watermark(reserved_counter);
+ break;
+ case HUGETLB_RES_FAILCNT:
counter->failcnt = 0;
break;
+ case HUGETLB_RES_RESERVATION_FAILCNT:
+ reserved_counter->failcnt = 0;
+ break;
default:
ret = -EINVAL;
break;
@@ -357,37 +386,67 @@ static void __init __hugetlb_cgroup_file_init(int idx)
struct hstate *h = &hstates[idx];
/* format the size */
- mem_fmt(buf, 32, huge_page_size(h));
+ mem_fmt(buf, sizeof(buf), huge_page_size(h));
/* Add the limit file */
- cft = &h->cgroup_files[0];
+ cft = &h->cgroup_files[HUGETLB_RES_LIMIT];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_LIMIT);
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+ cft->write = hugetlb_cgroup_write;
+
+ /* Add the reservation limit file */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_LIMIT];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_limit_in_bytes",
+ buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_LIMIT);
cft->read_u64 = hugetlb_cgroup_read_u64;
cft->write = hugetlb_cgroup_write;
/* Add the usage file */
- cft = &h->cgroup_files[1];
+ cft = &h->cgroup_files[HUGETLB_RES_USAGE];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_USAGE);
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
+ /* Add the reservation usage file */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_USAGE];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_usage_in_bytes",
+ buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_USAGE);
cft->read_u64 = hugetlb_cgroup_read_u64;
/* Add the MAX usage file */
- cft = &h->cgroup_files[2];
+ cft = &h->cgroup_files[HUGETLB_RES_MAX_USAGE];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_MAX_USAGE);
+ cft->write = hugetlb_cgroup_reset;
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
+ /* Add the MAX reservation usage file */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_MAX_USAGE];
+ snprintf(cft->name, MAX_CFTYPE_NAME,
+ "%s.reservation_max_usage_in_bytes", buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_MAX_USAGE);
cft->write = hugetlb_cgroup_reset;
cft->read_u64 = hugetlb_cgroup_read_u64;
/* Add the failcntfile */
- cft = &h->cgroup_files[3];
+ cft = &h->cgroup_files[HUGETLB_RES_FAILCNT];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_FAILCNT);
+ cft->write = hugetlb_cgroup_reset;
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
+ /* Add the reservation failcntfile */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_FAILCNT];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_failcnt", buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_FAILCNT);
cft->write = hugetlb_cgroup_reset;
cft->read_u64 = hugetlb_cgroup_read_u64;
/* NULL terminate the last cft */
- cft = &h->cgroup_files[4];
+ cft = &h->cgroup_files[HUGETLB_RES_NULL];
memset(cft, 0, sizeof(*cft));
WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys,
--
2.24.0.rc1.363.gb1bccd3e3d-goog
Hi,
Here is a set of well-reviewed (expect for one patch), lower-risk items
that can go into Linux 5.5. The one patch that wasn't reviewed is the
powerpc conversion, and it's still at this point a no-op, because
tracking isn't yet activated.
This is based on linux-next: b9d3d01405061bb42358fe53f824e894a1922ced
("Add linux-next specific files for 20191122").
This is essentially a cut-down v8 of "mm/gup: track dma-pinned pages:
FOLL_PIN" [1], and with one of the VFIO patches split into two patches.
The idea here is to get this long list of "noise" checked into 5.5, so
that the actual, higher-risk "track FOLL_PIN pages" (which is deferred:
not part of this series) will be a much shorter patchset to review.
For the v4l2-core changes, I've left those here (instead of sending
them separately to the -media tree), in order to get the name change
done now (put_user_page --> unpin_user_page). However, I've added a Cc
stable, as recommended during the last round of reviews.
Here are the relevant notes from the original cover letter, edited to
match the current situation:
This is a prerequisite to tracking dma-pinned pages. That in turn is a
prerequisite to solving the larger problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
put_user_page()
Because there are interdependencies with FOLL_LONGTERM, a similar
conversion as for FOLL_PIN, was applied. The change was from this:
get_user_pages(FOLL_LONGTERM) (also sets FOLL_GET)
put_page()
to this:
pin_longterm_pages() (sets FOLL_PIN | FOLL_LONGTERM)
put_user_page()
[1] https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
thanks,
John Hubbard
NVIDIA
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (18):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 233 ++++++++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 12 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 4 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +--
drivers/vfio/vfio_iommu_type1.c | 35 +--
fs/io_uring.c | 6 +-
include/linux/mm.h | 77 +++--
mm/gup.c | 332 +++++++++++++-------
mm/gup_benchmark.c | 9 +-
mm/memremap.c | 80 ++---
mm/process_vm_access.c | 28 +-
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 6 +-
24 files changed, 642 insertions(+), 285 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.0
Hi,
Here is the 2nd version of patches to fix some issues which happens on
the kernel with CONFIG_FUNCTION_TRACER=n or CONFIG_DYNAMIC_FTRACE=n.
In this version I just updated the descriptions of the first 2 patches
according to Steve's comment and add his Reviewed-by to the last patch.
Thank you,
---
Masami Hiramatsu (3):
selftests/ftrace: Fix to check the existence of set_ftrace_filter
selftests/ftrace: Fix ftrace test cases to check unsupported
selftests/ftrace: Do not to use absolute debugfs path
.../ftrace/test.d/ftrace/func-filter-stacktrace.tc | 2 ++
.../selftests/ftrace/test.d/ftrace/func_cpumask.tc | 5 +++++
tools/testing/selftests/ftrace/test.d/functions | 4 +++-
.../inter-event/trigger-action-hist-xfail.tc | 4 ++--
.../inter-event/trigger-onchange-action-hist.tc | 2 +-
.../inter-event/trigger-snapshot-action-hist.tc | 4 ++--
6 files changed, 15 insertions(+), 6 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>