linaro-kernel

linaro-kernel@lists.linaro.org

10095 discussions

Regards Of Firmware For Versatile Express TC2 board

by Leo Yan

hi Nico & all, For in my previous email, i get to know if we want to boot up successfully with Nico's latest big.LITTLE SMP related patches (these patches can get from landing team's tracking-armlt-tc2-pm branch), we need update VE's firmware, otherwise the system ONLY can boot up the first one core. For VE DVD v5.0 has updated the firmware files, we tried it but still cannot bootup successfully, below are the firmware infos now we are using: V2P-CA15_A7 DCC bios: dbb_v107.ebf; V2M-P1 MCC bios: mbb_v311.ebf; V2M-P1 Bootmonoitor: bm_v517r.axf; so i want to confirm which firmwares' version can match with Nico's patches? Or there have special setting/configuration for the boot monitor? Any suggestion is welcome and appreciate. -- Thx, Leo Yan

12 years, 6 months

Crash on Panda using kernel 3.7-rc6

by kasi viswanathan

Hi, I booted the kernel image compiled from linux-linaro-3.7-rc6-2012.11 and i got this crash. Pls let me know how can i fix it or is there any patch already availabe Uncompressing Linux... done, booting the kernel. [ 0.000000] Booting Linux on physical CPU 0 [ 0.000000] Linux version 3.7.0-rc6 (root@kasi) (gcc version 4.4.1 (Sourcery G++ Lite 2010q1-202) ) #1 SMP Thu Dec 20 08:31:16 IST 2012 [ 0.000000] CPU: ARMv7 Processor [411fc092] revision 2 (ARMv7), cr=10c53c7d [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache [ 0.000000] Machine: OMAP4 Panda board [ 0.000000] Reserving 16777216 bytes SDRAM for VRAM [ 0.000000] Memory policy: ECC disabled, Data cache writealloc [ 0.000000] On node 0 totalpages: 126720 [ 0.000000] free_area_init_node: node 0, pgdat c07e8b40, node_mem_map c0d47000 [ 0.000000] Normal zone: 1024 pages used for memmap [ 0.000000] Normal zone: 0 pages reserved [ 0.000000] Normal zone: 125696 pages, LIFO batch:31 [ 0.000000] OMAP4430 ES2.2 [ 0.000000] PERCPU: Embedded 9 pages/cpu @c114f000 s13120 r8192 d15552 u36864 [ 0.000000] pcpu-alloc: s13120 r8192 d15552 u36864 alloc=9*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 125696 [ 0.000000] Kernel command line: console=ttyO2,115200n8 vram=16M root=/dev/mmcblk0p2 rw rootfstype=ext3 rootwait [ 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes) [ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) [ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) [ 0.000000] Memory: 495MB = 495MB total [ 0.000000] Memory: 488684k/488684k available, 35604k reserved, 0K highmem [ 0.000000] Virtual kernel memory layout: [ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB) [ 0.000000] fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB) [ 0.000000] vmalloc : 0xe0800000 - 0xff000000 ( 488 MB) [ 0.000000] lowmem : 0xc0000000 - 0xe0000000 ( 512 MB) [ 0.000000] pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB) [ 0.000000] modules : 0xbf000000 - 0xbfe00000 ( 14 MB) [ 0.000000] .text : 0xc0008000 - 0xc070d278 (7189 kB) [ 0.000000] .init : 0xc070e000 - 0xc075f340 ( 325 kB) [ 0.000000] .data : 0xc0760000 - 0xc07ebdd0 ( 560 kB) [ 0.000000] .bss : 0xc07ebdf4 - 0xc0d46804 (5483 kB) [ 0.000000] Hierarchical RCU implementation. [ 0.000000] NR_IRQS:16 nr_irqs:16 16 [ 0.000000] OMAP clockevent source: GPTIMER1 at 32768 Hz [ 0.000000] sched_clock: 32 bits at 32kHz, resolution 30517ns, wraps every 131071999ms [ 0.000000] OMAP clocksource: 32k_counter at 32768 Hz [ 0.000000] Console: colour dummy device 80x30 [ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8 [ 0.000000] ... MAX_LOCK_DEPTH: 48 [ 0.000000] ... MAX_LOCKDEP_KEYS: 8191 [ 0.000000] ... CLASSHASH_SIZE: 4096 [ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384 [ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768 [ 0.000000] ... CHAINHASH_SIZE: 16384 [ 0.000000] memory used by lock dependency info: 3695 kB [ 0.000000] per task-struct memory footprint: 1152 bytes [ 0.001098] Calibrating delay loop... [ 0.007904] BUG: spinlock bad magic on CPU#0, swapper/0/0 [ 0.007965] lock: atomic64_lock+0xc0/0x400, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 0.008026] [<c001ccfc>] (unwind_backtrace+0x0/0xf4) from [<c02eba20>] (do_raw_spin_lock+0x1b8/0x1f8) [ 0.008087] [<c02eba20>] (do_raw_spin_lock+0x1b8/0x1f8) from [<c0521a6c>] (_raw_spin_lock_irqsave+0x4c/0x58) [ 0.008117] [<c0521a6c>] (_raw_spin_lock_irqsave+0x4c/0x58) from [<c02f715c>] (atomic64_read+0x28/0x48) [ 0.008148] [<c02f715c>] (atomic64_read+0x28/0x48) from [<c007e1c0>] (update_cfs_rq_blocked_load+0x54/0x11c) [ 0.008148] [<c007e1c0>] (update_cfs_rq_blocked_load+0x54/0x11c) from [<c007e50c>] (task_tick_fair+0x284/0x8b0) [ 0.008209] [<c007e50c>] (task_tick_fair+0x284/0x8b0) from [<c007b0dc>] (scheduler_tick+0xd8/0x140) [ 0.008239] [<c007b0dc>] (scheduler_tick+0xd8/0x140) from [<c0056134>] (update_process_times+0x58/0x68) [ 0.008270] [<c0056134>] (update_process_times+0x58/0x68) from [<c0093910>] (tick_periodic+0x48/0xc4) [ 0.008300] [<c0093910>] (tick_periodic+0x48/0xc4) from [<c00939b0>] (tick_handle_periodic+0x24/0x98) [ 0.008331] [<c00939b0>] (tick_handle_periodic+0x24/0x98) from [<c002a104>] (omap2_gp_timer_interrupt+0x24/0x34) [ 0.008361] [<c002a104>] (omap2_gp_timer_interrupt+0x24/0x34) from [<c00acafc>] (handle_irq_event_percpu+0x64/0x244) [ 0.008392] [<c00acafc>] (handle_irq_event_percpu+0x64/0x244) from [<c00acd18>] (handle_irq_event+0x3c/0x5c) [ 0.008422] [<c00acd18>] (handle_irq_event+0x3c/0x5c) from [<c00af8f4>] (handle_fasteoi_irq+0x98/0x13c) [ 0.008422] [<c00af8f4>] (handle_fasteoi_irq+0x98/0x13c) from [<c00aca88>] (generic_handle_irq+0x28/0x30) [ 0.008483] [<c00aca88>] (generic_handle_irq+0x28/0x30) from [<c00154e0>] (handle_IRQ+0x4c/0xac) [ 0.008514] [<c00154e0>] (handle_IRQ+0x4c/0xac) from [<c00084c4>] (gic_handle_irq+0x2c/0x60) [ 0.008544] [<c00084c4>] (gic_handle_irq+0x2c/0x60) from [<c0522364>] (__irq_svc+0x44/0x5c) [ 0.008544] Exception stack(0xc0761f20 to 0xc0761f68) [ 0.008575] 1f20: 0000001a 00000001 c07620c0 ffff6a00 00000000 00000000 c07ebe8c c07620c0 [ 0.008575] 1f40: ffff6a00 c07620c0 00000000 00000000 00000001 c0761f68 c051ab3c c051ab50 [ 0.008575] 1f60: 60000153 ffffffff [ 0.008636] [<c0522364>] (__irq_svc+0x44/0x5c) from [<c051ab50>] (calibrate_delay+0x3a8/0x550) [ 0.008666] [<c051ab50>] (calibrate_delay+0x3a8/0x550) from [<c070e6cc>] (start_kernel+0x268/0x2fc) [ 0.008697] [<c070e6cc>] (start_kernel+0x268/0x2fc) from [<80008078>] (0x80008078) [ 0.101379] 2007.19 BogoMIPS (lpj=7839744) [ 0.101379] pid_max: default: 32768 minimum: 301 [ 0.101867] Security Framework initialized [ 0.102081] Mount-cache hash table entries: 512 [ 0.103698] CPU: Testing write buffer coherency: ok [ 0.104339] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000 [ 0.104675] Setting up static identity map for 0x80527778 - 0x805277e8 [ 0.104736] L310 cache controller enabled [ 0.104736] l2x0: 16 ways, CACHE_ID 0x410000c4, AUX_CTRL 0x7e470000, Cache size: 1048576 B [ 0.107269] CPU1: Booted secondary processor [ 0.174926] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001 [ 0.175018] Brought up 2 CPUs [ 0.175018] SMP: Total of 2 processors activated (4022.78 BogoMIPS). [ 0.175079] CPU0 packing on CPU-1 [ 0.175109] CPU1 packing on CPU-1 [ 0.179840] ttyO2 used as console in debug mode: uart2 clocks will not be gated [ 0.187591] omap_hwmod: l3_instr: _wait_target_disable failed [ 0.242126] pinctrl core: initialized pinctrl subsystem [ 0.243072] regulator-dummy: no parameters [ 0.243499] NET: Registered protocol family 16 [ 0.244873] DMA: preallocated 256 KiB pool for atomic coherent allocations [ 0.245544] omap-gpmc omap-gpmc: GPMC revision 6.0 [ 0.248992] gpiochip_add: registered GPIOs 0 to 31 on device: gpio [ 0.249084] OMAP GPIO hardware version 0.1 [ 0.249664] gpiochip_add: registered GPIOs 32 to 63 on device: gpio [ 0.250335] gpiochip_add: registered GPIOs 64 to 95 on device: gpio [ 0.250976] gpiochip_add: registered GPIOs 96 to 127 on device: gpio [ 0.251617] gpiochip_add: registered GPIOs 128 to 159 on device: gpio [ 0.252288] gpiochip_add: registered GPIOs 160 to 191 on device: gpio [ 0.254394] omap_mux_init: Add partition: #1: core, flags: 2 [ 0.255462] omap_mux_init: Add partition: #2: wkup, flags: 2 [ 0.255706] _omap_mux_get_by_name: Could not find signal sys_nirq [ 0.255737] _omap_mux_get_by_name: Could not find signal sys_nirq [ 0.258056] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx [ 0.258087] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx [ 0.285980] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers. [ 0.286010] hw-breakpoint: maximum watchpoint size is 4 bytes. [ 0.291900] OMAP DMA hardware revision 0.0 [ 0.294128] ARM PMU: not yet supported on OMAP4430 due to missing CTI driver [ 0.312683] bio: create slab <bio-0> at 0 [ 0.340423] omap-dma-engine omap-dma-engine: OMAP DMA engine driver [ 0.340789] vwl1271: 1800 mV [ 0.342712] SCSI subsystem initialized [ 0.343963] usbcore: registered new interface driver usbfs [ 0.344146] usbcore: registered new interface driver hub [ 0.344421] usbcore: registered new device driver usb [ 0.356689] Skipping twl internal clock init and using bootloader value (unknown osc rate) [ 0.357574] twl 1-0048: PIH (irq 39) chaining IRQs 418..438 [ 0.358459] VUSB: 3300 mV normal standby [ 0.359008] vdd_mpu: 500 <--> 1500 mV normal [ 0.359466] vdd_iva: 500 <--> 1500 mV normal [ 0.359985] vdd_core: 500 <--> 1500 mV normal [ 0.360717] V1V8: 1800 mV normal standby [ 0.361236] V2V1: 2100 mV normal standby [ 0.361907] VMMC: 1200 <--> 3000 mV at 3000 mV normal standby [ 0.362762] VPP: 1800 <--> 2500 mV at 1900 mV normal standby [ 0.363677] VCXIO: 1800 mV normal standby [ 0.363708] VCXIO: supplied by V2V1 [ 0.364868] VDAC: 1800 mV normal standby [ 0.364898] VDAC: supplied by V2V1 [ 0.365875] VAUX2_6030: 1200 <--> 2800 mV at 1800 mV normal standby [ 0.366668] VAUX3_6030: 1000 <--> 3000 mV at 1200 mV normal standby [ 0.367553] VANA: 2100 mV normal standby [ 0.367889] omap_i2c omap_i2c.1: bus 1 rev2.4.0 at 400 kHz [ 0.378723] omap_i2c omap_i2c.2: bus 2 rev2.4.0 at 400 kHz [ 0.394348] omap_i2c omap_i2c.3: bus 3 rev2.4.0 at 100 kHz [ 0.409698] omap_i2c omap_i2c.4: bus 4 rev2.4.0 at 400 kHz [ 0.411407] Switching to clocksource 32k_counter [ 0.447662] NET: Registered protocol family 2 [ 0.448364] TCP established hash table entries: 16384 (order: 5, 131072 bytes) [ 0.448699] TCP bind hash table entries: 16384 (order: 7, 589824 bytes) [ 0.449951] TCP: Hash tables configured (established 16384 bind 16384) [ 0.450103] TCP: reno registered [ 0.450134] UDP hash table entries: 256 (order: 2, 20480 bytes) [ 0.450195] UDP-Lite hash table entries: 256 (order: 2, 20480 bytes) [ 0.450653] NET: Registered protocol family 1 [ 0.451110] RPC: Registered named UNIX socket transport module. [ 0.451110] RPC: Registered udp transport module. [ 0.451110] RPC: Registered tcp transport module. [ 0.451110] RPC: Registered tcp NFSv4.1 backchannel transpo

12 years, 6 months

[ACTIVITY] (John Stultz) Dec 17-21

by John Stultz

=== Highlights === * More discussions w/ Zach kernel tree mgmt stuff * Minchan sent v4 of his volatile anon vma patch, and I took an initial look. * Worked with Dmitry's ashmem unit test, researched ioctl numbering issue. * Reviewed Serban's patch and provided feedback * Updated linaro.android tree w/ fixes required for Tushar as well as cpufreq updates === Plans === * Holiday break. Hope everyone has a happy new year! === Issues === * NA

12 years, 6 months

(Dong Aisheng) 2012-11-10 - 2012-12-17

by Dong Aisheng

=== Issues === * Took the whole week leave to take of my family in hospital. Backed to work today and then possibly will take leave on 12/20 and 12/21 again since my sister has a surgical operation on that day. <http://dict.baidu.com/s?wd=surgical%20operation>

12 years, 6 months

[ACTIVITY] (Rajanikanth H V) 2012-12-10 to 2012-12-14

by Rajanikanth HV

==== Activity Summary ==== * Discussion with ShawnGuo regarding runtime size verification on i.MX platform. ShawnGuo communicated first cut "size data" information about modules, however "change in procedure" and verification on 3.7 kernel needs to be carried out. I have communicated the same. * Finalized runtime data size information on vexpress platform, will be updating in google docs. * OMAP Setup is made ready to verify runtime size information however multiplatform config is not supported and verification on OMAP is currently pushed down in the priority, updated the blueprint accordingly. * Runtime size information on Snowball/U8500: setup to verify runtime sze information is ready, verifying on 3.7 kernel ==== Plan ==== * continue to work on runtime size information across I.MX and u8500 platform * root cause Ethernet issue ==== Issues ==== 0.5 day leave

12 years, 6 months

[ACTIVITY] (John Stultz) Dec 10-14

by John Stultz

=== Highlights === * Lots of discussions w/ Zach/Deepak on kernel tree mgmt stuff * Minchan sent v3 of his volatile anon vma patch, and I reviewed and pointed out hole in the semantics * Started working on integrating the ashmem driver with the volatile anon vma patch, but its ending up being not as trivial as I hoped. * Community discussions about RTCs and persistent_clock interfaces === Plans === * Continue hacking on the vma/madvise approach to volatile ranges. * Likely more talks w/ Zach :) * Prep for holiday break === Issues === * NA

12 years, 6 months

[ACTIVITY] (Ulf Hansson) 2012-12-08 - 2012-12-14

by Ulf Hansson

== Ulf Hansson == === Highlights === Storage: * Acked patches on mmc-list related to SDIO suspend/resume issues. * Reviewed patches on mmc-list for Idle time BKOPS. * Acked patches on mmc-list for fixing signal voltage switch procedure for UHS mode. * Co-operated with Lee Jones to help out in move away from using the mmci host driver "ios_handler" (ux500 platform callback). * Sent patches for discussion for mmci host driver, especially with regards to power management. Clk: * Acked patches from Mike Turquette on clk framework. === Plans === Storage: * Push patches for mmci host driver to support for UHS cards. * Push patches for mmci host driver to further extend the power management support. * Push patches for mmci host driver to add new features like CMD23 support and more. * Follow up on Idle time BKOPS patches on mmc list. Clk: * Send patch/rfc for clk framework, to make an unsued clk unprepared at late_init. * Add support for new clk-types in abx500 clock driver for the ux500 platform. === Issues === * Been trying for several month to get a hold of eMMC 4.5 device with an SD-card adapter. Extremely important for the storage work in Linaro to fully test eMMC4.5 features. It seems almost impossible. Kind regards Ulf Hansson

12 years, 6 months

Questions For CCI-400's Non-secure Access For big.LITTLE MP

by Leo Yan

hi, i saw Nico's git for the developing the big.LITTLE's cluster power control for MP. In the kernel code. the cluster's first man need enable the CCI's port and snooping for the cluster in non-secure world; In CCI-400's spec, it says need to set the Secure Access Register (0x90008) bit 0 so that we can enable non-secure access to CCI-400 registers. On fast model, i added the code in boot-wrapper to set bit_0 for CCI's Secure Access Register; but after set this bit, the boot-wrapper code cannot change to hypervisor mode successfully. On fast model, can we use CCI's secure access register? Current i use the fast model version is: FE000-KT-00002-r7p1-80rel0.tgz, so if it's related with the fast model's version? Also, could u kindly point out there have boot-swapper's git for reference? -- Thx, Leo Yan

12 years, 6 months

[RFC v2] Add mempressure cgroup

by Anton Vorontsov

The main changes for the mempressure cgroup: - Added documentation, describes APIs and the purpose; - Implemented shrinker interface, this is based on Andrew's idea and supersedes my "balance" level idea; - The shrinker interface comes with a stress-test utility, that is what Andrew was also asking for. A simple app that we can run and see if the thing works as expected; - Added reclaimer's target_mem_cgroup handling; - As promised, added support for multiple listeners, and fixed some other comments on the previous RFC. Just for the reference, the first mempressure RFC: http://lkml.org/lkml/2012/11/28/109 Signed-off-by: Anton Vorontsov <anton.vorontsov(a)linaro.org> --- Documentation/cgroups/mempressure.txt | 89 ++++++ Documentation/cgroups/mempressure_test.c | 209 +++++++++++++ include/linux/cgroup_subsys.h | 6 + include/linux/vmstat.h | 11 + init/Kconfig | 12 + mm/Makefile | 1 + mm/mempressure.c | 488 +++++++++++++++++++++++++++++++ mm/vmscan.c | 4 + 8 files changed, 820 insertions(+) create mode 100644 Documentation/cgroups/mempressure.txt create mode 100644 Documentation/cgroups/mempressure_test.c create mode 100644 mm/mempressure.c diff --git a/Documentation/cgroups/mempressure.txt b/Documentation/cgroups/mempressure.txt new file mode 100644 index 0000000..913accc --- /dev/null +++ b/Documentation/cgroups/mempressure.txt @@ -0,0 +1,89 @@ + Memory pressure cgroup +~~~~~~~~~~~~~~~~~~~~~~~~~~ + Before using the mempressure cgroup, make sure you have it mounted: + + # cd /sys/fs/cgroup/ + # mkdir mempressure + # mount -t cgroup cgroup ./mempressure -o mempressure + + After that, you can use the following files: + + /sys/fs/cgroup/.../mempressure.shrinker +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + The file implements userland shrinker (memory reclaimer) interface, so + that the kernel can ask userland to help with the memory reclaiming + process. + + There are two basic concepts: chunks and chunks' size. The program must + tell the kernel the granularity of its allocations (chunk size) and the + number of reclaimable chunks. The granularity may be not 100% accurate, + but the more it is accurate, the better. I.e. suppose the application + has 200 page renders cached (but not displayed), 1MB each. So the chunk + size is 1MB, and the number of chunks is 200. + + The granularity is specified during shrinker registration (i.e. via + argument to the event_control cgroup file; and it is OK to register + multiple shrinkers for different granularities). The number of + reclaimable chunks is specified by writing to the mempressure.shrinker + file. + + The notification comes through the eventfd() interface. Upon the + notification, a read() from the eventfd returns the number of chunks to + reclaim (free). + + It is assumed that the application will free the specified amount of + chunks before reading from the eventfd again. If that is not the case, + suppose the program was not able to reclaim the chunks, then application + should re-add the amount of chunks by writing to the + mempressure.shrinker file (otherwise the chunks won't be accounted by + the kernel, since it assumes that they were reclaimed). + + Event control: + Used to setup shrinker events. There is only one argument for the + event control: chunk size in bytes. + Read: + Not implemented. + Write: + Writes must be in "<eventfd> <number of chunks>" format. Positive + numbers increment the internal counter, negative numbers decrement it + (but the kernel prevents the counter from falling down below zero). + Test: + See mempressure_test.c + + /sys/fs/cgroup/.../mempressure.level +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Instead of working on the bytes level (like shrinkers), one may decide + to maintain the interactivity/memory allocation cost. + + For this, the cgroup has memory pressure level notifications, and the + levels are defined like this: + + The "low" level means that the system is reclaiming memory for new + allocations. Monitoring reclaiming activity might be useful for + maintaining overall system's cache level. Upon notification, the program + (typically "Activity Manager") might analyze vmstat and act in advance + (i.e. prematurely shutdown unimportant services). + + The "medium" level means that the system is experiencing medium memory + pressure, there is some mild swapping activity. Upon this event + applications may decide to free any resources that can be easily + reconstructed or re-read from a disk. Note that for a fine-grained + control, you should probably use the shrinker interface, as described + above. + + The "oom" level means that the system is actively thrashing, it is about + to out of memory (OOM) or even the in-kernel OOM killer is on its way to + trigger. Applications should do whatever they can to help the system. + + Event control: + Is used to setup an eventfd with a level threshold. The argument to + the event control specifies the level threshold. + Read: + Reads mempory presure levels: low, medium or oom. + Write: + Not implemented. + Test: + To set up a notification: + + # cgroup_event_listener ./mempressure.level low + ("low", "medium", "oom" are permitted.) diff --git a/Documentation/cgroups/mempressure_test.c b/Documentation/cgroups/mempressure_test.c new file mode 100644 index 0000000..9747fd6 --- /dev/null +++ b/Documentation/cgroups/mempressure_test.c @@ -0,0 +1,209 @@ +/* + * mempressure shrinker test + * + * Copyright 2012 Linaro Ltd. + * Anton Vorontsov <anton.vorontsov(a)linaro.org> + * + * It is pretty simple: we create two threads, the first one constantly + * tries to allocate memory (more than we physically have), the second + * thread listens to the kernel shrinker notifications and frees asked + * amount of chunks. When we allocate more than available RAM, the two + * threads start to fight. Idially, we should not OOM (but if we reclaim + * slower than we allocate, things might OOM). Also, ideally we should not + * grow swap too much. + * + * The test accepts no arguments, so you can just run it and observe the + * output and memory usage (e.g. 'watch -n 0.2 free -m'). Upon ctrl+c, the + * test prints total amount of bytes we helped to reclaim. + * + * Compile with -pthread. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdbool.h> +#include <unistd.h> +#include <string.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> +#include <pthread.h> +#include <signal.h> +#include <errno.h> +#include <sys/eventfd.h> +#include <sys/sysinfo.h> + +#define CG "/sys/fs/cgroup/mempressure" +#define CG_EVENT_CONTROL (CG "/cgroup.event_control") +#define CG_SHRINKER (CG "/mempressure.shrinker") + +#define CHUNK_SIZE (1 * 1024 * 1024) + +static size_t num_chunks; + +static void **chunks; +static pthread_mutex_t *locks; +static int efd; +static int sfd; + +static inline void pabort(bool f, int code, const char *str) +{ + if (!f) + return; + perror(str); + printf("(%d)\n", code); + abort(); +} + +static void init_shrinker(void) +{ + int cfd; + int ret; + char *str; + + cfd = open(CG_EVENT_CONTROL, O_WRONLY); + pabort(cfd < 0, cfd, CG_EVENT_CONTROL); + + sfd = open(CG_SHRINKER, O_RDWR); + pabort(sfd < 0, sfd, CG_SHRINKER); + + efd = eventfd(0, 0); + pabort(efd < 0, efd, "eventfd()"); + + ret = asprintf(&str, "%d %d %d\n", efd, sfd, CHUNK_SIZE); + printf("%s\n", str); + pabort(ret == -1, ret, "control string"); + + ret = write(cfd, str, ret + 1); + pabort(ret == -1, ret, "write() to event_control"); +} + +static void add_reclaimable(int chunks) +{ + int ret; + char *str; + + ret = asprintf(&str, "%d %d\n", efd, CHUNK_SIZE); + pabort(ret == -1, ret, "add_reclaimable, asprintf"); + + ret = write(sfd, str, ret + 1); + pabort(ret <= 0, ret, "add_reclaimable, write"); +} + +static int chunks_to_reclaim(void) +{ + uint64_t n = 0; + int ret; + + ret = read(efd, &n, sizeof(n)); + pabort(ret <= 0, ret, "read() from eventfd"); + + printf("%d chunks to reclaim\n", (int)n); + + return n; +} + +static unsigned int reclaimed; + +static void print_stats(int signum) +{ + printf("\nTOTAL: helped to reclaim %d chunks (%d MB)\n", + reclaimed, reclaimed * CHUNK_SIZE / 1024 / 1024); + exit(0); +} + +static void *shrinker_thr_fn(void *arg) +{ + puts("shrinker thread started"); + + sigaction(SIGINT, &(struct sigaction){.sa_handler = print_stats}, NULL); + + while (1) { + unsigned int i = 0; + int n; + + n = chunks_to_reclaim(); + + reclaimed += n; + + while (n) { + pthread_mutex_lock(&locks[i]); + if (chunks[i]) { + free(chunks[i]); + chunks[i] = NULL; + n--; + } + pthread_mutex_unlock(&locks[i]); + + i = (i + 1) % num_chunks; + } + } + return NULL; +} + +static void consume_memory(void) +{ + unsigned int i = 0; + unsigned int j = 0; + + puts("consuming memory..."); + + while (1) { + pthread_mutex_lock(&locks[i]); + if (!chunks[i]) { + chunks[i] = malloc(CHUNK_SIZE); + pabort(!chunks[i], 0, "chunks alloc failed"); + memset(chunks[i], 0, CHUNK_SIZE); + j++; + } + pthread_mutex_unlock(&locks[i]); + + if (j >= num_chunks / 10) { + add_reclaimable(num_chunks / 10); + printf("added %d reclaimable chunks\n", j); + j = 0; + } + + i = (i + 1) % num_chunks; + } +} + +int main(int argc, char *argv[]) +{ + int ret; + int i; + pthread_t shrinker_thr; + struct sysinfo si; + + ret = sysinfo(&si); + pabort(ret != 0, ret, "sysinfo()"); + + num_chunks = (si.totalram + si.totalswap) * si.mem_unit / 1024 / 1024; + + chunks = malloc(sizeof(*chunks) * num_chunks); + locks = malloc(sizeof(*locks) * num_chunks); + pabort(!chunks || !locks, ENOMEM, NULL); + + init_shrinker(); + + for (i = 0; i < num_chunks; i++) { + ret = pthread_mutex_init(&locks[i], NULL); + pabort(ret != 0, ret, "pthread_mutex_init"); + } + + ret = pthread_create(&shrinker_thr, NULL, shrinker_thr_fn, NULL); + pabort(ret != 0, ret, "pthread_create(shrinker)"); + + consume_memory(); + + ret = pthread_join(shrinker_thr, NULL); + pabort(ret != 0, ret, "pthread_join(shrinker)"); + + return 0; +} diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index f204a7a..b9802e2 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -37,6 +37,12 @@ SUBSYS(mem_cgroup) /* */ +#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_MEMPRESSURE) +SUBSYS(mpc_cgroup) +#endif + +/* */ + #if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEVICE) SUBSYS(devices) #endif diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 92a86b2..3f7f7d2 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -10,6 +10,17 @@ extern int sysctl_stat_interval; +struct mem_cgroup; +#ifdef CONFIG_CGROUP_MEMPRESSURE +extern void vmpressure(struct mem_cgroup *memcg, + ulong scanned, ulong reclaimed); +extern void vmpressure_prio(struct mem_cgroup *memcg, int prio); +#else +static inline void vmpressure(struct mem_cgroup *memcg, + ulong scanned, ulong reclaimed) {} +static inline void vmpressure_prio(struct mem_cgroup *memcg, int prio) {} +#endif + #ifdef CONFIG_VM_EVENT_COUNTERS /* * Light weight per cpu counter implementation. diff --git a/init/Kconfig b/init/Kconfig index 6fdd6e3..5c308be 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -826,6 +826,18 @@ config MEMCG_KMEM the kmem extension can use it to guarantee that no group of processes will ever exhaust kernel resources alone. +config CGROUP_MEMPRESSURE + bool "Memory pressure monitor for Control Groups" + help + The memory pressure monitor cgroup provides a facility for + userland programs so that they could easily assist the kernel + with the memory management. This includes simple memory pressure + notifications and a full-fledged userland reclaimer. + + For more information see Documentation/cgroups/mempressure.txt + + If unsure, say N. + config CGROUP_HUGETLB bool "HugeTLB Resource Controller for Control Groups" depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL diff --git a/mm/Makefile b/mm/Makefile index 6b025f8..40cee19 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -50,6 +50,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o obj-$(CONFIG_MEMCG) += memcontrol.o page_cgroup.o +obj-$(CONFIG_CGROUP_MEMPRESSURE) += mempressure.o obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o diff --git a/mm/mempressure.c b/mm/mempressure.c new file mode 100644 index 0000000..e39a33d --- /dev/null +++ b/mm/mempressure.c @@ -0,0 +1,488 @@ +/* + * Linux VM pressure + * + * Copyright 2012 Linaro Ltd. + * Anton Vorontsov <anton.vorontsov(a)linaro.org> + * + * Based on ideas from Andrew Morton, David Rientjes, KOSAKI Motohiro, + * Leonid Moiseichuk, Mel Gorman, Minchan Kim and Pekka Enberg. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#include <linux/cgroup.h> +#include <linux/fs.h> +#include <linux/sched.h> +#include <linux/mm.h> +#include <linux/vmstat.h> +#include <linux/eventfd.h> +#include <linux/swap.h> +#include <linux/printk.h> + +static void mpc_vmpressure(struct mem_cgroup *memcg, ulong s, ulong r); + +/* + * Generic VM Pressure routines (no cgroups or any other API details) + */ + +/* + * The window size is the number of scanned pages before we try to analyze + * the scanned/reclaimed ratio (or difference). + * + * It is used as a rate-limit tunable for the "low" level notification, + * and for averaging medium/oom levels. Using small window sizes can cause + * lot of false positives, but too big window size will delay the + * notifications. + * + * The same window size also used for the shrinker, so be aware. It might + * be a good idea to derive the window size from the machine size, similar + * to what we do for the vmstat. + */ +static const uint vmpressure_win = SWAP_CLUSTER_MAX * 16; +static const uint vmpressure_level_med = 60; +static const uint vmpressure_level_oom = 99; +static const uint vmpressure_level_oom_prio = 4; + +enum vmpressure_levels { + VMPRESSURE_LOW = 0, + VMPRESSURE_MEDIUM, + VMPRESSURE_OOM, + VMPRESSURE_NUM_LEVELS, +}; + +static const char *vmpressure_str_levels[] = { + [VMPRESSURE_LOW] = "low", + [VMPRESSURE_MEDIUM] = "medium", + [VMPRESSURE_OOM] = "oom", +}; + +static enum vmpressure_levels vmpressure_level(uint pressure) +{ + if (pressure >= vmpressure_level_oom) + return VMPRESSURE_OOM; + else if (pressure >= vmpressure_level_med) + return VMPRESSURE_MEDIUM; + return VMPRESSURE_LOW; +} + +static ulong vmpressure_calc_level(uint win, uint s, uint r) +{ + ulong p; + + if (!s) + return 0; + + /* + * We calculate the ratio (in percents) of how many pages were + * scanned vs. reclaimed in a given time frame (window). Note that + * time is in VM reclaimer's "ticks", i.e. number of pages + * scanned. This makes it possible to set desired reaction time + * and serves as a ratelimit. + */ + p = win - (r * win / s); + p = p * 100 / win; + + pr_debug("%s: %3lu (s: %6u r: %6u)\n", __func__, p, s, r); + + return vmpressure_level(p); +} + +void vmpressure(struct mem_cgroup *memcg, ulong scanned, ulong reclaimed) +{ + if (!scanned) + return; + mpc_vmpressure(memcg, scanned, reclaimed); +} + +void vmpressure_prio(struct mem_cgroup *memcg, int prio) +{ + if (prio > vmpressure_level_oom_prio) + return; + + /* OK, the prio is below the threshold, send the pre-OOM event. */ + vmpressure(memcg, vmpressure_win, 0); +} + +/* + * Memory pressure cgroup code + */ + +struct mpc_event { + struct eventfd_ctx *efd; + enum vmpressure_levels level; + struct list_head node; +}; + +struct mpc_shrinker { + struct eventfd_ctx *efd; + size_t chunks; + size_t chunk_sz; + struct list_head node; +}; + +struct mpc_state { + struct cgroup_subsys_state css; + + uint scanned; + uint reclaimed; + struct mutex sr_lock; + + struct list_head events; + struct mutex events_lock; + + struct list_head shrinkers; + struct mutex shrinkers_lock; + + struct work_struct work; +}; + +static struct mpc_state *wk2mpc(struct work_struct *wk) +{ + return container_of(wk, struct mpc_state, work); +} + +static struct mpc_state *css2mpc(struct cgroup_subsys_state *css) +{ + return container_of(css, struct mpc_state, css); +} + +static struct mpc_state *tsk2mpc(struct task_struct *tsk) +{ + return css2mpc(task_subsys_state(tsk, mpc_cgroup_subsys_id)); +} + +static struct mpc_state *cg2mpc(struct cgroup *cg) +{ + return css2mpc(cgroup_subsys_state(cg, mpc_cgroup_subsys_id)); +} + +static void mpc_shrinker(struct mpc_state *mpc, ulong s, ulong r) +{ + struct mpc_shrinker *sh; + ssize_t to_reclaim_pages = s - r; + + if (!to_reclaim_pages) + return; + + mutex_lock(&mpc->shrinkers_lock); + + /* + * To make accounting more precise and to avoid excessive + * communication with the kernel, we operate on chunks instead of + * bytes. Say, asking to free 8 KBs makes little sense if + * granularity of allocations is 10 MBs. Also, knowing the + * granularity (chunk size) and the number of reclaimable chunks, + * we just ask that N chunks should be freed, and we assume that + * it will be freed, thus we decrement our internal counter + * straight away (i.e. userland does not need to respond how much + * was reclaimed). But, if userland could not free it, it is + * responsible to increment the counter back. + */ + list_for_each_entry(sh, &mpc->shrinkers, node) { + size_t to_reclaim_chunks; + + if (!sh->chunks) + continue; + + to_reclaim_chunks = to_reclaim_pages * + PAGE_SIZE / sh->chunk_sz; + to_reclaim_chunks = min(sh->chunks, to_reclaim_chunks); + + if (!to_reclaim_chunks) + continue; + + sh->chunks -= to_reclaim_chunks; + + eventfd_signal(sh->efd, to_reclaim_chunks); + + to_reclaim_pages -= to_reclaim_chunks * + sh->chunk_sz / PAGE_SIZE; + if (to_reclaim_pages <= 0) + break; + } + + mutex_unlock(&mpc->shrinkers_lock); +} + +static void mpc_event(struct mpc_state *mpc, ulong s, ulong r) +{ + struct mpc_event *ev; + int level = vmpressure_calc_level(vmpressure_win, s, r); + + mutex_lock(&mpc->events_lock); + + list_for_each_entry(ev, &mpc->events, node) { + if (level >= ev->level) + eventfd_signal(ev->efd, 1); + } + + mutex_unlock(&mpc->events_lock); +} + +static void mpc_vmpressure_wk_fn(struct work_struct *wk) +{ + struct mpc_state *mpc = wk2mpc(wk); + ulong s; + ulong r; + + mutex_lock(&mpc->sr_lock); + s = mpc->scanned; + r = mpc->reclaimed; + mpc->scanned = 0; + mpc->reclaimed = 0; + mutex_unlock(&mpc->sr_lock); + + mpc_shrinker(mpc, s, r); + mpc_event(mpc, s, r); +} + +static void __mpc_vmpressure(struct mpc_state *mpc, ulong s, ulong r) +{ + mutex_lock(&mpc->sr_lock); + mpc->scanned += s; + mpc->reclaimed += r; + mutex_unlock(&mpc->sr_lock); + + if (s < vmpressure_win || work_pending(&mpc->work)) + return; + + schedule_work(&mpc->work); +} + +static void mpc_vmpressure(struct mem_cgroup *memcg, ulong s, ulong r) +{ + /* + * There are two options for implementing cgroup pressure + * notifications: + * + * - Store pressure counter atomically in the task struct. Upon + * hitting 'window' wake up a workqueue that will walk every + * task and sum per-thread pressure into cgroup pressure (to + * which the task belongs). The cons are obvious: bloats task + * struct, have to walk all processes and makes pressue less + * accurate (the window becomes per-thread); + * + * - Store pressure counters in per-cgroup state. This is easy and + * straightforward, and that's how we do things here. But this + * requires us to not put the vmpressure hooks into hotpath, + * since we have to grab some locks. + */ + +#ifdef CONFIG_MEMCG + if (memcg) { + struct cgroup_subsys_state *css = mem_cgroup_css(memcg); + struct cgroup *cg = css->cgroup; + struct mpc_state *mpc = cg2mpc(cg); + + if (mpc) + __mpc_vmpressure(mpc, s, r); + return; + } +#endif + task_lock(current); + __mpc_vmpressure(tsk2mpc(current), s, r); + task_unlock(current); +} + +static struct cgroup_subsys_state *mpc_create(struct cgroup *cg) +{ + struct mpc_state *mpc; + + mpc = kzalloc(sizeof(*mpc), GFP_KERNEL); + if (!mpc) + return ERR_PTR(-ENOMEM); + + mutex_init(&mpc->sr_lock); + mutex_init(&mpc->events_lock); + mutex_init(&mpc->shrinkers_lock); + INIT_LIST_HEAD(&mpc->events); + INIT_LIST_HEAD(&mpc->shrinkers); + INIT_WORK(&mpc->work, mpc_vmpressure_wk_fn); + + return &mpc->css; +} + +static void mpc_destroy(struct cgroup *cg) +{ + struct mpc_state *mpc = cg2mpc(cg); + + kfree(mpc); +} + +static ssize_t mpc_read_level(struct cgroup *cg, struct cftype *cft, + struct file *file, char __user *buf, + size_t sz, loff_t *ppos) +{ + struct mpc_state *mpc = cg2mpc(cg); + uint level; + const char *str; + + mutex_lock(&mpc->sr_lock); + + level = vmpressure_calc_level(vmpressure_win, + mpc->scanned, mpc->reclaimed); + + mutex_unlock(&mpc->sr_lock); + + str = vmpressure_str_levels[level]; + return simple_read_from_buffer(buf, sz, ppos, str, strlen(str)); +} + +static int mpc_register_level_event(struct cgroup *cg, struct cftype *cft, + struct eventfd_ctx *eventfd, + const char *args) +{ + struct mpc_state *mpc = cg2mpc(cg); + struct mpc_event *ev; + int lvl; + + for (lvl = 0; lvl < VMPRESSURE_NUM_LEVELS; lvl++) { + if (!strcmp(vmpressure_str_levels[lvl], args)) + break; + } + + if (lvl >= VMPRESSURE_NUM_LEVELS) + return -EINVAL; + + ev = kzalloc(sizeof(*ev), GFP_KERNEL); + if (!ev) + return -ENOMEM; + + ev->efd = eventfd; + ev->level = lvl; + + mutex_lock(&mpc->events_lock); + list_add(&ev->node, &mpc->events); + mutex_unlock(&mpc->events_lock); + + return 0; +} + +static void mpc_unregister_event(struct cgroup *cg, struct cftype *cft, + struct eventfd_ctx *eventfd) +{ + struct mpc_state *mpc = cg2mpc(cg); + struct mpc_event *ev; + + mutex_lock(&mpc->events_lock); + list_for_each_entry(ev, &mpc->events, node) { + if (ev->efd != eventfd) + continue; + list_del(&ev->node); + kfree(ev); + break; + } + mutex_unlock(&mpc->events_lock); +} + +static int mpc_register_shrinker(struct cgroup *cg, struct cftype *cft, + struct eventfd_ctx *eventfd, + const char *args) +{ + struct mpc_state *mpc = cg2mpc(cg); + struct mpc_shrinker *sh; + ulong chunk_sz; + int ret; + + ret = kstrtoul(args, 10, &chunk_sz); + if (ret) + return ret; + + sh = kzalloc(sizeof(*sh), GFP_KERNEL); + if (!sh) + return -ENOMEM; + + sh->efd = eventfd; + sh->chunk_sz = chunk_sz; + + mutex_lock(&mpc->shrinkers_lock); + list_add(&sh->node, &mpc->shrinkers); + mutex_unlock(&mpc->shrinkers_lock); + + return 0; +} + +static void mpc_unregister_shrinker(struct cgroup *cg, struct cftype *cft, + struct eventfd_ctx *eventfd) +{ + struct mpc_state *mpc = cg2mpc(cg); + struct mpc_shrinker *sh; + + mutex_lock(&mpc->shrinkers_lock); + list_for_each_entry(sh, &mpc->shrinkers, node) { + if (sh->efd != eventfd) + continue; + list_del(&sh->node); + kfree(sh); + break; + } + mutex_unlock(&mpc->shrinkers_lock); +} + +static int mpc_write_shrinker(struct cgroup *cg, struct cftype *cft, + const char *str) +{ + struct mpc_state *mpc = cg2mpc(cg); + struct mpc_shrinker *sh; + struct eventfd_ctx *eventfd; + struct file *file; + ssize_t chunks; + int fd; + int ret; + + ret = sscanf(str, "%d %zd\n", &fd, &chunks); + if (ret != 2) + return -EINVAL; + + file = fget(fd); + if (!file) + return -EBADF; + + eventfd = eventfd_ctx_fileget(file); + + mutex_lock(&mpc->shrinkers_lock); + + /* Can avoid the loop once we introduce ->priv for eventfd_ctx. */ + list_for_each_entry(sh, &mpc->shrinkers, node) { + if (sh->efd != eventfd) + continue; + if (chunks < 0 && abs(chunks) > sh->chunks) + sh->chunks = 0; + else + sh->chunks += chunks; + break; + } + + mutex_unlock(&mpc->shrinkers_lock); + + eventfd_ctx_put(eventfd); + fput(file); + + return 0; +} + +static struct cftype mpc_files[] = { + { + .name = "level", + .read = mpc_read_level, + .register_event = mpc_register_level_event, + .unregister_event = mpc_unregister_event, + }, + { + .name = "shrinker", + .register_event = mpc_register_shrinker, + .unregister_event = mpc_unregister_shrinker, + .write_string = mpc_write_shrinker, + }, + {}, +}; + +struct cgroup_subsys mpc_cgroup_subsys = { + .name = "mempressure", + .subsys_id = mpc_cgroup_subsys_id, + .create = mpc_create, + .destroy = mpc_destroy, + .base_cftypes = mpc_files, +}; diff --git a/mm/vmscan.c b/mm/vmscan.c index 48550c6..d8ff846 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1877,6 +1877,9 @@ restart: shrink_active_list(SWAP_CLUSTER_MAX, lruvec, sc, LRU_ACTIVE_ANON); + vmpressure(sc->target_mem_cgroup, + sc->nr_scanned - nr_scanned, nr_reclaimed); + /* reclaim/compaction might need reclaim to continue */ if (should_continue_reclaim(lruvec, nr_reclaimed, sc->nr_scanned - nr_scanned, sc)) @@ -2099,6 +2102,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, count_vm_event(ALLOCSTALL); do { + vmpressure_prio(sc->target_mem_cgroup, sc->priority); sc->nr_scanned = 0; aborted_reclaim = shrink_zones(zonelist, sc); -- 1.8.0

12 years, 6 months

(Dong Aisheng) 2012-11-04 - 2012-12-09

by Dong Aisheng

=== Highlights === * KVM/ARM VGIC state saving and restoring implementation. Did some discussion with Perter Maydell about the implementation and some various issues. Decided to try ONE_REG interface first based on his suggestion. Referred to Cache simulation code according to Perter's suggestion for banked register solution. Still digging whether it's safe to not save some gic virtual interface control registers since we do not want user level to know virtualization accel hw features in under layer . * Prepared an IMX6 flexcan support patch series. === Issues === * Have to take about two days leave(maybe more) in the coming days this week to with my family to go to hospital

12 years, 6 months

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

linaro-kernel