After a big move and other things, I can finally focus on the Linux
work. I now have high-speed internet at my new place, so using
'mumble' for conferencing is no longer a problem.
== Highlights ==
* We've got some 'looks fine' feedback on the userland LMK, and that's
reassuring.
The "bad news" is that there's not much enthusiasm overall from
Android folks. That's understandable, as the kernel LMK driver works and
is already in the mainline^Wstaging kernel, so why bother. Well, it can't
live in staging/ forever, so we'd better hurry up.
* ulmkd's Makefile is again suitable for GNU/Linux builds
(in addition to Android/Linux). This makes it easier for me
to test, plus maybe there will be other users for the daemon.
* The for_each_process and task->mm fixes were finally merged into -mm.
I will need a small documentation update for the series, but
overall the series seems to be fine.
* Prepared a few fixes for the memcg slab accounting. The proposed
slab accounting feature looks exactly like what was needed, except
that it doesn't account slab for the root cgroup. If that's not
a design decision, then it can be improved. If it is, there are
two ways: a) drop cgroups support and go solely with the vmevent
infrastructure, or b) try to push something like a 'memory.available'
attribute for memcg. 'a)' is easy, and 'b)' is probably what I'll
try to implement tomorrow. Once implemented, we'll have all
options ready, and so can mark cgroups as either fully suitable
for lowmem notifications or not suitable by design.
== Plans ==
* I wonder if I need to take a deep dive into the Android build system
and try to integrate ulmkd into the Android image myself?
* Back to interactive governor improvements? Well, as far as I
recall, the story behind the interactive governor is very similar to
LMK's: nobody likes cpufreq overall, and people want generic power
management improvements in the scheduler. At the very least, we need to
get 'interactive vs. ondemand' cpufreq latency numbers. That
would be a good starting point for any other improvements. The
problem with cpufreq latency measurements, though, was that the
benchmark takes ages to complete.
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
Hello Glauber,
On Fri, Apr 20, 2012 at 06:57:08PM -0300, Glauber Costa wrote:
> This is my current attempt at getting the kmem controller
> into a mergeable state. IMHO, all the important bits are there, and it shouldn't
> change *that* much from now on. I am, however, expecting at least a couple more
> iterations before we sort all the edges out.
>
> This series works for both the slub and the slab. One of my main goals was to
> make sure that the interfaces we are creating actually make sense for both
> allocators.
>
> I did some adaptations to the slab-specific patches, but the bulk of it
> comes from Suleiman's patches. I did my best to use his patches
> as-is where possible so as to keep authorship information. When not possible,
> I tried to be fair and quote it in the commit message.
>
> In this series, all existing caches are created per-memcg after their first hit.
> The main reason is that, during discussions at the memory summit, we came to
> agreement that the fragmentation problems that could arise from creating all
> of them are mitigated by the typically small quantity of caches in the system
> (order of a few megabytes total for sparsely used caches).
> The lazy creation from Suleiman is kept, although a bit modified. For instance,
> I now use a locked scheme instead of cmpxchg to make sure cache creation won't
> fail due to duplicates, which simplifies things by quite a bit.
>
> The slub part is a bit more complex than what I came up with in my slub-only
> series. The reason is that we did not need to use the cache-selection logic
> in the allocator itself: it was done by the cache users. But now that
> we are lazily creating all caches, this is simply no longer doable.
>
> I am leaving destruction of caches out of the series, although most
> of the infrastructure for that is here, since we did it in earlier
> series. This is basically because right now Kame is reworking it for
> user memcg, and I like the new proposed behavior a lot more. We all seem
> to have agreed that reclaim is an interesting problem by itself, and
> it is not included in this already too complicated series. Please note
> that this is still marked as experimental, so we have some room. A proper
> shrinker implementation is a hard requirement to take the kmem controller
> out of the experimental state.
>
> I am also not including documentation, but it should only be a matter
> of merging what we already wrote in earlier series plus some additions.
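The lazy, lock-protected creation scheme quoted above can be illustrated with a user-space model. This is a hypothetical sketch, not code from the series: `struct base_cache`, `memcg_cache_get()`, `MAX_MEMCGS` and the fixed-size id table are invented for illustration, and a pthread mutex stands in for whatever lock the kmem controller actually uses. The fast path is a lock-free lookup; on the first hit, the slow path re-checks the slot under the lock, so racing callers end up sharing one cache instead of creating duplicates that a bare cmpxchg install would have to detect and discard.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_MEMCGS 64	/* hypothetical fixed-size memcg id space */

/* Toy per-memcg copy of a cache (not the kernel's struct kmem_cache). */
struct memcg_cache {
	int memcg_id;
	char name[48];
};

struct base_cache {
	const char *name;
	/* NULL until a memcg first allocates from this cache. */
	struct memcg_cache *_Atomic per_memcg[MAX_MEMCGS];
	pthread_mutex_t create_lock;	/* serializes lazy creation */
	int nr_created;			/* protected by create_lock */
};

static void base_cache_init(struct base_cache *b, const char *name)
{
	b->name = name;
	for (int i = 0; i < MAX_MEMCGS; i++)
		atomic_init(&b->per_memcg[i], NULL);
	pthread_mutex_init(&b->create_lock, NULL);
	b->nr_created = 0;
}

/* Slow path: re-check under the lock, so racing callers never create
 * duplicates; a late caller simply returns the winner's cache. */
static struct memcg_cache *memcg_cache_create(struct base_cache *b, int id)
{
	struct memcg_cache *c;

	pthread_mutex_lock(&b->create_lock);
	c = atomic_load_explicit(&b->per_memcg[id], memory_order_relaxed);
	if (!c) {
		c = malloc(sizeof(*c));
		if (c) {
			c->memcg_id = id;
			snprintf(c->name, sizeof(c->name), "%s(%d)", b->name, id);
			/* Publish only after the object is fully set up. */
			atomic_store_explicit(&b->per_memcg[id], c,
					      memory_order_release);
			b->nr_created++;
		}
	}
	pthread_mutex_unlock(&b->create_lock);
	return c;
}

/* Fast path: lock-free lookup, lazy creation on the first hit. */
static struct memcg_cache *memcg_cache_get(struct base_cache *b, int id)
{
	struct memcg_cache *c;

	assert(id >= 0 && id < MAX_MEMCGS);
	c = atomic_load_explicit(&b->per_memcg[id], memory_order_acquire);
	return c ? c : memcg_cache_create(b, id);
}

/* Helper for demonstrating concurrent first hits. */
struct race_arg {
	struct base_cache *b;
	int id;
	struct memcg_cache *got;
};

static void *racer(void *p)
{
	struct race_arg *a = p;

	a->got = memcg_cache_get(a->b, a->id);
	return NULL;
}
```

Several threads calling `memcg_cache_get()` for the same memcg id at the same time all get the same pointer, and `nr_created` goes up exactly once.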
The patches look great, thanks a lot for your work!
I finally tried them, and after a few fixes the kmem accounting
seems to work fine with slab. The fixes will follow this email,
and if they're fine, feel free to fold them into your patches.
However, with slub I'm getting kernel hangs and various traces[1].
It seems that the kernel recurses into memcg when calling
memcg_create_cache_enqueue(): it calls kmalloc_no_account(),
which was introduced precisely to avoid recursing into memcg, but looking
at the 'slub: provide kmalloc_no_account' patch, I don't see
any difference between _no_account and ordinary kmalloc. Hm.
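The intent behind kmalloc_no_account(), as described above, can be shown with a toy model. Everything here is hypothetical: a counter-based charge hook stands in for real memcg charging, and malloc() stands in for slub. The only point is that the allocation made while enqueueing cache creation must bypass the accounting hook. If `kmalloc_no_account()` were just an alias for the accounted path, the hook would re-enter itself before the cache is marked ready and recurse until the stack overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for memcg state; all names here are hypothetical. */
static int hook_depth;          /* current nesting of the charge hook */
static bool memcg_cache_ready;  /* per-memcg cache already set up? */
static size_t accounted_bytes;  /* bytes charged to the memcg */

static void *kmalloc_no_account(size_t size);

/* Stand-in for memcg_create_cache_enqueue(): it needs memory for its
 * work item, and that allocation must bypass accounting. */
static void memcg_create_cache_enqueue(void)
{
	void *work = kmalloc_no_account(64);

	free(work);                 /* toy: pretend the work item ran */
	memcg_cache_ready = true;
}

/* Charge hook run on every accounted allocation. */
static void memcg_charge_hook(size_t size)
{
	hook_depth++;
	assert(hook_depth == 1);    /* in this model the hook must never nest */
	if (!memcg_cache_ready)
		memcg_create_cache_enqueue();
	accounted_bytes += size;
	hook_depth--;
}

/* Ordinary (accounted) allocation: always goes through the hook. */
static void *kmalloc_accounted(size_t size)
{
	memcg_charge_hook(size);
	return malloc(size);
}

/* Same allocation, but the hook is skipped. If this merely called
 * kmalloc_accounted(), the enqueue above would re-enter the hook before
 * memcg_cache_ready is set and recurse without bound. */
static void *kmalloc_no_account(size_t size)
{
	return malloc(size);
}
```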
OK, slub apart... the accounting works with slab, which is great.
There's another, more generic question: is there any particular
reason why you don't want to account slab memory for the root cgroup?
Personally, I'm interested in kmem accounting because I use
memcg for low-memory notifications. I'm installing events
on the root's memory.usage_in_bytes, and the threshold values
are calculated like this:
total_ram - wanted_threshold
So, if we want to get a notification when there's 64 MB of memory
left on a 256 MB machine, we'd install an event on the 192 MB
mark (the good thing about usage_in_bytes is that it does
account file caches, so the formula is simple).
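The threshold arithmetic and the event registration above can be sketched as follows. The registration follows the cgroup v1 memcg threshold interface: create an eventfd, open memory.usage_in_bytes, and write the line `<event_fd> <fd of memory.usage_in_bytes> <threshold>` to cgroup.event_control. The helper names (`lowmem_threshold()`, `format_event_control()`, `register_lowmem_event()`) and the memcg directory passed in are assumptions for illustration only.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MB (1024ULL * 1024ULL)

/* usage_in_bytes reaches this value when only `wanted_free` bytes are left. */
static uint64_t lowmem_threshold(uint64_t total_ram, uint64_t wanted_free)
{
	return total_ram > wanted_free ? total_ram - wanted_free : 0;
}

/* Build "<event_fd> <fd of memory.usage_in_bytes> <threshold>". */
static int format_event_control(char *buf, size_t len, int efd, int ufd,
				uint64_t threshold)
{
	return snprintf(buf, len, "%d %d %llu", efd, ufd,
			(unsigned long long)threshold);
}

/*
 * Register a usage threshold on a cgroup v1 memcg directory.
 * Returns an eventfd that becomes readable when usage crosses the
 * threshold, or -1 on error. ufd is kept open (conservative choice).
 */
int register_lowmem_event(const char *memcg_dir, uint64_t threshold)
{
	char path[512], line[80];
	int efd, ufd, cfd, n;

	assert(memcg_dir != NULL);
	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", memcg_dir);
	ufd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", memcg_dir);
	cfd = open(path, O_WRONLY);
	if (ufd < 0 || cfd < 0)
		goto err;

	n = format_event_control(line, sizeof(line), efd, ufd, threshold);
	if (n < 0 || (size_t)n >= sizeof(line) || write(cfd, line, n) != n)
		goto err;
	close(cfd);
	return efd;
err:
	if (cfd >= 0)
		close(cfd);
	if (ufd >= 0)
		close(ufd);
	close(efd);
	return -1;
}
```

For example, `register_lowmem_event("/sys/fs/cgroup/memory", lowmem_threshold(256 * MB, 64 * MB))` would arm a 192 MB threshold on the root memcg, assuming the v1 memory controller is mounted at that path; a read() on the returned eventfd then blocks until usage crosses the threshold in either direction.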
Obviously, without kmem accounting the formula can be very
imprecise when the kernel itself (e.g. hw drivers) starts using a
lot of memory. With root's slab accounting the problem
would be solved, but for some reason you deliberately do not
want to account it for the root cgroup. I suspect that there are
some performance concerns?..
Thanks,
[1]
BUG: unable to handle kernel paging request at ffffffffb2e80900
IP: [<ffffffff8105940c>] check_preempt_wakeup+0x3c/0x210
PGD 160d067 PUD 1611063 PMD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
CPU 0
Pid: 943, comm: bash Not tainted 3.4.0-rc4+ #34 Bochs Bochs
RIP: 0010:[<ffffffff8105940c>] [<ffffffff8105940c>] check_preempt_wakeup+0x3c/0x210
RSP: 0018:ffff880006305ee8 EFLAGS: 00010006
RAX: 00000000000109c0 RBX: ffff8800071b4e20 RCX: ffff880006306000
RDX: 0000000000000000 RSI: 0000000006306028 RDI: ffff880007c109c0
RBP: ffff880006305f28 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880007c109c0
R13: ffff88000644ddc0 R14: ffff8800071b4e68 R15: 0000000000000000
FS: 00007fad1244c700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffb2e80900 CR3: 00000000063b8000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 943, threadinfo ffff880006306000, task ffff88000644ddc0)
Stack:
0000000000000000 ffff88000644de08 ffff880007c109c0 ffff880007c109c0
ffff8800071b4e20 0000000000000000 0000000000000000 0000000000000000
ffff880006305f48 ffffffff81053304 ffff880007c109c0 ffff880007c109c0
Call Trace:
Code: 76 48 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 4c 8b af e0 07 00 00 49 8d 4d 48 48 89 4d c8 49 8b 4d 08 4c 3b 75 c8 8b 71 18 <48> 8b 34 f5 c0 07 65 81 48 8b bc 30 a8 00 00 00 8b 35 3a 3f 5c
RIP [<ffffffff8105940c>] check_preempt_wakeup+0x3c/0x210
RSP <ffff880006305ee8>
CR2: ffffffffb2e80900
---[ end trace 78fa9c86bebb1214 ]---
--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com
=== Highlights ===
* Interview with Deepak and one-on-one introductory discussion
- Deepak pointed me to the relevant Linaro wiki pages with usage
information
* Discussion with Lee Jones and Deepak about the DT work. ab8500 power has
been assigned
* Onboarding is nearing completion
* Spent some time reading through DT documents
* Lee Jones helped me get the build and test environment ready; the setup to
start the work is in place.
* Attended the session on "platform perimeter" hosted by Linus Walleij
- http://www.df.lth.se/~triad/papers/ESC-400Slides_Walleij.pdf
* Spent a good length of time with Niklas to get the DeviceTree work
transferred
=== Plans ===
* Complete ab8500 power DeviceTree assignments
* Spend some time on DT Spec study
* IT/admin work to carry out in order to get the new laptop with the Ubuntu
distro.
* Complete the pending onboarding activities
=== Issues ===
* The 2011-09 version of the linaro-media-create Python application did not
succeed in preparing a bundled image for flashing, so I migrated to
2012.04-1, where the issue is fixed.
https://launchpad.net/ubuntu/+source/linaro-image-tools/2012.04-1/+build/34…
=== Highlights ===
* Tested Rafael's wakelock interface patches. Found a bug and sent a
fix, which he included.
* Submitted the volatile ranges patch for inclusion. Got some minor
feedback. Dave Chinner suggested I rework the patch so that it uses
fallocate rather than fadvise. I pushed back a bit to make sure that is
the consensus opinion, but will likely try to switch things over next week.
* After getting positive feedback from Arve on my patch to convert
ashmem to use wakeup sources instead of the stubbed-out wakelocks, I
submitted it, and Greg included it in staging-next for 3.5
* Got a small RTC null pointer fix merged into tip/timers/urgent for 3.4
* Pinged the Android team on Anton's ulmkd proposal, got some
interesting feedback, and no outright objections.
* Submitted a talk to Linux Plumbers
* Reviewed some patches to introduce CLOCK_TAI functionality. Queued a
few community cleanups.
=== Plans ===
* Rework volatile ranges to use fallocate & resubmit to lkml
=== Issues ===
NA
== Linus Walleij linusw ==
=== Highlights ===
* pinctrl fixes sent and pulled by Torvalds.
* Reviewed and applied a bunch of pinctrl patches from Stephen,
Dong, and a newcomer: John Crispin from OpenWRT. We have
good traction in this subsystem.
* Sent pull requests for ux500-core, ux500-devicetree and
ux500-gpio-pins to bring the machine up to date with the recent
patches. Not yet pulled into ARM SoC
* Brought ux500 pinctrl forward, trying to find a smooth transition
from the existing codebase to a pinctrl-based approach. This is
taking some time since it requires quite a bit of upfront code.
* Reviewing device tree work from Lee Jones.
* Russell accepted PL190 VIC cleanups and applied them.
* Discussing power domain issues. Andrea and our internal
power management team as well as Vincent Guittot from Linaro
PM are all involved. We might need to propose modifications to
runtime PM.
* Ran the ESC platform perimeter presentation twice internally,
discussing and presenting new features in roughly kernels 3.0 through
3.4+. It appears to have been appreciated.
http://www.df.lth.se/~triad/papers/ESC-400Slides_Walleij.pdf
=== Plans ===
* Review incoming pinctrl patches.
* Convert ux500 (gpio-nomadik) to pinctrl
for the kernel 3.5 timeframe.
* Test the PL08x patches on the Ericsson Research
PB11MPCore and submit platform data for using
pl08x DMA on that platform.
* Look into other Ux500 stuff in need of mainlining, like:
- Ux500 clocks (looks like a new assignee might look into this)
- the HWMON stuff.
=== Issues ===
* Reviewing and applying pinctrl and ux500 patches cuts
into ux500 pin control development and disturbs focus.
Thanks,
Linus Walleij
== Deepak Saxena <dsaxena> ==
=== Highlights ===
* Researching new roadmap cards
* Talk on "One zImage to Rule them All" accepted to LinuxCon Japan!
* Doing lots of background research on single zImage. Focusing on
information gathering right now: going through all the emails I can find
on the topic back to last May, chasing down discussions, digging through
the kernel, syncing up with Nico and Arnd as needed, and getting my head
around it all. My goal is to have a clear idea of what is left to do and
also a plan of what we should focus on at Connect.
* 1:1's
* Interviewed Rajinakanth from ST-E for KWG
* Synced up with Scott and Vicki on training sessions at Connect.
=== Plans ===
* Usual 1:1's and meetings
* ST-E DT sync up with Raj and Lee
* Connect training planning
* Interviewing Ulf from ST-E for KWG position
* Continue work on new roadmap cards
* Continue single zImage deep dive
* Dig out my timex.h patches if I have time
=== Issues ===
* None at this time
=== Travel/Time Off ===
* Possibly out Monday, April 30th.
* Off May 4th and the following week, back to work on Monday the 14th.
Will have no interwebs and limited phone access.
* Connect Q2, with follow-up travel to LinuxCon Japan the week after,
and going to Austin before Connect to sync up with Mounir.
=== Highlights ===
* Sent the mach-mxs common clk conversion series, based on v3.4-rc3 plus
the various fixup/cleanup patches that Mike confirmed picking up on his
-next branch, and the clkdev registration patch.
* Reviewed imx pinctrl patch from Dong Aisheng.
* Added pinctrl support for mach-mxs using a different binding from
imx. Discussing the pros and cons of these two bindings.
* Reviewed initial spi-mxs support patch from Fabio.
* Reviewed the mach-mxs changes of Marek's mxs usb series, and pushed him
to use device tree.
* Reviewed mxs-dma device tree support series from Dong Aisheng.
* Reviewed gpmi-nand device tree support series from Huang Shijie.
* Reviewed and merged the initial imx6q-sabresd support patch from
Fabio.
* Reviewed imx5/6 cpuidle patches from Rob Lee.
--
Regards,
Shawn
=== Pinctrl ===
* Sent two patches to improve pinctrl core debug information
* Sent out the pinctrl-imx v2 patch
Addressed many comments from Stephen, Sascha, etc.
The main change is allowing pin-based config.
No comments received yet.
* Sent out a patch to handle the pinctrl dummy state in the core.
Got a lot of discussion; the final decision is to provide a common
API for platforms to call to decide whether they need to use the
dummy state.
* Sent out a patch series to make the mxs-dma driver more generic, so it
supports both mxs and mx6, and to add DT support
=== Plan ===
* Send the v2 pinctrl dummy state handling patch; then we can start
converting drivers to use pinctrl without breaking existing code.
* Pinctrl dummy GPIO handling support, which avoids breaking existing code
when using pinctrl GPIO support
* pinctrl-imx GPIO support, which depends on the pinctrl dummy GPIO handling patch
...otherwise we're getting the wrong numbers in usage_in_bytes.
On Mon, Mar 05, 2012 at 09:19:34AM +0900, KAMEZAWA Hiroyuki wrote:
[...]
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 228d646..c8abdc5 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3812,6 +3812,9 @@ static inline u64 mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> >
> > val = mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_CACHE);
> > val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_RSS);
> > + val += mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED);
> >
> > 1. Is there any particular reason we don't currently account file mapped
> > memory in usage_in_bytes?
> >
> > To me, MEM_CGROUP_STAT_FILE_MAPPED hunk seems logical even if we
> > don't use it for lowmemory notifications.
> >
> > Plus, it seems that FILE_MAPPED _is_ accounted for the non-root
> > cgroups, so I guess it's clearly a bug for the root memcg?
>
> CACHE includes all file caches. Why do you think FILE_MAPPED is not included in CACHE ?
There were tons of changes in the memcg lately, but I believe the issue
is still there.
For example, looking into this code flow:
-> page_add_file_rmap() (mm/rmap.c)
-> mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED) (include/linux/memcontrol.h)
-> void mem_cgroup_update_page_stat(page, MEMCG_NR_FILE_MAPPED, 1) (mm/memcontrol.c)
And then:
void mem_cgroup_update_page_stat(struct page *page,
				 enum mem_cgroup_page_stat_item idx, int val)
{
	...
	switch (idx) {
	case MEMCG_NR_FILE_MAPPED:
		idx = MEM_CGROUP_STAT_FILE_MAPPED;
		break;
	default:
		BUG();
	}
	this_cpu_add(memcg->stat->count[idx], val);
	...
}
So, clearly, this function updates only _FILE_MAPPED,
leaving _CACHE alone.
If you're saying that _CACHE is meant to include _FILE_MAPPED, then
I guess the patch below would be the proper fix... Otherwise
we need to be consistent in stats reporting, and either fall back
to my original fix (in mem_cgroup_usage()), or think about doing it
some other way...
Signed-off-by: Anton Vorontsov <anton.vorontsov(a)linaro.org>
---
The patch is against current -next.
Thanks,
mm/memcontrol.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 884e936..760ecf5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1958,6 +1958,8 @@ void mem_cgroup_update_page_stat(struct page *page,
switch (idx) {
case MEMCG_NR_FILE_MAPPED:
+ idx = MEM_CGROUP_STAT_CACHE;
+ this_cpu_add(memcg->stat->count[idx], val);
idx = MEM_CGROUP_STAT_FILE_MAPPED;
break;
default:
--
1.7.9.2
=== Highlights ===
* Reworked the Android ashmem code so it uses my fadvise volatile work
* Handled a few issues for Andrey w/ the linaro-android-3.4 tree.
* Synced up with gregkh and tglx, then sent the Android Alarm timer
mending patches to staging for 3.5, and got them merged.
* Converted the Android Alarm driver to use wakeup sources instead of
wakelocks, sent out an RFC to lkml and the Android team
* Spent more time looking at Rafael's userland wakelock api.
* Worked on reported RTC regression. Looks to be a hardware quirk, but
still waiting on feedback from the owner.
* Dug in a bit on a community clocksource watchdog issue. Still no
solution yet.
* Gave Deepak feedback on upstream tracking ideas.
* Sent the bi-weekly Android upstreaming subteam email
=== Plans ===
* Haven't gotten much feedback recently, so I'm planning to submit
volatile range code for inclusion. This likely will get detractors to
come out of the woodwork. But hey, maybe I'll get lucky and it will
just go in?
* See if Rafael's userland wakelock api can be readied for 3.5
=== Issues ===
* NA.