On Tue, Oct 10, 2017 at 10:31 AM, Julia Lawall julia.lawall@lip6.fr wrote:
On Tue, 10 Oct 2017, Levin, Alexander (Sasha Levin) wrote:
(Cc'ed Julia)
On Mon, Oct 09, 2017 at 09:33:01AM -0700, Laura Abbott wrote:
On 10/06/2017 08:10 PM, Levin, Alexander (Sasha Levin) wrote:
We are experimenting with using a neural network to aid with patch selection for stable kernel trees. There are quite a few commits that were not marked for stable, but are stable material, and we're trying to get them into their appropriate kernel trees.
Apart from the practical side, which has been covered, I'd be interested in hearing about the details of how this works if you can share them.
This work is based on Julia's work (https://soarsmu.github.io/papers/icse12-patch.pdf) to identify commits that fix bugs.
Essentially, my approach to this is to extract as much information as possible from the commit, including things such as:
- How many times a certain word appeared in the message
- Who is the author
- Code metrics
- etc
In my case, I end up with about 30,000 of these "inputs", and train a neural network based on whether a given commit was included in a stable tree or not.
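To make the "inputs" idea concrete, here is a minimal sketch of how a commit could be turned into a feature vector. It is not the actual AUTOSEL code: the commit dictionary layout, the feature name prefixes, and the helper names `extract_features` and `vectorize` are all assumptions made for illustration, and the only "code metrics" shown are added and removed line counts.

```python
import re
from collections import Counter


def extract_features(commit):
    """Turn one commit into a sparse {feature_name: value} dict.

    `commit` is a hypothetical dict with "message", "author" and "diff"
    keys; a real pipeline would fill it from `git log` / `git show`.
    """
    features = Counter()

    # How many times each word appears in the commit message.
    for word in re.findall(r"[a-z0-9_]+", commit["message"].lower()):
        features["word:" + word] += 1

    # The author, encoded as a single on/off feature.
    features["author:" + commit["author"]] = 1

    # Very simple code metrics: added and removed lines in the diff,
    # skipping the "+++" / "---" file header lines.
    added = removed = 0
    for line in commit["diff"].splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    features["metric:lines_added"] = added
    features["metric:lines_removed"] = removed

    return dict(features)


def vectorize(feature_dicts):
    """Map sparse feature dicts onto dense rows over a shared vocabulary."""
    vocab = sorted({name for f in feature_dicts for name in f})
    index = {name: i for i, name in enumerate(vocab)}
    rows = []
    for f in feature_dicts:
        row = [0] * len(vocab)
        for name, value in f.items():
            row[index[name]] = value
        rows.append(row)
    return vocab, rows
```

Each row from `vectorize` would then be paired with a 0/1 label (was the commit included in a stable tree or not) and fed to whatever classifier is being trained. With every distinct word and author becoming its own column, it is easy to see how the input count could grow into the tens of thousands.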
This approach has a few drawbacks compared to the one Julia described in her paper:
- Not every bug-fixing commit ends up in stable (some end up in -rc, fixing a bug from the current merge window).
- Same as above, but for commits we miss and fail to add to stable.
- Sometimes commits get added to stable even though they don't follow the rules at all (security fixes are a simple example).
But it does seem to be effective at finding bug fixing commits that should be in stable.
At this stage we are still trying to figure out what a "bug fixing" commit really is. For example, an observation we recently made was that the code metrics actually don't have much weight in determining whether a commit should be in stable or not.
As we just started, I'm still experimenting with a few approaches, and I believe Julia is waiting for a new student to take over this, so we don't have any big insights to share just yet :)
That's a good summary of the current status. Thanks!
julia
I just started noticing the AUTOSEL tags yesterday, and I think tagging patches is a great idea. Was there any thought to also putting something in the commit message, so that they're easily identifiable in the git logs? It would be useful to have some metadata in the commit message identifying that the commit was selected by an automated system. That way, if I find a regression and it points to one of these commits, I would know that it may have been chosen incorrectly, and I could alert the owner of the selection script to help refine its selection process. Otherwise I'd have to track back through the mailing lists to see how it landed in the stable release.
Just a thought. Also, thank you for trying to improve the stable kernels!
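As a rough illustration of the metadata idea, the sketch below checks a commit message for a trailer line marking automated selection. The trailer name `Auto-selected-by` and the function `autosel_trailer` are purely hypothetical; nothing in this thread says such a trailer exists or what it would be called.

```python
# Hypothetical trailer name, invented for this example only.
HYPOTHETICAL_TRAILER = "Auto-selected-by"


def autosel_trailer(message, key=HYPOTHETICAL_TRAILER):
    """Return the trailer value if the last paragraph of `message`
    contains a "<key>: <value>" line, otherwise None."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return None
    # Trailers such as Signed-off-by live in the final paragraph.
    for line in paragraphs[-1].splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == key.lower():
            return value.strip()
    return None
```

With something like this in place, a bisect that lands on a stable commit could immediately tell whether the commit was picked by the script, and the same line would be searchable with `git log --grep`.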