On 10-11-2015 10:04, Grant Likely wrote:
On Tue, Nov 10, 2015 at 11:08 AM, Maxim Uvarov maxim.uvarov@linaro.org wrote:
On 10 November 2015 at 13:41, Zoltan Kiss zoltan.kiss@linaro.org wrote:
On 10/11/15 07:39, Maxim Uvarov wrote:
And it looks like the problem is in OVS, not in ODP. That is, OVS should be able to use library functions on the fast path (where inlines are critical): not just call odp_packet_len(), but move the whole OVS function into a dynamic library.
I'm not sure I get your point here, but OVS already allows using dynamic library functions on the fast path. The problem is that it's slow, because of the function call overhead.
I'm not familiar with the OVS code, but for example OVS has something like:

    ovs_get_and_packet_process() {
        /* here you use some inlines: */
        pkt = odp_recv();
        len = odp_packet_len(pkt);
        /* ... etc. */
    }

So clearly each target arch needs its own variant of the ovs_get_and_packet_process() function. That function should move from OVS into a dynamic library.
Which library? A library specific to OVS, or some common ODP library that everyone uses? In either case the solution is not scalable. In the first case it still requires the app vendor to ship a separate build for each and every supported target. In the second, it basically argues for all fast-path application-specific code to go into a non-app-specific library. That really won't fly.
I have two answers to this question. One for the short term, and one for the long.
In the short term we have no choice. If we're going to support portable application binaries, then we cannot do inlines. ODP simply isn't set up to support that. Portable binaries will have to take the hit of doing a function call each and every time. It's not fast, but it *works*, which at least will set a lowest common denominator. To mitigate the problem we could encourage application packages to include a generic version (no-inlines, but works everywhere) plus one or more optimized builds (with inlines) and the correct binary is selected at runtime. Not great, but it is a reasonable answer for the short term.
For the long term, to get away from per-platform builds, I see two viable options. Bill suggested the first: use LLVM to optimize at runtime so that things like inlines get picked up when linked against the platform library. There is some precedent of other projects already doing this, so it isn't as far-fetched as it may seem. The second is to do what we already do in the kernel for ftrace: instrument the function calls and runtime-patch them with optimized inlines. Not pretty, probably fragile, but we do have the knowledge from the kernel of how to do it. All said, I would prefer an LLVM-based solution, but investigation is needed to figure out how to make it work.
The LLVM JIT approach will require a lot of engineering work on the ODP side. Currently LLVM provides two JIT engines: MCJIT and ORC (which is new in LLVM 3.7).
MCJIT works on 'modules': programs can either pass a C or IR file or use the API to create a module with multiple functions. The JIT engine will then build an ELF object and load it into the process address space. It is essentially an AOT JIT.
ORC stands for 'On Request Compilation'; it differs from MCJIT in that it aims at lazy compilation using indirection hooks: a function won't be JITted until it is called. [1]
In any case you won't get inline speed if you decide to just JIT the inline calls; they will still be indirect calls to the JITted functions. Neither engine supports patchpoints, which is what the kernel uses to dynamically rewrite code with specific instructions.
If you want to actually change the code dynamically you can try the DynamoRIO [2] project, which aims to provide an API for doing so. However, it is aimed at instrumentation, so I am not sure how well it works for performance-sensitive projects.
I would suggest, instead of focusing on dynamic code generation for such inlines, working on more general functions that are actually called through either the PLT or indirection, and creating a dispatch that is selected at runtime.
You can follow the GCC strategy of doing indirect calls (the __builtin_cpu_supports() builtin, which OpenSSL emulates as well) or, since it is a library, use IFUNC on the PLT calls (like glibc does for memory and math operations). With current GCC you can build different versions of the same function and add an IFUNC resolver to select the best one at runtime.
[1] http://article.gmane.org/gmane.comp.compilers.llvm.devel/80639 [2] http://www.dynamorio.org/
g.
On 10 November 2015 at 02:50, Bill Fischofer <bill.fischofer@linaro.org> wrote:
Adding Grant Likely to this chain as it relates to the broader subject of portable ABIs that we've been discussing.

On Mon, Nov 9, 2015 at 4:48 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
On Mon, Nov 9, 2015 at 2:39 PM, Bill Fischofer <bill.fischofer@linaro.org> wrote:
> The IO Visor project appears to be doing something like this with LLVM and JIT constructs to dynamically insert code into the kernel in a platform-independent manner. Perhaps we can leverage that technology?
GCC has some experimental JIT support (https://gcc.gnu.org/wiki/JIT), but I think it would be a lot of work to use it, and I don't know how stable it is. The LLVM support is probably more advanced.
Jim
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain