On 10-11-2015 10:04, Grant Likely wrote:
On Tue, Nov 10, 2015 at 11:08 AM, Maxim Uvarov maxim.uvarov@linaro.org wrote:
On 10 November 2015 at 13:41, Zoltan Kiss zoltan.kiss@linaro.org wrote:
On 10/11/15 07:39, Maxim Uvarov wrote:
And it looks like the problem is in OVS, not in ODP. That is, OVS should be able to use library functions on the fast path (where inlines are critical): not just call odp_packet_len(), but move the whole OVS function into a dynamic library.
I'm not sure I get your point here, but OVS already allows using dynamic library functions on the fast path. The problem is that it's slow, because of the function call overhead.
I'm not familiar with the OVS code, but for example OVS has something like:

    ovs_get_and_packet_process() {
        /* here you use some inlines: */
        pkt = odp_recv();
        len = odp_packet_len(pkt);
        /* ... etc. */
    }

So clearly each target arch needs its own variant of the ovs_get_and_packet_process() function. That function should move from OVS into a dynamic library.
Which library? A library specific to OVS, or some common ODP library that everyone uses? In either case the solution is not scalable. In the first case it still requires the app vendor to ship a separate build for each and every supported target. In the second, it basically argues for all fast-path application-specific code to go into a non-app-specific library. That really won't fly.
I have two answers to this question. One for the short term, and one for the long.
In the short term we have no choice. If we're going to support portable application binaries, then we cannot do inlines. ODP simply isn't set up to support that. Portable binaries will have to take the hit of doing a function call each and every time. It's not fast, but it *works*, which at least will set a lowest common denominator. To mitigate the problem we could encourage application packages to include a generic version (no-inlines, but works everywhere) plus one or more optimized builds (with inlines) and the correct binary is selected at runtime. Not great, but it is a reasonable answer for the short term.
For the long term, to get away from per-platform builds, I see two viable options. Bill suggested the first: use LLVM to optimize at runtime so that things like inlines get picked up when linked against the platform library. There is some precedent of other projects already doing this, so it isn't as far-fetched as it may seem. The second is to do what we already do in the kernel for ftrace: instrument the function calls and runtime-patch them with optimized inlines. Not pretty, probably fragile, but we do have the knowledge from the kernel of how to do it. All said, I would prefer an LLVM-based solution, but investigation is needed to figure out how to make it work.
The LLVM JIT approach will require a lot of engineering work on the ODP side. Currently LLVM provides two JIT engines: MCJIT and ORC (which is new in LLVM 3.7).
MCJIT works on 'modules': programs can either pass a C or IR file or use the API to create a module with multiple functions. The JIT engine will then build an ELF object and load it into the process address space. It is essentially an AOT JIT.
ORC stands for 'On Request Compilation'; it differs from MCJIT in that it aims at lazy compilation using indirection hooks: a function won't be JITted until it is called. [1]
In any case you won't get inline speed if you decide to just JIT the inline calls; they will still be indirect calls to the JITted functions. Neither engine supports patchpoints, which is what the kernel uses to dynamically rewrite code with specific instructions.
If you want to actually change the code dynamically you can try the DynamoRIO [2] project, which aims to provide an API for doing so. However, it is aimed at instrumentation, so I am not sure how well it works for performance-sensitive projects.
I would suggest, instead of focusing on dynamic code generation for such inlines, working on more general functions that are actually called through either the PLT or indirection, and creating a dispatch that is selected at runtime.
You can follow the GCC strategy of doing indirect calls (the __builtin_cpu_supports() builtin, which OpenSSL emulates as well) or, since it is a library, use IFUNC on the PLT calls (like glibc does for memory and math operations). With current GCC you can build different versions of the same function and add an IFUNC resolver to select the best one at runtime.
[1] http://article.gmane.org/gmane.comp.compilers.llvm.devel/80639 [2] http://www.dynamorio.org/
g.
On 10 November 2015 at 02:50, Bill Fischofer <bill.fischofer@linaro.org> wrote:
Adding Grant Likely to this chain as it relates to the broader subject of portable ABIs that we've been discussing.

On Mon, Nov 9, 2015 at 4:48 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
On Mon, Nov 9, 2015 at 2:39 PM, Bill Fischofer <bill.fischofer@linaro.org> wrote:
> The IO Visor project appears to be doing something like this with LLVM and JIT constructs to dynamically insert code into the kernel in a platform-independent manner. Perhaps we can leverage that technology?
GCC has some experimental JIT support (https://gcc.gnu.org/wiki/JIT), but I think it would be a lot of work to use it, and I don't know how stable it is. The LLVM support is probably more advanced.
Jim
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain