
News

Posted over 12 years ago
It's time for another OpenHouse at the OpenGamma HQ! We've been wanting to organise another OpenHouse for a while now, but our growing client base has kept us extremely busy. However, with the impending 1.1 release and our recent Series C fundraise, we have a lot to share with you. Our doors will open at 6 pm on Thursday, 20th September, with demos, pizza and beer, as usual.

Our platform engineers are currently putting the finishing touches on the 1.1 release of the OpenGamma Platform, which will be out this month. Some of the key new features that we'll be showcasing include:

- UI improvements
- New command line management tools
- Improved dependency graph building functionality
- ...and more

We'll also be discussing what the recent Series C fundraise means for you as a community. Come along to meet the team, see the Platform in action, and ask any questions you may have!

You can just come by our offices in Southwark after work on Thursday 20th September, but we'd like you to register on Eventbrite so that we have a rough idea of who's coming and how much food and drink to order. We hope to see as many of you as possible! Register now.

All images from the OpenGamma OpenHouse in May 2012 - (c) 2012 OpenGamma Limited.
Posted over 12 years ago
It's been over 18 months since I wrote about OpenGamma's last round of funding, but we've just closed our third round of equity funding, led by ICAP. The official joint OpenGamma/ICAP press release has the relevant details, but I wanted to talk a little bit more about what this means to us as a company and you as a community.

About the Raise

To start with, we've had some questions over the years about why we've raised over $23 million in investment to date. For those of you who come from the financial services industry, it should be obvious: the software we develop is very large, very complex, and requires a lot of people with very deep and in-demand skillsets to build and test it. Without investors (such as Accel Partners and FirstMark Capital, who funded our Series A and B investments, and ICAP and Euclid Opportunities, who joined on the Series C) who share our vision of radical disruption in the quantitative finance market, there's no way we would have been able to build a system as comprehensive and high quality as the OpenGamma Platform.

In addition, for this round of funding we chose to go with a strategic investor rather than a traditional venture capital firm. ICAP is the world's leading interdealer broker and provider of post-trade services, and was already a market data partner of ours (a forthcoming version of the OpenGamma Platform will natively support their market data products, much like we already do for Bloomberg, Thomson-Reuters, ACTIV Financial, Tullett-Prebon, and others). We're excited to deepen this relationship, and we think Mark Beeston will be a fantastic addition to the OpenGamma Board of Directors.

However, that doesn't mean we're losing any of our independence. Both ICAP and OpenGamma realize that our customers greatly value the independence and neutrality we provide for the whole industry. We will not be sacrificing any of the areas that customers want us to support in the entire ecosystem of financial services technology and services:

- For market data, we will continue to support and in some cases resell data from the major industry aggregators (e.g. Bloomberg, Thomson-Reuters, ACTIV) as well as the primary sources (including IDB competitors to ICAP such as Tullett-Prebon).
- We will continue to provide trade processing bridges for all counterparties and fund administrators that our customers find valuable, regardless of their competitive positioning relative to ICAP's offerings.
- We will continue to push integration with as many back-office and portfolio management systems as possible.

In short, OpenGamma customers and open source community members get the best of all possible worlds from this investment:

- Sufficient funds to ensure OpenGamma is able to execute on our vision of providing an open source de facto standard for quantitative finance;
- A funding partner who sits at the heart of this industry and has seen every other technology in this space; and
- A commitment to independence and neutrality for the benefit of every single capital market participant.

Where We've Been

In the last year, we've had some pretty significant milestones: we released version 1.0 of the OpenGamma Platform; we were selected as a finalist for the Red Herring Europe awards; we won the Top Innovator award at the SWIFT Innotribe Startup Challenge; and we were selected as one of the 8 Technology Firms of the Future by Risk Magazine. But to me, the most important milestone was moving our first commercial customers from evaluation into pilot and now into full production installations.
Some of the largest and most sophisticated capital markets participants are now using the OpenGamma Platform 24/7 to manage some of the most complex derivatives portfolios in the world. And our support offerings have delighted them. So all in all, it's been a pretty darn good year. But we're not slowing down. In fact, we're accelerating.

Where We're Going

This investment lets us hit the gas on some areas we know are of significant importance to our open source community and our commercial customers:

- We're hiring. Significantly. Across our entire workforce: sales, technical services, and core R&D (in both our Platform and Quantitative Development teams). If you've thought about working for us in the past, I can't think of a better time to join the team.
- We're expanding our geographical coverage. In addition to our worldwide and North American headquarters in London and New York, we'll be opening a number of regional offices over the next few years, making sure that customers have the comfort of someone who can come in at short notice, and ensuring that we can provide 24x7 support.
- We're enhancing our product offerings. Examples of areas where you can expect product innovation over the next few years include asset class coverage, GUI tools, pre-built risk models, and trade data connectors. In addition, the market has asked us to start providing hosted services, and we've been listening. We'll start to roll those out this year, and continue in 2013.
- We're adding to our service offerings. I've already talked about more support coverage, but we'll also be adding new documentation sets, training packages, and monitoring services. We won't rest until you can be up, running, and productive using the OpenGamma Platform as fast as possible, and until you only need to focus your IT spend on what is unique to your firm and your trading and risk style.

I'd like to thank everybody for their support in building the first full, production-grade, open source platform for quantitative finance. But if you think what we've done is impressive, wait until you see what's coming next...
Posted over 12 years ago
Risk Magazine is celebrating its 25th anniversary this month. As part of the celebrations, Risk's editorial team has published a special supplement, Firms of the Future. The supplement examines how the financial services industry - and risk management in particular - is expected to change over the next five years, and identifies the organisations Risk believes will have the biggest impact on the OTC derivatives market.

OpenGamma is proud to have been selected as one of the eight Technology Firms of the Future. With more and more hedge funds and investment banks exploring open source alternatives to proprietary tools, we believe we are ideally positioned to disrupt the way markets view risk management and analytics technology over the coming years. Of course, we would like to see more open source technologies on the list. I am sure that will be the case in future rankings.

You can download an extract of the Firms of the Future supplement below; for the full listings, head over to Risk (subscription required). Risk Magazine: Firms of the Future - OpenGamma (PDF)
Posted over 12 years ago
When I last worked at an Investment Bank, we were users of the Imagine Trading System (though running in-house, rather than via the Imagine cloud-based offerings), and we ran billions of notional dollars of derivatives contracts through it. It's with that history and context that I'm welcoming the Imagine Financial Platform to the financial landscape.

While I may dispute that Imagine is offering the industry the "first opportunity to build a community around a standard platform," I helped co-found OpenGamma with the belief that communities are critical to the development of the software that traders and risk managers use on a daily basis. Whether these communities are supporting developers or end-users, our industry works better when people are able to share knowledge, experience, and code.

I'm especially thrilled that Imagine is "opening [their] portfolio and risk management system." We here at OpenGamma believe in being Radically Open, and believe that openness is critical not only to the developer and end-user experience of any portfolio or risk management system, but also to the fostering of the communities that both Imagine and OpenGamma recognize are so important for trading in capital markets today. The more openness the better, we think!

With that in mind, I can't think of any better way to support the Imagine Financial Platform than for OpenGamma to join the Imagine Marketplace and help bring the benefits of the OpenGamma Platform and our wealth of standards-based and state-of-the-art functionality to Imagine users. The Imagine Marketplace already contains "downloadable open-source modules," and we're well aware of the number of OpenGamma Platform downloads and documentation visits from firms working with Imagine on a daily basis, so we're sure that Imagine Software is just as excited as we are about the chance to bring OpenGamma technology to Imagine users through the Imagine Financial Platform and Imagine Marketplace.

There aren't many details available about the APIs and exactly how to work with them, or how to join the Imagine Marketplace, but I'm sure we'll be able to figure it out soon. Once our application is processed we'll be able to update you with just how we'll be able to bring the OpenGamma Platform to the thousands of professionals at hundreds of firms using Imagine!
Posted over 12 years ago
What was said last

My last post was about the theory of algorithmic differentiation and how we implemented it at OpenGamma. This post is about the efficiency we obtained in practice for algorithmic differentiation computation of greeks. It also explains how we went one step further with respect to the standard implementation by analysing the model calibration question.

Efficiency in practice

The efficiency described previously is the theoretical efficiency; in practice the code may not be perfect, or some easier-to-maintain choices may be made in some places. What is the impact of practice on the theoretical figures mentioned above? Perhaps surprisingly, the results are not only very good, but in a lot of cases well below the theoretical upper bound. In this section we take three examples, in increasing level of sophistication. The numbers are obtained with the OpenGamma OG-Analytics library1 performing valuation of a large number of instruments.

One function used a lot in finance is the Black option price formula. The formula and its derivatives version are implemented in the class BlackPriceFunction. For that function, the computation times for the price only and for the price and its three derivatives (with respect to forward, strike and volatility) are (time for 1,000,000 prices):

Price: 170 ms
Price and 3 derivatives: 195 ms

This is a ratio of 1.15 to 1 (only 15% more for 4 numbers instead of 1). When analysing how such a result is possible, one has to remember that algorithm-specific information can be used. In this case the exercise boundary (often called d1) is an optimal boundary (in the sense that it gives the highest value to the price) and the derivative with respect to a variable at its optimum is 0. That information can be used to speed up the computation (there is no need to algorithmically compute a number we know is 0). With a similar implementation but without the algorithm-specific information, the time is 245 ms (a ratio of 1.45 to 1). The gain due to the optimisation is roughly 30% of the pricing time.
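To make this concrete, here is a minimal, self-contained sketch of an undiscounted Black call price returning its three derivatives in one pass. This is illustrative code written for this post, not the actual BlackPriceFunction implementation; it uses the Apache Commons Math 3.x normal distribution and relies on the standard identity F * pdf(d1) = K * pdf(d2), which is exactly the kind of algorithm-specific simplification described above.

import org.apache.commons.math3.distribution.NormalDistribution;

public final class BlackCallWithGreeks {
  private static final NormalDistribution NORMAL = new NormalDistribution();

  /**
   * Undiscounted Black call price and its derivatives.
   * Returns {price, dPrice/dForward, dPrice/dStrike, dPrice/dVolatility}.
   */
  public static double[] priceAndDerivatives(double forward, double strike, double volatility, double expiry) {
    double sigmaRootT = volatility * Math.sqrt(expiry);
    double d1 = Math.log(forward / strike) / sigmaRootT + 0.5 * sigmaRootT;
    double d2 = d1 - sigmaRootT;
    double nD1 = NORMAL.cumulativeProbability(d1);
    double nD2 = NORMAL.cumulativeProbability(d2);
    double price = forward * nD1 - strike * nD2;
    // Because d1 is an optimal exercise boundary, forward * pdf(d1) == strike * pdf(d2),
    // so the terms coming from the dependence of d1 and d2 on the inputs cancel and
    // the derivatives reduce to the expressions below.
    double dForward = nD1;
    double dStrike = -nD2;
    double dVolatility = forward * NORMAL.density(d1) * Math.sqrt(expiry); // vega
    return new double[] {price, dForward, dStrike, dVolatility};
  }
}

The four numbers come out of essentially the same intermediate quantities as the price itself, which is why the measured overhead above is only 15%.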
The second example is an interest rate derivative. The price of a swaption (with physical settlement) is computed in the Hull-White model. The methods are implemented in the class SwaptionPhysicalFixedIborHullWhiteMethod. The derivatives are with respect to all the rates in the yield curves used to compute the cash flows (2 curves, quarterly payments for 5 years = 40 sensitivities). The computation times are (time for 10,000 swaptions):

Present value: 90 ms
Present value and curve sensitivity (40 derivatives): 320 ms

Here, again, the ratio is below the theoretical upper bound: 3.5 to 1.

The last example relates to CMS spread options. The options are priced with a binormal model on swap rates with strike-dependent volatility and calibrated to CMS prices obtained by replication with a SABR smile. The methods are implemented in the class CapFloorCMSSpreadSABRBinormalMethod. In the example we use a 10Y-2Y spread. The times are (time for 100 spread options):

Present value: 40 ms
Curve sensitivity (80 derivatives): 235 ms
SABR risks (4 derivatives): 185 ms

In this case the ratio is not as good (but still 10 times faster than finite difference for the curve sensitivity). One of the reasons is that the sensitivities are computed by recomputing the numerical integral used in the replication.

One step further: calibration and implicit functions

One step that can be quite time consuming when pricing an exotic instrument is the model calibration. The process is as follows: the price of an exotic instrument is related to a specific basket of vanilla instruments; the price of those vanilla instruments is computed in a given base model; the complex model parameters are calibrated to fit the vanilla option prices from the base model (this step is usually done through a generic numerical equation solver); and the exotic instrument is priced with the calibrated complex model.

In the algorithmic differentiation process, we suppose that the pricing algorithms and their derivatives are implemented. We want to differentiate the exotic price with respect to the parameters of the base model. In the bump-and-recompute approach, this corresponds to computing the risks with model recalibration. As the calibration can represent an important fraction of the total computation time of the procedure described above, an alternative method is highly desirable. In general, the calibration process is not explicit, but done through numerical equation solving. We have only one set of parameters of the calibrated model: the set that matches the prices from a specific set of curve and base model parameters. There is no explicit algorithm that provides the calibrated model parameters as a function of the base model, and even less its adjoint algorithmic differentiation. In a recent paper (Henrard (2011)) I described a way to combine efficiently the calibration procedure with algorithmic differentiation through the implicit function theorem. With that approach the results are extremely good.
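In outline (the notation here is mine, chosen for this post rather than taken from the paper): if the calibration determines the complex-model parameters $\theta$ from the base-model parameters $p$ by solving $f(\theta, p) = 0$ (the vanilla prices match), and the exotic price is $V(\theta, p)$, then the implicit function theorem gives the total sensitivity without recalibrating:

$\frac{dV}{dp} = \frac{\partial V}{\partial p} + \frac{\partial V}{\partial \theta}\,\frac{d\theta}{dp}, \qquad \frac{d\theta}{dp} = -\left(\frac{\partial f}{\partial \theta}\right)^{-1}\frac{\partial f}{\partial p}$

The Jacobian $\partial f/\partial\theta$ is already available from the derivative versions of the vanilla pricing algorithms, so the extra cost is essentially a small linear solve rather than a full recalibration per bumped input.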

The most telling results of the paper are reproduced below. The prices of amortised swaptions are computed in a Libor Market Model calibrated to a set of SABR models for vanilla swaptions of different tenors. The sensitivities computed are the sensitivities of the price with respect to the curves and to the SABR model parameters. The amortisation scheme for the swaption is annual, so for each year there is an underlying SABR model for the vanilla swaptions with its own set of parameters, and we would like to compute the sensitivity of the final price to those initial inputs, with the understanding that the sensitivity is the total sensitivity, including the calibration mechanism.

The table below is for a 10-year swaption with annual amortisation and a semi-annual floating leg in a multi-curve setting. There are around 40 interest rate sensitivities (2 curves for a 10-year instrument with 2 payments a year) and 30 SABR parameters (10 calibration instruments with 3 SABR parameters each). The results are (time in seconds for 250 swaptions; the "Risks" time is the additional time, on top of the value time, required to obtain the risk type):

Risk type | Approach | Value | Risks | Total
SABR | Finite difference | 0.425 | 30 x 0.425 | 13.175
SABR | AAD and implicit function | 0.425 | 0.075 | 0.500
Curve | Finite difference | 0.425 | 42 x 0.425 | 18.275
Curve | AAD and implicit function | 0.425 | 0.315 | 0.740
Curve and SABR | Finite difference | 0.425 | 72 x 0.425 | 31.025
Curve and SABR | AAD and implicit function | 0.425 | 0.320 | 0.745

The same results can be computed for swaptions with different tenors. In Figure 1, we graphically represent the results for tenors between 2 and 30 years. In our implementation, the time ratio for all rate risks and parameter risks is below 2.5 in all cases (and often below 2), while in the finite difference case the ratio grows linearly with the tenor, ranging from 14 (2 years with 7 risks per year) to 210 (30 years with 7 risks per year). At the extreme, the computation time is divided by almost 100.

Figure 1: Computation time ratios (price and sensitivities time to price time) for the finite difference and AAD methods. The AAD method uses the implicit function approach. Figures for annually amortised swaptions in a LMM calibrated to vanilla swaptions in SABR.

Figure 2: Detail of the previous picture, centred on the AAD method results.

What about higher order?

Higher order derivatives can also be useful; unfortunately algorithmic differentiation does not extend easily to them. One can take the derivative of the derivative. But for an n-input, 1-output function at a cost of 1, the first order will be an n-input, n-output function at a cost of 4 (using AAD), and the second order an n-input, n x n-output function at a cost of 4 x n x 4, which is not faster than applying a finite difference mechanism to the first order derivatives at a cost of n x 4 (one-sided; 2 x n x 4 for a symmetrical difference ratio). We have generally not implemented second order derivative mechanisms at OpenGamma. The exceptions are the Black and the SABR formulas. In the SABR extrapolation, to obtain a probability distribution without point mass, we extrapolate the price with a continuous second order derivative; for this we need an efficient second order implementation. When we compute the derivatives of the extrapolated price with respect to the inputs, we in effect need a third order derivative with respect to some parameters (two orders for the smooth extrapolation and one order for the derivative). In that case we used explicit second order SABR derivatives and finite difference for the third order.
Conclusion

We have systematically implemented algorithmic differentiation (generally through the adjoint mode) throughout the OG-Analytics library. We already had a good code base, including some derivative building blocks, when we decided to formally use adjoint algorithmic differentiation. We decided to implement it through manually written code. The results we obtained are very good. We have also implemented some implicit-function-based methods to cope with model calibration. In that case the results are even better.

References

Giles, M. and Glasserman, P. (2006). Smoking adjoints: fast Monte Carlo Greeks. Risk, 19, 88-92.
Griewank, A. and Walther, A. (2008). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, second edition.
Henrard, M. (2011). Adjoint Algorithmic Differentiation: Calibration and Implicit Function Theorem. Journal of Computational Finance, to appear. Preprint version at http://ssrn.com/abstract=1896329.

Footnotes

1. The OpenGamma OG-Analytics Library is open source and available at http://developers.opengamma.com/.
Posted over 12 years ago
Quantitative finance and derivatives

In quantitative finance, most of the modelling and development time is spent on models and algorithms to compute present values from a set of inputs. The art is to obtain efficiently the value of the financial instruments from a (relatively large) data set (often between 10 and 200 data points). On the other hand, when it comes to CPU time, most of it is spent on computing the derivatives of the value with respect to the inputs: the so-called greeks.

A traditional way to compute them is to use a finite difference, or "bump and recompute", approach: each input is bumped by a small amount, and the value is recomputed. The derivative ratios are computed from the bumped values and the initial value. This is a very simple approach: you only need the initial algorithm and a loop to bump all the inputs. Unfortunately, it is usually not an efficient algorithm. For each input you need to recompute the value; for a 50-input problem, the time required to compute the first order derivatives is 50+1 times the value time. Moreover, some numerical stability issues can lead to poor quality results.

In this blog post, I'll quickly describe the theory of Algorithmic Differentiation and how we implemented it at OpenGamma. In the second part (to follow next week), I'll describe the efficiency obtained in practice and then go one step further than the standard implementation in finance and describe how to cope with model calibration.

Theory

The theory of algorithmic differentiation is well known and has been used in computer science for a long time1. It is also called the art of differentiating programs. The idea is that the steps to compute the value are recorded and 'replayed' with the required modifications to obtain the derivatives. From a mathematical point of view, the technique uses the chain rule on a repeated basis. Algorithmic differentiation comes in two flavours: the standard or forward mode, and the adjoint or reverse mode. The two differ in the direction in which the differentiation algorithm works: from the inputs to the outputs for the forward mode, and from the outputs to the inputs for the reverse mode.

In the sequel I measure the efficiency of derivative computation methods by the ratio of the time required to compute the value and all its derivatives to the time required to compute the value only. If the time ratio is 1, it means that computing the derivatives is free (only a dream, not an expectation); for finite difference, the ratio scales linearly with the number of inputs. The time ratio of the standard mode is less than twice the number of inputs (see Griewank and Walther (2008) for the details):

Cost(V+D) < 2 * number of inputs * Cost(V)

The time ratio of the adjoint mode is less than four times the number of outputs:

Cost(V+D) < 4 * number of outputs * Cost(V)

In the problems we face in finance, the input is usually of dimension between 10 and 200 and the output is of dimension 1 (the present value). The choice between the two flavours and the benefit are clear: using the adjoint implementation, the computation time can be divided by ten or more.

One of the first systematic implementations of the technique in quantitative finance, which sparked more research and development, was Giles and Glasserman (2006). Algorithmic Differentiation was used to compute the greeks in the Libor Market Model with Monte Carlo.
Monte Carlo simulations are usually relatively slow, and the Libor Market Model implementation requires the computation of path-dependent drift terms that makes them even slower. Obtaining a more efficient implementation of greeks computation in that context is certainly a big advantage.

Bottom-up

If the technique is so efficient, why is it not systematically implemented for the most time-consuming tasks? One of the reasons is certainly that you cannot start from the top and decide to apply it to your most complex task. The technique requires the computation of the derivative of each line of code. It means that each method used should have its derivative version; one has to start from the very bottom. Each method used in the complex task, down to the simplest interpolations, should have a derivative version before even starting to look at the top task. If the library has not been designed with that aspect in mind, it is a long and time-consuming task to redevelop the missing parts a posteriori. On the other hand, computing the derivative in an explicit way (as opposed to a finite difference way) is relatively natural for functions whose derivatives are used on a regular basis, and important building blocks may already be present, even if not under the name of Algorithmic Differentiation.

How we did it

The implementation of algorithmic differentiation can be done in several ways: manually writing the code; overloading methods to produce derivatives at run time; or automatic code generation from the original code. Our implementation is based on manually written code. There are several reasons for that. It gives us the flexibility to use financial domain-specific knowledge to simplify the computations when possible. Understanding the intuition behind some variables can lead to significant efficiency gains. As I will show below, this can lead to results significantly faster than the theoretical upper bound and a direct implementation. Our libraries are written in Java, which imposes some constraints on the implementation, in particular on overloading operators; the overloading approach was not a route we could take easily.

When we started to explicitly use the term Algorithmic Differentiation and put it in place for almost all of the pricing functions, we looked through the existing code. Most of the building blocks already had a derivative version. This was, in particular, the case for anything related to curve construction. Curve construction in a multi-curve environment, with enough flexibility to use any liquid instrument, naturally comes down to root-finding exercises with a large number of unknowns (easily 50 to 70). To solve those, an efficient gradient implementation is required; without it, in one form or another, the results would be quite poor. From there it is a matter of continuing to add the derivative versions for each new model. A priori this may seem like a lot of work, roughly doubling the amount of code. In practice, when the pricing code is cleanly written and fully tested, adding the differentiation version is relatively straightforward and quick, certainly a lot quicker than writing the original pricing code.
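As a flavour of what manually written adjoint code looks like, here is a minimal, self-contained sketch (written for this post, not taken from OG-Analytics): the value v = exp(-r t) (f - k) is computed step by step, and the steps are then replayed in reverse, accumulating "bar" values, to obtain all four derivatives in a single sweep.

public final class AdjointExample {
  /**
   * Value v = exp(-r * t) * (f - k) and its derivatives with respect to r, t, f and k,
   * computed with a forward value sweep followed by a reverse (adjoint) sweep.
   * Returns {v, dv/dr, dv/dt, dv/df, dv/dk}.
   */
  public static double[] valueAndDerivatives(double r, double t, double f, double k) {
    // Forward sweep: compute and keep the intermediate values.
    double w1 = -r * t;
    double df = Math.exp(w1);
    double w2 = f - k;
    double v = df * w2;
    // Reverse sweep: start from vBar = dv/dv = 1 and apply the chain rule backwards.
    double vBar = 1.0;
    double dfBar = w2 * vBar;   // v = df * w2
    double w2Bar = df * vBar;
    double fBar = w2Bar;        // w2 = f - k
    double kBar = -w2Bar;
    double w1Bar = df * dfBar;  // df = exp(w1), and d(exp(w1))/dw1 = exp(w1) = df
    double rBar = -t * w1Bar;   // w1 = -r * t
    double tBar = -r * w1Bar;
    return new double[] {v, rBar, tBar, fBar, kBar};
  }
}

Each line of the reverse sweep is the derivative of one line of the forward sweep, which is why the whole library has to be built bottom-up, with a derivative version of every method it uses.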
My next post will deal with the results obtained in practice and some extensions to the standard implementation in finance. For more information on algorithmic differentiation, see Adjoint Algorithmic Differentiation: Calibration and Implicit Function Theorem (PDF).

References

Giles, M. and Glasserman, P. (2006). Smoking adjoints: fast Monte Carlo Greeks. Risk, 19, 88-92.
Griewank, A. and Walther, A. (2008). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, second edition.

Footnotes

1. As a reference on the subject, I recommend Griewank and Walther (2008).
Posted over 12 years ago
What was said last

My last post was about how we want to test our maths libraries by accessing native library code with fuzzy data sets and then comparing results. The post before that was about how to access native code with minimum overhead. This post is about the actual implementation of our wrapper to native code that allows us access to native maths libraries.

What to wrap?

The prime candidates for wrapping are BLAS, LAPACK and SLATEC API calls; these libraries cover the majority of the core numerical methods, particularly in the field of linear algebra. All the libraries are written in Fortran 77, and the NETLIB reference libraries for BLAS and LAPACK are widely available and usually come in the form of ATLAS on Linux machines. Needless to say, I develop code solely on Linux variants and this is reflected in the following; however, there is no reason why these techniques couldn't be used on other OSes. First thing to note: we are aware of JBLAS, which wraps BLAS and LAPACK from ATLAS. However, we want something that supports multithreading (especially with regard to exception handling), uses faster access methods, and, importantly, is a flexible tool for wrapping other Fortran libraries.

How we do it

To begin with, we note that the source for the reference BLAS, LAPACK and SLATEC libraries from Netlib is organised with one function per file. So first we need to extract the type information and routine names from the files. As fun as writing an F77 parser would be, an easier method is to use GCC to parse the code and dump out the SSA tree, from which a bit of regex is all that is needed to extract the required information. This method also has the advantage of simply failing when called on Fortran code that isn't actually valid! For example purposes we'll use the "dgemv.f" code from BLAS. First run gcc:

#> gcc -c -fdump-tree-ssa dgemv.f

which gives a file called "dgemv.f.017t.ssa", the top of which contains:

;; Function dgemv (dgemv_)
dgemv (character(kind=1)[1:1] & restrict trans, integer(kind=4) & restrict m, integer(kind=4) & restrict n, real(kind=8) & restrict alpha, real(kind=8)[0:] * restrict a, integer(kind=4) & restrict lda, real(kind=8)[0:] * restrict x, integer(kind=4) & restrict incx, real(kind=8) & restrict beta, real(kind=8)[0:] * restrict y, integer(kind=4) & restrict incy, integer(kind=4) _trans)
{

which is exactly what is needed: all the type information is there in an easily parsable form. To make a convenient call to native DGEMV from Java, we need to have:

- Some C/C++ prototypes for the Fortran library calls
- Some C/C++ JNICALL code that moves/formats data and makes calls to the native Fortran libraries
- Some Java code that provides access to the C/C++ code once wrapped in a library
- Some Java signatures that describe the JNICALLs to Java

So we wrote some Python code to call GCC and extract the type information. The Python code contains maps between the various types in Fortran, C and Java, which can then be looked up when generating the items on the list above.

Some C/C++ prototypes for the Fortran library calls

The generated code declares the prototype of the Fortran DGEMV function to the C compiler. In Fortran all arguments to functions are passed by reference, hence everything is a pointer, and we need to account for this further upstream in the call chain. We also define DGEMV_F77 using the F77_FUNC macro to deal with variations in capitalisation and underscoring in function names used by different compilers.
Some C/C++ JNICALL code that marshalls data and makes calls to the native Fortran libraries

We first define a few macros to handle problems with getting data from the JVM. MACRO_CRITICALGETPOINTER() attempts to obtain access to the memory via (*env)->GetPrimitiveArrayCritical(). Should the memory access fail, MACRO_FAILEDPOINTERGRAB() throws a Java exception to indicate that a failure has occurred; should this fail, an exit code is given. Finally, MACRO_CRITICALFREEPOINTER() frees the pinning on the memory, or deallocates it if the memory was copied. The JNICALL is automatically generated based on the type information and the type maps defined in the Python code. Within the call we make sure arrays are passed with an additional parameter prefixed with _offset_ to emulate pointer behaviour from the Java side. This allows us to arbitrarily stride to a position in memory, much as would be done with an array slice in Fortran or a pointer offset in C.

Some Java code that provides access to the C/C++ code once wrapped in a library

For ease, we provide two methods for each call to a JNICALL function, one with the array offsets and one without (a sketch of both, together with the underlying native signature, follows at the end of this section). We have some cpp defines available to support locking and thread safety if needed. Those familiar with compiler preprocessors will note that Java doesn't have one! We are using GCC's CPP and some GNU SED in a GNU Make rule to pretend. We call the functions in Java by their true names and then pass the data through to the "wrapped" function of the same name that is available via JNI.

Some Java signatures that describe the JNICALLs to Java

Finally, and possibly the easiest bit, is just defining a signature for the "wrapped" call provided by the native library.
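The original listings are not reproduced here; the following is a minimal, illustrative reconstruction of what the Java side might look like for DGEMV. The class name, library name and abbreviated signature are only indicative of the pattern, not the actual OpenGamma code.

public final class BLASNative {
  static {
    // Load the JNI wrapper library; the library name here is illustrative.
    System.loadLibrary("ogblas");
  }

  // Java signature of the JNICALL provided by the wrapper library.
  // Arrays are accompanied by _offset_ parameters to emulate pointer arithmetic.
  private static native void dgemv_critical_(String trans, int m, int n, double alpha,
      double[] a, int _offset_a, int lda, double[] x, int _offset_x, int incx,
      double beta, double[] y, int _offset_y, int incy);

  // Convenience call with the offsets exposed, allowing an arbitrary stride into the arrays.
  public static void dgemv(String trans, int m, int n, double alpha, double[] a, int offsetA,
      int lda, double[] x, int offsetX, int incx, double beta, double[] y, int offsetY, int incy) {
    dgemv_critical_(trans, m, n, alpha, a, offsetA, lda, x, offsetX, incx, beta, y, offsetY, incy);
  }

  // Convenience call without offsets: start at the beginning of each array.
  public static void dgemv(String trans, int m, int n, double alpha, double[] a, int lda,
      double[] x, int incx, double beta, double[] y, int incy) {
    dgemv_critical_(trans, m, n, alpha, a, 0, lda, x, 0, incx, beta, y, 0, incy);
  }
}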
If only that were all...

This phrase is here for people looking for how to "Call Java from native code when the environment pointer is not available" and coming in from search engines. What we have created above is a method with which to call the native libraries. What we need now is something to load said libraries, and this is provided by the System.loadLibrary() function. We also need to address a rather important problem. The Fortran BLAS and LAPACK routines have their own error-handling mechanism provided by the function XERBLA. This function contains the Fortran keyword "STOP", which does what it says: it stops execution and exits. If a bad call is made to a native maths library function which results in the thread running XERBLA, the thread exits and takes the JVM with it. So we need to provide an override, our own version of XERBLA, which handles the error when called and does not exit. Thankfully, all calls to XERBLA in the native libraries are followed by the "RETURN" keyword, which causes the thread to immediately exit the function and return to the caller. Therefore, if we manage to override XERBLA, we can continue to run in the JVM. Overriding a function when creating a wrapping library is relatively simple: one forces the link order so that a library containing the override function is linked before the library that is being wrapped. Then, when the wrapping library is dynamically loaded and the symbol is looked up, the overriding library's symbols take preference. This all seems fine; however, when XERBLA is called, the thread making the call is embedded deep in the land of the native maths library and so has no knowledge of the JVM, which is rather problematic were one to want to throw an exception.

The way around this is to globally cache the JVM pointer in the library containing the XERBLA override function via the JNI_OnLoad() function mechanism. The overriding library holds a global JavaVM pointer cache, *JVMcache, and the JNI_OnLoad mechanism, which is called when the native library is loaded, is used to set it. Then, should XERBLA be called from within BLAS/LAPACK, our XERBLA override is called instead because of the link order. Within our XERBLA we use the global JavaVM pointer cache and attach the running thread back to the JVM to get access to the environment pointer, from which it is then trivial to "gracefully" handle the exception.

Anything else?

We automated all this code generation via a Python script. The whole lot was then put into an Autotools package. We wrote some new rules and macros to allow building and preprocessing Java code, which are rather useful for obvious reasons. The Java code was then zipped in a jar, and the libraries are provided at system level, accessed via the classpath variable.

Conclusion

We now have access, via a relatively safe mechanism, to native maths libraries. This allows us to test our code, and the code of others, against reference implementations directly from Java. This means writing code to automate fuzzy data testing is possible, as comparisons can easily be made, making our maths libraries more robust and heavily tested.
Posted over 12 years ago
I read with interest Leslie Spiro's post on the Finextra blog today titled "How important is the source in Open Source". I think Leslie makes a number of very good points as to why we're starting to see a wave of Open Source projects springing up which directly target the financial services industry:

- Open Source, with good governance and a commercially-friendly license (like the Apache Public License that we release the OpenGamma Platform under), minimizes vendor lock-in for the benefit of customers.
- Widespread adoption of common technologies makes recruitment easier by broadening the pool of candidates with plug-and-play skillsets.
- Both of those factors put together lead to significantly lower overall cost to financial services firms, critical in today's era of permanently reduced margins.

However, Leslie ends with a statement that I don't agree with: "So in summary, actual access to the source code does not really make a difference to customers."

Until I co-founded OpenGamma, I spent my entire professional career working as a software engineer (the last 7 years of it in financial services), and access to the source was critical to my being able to do my job. And when I worked for software vendors in Silicon Valley, it was critical to my customers and partners as well.

We work in an industry which requires extensive integration efforts on the part of almost every firm. Market data systems, trading systems, risk systems, back-office systems, partner systems: all of these require integration effort. And our end-users, traders, always have some form of custom work that they need, whether it's custom screens, custom analytic methods, or custom trading models. That requires a lot of integration. And integration is where access to the source code becomes critical for developers. I could write about this subject for days, but I just wanted to pull out a few anecdotes of where it's proven critical to me and firms I worked with.

When I worked at WebLogic on the WebLogic Server EJB container, we were the first EJB container (this is all before J2EE actually existed, mind) that had third-party support for pluggable container-managed persistence (so you could use your favorite ORM system within WebLogic Server). Although I was under instructions to only submit pre-release jar files to our partners for their testing, as these were pre-release, there were inevitably bugs and areas that weren't documented as well as they could have been. These partners then actually decompiled the code so that they knew exactly what WLS was doing under the hood, in order to make sure their implementation against our SPIs was operating successfully and to speed up my responses to their support requests. They did this because access to the source code radically sped up their own development efforts, resulting in a significantly faster, cheaper, and better integration effort.

At the last bank I worked for, we were a customer of a commercial MOM system. Although this system supported the open JMS standard, we had a performance problem that only showed up under very particular conditions. Although I had been working with the support engineers from the vendor, the turnaround time was at least 24 hours for each request, and I wasn't getting anywhere. So I decompiled their Java client library, and it turned out that there was an inefficient data structure being used for the message headers. I very rapidly changed our code and achieved a 30% application performance speedup within 12 hours of decompiling.
The source code was critical to me being able to do my job, and resulted in a faster system with significantly less effort.

At the same firm we were using a commercial trading system. Like many customers, we had heavily customized the trading system, with integration with our custom analytics library, custom security types that the vendor didn't support, and a host of custom reports. While this was a mature system with very few bugs, we found that once we hit the wall of the black-box portions of the system, our development speed slowed down. Were we making the right calls? Did we understand the performance-critical nature of the system? What happened when we needed to include a new library? What compiler flags were required? In the end the firm paid over 10 times our initial license cost for access to the source code for the system; I know several other firms that did the same thing. For us, to speed up and harden our integration efforts, we paid 10 times as much for the source code as we did for the binary version.

OpenGamma's customers are already finding that access to our source code is important for integration, even though they're not modifying our code themselves. One large hedge fund customer, in a matter of weeks, had connected the OpenGamma Platform to three internal data sources, connected 5 of their internal custom trading and analytic models to our calculation engine, and written custom Excel functions connecting to their own proprietary Java libraries. Without being able to step through the whole system in Eclipse, this process would have taken considerably longer, no matter how good the documentation was. Given what programmers at financial services firms cost, this resulted in clear and measurable savings to the firm.

While all of these examples are anecdotal, one only needs to take a look at broad industry trends to see that Open Source systems are the go-to friends of the developer:

- Tomcat and Jetty have more paid-support customers than any commercial but closed-source implementations of the Servlet and JSP specifications;
- JBoss is far and away the most commercially successful J2EE implementation;
- Spring is the dominant application framework today for commercial Java developers;
- Every major NoSQL and NewSQL system is Open Source: MongoDB, Cassandra, Voldemort, Redis, Riak; the list goes on and on.

Developers have moved, and are moving, with their feet. While open standards, vendor neutrality, and ease of recruitment are undoubtedly important parts of an Open Source ecosystem, to developers, access to the source code really DOES make a difference.
Posted almost 13 years ago
What was said last

My last post was about how to access native libraries with minimal performance drop. It was seen that JNA provided the easiest method in terms of writing code, but to get the real performance, critical JNI calls were needed. All this was in order to have a base against which we can test our maths libraries, and it is some discussion of how we test our maths libraries that forms the content of this post.

To reiterate, the problem we encountered was that whilst using singular value decompositions (SVD), provided by Apache Commons Math 2.2 and Colt, we found that in some cases we were getting different results for the same input matrix ($A$). The differences could not be accounted for by floating point error causing the vectors in the null space of $A^T$ to form a correct but different basis. The matrices tested were not singular to machine precision either, in which case floating point error counts for a lot - these matrices were just poorly conditioned, but not pathological. So in an attempt to work out what was going on, we developed our own SVD implementation, and that gave results closer to those of Colt. The latest snapshot versions of Apache Commons Math also give a result similar to our SVD and the Colt SVD, which means they must have changed algorithms. So, this all raised the rather massive questions of "What is the right answer?" and "How can we check that our code is correct and always gives the right answer?"

Ways of testing

In the world of non-mathematical algorithms, unit tests and code coverage are often sufficient to demonstrate that a piece of code works. But in the world of maths, because the data ranges can often by definition fill the entire range of floating point types, and floating point considerations are rather complicated (especially in iterative methods), a different approach is needed. A few testing methods are:

- Pathological cases. Use prior knowledge of what's likely to trip algorithms up and invent pathological data sets for unit tests. This method has its place, but is time consuming and can only possibly test a tiny subset of cases.
- Validation by reconstruction. For a large number of algorithms the original inputs, or some result based on them, can be reconstructed from the results of running the algorithm. This gives a simple check that the results are in the "right sort of area" by comparing p-norm errors or similar on the reconstructed data against the original. (A sketch of this approach follows below.)
- Comparison. Another approach is to compare against work-hardened reference implementations of the algorithms, which usually come from native code libraries - hence the last post investigating ways of accessing these.
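As an illustration of validation by reconstruction, the sketch below checks an SVD by rebuilding the input matrix from its factors and measuring the norm of the difference. It is written against the Apache Commons Math 3.x API purely for illustration; it is not the harness we actually use.

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.SingularValueDecomposition;

import java.util.Random;

public final class SvdReconstructionCheck {
  /** Returns the Frobenius norm of A - U * S * V^T, which should be close to machine precision. */
  public static double reconstructionError(RealMatrix a) {
    SingularValueDecomposition svd = new SingularValueDecomposition(a);
    RealMatrix rebuilt = svd.getU().multiply(svd.getS()).multiply(svd.getVT());
    return a.subtract(rebuilt).getFrobeniusNorm();
  }

  public static void main(String[] args) {
    Random rng = new Random(42);
    // Fuzz: random matrices rather than hand-picked cases.
    for (int trial = 0; trial < 100; trial++) {
      double[][] data = new double[5][7];
      for (int i = 0; i < 5; i++) {
        for (int j = 0; j < 7; j++) {
          data[i][j] = rng.nextGaussian();
        }
      }
      double err = reconstructionError(MatrixUtils.createRealMatrix(data));
      if (err > 1e-9) {
        throw new AssertionError("Suspicious SVD reconstruction error: " + err);
      }
    }
  }
}

Reconstruction alone only tells us the factorisation is consistent, not that the factors match a reference; that is where the comparison approach comes in.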
Testing whilst developing

At OpenGamma, we like GNU Octave; in fact, we think it is great. It is also conveniently built with the backing of ATLAS/BLAS/LAPACK/Suitesparse/qrupdate/FFTW amongst others, and as a result can, with little effort, be scripted to provide IO for results and test comparisons for our code. Most conveniently, this can be performed using the Java package from Octave Forge, which allows the instantiation of Java objects in Octave code and therefore makes testing Java code rather easy; to demonstrate that this isn't OpenGamma trickery, it works just as well for accessing the Apache Commons Math library and its linear algebra classes. So whilst developing, we have an Octave instance running with a harness to whatever method it is we are working on.

As algorithm development is undertaken, continuous tests against fuzzy data sets are performed, and the results from Java are compared to the results from Octave (which are largely backed by work-hardened native libraries). If the results match for a very large number of fuzzy cases, then we become more convinced that we've got the algorithm correct!

This is all well and good; however, a problem occurs when it comes to ongoing testing. We use the TestNG test suite, so integration with this would be ideal, which means Java ideally has to make the calls (we know we could generate the TestNG format output etc. from Octave, but that doesn't bode well for continuous integration testing). On this basis we could programmatically hook into Octave's driving library, libOctave, with something like JNA as seen in my last post, or we could even instantiate the Octave interpreter to make calls from Java. However, this would cause license taint, as Octave is universally GPL2+, and we are predominantly Apache 2.0. Therefore, despite the ease of development Octave gives us, we can't actually release the code as part of our distro! Regardless of this, for internal development purposes it is very useful to have Octave fired up against which we can test our developmental Java code.

Continuous testing

As mentioned earlier, we'd like to do continuous testing of our maths library by fuzzing data and then comparing the output with results from native libraries. This means the native library calls have to be wrapped, and we have to have compatible APIs at some level in the code. The method for wrapping these libraries and dealing with their quirks and thread safety is the subject of my next post.
Posted almost 13 years ago
What was said last

In my previous blog post I wrote about sparse matrix support in the OG-Maths library and how we are working on sparse direct decompositions. This is still true, and they are still in the pipeline. Whilst writing these sparse algorithms we found that having a reference implementation of equivalent dense matrix code massively aided debugging. So we wrote dense implementations of LU and Singular Value (SVD) decompositions, these being archetypal of the two fundamental decomposition strategies: direct permutation and orthogonal transform.

Today's questions...

All went well until it came to the question of how we test the code, and the SVD results in particular. The Colt and Apache Commons Math SVDs often gave different answers (in cases where numerical issues wouldn't be present). Clearly, both couldn't be right! So we asked the question: how can we test our SVD code? Mathematically speaking we can do some testing by checking against known results, but that doesn't really catch the edge cases where the more difficult problems invariably occur. So we asked another question: could we compare to the work-hardened LAPACK/BLAS system libraries, and if so, how? It is these questions that form the basis for an article tuple:

- How to get fast native access (this post)
- How we test OG-Maths
- Caveats on writing JNI code and thread safety

So we face the problem of wanting results from decent implementations of LAPACK/BLAS obtained through Java calls, which clearly screams for native interfacing via the Java Native Interface (JNI). The following question was then asked: what is the penalty of jumping through the JNI to native code for the type of calls we are making? We therefore set up a simple test on the standard y := Ax BLAS2 operation (see this post for details) called via the following: JNA, JNI non-critical access, JNI critical access, Java and, as a reference, direct calls from C code. Within this test we were looking for two things: ease of use (and therefore, by implication, faster development), and raw performance in terms of the time taken to perform the call.

A little bit about each method:

JNA, Java Native Access, is a massively useful, very easy-to-use interface to native libraries. The SLOC count for using this library and the effort required by the developer are really quite small; however, this ease comes with an apparent time penalty. Data pushed through JNA calls appears to be heavily marshalled, and as a result a lot of work would need to be undertaken on the native side to amortise the cost of passing the data.

JNI is the classic Java native interface. Within it there are a couple of ways to grab data from the JVM from native code. The first is via the nativePtr = (*env)->Get<Java BIGNUM Type>ArrayElements(); family of calls, which may or may not (in practice, most pretty much always do) make copies of the data requested from the JVM and return a pointer to the copy, as opposed to accessing the version the JVM has. Obviously, working on a copy of the data is thread safe until the time comes to publish the results of the native code (if applicable) back to the JVM memory space, in which case the native memory clobbers what is held equivalently by the JVM. The second JNI method is via the critical family of calls, (*env)->GetPrimitiveArrayCritical();, which allow (potentially) direct access to the data held by the JVM.
There are two caveats with using these calls: 1) it only potentially allows direct access - it might return a copy, and definitely will if type conversion is needed; and 2) the critical state cannot be held or persisted for "an extended period of time" (more info here) without calling the release functions. This method is only available if the OS on which the JVM is running supports memory pinning (most do!).

Set up

We use the same harness and system as described in this post and test the y := Ax operation again. We ignore column- vs. row-major storage differences to ensure the level of work is the same, and we aren't really bothered about the actual answer but rather the time/effort taken to get it. Therefore all data is assumed to be laid out in a manner that is conducive to just running the DGEMV call. We chose the following combinations of calling methods and both ATLAS (Netlib) and Intel MKL BLAS:

Calling code language | Calling method | BLAS supplied by
Java | Java | OGMaths BLAS
Java | JNA | Netlib Fedora 16 default build (ATLAS)
Java | JNI non-critical calls | Netlib Fedora 16 default build (ATLAS)
Java | JNI critical calls | Netlib Fedora 16 default build (ATLAS)
C | Direct | Netlib Fedora 16 default build (ATLAS)
Java | JNI non-critical calls | Intel MKL 10.3
Java | JNI critical calls | Intel MKL 10.3
C | Direct | Intel MKL 10.3

Note: the expected call from JNA to Intel's MKL could not be made to work despite numerous efforts with regard to altering LD_PRELOAD, LD_LIBRARY_PATH, -Djava.library.path and -Djna.library.path. Static linking and loading was also attempted and failed. One day, I have little doubt, we will get this to work, but given the time cost required we just wanted some results, and so abandoned it. We do not consider that the lack of this result detracts from our findings in any way, as they are replicated in full using Fedora 16 ATLAS BLAS.

Code

One of the criteria for choosing the right method to make the jump to native code was the ease of coding required to do so. JNA wins outright here: it can be nicely interfaced and handled rather cleanly with little effort.

JNA example

We first create an interface that extends the JNA Library class. For clarity we've not put in the full argument list of the dgemv_() call, and have missed out the try{}catch{} on failure to load:
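(The original listing isn't reproduced here; the following is an illustrative reconstruction, with the library name and the dgemv_ signature chosen for this post. JNA maps Fortran's by-reference scalars via its ByReference types and passes arrays directly.)

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.ptr.DoubleByReference;
import com.sun.jna.ptr.IntByReference;

public interface LAPACKLibrary extends Library {
  // Loads the system BLAS/LAPACK; the library name here is illustrative.
  LAPACKLibrary INSTANCE = (LAPACKLibrary) Native.loadLibrary("lapack", LAPACKLibrary.class);

  // Fortran passes everything by reference, so scalars become *ByReference types
  // and arrays map to Java arrays.
  void dgemv_(String trans, IntByReference m, IntByReference n, DoubleByReference alpha,
      double[] a, IntByReference lda, double[] x, IntByReference incx,
      DoubleByReference beta, double[] y, IntByReference incy);
}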
Then the call is made as LAPACKLibrary.INSTANCE.dgemv_(...); which is fantastically easy; there are even classes to aid with getting the right pointer types and mangling them coherently! Obviously the underscoring notation of the library needs to be accounted for, but that can be dealt with dynamically.

JNI example

On the other hand, calls through the JNI require considerably more effort than JNA. First, we write a wrapper to our native library call, for example public static native void dgemv_critical_(...);, from which the header containing the corresponding signatures can be generated with the javah binary. We then have to implement the name-mangled generated headers as C functions, wrap this code into a shared library, and write a static initialiser in our Java class (a System.loadLibrary() call) to load the native library wrapper we just created. Then, if we've done everything correctly, and the system library and Java class paths are correct, we should just have to write the call to the function name in our wrapper library from the Java code: dgemv_critical_(...); This is evidently a lot more effort than using JNA. However, in the case of LAPACK and BLAS it's likely the code could mostly be auto-generated (in fact, this is exactly what we do!).

Results

Running the same tests as in this post, we obtain the following performance-related results (the calls labelled "safe" are via the JNI non-critical calls; those labelled "unsafe" are via the JNI "critical" calls). What do we learn from this graph?

- As expected, Intel's MKL outperforms ATLAS BLAS.
- Our OGMaths Java implementation of the DGEMV call is about as fast as ATLAS BLAS.
- Critical calls to native code are considerably faster than non-critical calls.
- The cost of the non-critical calls appears to be proportional to the data size (no surprise given the memcpy()).
- JNA is a considerably slower method of reaching native libraries.

Note: it is nice that our DGEMV and ATLAS's run at approximately the same speed. This is down to the quality of the instruction stream generated. For the heavy-lifting part, our Java code is running a set of register-rotated scalar SSE instructions. After running objdump on the ATLAS library, it appears the ATLAS code is similar, at least in the instruction blend (register-rotated scalar SSE), hence the similar speed.

What can we infer?

- For ease of use, JNA wins. However, the compute cost on the native side would have to be considerable in proportion to the data set size to make it viable in performance terms.
- Critical calls are on the whole almost as fast as passing native pointers; this is a massive deal for us.
- Writing JNI functions and interfaces in a safe and clean manner is hard and requires a lot of effort. For this reason it should only be undertaken if performance is critical, as JNA provides a considerably more convenient interface.

Conclusion

We confirmed what we expected: safe calls from Java to native libraries are expensive, but if done in a certain way, the cost is absolutely minimal. This means that the raw power of something like Intel's MKL can be harnessed from Java at very little overhead cost if systems require the performance. Finally, we can conclude that at OpenGamma we need unit tests that compare against dependable results, and so calling native libraries is going to have to happen. My next post will be about the ways we test our maths libraries.