Posted about 12 years ago
Math.NET Numerics v2.5, released in April 2013, is focused on statistics and linear algebra. As usual you'll find a full listing of all changes in the release notes. However, I'd like to take the chance to highlight some important changes, show some code samples and explain the reasoning behind the changes.
Please let me know if these "What's New" articles are useful in this format and whether I should continue to put them together for future releases.
Statistics
Order Statistics & Quantiles
Previously our order statistics and quantile functions were quite limited. With this release we finally have almost complete quantile support:
OrderStatistic
Median
LowerQuartile
UpperQuartile
InterquartileRange
FiveNumberSummary
Percentile
Quantile
QuantileCustom
All of them are implemented on top of the quantile function. We always default to approximately median-unbiased quantiles, usually denoted as type R-8, which do not assume samples to be normally distributed. If you need compatibility with another implementation, you can use QuantileCustom which accepts either a QuantileDefinition enum (we support all 9 R-types, SAS 1-5, Excel, Nist, Hydrology, etc.) or a 4-parameter definition as in Mathematica.
For the empirical inverse cumulative distribution, which is essentially an R1-type quantile, you can use the new Statistics.InverseCDF function.
More efficient ways to compute statistics
Previously there were two ways to estimate statistics from a sample set: the Statistics class provided static extension methods to evaluate a single statistic from an enumerable, and DescriptiveStatistics computed a whole set of standard statistics at once. This was unsatisfactory because it was not very efficient: DescriptiveStatistics actually required more than one pass internally (mostly because of the median), and neither way leveraged the fact that the sample set may already be sorted.
To fix the first issue, we've marked DescriptiveStatistics.Median as obsolete and will remove it in v3. Until then, the median computation is deferred until it is requested for the first time. In the normal case where Median is not used, it now requires only a single pass.
We attacked the second issue by introducing three new classes that compute a single statistic directly from the best-fitting sample data format:
ArrayStatistics operates on arrays which are not assumed to be sorted.
SortedArrayStatistics operates on arrays which must be sorted in ascending order.
StreamingStatistics operates on a stream in a single pass, without keeping the full data in memory at any time. It can thus be used to stream over data larger than system memory.
ArrayStatistics implements Minimum, Maximum, Mean, Variance, StandardDeviation, PopulationVariance and PopulationStandardDeviation. In addition it implements all the order statistics/quantile functions mentioned above, but in an inplace way that reorders the data array (partial sorting); because of this side effect they carry an Inplace suffix. These inplace functions get slightly faster when called repeatedly, but will always be slower than the sorted array statistics. Nevertheless, since the sorting itself is quite expensive, the (non-sorted) array statistics are still faster in practice if only a few calls are needed.
Example: We want to compute the IQR of {3,1,2,4}.
var data = new double[] { 3.0, 1.0, 2.0, 4.0 };
ArrayStatistics.InterquartileRangeInplace(data); // iqr = 2.16666666666667
This is equivalent to executing IQR(c(3,1,2,4), type=8) in R, with quantile definition type R-8.
SortedArrayStatistics expects data to be sorted in ascending order and implements Minimum, Maximum, and all the order statistics/quantile functions mentioned above. It leverages the ordering for very fast (constant time) order statistics. There's also no need to reorder the data, so unlike ArrayStatistics, this class never modifies the provided array and has no side effects. It does not re-implement operations that cannot leverage the ordering, like Mean or Variance; use the implementations from ArrayStatistics instead if needed.
var data = new[] { 1.0, 2.0, 3.0, 4.0 };
var iqr = SortedArrayStatistics.InterquartileRange(data); // iqr = 2.16666666666667
StreamingStatistics estimates statistics in a single pass without storing the data, and implements Minimum, Maximum, Mean, Variance, StandardDeviation, PopulationVariance and PopulationStandardDeviation. It does not implement any order statistics, since they require sorting and are thus not computable in a single pass without keeping the data in memory. No function of this class has any side effects on the data.
The Statistics class has been updated to leverage these new implementations internally, and implements all of the statistics mentioned above as extension methods on enumerables. No function of this class has any side effects on the data.
var iqr = new[] { 3.0, 1.0, 2.0, 4.0 }.InterquartileRange(); // iqr = 2.16666666666667
Note that this is generally slower than ArrayStatistics, because it has to copy the array to make sure there are no side effects, and much slower than SortedArrayStatistics, which has constant time (assuming the data was sorted manually first).
Repeated Evaluation and Precomputed Functions
Most of the quantile functions accept a tau-parameter. Often you need to evaluate such a function not with a single value but with a whole range of values for that parameter, say for plotting. In such scenarios it is advantageous to first sort the data and then use the SortedArrayStatistics functions with constant time complexity. For convenience we also provide alternative implementations with a Func suffix in the Statistics class that do exactly that: instead of accepting a tau-parameter themselves, they return a function that accepts tau:
var icdf = new[] { 3.0, 1.0, 2.0, 4.0 }.InverseCDFFunc();
var a = icdf(0.3); // 2
var b = new[] { 0.0, 0.1, 0.5, 0.9, 1.0 }.Select(icdf).ToArray(); // 1,1,2,4,4
Linear Algebra
There have been quite a few bug fixes and performance improvements around linear algebra. See the release notes for details.
Matrix and Vector String Formatting
Previously the ToString method rendered the whole matrix to a string. When working with very large data sets this can be an expensive operation, in terms of both CPU and memory. It also makes it painful to work with interactive consoles or REPL environments like F# Interactive, which write the ToString result of each evaluated object to the console output.
Starting from v2.5, ToString methods no longer render the whole structure to a string for large data. Instead, ToString now renders only an excerpt of the data, together with a line stating dimension and type and, in the case of sparse data, a sparseness indicator. The intention is to give a good idea about the data in a visually useful way:
DenseMatrix.CreateRandom(60,80,Normal.WithMeanVariance(2.0,1.0)).ToString();
generates the following multi-line string:
DenseMatrix 60x80-Double
1.68665 1.24322 3.36594 2.07444 4.13008 ... 3.01076
1.10888 2.8856 2.31662 3.94124 3.56711 ... -0.216804
0.843804 2.67243 1.097 2.34063 0.875953 ... 1.808
3.87044 2.69509 2.79642 -0.354365 2.45302 ... 2.79665
2.05722 3.39823 2.56256 1.88849 1.75259 ... 3.8987
1.9874 2.97047 1.40584 1.97734 2.37733 ... 2.875
2.06503 1.15681 3.85957 2.84836 1.25326 ... -0.108938
... ... ... ... ... ... ...
4.39245 1.32734 2.83637 1.78257 2.44356 ... 2.58935
The output is not perfect yet, as we’d ideally align the decimal point and automatically choose the right width for each column. Hopefully we can fix that in a future version. How much data is shown by default can be adjusted in the Control class:
Control.MaxToStringColumns = 6;
Control.MaxToStringRows = 8;
Or you can use an override or alternative:
Matrix.ToString(maxRows, maxColumns, formatProvider)
// Just the top info line
Matrix.ToTypeString()
// Just the matrix data without the top info line
Matrix.ToMatrixString(maxRows, maxColumns, formatProvider)
Matrix.ToMatrixString(maxRows, maxColumns, padding, format, formatProvider)
Note that ToString has never been intended to serialize a matrix to a string in order to parse it back later. Please use one of our data libraries instead, e.g. the MathNet.Numerics.Data.Text package.
Creating a Matrix or Vector
Constructing a matrix or vector has become more consistent: Except for obsolete members which will be removed in v3, all constructors now directly use the provided data structure by reference, without any copying. This means that there are only constructors left that accept the actual inner data structure format. Usually you’d use the new static functions instead, which always either create a copy or construct the inner data structure directly from the provided data, without keeping a reference to it.
Some examples below. The C# way usually works just the same in F#, but we provide more idiomatic alternatives for most of them:
// Directly from an array in the internal column-major format (no copying)
C#: new DenseMatrix(2, 3, new[] { 1.0, 2.0, 10.0, 20.0, 100.0, 300.0 })
F#: DenseMatrix.raw 2 3 [| 1.0; 2.0; 10.0; 20.0; 100.0; 300.0 |]
// All-zero 3x4 (3 rows, 4 columns) matrix
C#: new DenseMatrix(3, 4)
F#: DenseMatrix(3, 4)
F#: DenseMatrix.zeroCreate 3 4
// 3x4, all cells the same fixed value
C#: DenseMatrix.Create(3, 4, (r, c) => 20.5) // note: better way is planned
F#: DenseMatrix.create 3 4 20.5
// 3x4, random
C#: DenseMatrix.CreateRandom(3, 4, Normal.WithMeanVariance(2.0, 0.5));
F#: DenseMatrix.randomCreate 3 4 (Normal.WithMeanVariance(2.0, 0.5))
// 3x4, using an initializer function
C#: DenseMatrix.Create(3, 4, (r, c) => r/100.0 + c)
F#: DenseMatrix.init 3 4 (fun r c -> float r/100.0 + float c)
// From enumerables of enumerables (as rows or columns)
C#: DenseMatrix.OfColumns(2,3, new[] { new[] { 1.0, 4.0 }, new[] { 2.0, 5.0 }, new[] { 3.0, 6.0 } })
F#: DenseMatrix.ofRows 2 3 [{ 1.0 .. 3.0 }; { 4.0 .. 6.0 }]
// From F# lists of lists
F#: DenseMatrix.ofRowsList 2 3 [[1.0 .. 3.0]; [4.0 .. 6.0]]
F#: matrix [[1.0; 2.0; 3.0]
[4.0; 5.0; 6.0]]
// By calling a function to construct each row (or analog for columns)
F#: DenseMatrix.initRow 10 10 (fun r -> vector [float r .. float (r+9)])
All of these work the same way for sparse matrices too, and similarly for vectors. Another way, useful for sparse data, accepts a list or sequence of indexed row * column * value tuples, where all other cells are assumed to be zero:
F#: SparseMatrix.ofListi 200 100 [(4,3,20.0); (18,9,3.0); (2,1,99.9)]
F#: DenseMatrix.ofSeqi 10 10 (seq {for i in 0..9 -> (i,i,float(i*i))})
C#: SparseMatrix.OfIndexed(10,10,Enumerable.Range(0,10).Select(i => Tuple.Create(i,i,(double)(i*i))))
Inplace Map
The F# modules have supported (outplace) combinators like map for quite some time. We've now implemented inplace map directly in the storage classes, so that it can operate efficiently on sparse data as well and is accessible from C# code without using the F# extensions. The F# map-related functions have been updated to leverage these routines:
matrix |> Matrix.map (fun x -> 2.0*x)
matrix |> Matrix.mapnz (fun x -> 2.0*x) // non-zero: may skip zero values, useful if sparse
matrix |> Matrix.mapi (fun i j x -> x + float i - float j) // indexed with row, column
matrix |> Matrix.mapinz (fun i j x -> x + float i - float j) // indexed, non-zero
Or the equivalent inplace versions which return unit:
matrix |> Matrix.mapInPlace (fun x -> 2.0*x)
matrix |> Matrix.mapnzInPlace (fun x -> 2.0*x)
matrix |> Matrix.mapiInPlace (fun i j x -> x + float i - float j)
matrix |> Matrix.mapinzInPlace (fun i j x -> x + float i - float j)
In C#:
matrix.MapInplace(x => 2*x);
matrix.MapIndexedInplace((i,j,x) => x+i-j, forceMapZeros:true);
F# Slice Setters
Speaking about F#, we’ve supported the slice getter syntax for a while as a nice way to get a sub-matrix. For example, to get the bottom right 2x2 sub-matrix of m we can do:
let m = DenseMatrix.init 3 4 (fun i j -> float (10 * i + j))
m |> printfn "%A"
m.[1..2,2..3] |> printfn "%A"
The same syntax now also works for setters, e.g. to overwrite the very same bottom right corner we can write:
m.[1..2,2..3] <- matrix [[0.1; 0.2]
[0.3; 0.4]]
printfn "%A" m
The 3 printfn statements generate the following output:
DenseMatrix 3x4-Double
0 1 2 3
10 11 12 13
20 21 22 23
DenseMatrix 2x2-Double
12 13
22 23
DenseMatrix 3x4-Double
0 1 2 3
10 11 0.1 0.2
20 21 0.3 0.4
Posted about 12 years ago
Linear algebra is one of those areas where performance can be essential, but also one where native optimizations can make a huge difference. That's why in Math.NET Numerics we implemented linear algebra on top of a provider abstraction where providers can be exchanged.
Out of the box Math.NET Numerics only includes a fully managed provider which is supported on
all platforms, but unfortunately is also rather slow. This doesn’t matter much for most problems,
but if you’re working with very large dense matrices it can be a deal breaker. That’s why we’ve
added some helper projects you can use to compile your own native provider, but that is still
quite involved and requires some experience with C or C++. Not any more, kudos to @marcuscuda!
Starting with Math.NET Numerics v2.4, we distribute native providers as NuGet packages, beginning with one based on Intel MKL. Enabling native algorithms becomes almost as simple as adding a NuGet package to your project.
Enabling Native Linear Algebra
Since the native providers are optimized for and work on a specific platform only, you’ll
have to decide exactly what platform your project should target (x64/64-bit, or x86/32-bit),
configure your project or compiler accordingly and then choose the matching native package.
We might be able to simplify this in the future by distributing both versions together
and automatically selecting the right one at runtime, but for now that’s how it is.
PM> Install-Package MathNet.Numerics.MKL.Win-x64
PM> Install-Package MathNet.Numerics.MKL.Win-x86
We’re also preparing analog packages for Linux.
I mentioned it is almost as simple as adding a NuGet package. Almost, because you still need to make sure
the libraries get copied to your bin directory (e.g. by choosing “Copy always” in VS) and to actually
enable them using the Control class:
Control.LinearAlgebraProvider = new MklLinearAlgebraProvider();
Without this line the default provider would have been used instead. Usually the default provider is the
fully managed one mentioned above. However, you can choose the new MKL provider as default
from the outside by setting the MathNetNumericsLAProvider environment variable to MKL
(Note: does not work on the portable build).
This is especially useful for unit testing against multiple providers, but also e.g.
to upgrade to a faster provider in cases where you cannot actually modify the source code.
If needed you can prevent the environment variable from having any effect in your application by explicitly configuring a provider yourself, e.g. the managed provider:
Control.LinearAlgebraProvider = new ManagedLinearAlgebraProvider();
Speed Up
You can expect a very significant speed up with the MKL provider. For example,
computing the product of two 1000x1000 dense double matrices I’ve just measured the following
median times on my 4-core x64 Windows desktop, with no code change between them (except
configuring the Control class):
Managed provider (parallelization enabled): 0.9s
MKL native provider: 0.085s
This might not be a very thorough analysis, but it shows that even in this simple case
the native provider is 10 times faster than our managed provider. Admittedly our
managed provider might have some room for improvement (we actually have a promising
alternative implementation waiting to be fully tested and integrated), but even with
the best generic & portable managed algorithm we’re unlikely to ever beat a native provider.
Alternative Native Providers - Call for Help
We intend to make alternative providers available the same way in the future, hopefully including some more open/free ones. Unfortunately, for most of them even just getting them to compile for both x86 and x64 on Windows can be very tricky. We've worked on ACML, Goto (now OpenBLAS) and ATLAS in the past, but ran into problems with each of them on Windows. Hence, please let us know if you have any experience you could share with us. Thanks.
Posted over 12 years ago
Likely the most requested feature for Math.NET Numerics is support for some form of regression, or fitting data to a curve. I'll show in this article how you can easily compute regressions manually using Math.NET, until we support it out of the box. We already have broad interpolation support, but interpolation is about fitting some curve exactly through a given set of data points and is therefore an entirely different problem.
For a regression there are usually many more data points available than curve parameters, so we want to find the parameters that produce the lowest errors on the provided data points, according to some error metric.
Least Squares Linear Regression
If the curve is linear in its parameters, then we’re speaking of linear regression.
The problem becomes much simpler and we can leverage the rich linear algebra toolset to
find the best parameters, especially if we want to minimize the square of the errors
(least squares metric).
In the general case such a curve would be in the form of a linear combination of arbitrary but known functions $f_j(x)$, scaled by the parameters $p_j$:

$$y : x \mapsto p_1 f_1(x) + p_2 f_2(x) + \cdots + p_k f_k(x)$$

Note that none of the functions $f_j$ depends on any of the parameters. If we have $N$ data points $(x_i, y_i)$, then we can write the whole problem as an overdefined system of equations:

$$y_i = p_1 f_1(x_i) + p_2 f_2(x_i) + \cdots + p_k f_k(x_i), \qquad i = 1, \ldots, N$$

Or in matrix notation, with $X_{ij} = f_j(x_i)$:

$$\mathbf{y} = X \mathbf{p}$$
This is a standard least squares problem and can easily be solved using Math.NET Numerics' linear algebra classes and the QR decomposition. In the literature you'll usually find algorithms that explicitly compute some form of matrix inversion.
While symbolically correct, using the QR decomposition instead is numerically more robust.
This is a solved problem, after all.
var p = X.QR().Solve(y);
Some matrices of this form have well known names, for example the
Vandermonde-Matrix for fitting to a polynomial.
Example: Fitting to a Line
A line can be parametrized by its height $a$ at $x = 0$ and its slope $b$:

$$y : x \mapsto a + b x$$

This maps to the general case with parameters $p_1 = a$, $f_1(x) = 1$ and $p_2 = b$, $f_2(x) = x$, and therefore the equation system

$$\begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$$
The complete code when using Math.NET Numerics would look like this:
// data points
var xdata = new double[] { 10, 20, 30 };
var ydata = new double[] { 15, 20, 25 };
// build matrices
var X = DenseMatrix.CreateFromColumns(
new[] {new DenseVector(xdata.Length, 1), new DenseVector(xdata)});
var y = new DenseVector(ydata);
// solve
var p = X.QR().Solve(y);
var a = p[0];
var b = p[1];
Example: Fitting to an arbitrary linear function
The functions $f_j$ do not have to be linear in $x$ at all to work with linear regression, as long as the resulting function remains linear in the parameters $p_j$. In fact, we can use arbitrary functions, as long as they are defined at all our data points $x_i$. For example, let's compute the regression to the following complicated function including the Digamma function $\psi(x)$, sometimes also known as the Psi function:

$$y : x \mapsto a \sqrt{e^x} + b\, \psi(x^2)$$

The resulting equation system in matrix form:

$$\begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} \sqrt{e^{x_1}} & \psi(x_1^2) \\ \vdots & \vdots \\ \sqrt{e^{x_N}} & \psi(x_N^2) \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$$
The complete code with Math.NET Numerics, but this time with F#:
// define our target functions
let f1 x = Math.Sqrt(Math.Exp(x))
let f2 x = SpecialFunctions.DiGamma(x*x)
// create data samples, with chosen parameters and with gaussian noise added
let fy (noise:IContinuousDistribution) x = 2.5*f1(x) - 4.0*f2(x) + noise.Sample()
let xdata = [ 1.0 .. 1.0 .. 10.0 ]
let ydata = xdata |> List.map (fy (Normal.WithMeanVariance(0.0,2.0)))
// build matrix form
let X =
[|
xdata |> List.map f1 |> vector
xdata |> List.map f2 |> vector
|] |> DenseMatrix.CreateFromColumns
let y = vector ydata
// solve
let p = X.QR().Solve(y)
let (a,b) = (p.[0], p.[1])
Note that we use the Math.NET Numerics F# package
here (e.g. for the vector function).
Example: Fitting to a Sine
Just like the digamma function we can also target a sine curve. However, to make it more interesting, we're also looking for phase shift and frequency parameters:

$$y : x \mapsto a + b \sin(\omega x + \phi)$$

Unfortunately the function now depends on the parameters $\phi$ and $\omega$ in a non-linear way, which is not allowed in linear regression. Indeed, fitting to a frequency $\omega$ in a linear way is not trivial, if possible at all, but for a fixed $\omega$ we can leverage the following trigonometric identity:

$$\sin(\omega x + \phi) = \cos(\phi) \sin(\omega x) + \sin(\phi) \cos(\omega x)$$

and therefore, with $b' = b \cos\phi$ and $c' = b \sin\phi$ (so $b = \sqrt{b'^2 + c'^2}$ and $\phi = \operatorname{atan2}(c', b')$):

$$y : x \mapsto a + b' \sin(\omega x) + c' \cos(\omega x)$$

However, note that because of the non-linear transformation on the $b$ and $\phi$ parameters, the result will no longer be strictly the least square error solution. While our result would be good enough for some scenarios, we'd either need to compensate or switch to non-linear regression if we need the actual least square error parameters.
The complete code in C# with Math.NET Numerics would look like this:
// data points: we compute y perfectly but then add strong random noise to it
var rnd = new Random(1);
var omega = 1.0d;
var xdata = new double[] { -1, 0, 0.1, 0.2, 0.3, 0.4, 0.65, 1.0, 1.2, 2.1, 4.5, 5.0, 6.0 };
var ydata = xdata
.Select(x => 5 + 2 * Math.Sin(omega*x + 0.2) + 2*(rnd.NextDouble()-0.5)).ToArray();
// build matrices
var X = Matrix.CreateFromColumns(new[] {
new DenseVector(xdata.Length, 1),
new DenseVector(xdata.Select(t => Math.Sin(omega*t)).ToArray()),
new DenseVector(xdata.Select(t => Math.Cos(omega*t)).ToArray())});
var y = new DenseVector(ydata);
// solve
var p = X.QR().Solve(y);
var a = p[0];
var b = SpecialFunctions.Hypotenuse(p[1], p[2]);
var c = Math.Atan2(p[2], p[1]);
The following graph visualizes the resulting regressions. The curve we computed the values
from, before adding the strong noise, is shown in black. The red dots show the actual data points
with only small noise, the blue dots the points with much stronger noise added. The red and blue
curves then show the actual computed regressions for each.
Posted almost 14 years ago by Math.NET Team
Feature request 5691 was implemented in commit ca95f4f7.
We've added two new static methods in the Matrix<T> class, CreateFromRows and CreateFromColumns.
Example:
var row1 = new DenseVector(new[] { 1.0 });
var row2 = new DenseVector(new[] { 1.0, 2.0, 3.0, 4.0 });
var row3 = new DenseVector(new[] { 1.0, 2.0 });
var rowVectors = new List<Vector<double>> { row1, row2, row3 };
var matrix = Matrix.CreateFromRows(rowVectors);
This will create a 3x4 dense matrix. The number of columns is determined by the maximum length of the given vectors (zero is used to pad short rows). Note that the storage type of the first vector determines the storage type of the matrix (i.e. a sparse vector creates a sparse matrix). CreateFromColumns works similarly, as sketched below.
Posted almost 14 years ago by Math.NET Team
Help us improve Math.NET Numerics. We are offering $50 bounties for finding and fixing a bug, or implementing a feature request. To claim a bug bounty, provide a unit test that demonstrates a bug and provide a fix. For a feature bounty, complete a task or feature request in the Math.NET Numerics' Issue Tracker.
Additional terms:
1) Documentation or resource string typos don't count as bugs.
2) Bug fixes must include a unit test that demonstrates the bug.
3) All new features must include unit tests.
4) Only tasks and feature requests that are added by or accepted by the Math.NET team are eligible.
5) Bounties will be paid via PayPal.
6) A maximum of 30 bounties will be paid.
7) The Math.NET Numerics team must accept the submitted fix or feature code. The team can reject any submission for any reason.
|