Tuesday, September 21, 2010

MATLAB parallelisation for the Adaptive Optics Simulators

Recently in the article[1] there were proposed interesting ideas about parallelisation of MATLAB. In Euro50 project of the Extremely Large Telescope, the model of the entire telescope is built on the MATLAB and calculations are obviously beyond the capabilities of personal computers.

MATLAB does not include any parallel functionality however there are basic interfaces to the TCP/IP network stack, C and Fortran. By building upon these interfaces there have been of the order of 30 attempts (as in the survey R. Choy, Parallel MATLAB survey) to produce ``toolkits'' to allow MATLAB to be used in a parallel fashion. The approaches of communications between nodes that were used in those toolkits either take a low-level approach using commands similar to Message Passing Interface (MPI) libraries or use a more high level approach choosing to use simpler commands that resemble more closely those of MATLAB. The problem with MPI is that it is difficult to use with MATLAB[1].

According to[1], the acceleration of the simulation's execution using compilation to MEX is significant: experiment showed that MATLAB could indeed compete with the traditional HPC languages in raw performance, typically attaining 90% of the performance of Fortran 90 on calculations.

As a starting point, the parallelisation toolkit for MATLAB written by Einar Heiberg was used.
The Matlab parallelization toolkit is released as Open Source, uses a Master/Slave paradigm and is most suitable to problems where the amount of communication is low. The toolkit by D.Moraru called MatlabWS that is used for Euro50 is not available publicly. But the results of the investigations are interesting. The authors of[1] found out that in practise, a bigger problem is not a network bandwidth but a latency introduced by network card driver. As an example, using the Heiberg toolkit have shown that on 100Mbps Ethernet there is minimum period of 35ms of latency involved in any communications between MATLAB instances on separate cluster nodes regardless of message size. While moving to gigabit Ethernet would increase bandwidth, it would have little impact on latency[1].

Another approach is MPITB written by Javier Fernandez Baldomero that uses MPI. According to the webpage, PC MATLAB Linux users in a cluster with several PCs can use MPITB in order to call MPI library routines from within the MATLAB interpreter. Depending on your licensing scheme (node/user-based), additional licenses might be required to spawn additional MATLAB processes on other cluster nodes. Currently processes can be spawned and arranged in topologies, MATLAB variables can be sent/received.

As a conclusion, the improvements in latency have resulted in a reduction in typical model run time from 70 hours to 24 hours. It is hoped that architectural changes that better exploit a lower latency environment can reduce this still further. In addition another toolkit MPITB offers comparable latencies over Ethernet, using native MPI. MPITB uses the LAM MPI implementation. However this toolkit is more complex to configure and use. MatlabWS's combination of the ease of use of the Heiberg toolkit with the performance of MPITB make it a very compelling product.

[TBD] Ideas about parallelisation of GNU/Octave are collected here.

References:
[1] Browne, M., Andersen, T., Enmark,
A., Moraru, D., and Shearer, A., "Parallelization of MATLAB for Euro50 Integrated
Modeling", Proc. SPIE, Vol. 5497, 2004.

Saturday, September 18, 2010

MATLAB vs Octave Benchmark

The benchmark is performed by a script that is a part of AORTA simulator. For the time-consumption measure, standard tic-toc combination is used. For the memory consumption, UNIX command is executed during the computation to obtain actual memory consumption.

The hardware used was Dell Latitude E5100 laptop with Intel Celeron 2.2GHz processor, 2Gb DDR RAM, and ATA-100 160Gb HDD. The test was run in a system's terminal (graphical X Window system was shut down) so the interference with other software was negligible.

Since the simulator uses Fast Fourier Transform (FFT) and matrix operations heavily, the benchmark consists of those operations. In order to gain maximum performance, all matrices were power of two: from 2x2 up to 8192x8192 .

Those numbers, of course, are not absolute and change from version to version. But the according to our results, the tendency remains the same: Octave tends to be faster on Fourier transform than MATLAB but consumes more memory.

Comparison of computation time

The basic matrix matrix operations, such as Gauss and point-wise multiplication as well as summation, the performance of MATLAB and Octave are approximately the same. The differences are in computational time of Fourier transform. As one can see from the Fig 1.1, Octave outperforms MATLAB on FFT transforms on any matrix sizes.

b)

Figure 1.1: Median computation time versus matrix size for: a) FFT function and for b) FFTSHIFT function.

For example, on typical size of matrices in adaptive optics simulations (thousands by thousands of elements), Octave is almost 3 times faster than MATLAB with similar memory consumption. However for 8192x8192 matrix, the ``memory exhausted or requested size too large for range of Octave's index type'' message appeared and there is no datapoint for such a large matrix. It is also notable that on Windows and MATLAB v2009 an attempt to calculate huge matrices (more than 5000-6000 elements in each direction) results to the same ``out of memory'' error message.

Hence, except matrices that are bigger than 8192x8192, GNU/Octave is almost 3 times faster than MATLAB in FFT.

Comparison of memory consumption

Memory consumption for Octave is generally larger than for MATLAB as it is apparent from the Fig. 1.2. For small matrices (less than 2048x2048) it can be 2.5-3 times more memory, and for large matrices ( and more) that is typically 10-20%. Consequently, on the available hardware with 2Gb RAM the calculation of FFT of matrix 8192x8192 elements is not possible.

Figure 1.2: Median memory consumption versus matrix size for: a) FFT function and for b) FFTSHIFT function.

Although Octave consumes considerably more memory than MATLAB while performing FFT and thus may run out of memory on huge matrices.

Monday, August 23, 2010

Routh array, Routh Criterion and how to calculate the Routh Hurwitz array

The post is available also in PDF format here[mirror].

Trying to find a reasonable explanation of how to calculate Routh array, I found one here (while in book are usually given cryptic and tricky ways, even in Goodwin-Graebe).

Routh criterion

The Routh-Hurwitz stability criterion is a necessary (and frequently sufficient) method to establish the stability of a single-input, single-output (SISO), linear time invariant (LTI) control system.

One of the most popular algorithms to determine whether or not a polynomial is strictly Hurwitz, is Routh's algorithm. Consider a polynomial and its associated Routh array (see below). Then the number of roots with real part greater than zero is equal to the number of sign changes in the first column of the array.

Routh array

A tabular method of Routh criterion can be used to determine the stability when the roots of a higher order characteristic polynomial are difficult to obtain. Consider the polynomial

. The first two rows (two highest ones) of the Routh's array are formed using coefficients

in the increasing order:

Row	Power	First column	Second column	Third column
n				...
n-1	$s^{n-1}$			...

If there are no more coefficients then just add zeros in the table.

Then for each row after third row we need to calculate , , and so on. Now starting from row 3, each row contains the b's, c's, d's and so on as follows:

Row	Power	First column	Second column	Third column
n				...
n-1	$s^{n-1}$			...
n-2	$s^{n-2}$			...
n-3	$s^{n-3}$			...
n-4	$s^{n-4}$			...
&vellip#vdots;
0	$s^{0}$		0	0

Now we need to calculate the coefficients and so on. That can be done using the following determinants formulas:

and so on until . Remember to add zeros in the table when no more coefficients left - it will alleviate the calculations and make them more straightforward. The principle is the following: when one calculates , the first column in the determinant is always [column_1; row_1, row_2]"/> of the Routh table, e.g. .

Second column in the determinant contains , e.g. for calculations of .

The following illustration shows the principle:

Here dark-green rectangle depicts the first column of the determinant in the calculations of

, dark-green circle - denominator in the equation for

. Bright-green rectangle depicts the second column in the determinant for calculations of

. In the same fashion, dark-blue rectangle depicts the first column of the determinant in the calculations of

, dark-blue circle - denominator in the equation for

. Bright-blue rectangle depicts the second column in the determinant for calculations of

, and so on.

Example of calculation of Routh array

Consider the polynomial

. As above, the first two rows of the Routh's array are formed using coefficients

in the increasing order:

Row	Power	First column	Second column	Third column
n		1	3	1
n-1	$s^{3}$	1	2	0 ...

Now we need to calculate using mentioned above equations.

Thus our table will look like:

Row	Power	First column	Second column	Third column
n		1	3	1
n-1	$s^{3}$	1	2	0 ...
n-2	$s^{2}$	1	1	0 ...
n-3	$s^{1}$	1	0	0 ...
n-4	$s^{0}$	1	0	0 ...

From the array we note that there are no sign changes in the first column. According to Routh's criterion, this means that is a strictly Hurwitz polynomial.