Tuesday, September 21, 2010
MATLAB parallelisation for the Adaptive Optics Simulators
MATLAB does not include any parallel functionality; however, it provides basic interfaces to the TCP/IP network stack and to C and Fortran. Building upon these interfaces, there have been on the order of 30 attempts (as surveyed in R. Choy, "Parallel MATLAB Survey") to produce ``toolkits'' that allow MATLAB to be used in a parallel fashion. The inter-node communication approaches used in those toolkits either take a low-level route, with commands similar to those of Message Passing Interface (MPI) libraries, or a higher-level route, choosing simpler commands that more closely resemble native MATLAB. The problem with MPI is that it is difficult to use with MATLAB [1].
According to [1], the acceleration of the simulation's execution by compilation to MEX is significant: experiments showed that MATLAB could indeed compete with the traditional HPC languages in raw performance, typically attaining 90% of the performance of Fortran 90 on calculations.
As a starting point, the parallelisation toolkit for MATLAB written by Einar Heiberg was used.
The MATLAB parallelization toolkit is released as Open Source, uses a Master/Slave paradigm, and is best suited to problems where the amount of communication is low. The toolkit by D. Moraru called MatlabWS that is used for Euro50 is not publicly available, but the results of the investigations are interesting. The authors of [1] found that, in practice, the bigger problem is not network bandwidth but the latency introduced by the network card driver. For example, experiments with the Heiberg toolkit showed that on 100 Mbps Ethernet there is a minimum latency of 35 ms in any communication between MATLAB instances on separate cluster nodes, regardless of message size. While moving to gigabit Ethernet would increase bandwidth, it would have little impact on latency [1].
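A back-of-the-envelope calculation makes the latency point concrete. Using only the figures quoted above (35 ms minimum latency, 100 Mbps Ethernet), one can estimate the message size below which latency, not bandwidth, dominates the cost of a communication; the specific message count is an illustrative assumption, not a measurement from [1]:

```matlab
% Why latency, not bandwidth, dominates for small messages.
% Figures taken from the text: 35 ms per-message latency, 100 Mbps Ethernet.
latency_s     = 35e-3;        % minimum per-message latency [s]
bandwidth_Bps = 100e6 / 8;    % 100 Mbps expressed in bytes per second

% Message size at which pure transfer time equals the fixed latency.
% Below this size (~437 kB), latency dominates the total cost.
crossover_bytes = latency_s * bandwidth_Bps;

% Total overhead for n small exchanges is set by latency alone,
% regardless of payload size (n = 1000 is an illustrative choice).
n = 1000;
overhead_s = n * latency_s;   % 35 s of pure waiting

fprintf('crossover ~ %.1f kB, overhead for %d messages ~ %.0f s\n', ...
        crossover_bytes / 1e3, n, overhead_s);
```

This is why a faster network helps little: gigabit Ethernet lowers the transfer term, but every one of those exchanges still pays the fixed 35 ms.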
Another approach is MPITB, written by Javier Fernandez Baldomero, which uses MPI. According to its webpage, MATLAB users on a Linux PC cluster can use MPITB to call MPI library routines from within the MATLAB interpreter. Depending on the licensing scheme (node- or user-based), additional licences might be required to spawn additional MATLAB processes on other cluster nodes. Currently, processes can be spawned and arranged in topologies, and MATLAB variables can be sent and received.
In conclusion, the improvements in latency reduced the typical model run time from 70 hours to 24 hours. It is hoped that architectural changes that better exploit a lower-latency environment can reduce this still further. The MPITB toolkit offers comparable latencies over Ethernet using native MPI (it builds on the LAM MPI implementation), but it is more complex to configure and use. MatlabWS's combination of the ease of use of the Heiberg toolkit with the performance of MPITB makes it a very compelling product.
[TBD] Ideas about parallelisation of GNU/Octave are collected here.
References:
[1] Browne, M., Andersen, T., Enmark, A., Moraru, D., and Shearer, A., "Parallelization of MATLAB for Euro50 Integrated Modeling", Proc. SPIE, Vol. 5497, 2004.
Saturday, September 18, 2010
MATLAB vs Octave Benchmark
The hardware used was a Dell Latitude E5100 laptop with an Intel Celeron 2.2 GHz processor, 2 GB DDR RAM, and an ATA-100 160 GB HDD. The test was run in a system terminal (the graphical X Window System was shut down), so interference from other software was negligible.
Since the simulator uses the Fast Fourier Transform (FFT) and matrix operations heavily, the benchmark consists of those operations. To gain maximum performance, all matrix sizes were powers of two: from 2x2 up to 8192x8192.
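The benchmark loop can be sketched as follows. This is a minimal reconstruction, not the exact script that was used: sizes are truncated here (the real run went up to 8192x8192), and a serious benchmark would also average over repeated runs to smooth out timer noise:

```matlab
% Sketch of the FFT benchmark loop over power-of-two matrix sizes.
% Sizes truncated for illustration; the real run went from 2 up to 8192.
sizes = 2 .^ (1:5);                  % 2, 4, 8, 16, 32
t_fft = zeros(size(sizes));
for k = 1:numel(sizes)
  n = sizes(k);
  A = rand(n);                       % random n-by-n real test matrix
  tic;
  B = fft2(A);                       % 2-D FFT: the operation under test
  t_fft(k) = toc;                    % wall-clock time for this size
end
```

The same loop body, with `fft2` swapped for `A * A`, `A .* A`, or `A + A`, covers the matrix-operation part of the benchmark.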
These numbers, of course, are not absolute and change from version to version. But according to our results, the tendency remains the same: Octave tends to be faster than MATLAB on the Fourier transform, but consumes more memory.
Comparison of computation time
For the basic matrix operations, such as matrix (Gauss) and point-wise multiplication as well as summation, the performance of MATLAB and Octave is approximately the same. The difference is in the computation time of the Fourier transform. As one can see from Fig. 1.1, Octave outperforms MATLAB on the FFT at every matrix size.
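For readers less familiar with MATLAB/Octave syntax, the three "basic" operations mentioned above are distinct, and the distinction matters for cost: matrix multiplication is O(n^3), while the point-wise operations are O(n^2). A minimal illustration:

```matlab
% The three basic matrix operations compared in the benchmark.
n = 64;
A = rand(n);
B = rand(n);

C1 = A * B;     % matrix (Gauss) multiplication, O(n^3)
C2 = A .* B;    % point-wise (element-by-element) multiplication, O(n^2)
C3 = A + B;     % summation, O(n^2)
```

Since all three map onto the same highly optimised BLAS-style kernels in both interpreters, it is unsurprising that MATLAB and Octave perform about equally here; the FFT implementations are where the two packages actually differ.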
[Fig. 1.1: FFT computation time in MATLAB and Octave versus matrix size]
For example, at the typical matrix sizes of adaptive optics simulations (thousands by thousands of elements), Octave is almost 3 times faster than MATLAB with similar memory consumption. However, for an 8192x8192 matrix, Octave reported ``memory exhausted or requested size too large for range of Octave's index type'', so there is no data point for such a large matrix. It is also notable that on Windows with MATLAB R2009, an attempt to calculate huge matrices (more than 5000-6000 elements in each direction) results in the same ``out of memory'' error message.
Hence, except for the largest matrices (8192x8192 and above), GNU/Octave is almost 3 times faster than MATLAB at the FFT.
Comparison of memory consumption
Memory consumption for Octave is generally larger than for MATLAB, as is apparent from Fig. 1.2. For small matrices (less than 2048x2048) it can be 2.5-3 times more memory; for large matrices (2048x2048 and more) it is typically 10-20% more. Consequently, on the available hardware with 2 GB RAM, the calculation of the FFT of an 8192x8192 matrix is not possible.
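A back-of-the-envelope estimate shows why the 8192x8192 case is hopeless in 2 GB of RAM, assuming standard double-precision storage (the overhead factors for input copies and FFT workspace are rough assumptions, not measurements):

```matlab
% Memory estimate for an 8192x8192 FFT in double precision.
% The FFT result is complex double: 16 bytes per element.
n = 8192;
bytes_result = n^2 * 16;           % 2^30 bytes = 1 GiB for the result alone
bytes_input  = n^2 * 8;            % 0.5 GiB for the real-valued input matrix

gib_result = bytes_result / 2^30;  % result size expressed in GiB
```

The result alone occupies 1 GiB; add the 0.5 GiB input, any internal working copies the FFT makes, and the interpreter's own footprint, and the total easily exceeds the 2 GB available, consistent with the ``memory exhausted'' failure observed above.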
[Fig. 1.2: Memory consumption of MATLAB and Octave versus matrix size]
In summary, Octave performs the FFT considerably faster than MATLAB, but it consumes considerably more memory while doing so and thus may run out of memory on huge matrices.