A recent article [1] proposes some interesting ideas about the parallelisation of MATLAB. In the Euro50 Extremely Large Telescope project, the model of the entire telescope is built in MATLAB, and the computations are clearly beyond the capabilities of a single personal computer.
MATLAB does not include any parallel functionality; however, it does provide basic interfaces to the TCP/IP network stack, C and Fortran. Building upon these interfaces, there have been of the order of 30 attempts (according to R. Choy's Parallel MATLAB survey) to produce "toolkits" that allow MATLAB to be used in parallel. These toolkits take one of two approaches to communication between nodes: a low-level approach, with commands similar to those of Message Passing Interface (MPI) libraries, or a higher-level approach, with simpler commands that more closely resemble MATLAB's own. The problem with MPI is that it is difficult to use with MATLAB [1].
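To make the low-level, MPI-like style more concrete, here is a minimal sketch of point-to-point send/receive commands built directly on a byte-stream socket, the kind of interface these toolkits start from. It is written in Python rather than MATLAB, and `send_var`, `recv_var` and `slave` are hypothetical names: the pickle-plus-4-byte-length framing is an assumption for illustration, not the wire format of any of the surveyed toolkits. A local `socket.socketpair()` and a thread stand in for two cluster nodes.

```python
import pickle
import socket
import struct
import threading

# Hypothetical helpers (not from any MATLAB toolkit): a variable is serialised
# with pickle and framed with a 4-byte big-endian length prefix.

def send_var(sock, value):
    """Send one Python object as a length-prefixed pickle message."""
    payload = pickle.dumps(value)
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def _recv_exact(sock, n):
    """Read exactly n bytes, since a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def recv_var(sock):
    """Receive one object that was sent with send_var."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))

def slave(sock):
    """Slave side: receive one work item, compute, send the result back."""
    data = recv_var(sock)
    send_var(sock, [x * x for x in data])

if __name__ == "__main__":
    # Two connected sockets in one process stand in for master and slave nodes.
    master_end, slave_end = socket.socketpair()
    worker = threading.Thread(target=slave, args=(slave_end,))
    worker.start()
    send_var(master_end, [1, 2, 3, 4])
    print(recv_var(master_end))  # [1, 4, 9, 16]
    worker.join()
    master_end.close()
    slave_end.close()
```

Pickle keeps the sketch short but should only be used between trusted peers, since unpickling can execute arbitrary code. Each send/receive pair also pays the link's full latency, however small the message.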
According to [1], compiling the simulation to MEX files speeds up its execution significantly: experiments showed that MATLAB could indeed compete with traditional HPC languages in raw performance, typically attaining 90% of the performance of Fortran 90 on the calculations.
As a starting point, the authors of [1] used the MATLAB parallelisation toolkit written by Einar Heiberg.
Heiberg's Matlab Parallelization Toolkit is released as open source, uses a master/slave paradigm and is best suited to problems where the amount of communication is low. MatlabWS, the toolkit written by D. Moraru and used for Euro50, is not publicly available, but the results of the investigation are interesting nonetheless. The authors of [1] found that, in practice, the bigger problem is not network bandwidth but the latency introduced by the network card driver. For example, measurements with the Heiberg toolkit showed that on 100 Mbps Ethernet every communication between MATLAB instances on separate cluster nodes incurs a minimum latency of 35 ms, regardless of message size. Moving to gigabit Ethernet would increase bandwidth but would have little impact on latency [1].
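Why bandwidth matters so little here can be seen with the usual latency-bandwidth model of one message transfer, t = latency + size / bandwidth. The sketch below plugs in the 35 ms figure from [1]; it assumes, following the remark that gigabit Ethernet "would have little impact on latency", that the same 35 ms applies on both links, and the message sizes are arbitrary illustrative values, not measurements.

```python
# Simple latency-bandwidth model of one message transfer:
#   t = latency + size / bandwidth
# Assumption: the 35 ms fixed latency measured in [1] on 100 Mbps Ethernet
# is unchanged on gigabit Ethernet.

LATENCY_S = 0.035  # 35 ms per communication (Heiberg toolkit, 100 Mbps) [1]

def transfer_time(size_bytes, bandwidth_bps, latency_s=LATENCY_S):
    """Seconds needed to send one message of size_bytes over bandwidth_bps."""
    return latency_s + size_bytes * 8 / bandwidth_bps

def latency_fraction(size_bytes, bandwidth_bps, latency_s=LATENCY_S):
    """Share of the total transfer time spent in the fixed latency."""
    return latency_s / transfer_time(size_bytes, bandwidth_bps, latency_s)

if __name__ == "__main__":
    for label, bw in [("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
        for size in (1_000, 100_000, 1_000_000):  # illustrative sizes in bytes
            t = transfer_time(size, bw)
            share = latency_fraction(size, bw)
            print(f"{label:>8} {size:>9} B {t * 1e3:8.2f} ms  latency {share:6.1%}")
```

For a 1 kB message the model gives about 35.08 ms on 100 Mbps and about 35.01 ms on gigabit Ethernet, so a tenfold increase in bandwidth saves well under a tenth of a millisecond; only for messages of hundreds of kilobytes and more does bandwidth start to matter.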
Another approach is MPITB, written by Javier Fernandez Baldomero, which uses MPI. According to its web page, Linux users of PC MATLAB on a cluster of several PCs can use MPITB to call MPI library routines from within the MATLAB interpreter. Depending on the licensing scheme (node-based or user-based), additional licenses might be required to spawn additional MATLAB processes on other cluster nodes. At the time of writing, processes can be spawned and arranged in topologies, and MATLAB variables can be sent and received.
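"Arranged in topologies" refers to MPI's process topologies, in which processes are laid out on a logical grid so that each one can address its neighbours directly. The sketch below mirrors in plain Python what MPI's `MPI_Cart_coords` and `MPI_Cart_shift` compute for a fully periodic 2-D grid created without rank reordering (ranks numbered in row-major order). It is not MPITB's interface; the helper names are hypothetical.

```python
# Hypothetical helpers illustrating a periodic 2-D Cartesian process topology,
# mirroring MPI_Cart_coords / MPI_Cart_shift for a grid created without rank
# reordering (MPI numbers grid positions in row-major order).

def cart_coords(rank, dims):
    """Row-major (row, col) coordinates of rank in a dims=(rows, cols) grid."""
    rows, cols = dims
    if not 0 <= rank < rows * cols:
        raise ValueError("rank outside the grid")
    return divmod(rank, cols)

def cart_rank(coords, dims):
    """Inverse of cart_coords, wrapping around in both dimensions (periodic)."""
    rows, cols = dims
    r, c = coords
    return (r % rows) * cols + (c % cols)

def cart_shift(rank, dims, direction, disp=1):
    """Return (source, dest) like MPI_Cart_shift on a fully periodic grid.

    direction 0 changes the row index, direction 1 the column index.
    """
    coords = list(cart_coords(rank, dims))
    fwd, back = coords.copy(), coords.copy()
    fwd[direction] += disp
    back[direction] -= disp
    return cart_rank(tuple(back), dims), cart_rank(tuple(fwd), dims)

if __name__ == "__main__":
    dims = (2, 3)  # e.g. six MATLAB instances on a 2 x 3 grid
    for rank in range(6):
        print(rank, cart_coords(rank, dims), cart_shift(rank, dims, direction=1))
```

In a real MPI program each process would then exchange boundary data only with its `source` and `dest` neighbours, which keeps the number of (latency-bound) messages per step small.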
In conclusion, the latency improvements reported in [1] reduced the typical model run time from 70 hours to 24 hours. It is hoped that architectural changes which better exploit a lower-latency environment can reduce this still further. MPITB, which uses the LAM MPI implementation, offers comparable latencies over Ethernet using native MPI; however, it is more complex to configure and use. By combining the ease of use of the Heiberg toolkit with the performance of MPITB, MatlabWS makes a very compelling product.
[TBD] Ideas about the parallelisation of GNU Octave are collected here.
References:
[1] Browne, M., Andersen, T., Enmark, A., Moraru, D., and Shearer, A., "Parallelization of MATLAB for Euro50 Integrated Modeling", Proc. SPIE, Vol. 5497, 2004.