Abstract
In this paper we evaluate and compare two representative and popular distributed processing engines for large-scale big data analytics: Spark and the graph-based engine GraphLab. We design a benchmark suite, including representative algorithms and datasets, to compare the computing engines in terms of running time, memory and CPU usage, and network and I/O overhead. The benchmark suite is tested on both a local computer cluster and virtual machines in the cloud. By varying the number of computers and the amount of memory, we examine the scalability of the computing engines with increasing computing resources (such as CPU and memory). We also run a cross-evaluation of generic and graph-based analytic algorithms on graph processing and generic platforms to identify the potential performance degradation if only one processing engine is available. Both computing engines are observed to scale well as computing resources increase. While GraphLab largely outperforms Spark for graph algorithms, its running time is close to that of Spark for non-graph algorithms. Additionally, the running time of Spark for graph algorithms on cloud virtual machines is observed to increase by almost 100% compared with local computer clusters.
Original language | English |
---|---|
Title of host publication | Proceedings, 2016 IEEE Second International Conference on Big Data Computing Service and Applications, BigDataService 2016 |
Place of Publication | Piscataway, NJ (US) |
Publisher | IEEE |
Pages | 10-13 |
Number of pages | 4 |
ISBN (Print) | 978-1-5090-2251-9 |
Publication status | Published - 23 May 2016 |
Event | 2nd IEEE International Conference on Big Data Computing Service and Applications - Oxford, United Kingdom; Duration: 29 Mar 2016 → 1 Apr 2016 |
Conference
Conference | 2nd IEEE International Conference on Big Data Computing Service and Applications |
---|---|
Abbreviated title | BigDataService 2016 |
Country/Territory | United Kingdom |
City | Oxford |
Period | 29/03/16 → 1/04/16 |