Benchmarking of distributed computing engines Spark and GraphLab for big data analytics

Jian Wei, Kai Chen*, Yi Zhou, Qu Zhou, Jianhua He

*Corresponding author for this work

Research output: Chapter in Book/Published conference output (Conference publication)

Abstract

In this paper we evaluate and compare two representative and popular distributed processing engines for large-scale big data analytics: Spark and the graph-based engine GraphLab. We design a benchmark suite of representative algorithms and datasets to compare the engines in terms of running time, memory and CPU usage, and network and I/O overhead. The benchmark suite is tested both on a local computer cluster and on cloud virtual machines. By varying the number of computers and the amount of memory, we examine how well the engines scale with increasing computing resources (such as CPU and memory). We also cross-evaluate generic and graph-based analytic algorithms on the graph processing and generic platforms, to identify the potential performance degradation when only one processing engine is available. Both computing engines are observed to scale well as computing resources increase. While GraphLab largely outperforms Spark for graph algorithms, its running time is close to Spark's for non-graph algorithms. Additionally, the running time of Spark for graph algorithms on cloud virtual machines is observed to increase by almost 100% compared with local computer clusters.
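As an illustration only, and not the authors' benchmark code, the scalability and local-versus-cloud comparisons described above reduce to simple ratios over measured running times. The helper names (`speedup`, `scaling_efficiency`, `relative_increase_pct`) and the timings in the usage note are hypothetical; the sketch assumes running times are positive wall-clock seconds keyed by cluster size.

```python
from typing import Dict


def speedup(times_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Speedup relative to the smallest cluster measured: T(base) / T(n).

    Hypothetical helper; keys are node counts, values are running times.
    """
    if not times_by_nodes or any(t <= 0 for t in times_by_nodes.values()):
        raise ValueError("need at least one positive running time")
    base = min(times_by_nodes)
    t_base = times_by_nodes[base]
    return {n: t_base / t for n, t in sorted(times_by_nodes.items())}


def scaling_efficiency(times_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Speedup divided by the ideal (linear) speedup n / base.

    A value of 1.0 means perfectly linear scaling; lower values mean
    the extra machines are partly lost to overheads such as network I/O.
    """
    base = min(times_by_nodes)
    sp = speedup(times_by_nodes)
    return {n: sp[n] / (n / base) for n in sp}


def relative_increase_pct(t_new: float, t_ref: float) -> float:
    """Percentage change of t_new over the reference time t_ref."""
    if t_ref <= 0:
        raise ValueError("reference time must be positive")
    return (t_new - t_ref) / t_ref * 100.0
```

For example, with made-up timings `{2: 400.0, 4: 200.0}` the 4-node speedup is 2.0 and the scaling efficiency is 1.0, while `relative_increase_pct(200.0, 100.0)` gives 100.0, i.e. an "almost 100%" increase corresponds to the running time roughly doubling.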

Original language: English
Title of host publication: Proceedings, 2016 IEEE Second International Conference on Big Data Computing Service and Applications, BigDataService 2016
Place of Publication: Piscataway, NJ (US)
Publisher: IEEE
Pages: 10-13
Number of pages: 4
ISBN (Print): 978-1-5090-2251-9
Publication status: Published - 23 May 2016
Event: 2nd IEEE International Conference on Big Data Computing Service and Applications - Oxford, United Kingdom
Duration: 29 Mar 2016 to 1 Apr 2016

Conference

Conference: 2nd IEEE International Conference on Big Data Computing Service and Applications
Abbreviated title: BigDataService 2016
Country/Territory: United Kingdom
City: Oxford
Period: 29/03/16 to 1/04/16
