Friday, February 5, 2016

Do you want to deploy a cluster in a single click with a single configuration file?

Do you want to create a Hadoop cluster to test your MapReduce job? Are you postponing it because of the overhead of setting up the environment? Karamel offers an easy way to do it. Karamel is a platform for orchestrating the deployment of distributed systems. You can set up your cluster in the cloud in a few simple steps.

Download the latest version of Karamel here. All you need is a configuration file describing how you want to deploy your cluster. The link provides all the information required to set up Karamel (you just need to extract the .tar file). The following video will help you get off to a smooth start with Karamel when doing your deployment and running experiments.

Thursday, February 4, 2016

Reproducible Stream Processing Benchmark to Compare Apache Spark and Apache Flink in the Cloud

Stream processing is becoming a crucial requirement, given the high volume of data being generated and the need to process that data in real time. Data processing platforms are therefore trying to provide smart and efficient approaches to stream processing.

Yahoo recently published a stream processing benchmark, along with the resources here for running the experiment on a single node. Since benchmarking stream processing is an interesting and important task, we wanted to reproduce the experiment on a clustered setup.

We created a reproducible experiment on Amazon EC2 that runs the Yahoo streaming benchmark on clusters of Apache Spark and Apache Flink. You can find all the resources and instructions here, which will help you reproduce our experiment. More importantly, by following the instructions in the above-mentioned link you can conveniently rerun the experiment with different configurations.

The following are some of the application-level and system-level performance results that we obtained during the experiment.

Application-level performance:

System-level performance:



The configurations and an explanation of the results can be found in the stream processing evaluation section of the full report.

We have also completed a batch processing performance comparison of Apache Spark and Apache Flink, reproducing DongWong’s performance comparison.

The full project report can be found here.


    Jim Dowling
    Kamal Hakimzadeh
    Shelan Perera