Monday, March 28, 2016

Vagrant : An amazing tool to manage VMs

Vagrant is an amazing tool for managing VMs. What makes it so amazing is the convenience it brings to creating and managing VMs. It works with VirtualBox, so you can get the benefits of Vagrant for free.
If you have Vagrant and VirtualBox installed, spinning up a virtual machine is super simple. You just need to enter the two commands below.

vagrant init hashicorp/precise64
vagrant up

The first command creates a configuration file named "Vagrantfile" in your current directory, and this file can be used to configure the VM as required. The second command creates and starts a VM named "default". With these two commands, you will have your VM up and running in VirtualBox.
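Stripped of its generated comments, the Vagrantfile produced by the init command is roughly this minimal configuration (the exact boilerplate varies between Vagrant versions):

```ruby
# Minimal Vagrantfile as produced by "vagrant init hashicorp/precise64",
# with the generated comments removed (exact contents may vary by version).
Vagrant.configure(2) do |config|
  # The base box the VM is built from
  config.vm.box = "hashicorp/precise64"
end
```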

But most of the time we need to start and manage several VMs, and the Vagrantfile can be used to support that. Given below is a sample Vagrantfile (without comments) that was used to spin up 3 VMs, vm1, vm2 and vm3, with the specified IPs. The memory, number of CPUs, etc. of each VM can be configured in the Vagrantfile as required.

Vagrant.configure(2) do |config|
  config.vm.box = "hashicorp/precise64"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = "3072"
    vb.cpus = 2
  end

  # The hostmanager settings below require the vagrant-hostmanager plugin
  # (install it with: vagrant plugin install vagrant-hostmanager)
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = false
  config.hostmanager.ignore_private_ip = false
  config.hostmanager.include_offline = true

  config.vm.define "vm1" do |vm1|
    vm1.vm.hostname = 'vm1-hostname'
    vm1.vm.network "private_network", ip: "172.28.128.5"
  end

  config.vm.define "vm2" do |vm2|
    vm2.vm.hostname = 'vm2-hostname'
    vm2.vm.network "private_network", ip: "172.28.128.6"
  end

  config.vm.define "vm3" do |vm3|
    vm3.vm.hostname = 'vm3-hostname'
    vm3.vm.network "private_network", ip: "172.28.128.7"
  end
end
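If one VM needs different resources than the others, a provider block can also be set per machine rather than globally. The sketch below (the values are illustrative, not from the original setup) gives vm3 its own memory and CPU settings, overriding the defaults for that machine only:

```ruby
# Hypothetical per-VM override: vm3 gets more memory and CPUs than the
# global "virtualbox" provider settings; vm1 and vm2 keep the defaults.
config.vm.define "vm3" do |vm3|
  vm3.vm.hostname = 'vm3-hostname'
  vm3.vm.network "private_network", ip: "172.28.128.7"
  vm3.vm.provider "virtualbox" do |vb|
    vb.memory = "4096"  # overrides the global 3072 MB
    vb.cpus = 4         # overrides the global 2 CPUs
  end
end
```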


Below are some useful commands for managing these VMs.
  • vagrant up vm1 (creates vm1 if it does not exist and starts it)
  • vagrant ssh vm1 (SSH into vm1)
  • vagrant halt vm1 (gracefully shuts down vm1)
  • vagrant destroy vm1 (destroys vm1)
  • vagrant status (shows the status of the VMs created in the current directory)
  • vagrant global-status (shows the status of all the VMs on your host)

Another useful feature is how easy it is to copy files to your VM. Simply copy whatever files you need into the same directory as the Vagrantfile, and they will be available in the /vagrant directory inside your VM.
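The /vagrant mapping is a synced folder that Vagrant sets up by default, and additional host directories can be shared the same way. A sketch, where the "./data" and "/data" paths are illustrative examples rather than anything from the setup above:

```ruby
# Share an extra host directory with the guest, in addition to the
# default mapping of the Vagrantfile's directory to /vagrant.
# The host path "./data" and guest path "/data" are illustrative.
Vagrant.configure(2) do |config|
  config.vm.box = "hashicorp/precise64"
  config.vm.synced_folder "./data", "/data"
end
```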
Give it a try when you need to work with VMs.

Friday, February 5, 2016

Do you want to deploy a cluster in a single click with single configuration file?

Do you want to create a Hadoop cluster to test your MapReduce job? Are you postponing it because of the overhead of setting up the environment? Karamel provides the easiest way to do it. Karamel is a platform for convenient orchestration of distributed system deployments. You can set up your cluster in the cloud in a few simple steps.

Download the latest version of Karamel here. You just need a configuration file describing how your cluster should be deployed. The link provides all the information required to set up Karamel (you just need to extract the .tar file). The following video will help you get a smooth start with Karamel for doing your deployments and running experiments.

Thursday, February 4, 2016

Reproducible Stream Processing Benchmark to compare Apache Spark and Apache Flink on Cloud

Stream processing is becoming a crucial requirement given the high volume of data generated and the need to process that data in real time. Data processing platforms are therefore trying to provide smart and efficient approaches to stream processing.

Yahoo recently published a stream processing benchmark, along with the resources here to run the experiment on a single node. Since benchmarking stream processing is an interesting and important task, we wanted to reproduce the experiment on a clustered setup.

We created a reproducible experiment on Amazon EC2 to run the Yahoo streaming benchmark on clusters of Apache Spark and Apache Flink. You can find all the resources and instructions here, which will help you reproduce our experiment. More importantly, you can conveniently reproduce the experiment with different configurations by following the instructions at the above-mentioned link.

Following are some of the application-level and system-level performance results that we obtained during the experiment.

Application-level performance: see the charts in the full report.

System-level performance (memory and CPU): see the charts in the full report.
The configurations and the explanation of the results can be found in the stream processing evaluation section of the full report.

We have also completed a performance comparison of batch processing for Apache Spark and Apache Flink, reproducing DongWong's performance comparison.

The full project report can be found here.

Acknowledgement


    Jim Dowling
    Kamal Hakinzaheh
    Shelan Perera