Monday, March 28, 2016

Vagrant: An amazing tool to manage VMs

Vagrant is an amazing tool for managing VMs. What makes it so amazing is how much of the routine work of creating and configuring VMs it takes off your hands. It works with VirtualBox, which lets you get the advantages of Vagrant for free.
If you have installed Vagrant and VirtualBox, spinning up a virtual machine is super simple. You just need the two commands below.

vagrant init hashicorp/precise64
vagrant up

The first command creates a configuration file named "Vagrantfile" in your current directory, and this file can be used to configure the VMs as required. The second command creates and starts a VM named "default". With these two commands, you will have your VM up and running in VirtualBox.

But most of the time we need to start several VMs and manage them, and the Vagrantfile supports that too. Given below is a sample Vagrantfile (without comments) that was used to spin up 3 VMs: vm1, vm2 and vm3 with the specified IPs. The memory of each VM, the number of CPUs, etc. can be configured in the Vagrantfile as required.

Vagrant.configure(2) do |config|
  config.vm.box = "hashicorp/precise64"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = "3072"
    vb.cpus = 2
  end

  config.hostmanager.enabled = true
  config.hostmanager.manage_host = false
  config.hostmanager.ignore_private_ip = false
  config.hostmanager.include_offline = true

  config.vm.define "vm1" do |vm1|
    vm1.vm.hostname = 'vm1-hostname'
    vm1.vm.network "private_network", ip: ""
  end

  config.vm.define "vm2" do |vm2|
    vm2.vm.hostname = 'vm2-hostname'
    vm2.vm.network "private_network", ip: ""
  end

  config.vm.define "vm3" do |vm3|
    vm3.vm.hostname = 'vm3-hostname'
    vm3.vm.network "private_network", ip: ""
  end
end

Below are some useful commands that are required to manage these VMs.
  • vagrant up vm1 (creates vm1 if it does not exist yet, and starts it)
  • vagrant ssh vm1 (SSH into vm1)
  • vagrant halt vm1 (gracefully shut down vm1)
  • vagrant destroy vm1 (destroy vm1)
  • vagrant status (shows the status of the VMs created in the current directory)
  • vagrant global-status (shows the status of all the VMs on your host)

Another useful feature is how easy it is to copy files to your VM. Simply copy whatever files you need into the same directory as the Vagrantfile, and they will be available in the /vagrant directory inside your VM.
Give it a try when you need to work with VMs.

Friday, February 5, 2016

Do you want to deploy a cluster in a single click with single configuration file?

Do you want to create a Hadoop cluster to test your MapReduce job? Are you postponing it because of the overhead of setting up the environment? Karamel provides the easiest way to do it. Karamel is a platform that supports convenient orchestration of distributed system deployments. You can set up your cluster in the cloud in a few simple steps.

Download the latest version of Karamel here. You just need a configuration file describing how you want your cluster deployed. The link provides all the information required to set up Karamel (you just need to extract the .tar file). The following video will help you get a smooth start with Karamel for your deployments and experiments.

Thursday, February 4, 2016

Reproducible Stream Processing Benchmark to compare Apache Spark and Apache Flink on Cloud

Stream processing is becoming a crucial requirement given the high volume of data generated and the need to process those data in real time. Data processing platforms are therefore trying to provide smart and efficient approaches to stream processing.

Yahoo has recently published a stream processing benchmark, along with the resources here to run the experiment on a single node. Since benchmarking stream processing is an interesting and important task, we wanted to reproduce the experiment on a clustered setup.

We created a reproducible experiment on Amazon EC2 to reproduce the Yahoo streaming benchmark on a cluster for Apache Spark and Apache Flink. You can find all the resources and instructions here, which will help you reproduce our experiment. More importantly, you can conveniently reproduce the experiment with different configurations by following the instructions in the above-mentioned link.

Following are some of the application-level and system-level performance results that we obtained during the experiment.

Application-level performance:

System-level performance:
The configurations and an explanation of the results can be found in the stream processing evaluation section of the full report.

We have also completed a performance comparison of batch processing for Apache Spark and Apache Flink, reproducing DongWong’s performance comparison.

The full project report can be found here.


    Jim Dowling
    Kamal Hakinzaheh
    Shelan Perera

Tuesday, December 1, 2015

Computer users, relax your tired eyes using 20-20-20 rule

We all know that using a computer for a long time each day makes your eyes tired. Here is a simple rule which may help reduce your eyestrain and let your eyes relax for a while. It is recommended to stand up and stretch once in a while instead of sitting in your chair all the time. But sometimes it may not be possible to walk away from your chair every 30 minutes. At least try the following 20-20-20 rule to relax your eyes :)

20-20-20 rule: every 20 minutes, look at something about 20 feet away for at least 20 seconds.

And try to blink more often, as it will help keep your eyes from getting dry.

Wednesday, November 4, 2015

Presentation tips and 10-20-30 Rule for Presentations

Whenever you have to do a presentation, whether business related or academic, you want to make it attractive and smart. Following are some of the well-known methods that I have been using.

  • Use less text
We know that the audience will get bored when you have a lot of text in your slides. And if they do not get bored, they will start reading the slides instead of listening to your talk, so you miss the chance to emphasize what you want. Either way you lose :(
So if you feel that a particular slide needs text to explain it more easily, use bullet points and be precise. Three or four bullets with up to six or seven words each is ideal.

  • Communicate your idea with pictures
We know that "a picture is worth a thousand words". It is always easier and clearer to convey your idea through a picture than through a lot of text. (This is a very well-known fact.)

  • Use simple animations
If a complex concept or algorithm has to be presented, make some animated slides that can explain it easily instead of struggling with all the Xs and Ys. You may feel that it will be clear anyway, but believe me, it will not. Just try to recall the very first time you learned that concept. Even though you have mastered it now, for the audience it may be the first time :)

And of course, you should leave these complex explanations out of your presentation unless you think you MUST have them.

  • Fewer slides and good formatting
Use some attractive formatting, so your audience will not get bored looking at it. But of course, I do not mean 'fancy' :)

  • 10-20-30 Rule
This concept was introduced by Guy Kawasaki, a Silicon Valley marketing executive.
In brief, it says:
               10 slides only
               20 minutes of presentation time
               30-point minimum font size
Here is his explanation on how it will be effective. 

And I have a final tip for the audience. If a colleague is doing a presentation for academic purposes and you are supposed to give feedback, please give the feedback you really have. Do not say it was amazing just because you think your friend will feel bad otherwise. I specifically mention academic settings because I believe that in business, real feedback is more likely, since business is at stake. :)

I personally had this habit of being nice and avoiding negative feedback, until I suddenly realized that I was taking away my colleague's chance to further improve his/her presentation skills. The feedback can be something as simple as 'You spoke too fast', but if they improve on it, it will help a lot in attracting more of the audience next time. You can of course be polite and convey it nicely, so that it really helps the presenter improve. And maybe you can give written feedback too.

So good luck with your next presentation :)

Wednesday, October 7, 2015

Singleton Pattern

This is one of the simplest, yet very useful, design patterns. The singleton pattern can be used when it is required to create just one object of a class. As an example, suppose you have an ApplicationManager class in your application that all other components need help from, so they all need access to an ApplicationManager object. Now suppose you need to share some state through your ApplicationManager. Then of course you do not want different components referring to different ApplicationManager objects; you need one common object shared by all of them. The singleton pattern is the way to achieve this.

I believe an example code snippet will help a lot in understanding this simply.

public class ApplicationManager {

    private static ApplicationManager applicationManager = new ApplicationManager();

    private ApplicationManager() { }

    public static ApplicationManager getInstance() {
        return applicationManager;
    }

    public void yourMethod() {
        // application-wide logic goes here
    }
}
As you can see, the constructor is private, so no one else can create an ApplicationManager object; anyone who wants to use it calls the getInstance() method and gets the already-created static ApplicationManager object. So no more than one object is ever created, and global access is provided.
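A minimal sketch of how the shared state behaves (the SingletonDemo class and its counter are hypothetical names chosen for illustration, not part of the code above):

```java
public class SingletonDemo {
    // The single instance, created eagerly when the class is loaded.
    private static final SingletonDemo INSTANCE = new SingletonDemo();

    // Shared state: every caller sees the same counter.
    private int counter = 0;

    // Private constructor prevents outside instantiation.
    private SingletonDemo() { }

    public static SingletonDemo getInstance() {
        return INSTANCE;
    }

    public int increment() {
        return ++counter;
    }

    public static void main(String[] args) {
        SingletonDemo a = SingletonDemo.getInstance();
        SingletonDemo b = SingletonDemo.getInstance();
        System.out.println(a == b);          // true: both are the same object
        a.increment();
        System.out.println(b.increment());   // 2: incrementing via a is visible via b
    }
}
```

Because both getInstance() calls hand back the same object, incrementing the counter through one reference is visible through the other.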

Monday, October 5, 2015

Producer Consumer Pattern

The producer-consumer design pattern is one of the most commonly used design patterns, and I am pretty sure that many people have used it without knowing it is a design pattern. It is mainly used to decouple the procedure that consumes data from the procedure that produces it. In other words, it implements efficient and smooth data sharing between the producer of the data and the consumer. This decoupling is mainly achieved by maintaining a queue of the data items.

With a queue, even if the producer and consumer produce and consume data at different rates, data sharing will still flow smoothly through the queue. The producer does not have to wait until the consumer finishes data item 1 before delivering data item 2. And the producer has no overhead of managing the produced items even when there are many consumers, as long as all the consumers consume items from that same queue.
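A minimal sketch of the pattern using Java's built-in java.util.concurrent.BlockingQueue (the class name, queue capacity, and item counts below are illustrative choices, not from any particular codebase):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {

    // Produces the numbers 1..10 and consumes them through a small bounded
    // queue; returns the sum the consumer saw.
    public static int runPipeline() throws InterruptedException {
        // Capacity 3: the producer blocks only when the queue is full,
        // the consumer blocks only when it is empty.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(3);
        int[] sum = new int[1];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 10; i++) {
                    queue.put(i); // hand the item over; no further bookkeeping
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    sum[0] += queue.take(); // waits only if nothing is ready yet
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Consumed sum: " + runPipeline()); // prints 55
    }
}
```

The bounded queue is what gives the decoupling: neither thread ever calls into the other, yet every produced item reaches the consumer exactly once.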

So this design pattern can easily be used to share data between producer parties and consumer parties while keeping them well synchronized.