Strive for the Best: Setup Hadoop in Mac

It is really simple to set up Hadoop on a Mac. I tried the latest version available at the time of writing (hadoop-2.5.1). You can set up Hadoop in standalone mode or pseudo-distributed mode on your local machine. The steps below set up Hadoop on your machine in standalone mode.

(You need to install Java and SSH beforehand to run Hadoop.)

Download the version you need from the Apache Hadoop releases page
Extract the downloaded pack
The extracted directory will be your HADOOP_HOME (e.g. /Users/username/hadoop-2.5.1)
Add HADOOP_HOME to .bash_profile
export HADOOP_HOME=/Users/username/hadoop-2.5.1
export PATH=$PATH:$HADOOP_HOME/bin
Source .bash_profile to apply the changes
source ~/.bash_profile

Now you should be able to echo HADOOP_HOME in the terminal ( echo $HADOOP_HOME )
Make sure that you can ssh to localhost (on a Mac this requires Remote Login to be enabled under System Preferences > Sharing)
ssh localhost

Now your standalone Hadoop setup is ready to use.
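As a quick sanity check of the steps above, here is a minimal sketch in plain Java. HadoopHomeCheck and its describe method are hypothetical names made up for this post, not part of Hadoop. It only assumes that the launcher script sits at $HADOOP_HOME/bin/hadoop, which is where the PATH entry above points.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: checks the environment set up above.
class HadoopHomeCheck {

    // Returns a human-readable status for the given HADOOP_HOME value.
    static String describe(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return "HADOOP_HOME is not set";
        }
        // Assumption: the launcher script lives at $HADOOP_HOME/bin/hadoop.
        Path launcher = Paths.get(hadoopHome, "bin", "hadoop");
        if (!Files.isRegularFile(launcher)) {
            return "No hadoop launcher found at " + launcher;
        }
        return "OK: " + launcher;
    }

    public static void main(String[] args) {
        // Reads the variable exported in .bash_profile.
        System.out.println(describe(System.getenv("HADOOP_HOME")));
    }
}
```

Run it from a new terminal so that the changes in .bash_profile are picked up.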

Here is a sample MapReduce program I found that you can use to test your setup.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into whitespace-separated tokens
    // and emits (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
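
To get a feel for what the job above should produce, here is a small sketch that does the same counting in plain Java, without Hadoop. LocalWordCount is a hypothetical name used only for illustration. It tokenizes on whitespace with StringTokenizer, like the mapper, and sums one per token, like the reducer. The sorted TreeMap is my own choice to roughly mimic the sorted keys in the job output.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for the job above; it does not use Hadoop at all.
class LocalWordCount {

    // Counts whitespace-separated tokens, like TokenizerMapper + IntSumReducer.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // Add 1 for this token, starting from 1 if it is new.
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one "word<TAB>count" line per distinct word.
        count("hello hadoop hello mac").forEach(
                (word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Note that the counting is case-sensitive and punctuation stays attached to the words, so "Hadoop" and "hadoop," are counted as different words. The same holds for the MapReduce version, since it tokenizes the same way.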

Run sample

Create a jar from the sample, for example:
javac -classpath "$(hadoop classpath)" WordCount.java
jar cf wordCount.jar WordCount*.class
Create a text file whose words you want to count
Run

hadoop jar [path_to_jar] [main_class] [path_to_input] [path_to_output]
ie: hadoop jar wordCount.jar WordCount inputFile output

On successful execution, the output directory is created at the path you specified, and your result is in output/part-r-00000.
When you run the program again, you need to remove the 'output' directory or give a different output path, because Hadoop refuses to write into an output directory that already exists.
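
If you rerun the job often, removing the old output directory can be scripted. Below is a minimal sketch in plain Java. OutputCleaner is a hypothetical helper, not a Hadoop class, and it assumes the output was written to the local file system, which is the default in standalone mode. A plain rm -r output in the terminal does the same thing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: removes a local output directory.
class OutputCleaner {

    // Deletes dir and everything under it; does nothing if dir is absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            ordered = paths.sorted(Comparator.reverseOrder())
                           .collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the "output" directory used in the example command.
        deleteRecursively(Paths.get(args.length > 0 ? args[0] : "output"));
    }
}
```

Be careful with the path you pass in, since everything below it is deleted without confirmation.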

You will see that this is really simple.

You can find steps on setting up hadoop in pseudo-distributed mode in this post


Tuesday, October 28, 2014
