
Download Hadoop from an Apache Hadoop mirror: http://hadoop.apache.org/releases.html#Download
In this case, we choose Hadoop 2.2.0.
Unzip the downloaded Hadoop package and move the Hadoop folder to the directory where you want it to be installed.
tar -zxvf hadoop-2.2.0.tar.gz
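For example, to place it under /usr/local (the location used later in this guide) and hand it over to the HANA user, something like the following should work (assuming you have sudo rights; adjust the user name to yours):
sudo mv hadoop-2.2.0 /usr/local/hadoop
sudo chown -R hana_user_name /usr/local/hadoop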
Switch to your HANA server user:
su hana_user_name
We need to install Hadoop under the HANA user, because the HANA server needs to communicate with Hadoop as the same user.
If you just want to set up Hadoop without accessing it from HANA, you can simply create a dedicated Hadoop account with "addgroup" and "adduser" (the exact command lines depend on the system; SUSE and Ubuntu seem to differ).
Before we install Hadoop, we should make sure we have Java installed.
Use:
java -version
to check Java, and find the Java path with:
whereis java
Then write the following in $HOME/.bashrc to add your Java path (where /java/path/ is the root of your JDK installation):
export JAVA_HOME=/java/path/
export PATH=$PATH:$JAVA_HOME/bin
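To verify the Java setup, reload .bashrc and check that the path is picked up, for example:
source $HOME/.bashrc
echo $JAVA_HOME
java -version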
Install ssh first if you don't have it.
Type the following commands in a console to create a public key and append it to the authorized keys:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
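You can verify that passwordless login works; the first connection may ask you to confirm the host key, but it should not ask for a password:
ssh localhost
exit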
If you want to add the Hadoop path permanently, write the following in $HOME/.bashrc.
Open the .bashrc file with:
vi $HOME/.bashrc
Add the following:
export HADOOP_INSTALL=/hadoop/path/
For the Hadoop path, I put the Hadoop folder under /usr/local, so I use /usr/local/hadoop instead of /hadoop/path/ in my case.
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
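To check that the Hadoop path is set up correctly, reload .bashrc and ask Hadoop for its version:
source $HOME/.bashrc
hadoop version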
Find the configuration files core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh in the Hadoop folder. These files live in $HADOOP_INSTALL/etc/hadoop/. If you cannot find one of the xml files, you can simply copy and rename the corresponding template file in that folder. For example:
cp mapred-site.xml.template mapred-site.xml
Some other tutorials say you can find them under the /conf/ directory; /conf/ is used by older Hadoop versions, but in hadoop-2.2.0 the files are under /etc/hadoop/.
Modify the configuration files as follows:
vi core-site.xml
Put the following between the <configuration> tags (localhost would also work in place of the computer name or IP):
<property>
  <name>fs.default.name</name>
  <value>hdfs://computer-name-or-IP:8020</value>
</property>
vi hdfs-site.xml
Put the following between the <configuration> tags:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/namenode/dir</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/datanode/dir</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
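Note that /namenode/dir and /datanode/dir above are placeholders; substitute directories of your choice. They should exist and be writable by the Hadoop user before you format and start Hadoop, for example:
mkdir -p /namenode/dir /datanode/dir
(you may need sudo plus a chown to the Hadoop user if the directories live outside your home directory)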
vi yarn-site.xml
Put the following between the <configuration> tags:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>your-computer-name-or-IP</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
vi mapred-site.xml
Put the following between the <configuration> tags:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
For more information about all the properties, please check:
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-def...
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
vi hadoop-env.sh
add the following two statements at the end of this file:
export HADOOP_COMMON_LIB_NATIVE_DIR=/hadoop/path/lib/native
export HADOOP_OPTS="-Djava.library.path=/hadoop/path/lib"
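If Hadoop later complains that JAVA_HOME is not set, it usually helps to also set it explicitly in hadoop-env.sh rather than relying on the value inherited from .bashrc:
export JAVA_HOME=/java/path/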
The last thing to do before starting Hadoop is to format the namenode (the datanode directory is initialized automatically on first start):
hadoop namenode -format
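In Hadoop 2.x the hadoop command prints a deprecation warning for this usage; the current form of the same command is:
hdfs namenode -format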
In the end, you can start Hadoop by calling "start-all.sh"; you can find this script in /hadoop/path/sbin.
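Note that start-all.sh is itself deprecated in hadoop-2.2.0; it simply wraps the two scripts below, which you can also run directly:
start-dfs.sh
start-yarn.sh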
To check that Hadoop has started, type:
jps
You should see NameNode, NodeManager, DataNode, SecondaryNameNode and ResourceManager running.
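If everything is up, the output will look something like this (the process IDs below are just examples and will differ on your machine):
26721 NameNode
26838 DataNode
27009 SecondaryNameNode
27152 ResourceManager
27261 NodeManager
27395 Jps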
Alternatively, you can check whether Hadoop is running by visiting localhost:50070 for Hadoop file system information
and localhost:8088 for cluster information.
You may find that localhost:50030 contains JobTracker info in some tutorials. However, localhost:50030 does not exist in hadoop-2.2.0, because hadoop-2.2.0 splits the two major functions of the JobTracker, resource management and job life-cycle management, into separate components. Don't worry about localhost:50030 not working.
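As a final smoke test, you can create a directory in HDFS and list it back (the path here is just an example):
hadoop fs -mkdir -p /tmp/smoketest
hadoop fs -ls /tmp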