Friday, 19 June 2015

Creating a 3-Node Hadoop Cluster and HBase Replication Using Cloudera Manager


    
1  Setting up Amazon EC2 Instances
Create two clusters in the same region, with 3 nodes in each cluster and a minimum volume size of 8 GB per node.
1.1 Launch Instance
Log in to Amazon Web Services, click My Account, and navigate to the Amazon EC2 console.
1.2 Select AMI
Select the Ubuntu-precise-12.04 Server 64-bit AMI.


1.3 Select Instance Type

Select `m3.medium` as the `Instance Type`.

1.4 Configure Number of Instances
Provide the instance details, shutdown behavior, and availability zone.


1.5 Add Storage
Keep the default options on this screen.

1.6 Instance Description
Provide an instance name and description.

1.7 Define a Security Group
It is very important to configure the EC2 firewall correctly. On the “Configure Firewall” page choose “Create a new Security Group,” and authorize all the ports listed below:
[Screenshot: security group rules with the authorized ports]


1.8 Review and Launch Instance
Check the instance details and click Launch.

1.9 Launch Instance and Create Key Pair
Amazon EC2 uses public-key cryptography to encrypt and decrypt login information. Public-key cryptography uses a public key to encrypt a piece of data, such as a password; the recipient then uses the private key to decrypt the data. The public and private keys are known as a key pair.
1.10 Define a Security Group
Create a new security group and add the required security rules to it.

1.11 Launching Instances
Once you click “Launch Instance”, 6 instances should launch in the “pending” state.



Once the instances are in the “running” state, rename them as follows:
NameNode
Standby1
Standby2
Master
Slave1
Slave2


2  Setting up client access to Amazon Instances
Create a new key pair named “Clusterkey” and download the key pair (.pem) file to your local machine. Then click Launch Instance.

2.1 Generating Private Key
Launch the PuTTYgen client and import the key pair that was created during the launch instance step, “Clusterkey.pem”.
Navigate to Conversions and choose “Import Key”.
Click Generate.

Save Private Key
Save the private key by clicking “Save Private Key”. Leave the passphrase empty and click “Yes” when prompted.
2.2 Connect to Amazon Instance
Launch the PuTTY client and load the .ppk file.
Repeat this for the slave nodes.

2.3 Set Up WinSCP Access to EC2 Instances

WinSCP is a handy utility for securely transferring files from your Windows machine to Amazon EC2.
For User name, enter the default user name for your AMI. For Amazon Ubuntu AMIs, the user name is ubuntu.
For Private key, enter the path to your private key, or click the "…" button to browse for the file.
Click Login to connect, and click Yes to add the host fingerprint to the host cache.

Select the Clusterkey.pem file and drag it to the right pane.
Repeat this for the slave nodes.


3 Set Up Password-less SSH on Servers

The master server remotely starts services on the slave nodes, which requires password-less access to the slave servers. The AWS Ubuntu server comes with an OpenSSH server pre-installed.
The public part of the key loaded into the agent must be put on the target system in ~/.ssh/authorized_keys. The AWS server creation process has already taken care of this.
Now we need to add the AWS EC2 key pair identity Clusterkey.pem to the SSH profile. To do that, we use the following SSH utilities:
  • ‘ssh-agent’ is a background program that holds SSH private keys and their passphrases in memory.
  • ‘ssh-add’ prompts the user for the private key's passphrase, if it has one, and adds the key to the list maintained by ssh-agent. Once a key has been added to ssh-agent, you are not asked for it again when using SSH or SCP to connect to hosts that hold your public key.
The Amazon EC2 instance has already taken care of ‘authorized_keys’ on the master server. Execute the following commands to allow password-less SSH access to the slave servers.

Steps:
 In a command-line shell, change to the directory that contains the private key file you created when you launched the instance.
 Use the chmod command to make sure your private key file isn't publicly viewable. For the key file used here, Clusterkey.pem, the command is:
          chmod 400 Clusterkey.pem

Use the ssh command to connect to the instance, specifying the private key (.pem) file and username@public_dns_name. For Ubuntu AMIs, the user name is ubuntu. For RHEL5, the user name is often root but might be ec2-user. For SUSE Linux, the user name is root. Otherwise, check with your AMI provider.

       ssh -i Clusterkey.pem ubuntu@ec2-54-241-10-95.compute-1.amazonaws.com

You'll see a response like the following.
The authenticity of host 'ec2-198-51-100-1.compute-1.amazonaws.com (10.254.142.33)'
can't be established.
RSA key fingerprint is 1f:51:ae:28:bf:89:e9:d8:1f:25:5d:37:2d:7d:b8:ca:9f:f5:f1:6f.
Are you sure you want to continue connecting (yes/no)?

 (Optional) If you've launched a public AMI, verify that the fingerprint in the security alert matches the fingerprint of the instance's host key, which you can read from the instance's console output. If the fingerprints don't match, someone might be attempting a "man-in-the-middle" attack. If they match, continue to the next step.
 Enter yes.
You'll see a response like the following.
Warning: Permanently added 'ec2-54-241-10-95.compute-1.amazonaws.com' (RSA)
to the list of known hosts.
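
The text above describes ssh-agent and ssh-add but does not show them in use. The following is a minimal sketch, assuming the key was copied to the master as Clusterkey.pem and has no passphrase. The function name load_cluster_key and the slave host placeholder are hypothetical and only serve the illustration.

```shell
# Load an EC2 private key into ssh-agent for the current shell session.
# load_cluster_key is a hypothetical helper name, not part of OpenSSH.
load_cluster_key() {
    key_file="$1"
    if [ ! -r "$key_file" ]; then
        echo "key file not readable: $key_file" >&2
        return 1
    fi
    chmod 400 "$key_file" || return 1      # ssh refuses keys that others can read
    eval "$(ssh-agent -s)" > /dev/null     # start an agent and export its variables
    ssh-add "$key_file"                    # register the key with the agent
}

# Usage on the master (slave-private-dns is a placeholder):
#   load_cluster_key ~/Clusterkey.pem
#   ssh ubuntu@slave-private-dns hostname
```

Once the key is loaded, the ssh call no longer needs the -i option. The agent only lives as long as the shell session that started it, so the function has to be run again after logging back in.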



4  Download the Cloudera Manager 4.5 installer and execute it on the remote instance:
$ wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
$ chmod +x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin

Click Yes.

Note down the URL http://localhost:7180/; it is used to open the Cloudera Manager console in a browser. When browsing from your local machine, replace localhost with the instance's public DNS name.



4.2 Installing a CDH Cluster with Cloud Express Wizard
After you log in, Cloudera Manager detects that it is running on EC2 and greets you with the welcome screen of the new wizard. A warning notes that the instances started by this installer are instance store-based, which means that stopping or terminating them loses all data stored on them. Remember to back up important data from the cluster before terminating the instances!
Default username: admin
Default password: admin



Select Cloudera Enterprise Trial and click Next.

Click Launch the Classic Wizard.

Click Continue.

Enter the internal IPs of each node in the cluster.

Select the package, version, and release.

Log in as the ubuntu user, click Browse to upload the .pem file, and click Continue.

The installation starts and shows its progress.

If there are no configuration issues, the installation completes successfully.

Click Continue.

Choose the CDH services you require and click Inspect Assignments.

Assign the appropriate services and their roles to the required hosts.

Click Test Connection.

Click Continue.

The cluster services start.

Check the health status and configuration issues; the cluster should show good health.

The recommended minimum Java heap size is 1 GB.


HBase Replication:

Step1:
Enable replication in Cloudera Manager.
Restart HBase.


Step2:
Add the following code to HBase's configuration file (hbase-site.xml) to enable
replication on the master cluster:
hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
Sync the change to all the servers, including the client nodes in the cluster, and
restart HBase.
Repeat this on the slave (peer) cluster.
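
After syncing the file, it can help to confirm on each node that the property is actually set. The following is a minimal sketch, assuming the <name> and <value> elements sit on consecutive lines as in the fragment above. The helper name site_property is hypothetical.

```shell
# Print the value of a property from a Hadoop-style *-site.xml file.
# Assumes <name> and <value> are on consecutive lines.
# site_property is a hypothetical helper name.
site_property() {
    file="$1"
    name="$2"
    grep -F -A1 "<name>${name}</name>" "$file" |
        sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage on each node, after syncing the file:
#   site_property "$HBASE_HOME/conf/hbase-site.xml" hbase.replication
```

The function prints nothing when the property is missing, so an empty result on any node means the change has not been synced there yet.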
Step3:
Create the table with REPLICATION_SCOPE set to 1 on its column family:
hbase(main):010:0> create 'emp', { NAME => 'Details', REPLICATION_SCOPE => 1 }
0 row(s) in 1.1070 seconds
=> Hbase::Table - emp
hbase(main):011:0> disable 'emp'
0 row(s) in 1.2170 seconds
If you are using an existing table, alter its column family to support replication while the table is disabled:
hbase(main):012:0> alter 'emp', NAME => 'Details', REPLICATION_SCOPE => '1'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.5200 seconds

hbase(main):013:0> enable 'emp'
0 row(s) in 1.1860 seconds
Execute steps 2 to 3 on the peer (slave) cluster as well. This includes enabling
replication, restarting HBase, and creating an identical copy of the table.
Step4:
Start replication on the master cluster, then insert some test data:
hbase(main):014:0> start_replication
0 row(s) in 0.1210 seconds
hbase(main):016:0> put 'emp', 'row1', 'Details:name', 'devaraj'
0 row(s) in 0.0180 seconds
hbase(main):017:0> put 'emp', 'row1', 'Details:Eid', '1009'
0 row(s) in 0.0130 seconds
hbase(main):019:0> put 'emp', 'row1', 'Details:mobile', '90000101011'
0 row(s) in 0.0140 seconds
hbase(main):021:0> put 'emp', 'row1', 'Details:Year', '2013'
0 row(s) in 0.0110 seconds
hbase(main):022:0> put 'emp', 'row2', 'Details:Name', 'Prabu'
Step5:
To check whether the peers are enabled:
hbase(main):001:0> list_peers
PEER_ID CLUSTER_KEY STATE
1 ip-10-202-169-141.us-west-1.compute.internal:2181:/hbase ENABLED
2 ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase ENABLED
3 ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase ENABLED

To add a peer, pass a peer ID and the peer cluster's key (ZooKeeper quorum, client port, and znode parent):
hbase(main):002:0> add_peer '2', 'ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0290 seconds

hbase(main):003:0> add_peer '3', 'ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0700 seconds
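
The cluster keys in the add_peer calls have the form quorum:port:znode. The following is a small sketch of building such a key and the matching add_peer line, assuming the default ZooKeeper client port 2181 and znode parent /hbase that appear in the output above. The helper names cluster_key and add_peer_command are hypothetical.

```shell
# Build an HBase cluster key: <ZooKeeper quorum>:<client port>:<znode parent>.
# cluster_key and add_peer_command are hypothetical helper names.
cluster_key() {
    quorum="$1"
    port="${2:-2181}"       # ZooKeeper client port, 2181 unless overridden
    znode="${3:-/hbase}"    # znode parent, /hbase unless overridden
    printf '%s:%s:%s\n' "$quorum" "$port" "$znode"
}

# Print the add_peer command for a peer ID and a ZooKeeper quorum host.
add_peer_command() {
    printf "add_peer '%s', '%s'\n" "$1" "$(cluster_key "$2")"
}

# Usage:
#   add_peer_command 2 ip-10-190-147-97.us-west-1.compute.internal
```

The printed line can be pasted into the HBase shell on the master cluster. If the peer cluster uses a different port or znode parent, pass them as the second and third arguments of cluster_key.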

Step6:
Connect to HBase Shell on the peer cluster and do a scan on the table to see if the
data has been replicated:

$HBASE_HOME/bin/hbase shell

hbase> scan 'emp'
ROW                                                COLUMN+CELL
row1                                              column=Details:name, timestamp=1401702464224, value=Devaraj
row1                                              column=Details:Eid, timestamp=1401703326645, value=1010

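To verify the replicated data, run the verifyrep job on the master cluster, passing the peer ID and the table name:
$HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.92.1.jar verifyrep 1 emp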
Step7:
Stop the replication on the master cluster by running the following command:

hbase> stop_replication

Step8:
Remove the replication peer from the master cluster by using the following command:

hbase> remove_peer '1'
