Thursday 18 June 2015

HBase Backup and Restore
There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. Each approach has pros and cons.

Full Shutdown Backup
A full shutdown backup first stops HBase (or disables all tables), then uses Hadoop's
distcp command to copy the contents of the HBase directory either to another directory on
the same HDFS, or to a different HDFS cluster.
Distcp (distributed copy) is a tool provided by Hadoop for copying large datasets within the
same HDFS cluster or between clusters. It uses MapReduce to copy files in parallel, handle errors and
recovery, and report the job status.
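
As a sketch of the general form (the hostnames source-nn and backup-nn, the map count, and the log path are placeholders, not values from this cluster):

# Copy a directory tree between clusters, capping parallelism at 10 map
# tasks and writing the copy log to /tmp/distcp_logs
hadoop distcp -m 10 -log /tmp/distcp_logs \
    hdfs://source-nn:8020/hbase \
    hdfs://backup-nn:8020/hbase-backup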

Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if it is being used in a back-end analytic capacity and is not serving front-end web pages. The benefit is that with the NameNode, HMaster, and RegionServers all down, there is no chance of missing any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down.
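
If stopping the daemons entirely is not acceptable, the "disable all tables" variant mentioned above can be done from the HBase shell instead. A sketch (disable_all takes a regular expression and prompts for confirmation before acting):

hbase(main):001:0> disable_all '.*'    # take every table offline; disabling flushes in-memory data to StoreFiles
hbase(main):002:0> enable_all '.*'     # bring the tables back once the copy is done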
Backing up HBase Data within the Same Cluster:
Shut down the HBase cluster:

1. hduser@webserver:~$ sudo service hbase-master stop
* Stopping HBase master daemon (hbase-master):
stopping master.
2. hduser@webserver:~$ sudo jps
1940 DataNode
2327 SecondaryNameNode
3068 Main
2865 HRegionServer
7497 Jps
2130 NameNode
3292 QuorumPeerMain
1813 TaskTracker

Make sure the HMaster daemon is not listed in the output.
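
A quick way to script this check (a simple sketch using standard jps and grep):

# Report whether an HMaster process is still running
sudo jps | grep HMaster && echo "HMaster still running" || echo "HMaster stopped"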

3. Also, make sure mapred.map.tasks.speculative.execution is not set to
final and true on the client of the source cluster.
This property is set in the MapReduce configuration file (mapred-site.xml) under
the $HADOOP_HOME/conf directory. If it is set to final and true, remove the
setting; an example of what such an entry would look like follows the listing below. This is a client-side change; it will only affect the MapReduce jobs submitted
from that client.

hduser@webserver:/etc/hadoop/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>

  <!-- Enable Hue plugins -->
  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
    <description>Comma-separated list of jobtracker plug-ins to be activated.
    </description>
  </property>
  <property>
    <name>jobtracker.thrift.address</name>
    <value>0.0.0.0:9290</value>
  </property>
</configuration>
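
The listing above shows the property is not present, which is what we want. If it were set, the offending entry would look roughly like this (a hypothetical entry, not from this cluster) and should be removed, or its value changed to false:

<!-- Hypothetical entry: remove this, or set the value to false -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
  <final>true</final>
</property>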


4. Create the backup directory in HDFS:

hduser@webserver:~$ sudo -u hdfs hadoop fs -mkdir /opt/fullbackup
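
To confirm the directory exists before running the copy:

# List /opt in HDFS; /opt/fullbackup should appear in the output
hduser@webserver:~$ sudo -u hdfs hadoop fs -ls /opt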


5. Use distcp to copy the HBase root directory from the source to the
backup location. The HBase root directory is set by the hbase.rootdir property
in the HBase configuration file (hbase-site.xml).
hduser@webserver:~$ sudo -u hdfs hadoop distcp hdfs://localhost:8020/hbase hdfs://localhost:8020/opt/fullbackup/
14/03/12 15:32:07 INFO tools.DistCp: srcPaths=[hdfs://localhost:8020/hbase]
14/03/12 15:32:07 INFO tools.DistCp: destPath=hdfs://localhost:8020/opt/fullbackup
14/03/12 15:32:09 INFO tools.DistCp: sourcePathsCount=106
14/03/12 15:32:09 INFO tools.DistCp: filesToCopyCount=42
14/03/12 15:32:09 INFO tools.DistCp: bytesToCopyCount=18.6 K
14/03/12 15:32:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/12 15:32:10 INFO mapred.JobClient: Running job: job_201403121358_0001
14/03/12 15:32:11 INFO mapred.JobClient:  map 0% reduce 0%
14/03/12 15:32:24 INFO mapred.JobClient:  map 100% reduce 0%
14/03/12 15:32:27 INFO mapred.JobClient: Job complete: job_201403121358_0001
14/03/12 15:32:27 INFO mapred.JobClient: Counters: 27
14/03/12 15:32:27 INFO mapred.JobClient:   File System Counters
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of bytes written=199262
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of write operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of bytes read=39946
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of bytes written=19004
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of read operations=319
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of write operations=191
14/03/12 15:32:27 INFO mapred.JobClient:   Job Counters
14/03/12 15:32:27 INFO mapred.JobClient:     Launched map tasks=1
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15039
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:   Map-Reduce Framework
14/03/12 15:32:27 INFO mapred.JobClient:     Map input records=105
14/03/12 15:32:27 INFO mapred.JobClient:     Map output records=0
14/03/12 15:32:27 INFO mapred.JobClient:     Input split bytes=167
14/03/12 15:32:27 INFO mapred.JobClient:     Spilled Records=0
14/03/12 15:32:27 INFO mapred.JobClient:     CPU time spent (ms)=1530
14/03/12 15:32:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=145743872
14/03/12 15:32:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1014980608
14/03/12 15:32:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=107479040
14/03/12 15:32:27 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/03/12 15:32:27 INFO mapred.JobClient:     BYTES_READ=20675
14/03/12 15:32:27 INFO mapred.JobClient:   distcp
14/03/12 15:32:27 INFO mapred.JobClient:     Bytes copied=19004
14/03/12 15:32:27 INFO mapred.JobClient:     Bytes expected=19004
14/03/12 15:32:27 INFO mapred.JobClient:     Files copied=42
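
Once the job reports "Bytes copied" equal to "Bytes expected", a sanity check comparing the source and the copy is still worthwhile (a sketch; adjust the backup path if distcp placed the copy under a subdirectory of /opt/fullbackup):

# Output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
sudo -u hdfs hadoop fs -count hdfs://localhost:8020/hbase
sudo -u hdfs hadoop fs -count hdfs://localhost:8020/opt/fullbackup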

Backing up HBase Data from HDFS to the Local File System:

1. Create the directory:
hduser@webserver:/opt$ sudo mkdir fullbackup

2. Change the read/write permissions:

hduser@webserver:/opt$ sudo chmod -R 777 /opt/fullbackup/

3. Use copyToLocal to copy the HBase root directory from HDFS to the
local file system:

hduser@webserver:/$ sudo -u hdfs hadoop fs -copyToLocal hdfs://localhost:8020/hbase/* /opt/fullbackup/

hduser@webserver:/opt/fullbackup$ ls
cust  dummyemp  emp1  hbase.id  hbase.version  -ROOT-
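
With the data now on the local file system, it can be bundled and moved off the machine with ordinary tools (a sketch; the archive name is arbitrary):

# Bundle the backup into a single compressed, dated archive
cd /opt
tar czf hbase-fullbackup-$(date +%Y%m%d).tar.gz fullbackup/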

Copying from Local Filesystem to HDFS:

hduser@webserver:/opt/fullbackup$ sudo -u hdfs hadoop fs -put /opt/fullbackup hdfs://localhost:8020/backup
hduser@webserver:/opt/fullbackup$ sudo -u hdfs hadoop fs -ls /
Found 7 items
drwxrwxrwx   - root   supergroup          0 2014-02-19 14:15 /_distcp_logs_rn8sji
drwxr-xr-x   - hdfs   supergroup          0 2014-03-12 17:03 /backup
drwxr-xr-x   - hbase  supergroup          0 2014-03-12 13:59 /hbase
drwxr-xr-x   - root   supergroup          0 2014-03-12 15:41 /opt
drwxrwxrwt   - hdfs   supergroup          0 2014-03-07 14:31 /tmp
drwxrwxrwx   - hdfs   supergroup          0 2014-02-18 12:36 /user
drwxrwxrwx   - hdfs   supergroup          0 2014-01-07 16:37 /var


Restoring the Table from the Previous Backup [Local File System to HDFS]
1. Drop the table emp1:
hbase(main):001:0> list
TABLE
cust
dummy
emp
emp1
4 row(s) in 0.6790 seconds

hbase(main):002:0> disable 'emp1'
0 row(s) in 2.1620 seconds

hbase(main):003:0> drop 'emp1'
0 row(s) in 1.1260 seconds

2. Restore from the local file system to HDFS:
hduser@webserver:/$ sudo -u hdfs hadoop fs -copyFromLocal /opt/fullbackup/emp1 hdfs://localhost:8020/hbase
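
Before restarting the master, a listing confirms the table directory is back under the HBase root. Note that the copy is owned by the hdfs user, while the earlier listing showed /hbase owned by hbase, so ownership may need restoring as well (a sketch):

# Verify the emp1 directory landed under the HBase root
hduser@webserver:/$ sudo -u hdfs hadoop fs -ls /hbase/emp1
# Hand ownership back to the hbase user if needed
hduser@webserver:/$ sudo -u hdfs hadoop fs -chown -R hbase /hbase/emp1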


3. Check whether the table was restored:
hduser@webserver:/$ sudo service hbase-master start
* Starting HBase master daemon (hbase-master):
 * HBase master daemon is running
hduser@webserver:/$ sudo hbase shell
14/03/12 17:16:29 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.6-cdh4.5.0, rUnknown, Wed Nov 20 15:48:11 PST 2013

hbase(main):001:0> list
TABLE
cust
dummy
emp
emp1
4 row(s) in 0.6340 seconds

Successfully restored.
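
Seeing the name in list only proves the table is visible again; a quick scan confirms the row data itself came back (the LIMIT value is arbitrary):

hbase(main):002:0> scan 'emp1', {LIMIT => 5}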
