Thursday 18 June 2015

HBase Backup and Restore
There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. Each approach has pros and cons.

Full Shutdown Backup
A full shutdown backup first stops HBase (or disables all tables), then uses Hadoop's
distcp command to copy the contents of the HBase directory either to another directory on
the same HDFS, or to a different HDFS cluster.
Distcp (distributed copy) is a tool provided by Hadoop for copying large datasets within the
same HDFS cluster or between clusters. It uses MapReduce to copy files in parallel, handle errors and
recovery, and report the job status.
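
As a sketch of the general form (the hostnames source-nn and backup-nn, the map count, and the log path are placeholders, not values from this cluster):

# Copy a directory tree between clusters, capping parallelism at 10 map
# tasks and writing the copy log to /tmp/distcp_logs
hadoop distcp -m 10 -log /tmp/distcp_logs \
    hdfs://source-nn:8020/hbase \
    hdfs://backup-nn:8020/hbase-backup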

Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if it is being used in a back-end analytic capacity and is not serving front-end web pages. The benefit is that with the NameNode, HMaster, and RegionServers all down, there is no chance of missing any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down.
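
If stopping the daemons entirely is not acceptable, the "disable all tables" variant mentioned above can be done from the HBase shell instead. A sketch (disable_all takes a regular expression and prompts for confirmation before acting):

hbase(main):001:0> disable_all '.*'    # take every table offline; disabling flushes in-memory data to StoreFiles
hbase(main):002:0> enable_all '.*'     # bring the tables back once the copy is done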
Backing up HBase Data within the Same Cluster:
Shut down the HBase cluster:

1. hduser@webserver:~$ sudo service hbase-master stop
* Stopping HBase master daemon (hbase-master):
stopping master.
2. hduser@webserver:~$ sudo jps
1940 DataNode
2327 SecondaryNameNode
3068 Main
2865 HRegionServer
7497 Jps
2130 NameNode
3292 QuorumPeerMain
1813 TaskTracker

Make sure the HMaster daemon is not listed in the output.
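
A quick way to script this check (a simple sketch using standard jps and grep):

# Report whether an HMaster process is still running
sudo jps | grep HMaster && echo "HMaster still running" || echo "HMaster stopped"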

3. Also, make sure mapred.map.tasks.speculative.execution is not set to
final and true on the client of the source cluster.
This property is set in the MapReduce configuration file (mapred-site.xml) under
the $HADOOP_HOME/conf directory. If it is set to final and true, remove the
setting; an example of what such an entry would look like follows the listing below. This is a client-side change; it will only affect the MapReduce jobs submitted
from that client.

hduser@webserver:/etc/hadoop/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>

  <!-- Enable Hue plugins -->
  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
    <description>Comma-separated list of jobtracker plug-ins to be activated.
    </description>
  </property>
  <property>
    <name>jobtracker.thrift.address</name>
    <value>0.0.0.0:9290</value>
  </property>
</configuration>
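
The listing above shows the property is not present, which is what we want. If it were set, the offending entry would look roughly like this (a hypothetical entry, not from this cluster) and should be removed, or its value changed to false:

<!-- Hypothetical entry: remove this, or set the value to false -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
  <final>true</final>
</property>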


4. Create the backup directory in HDFS:

hduser@webserver:~$ sudo -u hdfs hadoop fs -mkdir /opt/fullbackup
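
To confirm the directory exists before running the copy:

# List /opt in HDFS; /opt/fullbackup should appear in the output
hduser@webserver:~$ sudo -u hdfs hadoop fs -ls /opt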


5. Use distcp to copy the HBase root directory from the source to the
backup location. The HBase root directory is set by the hbase.rootdir property
in the HBase configuration file (hbase-site.xml).
hduser@webserver:~$ sudo -u hdfs hadoop distcp hdfs://localhost:8020/hbase hdfs://localhost:8020/opt/fullbackup/
14/03/12 15:32:07 INFO tools.DistCp: srcPaths=[hdfs://localhost:8020/hbase]
14/03/12 15:32:07 INFO tools.DistCp: destPath=hdfs://localhost:8020/opt/fullbackup
14/03/12 15:32:09 INFO tools.DistCp: sourcePathsCount=106
14/03/12 15:32:09 INFO tools.DistCp: filesToCopyCount=42
14/03/12 15:32:09 INFO tools.DistCp: bytesToCopyCount=18.6 K
14/03/12 15:32:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/12 15:32:10 INFO mapred.JobClient: Running job: job_201403121358_0001
14/03/12 15:32:11 INFO mapred.JobClient:  map 0% reduce 0%
14/03/12 15:32:24 INFO mapred.JobClient:  map 100% reduce 0%
14/03/12 15:32:27 INFO mapred.JobClient: Job complete: job_201403121358_0001
14/03/12 15:32:27 INFO mapred.JobClient: Counters: 27
14/03/12 15:32:27 INFO mapred.JobClient:   File System Counters
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of bytes written=199262
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of write operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of bytes read=39946
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of bytes written=19004
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of read operations=319
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of write operations=191
14/03/12 15:32:27 INFO mapred.JobClient:   Job Counters
14/03/12 15:32:27 INFO mapred.JobClient:     Launched map tasks=1
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15039
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:   Map-Reduce Framework
14/03/12 15:32:27 INFO mapred.JobClient:     Map input records=105
14/03/12 15:32:27 INFO mapred.JobClient:     Map output records=0
14/03/12 15:32:27 INFO mapred.JobClient:     Input split bytes=167
14/03/12 15:32:27 INFO mapred.JobClient:     Spilled Records=0
14/03/12 15:32:27 INFO mapred.JobClient:     CPU time spent (ms)=1530
14/03/12 15:32:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=145743872
14/03/12 15:32:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1014980608
14/03/12 15:32:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=107479040
14/03/12 15:32:27 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/03/12 15:32:27 INFO mapred.JobClient:     BYTES_READ=20675
14/03/12 15:32:27 INFO mapred.JobClient:   distcp
14/03/12 15:32:27 INFO mapred.JobClient:     Bytes copied=19004
14/03/12 15:32:27 INFO mapred.JobClient:     Bytes expected=19004
14/03/12 15:32:27 INFO mapred.JobClient:     Files copied=42
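
Once the job reports "Bytes copied" equal to "Bytes expected", a sanity check comparing the source and the copy is still worthwhile (a sketch; adjust the backup path if distcp placed the copy under a subdirectory of /opt/fullbackup):

# Output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
sudo -u hdfs hadoop fs -count hdfs://localhost:8020/hbase
sudo -u hdfs hadoop fs -count hdfs://localhost:8020/opt/fullbackup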

Backing up HBase Data from HDFS to the Local File System:

1. Create the directory:
hduser@webserver:/opt$ sudo mkdir fullbackup

2. Change the read/write permissions:

hduser@webserver:/opt$ sudo chmod -R 777 /opt/fullbackup/

3. Use copyToLocal to copy the HBase root directory from HDFS to the
local file system:

hduser@webserver:/$ sudo -u hdfs hadoop fs -copyToLocal hdfs://localhost:8020/hbase/* /opt/fullbackup/

hduser@webserver:/opt/fullbackup$ ls
cust  dummyemp  emp1  hbase.id  hbase.version  -ROOT-
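
With the data now on the local file system, it can be bundled and moved off the machine with ordinary tools (a sketch; the archive name is arbitrary):

# Bundle the backup into a single compressed, dated archive
cd /opt
tar czf hbase-fullbackup-$(date +%Y%m%d).tar.gz fullbackup/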

Copying from Local Filesystem to HDFS:

hduser@webserver:/opt/fullbackup$ sudo -u hdfs hadoop fs -put /opt/fullbackup hdfs://localhost:8020/backup
hduser@webserver:/opt/fullbackup$ sudo -u hdfs hadoop fs -ls /
Found 7 items
drwxrwxrwx   - root   supergroup          0 2014-02-19 14:15 /_distcp_logs_rn8sji
drwxr-xr-x   - hdfs   supergroup          0 2014-03-12 17:03 /backup
drwxr-xr-x   - hbase  supergroup          0 2014-03-12 13:59 /hbase
drwxr-xr-x   - root   supergroup          0 2014-03-12 15:41 /opt
drwxrwxrwt   - hdfs   supergroup          0 2014-03-07 14:31 /tmp
drwxrwxrwx   - hdfs   supergroup          0 2014-02-18 12:36 /user
drwxrwxrwx   - hdfs   supergroup          0 2014-01-07 16:37 /var


Restoring the Table from the Previous Backup [Local File System to HDFS]
1. Drop the table emp1:
hbase(main):001:0> list
TABLE
cust
dummy
emp
emp1
4 row(s) in 0.6790 seconds

hbase(main):002:0> disable 'emp1'
0 row(s) in 2.1620 seconds

hbase(main):003:0> drop 'emp1'
0 row(s) in 1.1260 seconds

2. Restore from the local file system to HDFS:
hduser@webserver:/$ sudo -u hdfs hadoop fs -copyFromLocal /opt/fullbackup/emp1 hdfs://localhost:8020/hbase
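
Before restarting the master, a listing confirms the table directory is back under the HBase root. Note that the copy is owned by the hdfs user, while the earlier listing showed /hbase owned by hbase, so ownership may need restoring as well (a sketch):

# Verify the emp1 directory landed under the HBase root
hduser@webserver:/$ sudo -u hdfs hadoop fs -ls /hbase/emp1
# Hand ownership back to the hbase user if needed
hduser@webserver:/$ sudo -u hdfs hadoop fs -chown -R hbase /hbase/emp1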


3. Check whether the table was restored:
hduser@webserver:/$ sudo service hbase-master start
* Starting HBase master daemon (hbase-master):
 * HBase master daemon is running
hduser@webserver:/$ sudo hbase shell
14/03/12 17:16:29 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.6-cdh4.5.0, rUnknown, Wed Nov 20 15:48:11 PST 2013

hbase(main):001:0> list
TABLE
cust
dummy
emp
emp1
4 row(s) in 0.6340 seconds

Successfully restored.
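
Seeing the name in list only proves the table is visible again; a quick scan confirms the row data itself came back (the LIMIT value is arbitrary):

hbase(main):002:0> scan 'emp1', {LIMIT => 5}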
