HBase Backup and Restore
There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. Each approach has its pros and cons.
Full Shutdown Backup
A full-shutdown backup first stops HBase (or disables all tables), then uses Hadoop's distcp command to copy the contents of the HBase root directory either to another directory on the same HDFS, or to a different HDFS cluster.
Distcp (distributed copy) is a tool provided by Hadoop for copying a large dataset within the same HDFS cluster or between different HDFS clusters. It uses MapReduce to copy files in parallel, handle errors and recovery, and report the job status.
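In its simplest form, distcp takes a source path and a destination path. The invocation below is only an illustrative sketch (the host names and paths are placeholders, not part of this walkthrough); the actual commands used for this backup appear in the steps that follow.
hadoop distcp hdfs://source-namenode:8020/hbase hdfs://backup-namenode:8020/hbase-backup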
Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if it is being used as a back-end analytic capacity and is not serving front-end web pages. The benefit is that the NameNode/Master and RegionServers are down, so there is no chance of missing any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down.
Backing up HBase Data on the Same Cluster:
Shut down the HBase cluster
1. hduser@webserver:~$ sudo service hbase-master stop
 * Stopping HBase master daemon (hbase-master):
stopping master.
2. hduser@webserver:~$ sudo jps
1940 DataNode
2327 SecondaryNameNode
3068 Main
2865 HRegionServer
7497 Jps
2130 NameNode
3292 QuorumPeerMain
1813 TaskTracker
Make sure the HMaster daemon is not listed in the output.
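Note that the jps output above still shows an HRegionServer process. For a true full-shutdown backup the RegionServer daemon should be stopped as well; assuming the same CDH service packaging used for the master, a command along these lines would do it (a sketch, not part of the original run):
hduser@webserver:~$ sudo service hbase-regionserver stop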
3. Also, make sure mapred.map.tasks.speculative.execution is not set to final and true on the client of the source cluster. This property is set in the MapReduce configuration file (mapred-site.xml) under the $HADOOP_HOME/conf directory. If it is set to final and true, remove the setting. This is a client-side change; it will only affect the MapReduce jobs submitted from that client.
hduser@webserver:/etc/hadoop/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
  <!-- Enable Hue plugins -->
  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
    <description>Comma-separated list of jobtracker plug-ins to be activated.</description>
  </property>
  <property>
    <name>jobtracker.thrift.address</name>
    <value>0.0.0.0:9290</value>
  </property>
</configuration>
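The file above does not contain the speculative-execution property, so nothing needs to be removed here. For reference, if it were present it would look roughly like the snippet below (shown only as an illustration of what to look for and remove):
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
  <final>true</final>
</property>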
4. Create the backup directory in HDFS
hduser@webserver:~$ sudo -u hdfs hadoop fs -mkdir /opt/fullbackup
5. Use distcp to copy the HBase root directory from the source to the backup location. The HBase root directory is set by the hbase.rootdir property in the HBase configuration file (hbase-site.xml).
hduser@webserver:~$ sudo -u hdfs hadoop distcp hdfs://localhost:8020/hbase hdfs://localhost:8020/opt/fullbackup/
14/03/12 15:32:07 INFO tools.DistCp: srcPaths=[hdfs://localhost:8020/hbase]
14/03/12 15:32:07 INFO tools.DistCp: destPath=hdfs://localhost:8020/opt/fullbackup
14/03/12 15:32:09 INFO tools.DistCp: sourcePathsCount=106
14/03/12 15:32:09 INFO tools.DistCp: filesToCopyCount=42
14/03/12 15:32:09 INFO tools.DistCp: bytesToCopyCount=18.6 K
14/03/12 15:32:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/12 15:32:10 INFO mapred.JobClient: Running job: job_201403121358_0001
14/03/12 15:32:11 INFO mapred.JobClient:  map 0% reduce 0%
14/03/12 15:32:24 INFO mapred.JobClient:  map 100% reduce 0%
14/03/12 15:32:27 INFO mapred.JobClient: Job complete: job_201403121358_0001
14/03/12 15:32:27 INFO mapred.JobClient: Counters: 27
14/03/12 15:32:27 INFO mapred.JobClient:   File System Counters
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of bytes written=199262
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     FILE: Number of write operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of bytes read=39946
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of bytes written=19004
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of read operations=319
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/03/12 15:32:27 INFO mapred.JobClient:     HDFS: Number of write operations=191
14/03/12 15:32:27 INFO mapred.JobClient:   Job Counters
14/03/12 15:32:27 INFO mapred.JobClient:     Launched map tasks=1
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15039
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/12 15:32:27 INFO mapred.JobClient:   Map-Reduce Framework
14/03/12 15:32:27 INFO mapred.JobClient:     Map input records=105
14/03/12 15:32:27 INFO mapred.JobClient:     Map output records=0
14/03/12 15:32:27 INFO mapred.JobClient:     Input split bytes=167
14/03/12 15:32:27 INFO mapred.JobClient:     Spilled Records=0
14/03/12 15:32:27 INFO mapred.JobClient:     CPU time spent (ms)=1530
14/03/12 15:32:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=145743872
14/03/12 15:32:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1014980608
14/03/12 15:32:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=107479040
14/03/12 15:32:27 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/03/12 15:32:27 INFO mapred.JobClient:     BYTES_READ=20675
14/03/12 15:32:27 INFO mapred.JobClient:   distcp
14/03/12 15:32:27 INFO mapred.JobClient:     Bytes copied=19004
14/03/12 15:32:27 INFO mapred.JobClient:     Bytes expected=19004
14/03/12 15:32:27 INFO mapred.JobClient:     Files copied=42
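Once the job completes, it is worth confirming that the data landed in the backup location. A simple listing such as the one below (an extra check, not part of the original run; the exact subdirectory layout depends on how distcp handled the pre-existing destination) should show the copied HBase files:
hduser@webserver:~$ sudo -u hdfs hadoop fs -ls /opt/fullbackup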
Backing up HBase Data from HDFS to Local File System:
1. Create the directory
hduser@webserver:/opt$ sudo mkdir fullbackup
2. Change the read-write permissions
hduser@webserver:/opt$ sudo chmod -R 777 /opt/fullbackup/
3. Use the fs -copyToLocal command to copy the HBase root directory from HDFS to the local file system.
hduser@webserver:/$ sudo -u hdfs hadoop fs -copyToLocal hdfs://localhost:8020/hbase/* /opt/fullbackup/
hduser@webserver:/opt/fullbackup$ ls
cust  dummy  emp  emp1  hbase.id  hbase.version  -ROOT-
Copying from Local Filesystem to HDFS:
hduser@webserver:/opt/fullbackup$ sudo -u hdfs hadoop fs -put /opt/fullbackup hdfs://localhost:8020/backup
hduser@webserver:/opt/fullbackup$ sudo -u hdfs hadoop fs -ls /
Found 7 items
drwxrwxrwx   - root   supergroup          0 2014-02-19 14:15 /_distcp_logs_rn8sji
drwxr-xr-x   - hdfs   supergroup          0 2014-03-12 17:03 /backup
drwxr-xr-x   - hbase  supergroup          0 2014-03-12 13:59 /hbase
drwxr-xr-x   - root   supergroup          0 2014-03-12 15:41 /opt
drwxrwxrwt   - hdfs   supergroup          0 2014-03-07 14:31 /tmp
drwxrwxrwx   - hdfs   supergroup          0 2014-02-18 12:36 /user
drwxrwxrwx   - hdfs   supergroup          0 2014-01-07 16:37 /var
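With the data now on the local file system, it can optionally be bundled into a single archive for off-cluster storage. The command below is only an illustrative extra step, not part of the original walkthrough:
hduser@webserver:/opt$ sudo tar -czf /opt/fullbackup.tar.gz -C /opt fullbackup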
Restoring the Table from the Previous Backup [Local File System to HDFS]
1. Drop the table emp1
hbase(main):001:0> list
TABLE
cust
dummy
emp
emp1
4 row(s) in 0.6790 seconds
hbase(main):002:0> disable 'emp1'
0 row(s) in 2.1620 seconds
hbase(main):003:0> drop 'emp1'
0 row(s) in 1.1260 seconds
2. Restore from the local file system to HDFS
hduser@webserver:/$ sudo -u hdfs hadoop fs -copyFromLocal /opt/fullbackup/emp1 hdfs://localhost:8020/hbase
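If the restored table does not appear after the master is restarted in the next step, HBase ships a repair tool (hbck) that can rebuild region assignments and metadata. An invocation along the lines below is a hedged suggestion for this HBase 0.94-era setup, not something run in the original walkthrough:
hduser@webserver:/$ sudo -u hbase hbase hbck -fixMeta -fixAssignments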
3. Check whether the table has been restored
hduser@webserver:/$ sudo service hbase-master start
 * Starting HBase master daemon (hbase-master):
 * HBase master daemon is running
hduser@webserver:/$ sudo hbase shell
14/03/12 17:16:29 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.6-cdh4.5.0, rUnknown, Wed Nov 20 15:48:11 PST 2013
hbase(main):001:0> list
TABLE
cust
dummy
emp
emp1
4 row(s) in 0.6340 seconds
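Beyond list, a quick data-level check confirms the restored table actually contains rows. The count command below is an additional verification step suggested here, not part of the original output:
hbase(main):002:0> count 'emp1'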