Building a pseudo-distributed big data learning environment on a Mac: deploying HBase

  1. Choose an [Apache Download Mirror](https://www.apache.org/dyn/closer.lua/hbase/). Clicking the suggested top link is recommended. Go to HBase Releases, open the stable folder, and download the binary file ending in tar.gz to your local machine. Do not download the file ending in src.tar.gz for now.

  1. Extract the downloaded archive, then change into the newly created directory.

$ tar xzvf hbase-3.0.0-SNAPSHOT-bin.tar.gz
$ cd hbase-3.0.0-SNAPSHOT/
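
The extract step can also be scripted so that the directory name follows from the archive name. This is only a minimal sketch: it assumes the archive sits in the current directory and follows the `hbase-<version>-bin.tar.gz` naming shown above, and the variable names are illustrative.

```shell
# Name of the downloaded archive (assumed; adjust to the version you downloaded).
tarball="hbase-3.0.0-SNAPSHOT-bin.tar.gz"

# Strip the "-bin.tar.gz" suffix to get the directory the archive unpacks into.
hbase_dir="${tarball%-bin.tar.gz}"

# Extract and enter the directory only if the archive is actually present.
if [ -f "$tarball" ]; then
  tar xzf "$tarball" && cd "$hbase_dir"
else
  echo "archive $tarball not found; expected directory would be $hbase_dir"
fi
```
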
  1. Before starting HBase, you must set the JAVA_HOME environment variable. You can set it through your operating system's usual mechanism, but HBase also provides a central place for it: conf/hbase-env.sh. Edit this file, uncomment the line that sets JAVA_HOME, and set it to the appropriate path for your operating system. JAVA_HOME should point to the directory that contains the executable bin/java. Most modern Linux distributions provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for switching easily between Java versions. In that case, you can set JAVA_HOME to the directory containing the symlink to bin/java, which is usually /usr.

JAVA_HOME=/usr
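
If you are unsure which path to use, a small script can suggest a candidate. The sketch below is only a starting point: the function names are made up for illustration, it prefers the `/usr/libexec/java_home` helper that macOS ships, and otherwise it strips `/bin/java` from whatever `java` is found on the PATH. Check the printed value before copying it into conf/hbase-env.sh.

```shell
# Derive a JAVA_HOME candidate from the path of a java executable by
# removing the trailing "/bin/java" (e.g. /usr/bin/java -> /usr).
java_home_from_binary() {
  printf '%s\n' "${1%/bin/java}"
}

# Print a JAVA_HOME candidate, preferring the macOS helper when it exists.
detect_java_home() {
  if [ -x /usr/libexec/java_home ]; then
    /usr/libexec/java_home 2>/dev/null && return 0
  fi
  java_bin=$(command -v java) || return 1
  java_home_from_binary "$java_bin"
}

if candidate=$(detect_java_home); then
  echo "export JAVA_HOME=$candidate"
else
  echo "no java executable found on PATH"
fi
```
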
  1. Edit conf/hbase-site.xml, the main HBase configuration file. At this point, you need to specify the directories on the local file system where HBase and ZooKeeper write their data, and acknowledge some risks. By default, HBase creates a new directory under /tmp, but many systems delete the contents of /tmp on restart, so you should store the data elsewhere. The following configuration stores HBase's data in the hbase directory, in the home directory of a user named testuser. In a fresh HBase installation the <configuration> element is empty; paste the <property> tags inside it.

Example 1. hbase-site.xml Standalone HBase configuration

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
      Controls whether HBase will check for stream capabilities (hflush/hsync).

      Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
      with the 'file://' scheme, but be mindful of the NOTE below.

      WARNING: Setting this to false blinds you to potential data loss and
      inconsistent system state in the event of process and/or node failures. If
      HBase is complaining of an inability to use hsync or hflush it's most
      likely not a false positive.
    </description>
  </property>
</configuration>

You do not need to create the HBase data directory; HBase creates it automatically. If you create the directory yourself, HBase will attempt a migration, which is not what you want.

In the example above, hbase.rootdir points to a directory in the local file system. The 'file://' prefix denotes the local file system. Take the WARNING in the configuration example seriously. In standalone mode, HBase uses the local file system abstraction from the Apache Hadoop project, and that abstraction does not provide the durability guarantees HBase needs to operate safely. This is acceptable for local development and testing, where the cost of a cluster failure is well contained. It is not suitable for production deployments, where you would eventually lose data.

To run HBase on top of an existing HDFS instance, point hbase.rootdir at a directory on that instance, for example hdfs://namenode.example.org:8020/hbase. For more on this setup, see the chapter on deploying standalone HBase over HDFS.
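
Because home directories differ between systems (on a Mac they live under /Users rather than /home), it can help to generate the file from `$HOME` instead of hard-coding /home/testuser. The following is a sketch under assumptions: the data directory names mirror the example above, and the file is written to a temporary directory so that it does not overwrite a real conf/hbase-site.xml. Copy it into place once you have reviewed it.

```shell
# Directory that will hold the generated file; a temporary directory is used
# here so the sketch does not overwrite a real conf/hbase-site.xml.
conf_dir=$(mktemp -d)

# Assumed data locations under the current user's home directory
# (falls back to the example's /home/testuser if HOME is unset).
base_dir="${HOME:-/home/testuser}"
hbase_data="$base_dir/hbase"
zk_data="$base_dir/zookeeper"

# Write the same three properties as in Example 1, with the paths filled in.
cat > "$conf_dir/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$hbase_data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$zk_data</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>
EOF

echo "wrote $conf_dir/hbase-site.xml"
```
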

  1. The bin/start-hbase.sh script provides a convenient way to start HBase. Run it, and if all goes well, a message confirming that HBase started successfully is written to standard output. You can use the jps command to verify that an HMaster process is running. In standalone mode, HBase runs all of its daemons (the HMaster, a single HRegionServer, and the ZooKeeper daemon) in the same JVM. You can view the HBase Web UI at http://localhost:16010.

Java must be installed and available. If you get an error saying that Java is not installed even though it is, it may be in a non-standard location. In that case, edit conf/hbase-env.sh and set JAVA_HOME to the directory that contains bin/java.
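
The jps check can be wrapped in a small script. This sketch assumes that jps prints one `<pid> <name>` line per JVM, which is its default output. The helper name `has_hmaster` is made up for illustration.

```shell
# Succeeds if a jps-style process listing read from standard input
# contains an HMaster entry (lines look like "<pid> HMaster").
has_hmaster() {
  grep -q ' HMaster$'
}

# Only call jps when it is available (it ships with the JDK).
if command -v jps >/dev/null 2>&1; then
  if jps | has_hmaster; then
    echo "HMaster is running; Web UI: http://localhost:16010"
  else
    echo "HMaster is not running"
  fi
else
  echo "jps not found; is a JDK on your PATH?"
fi
```
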

Procedure: Using HBase for the first time

  1. Connect to HBase

Use the hbase shell command, located in the bin/ directory of your HBase installation, to connect to the running HBase instance. The example below omits the usage and version information that is printed when the HBase Shell starts. The HBase Shell prompt ends with a > character.

$ ./bin/hbase shell
hbase(main):001:0>
  1. Display the HBase Shell help text

Type help and press Enter to display basic usage information for the HBase Shell, along with several example commands. Note that table names, rows, and columns must all be enclosed in quotation marks.

  1. Create a table

Use the create command to create a new table. You must specify the table name and a column family name.

hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds

=> Hbase::Table - test
  1. List information about the table

Use the list command to confirm that the table exists.

hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds

=> ["test"]

Use the describe command to view the table's details, including its configuration.

hbase(main):003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE =>
'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
 => '65536'}
1 row(s)
Took 0.9998 seconds
  1. Insert data

Use the put command to insert data into the table.

hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds

hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds

hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds

Here, we insert three values into the test table, one at a time. The first insert is at row key row1, column cf:a, with the value value1. Columns in HBase consist of a column family prefix followed by a colon and a column qualifier. In this example, the column family before the colon is cf, and the column qualifier after the colon is a.
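
The family:qualifier convention is easy to see by splitting a column name at its first colon. This is only an illustration in plain shell, with made-up variable names, and it does not touch HBase at all.

```shell
# A column name as used in the put commands above.
column="cf:a"

family="${column%%:*}"     # text before the first colon
qualifier="${column#*:}"   # text after the first colon

echo "family=$family qualifier=$qualifier"   # prints: family=cf qualifier=a
```
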

  1. Scan the table for all data at once

One way to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit the scan, but for now, all data is fetched.

hbase(main):006:0> scan 'test'
ROW                                      COLUMN+CELL
 row1                                    column=cf:a, timestamp=1421762485768, value=value1
 row2                                    column=cf:b, timestamp=1421762491785, value=value2
 row3                                    column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
  1. Get a single row of data

Use the get command to retrieve a single row of data at a time.

hbase(main):007:0> get 'test', 'row1'
COLUMN                                   CELL
 cf:a                                    timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
  1. Disable a table

If you want to delete a table or change its settings, as well as in some other situations, you first need to disable the table with the disable command. You can re-enable it with the enable command.

hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds

hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds

Disable the table again if you tested the enable command above:

hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
  1. Drop the table

Use the drop command to drop (delete) a table.

hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
  1. Exit the HBase Shell

Use the quit command to exit the HBase Shell and disconnect from the cluster. HBase keeps running in the background.
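
The same commands can be run without typing them interactively by feeding them to the shell in non-interactive mode (`hbase shell -n`). The sketch below assumes it is run from the HBase installation directory with HBase already started. The temporary file is illustrative, and the list of commands is a shortened version of the procedure above.

```shell
# Write a shortened version of the commands from this procedure to a temporary file.
commands_file=$(mktemp)
cat > "$commands_file" <<'EOF'
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
disable 'test'
drop 'test'
EOF

# Run them only when executed from the HBase installation directory.
if [ -x ./bin/hbase ]; then
  ./bin/hbase shell -n < "$commands_file"
else
  echo "./bin/hbase not found; commands saved in $commands_file"
fi
```
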

Procedure: Stop HBase

  1. Just as the bin/start-hbase.sh script conveniently starts all HBase services, the bin/stop-hbase.sh script stops them.

$ ./bin/stop-hbase.sh
stopping hbase....................
$
  1. After you run this command, it may take several minutes for the processes to shut down. Use jps to confirm that the HMaster and HRegionServer processes have stopped.
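
Since shutdown can take a while, the jps check can be repeated in a loop. This is a minimal sketch with an assumed helper, `wait_for_hbase_stop`, which is not part of HBase. Its arguments are the command that lists processes, the number of attempts, and the pause between attempts in seconds.

```shell
# Poll a jps-style listing until no HMaster or HRegionServer entry remains,
# or until the given number of attempts runs out.
# $1 = command that prints the listing, $2 = attempts, $3 = seconds between tries
wait_for_hbase_stop() {
  attempts=$2
  while [ "$attempts" -gt 0 ]; do
    if ! $1 | grep -Eq ' (HMaster|HRegionServer)$'; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$3"
  done
  return 1
}

# Check up to 30 times, 2 seconds apart, when jps is available.
if command -v jps >/dev/null 2>&1; then
  if wait_for_hbase_stop jps 30 2; then
    echo "HBase processes are down"
  else
    echo "HBase processes are still running"
  fi
fi
```
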

Posted by ganeshasri on Tue, 28 Mar 2023 01:27:27 +0530