Our client is a NE-based data solution provider in the healthcare industry. The client manages a single-node CDH5 cluster (version 5.3.2) on Ubuntu 14.04 (Trusty Tahr). The client had two main concerns. The first was extracting data from HBase. Each table in HBase has its own metadata file, which describes the table and specifies which columns to include in or exclude from the output. The second was converting the output data to JSON format.
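
To make the include/exclude idea concrete, here is a minimal Python sketch of how a metadata entry might drive column selection for one row. The metadata layout (a dictionary with `include` and `exclude` lists), the function name `select_columns`, and the column names are all assumptions made for illustration; the client's actual metadata file format is not shown here.

```python
def select_columns(row, metadata):
    """Return only the columns that the (hypothetical) metadata allows."""
    include = metadata.get("include")            # missing or empty: consider every column
    exclude = set(metadata.get("exclude", []))   # columns that must never be output
    names = include if include else sorted(row)
    return {name: row[name] for name in names
            if name in row and name not in exclude}


# Hypothetical metadata for one table, and one row read from that table.
patient_metadata = {
    "include": ["info:id", "info:name", "info:ssn"],
    "exclude": ["info:ssn"],
}
row = {
    "info:id": "42",
    "info:name": "Ada",
    "info:ssn": "000-00-0000",
    "info:notes": "n/a",
}

selected = select_columns(row, patient_metadata)
print(selected)  # {'info:id': '42', 'info:name': 'Ada'}
```

In this sketch an exclusion always wins over an inclusion, which is one reasonable reading of "include and exclude"; the real rules may differ.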

Pig is used to extract the data from HBase. Several approaches to interacting with HBase were explored first, but Pig proved the best fit for this project because of its built-in functions and its flexibility with user-defined functions (UDFs). The project uses only one UDF, which is written in Python.
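
As a rough illustration of what a Python UDF for the JSON conversion could look like, the sketch below serializes a map of column names to values as a JSON string. It is not the project's actual UDF: the function name `to_json`, the schema string, and the sample columns are assumptions. It also assumes an interpreter that provides the `json` module, and it defines a stand-in `outputSchema` decorator so the file can be run outside Pig.

```python
import json

try:
    # Provided by Pig when the script is registered with streaming_python.
    from pig_util import outputSchema
except ImportError:
    # Stand-in for running outside Pig: just record the schema on the function.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("json:chararray")
def to_json(columns):
    """Serialize a map of column name -> value as one JSON object string."""
    if columns is None:
        return None
    # sort_keys gives a stable field order, which makes output easier to compare.
    return json.dumps(dict(columns), sort_keys=True)


# Hypothetical column map, similar to what a row's column family might hold.
print(to_json({"info:name": "Ada", "info:id": "42"}))
# {"info:id": "42", "info:name": "Ada"}
```

In Pig, a file like this would typically be registered with a statement such as `REGISTER 'udfs.py' USING jython AS udfs;` (the file name and alias are hypothetical) and then called as `udfs.to_json(...)` on a map-typed field.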

To read the full article, follow this link: HbaseExtractionFinal