Legacy Community Health was founded in 1978. It is a Federally Qualified Health Center that provides services such as adult primary care, dental care, vision services, nutrition services, HIV/AIDS care, and behavioural health services. Currently, a traditional RDBMS is used to manage all data on patients, medications, transactions, and more. The goal of this project was to load the existing data from the RDBMS into HBase and to extract it using conditions such as prefixes and base column names.
Pig was the main tool for this project because its built-in functions proved suitable. UDFs were also necessary because Pig Latin is a data flow language rather than a general-purpose programming language. Some of the functions did not work as expected in the early stages of the project, but workarounds were found quickly. Pig made it convenient to interact with HBase. Working with columns prefixed by arbitrary positive integers seemed challenging at first, but the problem was solved with a few built-in functions.
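
To make the idea of integer-prefixed columns concrete, here is a minimal Python sketch of splitting a column name into its prefix and base name and then selecting columns by base name. It is not the Pig code used in the project. The qualifier layout `<positive integer>_<base name>`, the helper names `split_qualifier` and `select_by_base`, and the sample row are all assumptions made up for illustration.

```python
import re

# Assumed (hypothetical) qualifier layout: "<positive integer>_<base name>",
# for example "12_dosage". The real layout in the project may differ.
PREFIXED = re.compile(r"^([1-9][0-9]*)_(.+)$")


def split_qualifier(qualifier):
    """Return (prefix, base_name), or None if there is no positive integer prefix."""
    match = PREFIXED.match(qualifier)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def select_by_base(row, base_name):
    """Collect values of all prefixed columns that share base_name, keyed by prefix."""
    selected = {}
    for qualifier, value in row.items():
        parsed = split_qualifier(qualifier)
        if parsed is not None and parsed[1] == base_name:
            selected[parsed[0]] = value
    return selected


# Made-up row for illustration only; it does not come from the project's data.
row = {"name": "A", "1_medication": "x", "2_medication": "y", "10_dosage": "z"}
medications = select_by_base(row, "medication")
```

In this sketch, `select_by_base(row, "medication")` keeps only the two `medication` columns and ignores both the unprefixed `name` column and the `dosage` column. Parsing the prefix as an integer rather than comparing strings means prefixes of any length are handled the same way.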