Big Data

Data Lakes – Is it Time for Your Business to Wade In?

As data continues to grow in both volume and structural variety, traditional relational database approaches fall increasingly short of providing the flexibility, agility, scalability, and economy needed to process it. In recent years, alternative and complementary approaches to managing information have been pioneered, and given time to mature, to satisfy today’s big data storage and processing needs. Most prominent among them for centrally managing the onslaught of information a business needs to process and store is the Data Lake.

What is the purpose of a Data Lake?

Data Lakes offer a far more economical and eminently scalable approach to ingesting and assimilating an ever-changing range of input data, primarily because they can be implemented on top of the open-source Hadoop ecosystem. Hadoop provides an architecture that scales as needed: simply add commodity servers to the cluster for more parallel processing and storage.  Due to […]
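The flexibility described above comes largely from the data lake’s schema-on-read model: raw records of varying shape are stored untouched, and structure is imposed only when the data is read. A minimal sketch in plain Python (the record formats and field names here are invented for illustration, not tied to any particular Hadoop API):

```python
import json

# Toy schema-on-read: heterogeneous raw records are stored as-is in the
# "lake", and a (user, clicks) schema is applied only at query time.
raw_lake = [
    '{"user": "alice", "clicks": 3}',             # JSON event
    'bob,7',                                      # CSV event, same fields
    '{"user": "carol", "clicks": 5, "ua": "x"}',  # JSON with an extra field
]

def read_with_schema(record):
    """Project one raw record onto the (user, clicks) schema at read time."""
    if record.lstrip().startswith("{"):
        d = json.loads(record)
        return d["user"], int(d["clicks"])
    user, clicks = record.split(",")
    return user, int(clicks)

rows = [read_with_schema(r) for r in raw_lake]
print(rows)  # [('alice', 3), ('bob', 7), ('carol', 5)]
```

Contrast this with a relational warehouse, where all three feeds would have had to be normalized into one schema before they could even be loaded.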

ThoughtSpot – For Near Instant Analytics Gratification

ThoughtSpot ups the ante when it comes to rapidly and effortlessly delivering insightful, completely ad-hoc data analytics and visuals to your business, even for large, multi-terabyte data sets.

ThoughtSpot has blazed a trail in a new area of BI called Search BI. It differs from the current genre of more established BI tools, such as Tableau, in that it embeds knowledge about how data of different categories is generally analyzed and most effectively visualized, and maps that knowledge onto your business’s specific domain data.  The alignment and cataloguing of the business domain data and metadata is then used to provide an optimized, intelligent, and guided search capability through it.  A business user simply begins typing what they are looking for into the search box, and ThoughtSpot offers completions as the user types.  The suggested completions are offered in the order that […]
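The guided-search behavior described above can be approximated with a toy sketch in Python. To be clear, the catalog entries and the frequency-based ranking rule below are invented for illustration; they say nothing about how ThoughtSpot’s actual engine works:

```python
# Toy search completion: suggest catalogued search phrases that extend
# what the user has typed, ordered by a hypothetical usage frequency.
catalog = {
    "revenue by region": 120,
    "revenue by quarter": 95,
    "returns by product": 40,
    "headcount by office": 15,
}

def suggest(prefix, k=3):
    """Return up to k catalog phrases starting with prefix, most-used first."""
    matches = [t for t in catalog if t.startswith(prefix.lower())]
    return sorted(matches, key=lambda t: -catalog[t])[:k]

print(suggest("rev"))  # ['revenue by region', 'revenue by quarter']
```

A real Search BI engine would of course rank over indexed business metadata rather than a hard-coded dictionary, but the interaction pattern — type, see ranked completions, refine — is the same.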

Big Data Trends in 2016

It is 2016 and data is growing more rapidly than ever. 2015 was big data’s year: there were big data conferences everywhere, and professionals working in industries such as healthcare, insurance, and banking were eager to learn more about big data, whether to solve their own big data problems or to explore its potential.

To read the full article, click the link here: TrendsInBigData2016

2019-01-20 | Big Data

HBase Data Extraction

Our client is a Northeast-based data solution provider in the healthcare industry. The client manages a single-node CDH 5.3.2 cluster on Ubuntu (Trusty Tahr). The client had two main concerns. The first was extracting data from HBase: each table in HBase has its own metadata file, which provides information about the table, including which columns to include in and exclude from the output. The second was converting the output data to JSON format.

Pig is used to extract the data from HBase. We originally explored several approaches to interacting with HBase, but Pig proved the apt solution for this project thanks to its built-in functions and the flexibility of its user-defined functions (UDFs). Only one UDF is used in this project, and it is written in Python.
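The core of that conversion step can be sketched in plain Python. The column names and the include/exclude filtering rule here are hypothetical stand-ins for what the per-table metadata files specify; in the actual project this kind of logic sat inside the Pig UDF:

```python
import json

def cells_to_json(cells, include=None, exclude=()):
    """Convert a dict of HBase cells (column -> value) to a JSON string,
    keeping only the columns the table's metadata asks for."""
    if include is not None:
        cells = {c: v for c, v in cells.items() if c in include}
    cells = {c: v for c, v in cells.items() if c not in exclude}
    return json.dumps(cells, sort_keys=True)

# Hypothetical row: two columns in the "info" family, one audit column
# that the metadata file excludes from the output.
row = {"info:name": "Ada", "info:age": "36", "audit:ts": "1425161"}
print(cells_to_json(row, exclude={"audit:ts"}))
# {"info:age": "36", "info:name": "Ada"}
```

In the Pig pipeline, a function like this would be registered as the Python UDF and applied to each tuple loaded from HBase before the JSON output is stored.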

To read the full article, click the link here: HbaseExtractionFinal

Pivotal GemFire

Hello everyone. This is an installation guide for Pivotal GemFire, a “distributed data management platform”. Pivotal is a company spun out of VMware and EMC. It is a relatively new company, founded in 2013, but with GE’s $105 million investment it has been running strong with its own Hadoop distribution, Pivotal HD.

On February 17th, 2015, Pivotal announced a partnership with Hortonworks and the availability of its products, such as Pivotal HD, GemFire, HAWQ, and Greenplum, under open-source licenses.

Due to Pivotal’s change of direction and the partnership with Hortonworks, many big data users are intrigued but also questioning what Pivotal’s products offer. To help answer these questions, we decided to give GemFire a try. Its high performance and its HTML5-based dashboard, which visualizes and monitors the health and performance of GemFire clusters in real time, are what keep GemFire ahead of its competitors. Does that sound interesting enough? […]

2015-03-12 | Big Data

Hortonworks Sandbox

In this tutorial, students will learn how to set up an environment for using the Sandbox. We are using CentOS 6.6 for this tutorial. Although we used the CentOS 6.6 GUI, I’ve written the tutorial so that even users of the server operating system can follow it without any problem. Hope you all enjoy.

Click to download the Hortonworks Sandbox tutorial here: Sandbox

2015-03-12 | Big Data

Hadoop 2.x on Amazon EC2

This is an Amazon EC2 tutorial. It will help students understand the current stable Hadoop 2.x line and how it can be deployed on an Amazon EC2 instance. In the tutorial, students will learn to create a production-ready Hadoop cluster.

Click to download the Amazon EC2 tutorial here: AmazonEC2

2015-03-12 | Big Data

Hadoop Multi-node Cluster Installation on Centos 6

This is a Hadoop multi-node cluster installation guide, which will help you understand how each node operates in Hadoop. Everything in this guide is straightforward. We are using CentOS 6.6, since it is widely used on production servers. Every step is explained with pictures and comments. Just follow all the steps and you shouldn’t have any problem. If you hit a wall because of an error during the installation process, please check for spelling or indentation mistakes; these small errors can prevent Hadoop from running properly. I mainly used the “Hadoop Cluster Setup On Centos” video from Edureka to install Hadoop and create this guide. To open the guide, simply click the “HadoopCentosMulitnode″ link below and it will open as a PDF file. Please enjoy.

Click to download the Hadoop Multi-node Cluster Installation Guide here[…]

2015-03-12 | Big Data

CDH5 Single Node Installation Guide

This is a CDH5 installation guide, which will give you some basic ideas about installing Hadoop. Everything in this guide is straightforward. We are using Ubuntu Desktop in this tutorial so that even users of non-Linux operating systems can follow along easily. Every step is explained with pictures and comments. Just follow all the steps and you shouldn’t have any problem installing CDH5. If you hit a wall because of an error during the installation process, please check for spelling or indentation mistakes; these small errors can prevent CDH5 from running properly. I mainly used the “CDH5 Installation Guide” from Cloudera to create this guide. To open the guide, simply click the “CDH5″ link below and it will open as a PDF file. Please enjoy.

Click to download the CDH5 Installation Guide here —>[…]

2015-03-12 | Big Data