Oracle View on Big Data
Today, internet applications and digital instruments produce continuous streams of data, which opens up broad scope for large-scale data collection, fraud detection and similar uses.
Data volumes have grown from hundreds of terabytes to petabytes. Unstructured data, such as blog comments, images and documents spread across many websites, cannot easily be captured with traditional database technologies, most of which work only on structured data. As technology advances, data usage keeps growing. For example, when a user posts a photo on Facebook, hundreds of thousands of people may view it, and that activity contributes to these petabytes of data. At Suneratech, we give importance to data, analyze even unstructured data, and combine it with Oracle offerings to maximize business benefits.
What is Hadoop and NoSQL?
Hadoop: a massively distributed file system approach that stores massive amounts of data, including high volumes of unstructured data. Hadoop uses a shared-nothing architecture, which means it spreads data across many nodes, each of which uses its own server file system to store and access files. Hadoop grew out of ideas published by Google (the Google File System and MapReduce papers) and is now maintained as an Apache project.
NoSQL: "Not Only SQL" is another kind of distributed system, but it operates more like a database than Hadoop does. It relies on spreading data across many nodes to achieve parallelism, and it is best suited to semi-structured data.
Both systems are suited to large, very fast storage. The purpose of these storage mechanisms is to provide access to huge amounts of data.
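As a rough illustration of how spreading data across many nodes can work, the sketch below assigns records to nodes by hashing their keys. This is a generic hash-partitioning example, not how Hadoop or any particular NoSQL product places data; the function names, the sample keys and the node count of 4 are all hypothetical.

```python
import hashlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a node index by hashing the key (stable across runs)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(records, num_nodes):
    """Group (key, value) records by the node that would own them."""
    nodes = defaultdict(list)
    for key, value in records:
        nodes[node_for_key(key, num_nodes)].append((key, value))
    return dict(nodes)


# Hypothetical sample records: (key, kind of content)
records = [
    ("user:1", "photo"),
    ("user:2", "comment"),
    ("user:3", "document"),
    ("user:1", "comment"),
]

placement = partition(records, num_nodes=4)
for node, owned in sorted(placement.items()):
    print(f"node {node}: {owned}")
```

Because each node owns a disjoint slice of the keys, nodes can work on their slices in parallel without sharing storage, which is the point of the shared-nothing design.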
Using Hadoop and NoSQL for DW & Big Data Analytics
Because the volumes of unstructured data are so large, Hadoop or NoSQL is needed first to shrink the data: out of everything collected, good or bad, complete or incomplete, only the useful part is kept as a much smaller dataset. A traditional database can then be used to perform operations on this reduced dataset.
Oracle provides a few mechanisms for extracting data out of Hadoop and NoSQL into a conventional RDBMS.
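The two-stage flow described above (reduce the raw data first, then work on the small result in a relational database) can be sketched as follows. This is only an illustration under stated assumptions: the log format, the `reduce_raw` and `load` helpers, and the `item_activity` table are hypothetical, and an in-memory SQLite database stands in for the RDBMS rather than an Oracle database.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical raw input: a mix of usable and unusable lines.
RAW_LINES = [
    "2013-01-05 user=alice action=view item=photo42",
    "corrupted ### line",
    "2013-01-05 user=bob action=view item=photo42",
    "2013-01-06 user=alice action=comment item=photo42",
    "",
]

PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) user=(\w+) action=(\w+) item=(\w+)$")


def reduce_raw(lines):
    """Drop unparseable lines and count events per (item, action)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            _, _, action, item = match.groups()
            counts[(item, action)] += 1
    return counts


def load(counts):
    """Load the reduced dataset into a relational table (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_activity (item TEXT, action TEXT, events INTEGER)"
    )
    conn.executemany(
        "INSERT INTO item_activity VALUES (?, ?, ?)",
        [(item, action, n) for (item, action), n in counts.items()],
    )
    conn.commit()
    return conn


conn = load(reduce_raw(RAW_LINES))
rows = conn.execute(
    "SELECT item, action, events FROM item_activity ORDER BY action"
).fetchall()
print(rows)  # [('photo42', 'comment', 1), ('photo42', 'view', 2)]
```

Five raw lines shrink to two rows here; at real scale the first stage would run on the distributed system and only the reduced result would reach the database.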
Diverse ways of extracting data from Hadoop:
Data in Hadoop can be accessed with programs written for the MapReduce programming model. These programs reduce all the data stored in HDFS to just the data that we need.
Method 1: Oracle Direct Connector for HDFS (ODCH)
Method 2: HIVE (Pull) and ODI
Method 3: Oracle Loader for Hadoop (Push)
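To make the MapReduce model mentioned above more concrete, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to a word count. It does not use Hadoop or any Hadoop API; the function names and sample lines are hypothetical, and a real job would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input
lines = ["big data needs big storage", "data moves to the database"]
counts = run_job(lines)
print(counts["big"], counts["data"])  # 2 2
```

The same shape applies to filtering or aggregating any large input: the map step picks out what matters from each record, and the reduce step condenses it to the smaller dataset that is needed.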
Below are the different Oracle offerings:
1) Oracle NoSQL Database
2) Hadoop, MapReduce
3) Cloudera Manager to manage Hadoop. Cloudera is an OEM product.
4) Applications such as Hive (with a MySQL metadata store)
5) Oracle Loader for Hadoop
6) Oracle Direct Connector for HDFS
7) Oracle R Connector for Hadoop