Oracle Big Data Preparation Cloud Service: A Case Study
A university recently approached me wanting to understand how its students were using the different facilities on campus. This is a new multi-million dollar campus, so there was a lot of interest in analysing whether all the facilities were being used to their full potential.
One key thing about students is that they love to connect their mobile devices, laptops and tablets to the network. The network logs had all the information we were after, such as usernames, IP addresses, MAC addresses, and time-in and time-out values. The challenge was that these network log files were not your typical structured files; they were unstructured, with these values hidden among a trove of other data.
I took a sample of these network log files and uploaded it into Oracle’s Big Data Preparation Cloud Service (BDPCS). BDPCS profiled the data and automatically detected some values, such as IP addresses. Then, with a little bit of transformation and some regex magic, I was able to extract the other values easily. I spent a little time standardizing, cleansing and profiling the data, and enriched it using the built-in knowledge engine of BDPCS. Once I was satisfied with the data, it was very easy to remove all the other values, which were just noise. I then published the data and got a structured file with columns like Username, IPAddress, MacAddress, TimeIn and TimeOut. This structured file was provided to the downstream system, where it was easily integrated with the student system and the campus system, and we got a heat map of how students move through the campus at different times.
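To give a feel for the kind of regex extraction described above, here is a minimal Python sketch. It is not BDPCS code and does not use any BDPCS API; the service does this work through its own interface. The log layout in `SAMPLE_LOG`, the `key=value` field names, and the helper names `parse_line` and `build_sessions` are all assumptions made up for illustration, since the university's real log format is not shown here.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical log layout, invented for this sketch. Real network logs differ.
SAMPLE_LOG = """\
Mar 01 09:15:02 wlc01 noise=xyz user=jsmith ip=10.1.2.3 mac=AA:BB:CC:DD:EE:FF event=connect
Mar 01 09:20:11 wlc01 heartbeat ok
Mar 01 10:45:40 wlc01 user=jsmith ip=10.1.2.3 mac=AA:BB:CC:DD:EE:FF event=disconnect rssi=-60
"""

# One pattern per value we want to pull out of the surrounding noise.
FIELDS = {
    "timestamp": re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2})"),
    "user": re.compile(r"\buser=(\w+)"),
    "ip": re.compile(r"\bip=((?:\d{1,3}\.){3}\d{1,3})\b"),
    "mac": re.compile(r"\bmac=((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})\b"),
    "event": re.compile(r"\bevent=(connect|disconnect)\b"),
}


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None if the line is just noise."""
    record: Dict[str, str] = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(line)
        if match is None:
            return None  # a required value is missing, so skip the line
        record[name] = match.group(1)
    return record


def build_sessions(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Pair connect/disconnect events into rows with TimeIn and TimeOut."""
    open_sessions: Dict[Tuple[str, str], Dict[str, str]] = {}
    rows: List[Dict[str, str]] = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        key = (record["user"], record["mac"])
        if record["event"] == "connect":
            open_sessions[key] = record
        elif key in open_sessions:
            start = open_sessions.pop(key)
            rows.append({
                "Username": start["user"],
                "IPAddress": start["ip"],
                "MacAddress": start["mac"],
                "TimeIn": start["timestamp"],
                "TimeOut": record["timestamp"],
            })
    return rows


if __name__ == "__main__":
    for row in build_sessions(SAMPLE_LOG.splitlines()):
        print(row)
```

The output rows carry the same columns as the published file mentioned above (Username, IPAddress, MacAddress, TimeIn, TimeOut), which is the shape a downstream system would need to join against student and campus data.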
Interestingly, from the moment I first looked at the network log files to the moment I produced the output file my customer required, less than one hour had passed. This was surprising, as analysts keep telling us that projects spend far more time preparing data than doing the actual analysis. Some analysts even claim that 90% of the time in big data projects is spent on data preparation tasks.
Projects can use Oracle Big Data Preparation Cloud Service to considerably reduce the time spent on data preparation. Its easy-to-use interface can be used by both IT and business personnel. Here is a short video which gives a good overview of Big Data Preparation Cloud Service.
You can find out more about this here.
https://cloud.oracle.com/bigdatapreparation
Safe Harbor Statement
The preceding is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Disclaimer
The thoughts, practices and opinions expressed here are those of the author alone and do not necessarily reflect the views of Oracle.
Harris, this is a great example of how to ingest and transform unstructured data with BDP, and how to expose the result set in a BI dashboard in the form of a heat map. Great work!
Good one Harris
Great way to ingest, cleanse, and enrich Big Data - via Oracle’s Big Data Preparation Cloud Service
Please add comparative analysis with other service providers
The data wrangling challenge is well and truly addressed by BDPCS. I am amazed that both structured and unstructured data can be ingested, profiled, cleansed and enriched.