HBase & Spark - Transformation+Aggregation of JSON.
I came across a use case where processing gets a bit messy: the data is stored as JSON in HBase, and you need to do some transformation + aggregation of JSON objects/arrays. Guess what, Spark is your answer.
Presuming you have Spark installed and know the basics, let's connect to HBase first: connect to the HBase table and create a JavaPairRDD.
Now the transformation:
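To make the shape of this step concrete, here is a minimal sketch in plain Java, so it runs without a cluster. Everything in it is an assumption for illustration: the record layout (an "items" array of {"sku", "qty"} objects), the class and method names, and the regex, which is only a stand-in for a real JSON parser. In the Spark job the same idea would typically be a flatMapToPair over the values of the JavaPairRDD followed by reduceByKey.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonAggregationSketch {
    // Hypothetical element shape inside a JSON array: {"sku":"<id>","qty":<int>}.
    // A regex is used only to keep the sketch dependency-free; use a JSON library in real code.
    private static final Pattern ITEM =
        Pattern.compile("\\{\"sku\":\"([^\"]+)\",\"qty\":(\\d+)\\}");

    // Transformation: flatten every array element into a (sku, qty) pair.
    // Aggregation: sum the quantities per sku (what reduceByKey would do in Spark).
    static Map<String, Integer> totalQtyBySku(List<String> jsonRows) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String row : jsonRows) {
            Matcher m = ITEM.matcher(row);
            while (m.find()) {
                totals.merge(m.group(1), Integer.parseInt(m.group(2)), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two made-up cell values, as they might come out of an HBase column.
        List<String> rows = Arrays.asList(
            "{\"order\":1,\"items\":[{\"sku\":\"a\",\"qty\":2},{\"sku\":\"b\",\"qty\":1}]}",
            "{\"order\":2,\"items\":[{\"sku\":\"a\",\"qty\":3}]}");
        System.out.println(totalQtyBySku(rows)); // prints {a=5, b=1}
    }
}
```

The merge function (Integer::sum here) is the part that carries over unchanged to reduceByKey; only the parsing and the container differ.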
And finally, save your data as a table and do whatever you want :)
Command to run the Spark job (all spark-submit options must come before the application jar; anything after the jar is passed to your class as arguments):
spark-submit --driver-memory 2g --executor-memory 2g --files SparkApp.properties --class "ul.spark.app.YourClass" --master "spark://master:7077" --driver-class-path "/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar" --driver-java-options "-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar" SparkAppV0.1.jar
I'd love to hear your feedback.
Happy Learning - DD
Thanks! It seems you are running a standalone CDH cluster. We are using an HDP cluster. There are not many samples on integrating Spark with HBase in HDP. I can get it working in local mode, but I have some difficulty getting it working in yarn-client or yarn-cluster mode.
Hi Deepak, can you share your spark-submit settings and which jars you had to use to make it compile and execute? Thanks.
Good work Deepak. I am glad you are also sharing the knowledge widely.
Good work Deepak. Hope you are enjoying :)
Thanks for sharing