Running Spark Application using IntelliJ
Today, I was building my first Spark application using IntelliJ. Once my sample application was ready, I encountered a few issues while trying to run it through IntelliJ.
This blog post covers a few ways of configuring your Spark application so that it runs successfully.
Pre-Requisites:
- Have Java installed and JAVA_HOME environment variable set
- Have Spark installed (brew install apache-spark)
- IntelliJ with 'Scala' plugin installed
1. Create new project
Open IntelliJ editor.
Navigate to File->New->Project...
Select 'Scala' from left menu, 'sbt' from right-side in New Project window, to create sbt-based Scala project.
Click on 'Next' to continue.
Provide a name for the new project (this example uses scala-demo as the project name).
Select and change sbt and Scala versions if required.
Click on 'Finish' button to create the project.
2. Write Hello World program
On the Project pane on the left, right-click src and select New => Scala class.
Name the class HelloWorld.scala
Change the code in the class to the following:
object HelloWorld extends App {
  println("Hello, World!")
}
Right-click the HelloWorld file and select Run 'HelloWorld'.
You should see "Hello, World!" as the output of the program.
3. Add Spark dependencies
Open build.sbt. Contents of the file should look similar to:
name := "scala-demo"
version := "0.1"
scalaVersion := "2.12.8"
Modify build.sbt to change its contents as specified below:
name := "scala-demo"
version := "0.1"
scalaVersion := "2.12.8"

val sparkVersion = "2.4.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
Now the Spark dependencies are added to the project. Enable auto-import if prompted.
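As a side note, the %% operator tells sbt to append the Scala binary version to the artifact name. With scalaVersion set to 2.12.8, the two dependency lines above are equivalent to spelling the suffix out by hand:

```scala
// build.sbt fragment: %% resolves to the _2.12 suffix for Scala 2.12.x,
// so these lines pull exactly the same artifacts as the %% form above.
libraryDependencies += "org.apache.spark" % "spark-core_2.12" % "2.4.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "2.4.0"
```

This is why a Spark build published only for Scala 2.12 will fail to resolve if the project's scalaVersion is changed to an incompatible series.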
4. Add Spark code
Modify HelloWorld.scala to reflect the changes specified below:
package scalademo

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("Hello Spark App")
      .config("spark.eventLog.enabled", false)
      .getOrCreate()

    println("Hello Spark")

    spark.stop()
  }
}
Try running the program by right-clicking the HelloWorld file and selecting the 'Run HelloWorld' option.
You might run into the following error:
ERROR SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: A master URL must be set in your configuration
This exception means the master URL was never specified, so Spark does not know whether to run locally or against a cluster. A missing master configuration is the most common mistake when running a Spark application from an IDE, where nothing like spark-submit supplies it for you.
5. Fix errors and run the Spark Application using IntelliJ
We need to provide a master URL to fix the error observed above. We can do that either through the master() method or the config() method:
.master("local")
or
.config("spark.master", "local")
With these changes, HelloWorld.scala should look similar to:
package scalademo

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("Hello Spark App")
      //.master("local")
      .config("spark.master", "local")
      .config("spark.eventLog.enabled", false)
      .getOrCreate()

    println("Hello Spark")

    spark.stop()
  }
}
For production systems, hard-coding the master URL is not recommended; supply it from the command line instead.
To pass this configuration when running in IntelliJ, add the VM option -Dspark.master=local[*] to the Run Configuration. This sets spark.master wherever it is not already defined, so the application starts without any hard-coded master value.
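The effect of that VM option can be sketched in plain Scala. The helper below is hypothetical (not part of the Spark API): it looks up spark.master in the JVM system properties, which is exactly where -Dspark.master=local[*] lands, and only falls back to a local master when nothing was supplied:

```scala
// Hypothetical helper: resolve the master URL the same way the VM option does.
// -Dspark.master=local[*] appears in sys.props, so the "local[*]" fallback is
// only used when no master was configured at all.
object MasterConfig {
  def resolveMaster(props: Map[String, String] = sys.props.toMap): String =
    props.getOrElse("spark.master", "local[*]")
}
```

A builder call like SparkSession.builder.master(MasterConfig.resolveMaster()) would then behave like the hard-coded version inside the IDE, while still honoring a master passed in as a system property.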
With the above configuration changes, you should be able to run the Spark application successfully.
6. Run the Spark Application from command-line
Package the program by running the 'sbt package' command. This generates a jar file.
Submit the jar file to Spark:
spark-submit --master local --class scalademo.HelloWorld target/scala-2.12/scala-demo_2.12-0.1.jar
This approach is recommended: pass configuration through command-line parameters and avoid hard-coded values in the application code.
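The same idea extends to other settings: any hard-coded .config(...) call in the application can instead be supplied with --conf on the command line. A sketch, assuming spark-submit is on the PATH (e.g. from brew install apache-spark) and the jar was built as above:

```shell
# Build the jar, then submit it with all configuration on the command line;
# each --conf key=value pair replaces a hard-coded .config(...) call.
sbt package
spark-submit \
  --master "local[*]" \
  --conf spark.eventLog.enabled=false \
  --class scalademo.HelloWorld \
  target/scala-2.12/scala-demo_2.12-0.1.jar
```

Values passed this way take effect without recompiling, which is why the master URL and environment-specific settings belong on the command line rather than in code.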