Running a Spark Application using IntelliJ

Today, I was building my first Spark application in Scala using IntelliJ. Once my sample application was ready, I encountered a few issues while trying to run the program through IntelliJ.

This blog post covers a few ways of configuring your Spark application so that it runs successfully.

Prerequisites:

  • Java installed and the JAVA_HOME environment variable set
  • Spark installed (brew install apache-spark)
  • IntelliJ with the 'Scala' plugin installed
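You can quickly verify these from a terminal before continuing (a minimal check; the brew step above assumes macOS):

java -version
echo $JAVA_HOME
spark-submit --version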
1. Create new project

Open IntelliJ editor.

Navigate to File->New->Project...

Select 'Scala' from the left menu and 'sbt' on the right side of the New Project window to create an sbt-based Scala project.

Click on 'Next' to continue.

Provide a name for the new project (scala-demo in this example).

Change the sbt and Scala versions if required.

Click on the 'Finish' button to create the project.

2. Write Hello World program

On the Project pane on the left, right-click src and select New => Scala class.

Name the class HelloWorld (IntelliJ creates the file HelloWorld.scala).

Change the code in the class to the following:

object HelloWorld extends App {
  println("Hello, World!")
}

Right-click the HelloWorld file and select Run 'HelloWorld'.

You should see "Hello, World!" in the program output.

3. Add Spark dependencies

Open build.sbt. The contents of the file should look similar to:

name := "scala-demo"

version := "0.1"

scalaVersion := "2.12.8"

Modify build.sbt so that its contents match the following:

name := "scala-demo"

version := "0.1"

scalaVersion := "2.12.8"

val sparkVersion = "2.4.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion

The Spark dependencies are now added to the project. Enable auto-import if prompted.
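Optionally, if the jar will only ever be launched through spark-submit (which provides Spark on the classpath at runtime), a common sbt idiom is to mark the Spark dependencies as "provided" so they are not required in the packaged artifact. This is an optional variation, not required for this tutorial; note that with provided scope, running directly from IntelliJ requires ticking 'Include dependencies with "Provided" scope' in the run configuration:

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"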

4. Add Spark code

Modify HelloWorld.scala to reflect the changes specified below:

package scalademo

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    // SparkSession is the entry point to a Spark application; note that no
    // master URL is set yet, which causes the error shown below
    val spark =
      SparkSession
        .builder
        .appName("Hello Spark App")
        .config("spark.eventLog.enabled", false) // don't write Spark event logs
        .getOrCreate()

    println("Hello Spark")

    spark.stop()

  }
}

Try running the program by right-clicking the HelloWorld file and selecting the Run 'HelloWorld' option.

You might run into the following error:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration

If you see this Spark exception in the output, the master URL was never specified. This is the most common mistake when running a Spark application from an IDE: spark-submit normally supplies the master URL, but the IDE does not, so you must set it yourself when running locally.

5. Fix errors and run the Spark Application using IntelliJ

We need to provide a master URL to fix the error observed above. We can do that either through the master() method or through the config() method:

.master("local")

or

.config("spark.master", "local")
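For local runs, the bracket suffix on the master URL controls how many worker threads Spark uses:

.master("local")      // one worker thread, no parallelism
.master("local[4]")   // four worker threads
.master("local[*]")   // as many worker threads as logical cores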

With these changes, the contents of HelloWorld.scala should look similar to:

package scalademo

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark =
      SparkSession
        .builder
        .appName("Hello Spark App")
        //.master("local")
        .config("spark.master", "local")
        .config("spark.eventLog.enabled", false)
        .getOrCreate()

    println("Hello Spark")

    spark.stop()

  }
}

For production systems, hard-coding the master value is not recommended. Always try to supply such variables from the command line.
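A minimal sketch of the builder with the master omitted from the code, assuming spark.master is supplied at launch time instead (via spark-submit --master, or the VM option described next):

    // No .master() or spark.master here: getOrCreate() picks up the master
    // from the launch environment, and fails with the error above if absent
    val spark = SparkSession
      .builder
      .appName("Hello Spark App")
      .getOrCreate()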

To pass this configuration when running in IntelliJ, add a VM option to the Run Configuration:

-Dspark.master=local[*]

This sets spark.master wherever it is not already defined, so the application picks it up at startup.

With the above configuration changes, you should be able to run the Spark application successfully.

6. Run the Spark Application from command-line

Package the program by running the 'sbt package' command. This generates a jar file under target/scala-2.12/.

Submit the jar file to Spark:

spark-submit --master local --class scalademo.HelloWorld target/scala-2.12/scala-demo_2.12-0.1.jar

This approach is recommended: configuration is passed through command-line parameters, avoiding hard-coded values in the application code.
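Other settings can be passed the same way with --conf key=value flags, keeping environment-specific values out of the code; for example (the eventLog setting here is just illustrative):

spark-submit --master local[*] --conf spark.eventLog.enabled=false --class scalademo.HelloWorld target/scala-2.12/scala-demo_2.12-0.1.jar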



