Running Spark Application using IntelliJ
Today, I was building my first Spark application using IntelliJ. Once my sample application was ready, I encountered a few issues while trying to run it through IntelliJ.
This blog post covers a few ways of configuring your Spark application so that it runs successfully.
Pre-Requisites:
- Have Java installed and JAVA_HOME environment variable set
- Have Spark installed (brew install apache-spark)
- IntelliJ with 'Scala' plugin installed
1. Create new project
Open IntelliJ editor.
Navigate to File->New->Project...
Select 'Scala' from left menu, 'sbt' from right-side in New Project window, to create sbt-based Scala project.
Click on 'Next' to continue.
Provide a name for the new project (this example uses scala-demo as the project name).
Select and change sbt and Scala versions if required.
Click on 'Finish' button to create the project.
2. Write Hello World program
On the Project pane on the left, right-click src and select New => Scala class.
Name the class HelloWorld.scala
Change the code in the class to the following:
object HelloWorld extends App {
  println("Hello, World!")
}
Right-click the HelloWorld file and select Run 'HelloWorld'.
You should see "Hello, World!" as the output of the program.
3. Add Spark dependencies
Open build.sbt. Contents of the file should look similar to:
name := "scala-demo"
version := "0.1"
scalaVersion := "2.12.8"
Modify build.sbt to change its contents as specified below:
name := "scala-demo"
version := "0.1"
scalaVersion := "2.12.8"

val sparkVersion = "2.4.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
Now the Spark dependencies are added to the project. Enable auto-import if prompted.
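As a side note, the %% operator tells sbt to append the Scala binary version to the artifact name. With scalaVersion set to 2.12.8, the two dependency lines above are equivalent to spelling the suffix out by hand:

```scala
// build.sbt fragment: %% resolves to the _2.12 suffix for Scala 2.12.x,
// so these lines pull exactly the same artifacts as the %% form above.
libraryDependencies += "org.apache.spark" % "spark-core_2.12" % "2.4.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "2.4.0"
```

This is why a Spark build published only for Scala 2.12 will fail to resolve if the project's scalaVersion is changed to an incompatible series.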
4. Add Spark code
Modify HelloWorld.scala to reflect the changes specified below:
package scalademo

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("Hello Spark App")
      .config("spark.eventLog.enabled", false)
      .getOrCreate()

    println("Hello Spark")

    spark.stop()
  }
}
Try running the program by right-clicking the HelloWorld file and selecting the 'Run HelloWorld' option.
You might run into the following error:
ERROR SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: A master URL must be set in your configuration
This exception means the master URL was never specified, so Spark does not know whether to run locally or against a cluster. A missing master configuration is the most common mistake when running a Spark application from an IDE, where nothing like spark-submit supplies it for you.
5. Fix errors and run the Spark Application using IntelliJ
We need to provide a master URL to fix the error observed above. We can do that either through the master() method or the config() method:
.master("local")
or
.config("spark.master", "local")
With these changes, HelloWorld.scala should look similar to:
package scalademo

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("Hello Spark App")
      //.master("local")
      .config("spark.master", "local")
      .config("spark.eventLog.enabled", false)
      .getOrCreate()

    println("Hello Spark")

    spark.stop()
  }
}
For production systems, hard-coding the master URL is not recommended; supply it from the command line instead.
To pass this configuration when running in IntelliJ, add the VM option -Dspark.master=local[*] to the Run Configuration. This sets spark.master wherever it is not already defined, so the application starts without any hard-coded master value.
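The effect of that VM option can be sketched in plain Scala. The helper below is hypothetical (not part of the Spark API): it looks up spark.master in the JVM system properties, which is exactly where -Dspark.master=local[*] lands, and only falls back to a local master when nothing was supplied:

```scala
// Hypothetical helper: resolve the master URL the same way the VM option does.
// -Dspark.master=local[*] appears in sys.props, so the "local[*]" fallback is
// only used when no master was configured at all.
object MasterConfig {
  def resolveMaster(props: Map[String, String] = sys.props.toMap): String =
    props.getOrElse("spark.master", "local[*]")
}
```

A builder call like SparkSession.builder.master(MasterConfig.resolveMaster()) would then behave like the hard-coded version inside the IDE, while still honoring a master passed in as a system property.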
With the above configuration changes, you should be able to run the Spark application successfully.
6. Run the Spark Application from command-line
Package the program by running the 'sbt package' command. This generates a jar file.
Submit the jar file to Spark:
spark-submit --master local --class scalademo.HelloWorld target/scala-2.12/scala-demo_2.12-0.1.jar
This approach is recommended: pass configuration through command-line parameters and avoid hard-coded values in the application code.
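The same idea extends to other settings: any hard-coded .config(...) call in the application can instead be supplied with --conf on the command line. A sketch, assuming spark-submit is on the PATH (e.g. from brew install apache-spark) and the jar was built as above:

```shell
# Build the jar, then submit it with all configuration on the command line;
# each --conf key=value pair replaces a hard-coded .config(...) call.
sbt package
spark-submit \
  --master "local[*]" \
  --conf spark.eventLog.enabled=false \
  --class scalademo.HelloWorld \
  target/scala-2.12/scala-demo_2.12-0.1.jar
```

Values passed this way take effect without recompiling, which is why the master URL and environment-specific settings belong on the command line rather than in code.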