Import Clean Data & Bullet Graph in R

Hello everyone! Today we will learn how to create a bullet graph, with some minor data cleaning along the way. Let's get started!

We will use the readxl and janitor packages in this example, along with ggplot2 for plotting. If you haven't installed these packages already, please install them before starting.

install.packages("janitor")
install.packages("readxl")
install.packages("ggplot2")
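
If you re-run the script often, a small guard avoids reinstalling packages that are already present. The following is only a minimal sketch using base R's requireNamespace(); the helper name missing_packages is made up for illustration and is not part of the original walkthrough.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this tutorial
to_install <- missing_packages(c("readxl", "janitor", "ggplot2"))
print(to_install)

# Then install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left as a comment so that the snippet has no side effects until you decide to run it.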
        

First we will import the data we want to use.

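library(readxl)
# Replace with the folder that contains the downloaded workbook
filePath = "YourFilePath"
setwd(filePath)
# Read the first sheet of the workbook into a data frame
dataOri = read_excel("Global Superstore.xlsx")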

Data Link:

https://drive.google.com/uc?export=download&id=1UAvSMnvnH_USEe62eS57NrtGvWatf8DB

We will first create a time series chart with "profit" and "order date".

library(ggplot2)
# This call fails to parse because of the space in "Order Date"
bulletGraph <- ggplot(aes(x=Order Date, y=profit), data = dataOri) + geom_line()
bulletGraph

As we learnt in a previous article, we use ggplot() to build the plot and add "+ geom_line()" to tell R that we want a line chart. Inside aes(), x is the column shown on the x-axis and y is the column shown on the y-axis.

Unfortunately, we got an error. This happened because there is whitespace between "Order" and "Date". R can store a column name that contains a space, but such a name is not syntactically valid, so R reads Order and Date as two separate symbols. The column can only be referenced by wrapping the name in backticks every time, which is tedious and easy to forget. For that reason we want to avoid whitespace in column names.
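
To make the point concrete, here is a small sketch with a made-up two-row data frame (not the real spreadsheet) showing how a name with a space behaves in base R.

```r
# A tiny stand-in data frame; check.names = FALSE keeps the space in the name
orders <- data.frame(
  "Order Date" = as.Date(c("2014-01-01", "2014-01-02")),
  Profit = c(120.5, -30.2),
  check.names = FALSE
)

print(names(orders))

# Written without quoting, Order Date would be a syntax error.
# Backticks let R treat the whole name as one symbol.
print(orders$`Order Date`)

# make.names() shows what base R considers a syntactically valid version
print(make.names("Order Date"))
```

The same backtick quoting works inside aes(), but cleaning the names once is usually less error-prone than quoting them everywhere.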

Global Superstore.xlsx has many columns that use whitespace and symbols like "-" in their names. Hence, we will use the janitor package to clean up the column names while importing.

library(janitor)
# .name_repair applies make_clean_names() to every column name on import
dataCleaned = read_excel("Global Superstore.xlsx", .name_repair = make_clean_names)
        

Notice that all special characters in the column names, including whitespace and "-", are replaced with "_", which is a valid character in an R name, and that the names are converted to lowercase. For example, "Order Date" becomes order_date.
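
You can try make_clean_names() on its own to see what it does to a few headers. The input names below are illustrative examples typed by hand in the style of the spreadsheet, not read from the file.

```r
library(janitor)

# Example headers containing a space and a hyphen (illustrative values)
raw_names <- c("Order Date", "Sub-Category", "Profit")

# Convert them to lowercase snake_case names that are valid in R
cleaned <- make_clean_names(raw_names)
print(cleaned)
```

For a data frame that is already loaded, janitor's clean_names() applies the same cleaning to its column names.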

Now, we can create our line graph with the cleaned data.

bulletGraph <- ggplot(aes(x=order_date, y=profit), data = dataCleaned) + geom_line()
bulletGraph        
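
If the spreadsheet holds several rows per date (an assumption here, though common for order-level data), geom_line() connects every row and the line can look very jagged. One option, which is not part of the original walkthrough, is to sum the profit per day before plotting. The sketch below uses a small made-up data frame with the cleaned column names order_date and profit.

```r
# Made-up orders: two dates appear more than once
orders <- data.frame(
  order_date = as.Date(c("2014-01-01", "2014-01-01", "2014-01-02",
                         "2014-01-03", "2014-01-03")),
  profit = c(100, -20, 50, 10, 40)
)

# One row per date, with profit summed within each date
daily_profit <- aggregate(profit ~ order_date, data = orders, FUN = sum)
print(daily_profit)
```

The aggregated data frame can then be passed to ggplot() in place of the row-level data.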

While we've successfully created a time series graph, the axis labels don't look good because of the "_" and the lowercase letters. So, let us change the axis labels to something more readable.

bulletGraph <- ggplot(aes(x=order_date, y=profit), data = dataCleaned) + geom_line() + xlab("Date") + ylab("Profit")
bulletGraph        

xlab("Date") and ylab("Profit") set the labels of the x-axis and y-axis to the text of your choice.


Remember that we wanted a bullet graph, so we want to add a horizontal line to indicate our target.

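bulletGraph <- ggplot(aes(x=order_date, y=profit), data = dataCleaned) + geom_line() + xlab("Date") + ylab("Profit") + geom_hline(yintercept = 4000, color = "red")
bulletGraph

We simply add geom_hline() to draw a horizontal line on the graph. yintercept = 4000 and color = "red" tell R that we want a red line at Profit = 4000. Both are fixed values rather than columns of the data, so they are passed directly to geom_hline() instead of being wrapped in aes(). A color placed inside aes() would be treated as data, mapped to ggplot2's default palette, and given a legend entry.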
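
To check which color the target line will actually be drawn in, you can inspect the layer data that ggplot2 computes. This is a minimal sketch on made-up numbers; the sales data frame and the target variable are hypothetical stand-ins for the real data and goal.

```r
library(ggplot2)

# Made-up daily profit values
sales <- data.frame(
  order_date = as.Date("2014-01-01") + 0:4,
  profit = c(3200, 4100, 3900, 4500, 3700)
)
target <- 4000  # hypothetical target value

p <- ggplot(sales, aes(x = order_date, y = profit)) +
  geom_line() +
  # Fixed color given outside aes(), so it is used literally
  geom_hline(yintercept = target, color = "red") +
  xlab("Date") +
  ylab("Profit")

# Layer 2 is the horizontal line; look up the colour it will be drawn with
hline_colour <- layer_data(p, 2)$colour
print(hline_colour)
```

Keeping the target in a variable also makes it easy to change the goal without editing the plotting code.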


Here we go: a bullet graph that shows the time series, with a red line indicating our target.

Hope you learnt something new today and see you again!


