Publishing interactive data documents
(I am on vacation this week, so this is a throwback to one of my past blog posts.)
When presented effectively, an interactive chart helps the audience get both an overview and an in-depth look at complex data. In this article, I show an example of using interactive charts effectively. I also introduce R markdown as one of the most efficient ways of authoring data-rich documents.
Motivating example
Visualizing data gives us a bird's-eye view of it. Suppose we made many observations of ozone and temperature to learn about air quality. We can plot temperature (F) and ozone level (ppb) on the x and y axes, respectively, as in Fig. 1. Just by looking at the plot, we gain some insight into the relationship between the two measurements.
Figure 1: Ozone level and temperature
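A plot like Fig. 1 can be sketched in a few lines of base R. This sketch assumes R's built-in airquality data set as a stand-in for the observations; the article does not say which data produced the figure.

```r
# Assumption: the built-in airquality data set stands in for the observations.
# Drop the rows where the ozone reading is missing.
aq <- airquality[!is.na(airquality$Ozone), ]

# Temperature on the x axis, ozone level on the y axis.
plot(aq$Temp, aq$Ozone,
     xlab = "Temperature (F)",
     ylab = "Ozone (ppb)")

# A single number summarizing what the plot suggests visually.
temp_ozone_cor <- cor(aq$Temp, aq$Ozone)
```

A positive correlation here would match the upward trend visible in the scatter plot.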
A more sophisticated visualization can present higher dimensional data on a flat surface: it explains not only the relationship between x and y, but also y and z, z and x, and so on. We can add wind speed measurements (mph) to our plot (Fig. 2).
Figure 2: Ozone level, temperature, and wind speed. Lighter blue indicates higher wind speed.
In the plot, I showed the wind speed in shades of blue: a lighter blue indicates a higher wind speed. We can observe that the wind speed tends to be low when the ozone level is high.
After getting an overview of the data, we may want to look at more detailed views of particular parts of the data. What if we want to know when the highest ozone level was observed? Fortunately, most of us now view graphs on computer screens. You can hover the mouse cursor over Fig. 3 to reveal the date of the observation.
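One way to encode a third variable as color is to bin it and look up a shade for each bin. The sketch below again assumes the built-in airquality data set; the two hex colors and the choice of ten bins are arbitrary picks for illustration, not the ones used for Fig. 2.

```r
# Assumption: the built-in airquality data set stands in for the observations.
aq <- airquality[!is.na(airquality$Ozone), ]

# Ten shades running from dark blue (low wind) to light blue (high wind).
blues <- colorRampPalette(c("#08306B", "#DEEBF7"))(10)

# Assign each wind speed to one of ten equal-width bins (integer codes 1..10).
wind_bin <- cut(aq$Wind, breaks = 10, labels = FALSE)

plot(aq$Temp, aq$Ozone,
     col = blues[wind_bin], pch = 19,
     xlab = "Temperature (F)",
     ylab = "Ozone (ppb)")

# Numeric check of the visual impression: wind speed versus ozone level.
wind_ozone_cor <- cor(aq$Wind, aq$Ozone)
```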
Figure 3: Interactive chart of ozone level, temperature, and wind speed. Hover the mouse cursor over a point to reveal the date. (Click the image to see the interactive version on my web site)
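The same question can also be answered in code rather than by hovering. A minimal sketch, assuming the built-in airquality data set, whose help page describes the readings as taken from May to September 1973:

```r
# Assumption: the built-in airquality data set stands in for the observations.
# which.max() skips missing values and returns the row of the largest reading.
peak <- airquality[which.max(airquality$Ozone), ]

# The data set stores month and day as integers; the year comes from its help page.
peak_date <- as.Date(sprintf("1973-%02d-%02d", peak$Month, peak$Day))

print(peak_date)
print(peak$Ozone)
```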
Interactive chart with R markdown
You may already be familiar with the sophisticated interactive visualizations in New York Times articles. Creating such visualizations used to require skills such as database query languages (e.g. SQL), web frameworks (Python Django, Ruby on Rails, Node.js, etc.), and front-end web application coding (JavaScript, HTML, CSS).
Thanks to recent developments such as rCharts, it is now much easier to author and publish data-rich documents in R markdown.
With R markdown, we can:
- Load and transform data from multiple sources
- Run statistical analysis
- Produce figures
- Write narratives
All of this happens within a single text file. Having to switch between different programs to complete each of those tasks can be very disruptive to your thought process. With R markdown, we can focus on the research without being distracted by how each tool works.
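To make the single-file idea concrete, here is a rough sketch that writes a tiny R markdown document from R. The file name, title, and chunk contents are made up for illustration, and the built-in airquality data set is assumed as the data source.

```r
# A minimal R markdown document: YAML header, one line of narrative,
# and one code chunk that loads data, fits a model, and draws a figure.
rmd_lines <- c(
  "---",
  "title: \"Air quality notes\"",
  "output: html_document",
  "---",
  "",
  "Ozone tends to rise with temperature.",
  "",
  "```{r}",
  "aq <- na.omit(airquality)             # load and transform",
  "fit <- lm(Ozone ~ Temp, data = aq)    # statistical analysis",
  "plot(aq$Temp, aq$Ozone); abline(fit)  # figure",
  "```"
)

# Hypothetical file name, written to a temporary directory.
rmd_path <- file.path(tempdir(), "air_quality.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering to HTML needs the rmarkdown package and pandoc installed:
# rmarkdown::render(rmd_path)
```

In practice the .Rmd file is edited directly as text; generating it from R here only keeps the example self-contained.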
R markdown example: Twitter impressions and engagements
Figure 4 demonstrates using R markdown to visualize my Twitter activity in 2014. The x-axis represents time. The y-axis represents the number of impressions (i.e. how many people saw each of my tweets). The size of each bubble represents the number of engagements (i.e. the total number of viewer actions, such as clicking the profile or links and expanding the tweet).
Figure 4: Interactive chart: Daigo’s Twitter impressions and engagements in 2014. The size of the bubble corresponds to the number of engagements. Hover the mouse cursor over a bubble to reveal the details. (Click the image to see the interactive version on my web site)
A bubble chart like this is one way to effectively visualize the relationship between three variables. The chart above is also colored to distinguish between tweets in English and Japanese. Presented in this way, it is easy to see the majority of my tweets are in English.
After getting an overview of the impressions, engagements, and frequency of tweets in each language, I may want to find out which tweets actually got very high or low impressions or engagements. Hovering the mouse over a bubble reveals the content of each tweet. Each group can be toggled on and off, and the chart adjusts the zoom automatically.
Once the data were prepared, it took only 8 lines of R code to generate the chart with the help of the rCharts package and a reusable helper function I wrote:
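library(rCharts)  # provides the Highcharts reference class
# en_values and ja_values hold the bubble points prepared beforehand
twitter_chart <- Highcharts$new()
twitter_chart$chart(type = "bubble")
twitter_chart$xAxis(type = "datetime")
# Pass the axis title as an R list so it is serialized as a JavaScript object
twitter_chart$yAxis(title = list(text = "Impressions"), gridLineColor = "#FFFFFF")
twitter_chart$series(name = "English", data = en_values)
twitter_chart$series(name = "Japanese", data = ja_values)
twitter_chart$legend(symbolWidth = 80)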
The entire source of this article is available for viewing.
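The listing above takes en_values and ja_values as given. As a rough sketch of what that preparation might look like (this is not the helper function mentioned above), the code below turns a data frame of tweets into lists of x, y, z points. The data frame, its column names, the two sample rows, and the function to_bubble_points are all made up for illustration. It assumes a Highcharts datetime axis reads x as milliseconds since the Unix epoch and a bubble series reads z as the bubble size.

```r
# Hypothetical tweet data; the column names and values are invented for this sketch.
tweets <- data.frame(
  time = as.POSIXct(c("2014-01-05 09:00:00", "2014-02-10 18:30:00"), tz = "UTC"),
  impressions = c(120, 340),
  engagements = c(3, 12),
  lang = c("en", "ja"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one list(x, y, z) per tweet.
to_bubble_points <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    list(
      x = as.numeric(df$time[i]) * 1000,  # seconds since epoch -> milliseconds
      y = df$impressions[i],              # height on the y axis
      z = df$engagements[i]               # bubble size
    )
  })
}

# One series per language, matching the two series in the chart.
en_values <- to_bubble_points(tweets[tweets$lang == "en", ])
ja_values <- to_bubble_points(tweets[tweets$lang == "ja", ])
```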
Conclusions
With the help of the rCharts package, it is now much easier to publish data-rich documents. R markdown lets us focus on the data analysis and production of the manuscript without requiring in-depth knowledge of programming languages. Properly presented interactive charts help online publishers communicate data and their interpretations effectively.
Daigo Tanaka, Ph.D. is a founder of Anelen Co., LLC, a boutique consulting firm focused on Data Science. He introduced an agile start-up methodology to a pre-Stage A startup and helped it raise a total of $105MM. If you need a consultation on startup operations or data science, please don't hesitate to connect with him on LinkedIn or send a message.