Avoid being misled by the data visualization method

[scRNA-seq analysis, R] When I used a histogram to visualize the distribution of one gene's reads against cell count, the bar for the zero-expression cells dominated the whole graph.

Naturally, I wanted to see the distribution of the cells that truly express the gene, so I simply added + ylim() to set the y-axis limits.

At first glance this looked nice, because two peaks showed up that had been hidden beneath the big zero-expression bar. But wait! Where are the zero-expression cells? Shouldn't that bar just be chopped off, leaving a stump? After reexamining the code, I found that ggplot had issued this warning:

"Warning message: Removed 4 rows containing missing values (geom_bar). "
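
One way to see where a warning like this comes from is to inspect the bar data that ggplot2 computes. The following is a minimal sketch on the built-in iris data, not the original scRNA-seq data; it assumes that layer_data() returns the bars with out-of-range heights already set to NA. The variable names (full, limited, n_too_tall, n_missing) and the choice of bins = 30 are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length)) + geom_histogram(bins = 30)

# Computed bar data without any zooming
full <- layer_data(p)

# Computed bar data after ylim(): heights outside 0-5 are expected to be NA
limited <- layer_data(p + ylim(0, 5))

# Bars that exceed the limit vs. bars marked as missing
n_too_tall <- sum(full$count > 5)
n_missing  <- sum(is.na(limited$ymax))
print(c(too_tall = n_too_tall, missing = n_missing))
```

If the two numbers agree, every bar taller than the limit has been turned into a missing value, and those are the rows ggplot reports as removed when it draws the plot.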

Fishy! I then found a better option: adding + coord_cartesian(ylim = c(, )) instead of + ylim(). With that change, I replotted the same data.

Yes! The zero-expression cells are back, and the gap between 0.4 and 0.5 is filled. That gap seems to have been caused by the "Removed 4 rows" behavior of + ylim().
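
The same kind of check can be sketched for + coord_cartesian(). This example again uses iris rather than the original data, and the names full_heights, zoomed_heights, and heights_unchanged are made up for illustration. It compares the computed bar heights with and without the zoom, on the assumption that layer_data() reflects what will be drawn.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length)) + geom_histogram(bins = 30)

# Bar heights as computed for the unzoomed plot
full_heights <- layer_data(p)$ymax

# Bar heights when the view is zoomed with coord_cartesian()
zoomed_heights <- layer_data(p + coord_cartesian(ylim = c(0, 5)))$ymax

# Zooming the coordinate system should leave the computed bars untouched
heights_unchanged <- isTRUE(all.equal(full_heights, zoomed_heights))
print(heights_unchanged)
print(anyNA(zoomed_heights))
```

Here the bars keep their full heights and are merely clipped by the visible window, which matches the stump one would expect to see.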

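Lesson learned: when zooming in on a ggplot, watch out for automatic data dropping. + ylim() sets the scale limits, so values outside them are turned into missing values and removed, whereas + coord_cartesian() only zooms the view and keeps all the data. Remembering to use + coord_cartesian() avoids being misled most of the time.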
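
Since the warning is easy to miss in a long script, one option is to collect it programmatically. The sketch below is only one possible approach and rests on assumptions: it assumes the warning is raised when the plot is converted to a grob with ggplotGrob(), and the names p_zoomed and dropped_msgs are hypothetical. The exact wording of the message may differ between ggplot2 versions.

```r
library(ggplot2)

p_zoomed <- ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(bins = 30) +
  ylim(0, 5)

# Collect any warnings raised while the plot is rendered to a grob
dropped_msgs <- character(0)
grob <- withCallingHandlers(
  ggplotGrob(p_zoomed),
  warning = function(w) {
    dropped_msgs <<- c(dropped_msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Any collected message means something was dropped from the plot
print(dropped_msgs)
```

A non-empty dropped_msgs is a hint that the plot no longer shows all of the data and that + coord_cartesian() may be the safer way to zoom.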

Lastly, here is a toy example for a side-by-side comparison (zooming in to counts of 0-5 in two ways):

library(ggplot2)  # gridExtra is called via :: below and must be installed
# Base histogram, reused in all three panels
p <- ggplot(iris, aes(Sepal.Length)) + geom_histogram()
gridExtra::grid.arrange(grobs = list(
    p + ggtitle("original data"),
    p + ylim(0, 5) + ggtitle("+ ylim"),  # bars taller than 5 are removed
    p + coord_cartesian(ylim = c(0, 5)) + ggtitle("+ coord_cartesian")  # bars are only clipped
), ncol = 3)