Top R Libraries for Data Science
Category - Data Manipulation
Library - 1 - dplyr
Commits - 4354 Contributors - 136
Features - Powerful library for data wrangling, works with local data frames and remote database tables, precise and simple command syntax.
Library - 2 - data.table
Commits - 3211 Contributors - 43
Features - Fast aggregation of large data, concise and flexible syntax with a wide suite of useful functions, friendly file reader and parallel file writer.
Library - 3 - lubridate
Commits - 1427 Contributors - 45
Features - A set of functions for working with dates and times, easy and fast parsing of date-time data, extended arithmetic on time data.
Library - 4 - jsonlite
Commits - 908 Contributors - 11
Features - Robust and fast parsing of JSON objects in R, great tool for interacting with web APIs and building pipelines, functions to stream, validate, and prettify JSON data.
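As a small illustration of the "simple command syntax" credited to dplyr above, here is a minimal sketch of a grouped summary. It assumes dplyr is installed and uses R's built-in mtcars data set; the column names mean_mpg and n_cars are chosen for this example only.

```r
library(dplyr)

# Average fuel economy per cylinder count in the built-in mtcars data set
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%                      # one group per distinct cylinder count
  summarise(mean_mpg = mean(mpg),        # mean miles per gallon in each group
            n_cars   = n()) %>%          # number of cars in each group
  arrange(desc(mean_mpg))                # most economical group first

print(mpg_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is why the steps chain naturally with the pipe.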
Category - Graphic Displays
Library - 1 - ggplot2
Commits - 3903 Contributors - 133
Features - Powerful implementation of the grammar of graphics, mature static graphics system, handles the details of plot specification.
Library - 2 - corrplot
Commits - 299 Contributors - 8
Features - Visualizes correlation matrices and confidence intervals, includes algorithms for matrix reordering, flexible control over appearance details.
Library - 3 - lattice
Commits - 132 Contributors - 0
Features - High-level visualization system, emphasis on multivariate data, copes efficiently with nonstandard requirements.
Category - HTML Widgets
Library - 1 - plotly
Commits - 2986 Contributors - 26
Features - Rich feature set with a wide range of chart types, web-based toolbox for building visualizations, can make ggplot2 graphics interactive.
Library - 2 - ggvis
Commits - 2159 Contributors - 21
Features - Implementation of an interactive grammar of graphics, incorporates shiny reactive programming model and dplyr grammar of data transformation.
Library - 3 - DT (DataTables)
Commits - 1919 Contributors - 21
Features - Displays R matrices and data frames as interactive HTML tables, creates sortable tables with a minimum of code, many useful features and styling options for tables.
Library - 4 - rCharts
Commits - 638 Contributors - 11
Features - Interactive JS charts from R, tools for creation, customization, and sharing.
Category - Reproducible Research
Library - 1 - knitr
Commits - 5467 Contributors - 96
Features - Transparent tool for easy dynamic report generation in R, enables integration of R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents.
Library - 2 - rmarkdown
Commits - 2297 Contributors - 56
Features - Next-generation implementation of R Markdown based on pandoc, many static and dynamic output formats, ability to define new formats for custom publishing requirements.
Library - 3 - slidify
Commits - 302 Contributors - 7
Features - Generates reproducible HTML5 slides from R Markdown, allows embedded code chunks and mathematical formulas, rich sharing and customization options.
Category - Machine Learning
Library - 1 - mlr
Commits - 3915 Contributors - 55
Features - Extensible framework for classification, regression, survival analysis, and clustering, easy extension mechanism through S3 inheritance.
Library - 2 - dmlc XGBoost
Commits - 3188 Contributors - 259
Features - Implementation of the gradient boosted decision trees algorithm, rich tools for regression, classification, and ranking problems, high speed and performance.
Library - 3 - caret
Commits - 1659 Contributors - 59
Features - Many models for classification and regression, powerful tools and algorithms for creating predictive models.
Library - 4 - gbm
Commits - 731 Contributors - 26
Features - Implements generalized boosted regression models, includes many regression methods, tools for variable selection and final-stage precision modeling.
Library - 5 - Prophet
Commits - 190 Contributors - 20
Features - High-quality forecasts for time series data, handles data with multiple seasonalities and linear or non-linear growth, robust to missing data, shifts in the trend, and large outliers.
Library - 6 - randomForest
Commits - 56 Contributors - 0
Features - Implements Breiman's random forest algorithm for classification and regression, builds multiple decision trees and returns the mean prediction of the individual trees for regression or their majority vote for classification.
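To make the "mean prediction of the individual trees" idea concrete, here is a minimal sketch for the regression case. It assumes the randomForest package is installed and uses the built-in mtcars data set; the choice of predictors, the seed, and ntree = 50 are arbitrary picks for the illustration.

```r
library(randomForest)

set.seed(42)  # make the bootstrap samples reproducible

# Fit a small regression forest on the built-in mtcars data set
fit <- randomForest(mpg ~ wt + hp + disp, data = mtcars, ntree = 50)

# predict.all = TRUE keeps every tree's prediction next to the forest's answer
pred <- predict(fit, newdata = mtcars, predict.all = TRUE)

# pred$individual: one row per car, one column per tree
# pred$aggregate:  the forest's prediction for each car
tree_mean <- rowMeans(pred$individual)

# For regression, the forest's prediction should match the per-tree average
print(all.equal(unname(tree_mean), unname(pred$aggregate)))
```

For a classification forest the same call returns class labels per tree, and the aggregate is the majority vote rather than an average.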
Of course, this list of libraries is far from complete, but I have collected here what are, in my opinion, the most general-purpose and time-tested tools. There are many other specialized libraries that may be more efficient for particular tasks, so do not hesitate to share your thoughts and experience in the comment section.