Top R Libraries for Data Science

Category - Data Manipulation

Library - 1 - dplyr

Commits - 4354 | Contributors - 136

Features - Powerful library for data wrangling; works with local data frames and remote database tables; precise and simple command syntax.
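A minimal sketch of a typical dplyr pipeline, assuming dplyr is installed. It uses R's built-in `mtcars` data; the 100 hp threshold and the name `mpg_by_cyl` are arbitrary choices for illustration. Each verb takes a data frame and returns a new one, which is what keeps the syntax short.

```r
library(dplyr)

# Keep cars above 100 hp, then summarise fuel economy per cylinder count
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))

print(mpg_by_cyl)
```

The same verbs can be translated to SQL when the source is a remote table (through the companion dbplyr package), which is how dplyr covers both local and database data.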

Library - 2 - data.table

Commits - 3211 | Contributors - 43

Features - Fast aggregation of large data; concise, flexible syntax and a wide suite of useful functions; a friendly file reader (fread) and a parallel file writer (fwrite).
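The same aggregation in data.table's `DT[i, j, by]` form, as a rough sketch assuming data.table is installed; the built-in `mtcars` data, the filter, and the temporary CSV path are illustrative only.

```r
library(data.table)

dt <- as.data.table(mtcars)

# DT[i, j, by]: filter rows (i), compute columns (j), and group (by) in one call
agg <- dt[hp > 100, .(mean_mpg = mean(mpg), n = .N), by = cyl]

# fwrite() writes using multiple threads; fread() detects separators and types
path <- tempfile(fileext = ".csv")
fwrite(agg, path)
agg_back <- fread(path)
```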

Library - 3 - lubridate

Commits - 1427 | Contributors - 45

Features - A set of functions for working with dates and times; easy and fast parsing of date-time data; extended arithmetic operations on time data.
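A small sketch of parsing and date arithmetic, assuming lubridate is installed; the dates themselves are made up for illustration. The parser names spell out the order of the components in the input string.

```r
library(lubridate)

# Parsers are named after the order of year (y), month (m), day (d), etc.
t1 <- ymd_hms("2021-03-14 09:30:00", tz = "UTC")
d2 <- dmy("15/03/2021")

# Add periods, then measure the elapsed time of the interval in hours
t3 <- t1 + days(2) + hours(3)
elapsed <- time_length(interval(t1, t3), "hour")

# Extract components from date-time values
year(t1)
month(d2, label = TRUE)
```

Here `elapsed` is 51, since two days plus three hours separate `t1` and `t3`.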

Library - 4 - jsonlite

Commits - 908 | Contributors - 11

Features - Robust and fast JSON parsing in R; a great tool for interacting with web APIs and building pipelines; functions to stream, validate, and prettify JSON data.
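A round-trip sketch, assuming jsonlite is installed; the small data frame `df_in` and the temporary file are hypothetical example data. It touches the features listed above: conversion, validation, prettifying, and streaming newline-delimited JSON.

```r
library(jsonlite)

df_in <- data.frame(name = c("alpha", "beta"), score = c(1.5, 2))

# Serialise a data frame as an array of row records
json <- toJSON(df_in)   # [{"name":"alpha","score":1.5},{"name":"beta","score":2}]
validate(json)          # TRUE for well-formed JSON
prettify(json)          # indented, human-readable copy

# Parse it back; an array of records simplifies to a data frame
df_out <- fromJSON(json)

# Stream newline-delimited JSON (one record per line) through a file
path <- tempfile(fileext = ".json")
stream_out(df_in, file(path), verbose = FALSE)
df_stream <- stream_in(file(path), verbose = FALSE)
```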

Category - Graphic Displays

Library - 1 - ggplot2

Commits - 3903 | Contributors - 133

Features - Powerful implementation of the grammar of graphics; a mature static graphics system that takes care of the plot details once you specify the mappings.
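To show what "grammar of graphics" means in practice, here is a minimal sketch assuming ggplot2 is installed and using the built-in `mtcars` data. The plot is declared as data, aesthetic mappings, layers, and facets; the axis labels are illustrative.

```r
library(ggplot2)

# Data + aesthetic mappings + geometric layers + facets; ggplot2 derives
# the scales, legends, and axes from these declarations
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ am) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```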

Library - 2 - corrplot

Commits - 299 | Contributors - 8

Features - Visualizes correlation matrices and confidence intervals; includes algorithms for matrix reordering; flexible settings for appearance details.
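A sketch of a reordered, significance-aware correlation plot, assuming corrplot is installed. It uses the built-in `mtcars` data, and the 0.05 significance level is an arbitrary choice.

```r
library(corrplot)

M <- cor(mtcars)

# Pairwise significance tests and confidence intervals for all variables
res <- cor.mtest(mtcars, conf.level = 0.95)

# Reorder variables by hierarchical clustering, draw the upper triangle,
# and blank out correlations that are not significant at the 0.05 level
corrplot(M, method = "circle", order = "hclust", type = "upper",
         p.mat = res$p, sig.level = 0.05, insig = "blank")
```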

Library - 3 - lattice

Commits - 132 | Contributors - 0

Features - High-level visualization system with an emphasis on multivariate data; flexible enough to handle nonstandard requirements.
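Lattice's focus on multivariate data shows up in conditioning ("trellis") plots. Below is a minimal sketch using the built-in `mtcars` data. Lattice ships with R as a recommended package; the layout and labels are illustrative.

```r
library(lattice)

# One panel per cylinder count (the variable after "|"), each panel showing
# points ("p") and a least-squares regression line ("r")
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            type = c("p", "r"), layout = c(3, 1),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)
```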

Category - HTML Widgets

Library - 1 - plotly

Commits - 2986 | Contributors - 26

Features - Rich features and many available chart types; web-based toolbox for building visualizations; can make ggplot2 graphics interactive.
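The two usual ways into plotly from R, sketched under the assumption that both plotly and ggplot2 are installed. The `mtcars` data and the chart choices are illustrative.

```r
library(ggplot2)
library(plotly)

# Route 1: convert an existing ggplot2 object into an interactive widget
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()
w1 <- ggplotly(p)

# Route 2: build the chart directly with plotly's formula-style interface
w2 <- plot_ly(mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")
```

Printing either object in RStudio or a browser gives hover tooltips, zooming, and panning.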

Library - 2 - ggvis

Commits - 2159 | Contributors - 21

Features - Implementation of an interactive grammar of graphics; incorporates shiny's reactive programming model and dplyr's grammar of data transformation.
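A rough sketch of ggvis's pipe-based grammar, assuming ggvis is installed. It uses the built-in `mtcars` data, and the slider range is an arbitrary choice. The slider becomes a live control only when the plot is rendered, which uses shiny behind the scenes.

```r
library(ggvis)

# Layers are added with the pipe; input_slider() turns the smoothing span
# into an interactive control when the plot is displayed
v <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points() %>%
  layer_smooths(span = input_slider(0.3, 1, value = 0.75))
```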

Library - 3 - DT (DataTables)

Commits - 1919 | Contributors - 21

Features - Displays R matrices and data frames as interactive HTML tables; creates sortable tables with minimal code; many useful features and styling options for tables.
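A minimal sketch, assuming DT is installed and using the built-in `iris` data. The page length and rounding are illustrative styling choices.

```r
library(DT)

# Sortable, searchable, paginated HTML table with per-column filters
w <- datatable(iris, filter = "top", options = list(pageLength = 5))

# formatRound() is one of DT's styling helpers; here it shows 1 decimal place
w <- formatRound(w, columns = c("Sepal.Length", "Sepal.Width"), digits = 1)
```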

Library - 4 - rCharts

Commits - 638 | Contributors - 11

Features - Interactive JavaScript charts from R, with tools for creating, customizing, and sharing them.

Category - Reproducible Research

Library - 1 - knitr

Commits - 5467 | Contributors - 96

Features - Transparent engine for easy dynamic report generation in R; integrates R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents.
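A small sketch of knitr's core idea: it runs the code chunks and weaves their results back into the document. It assumes knitr is installed. The in-memory source `src` is a hypothetical example; real projects usually knit a `.Rmd` or `.Rnw` file instead.

```r
library(knitr)

# A tiny R Markdown source held in memory: one sentence and one code chunk
src <- c("Mean fuel economy:", "", "```{r}", "mean(mtcars$mpg)", "```")

# With `text =`, knit() returns the rendered Markdown instead of writing a file
md <- knit(text = src, quiet = TRUE)
cat(md, sep = "\n")
```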

Library - 2 - rmarkdown

Commits - 2297 | Contributors - 56

Features - Next-generation implementation of R Markdown based on pandoc; many static and dynamic output formats; the ability to define new formats for custom publishing requirements.

Library - 3 - slidify

Commits - 302 | Contributors - 7

Features - Generates reproducible HTML5 slides from R Markdown; supports embedded code chunks and mathematical formulas; rich options for sharing and customization.

Category - Machine Learning

Library - 1 - mlr

Commits - 3915 | Contributors - 55

Features - Extensible framework for classification, regression, survival analysis, and clustering; easy extension mechanism through S3 inheritance.
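mlr separates the task (data plus target), the learner (model), and the resampling strategy. A minimal sketch follows, assuming mlr is installed. The built-in `iris` data, the `classif.rpart` learner, and 3-fold cross-validation are illustrative choices. mlr is now in maintenance mode, and its authors point new projects to mlr3.

```r
library(mlr)

set.seed(1)
task <- makeClassifTask(data = iris, target = "Species")
lrn <- makeLearner("classif.rpart")

# Estimate the misclassification error (mmce) with 3-fold cross-validation
rdesc <- makeResampleDesc("CV", iters = 3)
r <- resample(lrn, task, rdesc, measures = mmce, show.info = FALSE)

# Fit on all data and score the (optimistic) training predictions
mod <- train(lrn, task)
pred <- predict(mod, task = task)
performance(pred, measures = mmce)
```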

Library - 2 - dmlc XGBoost

Commits - 3188 | Contributors - 259

Features - Implementation of the gradient boosted decision trees algorithm; rich tools for regression, classification, and ranking problems; high speed and performance.
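A minimal binary-classification sketch, assuming the xgboost R package is installed. The choice of target (`am` from the built-in `mtcars`) and the parameter values are illustrative, not tuned. The R interface has changed across major versions, so check the documentation for the installed version.

```r
library(xgboost)

# Predict transmission type (am: 0 = automatic, 1 = manual) from the other columns
features <- as.matrix(mtcars[, setdiff(names(mtcars), "am")])
dtrain <- xgb.DMatrix(features, label = mtcars$am)

# Logistic objective for a binary label; shallow trees for a tiny dataset
params <- list(objective = "binary:logistic", max_depth = 2, eta = 0.3)
bst <- xgb.train(params = params, data = dtrain, nrounds = 20)

prob <- predict(bst, dtrain)   # predicted probability of a manual gearbox
```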

Library - 3 - caret

Commits - 1659 | Contributors - 59

Features - Many models for classification and regression behind a single interface; powerful tools for building predictive models.
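caret's single interface is the `train()` function: the `method` string selects the model, and `trainControl()` sets the resampling. The sketch below assumes caret and rpart are installed. The built-in `iris` data and 5-fold cross-validation are illustrative choices.

```r
library(caret)

set.seed(42)
# 5-fold cross-validation; swapping `method` reuses the same resampling setup
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)

pred <- predict(fit, newdata = iris)
postResample(pred, iris$Species)   # accuracy and Cohen's kappa
```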

Library - 4 - gbm

Commits - 731 | Contributors - 26

Features - Implements Generalized Boosted Regression Models; includes many regression methods and tools for variable selection and final-stage precision modeling.
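A boosted regression sketch, assuming gbm is installed. It uses the built-in `mtcars` data. All tuning values are arbitrary and chosen only so the tiny dataset can be subsampled.

```r
library(gbm)

set.seed(1)
# Gaussian loss gives boosted least-squares regression of mpg on all other columns
fit <- gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
           n.trees = 500, interaction.depth = 2, shrinkage = 0.05,
           bag.fraction = 0.8, n.minobsinnode = 5)

# Relative influence of each predictor, a common aid for variable selection
rel_inf <- summary(fit, plotit = FALSE)
pred <- predict(fit, newdata = mtcars, n.trees = 500)
```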

Library - 5 - Prophet

Commits - 190 | Contributors - 20

Features - High-quality forecasts for time series data; handles data with multiple seasonalities and linear or non-linear growth; robust to missing data, shifts in the trend, and large outliers.
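The basic Prophet workflow is fit, extend, predict. This sketch assumes the `prophet` R package is installed. The daily series is synthetic (a level of 10 plus a weekly cycle and noise) and exists only for illustration.

```r
library(prophet)

# Synthetic daily series for illustration only
set.seed(3)
ds <- seq(as.Date("2020-01-01"), by = "day", length.out = 365)
y <- 10 + sin(2 * pi * seq_along(ds) / 7) + rnorm(length(ds), sd = 0.2)
history <- data.frame(ds = ds, y = y)   # prophet expects columns ds and y

m <- prophet(history)
future <- make_future_dataframe(m, periods = 30)   # history plus 30 future days
fc <- predict(m, future)                           # yhat, yhat_lower, yhat_upper, ...
```

`plot(m, fc)` draws the forecast, and `prophet_plot_components(m, fc)` shows the trend and seasonal parts separately.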

Library - 6 - randomForest

Commits - 56 | Contributors - 0

Features - Implements Breiman's random forest algorithm for classification and regression; builds many decision trees and combines their predictions, by majority vote for classification and by averaging for regression.
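A minimal classification sketch, assuming randomForest is installed. It uses the built-in `iris` data, and the number of trees is an arbitrary choice. Because each tree is grown on a bootstrap sample, the rows left out of that sample give a built-in out-of-bag (OOB) error estimate.

```r
library(randomForest)

set.seed(7)
rf <- randomForest(Species ~ ., data = iris, ntree = 300, importance = TRUE)

print(rf)                 # OOB error estimate and confusion matrix
importance(rf)            # per-variable importance measures
oob_pred <- predict(rf)   # OOB predictions when newdata is omitted
```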

Of course, this list is far from complete, but in my opinion these are the most general-purpose and time-tested tools. Many more specialized libraries may be more efficient for particular tasks, so feel free to share your thoughts and experience in the comments.
