Cleaning Data for Effective Data Science: Data Ingestion, Anomaly Detection, Value Imputation, and Feature Engineering

With Pearson and David Mertz
Liked by 56 users
Duration: 4h 49m
Skill level: Intermediate
Released: 7/11/2025

Course details

Description

What is this course about?

This course introduces the tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It covers ingestion from numerous formats, including JSON, CSV, SQL RDBMSs, HDF5, NoSQL databases, and binary serialized data structures. Instructor David Mertz explains why some problems are peculiar to how data is represented, while others are inherent to the data itself. To address untidiness in data, you will learn how and when to impute missing values, detect unreliable data and statistical anomalies, and generate the synthetic features needed for successful data analysis and visualization. By the end of this course, you’ll be equipped with in-demand skills in data analysis, machine learning, and data integrity troubleshooting.

Note: This course was created by Pearson. We are pleased to host this training in our library.

Instructor

Who teaches this course?

David Mertz, PhD, is a data scientist, author, and former Python Software Foundation director and Anaconda senior trainer.

Objectives

What will I be able to do by the end of this course?

  • Analyze and process various data formats, including tabular and hierarchical ones.
  • Detect and correct data anomalies and biases effectively.
  • Implement data ingestion across diverse formats such as JSON and CSV.
  • Apply value imputation techniques tailored to specific analytical purposes.
  • Engineer data features to enhance machine learning model performance.
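
As a rough illustration of how ingestion, anomaly detection, and value imputation fit together, here is a minimal sketch using only the Python standard library. It is not taken from the course materials. The sample data, the field names (`sensor`, `reading`), and the helper functions are all hypothetical, and the modified z-score with a 3.5 cutoff is just one common rule of thumb for flagging outliers, chosen here for brevity.

```python
import csv
import io
import json
import statistics

# Hypothetical sample data: the same kind of record arriving as CSV and as JSON.
CSV_TEXT = """sensor,reading
a,10.0
b,
c,11.0
d,10.5
e,95.0
f,9.5
"""
JSON_TEXT = '[{"sensor": "g", "reading": 10.2}, {"sensor": "h", "reading": null}]'


def ingest(csv_text, json_text):
    """Load both sources into one list of {'sensor': str, 'reading': float or None}."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        raw = rec["reading"].strip()
        # An empty CSV field is treated as a missing value.
        rows.append({"sensor": rec["sensor"], "reading": float(raw) if raw else None})
    for rec in json.loads(json_text):
        # JSON null already arrives as None.
        rows.append({"sensor": rec["sensor"], "reading": rec["reading"]})
    return rows


def flag_anomalies(rows, threshold=3.5):
    """Flag readings whose modified z-score (median/MAD based) exceeds threshold."""
    observed = [r["reading"] for r in rows if r["reading"] is not None]
    med = statistics.median(observed)
    mad = statistics.median([abs(v - med) for v in observed])
    flagged = []
    for r in rows:
        v = r["reading"]
        if v is None or mad == 0:
            # Missing values are not judged; a zero MAD gives no usable scale.
            is_anomaly = False
        else:
            is_anomaly = abs(0.6745 * (v - med) / mad) > threshold
        flagged.append({**r, "anomaly": is_anomaly})
    return flagged


def impute_median(rows):
    """Fill missing readings with the median of observed, non-anomalous readings."""
    trusted = [r["reading"] for r in rows
               if r["reading"] is not None and not r["anomaly"]]
    fill = statistics.median(trusted)
    return [
        {**r,
         "reading": fill if r["reading"] is None else r["reading"],
         "imputed": r["reading"] is None}
        for r in rows
    ]


cleaned = impute_median(flag_anomalies(ingest(CSV_TEXT, JSON_TEXT)))
for row in cleaned:
    print(row)
```

With this sample data, only sensor `e` is flagged as anomalous, and the missing readings for `b` and `h` are filled with the median of the remaining trusted values. Flagging anomalies before imputing keeps the extreme reading from influencing the fill value. Whether median imputation is appropriate depends on the analytical purpose.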

Audience

Who is this course for?

  • Database administrators
  • Data scientists
  • Data analysts

Prerequisites

What do I need to know before taking this course?

  • Basic understanding of data structures and formats
  • Familiarity with data science principles and tools

Earn a sharable certificate

Share what you’ve learned, and stand out in your industry with a certificate showcasing the knowledge you gained from the course.

  • Showcase on your LinkedIn profile under the “Licenses and Certificate” section
  • Download or print as a PDF to share with others
  • Share as an image online to demonstrate your skills

Learner reviews

4.6 out of 5 (14 ratings)

  • 5 star: 9 (64%)
  • 4 star: 4 (29%)
  • 3 star: 1 (7%)
  • 2 star: 0 (0%)
  • 1 star: 0 (0%)

What’s included

  • Learn on the go: access on tablet and phone
