CSV files overview (And how to use Python to read/write simple csv files)

Ben W.

Published Jun 21, 2023

A CSV (Comma-Separated Values) file is a plain text file that stores tabular data, where each line represents a row, and the values within each row are separated by commas (or other delimiters, such as tabs or semicolons). It is a common file format for exchanging data between different software applications, especially when we work with spreadsheet-like data.

Advantages of CSV files:

Simplicity: CSV files are easy to create and manipulate since they consist of plain text and a simple structure.
Compatibility: CSV files can be read and written by a wide range of software applications, making them highly compatible and versatile for data interchange.
Lightweight: CSV files store only the data itself, without the formatting or structural overhead of formats like Excel (.xlsx) or database files, and they can be processed line by line, which makes them practical for handling large datasets.
Human-readable: Since CSV files are plain text, they can be easily viewed and understood by humans, which is helpful for data analysis and debugging.
Integrations: CSV files can be imported into popular software tools such as spreadsheets, databases, statistical software, and programming languages, allowing seamless integration with various data processing workflows.

Disadvantages of CSV files:

Limited data types: CSV files do not support complex data structures or data types, such as formulas, images, or multiple sheets. They primarily store simple tabular data without any formatting or metadata.
Lack of standards: There is no universal standard for CSV files, which can sometimes lead to compatibility issues, especially when dealing with different delimiters or handling special characters.
No data validation: CSV files do not provide built-in mechanisms for data validation, integrity constraints, or data relationships, which can increase the risk of data inconsistencies or errors.
Limited metadata: CSV files lack the ability to store additional metadata or annotations about the data, such as column types, units, or descriptions. This information often needs to be managed separately.
Encoding issues: If the data in a CSV file includes special characters or uses different character encodings, it can cause problems during reading or writing operations if not handled correctly.

Overall, CSV files are widely used for their simplicity, compatibility, and ease of integration, but they may not be suitable for complex data structures or situations that require advanced data validation or metadata management.
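
Two of the limitations above (no standard delimiter, no data types) can be illustrated with a short sketch. The sample text and the names in it are made up for illustration, and the data is read from an in-memory string rather than a real file:

```python
import csv
import io

# Hypothetical sample: semicolon-delimited text with a quoted field
# that itself contains the delimiter.
raw_text = 'name;age;city\n"Smith; John";42;Boston\n'

# Reading with the default comma delimiter does not fail,
# but every row ends up as a single column.
rows_default = list(csv.reader(io.StringIO(raw_text)))

# Passing the right delimiter splits the fields and honors the quotes.
rows_semicolon = list(csv.reader(io.StringIO(raw_text), delimiter=';'))

# Every value comes back as a string; types must be converted by hand.
header, first_row = rows_semicolon
age = int(first_row[1])

print(rows_default)
print(rows_semicolon)
print(type(first_row[1]), age)
```

Because nothing in the file says which delimiter or which column types were intended, the reading code has to supply that knowledge itself.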

Tabular data refers to data organized in a table-like structure, where information is arranged in rows and columns. It is a structured form of data representation commonly used in spreadsheets, databases, and CSV files. In a tabular format, each row represents a separate record or observation, while each column represents a specific attribute or variable associated with that record.

Main characteristics of tabular data:

Rows: Each row in the table represents a single entity, instance, or observation. For example, in a table of student data, each row might represent a different student.
Columns: Each column corresponds to a specific attribute or variable associated with the entities or observations. For example, in the same student data table, columns could include attributes such as "Name", "Age", "Grade", and "City".
Cell: Each intersection of a row and column is called a cell, which holds the actual data value or entry. It contains the information related to the specific attribute for a particular entity or observation.

Tabular data provides a structured and organized way to represent and store data, enabling easy access, manipulation, and analysis. It is commonly used in various domains, including finance, business, research, and data analysis, as it allows for efficient processing, sorting, filtering, and aggregation of data based on specific attributes or conditions.
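
As a rough illustration of rows, columns, and cells, the student table mentioned above can be sketched in plain Python as a list of dictionaries. The student records here are invented for the example:

```python
# Hypothetical student records: each dict is one row, each key is one column.
students = [
    {"Name": "Ana", "Age": 15, "Grade": 10, "City": "Lisbon"},
    {"Name": "Raj", "Age": 16, "Grade": 11, "City": "Pune"},
    {"Name": "Mia", "Age": 15, "Grade": 10, "City": "Oslo"},
]

# A cell is the value at the intersection of one row and one column.
first_city = students[0]["City"]

# Filtering: keep only the rows that meet a condition on one attribute.
grade_10 = [s["Name"] for s in students if s["Grade"] == 10]

# Sorting: order the rows by one column.
by_age_desc = sorted(students, key=lambda s: s["Age"], reverse=True)

# Aggregation: combine the values of one column into a single number.
average_age = sum(s["Age"] for s in students) / len(students)

print(first_city, grade_10, by_age_desc[0]["Name"], average_age)
```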

A spreadsheet is a digital file or document that consists of rows and columns, forming a grid-like structure. It is used for organizing, storing, and manipulating data, especially in a tabular form. Spreadsheets are commonly associated with software applications like Microsoft Excel, Google Sheets, or LibreOffice Calc, which provide powerful tools and functionalities for working with spreadsheet data.

Main features of a spreadsheet:

Grid structure: Spreadsheets are organized in a grid format, where rows are identified by numbers (1, 2, 3, etc.), and columns are identified by letters (A, B, C, etc.). The intersections of rows and columns are called cells.
Cells: Each cell in a spreadsheet can hold various types of data, such as numbers, text, formulas, dates, images, etc. Cells are the basic building blocks of a spreadsheet and are used to input, display, and calculate data.
Formulas and Functions: Spreadsheets provide a wide range of mathematical and logical functions that can be applied to cells and ranges of cells. Formulas allow users to perform calculations, transform data, or create relationships between different cells or ranges.
Data manipulation: Spreadsheets offer tools for sorting, filtering, formatting, and analyzing data. Users can perform operations like sorting data in ascending or descending order, applying filters to display specific subsets of data, or formatting cells to change their appearance.
Charts and graphs: Spreadsheets provide built-in features for creating charts and graphs based on the data in the spreadsheet. These visual representations help in understanding and analyzing data trends, patterns, and relationships.
Collaboration and sharing: Many spreadsheet applications allow multiple users to work on the same spreadsheet simultaneously, facilitating collaboration and real-time updates. Spreadsheets can also be shared with others, enabling easy data exchange and collaboration among teams.

How to read/write CSV files using Python

Use regular CSV reader to read CSV files:

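# Use the csv module to read a CSV file.
import csv
import os  # Optional: only needed to change the working directory.

os.chdir(r'file_path')  # Placeholder: replace with the folder that contains the file.

# newline='' lets the csv module handle line endings itself.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)

    # Each row comes back as a list of strings.
    for line in csv_reader:
        print(line)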
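
To show what the regular reader returns without needing a file on disk, here is a small self-contained sketch. The sample rows are invented, and io.StringIO stands in for an opened file:

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for an opened CSV file.
sample = (
    "first_name,last_name,email\n"
    "John,Doe,john@example.com\n"
    "Mary,Smith,mary@example.com\n"
)

csv_reader = csv.reader(io.StringIO(sample))

header = next(csv_reader)  # The first row is usually the header.
rows = list(csv_reader)    # Each remaining row is a list of strings.

print(header)
for row in rows:
    # With the regular reader, values are accessed by position.
    print(row[0], row[2])
```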


Use dictionary CSV reader to read CSV files:

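import csv
import os  # Optional: only needed to change the working directory.

os.chdir(r'file_path')  # Placeholder: replace with the folder that contains the file.

# Read a CSV file using the dictionary reader.
# The first line of the file is used as the keys of each row.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        print(line)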

The advantage of using the dictionary reader instead of the regular CSV reader is that each row is returned as a dictionary keyed by the column names from the header, so values can be accessed by name rather than by position.
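
A minimal sketch of accessing values by column name, using invented sample data held in memory instead of a file:

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for an opened CSV file.
sample = (
    "first_name,last_name,email\n"
    "John,Doe,john@example.com\n"
    "Mary,Smith,mary@example.com\n"
)

csv_reader = csv.DictReader(io.StringIO(sample))

# The header row is consumed automatically and becomes the keys.
column_names = csv_reader.fieldnames

# Values are looked up by column name, not by position.
emails = [row["email"] for row in csv_reader]

print(column_names)
print(emails)
```

If the column order in the file changes later, the lookup by name keeps working, whereas positional access with the regular reader would need to be updated.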

Write a CSV file using regular CSV writer:

import csv

# newline='' lets the csv module handle line endings itself.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)

    # next(csv_reader)  # Uncomment to skip the first line (usually the header).

    with open('new_file_name.csv', 'w', newline='') as new_file:
        # The delimiter parameter sets the separator used in the output file.
        csv_writer = csv.writer(new_file, delimiter='\t')

        for line in csv_reader:
            csv_writer.writerow(line)

In the new file, the values are separated by tab characters ('\t') instead of commas, as set in the code by the parameter "delimiter='\t'".
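
The effect of the delimiter parameter can be checked with a small in-memory sketch. The rows are invented, and io.StringIO replaces the output file so the result can be inspected directly:

```python
import csv
import io

# Hypothetical rows to write.
rows = [
    ["first_name", "last_name", "email"],
    ["John", "Doe", "john@example.com"],
]

# io.StringIO stands in for the output file.
buffer = io.StringIO()
csv_writer = csv.writer(buffer, delimiter='\t')
csv_writer.writerows(rows)

# Each written line now uses a tab between values.
written_lines = buffer.getvalue().splitlines()
print(written_lines)

# Reading the text back requires the same delimiter.
read_back = list(csv.reader(io.StringIO(buffer.getvalue(), newline=''), delimiter='\t'))
print(read_back)
```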

Write a CSV file using dictionary CSV writer:

Note: When we use dictionary CSV writer, we need to define the fieldnames (i.e., the header of the file).

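import csv
import os  # Optional: only needed to change the working directory.

os.chdir(r'file_path')  # Placeholder: replace with the folder that contains the file.

# Read the source file using the dictionary reader.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open('new_file_name.csv', 'w', newline='') as new_file:
        # These names must cover every column in the source file;
        # a row with a key not listed here raises a ValueError.
        fieldnames = ['first_name', 'last_name', 'email']

        csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames)

        # Write the header row using the fieldnames.
        csv_writer.writeheader()

        for line in csv_reader:
            # To drop a column from the new file, uncomment the next line
            # and also remove 'email' from fieldnames above; otherwise the
            # column is still written, with empty values.
            # del line['email']
            csv_writer.writerow(line)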
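
As an alternative to deleting keys from each row, a column can be left out by listing only the wanted fieldnames and passing extrasaction='ignore' to csv.DictWriter. This is a sketch with invented sample data, written to memory rather than to a file:

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for the input file.
sample = (
    "first_name,last_name,email\n"
    "John,Doe,john@example.com\n"
    "Mary,Smith,mary@example.com\n"
)
csv_reader = csv.DictReader(io.StringIO(sample))

# io.StringIO stands in for the output file.
output = io.StringIO()

# Only the columns listed here are written.
fieldnames = ["first_name", "last_name"]

# extrasaction='ignore' skips keys that are not in fieldnames
# instead of raising a ValueError.
csv_writer = csv.DictWriter(output, fieldnames=fieldnames, extrasaction="ignore")
csv_writer.writeheader()

for line in csv_reader:
    csv_writer.writerow(line)

output_lines = output.getvalue().splitlines()
print(output_lines)
```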
