CSV files overview (and how to use Python to read/write simple CSV files)


A CSV (Comma-Separated Values) file is a plain text file that stores tabular data: each line represents a row, and the values within each row are separated by commas (or another delimiter, such as a tab or semicolon). It is a common format for exchanging data between software applications, especially spreadsheet-like data.
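
As a small illustration, the sketch below parses a three-line CSV held in a string instead of a file. The sample names and e-mail addresses are made up for this example.

```python
import csv
import io

# Hypothetical sample data: a header line followed by two records.
sample_text = (
    "first_name,last_name,email\n"
    "John,Doe,john.doe@example.com\n"
    "Mary,Smith,mary.smith@example.com\n"
)

# csv.reader accepts any iterable of lines; io.StringIO makes the string file-like.
rows = list(csv.reader(io.StringIO(sample_text)))

header = rows[0]    # the first line holds the column names
records = rows[1:]  # every following line is one row of values

print(header)
print(records)
```

Each line of text becomes one list, and each comma-separated value becomes one item in that list.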

Advantages of CSV files:

  1. Simplicity: CSV files are easy to create and manipulate since they consist of plain text and a simple structure.
  2. Compatibility: CSV files can be read and written by a wide range of software applications, making them highly compatible and versatile for data interchange.
  3. Lightweight: CSV files carry no formatting, formulas, or styling, so they add little overhead beyond the data itself and can be read and written row by row, which makes them practical for large datasets.
  4. Human-readable: Since CSV files are plain text, they can be opened in any text editor and read directly, which is helpful for data analysis and debugging.
  5. Integrations: CSV files can be imported into popular software tools such as spreadsheets, databases, statistical software, and programming languages, allowing seamless integration with various data processing workflows.

Disadvantages of CSV files:

  1. Limited data types: CSV files do not support complex data structures or data types, such as formulas, images, or multiple sheets. They primarily store simple tabular data without any formatting or metadata.
  2. Lack of standards: There is no single binding standard for CSV files (RFC 4180 describes a common format, but many tools deviate from it), which can lead to compatibility issues, especially with different delimiters, quoting rules, or special characters.
  3. No data validation: CSV files do not provide built-in mechanisms for data validation, integrity constraints, or data relationships, which can increase the risk of data inconsistencies or errors.
  4. Limited metadata: CSV files lack the ability to store additional metadata or annotations about the data, such as column types, units, or descriptions. This information often needs to be managed separately.
  5. Encoding issues: If the data in a CSV file includes special characters or uses different character encodings, it can cause problems during reading or writing operations if not handled correctly.

Overall, CSV files are widely used for their simplicity, compatibility, and ease of integration, but they may not be suitable for complex data structures or situations that require advanced data validation or metadata management.
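
Two of the drawbacks above are easy to see in a few lines. In this minimal sketch (the data is invented), a field that contains a comma has to be quoted, and every value comes back as a string, so type conversion is left to the caller.

```python
import csv
import io

# Hypothetical data: two fields in the data row contain a comma, so they are quoted.
sample_text = 'name,city,age\n"Doe, John","Austin, TX",42\n'

# Splitting on commas by hand ignores the quotes and breaks the row apart.
data_line = sample_text.splitlines()[1]
naive_fields = data_line.split(",")

# The csv module understands the quoting and returns three fields.
rows = list(csv.reader(io.StringIO(sample_text)))
name, city, age = rows[1]

# Values are always read as text; converting them is up to us.
age_as_int = int(age)

print(len(naive_fields), "pieces from a naive split")
print(name, "|", city, "|", age_as_int)
```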


Tabular data refers to data organized in a table-like structure, where information is arranged in rows and columns. It is a structured form of data representation commonly used in spreadsheets, databases, and CSV files. In a tabular format, each row represents a separate record or observation, while each column represents a specific attribute or variable associated with that record.

Main characteristics of tabular data:

  • Rows: Each row in the table represents a single entity, instance, or observation. For example, in a table of student data, each row might represent a different student.
  • Columns: Each column corresponds to a specific attribute or variable associated with the entities or observations. For example, in the same student data table, columns could include attributes such as "Name", "Age", "Grade", and "City".
  • Cell: Each intersection of a row and column is called a cell, which holds the actual data value or entry. It contains the information related to the specific attribute for a particular entity or observation.

Tabular data provides a structured and organized way to represent and store data, enabling easy access, manipulation, and analysis. It is commonly used in various domains, including finance, business, research, and data analysis, as it allows for efficient processing, sorting, filtering, and aggregation of data based on specific attributes or conditions.
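
The rows, columns, and cells described above can be sketched in plain Python as a list of dictionaries. This is only one possible representation, and the student records are invented for the example.

```python
# Hypothetical student table: each dict is a row, each key is a column.
students = [
    {"Name": "Alice", "Age": 20, "Grade": "A", "City": "Boston"},
    {"Name": "Bob", "Age": 22, "Grade": "B", "City": "Chicago"},
    {"Name": "Carol", "Age": 21, "Grade": "A", "City": "Denver"},
]

# A cell is the value at one row/column intersection.
cell = students[1]["City"]

# A column is the same attribute collected across all rows.
ages = [row["Age"] for row in students]

# Filtering and sorting operate on whole rows.
grade_a = [row["Name"] for row in students if row["Grade"] == "A"]
by_age = sorted(students, key=lambda row: row["Age"])

print(cell)
print(ages)
print(grade_a)
print([row["Name"] for row in by_age])
```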


A spreadsheet is a digital file or document that consists of rows and columns, forming a grid-like structure. It is used for organizing, storing, and manipulating data, especially in tabular form. Spreadsheets are commonly associated with software applications like Microsoft Excel, Google Sheets, or LibreOffice Calc, which provide powerful tools for working with spreadsheet data.

Main features of a spreadsheet:

  • Grid structure: Spreadsheets are organized in a grid format, where rows are identified by numbers (1, 2, 3, etc.), and columns are identified by letters (A, B, C, etc.). The intersections of rows and columns are called cells.
  • Cells: Each cell in a spreadsheet can hold various types of data, such as numbers, text, formulas, dates, images, etc. Cells are the basic building blocks of a spreadsheet and are used to input, display, and calculate data.
  • Formulas and Functions: Spreadsheets provide a wide range of mathematical and logical functions that can be applied to cells and ranges of cells. Formulas allow users to perform calculations, perform data transformations, or create relationships between different cells or ranges.
  • Data manipulation: Spreadsheets offer tools for sorting, filtering, formatting, and analyzing data. Users can sort data in ascending or descending order, apply filters to display specific subsets of data, or format cells to change their appearance.
  • Charts and graphs: Spreadsheets provide built-in features for creating charts and graphs based on the data in the spreadsheet. These visual representations help in understanding and analyzing data trends, patterns, and relationships.
  • Collaboration and sharing: Many spreadsheet applications allow multiple users to work on the same spreadsheet simultaneously, facilitating collaboration and real-time updates. Spreadsheets can also be shared with others, enabling easy data exchange and collaboration among teams. 


How to read/write CSV files using Python

Use regular CSV reader to read CSV files:

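# Use the csv module to read CSV files
import csv
import os  # Optional: change the working directory so that a relative file name resolves.
os.chdir(r'file_path')  # Placeholder: replace with the folder that contains the file.

# newline='' lets the csv module handle line endings itself.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)

    # Each row is returned as a list of strings.
    for line in csv_reader:
        print(line)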
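
For a version that runs without an existing file, the sketch below first writes a small throwaway CSV (the file name and contents are made up) and then reads it back, separating the header from the data rows with next().

```python
import csv
import os
import tempfile

# Create a small throwaway file so the example is self-contained (made-up data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "people.csv")
with open(path, "w", newline="") as f:
    f.write("first_name,last_name,email\n")
    f.write("John,Doe,john.doe@example.com\n")

with open(path, "r", newline="") as csv_file:
    csv_reader = csv.reader(csv_file)
    header = next(csv_reader)     # first row: the column names
    data_rows = list(csv_reader)  # remaining rows, each a list of strings

print(header)
print(data_rows)
```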

Use dictionary CSV reader to read CSV files:

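import csv
import os
os.chdir(r'file_path')  # Placeholder: replace with the folder that contains the file.

# Read CSV files using the dictionary reader.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    # Each row is returned as a dictionary keyed by the column names in the header.
    for line in csv_reader:
        print(line)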

The advantage of using the dictionary reader instead of the regular CSV reader is that each row is returned as a dictionary keyed by the column names in the header, so we can refer to a value by name rather than by position.

Here, we can use 'first_name', 'last_name', and 'email' as keys to manipulate the data more easily.
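
A minimal sketch of that key-based access, using an in-memory string with invented records in place of a file:

```python
import csv
import io

# Hypothetical sample data with the same three columns as above.
sample_text = (
    "first_name,last_name,email\n"
    "John,Doe,john.doe@example.com\n"
    "Mary,Smith,mary.smith@example.com\n"
)

csv_reader = csv.DictReader(io.StringIO(sample_text))

# Pick out one column by name instead of by position.
emails = [row["email"] for row in csv_reader]

# The header is available as a list once the reader has been used.
column_names = csv_reader.fieldnames

print(column_names)
print(emails)
```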

Write a CSV file using regular CSV writer:

import csv

with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)

    # next(csv_reader)  # Uncomment to skip the first line (usually the header).

    with open('new_file_name.csv', 'w', newline='') as new_file:
        # The delimiter parameter sets the separator written between fields (here, a tab).
        csv_writer = csv.writer(new_file, delimiter='\t')

        for line in csv_reader:
            csv_writer.writerow(line)

In the new file, the fields are separated by tab characters instead of commas, because we passed the parameter delimiter='\t' to csv.writer.
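
One consequence is that a tab-delimited file has to be read back with the same delimiter. The sketch below writes to an in-memory buffer (with made-up rows) and compares reading with and without delimiter='\t'.

```python
import csv
import io

# Hypothetical rows to write.
rows = [
    ["first_name", "last_name", "email"],
    ["John", "Doe", "john.doe@example.com"],
]

# Write the rows tab-delimited into an in-memory buffer instead of a file.
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter="\t").writerows(rows)
tab_text = buffer.getvalue()

# Reading with the matching delimiter recovers the original fields.
read_back = list(csv.reader(io.StringIO(tab_text, newline=""), delimiter="\t"))

# Reading with the default comma delimiter leaves each line as one single field.
read_wrong = list(csv.reader(io.StringIO(tab_text, newline="")))

print(read_back)
print(read_wrong)
```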


Write a CSV file using dictionary CSV writer:

Note: When we use dictionary CSV writer, we need to define the fieldnames (i.e., the header of the file).

import csv
import os
os.chdir(r'file_path')  # Placeholder: replace with the folder that contains the file.

# Read the source file with the dictionary reader and write it out with the dictionary writer.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open('new_file_name.csv', 'w', newline='') as new_file:
        fieldnames = ['first_name', 'last_name', 'email']

        csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames)

        csv_writer.writeheader()

        for line in csv_reader:
            # To drop a column, delete its key here and also remove it from fieldnames above;
            # otherwise the column is still written, with empty values.
            # del line['email']
            csv_writer.writerow(line)
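
As an alternative to deleting keys row by row, csv.DictWriter accepts extrasaction='ignore', which skips any keys that are not listed in fieldnames. The sketch below uses in-memory buffers and made-up data; lineterminator='\n' is set only to keep the printed output simple.

```python
import csv
import io

# Hypothetical source data with three columns.
source_text = (
    "first_name,last_name,email\n"
    "John,Doe,john.doe@example.com\n"
    "Mary,Smith,mary.smith@example.com\n"
)
csv_reader = csv.DictReader(io.StringIO(source_text))

output = io.StringIO()
csv_writer = csv.DictWriter(
    output,
    fieldnames=["first_name", "last_name"],  # 'email' is left out on purpose
    extrasaction="ignore",                   # silently skip keys not in fieldnames
    lineterminator="\n",
)

csv_writer.writeheader()
for row in csv_reader:
    csv_writer.writerow(row)

result = output.getvalue()
print(result)
```

Without extrasaction='ignore', the default behaviour is to raise a ValueError when a row contains a key that is missing from fieldnames.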



