CSV files overview (and how to use Python to read/write simple CSV files)
A CSV (Comma-Separated Values) file is a plain text file that stores tabular data, where each line represents a row, and the values within each row are separated by commas (or other delimiters). It is a common file format used for data exchange between different software applications, especially when we work with spreadsheet-like data.
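To make this structure concrete, here is a minimal sketch that parses a small piece of CSV text with Python's built-in csv module. The sample names and addresses are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import csv
import io

# Hypothetical sample data: one header line followed by two records.
sample_text = (
    "first_name,last_name,email\n"
    "John,Doe,john@example.com\n"
    "Jane,Roe,jane@example.com\n"
)

# io.StringIO lets csv.reader treat the string like an open file.
rows = list(csv.reader(io.StringIO(sample_text)))

# The first line holds the column names; the remaining lines are the rows.
header, records = rows[0], rows[1:]

print(header)
for record in records:
    print(record)
```

Each line of text becomes one list, and each comma-separated value becomes one item in that list.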
Advantages of CSV files:
Disadvantages of CSV files:
Overall, CSV files are widely used for their simplicity, compatibility, and ease of integration, but they may not be suitable for complex data structures or situations that require advanced data validation or metadata management.
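One practical consequence of the missing metadata is that a CSV file carries no type information, so Python's csv module returns every value as a string. The sketch below illustrates this with made-up sample data; the column names are only examples.

```python
import csv
import io

# Hypothetical sample data with a numeric-looking column.
sample_text = "name,age\nAlice,30\nBob,25\n"

records = list(csv.DictReader(io.StringIO(sample_text)))

# The value is the string "30", not the integer 30.
raw_age = records[0]["age"]
print(type(raw_age).__name__)

# Any conversion has to be done explicitly by whoever reads the file.
ages = [int(record["age"]) for record in records]
print(sum(ages))
```

Any validation (for example, checking that an age is really a number) therefore has to live in the code that reads the file, not in the file itself.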
Tabular data refers to data organized in a table-like structure, where information is arranged in rows and columns. It is a structured form of data representation commonly used in spreadsheets, databases, and CSV files. In a tabular format, each row represents a separate record or observation, while each column represents a specific attribute or variable associated with that record.
Main characteristics of tabular data:
Tabular data provides a structured and organized way to represent and store data, enabling easy access, manipulation, and analysis. It is commonly used in various domains, including finance, business, research, and data analysis, as it allows for efficient processing, sorting, filtering, and aggregation of data based on specific attributes or conditions.
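As a rough illustration of filtering, sorting, and aggregation, the sketch below represents a small table as a list of dictionaries, where each dictionary is a row and each key is a column. The table contents are invented for the example.

```python
# Hypothetical table: each dictionary is one row, each key is one column.
table = [
    {"name": "Alice", "department": "Finance", "salary": 5200},
    {"name": "Bob", "department": "Research", "salary": 4800},
    {"name": "Carol", "department": "Finance", "salary": 6100},
]

# Filtering: keep only the rows that meet a condition.
finance_rows = [row for row in table if row["department"] == "Finance"]

# Sorting: order the rows by one column, highest salary first.
by_salary = sorted(table, key=lambda row: row["salary"], reverse=True)

# Aggregation: combine one column across the filtered rows.
total_finance_salary = sum(row["salary"] for row in finance_rows)

print([row["name"] for row in finance_rows])
print([row["name"] for row in by_salary])
print(total_finance_salary)
```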
A spreadsheet is a digital file or document that consists of rows and columns, forming a grid-like structure. It is used for organizing, storing, and manipulating data, especially in a tabular form. Spreadsheets are commonly associated with software applications like Microsoft Excel, Google Sheets, or LibreOffice Calc, which provide powerful tools and functionalities for working with spreadsheet data.
Main features of a spreadsheet:
How to read/write CSV files using Python
Use regular CSV reader to read CSV files:
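# Use the csv module to read CSV files
import csv
import os  # Optional: used to switch to the folder that contains the file.

os.chdir(r'file_path')  # Replace 'file_path' with the folder that holds the CSV file.

# newline='' lets the csv module handle line endings itself.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(line)  # Each line is a list of strings.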
Use dictionary CSV reader to read CSV files:
import csv
import os

os.chdir(r'file_path')

# Read CSV files using the dictionary reader.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        print(line)  # Each line is a dictionary keyed by the column names.
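The advantage of using the dictionary reader instead of the regular CSV reader is that each row is returned as a dictionary keyed by the column names from the header. This makes the data easier to manipulate, because we can look up a value by its column name (for example, line['email']) rather than by its position in the row.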
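The difference between the two readers can be sketched as follows. The sample row is made up, and io.StringIO is used in place of a real file so that the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample data with the same columns used later in this article.
sample_text = "first_name,last_name,email\nJohn,Doe,john@example.com\n"

# Regular reader: rows are lists, so we must know the column's position.
plain_reader = csv.reader(io.StringIO(sample_text))
next(plain_reader)  # Skip the header line.
list_row = next(plain_reader)
email_by_position = list_row[2]

# Dictionary reader: the header is consumed automatically and used as keys.
dict_row = next(csv.DictReader(io.StringIO(sample_text)))
email_by_name = dict_row["email"]

print(email_by_position)
print(email_by_name)
```

Looking up a value by name keeps working if the columns are later reordered, whereas the positional index would silently point at a different column.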
Write a CSV file using regular CSV writer:
import csv

with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    # next(csv_reader)  # We can use this to skip the first line (usually it's the header)

    with open('new_file_name.csv', 'w', newline='') as new_file:
        # Here, we can define the delimiter we'd like to use.
        csv_writer = csv.writer(new_file, delimiter='\t')

        for line in csv_reader:
            csv_writer.writerow(line)
In the new file, the values in each row are separated by tabs instead of commas, because we passed the parameter delimiter='\t' to the writer.
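To check the effect of the delimiter without touching any files, the sketch below writes two made-up rows into an in-memory buffer and reads them back. Note that the reader needs the same delimiter as the writer, otherwise each line would come back as a single value.

```python
import csv
import io

# Hypothetical rows: a header followed by one record.
rows = [
    ["first_name", "last_name", "email"],
    ["John", "Doe", "john@example.com"],
]

# Write the rows into an in-memory buffer using a tab as the delimiter.
buffer = io.StringIO()
csv_writer = csv.writer(buffer, delimiter="\t")
csv_writer.writerows(rows)
tab_text = buffer.getvalue()

# repr() makes the tab characters visible as \t.
print(repr(tab_text))

# Reading the text back requires the same delimiter.
round_trip = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
print(round_trip)
```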
Write a CSV file using dictionary CSV writer:
Note: When we use dictionary CSV writer, we need to define the fieldnames (i.e., the header of the file).
import csv
import os

os.chdir(r'file_path')

# Read the CSV file using the dictionary reader.
with open('file_name.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open('new_file_name.csv', 'w', newline='') as new_file:
        fieldnames = ['first_name', 'last_name', 'email']
        csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames)
        csv_writer.writeheader()

        for line in csv_reader:
            # del line['email']  # To drop a column, delete it here and also remove it from fieldnames.
            csv_writer.writerow(line)
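As an alternative way to leave a column out of the new file, the sketch below lists only the wanted columns in fieldnames and passes extrasaction='ignore' to csv.DictWriter, so any extra keys in a row are skipped instead of raising an error. The sample data is made up, and in-memory buffers stand in for the two files.

```python
import csv
import io

# Hypothetical input with three columns.
sample_text = "first_name,last_name,email\nJohn,Doe,john@example.com\n"
csv_reader = csv.DictReader(io.StringIO(sample_text))

# Only the columns listed here are written to the output.
fieldnames = ["first_name", "last_name"]

output = io.StringIO()
csv_writer = csv.DictWriter(output, fieldnames=fieldnames, extrasaction="ignore")
csv_writer.writeheader()

for line in csv_reader:
    # The 'email' key is still in the row, but the writer ignores it.
    csv_writer.writerow(line)

result_lines = output.getvalue().splitlines()
print(result_lines)
```

Without extrasaction='ignore', writing a row that contains a key missing from fieldnames raises a ValueError, which is why the del line['email'] approach also works when 'email' is removed from fieldnames.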