Converting Nested JSON data to CSV using python/pandas

Kaleab Tekle Woldemariam

Published Oct 27, 2017

I want to share how I converted a json (Javascript Object Notation) data into easily readable csv file. The data was obtained from Stackexchange API. https://api.stackexchange.com/docs/top-answerers-on-tags. I used the tag –‘python’ and period –‘month’. I saved this data.json to my working directory in PyCharm IDE.

{
  "items": [
    {
      "user": {
        "reputation": 42660,
        "user_id": 4909087,
        "user_type": "registered",
        "accept_rate": 96,
        "profile_image": "https://i.stack.imgur.com/q6xFr.jpg?s=128&g=1",
        "display_name": "cᴏʟᴅsᴘᴇᴇᴅ",
        "link": "https://stackoverflow.com/users/4909087/c%e1%b4%8f%ca%9f%e1%b4%85s%e1%b4%98%e1%b4%87%e1%b4%87%e1%b4%85"
      },
      "post_count": 347,
      "score": 623
    },
    {
      "user": {
        "reputation": 173184,
        "user_id": 2901002,
        "user_type": "registered",
        "accept_rate": 96,
        "profile_image": "https://i.stack.imgur.com/hMDvl.jpg?s=128&g=1",
        "display_name": "jezrael",
        "link": "https://stackoverflow.com/users/2901002/jezrael"
      },
      "post_count": 396,
      "score": 613
    },
    {
      "user": {
        "reputation": 95634,
        "user_id": 2336654,
        "user_type": "registered",
        "accept_rate": 96,
        "profile_image": "https://i.stack.imgur.com/mQIca.jpg?s=128&g=1",
        "display_name": "piRSquared",
        "link": "https://stackoverflow.com/users/2336654/pirsquared"
      },
      "post_count": 163,
      "score": 360
    },
    {
      "user": {
        "reputation": 13646,
        "user_id": 7964527,
        "user_type": "registered",
        "accept_rate": 100,
        "profile_image": "https://i.stack.imgur.com/mD4ZC.jpg?s=128&g=1",
        "display_name": "Wen",
        "link": "https://stackoverflow.com/users/7964527/wen"
      },
      "post_count": 213,
      "score": 325
    },
 ],
  "has_more": false,
  "quota_max": 10000,
  "quota_remaining": 9985
}

Objective:

List the top answerers in Stackoverflow with tag –‘Python’.

Method:

Create a python file named convert_JSON_to_CSV.py and import the modules pandas, csv and json. As the JSON data is nested, we need to only select the dictionary keys that we need. I especially need the ‘item’ key, and its subkeys, and furthermore I need the keys ‘score’ and ‘post_count’ for each Stackoverflow user(SO_user).

The js_r function reads the input data.json and converts it to python dictionary. I defined the original dict keys as keys= my_dic_data.keys(). Now, I only need the ‘items’ key which I used in dictionary comprehension to iterate over the dictionary of items in the my_dic_data python dictionary and assign it to a new key- ‘my_items’.

dict_you_want={'my_items':my_dic_data['items']for key in keys}

Then I used pandas to create a DataFrame(df) for dict_you_want as df. However this df has one column my_items each row consisting a dictionary of ‘user’.I want this dictionary to be used as a column, for this I apply an .apply(pd.Series) method on df and assign it df2.

df2=df['my_items'].apply(pd.Series)

Finally, We keep the columns ‘post_count’ and ‘score’ and drop df2’s ‘user’ column to avoid the nested ‘user’ dictionary, since we need to expand it one level. We concatenate (pd.concat) the df2 without ‘user’ column with expanded df2’s ‘user’ column by applying .apply (pd.Series) on df2 as df2[‘user’].apply(pd.Series).

df3=pd.concat([df2.drop(['user'],axis=1),df2['user'].apply(pd.Series)],axis=1)

Code: JSON_to_CSV.py:

import sys
import pandas as pd
from pandas import DataFrame
import json


data=r'C:\Users\Kaleab\Desktop\data.json'
print ("This is json data input", data)

# Reads and converts json to dict.
def js_r(data):
   with open(data, encoding='utf-8') as f_in:
       return(json.load(f_in))

if __name__ == "__main__":
    my_dic_data = js_r(data)
    print("This is my dictionary", my_dic_data)
keys= my_dic_data.keys()
print ("The original dict keys",keys)
# You assign a new dictionary key- SO_users, and make dictionary comprehension = { your_key: old_dict[your_key] for your_key in your_keys }
dict_you_want={'my_items':my_dic_data['items']for key in keys}

print ("These are the keys to dict_you_want",dict_you_want.keys())

print ("This is the dictionary of SO_users", dict_you_want)
df=pd.DataFrame(dict_you_want)
print ("df:", df)
#When .apply(pd.Series) method on items column is applied, the dictionaries in items column will be used as column headings
df2=df['my_items'].apply(pd.Series)
print ("df2",df2)
df3=pd.concat([df2.drop(['user'],axis=1),df2['user'].apply(pd.Series)],axis=1)
#df3=df2['user'].apply(pd.Series)

print ("df3",df3)


df3.to_csv('out.csv', sep=',',encoding='utf-8')

The output: out.csv

I hope you find this helpful. I appreciate comments.

Kaleab Tekle Woldemariam 8y

Chandrasekaran A Are you using Python 2.7? Try 3.x or search a solution on Stackoverflow.

1 Reaction

Chandrasekaran A 8y

('This is json data input', '/home/chandru/data.json') Traceback (most recent call last): File "convert.py", line 16, in <module> my_dic_data = js_r(data) File "convert.py", line 12, in js_r with open(data, encoding='utf-8') as f_in: TypeError: 'encoding' is an invalid keyword argument for this function I got the result.. Please help ..how to resolve Kaleab Woldemariam

See more comments

To view or add a comment, sign in

Converting Nested JSON data to CSV using python/pandas

Kaleab Tekle Woldemariam

More articles by Kaleab Tekle Woldemariam

Others also viewed

Mimicking Python Dictionaries in Power Query M: how, why, and is it better than a join/merge?

Introducing dremioframe - A Pythonic DataFrame Interface for Dremio

SQL vs Python in Data Pipelines

Easy way for Developers and DBAs analyze data from Oracle AWR reports using python

Configuring a Web server on a docker container + Running python scripts on container.

Evolving Delta Tables

🔹[Day 3/8] New in Microsoft Fabric: T-SQL and Python code against Warehouse

Comparing Python/pandas, R and SAS

Writing code to generate code - Python + SQL version

Explore content categories

More articles by Kaleab Tekle Woldemariam

It isn't about the PhD...Being Social

On Scientific Theory

Beginner in AI: Perceptron Algorithm, Maximum Likelihood and Cross Entropy Explained in two Class Problem

Mishaps in my Django/React JS Web Development Project Planning and Implementation

Today’s Radicals.

The Effect of ፌጦ on Teleportation

I can’t breathe!

Python Special Methods (aka Magic/Dunder Methods) and Sparse Vector Addition in Plain English

This is why we need collaborative filtering (aka only our favourite songs)

Why use recursive functions? Fibonacci and Prime Numbers