Mining social media data using Python (1) - REST API & OAuth
Introduction
Recently a friend asked whether I could analyze his social media page to understand its members' behaviour and identify the VIPs among them. Has anyone else had this kind of request? The first challenge is: how can I download the data from a social platform?
There are two common ways to export data from social media. Using LinkedIn as an example:
1. Download CSV from the LinkedIn interface (non-technical)
This is the friendliest way for non-technical users to export their data, since no programming is required. The drawbacks are that the data refresh is not automated and that the exported content is often not rich enough for analysis.
2. Make requests to the REST API (technical)
A REST API is an application programming interface (API) that allows two software programs to communicate with each other, simply by sending HTTPS requests to GET or POST data. Compared with the CSV download, a REST API gives you an automated way to export data, and the data is usually richer than what method 1 provides.
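To make the idea of "an HTTPS request to GET data" concrete, here is a minimal sketch that builds (but does not send) such a request with Python's standard library. The URL `https://api.example.com/v1/members` and the `fields` and `count` parameters are placeholders made up for illustration; they do not belong to any real platform.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and query parameters, for illustration only.
BASE_URL = "https://api.example.com/v1/members"
params = {"fields": "id,name", "count": 10}


def build_get_request(base_url, params):
    """Build a GET request whose query string carries the parameters."""
    url = f"{base_url}?{urlencode(params)}"
    # Ask the server for JSON, the format most REST APIs return.
    return Request(url, headers={"Accept": "application/json"}, method="GET")


request = build_get_request(BASE_URL, params)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it. A real platform would normally also expect an access token, which is what the OAuth part of this article is about.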
Before choosing the REST API route, read the platform's API documentation to check whether the content you want is available for download.
(e.g. Profile API documentation - the "Request" shows you the URL to call the data and the "sample response" shows you sample data of the result.)
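One way to use a "sample response" from the documentation is to load it in Python and check whether the fields you need are there. The snippet below is only a sketch: the JSON text and the field names (`id`, `firstName`, `lastName`, `emailAddress`) are invented for illustration, so replace them with what your platform's documentation actually shows.

```python
import json

# Made-up sample response; real field names come from the platform's API documentation.
sample_response = '{"id": "abc123", "firstName": "Ada", "lastName": "Lovelace"}'

# Fields we would like to have for our analysis (hypothetical).
wanted_fields = {"id", "firstName", "emailAddress"}

profile = json.loads(sample_response)
missing_fields = sorted(wanted_fields - profile.keys())

print("Available fields:", sorted(profile.keys()))
print("Missing fields:", missing_fields)
```

If a field you need shows up as missing, that API call (or that permission level) will probably not give you what you want.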
What is OAuth? How does it work?
OAuth is an open standard for access delegation, commonly used as a way for Internet users to grant websites or applications access to their information on other websites without giving them their passwords. Most social media platforms, such as LinkedIn, Facebook, and Google, use OAuth for their APIs. So OAuth is like the lock on a treasure box and the data is the treasure inside: you must learn the technique to unlock the box.
OAuth defines four roles that take part in the authorization flow: Resource Owner, Client, Resource Server, and Authorization Server. The image above shows how these four parties communicate with each other.
Real-life example: Creating a new account or logging into iHerb.com
iHerb.com is one of the web applications that use Google/Facebook OAuth for sign-in. If you log in to iHerb.com with your Facebook account, you are already using OAuth. Every OAuth flow involves the four roles listed below.
OAuth roles:
- Resource Owner: You are the resource owner, the one using the client (iHerb.com).
- Client: iHerb.com, which asks for your personal information held by the Resource Server.
- Resource Server (RS): Google/Facebook, which holds your personal information.
- Authorization Server (AS): Google/Facebook, which owns the OAuth engine.
OAuth Code flow:
The flow diagram above (by Takahiko Kawasaki @ Authlete.Inc) clearly shows how the authorization process works. Let's see how the diagram explains the flow in our example:
- (User) Click "Sign in" on iHerb.com to reach the login page. iHerb asks whether you want to "sign in with Google/Facebook".
- (Client) Once you click the button, iHerb makes an authorization request to Google/Facebook's authorization endpoint.
- (AS) Google/Facebook returns an authorization page to iHerb.
- (Client) iHerb displays the authorization page to you.
- (User) You are redirected to the Google/Facebook sign-in page. Before you log in, the page shows the "scope" of information requested by iHerb. Enter your ID and password to log in and approve the request.
- (AS) Google/Facebook issues a short-lived authorization code to iHerb.
- (Client) iHerb presents the authorization code to Google/Facebook's token endpoint.
- (AS) Google/Facebook issues an access token to iHerb.
With the access token, iHerb can then make the REST API call (Steps A to D) to get your information, such as email, profile picture, and user name, for registration. That is why, after signing in with Google/Facebook, you can create an iHerb account instantly without entering your personal information: iHerb has already retrieved it from Google/Facebook.
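The code flow can be sketched in Python from the client's point of view. This is only an outline under stated assumptions: every URL, the client ID, and the client secret are placeholders, the redirect is simulated instead of coming from a real authorization server, and nothing is sent over the network. The parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `grant_type`, `code`) are the standard ones for the OAuth 2.0 authorization code grant; the `state` check is an extra safeguard not shown in the steps above.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# All endpoints and credentials below are placeholders, not a real provider's values.
AUTHORIZE_URL = "https://auth.example.com/oauth/authorize"
TOKEN_URL = "https://auth.example.com/oauth/token"
API_URL = "https://api.example.com/v1/me"
CLIENT_ID = "my-client-id"
CLIENT_SECRET = "my-client-secret"
REDIRECT_URI = "https://client.example.com/callback"


def build_authorization_url(state, scope="profile email"):
    """Authorization request: the URL the client sends the user to."""
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}"


def extract_code(callback_url, expected_state):
    """Read the authorization code from the redirect back to the client."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch")
    return query["code"][0]


def build_token_request(code):
    """Token request: exchange the authorization code for an access token."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    }).encode("ascii")
    return Request(TOKEN_URL, data=body, method="POST")


def build_api_request(access_token):
    """API call: present the access token as a Bearer token."""
    return Request(API_URL, headers={"Authorization": f"Bearer {access_token}"})


state = secrets.token_urlsafe(16)
auth_url = build_authorization_url(state)
# Simulated redirect; in a real flow the authorization server produces this.
callback = f"{REDIRECT_URI}?code=SAMPLE_CODE&state={state}"
code = extract_code(callback, state)
token_request = build_token_request(code)
api_request = build_api_request("SAMPLE_ACCESS_TOKEN")
print(auth_url)
```

In a real script the token request would be sent to the authorization server, and the access token would be read from its response instead of being hard-coded.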
Pick your tool for making HTTPS requests if a REST API is the right way to retrieve your data
Any tool that can make HTTPS requests can be used to export data (e.g. R, Python, JavaScript, PHP). The request mechanism and process are the same in all of them; the only difference is the syntax.
In this article series, I will show you how to use Python to make requests because:
- It’s free and open source, so everyone can download and use it
- It’s so popular that you can find plenty of guidance and documentation on the web
- I love Python!!
What's next? Use Python to go through OAuth
Stay tuned; it is time for a break after reading so many words. In the next article, I will share how to write a Python script that acts as a client, communicates with an Authorization Server, and gets the access token. Happy learning!
Next article: Mining social media data using Python (2) - Make API request with Python