Converting Twitter Tweets to Audio Streams with Python
Goal
I wanted to grab public Twitter feeds and convert the tweets into audio streams (as MP3s). It occurred to me this might be a fun and interesting code project.
Language and Modules
I chose Python for this task and used two libraries from outside the standard library:
- gTTS
- bs4
gTTS (Google Text-to-Speech) is a Python library that talks to Google Translate's text-to-speech API. It can be installed with pip:
pip install gtts
bs4 (Beautiful Soup) is an HTML parser and can be installed with pip:
pip install bs4
Code
The code itself is fairly simple. Execution starts with the menu method, which prints a few statements for the user and reads the name of the Twitter feed you want to scrape and convert to audio.
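As a rough sketch of such a menu, assuming Python 3 (where input replaces raw_input): the banner text and the read parameter below are my own placeholders, not the original script's wording. The read parameter only exists so the function can be exercised without a keyboard.

```python
def menu(read=input):
    """Print a short banner and return the feed name typed by the user."""
    # The banner text is illustrative; the original wording is not shown.
    print("Twitter feed to MP3 converter")
    print("Enter the handle of a public Twitter feed.")
    # Strip stray whitespace so the handle can be appended to a URL safely.
    return read("Feed name: ").strip()
```

Calling menu() with no arguments prompts on the terminal as usual.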
The next method in play is twtterAccountFeed, which takes the feed name as a parameter and appends it to the Twitter base url.
I import the sys module only to set the script's exit code: if the supplied Twitter feed doesn't exist, I kill the script with exit code 1.
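The two steps above might look something like the following sketch. The base URL value, the function names, and the opener parameter are assumptions for illustration. The sketch also assumes a missing feed shows up as an HTTP error when the page is requested.

```python
import sys
import urllib.error
import urllib.request

# Assumed base URL; the value used in the original script is not shown.
BASE_URL = "https://twitter.com/"


def build_feed_url(feed):
    """Append the handle (minus any leading '@') to the base URL."""
    return BASE_URL + feed.strip().lstrip("@")


def fetch_feed_html(url, opener=urllib.request.urlopen):
    """Return the page HTML, or exit with code 1 on an HTTP error."""
    try:
        with opener(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 404 here is treated as "this feed does not exist".
        print(f"Could not load {url} (HTTP {err.code})")
        sys.exit(1)
```

The opener argument defaults to urllib.request.urlopen; it is only there so the error path can be tried out without touching the network.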
Once the Twitter feed URL is set up, I pass it to parseTweets, the method that handles the BeautifulSoup parsing. It takes two parameters:
- html (the URL we'll pull the HTML from, to parse)
- feed (the supplied Twitter handle, used for naming the MP3)
At this point I iterate over the HTML looking for divs with the class js-tweet-text-container (these hold the text of the tweets).
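Here is a minimal sketch of that lookup with Beautiful Soup. It is not the original parseTweets: the function name is hypothetical, and it takes already-downloaded HTML rather than a URL so that the parsing step stands on its own.

```python
from bs4 import BeautifulSoup


def extract_tweet_texts(html):
    """Return the text of every div with class js-tweet-text-container."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div", class_="js-tweet-text-container"):
        # Join nested text nodes with single spaces and trim the ends.
        texts.append(div.get_text(" ", strip=True))
    return texts
```

Divs with any other class are ignored, so page chrome such as headers and sidebars never reaches the list.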
I used a regular expression from a Stack Overflow answer to strip out all the URLs (otherwise the speech engine would read them aloud). The cleaned-up tweets are appended to a list for safe keeping.
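The exact Stack Overflow pattern is not reproduced here, so the sketch below uses a deliberately simple stand-in that removes anything starting with http://, https://, or www. up to the next whitespace. The function name and the whitespace cleanup are my own additions.

```python
import re

# Simple stand-in pattern, not the one from the Stack Overflow answer.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def strip_urls(text):
    """Remove URLs, then collapse the whitespace they leave behind."""
    without_urls = URL_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_urls).strip()


dialogue = []  # the list that collects the cleaned-up tweets
for tweet in ["Launch today https://t.co/abc123 at noon", "Photos: www.example.com"]:
    dialogue.append(strip_urls(tweet))
```

After this loop, dialogue holds "Launch today at noon" and "Photos:".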
The list is later converted to a string using ''.join(dialogue), where dialogue is the list name. At this point we have the tweets of the feed in question as a single text string. Now we need to pass it to gTTS.
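One detail worth seeing in a small example: joining on an empty string puts the tweets back to back with nothing between them. The sample tweets and the space-separated variant below are illustrative and not part of the original script.

```python
dialogue = ["First tweet.", "Second tweet."]

# The join as described above: no separator between tweets.
joined = "".join(dialogue)

# A possible variant: a space between tweets keeps them from running together.
spaced = " ".join(dialogue)

print(joined)  # First tweet.Second tweet.
print(spaced)  # First tweet. Second tweet.
```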
Since I'm leveraging a library to handle the TTS, this part is very simple: gTTS is set up with the text parameter set to the string of cleaned-up tweets and the lang parameter set to English ('en').
After that, I simply write the result of the TTS to an MP3 file using the gTTS save method.
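Put together, the text-to-speech step could be sketched as follows. The function names are hypothetical, and naming the file after the handle follows the description of the feed parameter earlier. Note that save sends the text to Google's service, so it needs a network connection.

```python
def mp3_filename(feed):
    """Name the output file after the Twitter handle, e.g. 'nasa.mp3'."""
    return f"{feed}.mp3"


def tweets_to_mp3(text, feed):
    """Convert the text to speech and write it to '<feed>.mp3'."""
    # Imported here so mp3_filename() stays usable without gTTS installed.
    from gtts import gTTS

    tts = gTTS(text=text, lang="en")  # 'en' selects English
    path = mp3_filename(feed)
    tts.save(path)  # performs the request and writes the MP3 file
    return path
```

For example, tweets_to_mp3("Hello from the feed.", "nasa") would write nasa.mp3 to the current directory.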
And that's it. The script can also be found on GitHub:
Demo