Fall 2 Underway
This is the first in what I hope will be a series of articles highlighting the work my cohort and I are doing while earning an MS in analytics.
We are just beginning the second module of the fall semester at the IAA. For these five weeks, I'm teaming up with Gus Conwell, Luna Gu, Andrew Guerrazzi, and Shalina Omar to complete homework and projects.
This past week, the cohort continued our introduction to data mining and text analytics, and dove further into time series data, seeing seasonal ARIMA models for the first time. Also, Tony Mostek used the movie "12 O'Clock High" as a backdrop for some interesting discussions on the situational leadership model.
Gus Conwell and Luna Gu led the charge on submitting our team's first data mining assignment -- making wine recommendations to a restaurant using market basket analysis. They cleaned data in SAS, discovered the relevant association rules using R, and authored the report in Google Docs. Big shoutout to Shalina Omar for providing the final proofing (and for showing me how to use the 'Suggest Edits' feature effectively)!
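For anyone who hasn't seen market basket analysis before, here is a rough sketch of the idea in plain Python. To be clear, this is not the team's SAS/R workflow: the orders are made up for illustration, the function names are my own, and it only scores rules between pairs of items. It computes the three usual measures for a rule "lhs -> rhs": support (how often the pair appears together), confidence (how often rhs appears when lhs does), and lift (confidence relative to how common rhs is overall).

```python
from itertools import combinations

# Hypothetical orders, invented for illustration (not the assignment's data).
orders = [
    {"steak", "cabernet"},
    {"steak", "cabernet", "salad"},
    {"salmon", "chardonnay"},
    {"salmon", "chardonnay", "salad"},
    {"steak", "merlot"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(baskets, min_support=0.2):
    """Score every single-item rule lhs -> rhs, sorted by lift (highest first)."""
    rules = []
    items = sorted(set().union(*baskets))
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is too rare to base a recommendation on
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, baskets)
            lift = confidence / support({rhs}, baskets)
            rules.append((lhs, rhs, pair_support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


if __name__ == "__main__":
    for lhs, rhs, s, c, l in pair_rules(orders):
        print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 suggests the two items show up together more often than chance would predict, which is the kind of signal a wine recommendation would lean on. Real tools also mine rules with more than one item on each side, which this sketch skips.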
Meanwhile, Andrew Guerrazzi got the ball rolling on our text analytics project. Our chosen topic will dig into how the writing of a well-known author changed over the course of her career. Andrew used the EbookLib and BeautifulSoup libraries to convert .epub files to plain text for later use. This was a HUGE win for the team, as having usable data was one of our biggest question marks going into this topic.
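To give a feel for what that conversion involves, here is a minimal sketch. It is not Andrew's code and it does not use EbookLib or BeautifulSoup; I've swapped in Python's standard library so it runs without extra installs. It leans on the fact that an .epub file is a zip archive of XHTML documents. The function names are my own, and it makes one big simplifying assumption: chapters are read in file-name order rather than the reading order declared in the book's manifest.

```python
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup):
    """Strip tags from one (X)HTML document; each text run becomes a line."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)


def epub_to_text(path_or_file):
    """Concatenate the text of every (X)HTML file inside an .epub archive.

    Assumption: files are taken in sorted name order, which may not match
    the book's real reading order.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        names = sorted(
            name for name in archive.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        )
        pages = (
            html_to_text(archive.read(name).decode("utf-8", errors="replace"))
            for name in names
        )
        return "\n\n".join(pages)
```

Calling `epub_to_text("book.epub")` (with a placeholder file name) returns one long string that can be written to a .txt file. Inline tags such as bold or italics will split a sentence across lines here, which is one of the rough edges a proper HTML library handles more gracefully.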
I spent LOTS of time this week staring at Python errors while getting to know the Pandas, NLTK, and Matplotlib libraries. Needless to say, my Python-related Google searches are becoming much more efficient :)
Thanks for reading.
Adam
I have never heard of ebooklib, but it sounds very useful! Does it work with any ebook format?
Your team's text project sounds fascinating! Looking forward to seeing the finished project.