Analytics>Forward 2019 Recap
The Research Triangle Analysts (RTA) "Analytics>Forward" unconference set a new record for attendees with 143 curious data explorers exchanging ideas at Blue Cross and Blue Shield in Durham, NC on March 9, 2019. This count of 143 outpaced the previous record of 141 set in 2018.
This article may be augmented in the coming days.
More pictures can be found, courtesy of RTA vice-president Ian Cook, at: http://bit.ly/2O126hX and https://www.rtpanalysts.org/af2019photos.
Thank you to the Analytics>Forward sponsors: Blue Cross NC, Rho, GraphAware, Advance Auto Parts, Valassis Digital, NCDS / Renaissance Computing Institute, Conduent, and Talking Leaves. Find them on social media at: @BlueCrossNC, @graph_aware, @RENCI, @TheNCDS, @AdvanceAuto, @ValassisDigital, @rhoworld, @Conduent, @NTTDATAServices.
Keynote speaker Jordan Meyer kept the audience rapt with his discussion of how he worked with an international team of 3 to win $1 million in a Kaggle data science competition to improve Zillow's estimated home value (Zestimate) algorithm. We, the RTA board, were grateful to veteran machine learning specialist Dr. Zeydy Ortiz of DataCrunch Lab for building a relationship with Jordan after he participated in the NCData4Good event she helped organize, and for persuading him to speak.
Jordan followed a warm honoring of statistician Dr. Melinda Thielbar, who has tirelessly led the Research Triangle Analysts since its inception more than five years ago. Her successor as RTA president, Eric Yount, recalled the early days in a bar where analytical friends spoke about forming a community. The growth of the group and the breadth of talks and activities, all posted on Meetup [RTA LINK], has been staggering.
Dr. Thielbar also presented a courageous talk on machine learning biases that addressed not only the risks that unexamined biases pose to disadvantaged people, but also the costs to the bottom line. The writer enjoyed the conversation in the room, which featured diverse attendees and daring questions.
Analytics>Forward began with its signature unconference element: talk pitches. Any attendee was invited to take 60 seconds to pitch a talk or discussion. The talk need not have been prepared in advance, a point this writer emphasized to attendees as he ran around speaking with folks who were trying to make sense of what he was suggesting.
To our delight, quite a few individuals pitched a talk. We inferred from a survey of attendees, who expressed overwhelmingly positive sentiment, that there is value in indicating how prepared one's talk is. However, talks prepared at the last minute often trigger great conversations and idea generation, as this writer experienced.
Above is Greg Frazier, who earned enough votes to present "Domain Expertise in Data Analysis" in a well-attended talk. The top vote-getter was RTA vice-president Ian Cook of Cloudera, whose talk focused on federated learning's ability to alleviate privacy concerns associated with data aggregation. You can hear his talk pitch, as well as the preceding "Intro to Keras, A Simple Deep Learning Framework" pitch by Dhruv Sakalley, by clicking here.
The talk titles and presenters, and slides in some cases, can be found on the RTA website here.
An event like Analytics>Forward requires great volunteers. Co-coordinator Sheri Frank (image below) procured the cool swag and also represented sponsor Rho.
You can see all of the A>F planning committee members and volunteers in the animated .GIF at the top of this article.
Brian Fannin of the Casualty Actuarial Society skillfully ordered the food for the day, which included a lunch of Mellow Mushroom gluten-free pizzas as well as Greek and spinach salads.
I want to close by delving deeper into Jordan Meyer's talk. A few notes I took:
Jordan Meyer's A>F 2019 keynote
"Kaggle in the Real World: Practical Lessons from Winning The Zillow Prize"
3 techniques that gave the biggest lift
1. Outlier Threshold
2. Feature effect threshold – feature engineering => work to find useful features + combine two variables, price and square feet, into price per square foot
3. Encoding categorical:
a. Label encoding: simple
b. One-hot encoding: problem => creates a large number of columns
c. Mean encoding: sometimes leads to overfitting
(loop over in R with for loop)
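To make the notes above concrete, here is a minimal Python sketch of the price-per-square-foot ratio and the three categorical encodings. It is not Jordan's code: the toy records, the field names (`neighborhood`, `price`, `sqft`), and the helper functions are all hypothetical, and the smoothing term in the mean encoder is just one common way to temper the overfitting risk mentioned in the notes. In practice, mean encoding is usually computed on held-out folds so a row's own target does not leak into its encoded value.

```python
from collections import defaultdict

# Hypothetical toy records; field names are illustrative, not from the Zillow data.
homes = [
    {"neighborhood": "A", "price": 300000.0, "sqft": 1500.0},
    {"neighborhood": "B", "price": 200000.0, "sqft": 1600.0},
    {"neighborhood": "A", "price": 360000.0, "sqft": 1800.0},
    {"neighborhood": "C", "price": 150000.0, "sqft": 1000.0},
]


def add_price_per_sqft(rows):
    """Return copies of the rows with a derived price / square-foot ratio."""
    return [{**r, "price_per_sqft": r["price"] / r["sqft"]} for r in rows]


def label_encode(values):
    """Map each category to an integer, in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes


def one_hot_encode(values):
    """Build one 0/1 column per distinct category.

    The number of columns equals the number of categories, which is why
    high-cardinality variables produce very wide tables.
    """
    categories = sorted(set(values))
    matrix = [[1 if v == c else 0 for c in categories] for v in values]
    return matrix, categories


def mean_encode(values, targets, smoothing=1.0):
    """Replace each category with a smoothed mean of the target.

    Larger `smoothing` pulls rare categories toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    table = {
        v: (sums[v] + smoothing * global_mean) / (counts[v] + smoothing)
        for v in counts
    }
    return [table[v] for v in values], table


hoods = [h["neighborhood"] for h in homes]
prices = [h["price"] for h in homes]

with_ratio = add_price_per_sqft(homes)
labels, label_map = label_encode(hoods)
one_hot, columns = one_hot_encode(hoods)
mean_codes, mean_table = mean_encode(hoods, prices, smoothing=1.0)
```

With only three neighborhoods the one-hot matrix has three columns; a variable with thousands of categories would add thousands of columns, which is the problem flagged in item b.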
- Very large search spaces: hyperopt (Python), computer running 24/7 for 6 months
- Suggested read by Jordan: “Stacking Made Easy: An Introduction to StackNet” [Kaggle article]
- Fundamental theorem in machine learning: bias-variance tradeoff [high bias (underfit) and high variance (overfit)]. Jordan thinks high-variance models worked well in urban areas, where variability between neighboring blocks can be high. High-bias models worked better in rural areas.
- Top features (predicted increased housing value) included: GrlivArea, OverallQual, TotalBsmtSF, FireplaceQu
- Model blending: TensorFlow, neural network regressor, light gradient boosting on elastic net (Gamma loss), extreme gradient boosted trees regressor (Poisson loss), TensorFlow deep learning regressor
- Two $1200 GPUs in a $6000 computer. Nema had an AWS bill of $200 in at least one month. (Excelling in a Kaggle competition can be expensive and require an immense number of iterations.)
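As a rough illustration of the model blending mentioned in the notes, the sketch below averages the predictions of two models and compares mean absolute error on a few held-out values. Everything in it is assumed for illustration: the numbers are made up, the plain lists stand in for the outputs of real models such as the gradient boosted trees and neural networks named above, and a simple weighted average is only one of several ways to blend. It is not the winning team's pipeline.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


# Hypothetical held-out home values and two models' predictions (in $1000s).
actual = [100.0, 200.0, 300.0]
model_a = [110.0, 190.0, 310.0]
model_b = [95.0, 215.0, 290.0]

blended = blend([model_a, model_b], weights=[1.0, 1.0])

mae_a = mean_absolute_error(actual, model_a)
mae_b = mean_absolute_error(actual, model_b)
mae_blend = mean_absolute_error(actual, blended)
```

In this toy case the two models err in opposite directions on most rows, so the blend has a lower error than either model alone. Blending helps most when the models make different kinds of mistakes, which is one reason to combine dissimilar model families.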
The Analytics>Forward 2019 planning committee
- RTA board members: Eric Yount, Melinda Thielbar, Aaron Terry [Blue Cross NC employee and host liaison], Dan Kelly, Ian Cook, Rick Pack
- Other planning committee members: Brian Fannin, Sheri Frank, Zeydy Ortiz, Ph. D., Evelyn (Xiaoqing) Ma
- Volunteers: Arnetta Girardeau, Ryan Pack
See more content about the event by searching for the hashtag #AnalyticsForward.
#DataScience
Referenced in the article: Eric Yount, Melinda Thielbar, Aaron Terry, Dan Kelly, Ian Cook, @Brian F., Sheri Frank, Zeydy Ortiz, Ph. D., Evelyn (Xiaoqing) Ma, Arnetta Girardeau, Ryan Pack, Dhruv Sakalley, Greg Frazier, Jordan Meyer