Data Scientists Assemble
With the release of the new Captain America: Civil War film, a regular debate reared its head in my team: “who’s your favorite Avenger?*” I know, I know: we really smash that geeky stereotype. This generally leads to hours of debate as to why Iron Man would beat Thor but not The Hulk.
There seems to be a similar appetite in the Data Science community to pit ‘thing A’ against ‘thing B’ in a fight to the death, like a Marvel Battle Royale. Lines are drawn, arguments and counter-arguments are marshalled, and then battle commences until only one side is left standing, breathing heavily in the dirt over the vanquished foe… until next time.
An example of this can be found in the analytic software a Data Scientist chooses to analyze data. One of the first big debates was open source vs proprietary software. The proponents of open source champion its flexibility, low cost and ease of use. ‘Ah’, the proprietary champions cry, ‘but what about the lack of support and documentation, or the scalability issues?’ Assuming you go for open source tools, further civil war breaks out: Python vs R vs Vowpal Wabbit vs… This has now reached the next level in my team with the rise of two factions within the R language: those who prefer data manipulation using the dplyr package versus those who espouse the value of data.table.
Stop it!
My frustration with all of this is that we run the risk of missing the point: we’re doing this analysis for a reason. As Data Scientists we need to be relentlessly focused on solving the business problem, drawing on our wide knowledge of different tools and techniques and applying the right tool to the right job. It is definitely important to keep trying new approaches and to innovate, but not to the detriment of what makes the job family so precious: going from data to insight.
* The answer by the way is Hawkeye ;-)
I think the worst internal war in the data science community is this idea of a "fake data scientist", which is usually applied to those with a different background from the accuser. There aren't any "fake biologists" or "fake electrical engineers", so why has "fake data scientist" become such a popular phrase? This does no good except provide fodder to those who claim that data science is nothing more than BI or stats dressed up in fancy new clothes.
Great article, Dan! (and you meant Ant Man, right??)
I agree, Dan. Lots of technological arm wrestling going on at the moment. Better bin my planned next article on The Silver Surfer (Flink) vs The Flash (Spark)...
I love the artwork...do you have similar art for all the Avengers, and can I get that framed? ;-)