Interview with a Data Scientist: Tom Roth
As part of my Master of Data Science at RMIT University, I was required to reach out to a data scientist working in industry and ask them a series of questions about their outlook on data science as a discipline.
Tom Roth is a Data Scientist working at Caltex with several years' experience in the field of data. I caught up with him on a lazy Sunday afternoon to talk about our favourite subjects: machine learning and data science.
After graduating from the University of Sydney with honours, having completed a Bachelor of Science and Mathematical Statistics, Tom joined the Telstra Graduate Program in February 2016, working in both Data Analyst and Data Scientist roles. He then joined Caltex Australia full-time as a Data Scientist, and has been working there since 2017.
In his spare time, Tom also runs a blog on data science.
Tom’s complete professional history can be found on his LinkedIn page.
1. How do you plan to tackle instances of bias entering your dataset, to ensure we're not creating biased machine learning models?
The first things to consider when trying to avoid bias are whether your sample proportionally represents the population, and whether it adequately represents the subcategories within your data.
There are instances in which your dataset may inherently contain biases as a result of real-world scenarios. One example could be how convictions handed down by the legal system differ between people from different cultural backgrounds.
One way around this may be to relabel your dataset based on different rules: re-assign the convictions while leaving the feature of cultural background out of consideration, and focus on other variables that relate more to the crime itself than to the origins of the accused.
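As an editor's illustration of the first point in the answer above (it is not part of the interview), the sketch below compares each subgroup's share of a sample against its known share of the population and flags the groups that are off by more than a chosen tolerance. The function name, the 5% tolerance, and the example groups are all hypothetical, chosen only to show the idea.

```python
from collections import Counter


def representation_gaps(sample_labels, population_shares, tolerance=0.05):
    """Return subgroups whose share of the sample differs from their
    population share by more than `tolerance` (absolute difference).

    sample_labels: one subgroup label per sampled record.
    population_shares: mapping of subgroup label -> expected proportion.
    """
    counts = Counter(sample_labels)
    total = len(sample_labels)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps


# Hypothetical data: group C is under-represented, group A over-represented.
sample = ["A"] * 60 + ["B"] * 32 + ["C"] * 8
population = {"A": 0.5, "B": 0.3, "C": 0.2}

for group, (observed, expected) in representation_gaps(sample, population).items():
    print(f"{group}: sample share {observed:.2f}, population share {expected:.2f}")
```

A check like this only catches sampling imbalance. It says nothing about bias that is already baked into the labels, which is the harder case described in the rest of the answer.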
2. What industries do you see making the best use of data science and machine learning? What kind of improvements do you envisage?
The potential improvements to the medical industry are what excite me the most. What I find most fascinating is being able to apply machine learning and deep learning to ultrasound data in order to accurately predict instances of cancer in patients.
Being able to use machine learning to more promptly identify patients showing early signs of cancers and other diseases could quite positively change the course of people’s lives, giving patients the opportunity for earlier treatment and remedies.
3. Are there any areas where you see the introduction of machine learning potentially causing controversy?
Any areas involving artistic pursuits, where machine learning algorithms can compose music or even paint pictures at a level similar to those who make it their profession.
This kind of change to an industry could quickly displace many people from their profession, making it too hard to compete against a plethora of algorithms ready to use artists’ own creations against them.
If machine learning algorithms can produce art and compose music, one could question whether there’s any soul or spirit left in the industry.
The same idea flows into automation of the current workforce. The advent of machine learning and AI technologies could potentially lead to a situation where a significant portion of the population possesses skills which are no longer required, forcing people to either struggle for re-employment or be retrained in other areas which machine learning cannot as quickly replace.
4. RStudio and Python are currently two large players in the data science market. Do you see either of these becoming a dominant force, or potentially a new software package entering the market?
RStudio and Python at this point are strongly established tools within the industry. I don’t expect any contenders in the medium term.
The industry, however, is moving towards Python and away from R, especially in machine learning and deep learning.
RStudio has, on the whole, served data science well, acting as an open-source, easily obtained stepping stone into data science for the early adopters of the community.
5. When attempting to convince or inform a target audience of an intricate data science finding, how do you best present this message in an easy-to-consume manner?
When convincing a target audience, the message should be tailored to whoever you’re presenting to.
You have stakeholders that are technical, who you can walk through the inner workings of the algorithm, while you have others who just want to see what the results mean to them and their business.
In both situations, you want an aesthetic presentation of graphs and infographics: a story to walk your stakeholder through, to ensure they are able to digest the message in an easily understood way.
A lot of people get into data science because they like building stuff - building models is fun.
The marketing part doesn’t necessarily come naturally, so some outside perspective is much needed. Draw on your team lead and the people around you when constructing your message, before going out to present.
---
This interview was conducted by Anthony Tsoukas, Information Analyst at Telstra, and Master of Data Science student at RMIT University.
Article photo by Markus Spiske on Unsplash.