Forget Data Science
A very interesting conversation last night ended with the idea that ANNs, DBNs, RBMs, RNNs, and other deep learning methods contain all of the tools one needs for the tasks data scientists are typically given. For the last few years my job has involved a wide gamut of tasks: natural language processing, geographical reasoning, filtering, clustering, anomaly detection, quantitative analysis, behavior comprehension, and so forth. Can all of these be tackled with just one base tool? I was really excited to hear this idea, so let's discuss it.
It was then suggested that you skip the ugly first 50% of the work you are often tasked with as a data scientist. You know what I am talking about: the janitorial stuff of cleaning up data, tagging, and creating metadata! I have to admit, the idea is appealing. Writing and editing SQL scripts to clean data can be cumbersome, and sometimes you end up writing more scripts after the SQL to further massage the data. In years past I did this in Perl, but lately I've been using my new favorite, Julia.
Under this approach, the work becomes feeding raw inputs into a deep network that produces the results I want. I assume any model would still need tuning, or at least a choice of the most appropriate way to build the deep network. After that comes engineering the visuals, reports, and so forth. I am not sure it would be less work than what I am already doing, but it would simplify the approach. It would also mean becoming familiar with a tight set of machine learning tools, rather than a large and diverse set of tools and statistical methods (as I have been doing).
I'm even more curious what you think. I haven't really made the argument here for those not familiar with the basics of deep learning. I will say that I've seen deep learning used for NLP (Digital Reasoning had remarkable success around 2010 adding it to their classifier, and that's just the start), visual data (these algorithms still lead the public challenges in both accuracy and speed), and audio data (for example, Microsoft has been using DNNs for Cortana as far back as 2012). Is there any problem that is not well suited to deep learning?
What are your thoughts?
E.M. Burlingame 蒲 奕 言, thoughts?