Spotlight on various tools for machine learning

Last summer, I started working on machine learning. Since then I have solved problems and learned many modules (I mainly work in Python). I thought it was time to share my experience and views on some of the tools I have used so far. I know there are many blogs out there talking about these things, but this post is specific to the Python language and its tutorials. I will talk about tutorials you can follow and modules you can use to get things up and running in machine learning and deep learning.

What started as curiosity grew into a major career choice within two years. I started with Andrew Ng's Coursera Machine Learning course. Soon after completing it, I did a quick internet search and found a humongous number of blogs, articles and tutorials on machine learning; some of them were concise and well articulated. I read many blogs and watched videos before settling on Python for implementing ML.

Python

Coming from a MATLAB background, I found Python easy to learn. There are plenty of tutorials on the internet (possibly more than you think). Python by itself is not that powerful for this kind of work, but packages like NumPy (scientific computing module), scikit-learn (machine learning module), pandas (data analysis module) and Matplotlib (visualization module) make our life easy. Machine learning boils down to this: take humongous, ugly, seemingly meaningless, not-so-complete, scattered data, clean it (pandas and NumPy), fill in missing values based on the physics of the problem, visualize it (Matplotlib), and apply machine learning algorithms to give it meaning (scikit-learn). In short, you need to learn all of these modules to solve ML problems.

Why Python?

  • Ease of learning : Python is easy to learn if you have a basic understanding of MATLAB, apart from some differences such as zero-based (Python) versus one-based (MATLAB) indexing, array slicing and small differences in syntax. Even without that background, it is easy to learn in general.
  • Need to learn : Python offers a wide range of powerful modules to work with, and a great many applications have Python wrapper functions, so once you know the Python philosophy you can work on almost anything. There are also plenty of APIs available for the language.
"With great power comes great responsibility" - Uncle Ben (Spider-Man)
  • Outcome of learning : First of all, you will solve complex problems. That's great, ain't it? Many companies work with Python now. Everybody has customer data that they want to work with and make money from, and data science is one way to do that. Because Python's packages are computationally powerful, people are moving towards it.

Of all the reasons, this one tops the list: it's FREE! You don't have to spend a single penny on it. All you need is a machine and an internet connection. Off you go.

NumPy

NumPy is a scientific computing module used for N-dimensional array manipulation and calculation. It is mainly used for linear algebra, random numbers and matrix calculations. NumPy is faster than plain Python: whenever you call a NumPy function, the calculation runs in compiled code outside the Python interpreter and the result is handed back. Some operations in Python overlap with NumPy, so I would suggest doing those in NumPy instead of plain Python. It can be installed using the pip command in a terminal. There are plenty of tutorials on NumPy; I am listing some of them below.

  1. http://www.numpy.org/ : Official website for NumPy documentation. It is very well written and elegantly documented.
  2. YouTube video from the Enthought conference : This is a 3-hour-long video, which is enough to get a basic understanding of NumPy.
  3. Google Python class : Please do the coding assignments; they help you learn better.
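
To give a feel for the array operations described in this section, here is a minimal sketch. The array contents and variable names are made up for illustration; it only shows zero-based indexing, slicing, vectorized arithmetic and a small matrix product.

```python
import numpy as np

# Build a 3x4 matrix holding the integers 0..11.
a = np.arange(12).reshape(3, 4)

# Zero-based indexing and slicing (unlike MATLAB's one-based indexing).
first_row = a[0]        # row at index 0
last_col = a[:, -1]     # every row, last column

# Vectorized arithmetic: no explicit Python loop is needed.
squares = a ** 2
col_means = a.mean(axis=0)   # mean of each column

# Basic linear algebra: matrix times its own transpose (a 3x3 result).
gram = a @ a.T

print(first_row)
print(last_col)
print(col_means)
print(gram.shape)
```

The same results could be computed with nested Python loops, but the NumPy version is shorter and the heavy lifting happens in compiled code.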

Pandas

Pandas is a data analysis tool written on top of Python and NumPy. It is an open source tool. It has a powerful data structure called the DataFrame, which lets you work with objects of any datatype and makes it very easy to read data from a file and store it back. Normally, after reading data that has attribute columns and labels (ML jargon), I put it in a DataFrame, which makes life easy. It provides many methods to manipulate a single column, the whole DataFrame or just a single element. I was very excited (it blew my mind) when I first started working with it. It does very complex jobs under the hood; all the user has to do is call the right method.

  1. http://pandas.pydata.org/ : Official site; you will learn more here than on any other site.
  2. 10 minutes to pandas : Short introduction to pandas, for a quick start.
  3. I have some documents myself; I will soon upload them to my GitHub page. They are short and give the essence of pandas for starters.
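
As a rough illustration of the DataFrame workflow described in this section, here is a minimal sketch. The column names and values are invented, and filling missing values with the column mean is only an assumption made for the example; in a real problem the fill strategy should follow from the physics of the data.

```python
import numpy as np
import pandas as pd

# A tiny, made-up data set with two attribute columns and a label column.
df = pd.DataFrame({
    "height_cm": [170.0, np.nan, 182.0, 165.0],
    "weight_kg": [65.0, 80.0, np.nan, 55.0],
    "label": ["a", "b", "a", "b"],
})

# Count the missing values in each column.
missing = df.isna().sum()

# Fill missing numeric values with the column mean (an illustrative choice).
numeric_cols = ["height_cm", "weight_kg"]
filled = df.copy()
filled[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Work on a whole column, or on a single element.
heights = filled["height_cm"]
first_weight = filled.loc[0, "weight_kg"]

# Summarize the attributes per label.
means_by_label = filled.groupby("label")[numeric_cols].mean()

print(missing)
print(filled)
print(means_by_label)
```

With a real file, the DataFrame would typically come from `pd.read_csv(...)` and be written back with `DataFrame.to_csv(...)`.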

scikit-learn

I have never been this thrilled; learning and working with this package is a pleasure in itself. This is essentially the best package I have worked with. It has everything, you name it: algorithms (classification, regression), evaluation metrics (confusion matrix, accuracy, precision and recall), data pre-processing methods (PCA), feature selection methods (model based, K best) and feature extraction methods. This has made applying ML to real world problems much, much easier. If you have cleaned the data (so that the scikit-learn library understands it) and know the statistics well, applying the right algorithm and validating it takes no more than 5 lines of code, literally (I mean literally).

  • http://scikit-learn.org/stable : This site is enough to start with ML, even if you have zero knowledge of machine learning. The site not only talks about the library but also about the algorithms and the hyperparameters to tune for better results. It goes one step further and talks about which algorithm to choose based on the application and the type of data set. What more do you want? This almost solves the problem you are working on. All you need is scikit-learn installed and data (and of course a problem to solve). There you go, your problem is solved.
  • Although the official scikit-learn website is well written, there are some other tutorials. These assume you know the theory of machine learning. One is Jake VanderPlas's PyCon video on YouTube. It is a 3-hour-long video; download the resources and work through them, it's really helpful.
  • Python Machine Learning : This is a really good book. The author talks about the algorithms and how to implement them using sklearn (scikit-learn; you call it sklearn when programming). It's available on Amazon. If you are a university student, I am sure you will have this in your library (as an e-book).
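
To make the "few lines of code" point concrete, here is a minimal sketch of training and validating a classifier. The choice of the bundled iris data set, logistic regression and a 75/25 split are assumptions made purely for illustration; any other estimator follows the same fit/predict pattern.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Load a small, already clean data set that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the model and predict on the held-out samples.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Evaluate with two of the metrics mentioned above.
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("accuracy:", acc)
print(cm)
```

The core of it (split, fit, predict, score) really is about five lines; the rest is imports and printing.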

Matplotlib

Matplotlib, as the name suggests, is a plotting library. It is equipped with many methods to help you visualize data (normally, just looking at the data gives you an idea of how to approach the problem; most of the time, not always). You can also do interactive plotting with it. I haven't done it myself, but it's cool to look at. Matplotlib is similar to MATLAB plotting, with some extra features. Another plotting library is seaborn, which is just as colorful.

  1. Matplotlib official site : Almost all packages have official sites, which are so well documented that you can build things just by reading through them.
  2. Seaborn official site

I did not go through many tutorials; I feel the official websites are good enough for the applications I have worked on.
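
For a quick taste of Matplotlib, here is a minimal sketch that draws a line plot and a scatter plot on the same axes. The data (a sine curve and a few cosine samples) and the output file name are made up for illustration, and the non-interactive "Agg" backend is chosen only so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 200 points between 0 and 2*pi.
x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")                  # line plot
ax.scatter(x[::20], np.cos(x[::20]), color="red",
           label="cos(x) samples")                     # every 20th point
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("A quick look at the data")
ax.legend()

# Write the figure to an image file (hypothetical file name).
fig.savefig("quick_look.png")
```

Anyone coming from MATLAB will recognize the plot, label, title and legend calls almost one to one.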

One more tool that I haven't mentioned above, and that I am currently working with, is TensorFlow, a Google product (which increases the credibility, just by carrying Google's name). It is my second favorite after scikit-learn for ML and the best ever for deep learning.

TensorFlow

This is open source software for numerical computation, built on C++. It's analogous to Lego blocks: its methods are small blocks from which you build a nice structure (a deep learning architecture). You build a graph (the architecture) using TensorFlow methods, then call a session to run that graph. This makes computation faster, as the code already knows the graph beforehand. The features that make TensorFlow stand out are: 1. It has built-in GPU capability (I mean you can run a deep architecture on a GPU, which is way faster than a CPU). 2. You can run TensorFlow on distributed systems, which helps you work on big data. 3. More importantly, you can export a model to a smartphone and run deep learning algorithms on it. (Currently I am working on this for real-time applications using deep learning algorithms on smartphones under Dr. Nasser Kehtarnavaz for my master's at the University of Texas at Dallas.)

  1. The official TensorFlow site is good enough; it's better than any site I mentioned before. It will guide you from installation through building an architecture to running it on a smartphone.
  2. YouTube channel : Hvass Laboratories : Nice videos about how to get started with TensorFlow.
  3. Udacity course : From Google, gives a brief introduction to deep learning.
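
The "build the graph first, run it later" idea described in this section can be illustrated without TensorFlow at all. The sketch below is a toy in plain Python: the `Node` class and the `placeholder`, `add`, `mul` and `run` helpers are hypothetical names invented for this example and are not TensorFlow's API. It only shows that defining the computation and executing it are two separate steps.

```python
class Node:
    """One block in a tiny computation graph (toy example, not TensorFlow)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "placeholder", "add" or "mul"
        self.inputs = inputs  # upstream nodes this node depends on
        self.name = name      # only used by placeholders


def placeholder(name):
    return Node("placeholder", name=name)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def run(node, feed):
    """Evaluate the graph ending at `node`, using `feed` for placeholder values."""
    if node.op == "placeholder":
        return feed[node.name]
    args = [run(inp, feed) for inp in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError("unknown op: " + node.op)


# Step 1: build the graph y = w * x + b. Nothing is computed yet.
x = placeholder("x")
w = placeholder("w")
b = placeholder("b")
y = add(mul(w, x), b)

# Step 2: run the graph with concrete values (the "session" step).
result = run(y, {"x": 3.0, "w": 2.0, "b": 1.0})
print(result)  # 7.0
```

Because the whole graph is known before any value flows through it, a real framework can plan the execution ahead of time, for example by placing parts of it on a GPU.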

Apart from all these tools, I want to mention Jupyter Notebook and the Anaconda distribution, which are widely used to run these tools. Jupyter is a development environment, very different from conventional ones. You can give presentations, run code, plot graphs and even write reports in it. I will not go into detail; just follow the link here. It's self-explanatory.

Conclusion

All these tools make our job of building models easier, whether it is a classifier, a recommender system or a regression. I would suggest spending less time learning from or watching tutorial videos and starting to build something with the tools; it's the best way to learn. I spent a lot of time watching many videos, but working on problems is what made me better. On the other side of the spectrum, get a basic understanding of what is available in the field before you start working, because otherwise you might miss something important. Some interesting links are below; these keep me updated and I always go back to them if I get stuck.

  1. Harvard class : Gives a brief idea of what's available for you in ML. The labs are tough and make you learn, even just by following the lab materials. Dr. Rahul Dave is an excellent speaker.
  2. Stanford Deep Learning : Talks on ConvNets, which opened a whole new field for me. Thanks to the speaker, Andrej Karpathy, who inspired me to work on ConvNets.
  3. Udacity Deep Learning : Brief deep learning course from Google.
  4. Data Machina magazine : Google it and subscribe; they send you articles on the latest ML developments.
  5. CMU machine learning : I go back here if I have doubts on particular topics.
  6. Data Science Outline : The author gave up his job and learned data science in 6 months. Nice article to get started. He has listed some tutorials. Take it with a pinch of salt.

I want to end this post with a quote from British statistician George Box. It sounds like a negative note, but it inspires me to work towards perfection.

"Essentially, all models are wrong, but some are useful." --- Box, George E. P.


Nice overview, Srinivas Kulkarni!! Very helpful for beginners in the machine learning field who are lost in the immense amount of sources and don't know where to start.

It's really good. Good job, Srinivas!

Great article to start with. Got direction to proceed. Thanks srinivas kulkarni
