Exploring practical applications of regression with Python and scikit-learn
This week I focused on regression analysis in Python and delved into the scikit-learn library. As I mentioned in my previous post, this module is heavily maths-based and can feel a bit dry, but I'll do my best to make it interesting without diving too deep into the technical details.
Before that, though, I want to share the feedback I got for my module 2 assignments. For the professional practice assignment, where I reflected on my ongoing study of Python, I scored 30/40. For the main module assignment, "Machine Learning using Cloud Computing," where I developed my first machine learning model, using NLP to perform sentiment analysis, I scored 73/100, which again seems to be one of the top marks in the class. This result was both surprising and rewarding. I say that because, at the end of the module 2 practical workshops, I felt there were lots of gaps in my understanding. But while I was writing this assignment and developing my artefact, I put in extra time to understand and learn what I needed to make it successful. Working towards that goal helped me fill some of those gaps, which is why the feedback was so rewarding: it confirmed that my ideas and understanding were on the right track. Reflecting on this, I can say that I learn better when I'm working on something real and practical.
This week, we explored scikit-learn, a Python machine learning library that offers tools for data mining and data analysis. It's built on NumPy, SciPy, and matplotlib, and is popular because it provides a wide range of supervised and unsupervised learning algorithms behind a consistent interface.
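To make the idea of a consistent interface a bit more concrete, here is a minimal sketch I put together (not the workshop's code). It assumes synthetic data with a curved relationship, and the two models (a plain linear regression and a degree-2 polynomial regression built with `PolynomialFeatures`) are my own choice for illustration. Each model is trained with `fit`, used with `predict`, and scored with R² on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a curved relationship plus noise
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))  # features must be a 2-D array (n_samples, n_features)
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(0, 1.0, size=200)

# Hold back 20% of the data to check how well each model generalises
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Every estimator exposes the same fit/predict methods, so models are interchangeable
models = {
    "linear": LinearRegression(),
    "polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 on held-out data = {scores[name]:.3f}")
```

Because the data here is curved, the straight line underfits and the polynomial model should score noticeably higher. Swapping in another regressor only means adding one more entry to the `models` dictionary.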
During the session, we looked at different types of regression using scikit-learn. Here are the main points to remember, without going into too much detail:
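In the workshop, we also touched on using Support Vector Machines (SVMs) with scikit-learn. For classification, an SVM looks for the line (or, in more dimensions, the hyperplane) that separates the classes with the widest possible margin. Its regression variant, Support Vector Regression (SVR), instead fits the best possible line or hyperplane while ignoring errors that fall within a certain margin of tolerance. Maximising the margin, together with the option of using kernel functions for non-linear boundaries, makes SVMs particularly effective in high-dimensional spaces and on complex datasets. For example, an SVM can be used to detect spam by classifying emails into spam and non-spam categories based on their content.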
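Here is a small sketch of both uses, assuming a handful of made-up emails (real spam filters are trained on thousands of labelled messages) and synthetic data for the regression part. The choice of `TfidfVectorizer` to turn text into features and of `LinearSVC`/`SVR` as the models is mine, not necessarily what the workshop used. In `SVR`, the `epsilon` parameter sets the margin of tolerance: errors smaller than it are not penalised.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR, LinearSVC

# --- Classification: a toy spam filter (these emails are made up for illustration) ---
emails = [
    "win a free prize now",
    "claim your free money today",
    "free entry win cash prize",
    "urgent you have won a cash reward",
    "meeting rescheduled to monday",
    "please review the attached report",
    "lunch with the team tomorrow",
    "notes from the lecture on regression",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# TF-IDF turns each email into a sparse, high-dimensional vector of word weights;
# the linear SVM then finds the separating hyperplane with the widest margin.
spam_filter = make_pipeline(TfidfVectorizer(), LinearSVC())
spam_filter.fit(emails, labels)
predictions = spam_filter.predict(["win a free cash prize", "please review the meeting report"])
print(predictions)

# --- Regression: SVR ignores errors smaller than epsilon (the margin of tolerance) ---
rng = np.random.default_rng(seed=0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.3, size=50)  # synthetic: y = 2x + 1 plus noise

svr = SVR(kernel="linear", epsilon=0.5)
svr.fit(X, y)
print(svr.predict([[5.0]]))  # should land close to 2 * 5 + 1
```

The same `fit`/`predict` pattern from the regression examples carries over unchanged; only the estimator and the feature extraction step differ.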
The combination of Python and scikit-learn is a very useful toolset for data analysis, and I look forward to applying these skills in future projects. While reading the learning material is a bit challenging, especially when delving into the maths behind it, the practical applications make it worthwhile.