Machine Learning Internship At Codegnan.
Over the past two weeks, I embarked on an exciting learning adventure with Codegnan. During this time, I delved into the captivating world of machine learning, covering topics such as function arguments, NumPy arrays, indexing and slicing, universal and statistical functions, and more, as well as prominent libraries like NumPy, Pandas, Matplotlib, Plotly, and Scikit-learn. Furthermore, I had the opportunity to explore machine learning techniques, with a particular emphasis on regression, especially linear regression. In this article, I will share my experiences and key takeaways from my two-week journey with Codegnan.
✨UNDERSTANDING ARGUMENTS:
To begin my machine learning journey, Codegnan provided an in-depth understanding of arguments and their types. In Python, we encounter two main types of arguments: positional arguments (collected with *args) and keyword arguments (collected with **kwargs). Positional arguments are passed based on their position in the function call, whereas keyword arguments are passed with a keyword and a corresponding value. This knowledge proved fundamental in writing clean and modular code.
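As a minimal sketch of these ideas, here is a hypothetical function (the name `describe` and its parameters are just for illustration) that takes a positional argument, a keyword argument with a default, and collects extras with *args and **kwargs:

```python
# Positional vs. keyword arguments, plus *args / **kwargs for extras.
def describe(name, role="intern", *args, **kwargs):
    # name is positional; role is a keyword argument with a default value.
    extras = ", ".join(str(a) for a in args)        # extra positional args
    details = ", ".join(f"{k}={v}" for k, v in kwargs.items())  # extra keyword args
    return f"{name} ({role}) extras=[{extras}] details=[{details}]"

print(describe("Asha"))                             # only the positional argument
print(describe("Asha", "trainee", "ML", week=2))    # extras collected by *args/**kwargs
```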
Module: a Python file consisting of variables, functions, and classes.
Package: a collection of modules.
✨MASTERING NUMPY ARRAYS, INDEXING, AND SLICING:
NumPy is a powerful library that provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays efficiently. Codegnan helped me grasp the concept of NumPy arrays, explore array creation techniques, and perform various operations on them. Moreover, I learned about indexing and slicing, enabling me to extract specific elements or sub-arrays from larger arrays easily. This knowledge laid a solid foundation for further exploration of machine learning algorithms.
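A small sketch of array creation, indexing, and slicing (the array values here are arbitrary examples):

```python
import numpy as np

# Create a 2-D array: 3 rows, 4 columns, filled with 0..11.
a = np.arange(12).reshape(3, 4)

element = a[1, 2]      # single element at row 1, column 2
row = a[0]             # the entire first row
sub = a[0:2, 1:3]      # sub-array: rows 0-1, columns 1-2

print(element)
print(sub)
```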
✨EXPLORING UNIVERSAL AND STATISTICAL FUNCTIONS IN NUMPY:
In machine learning, we often encounter scenarios where we need to apply mathematical operations to entire arrays or perform statistical analysis on data. Codegnan introduced me to universal functions in NumPy, which enable element-wise operations on arrays, providing fast and efficient computations. Additionally, I gained insights into the statistical functions provided by NumPy, such as mean and standard deviation. These functions proved invaluable for data preprocessing and exploratory data analysis.
▪️Some of the statistical functions are:
Mean --> mean()
Standard deviation --> std() (the square root of var())
▪️Universal functions:
Summation --> sum()
Logarithm --> log()
Exponential --> exp()
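The functions listed above can be sketched on a small example array (the values are arbitrary):

```python
import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])

m = data.mean()             # mean of the array
s = data.std()              # standard deviation = sqrt of var()
total = data.sum()          # summation of all elements
logs = np.log(data)         # element-wise natural logarithm (universal function)
exps = np.exp(np.zeros(3))  # element-wise exponential; exp(0) is 1

print(m, s, total)
```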
✨HARNESSING THE POWER OF PANDAS:
Pandas is a widely used data manipulation library in Python, offering high-performance data structures and data analysis tools. The internship helped me gain proficiency in Pandas, enabling me to read, manipulate, clean, and transform data efficiently.
I also learned about different topics such as:
-->Series: a one-dimensional labeled array that can hold any data type.
-->DataFrame: a two-dimensional structure consisting of multiple Series.
-->loc[]: label-based indexing, used to select rows and columns by their labels.
-->iloc[]: integer-position-based indexing (RangeIndex: 0, 1, 2, ...).
-->fillna(): used to fill missing values in a DataFrame.
-->reindex(): used to change the row and column labels of a DataFrame.
-->sort_values(): used to sort a DataFrame based on its values, and lots more.
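The topics above can be sketched on a tiny made-up DataFrame (the names, scores, and labels are hypothetical):

```python
import numpy as np
import pandas as pd

# A small DataFrame with one missing value and string row labels.
df = pd.DataFrame(
    {"name": ["Anu", "Ravi", "Sita"], "score": [85.0, np.nan, 92.0]},
    index=["a", "b", "c"],
)

row_by_label = df.loc["b"]                # label-based selection
row_by_position = df.iloc[0]              # integer-position selection
filled = df.fillna({"score": 0.0})        # fill the missing score with 0.0
sorted_df = filled.sort_values("score", ascending=False)  # sort by score
reindexed = df.reindex(["c", "b", "a"])   # reorder the row labels

print(sorted_df)
```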
✨MATPLOTLIB:
Matplotlib is a popular plotting library in Python that provides a wide range of tools for creating various types of visualizations. It is widely used for data exploration, analysis, and presentation in fields such as data science and machine learning. Matplotlib provides a high degree of flexibility and customization options, allowing us to create professional-looking plots with just a few lines of code. Using Matplotlib as part of the data workflow, we were able to do
▪️data preparation.
▪️data cleaning.
▪️data transformation.
▪️data visualization.
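As a minimal visualization sketch, the following plots a simple curve and saves it to a file (the data, labels, and filename are arbitrary; the Agg backend is chosen so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, not a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")  # a simple line plot with markers
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()
fig.savefig("squares.png")  # write the figure to disk
plt.close(fig)
```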
✨PLOTLY:
Plotly is a powerful and interactive plotting library in Python that allows you to create highly customizable and interactive visualizations. It offers a wide range of chart types, including line plots, scatter plots, bar charts, and pie charts.
CUFFLINKS:
It connects Plotly with Pandas to create graphs and charts from DataFrames directly.
Several interactive plots were created using Plotly during the training.
💠The 6 jars of machine learning:
1.Data
2.Tasks
3.Models
4.Loss
5.Learning
6.Evaluation
✨SCIKIT-LEARN:
It is an open-source machine learning library. It provides a wide range of functionalities for data preprocessing, model training, and evaluation. With the help of scikit-learn (sklearn), we implemented:
🔹MIN-MAX normalization: used to transform features into a predefined range, typically between 0 and 1.
🔹Standard Scaling: also known as Z-score normalization, it is widely used to scale numerical features by subtracting the mean and dividing by the standard deviation. It is useful when features have different units or scales.
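Both scalers can be sketched on a single made-up feature column (the values are arbitrary):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature column with four samples.
X = np.array([[10.0], [20.0], [30.0], [40.0]])

# Min-max normalization: maps the feature into the [0, 1] range.
minmax = MinMaxScaler().fit_transform(X)

# Standard scaling (Z-score): subtract the mean, divide by the std.
standard = StandardScaler().fit_transform(X)

print(minmax.ravel())
print(standard.ravel())
```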
We also learned methods to convert categorical values into numerical values:
1.LABEL ENCODING: involves assigning a unique numerical label to each category, transforming the data into a numerical representation.
2.ONEHOTENCODER(): used to transform categorical values into binary vectors. It creates a new binary feature for each unique category.
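The two encoders can be sketched on a toy categorical column (the color values are arbitrary; both encoders order categories alphabetically):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = np.array(["red", "green", "blue", "green"])

# Label encoding: one integer per category (blue=0, green=1, red=2).
labels = LabelEncoder().fit_transform(colors)

# One-hot encoding: one binary column per unique category.
# fit_transform returns a sparse matrix; toarray() makes it dense.
onehot = OneHotEncoder().fit_transform(colors.reshape(-1, 1)).toarray()

print(labels)
print(onehot)
```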
Further, I learned about machine learning paradigms such as
1⃣Supervised learning.
2⃣Unsupervised learning.
Supervised learning involves labeled data, where the desired outputs are known. It is classified into two types:
1.CLASSIFICATION:Classification focuses on predicting discrete or categorical outcomes.
2.REGRESSION: a technique that aims to predict continuous numerical values. One of the key algorithms in regression is linear regression, which models a linear relationship between the input and the target.
To perform linear regression, we split the data into training and testing sets using train_test_split. We then fit the model to the training data using fit(), and use predict() to generate predictions. To evaluate whether the model is good or not, we used:
• R-squared (R²) method
• Mean Squared Error (MSE)
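The workflow above can be sketched end to end on synthetic data (the relationship y = 3x + 5 and the random seeds are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship: y = 3x + 5 (no noise).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 5

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)       # learn the coefficients from training data
y_pred = model.predict(X_test)    # predict on unseen test data

r2 = r2_score(y_test, y_pred)               # R-squared: 1.0 is a perfect fit
mse = mean_squared_error(y_test, y_pred)    # MSE: 0.0 is a perfect fit
print(f"R^2: {r2:.4f}, MSE: {mse:.6f}")
```

Because the synthetic data is noise-free, the model recovers the slope and intercept almost exactly, so R² is near 1 and MSE is near 0.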
We also learned about model fitting behavior, i.e., overfitting, underfitting, and ideal fitting.
Model Representation refers to how we design and structure our machine learning models to capture the underlying patterns in the data.
♦Overfitting: Overfitting occurs when a model becomes overly complex and starts to memorize the training data instead of generalizing patterns.
♦Underfitting: It happens when a model is too simple to capture the complexity of the data; it fails to learn the underlying patterns.
♦Ideal fitting:An ideally fitted model captures the relevant patterns in the data without overemphasizing noise or oversimplifying the relationships. This allows it to perform well on both the training set and unseen data, making accurate predictions.
My two-week journey with Codegnan has been incredibly enlightening and rewarding. I gained a strong foundation in various aspects of machine learning and python basics.
Thanks to our mentor Saketh Kallepu sir for teaching us the concepts so effectively. Excited to learn more on this journey of my internship in machine learning.