Neural Network and Deep Learning (Part 2) - Python Code
Overview
In the first part of my neural network series, I provided an overview of neural networks. In this part, I will explain how to implement these concepts using Python. We'll leverage the TensorFlow module, which provides mathematical functions to process each neuron, and the sklearn module, which helps normalize data and evaluate the results produced by the neural network.
Background
In this example, I use air particle data from an area to demonstrate the process. The goal is to analyze the data over time, classify it based on CO (carbon monoxide) levels, and predict NOx (Nitrogen Oxides) concentrations.
Two key aspects of this analysis are classification (labeling readings by CO level) and prediction (estimating NOx concentrations).
This is an example of the data we currently have.
Classification Task
Now I will explain how to do the classification task.
1) Import the necessary modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score
from joblib import dump  # used later to save the column means to a .pkl file
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
2) Load the Data and Correct Incorrect Data for Further Processing
# Load the dataset
file_path = 'ClassPred.xlsx' # Change this to the correct file path
data = pd.read_excel(file_path, engine='openpyxl')
# Replace -1 values with NaN for easier manipulation
data.replace(-1, np.nan, inplace=True)
# Calculate the mean of each column (ignoring NaN values)
column_means = data.mean()
# Save the means to a file
dump(column_means, 'column_means.pkl')
# Replace NaN values with the mean of each column
data.fillna(column_means, inplace=True)
# Remove any non-numeric columns, including date and time
data = data.select_dtypes(include=[np.number])
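To quickly verify that the mean imputation worked, here is an optional one-line sanity check:
# Confirm that no missing values remain after mean imputation
print(data.isna().sum().sum())  # expected output: 0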
3) Define a threshold. To keep things simple, this example uses the mean of the CO values in the data as the threshold.
# Calculate the mean of CO concentrations
co_mean = data['CO(GT)'].mean()
print(f"Mean CO concentration for thresholding: {co_mean}")
# Binary classification threshold (adjustable as needed)
threshold = co_mean
data['CO_high'] = (data['CO(GT)'] > threshold).astype(int)
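As a quick optional sanity check, you can inspect the class balance produced by this threshold:
# Fraction of samples in each class; a mean-based threshold usually
# gives a roughly balanced split, but it is worth confirming
print(data['CO_high'].value_counts(normalize=True))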
4) Drop CO(GT) from the features. Because the CO_high label is derived directly from CO(GT), keeping it as an input would leak the answer to the model. Instead, the network should learn to infer CO levels from the concentrations of the other air particles, which influence the CO concentration.
# Drop 'CO(GT)' from the features
data = data.drop(columns=['CO(GT)'])
5) We then divide the data into two parts: training and testing datasets. The training data is used to train the neural network, while the testing data is used to evaluate whether the neural network's output, based on the training data, is accurate enough for classification. Typically, the data is split in an 80:20 ratio, but a 90:10 split can also be used, depending on the specific requirements of the task.
# Split the data into training and testing sets
X = data.drop('CO_high', axis=1) # Features
y = data['CO_high'] # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # 20% test, 80% train
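As a side note, if the classes turn out to be imbalanced, train_test_split also accepts a stratify argument that preserves the class proportions in both splits. A minimal variant:
# Same split, but keeping the ratio of CO_high classes identical
# in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)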
6) Next, we normalize the data. Normalization rescales each numeric feature to a common range (here, 0 to 1 via min-max scaling). This prevents features with large numeric ranges from dominating the others, which improves the performance and stability of the neural network.
# Normalization of the dataset (only numeric columns)
numeric_cols = X_train.columns # Select only input feature columns
scaler = MinMaxScaler()
scaler.fit(X_train[numeric_cols]) # Fit scaler to training input features only
X_train[numeric_cols] = scaler.transform(X_train[numeric_cols]) # Transform the training input features
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols]) # Transform the test input features
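For reference, MinMaxScaler applies the simple min-max formula (x - min) / (max - min). This small sketch reproduces it on a made-up column:
import numpy as np
# Toy column of values (hypothetical, for illustration only)
x = np.array([2.0, 4.0, 6.0, 10.0])
# Min-max scaling maps every value into [0, 1]
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # [0.   0.25 0.5  1.  ]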
7) With the data now fully cleaned, we arrive at the most exciting and crucial part of this article: training the neural network.
# Define the neural network model with an additional layer
model = Sequential()
model.add(Dense(16, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dense(1, activation='sigmoid'))
This code can be explained as follows:
model = Sequential()
This creates a sequential model, in which layers are stacked one after another. More layers of neurons (nodes) can be added to the sequential model as needed.
model.add(Dense(16, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=l2(0.01)))
This step adds a fully connected (dense) layer with 16 neurons (nodes). Each neuron connects to every input feature or to every neuron from the previous layer (as we saw in the first article, all nodes are fully connected). ReLU (Rectified Linear Unit) is applied as the activation function for the layer's output. ReLU introduces non-linearity (the output is no longer just a weighted sum of the inputs) and is usually a good activation function for the first hidden layer.
kernel_regularizer=l2(0.01)
This code adds L2 regularization to the weights of this layer, penalizing large weight values to help prevent overfitting. Overfitting occurs when the neural network model learns not only the underlying patterns in the training data but also the noise and specific details that are irrelevant.
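To make the penalty concrete, here is a minimal sketch of what L2 regularization adds to the loss (the weight values are made up for illustration):
import numpy as np
# L2 penalty: lambda * sum of squared weights, here with lambda = 0.01
weights = np.array([0.5, -1.2, 0.3])  # hypothetical layer weights
l2_penalty = 0.01 * np.sum(weights ** 2)
print(l2_penalty)  # 0.0178; this is added to the loss, so large weights cost more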
model.add(Dense(1, activation='sigmoid'))
This step adds the output layer to the model, which uses sigmoid as its activation function. Sigmoid squashes the output to a value between 0 and 1, which can be read as the probability of the positive class, making it well suited to a binary classification task.
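As a quick illustration of the two activation functions, here is a minimal NumPy sketch, independent of the model above:
import numpy as np

def relu(x):
    # ReLU: negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid: squashes any input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # [0.  0.  0.  0.5 2. ]
print(sigmoid(x))  # approximately [0.119 0.378 0.5 0.622 0.881]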
8) Now that we have defined the various parameters suitable for the neural network model, we can proceed to initialize and run our neural network.
# Instantiate the optimizer with a custom learning rate
custom_lr = 0.001 # Example: Setting the learning rate to 0.001
optimizer = Adam(learning_rate=custom_lr)
# Compile the model
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
# Train the model
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
This code can be explained as follows:
custom_lr = 0.001 # Example: Setting the learning rate to 0.001
optimizer = Adam(learning_rate=custom_lr)
The learning rate controls how large a step the optimizer takes when it updates the model's weights: too large and the model can overshoot good solutions, too small and training becomes slow. It is generally recommended to start with a small learning rate, observe the performance of the neural network, and adjust it as confidence in the model grows. The Adam optimizer used here adapts the effective step size for each weight individually during training, which makes it a popular and effective choice for training neural networks.
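If you prefer not to hand-tune the rate, Keras also provides callbacks that adjust it during training. A minimal sketch (the factor and patience values below are arbitrary examples, not tuned for this dataset):
from tensorflow.keras.callbacks import ReduceLROnPlateau
# Halve the learning rate whenever validation loss stops improving
# for 5 consecutive epochs, down to a floor of 1e-5
lr_schedule = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-5)
# Pass it to model.fit via the callbacks argument:
# model.fit(..., callbacks=[lr_schedule])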
# Compile the model
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
The model is then compiled using the binary cross-entropy loss function, which is well suited to binary classification tasks. A loss function is a mathematical tool that measures the difference between the predicted output of a machine learning model and the actual target values. During training, the same loss is also computed on a validation set (a 20% split of the training data, set up in the next code snippet) to monitor generalization. Binary cross-entropy provides the feedback that guides the optimization process to improve the model's predictions over time.
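To make the loss concrete, here is a minimal sketch that computes binary cross-entropy by hand on a few made-up predictions:
import numpy as np
# Made-up true labels and predicted probabilities (illustration only)
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])
# Binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))
bce = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
print(bce)  # ~0.34; confident, correct predictions drive the loss toward 0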
# Train the model
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
The model.fit method is used to train the neural network by iteratively passing the training data (X_train, y_train) through the model and adjusting its weights to minimize the loss function. Here's a breakdown of its key components:
epochs=100: the number of complete passes through the training data.
batch_size=32: the number of samples processed before the weights are updated.
validation_split=0.2: holds out 20% of the training data to monitor performance on data the model does not train on.
This method allows the model to learn progressively and improve its predictions by minimizing the difference between predicted and actual values.
9) After training, we will evaluate the results produced by the model.
# Evaluate the model
_, accuracy = model.evaluate(X_test, y_test)
y_pred = (model.predict(X_test) > 0.5).astype(int) # Predict the class labels
# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)
# Plotting the training and validation loss
plt.figure(figsize=(6, 4))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss Plot for the Classification Task')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
# Plotting the training and validation accuracy
plt.figure(figsize=(6, 4))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy Plot for the Classification Task')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
This code can be explained as follows:
_, accuracy = model.evaluate(X_test, y_test)
Once the model is trained, its performance is evaluated by comparing the predictions to the test dataset. This step calculates the loss and measures accuracy based on the test dataset, providing insights into how well the model generalizes to unseen data.
y_pred = (model.predict(X_test) > 0.5).astype(int) # Predict the class labels
This code predicts the probability of the positive class (a value between 0 and 1) for each sample in X_test. For binary classification, probabilities greater than 0.5 are converted to 1 (positive class), while those of 0.5 or below are converted to 0 (negative class).
The remaining code evaluates the model using a confusion matrix, accuracy, precision, and visualizations of the training and validation loss and accuracy.
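The accuracy_score and precision_score functions imported in step 1 can compute the headline metrics directly from these predictions:
# Summary metrics computed from the test-set predictions
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
print(f"Accuracy: {acc:.4f}")
print(f"Precision: {prec:.4f}")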
Classification Task Results
This is the result of the neural network model, evaluated on the test data.
The confusion matrix shows the counts of correct and incorrect predictions for each class, which can be interpreted as:
Accuracy: 86.96% of all predictions (both positive and negative) are correct.
Precision: Of all the samples predicted as positive, 82.21% are actually positive.
For most tasks, these results are sufficient and ready to be presented.
The loss and accuracy plots show how the model learns over time and indicate whether it has learned and generalized the data well.
Loss Plot:
Both training and validation loss decrease steadily and stabilize as epochs progress, showing that the model has converged.
Accuracy Plot:
Both training and validation accuracy plateau at around 87%, suggesting strong predictive performance.
In general, this is the basic process of creating a neural network model for classification. In a future post, I will explain how to create a similar model for predicting continuous values. Both classification and prediction are highly versatile and powerful techniques applicable to a wide range of tasks.