August 25th, 2023
Build an AI Model From Scratch
Learn how to take a popular dataset and turn it into a fully working AI model with explainability using Rezon Cloud!
Intro
In this tutorial, you'll learn how to build a machine learning model for diabetes prediction using the Pima Indian Diabetes dataset. We'll guide you through every step, from data preprocessing and model training to deployment using Rezon.
1. Getting the Dataset
Acquiring Data
To start, we'll use the Pima Indian Diabetes dataset. You can download the dataset from this Kaggle link: Pima Indians Diabetes Database.
Dataset Overview
The dataset includes several medical variables such as the number of pregnancies, BMI, insulin level, glucose level, age, skin thickness, and blood pressure. Additionally, there's an 'Outcome' column that indicates whether the individual has diabetes.
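Before training, it can help to check the columns with pandas. The sketch below is only an illustration: it assumes the column names of the Kaggle CSV and uses two inline sample rows in place of diabetes.csv so that it runs on its own. To inspect the real file, pass 'diabetes.csv' to pd.read_csv instead.

```python
import io

import pandas as pd

# Two illustrative rows using the column names assumed from the Kaggle CSV.
# Replace io.StringIO(sample_csv) with 'diabetes.csv' to inspect the real file.
sample_csv = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
"""

df = pd.read_csv(io.StringIO(sample_csv))

# Everything except 'Outcome' is an input feature.
feature_columns = [c for c in df.columns if c != "Outcome"]
print(feature_columns)
print(df["Outcome"].value_counts())  # class balance of the label column
```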
2. Training the AI Model
Importing Libraries
The first cell imports all the necessary libraries for data manipulation, machine learning, and plotting:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
Preparing the Dataset
In this section, the Pima Indian Diabetes dataset is loaded into variables X (features) and y (labels). The dataset has 8 features and 1 output.
n_inputs = 8
n_outputs = 1
dataset = np.loadtxt('diabetes.csv', delimiter=',', skiprows=1)
X = dataset[:,0:n_inputs]
y = dataset[:,n_inputs]
Splitting and Scaling Data
The data is split into training and validation sets. Then, it's normalized using MinMaxScaler.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=79)
scaler = MinMaxScaler()
scaler.fit(X_train)
Xs_train = scaler.transform(X_train)
Xs_valid = scaler.transform(X_valid)
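MinMaxScaler rescales each feature as x' = (x - min) / (max - min), where min and max come from the data passed to fit. The sketch below uses made-up numbers (not values from the dataset) to show one consequence of fitting on the training split only: validation values outside the training range map outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data, for illustration only.
train = np.array([[10.0], [20.0], [30.0]])
valid = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()
scaler.fit(train)  # learns min=10 and max=30 from the training split only

# The same transformation written out by hand.
train_min = train.min(axis=0)
train_max = train.max(axis=0)
manual = (valid - train_min) / (train_max - train_min)

scaled = scaler.transform(valid)
print(scaled.ravel())  # 40 lies above the training max, so it maps above 1
```

Fitting on the training split alone keeps information about the validation data out of the preprocessing step.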
Converting to PyTorch Tensors
The data is then converted to PyTorch tensors to be compatible with PyTorch models.
Xs_train = torch.tensor(Xs_train, dtype=torch.float32)
Xs_valid = torch.tensor(Xs_valid, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
y_valid = torch.tensor(y_valid, dtype=torch.float32).reshape(-1, 1)
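The reshape(-1, 1) call matters because the model outputs one column per sample, and the Binary Cross Entropy loss used later expects predictions and targets to have the same shape. A small sketch with made-up labels shows the effect:

```python
import numpy as np
import torch

y = np.array([1.0, 0.0, 1.0])  # made-up labels, for illustration only

flat = torch.tensor(y, dtype=torch.float32)
column = flat.reshape(-1, 1)  # -1 lets PyTorch infer the number of rows

print(flat.shape)    # one-dimensional: 3 values
print(column.shape)  # two-dimensional: 3 rows, 1 column
```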
Defining the Model
A neural network model is defined using PyTorch's Sequential API.
user_model = nn.Sequential(
    nn.Linear(8, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
    nn.Sigmoid()
)
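As a quick sanity check, the same architecture can be run on a random batch to confirm the output shape and count the trainable parameters. This is a standalone sketch: the batch of 4 random samples is made up, and the model is rebuilt here under a different name so the block runs on its own.

```python
import torch
import torch.nn as nn

# Same layer sizes as the tutorial's model, rebuilt so this block is standalone.
model = nn.Sequential(
    nn.Linear(8, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

dummy_batch = torch.rand(4, 8)  # 4 random samples with 8 features each
with torch.no_grad():
    output = model(dummy_batch)

# Weights plus biases across all three Linear layers.
n_params = sum(p.numel() for p in model.parameters())
print(output.shape)  # one probability per sample
print(n_params)
```

The final Sigmoid keeps every output between 0 and 1, which is what lets it be read as a probability of diabetes.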
Training the Model
The model is trained using the Adam optimizer and Binary Cross Entropy loss function. The training is carried out for 100 epochs with a batch size of 100.
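# Hyperparameters (values taken from the description above):
n_epochs = 100
batch_size = 100

# Loss function and optimizer (no learning rate is stated, so Adam's default is used):
loss_fn = nn.BCELoss()
optimizer = optim.Adam(user_model.parameters())

# Storage of losses:
arr_losses_train = []
arr_losses_valid = []

# Loop:
for epoch in range(n_epochs):
    # Iterate over the training set only (not the full dataset X):
    for i in range(0, len(Xs_train), batch_size):
        # Forward pass on the current batch:
        Xs_train_batch = Xs_train[i:i+batch_size]
        y_train_batch = y_train[i:i+batch_size]
        y_train_batch_pred = user_model(Xs_train_batch)
        # Compute loss:
        loss = loss_fn(y_train_batch_pred, y_train_batch)
        # Reset gradients, backpropagate, and update the weights:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Get losses on the full training and validation sets (no gradients needed):
    with torch.no_grad():
        y_train_pred = user_model(Xs_train)
        y_valid_pred = user_model(Xs_valid)
        loss_train = loss_fn(y_train_pred, y_train)
        loss_valid = loss_fn(y_valid_pred, y_valid)

    # Store:
    arr_losses_train.append(loss_train.item())
    arr_losses_valid.append(loss_valid.item())

    # Print epoch info:
    if epoch % 10 == 0:
        print(f'Epoch {epoch}: Train loss: {loss_train.item()}. Valid loss: {loss_valid.item()}')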
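To make the loss less of a black box, Binary Cross Entropy can be written out by hand as -mean(y * log(p) + (1 - y) * log(1 - p)) and compared with PyTorch's nn.BCELoss. The predictions and labels below are made up for illustration.

```python
import torch
import torch.nn as nn

# Made-up predictions and labels, for illustration only.
y_pred = torch.tensor([[0.9], [0.2], [0.6]])
y_true = torch.tensor([[1.0], [0.0], [0.0]])

loss_builtin = nn.BCELoss()(y_pred, y_true)

# BCE = -mean( y * log(p) + (1 - y) * log(1 - p) )
loss_manual = -(
    y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)
).mean()

print(loss_builtin.item(), loss_manual.item())
```

Confident correct predictions (0.9 for a positive label) contribute little to the loss, while the wrong-leaning 0.6 for a negative label contributes the most.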
After training, the model's performance is plotted to show the evolution of training and validation losses.
plt.plot(arr_losses_train, label='Training Loss')
plt.plot(arr_losses_valid, label='Validation Loss')
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title('Evolution of the Loss')
plt.legend()
The model's accuracy is also calculated.
# Compute accuracy (no_grad is optional)
with torch.no_grad():
    y_train_pred = user_model(Xs_train)
    y_valid_pred = user_model(Xs_valid)

accuracy_train = (y_train_pred.round() == y_train).float().mean()
accuracy_valid = (y_valid_pred.round() == y_valid).float().mean()
print(f"Train Accuracy {accuracy_train}")
print(f"Valid Accuracy {accuracy_valid}")
Finally, the trained model is saved.
torch.save(user_model, 'user_model.pt')
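It can be worth confirming that the saved file loads back and gives the same predictions. The sketch below uses a small stand-in model and a temporary directory rather than the tutorial's files. It also assumes a PyTorch version in which torch.load accepts the weights_only argument; recent versions need weights_only=False to unpickle a full model object, and that setting should only be used with files you trust.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Small stand-in model so the example runs quickly on its own.
model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())

path = os.path.join(tempfile.mkdtemp(), "user_model.pt")
torch.save(model, path)  # saves the whole model object, as in the tutorial

# weights_only=False allows unpickling a full model; use only with trusted files.
restored = torch.load(path, weights_only=False)
restored.eval()

x = torch.rand(2, 8)  # random input, for comparison only
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```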
3. Signing Up for Rezon
Creating an Account
Visit Rezon and sign up if you haven't already. Upon signing up, you'll receive an API key. Take note of this as we'll use it in one of the next steps.
4. Installing Rezon CLI
Documentation
Refer to the installation documentation at Rezon Docs.
Installation
Run the commands specified in the documentation to install the CLI.
5. Deploying Your Model
Initialization
- Create a new folder for your project and place the trained model file (user_model.pt) in it or in another folder of your choice.
- Open a terminal and navigate to the newly created directory.
- Run rezon init to generate a config file.
Configuration
- Add the API key from the Rezon dashboard into your config file.
- Specify the model framework (in our case, PyTorch).
- Provide the absolute path to the trained model.
- Define names for your input data variables, like "Number of times pregnant," "BMI," etc.
Deployment
Run rezon deploy from the CLI, specify a model name, and the model will be uploaded and dockerized on Rezon servers.
6. Running Predictions
Sample Input
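You can send input to your model through the CLI by using rezon run. The input must be normalized the same way as the training data, that is, with the scaler fitted during training. For example: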
rezon run [0.8, 0.983, 0.890, 0.353, 0.000, 0.900, 0.734, 0.883]
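Since the model was trained on scaled features, a raw record has to pass through the fitted MinMaxScaler before it is sent. The sketch below shows one way to produce the bracketed list. The training array and the patient record are made up, and X_train here is a tiny stand-in for the real training split.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up stand-in for X_train (3 samples, 8 features), for illustration only.
X_train = np.array([
    [0, 80, 60, 10, 0, 20.0, 0.1, 21],
    [5, 120, 70, 25, 100, 30.0, 0.5, 40],
    [10, 200, 90, 50, 300, 45.0, 1.5, 70],
], dtype=float)

scaler = MinMaxScaler().fit(X_train)

# Hypothetical raw record, in the same feature order as the training data.
raw_record = np.array([[5, 140, 75, 30, 150, 32.5, 0.8, 45.5]])
scaled = scaler.transform(raw_record)[0]

# Format the scaled values as a bracketed, comma-separated list.
cli_argument = "[" + ", ".join(f"{v:.3f}" for v in scaled) + "]"
print(cli_argument)
```

In practice you would reuse the scaler object fitted in the training notebook rather than fitting a new one, so that the inputs match what the model saw during training.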
Web App Interface
Alternatively, you can use the Rezon web app to run predictions and view explanations. The platform shows you the input-output contributions, helping you understand the factors affecting the predictions and detect bias.
Conclusion
With Rezon, the transition from developing your model to deploying it is seamless and comes with real-time explainability, auto-scaling, and observability features.
Thank you for following along, and have fun deploying your machine learning models with Rezon!