August 25th, 2023

Build an AI Model From Scratch

Learn how to take a popular dataset and turn it into a fully working AI model with explainability using Rezon Cloud!

Waldo

Intro

In this tutorial, you'll learn how to build a machine learning model for diabetes prediction using the Pima Indian Diabetes dataset. We'll guide you through every step, from data preprocessing and model training to deployment using Rezon.

1. Getting the Dataset

Acquiring Data

To start, we'll use the Pima Indian Diabetes dataset. You can download the dataset from this Kaggle link: Pima Indians Diabetes Database.

Dataset Overview

The dataset includes eight medical variables: number of pregnancies, glucose level, blood pressure, skin thickness, insulin level, BMI, diabetes pedigree function, and age. Additionally, there's an 'Outcome' column that indicates whether the individual has diabetes (1) or not (0).
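If you'd like to inspect the file before training, a quick look with pandas could go roughly like this. It's a minimal sketch: the header names are assumed to match the Kaggle CSV, and the four rows are illustrative stand-ins so the snippet runs without the download.

```python
import io
import pandas as pd

# Illustrative rows only, laid out like the Kaggle CSV (header names are assumed).
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
"""

# With the real file you would call pd.read_csv('diabetes.csv') instead.
df = pd.read_csv(io.StringIO(csv_text))

# Everything except 'Outcome' is an input feature.
feature_columns = [c for c in df.columns if c != "Outcome"]

print(df.shape)                      # (rows, columns)
print(feature_columns)               # the 8 input variables
print(df["Outcome"].value_counts())  # class balance: 1 = diabetes, 0 = no diabetes
```

Checking the class balance early is useful because accuracy can look better than it is when one class dominates.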

2. Training the AI Model

Importing Libraries

The first cell imports all the necessary libraries for data manipulation, machine learning, and plotting:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

Preparing the Dataset

In this section, the Pima Indian Diabetes dataset is loaded into variables X (features) and y (labels). The dataset has 8 features and 1 output.

n_inputs = 8
n_outputs = 1
dataset = np.loadtxt('diabetes.csv', delimiter=',', skiprows=1)
X = dataset[:,0:n_inputs]
y = dataset[:,n_inputs]

Splitting and Scaling Data

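The data is split 60/40 into training and validation sets. It is then scaled to the [0, 1] range with MinMaxScaler, which is fitted on the training set only so that no information from the validation set leaks into preprocessing.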

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=79)
scaler = MinMaxScaler()
scaler.fit(X_train)
Xs_train = scaler.transform(X_train)
Xs_valid = scaler.transform(X_valid)
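To see what fitting the scaler on the training set alone implies, here is a small sketch with made-up single-feature data (the `_demo` names and values are hypothetical, not taken from the dataset). Validation values outside the training range land outside [0, 1], which is expected.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data, chosen only to illustrate the behaviour.
X_train_demo = np.array([[10.0], [20.0], [30.0]])
X_valid_demo = np.array([[15.0], [40.0]])

scaler_demo = MinMaxScaler()
scaler_demo.fit(X_train_demo)  # learns min=10 and max=30 from the training data only

train_scaled = scaler_demo.transform(X_train_demo)
valid_scaled = scaler_demo.transform(X_valid_demo)

print(train_scaled.ravel())  # training values span exactly [0, 1]
print(valid_scaled.ravel())  # 40 is above the training max, so it maps above 1
```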

Converting to PyTorch Tensors

The data is then converted to PyTorch tensors to be compatible with PyTorch models.

Xs_train = torch.tensor(Xs_train, dtype=torch.float32)
Xs_valid = torch.tensor(Xs_valid, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
y_valid = torch.tensor(y_valid, dtype=torch.float32).reshape(-1, 1)
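The reshape(-1, 1) on the labels matters because the model outputs one column per sample, and the loss expects predictions and targets to have the same shape. A minimal sketch with hypothetical values:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical labels and predictions for three samples.
y_demo = np.array([1.0, 0.0, 1.0])
pred_demo = torch.tensor([[0.9], [0.2], [0.6]])     # shape (3, 1), like the model output

y_flat = torch.tensor(y_demo, dtype=torch.float32)  # shape (3,)
y_col = y_flat.reshape(-1, 1)                       # shape (3, 1)
print(y_flat.shape, y_col.shape)

# Shapes match, so the loss can be computed.
loss = nn.BCELoss()(pred_demo, y_col)
print(loss.item())

# Recent PyTorch versions reject mismatched shapes with a ValueError.
try:
    nn.BCELoss()(pred_demo, y_flat)
except ValueError as err:
    print("Shape mismatch:", err)
```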

Defining the Model

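A feed-forward neural network is defined using PyTorch's Sequential API: two hidden layers with 512 and 256 units and ReLU activations, followed by a single sigmoid output that yields a probability between 0 and 1.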

user_model = nn.Sequential(
    nn.Linear(8, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
    nn.Sigmoid()
)
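As a quick sanity check, you can count the trainable parameters and confirm the output shape. The sketch below rebuilds the same architecture under a separate name (demo_model) so it runs on its own; the random batch is just placeholder input.

```python
import torch
import torch.nn as nn

# Same layer sizes as the tutorial's model, rebuilt here so the snippet is self-contained.
demo_model = nn.Sequential(
    nn.Linear(8, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# Weights plus biases: (8*512 + 512) + (512*256 + 256) + (256*1 + 1)
n_params = sum(p.numel() for p in demo_model.parameters())
print(n_params)  # 136193

# A placeholder batch of 4 samples with 8 features each.
batch = torch.rand(4, 8)
with torch.no_grad():
    out = demo_model(batch)
print(out.shape)  # one probability per sample
```

With roughly 136,000 parameters and only a few hundred training rows, the network can overfit easily, which is why watching the validation loss is worthwhile.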

Training the Model

The model is trained using the Adam optimizer and Binary Cross Entropy loss function. The training is carried out for 100 epochs with a batch size of 100.

# Hyperparameters (values stated in the text above):
n_epochs = 100
batch_size = 100

# Binary cross-entropy loss and Adam optimizer.
# The learning rate is not stated in the text; Adam's default (1e-3) is assumed.
loss_fn = nn.BCELoss()
optimizer = optim.Adam(user_model.parameters())

# Storage of losses:
arr_losses_train = []
arr_losses_valid = []

for epoch in range(n_epochs):
    # Training: one pass over the training set in mini-batches
    for i in range(0, len(Xs_train), batch_size):
        Xs_train_batch = Xs_train[i:i+batch_size]
        y_train_batch = y_train[i:i+batch_size]
        y_train_batch_pred = user_model(Xs_train_batch)
        # Compute loss:
        loss = loss_fn(y_train_batch_pred, y_train_batch)
        # Reset gradients, backpropagate, and update the weights:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Losses on the full training and validation sets (no gradients needed):
    with torch.no_grad():
        loss_train = loss_fn(user_model(Xs_train), y_train)
        loss_valid = loss_fn(user_model(Xs_valid), y_valid)

    # Store:
    arr_losses_train.append(loss_train.item())
    arr_losses_valid.append(loss_valid.item())

    # Print epoch info:
    if epoch % 10 == 0:
        print(f'Epoch {epoch}: Train loss: {loss_train.item():.4f}. Valid loss: {loss_valid.item():.4f}')
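To make the Binary Cross Entropy loss less of a black box, the sketch below computes it by hand for three hypothetical predictions and compares the result with PyTorch's nn.BCELoss. The values of p and y are invented for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical predicted probabilities and true labels.
p = torch.tensor([[0.9], [0.2], [0.6]])
y = torch.tensor([[1.0], [0.0], [1.0]])

# Binary cross-entropy: mean of -[y*log(p) + (1-y)*log(1-p)] over the samples.
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

# PyTorch's built-in version (default reduction is the mean).
builtin = nn.BCELoss()(p, y)

print(manual.item(), builtin.item())
```

Confident predictions that are correct contribute little to the loss, while confident wrong ones are penalized heavily.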

After training, the model's performance is plotted to show the evolution of training and validation losses.

plt.plot(arr_losses_train, label='Training Loss')
plt.plot(arr_losses_valid, label='Validation Loss')

plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title('Evolution of the Loss')
plt.legend()

The model's accuracy is also calculated by rounding each predicted probability to 0 or 1 and comparing it with the true label.

# Compute accuracy (no_grad is optional)
with torch.no_grad():
    y_train_pred = user_model(Xs_train)
    y_valid_pred = user_model(Xs_valid)
accuracy_train = (y_train_pred.round() == y_train).float().mean()
accuracy_valid = (y_valid_pred.round() == y_valid).float().mean()
print(f"Train Accuracy {accuracy_train}")
print(f"Valid Accuracy {accuracy_valid}")
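One detail worth knowing about the rounding used for accuracy: torch.round rounds halves to the nearest even number, so a probability of exactly 0.5 becomes 0. In effect the decision threshold is "strictly greater than 0.5". A small sketch with hypothetical values:

```python
import torch

# Hypothetical probabilities and labels, chosen to include the 0.5 edge case.
probs = torch.tensor([[0.20], [0.50], [0.51], [0.97]])
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

predicted = probs.round()  # 0.50 rounds to 0, 0.51 rounds to 1
accuracy = (predicted == labels).float().mean()

print(predicted.flatten().tolist())  # [0.0, 0.0, 1.0, 1.0]
print(accuracy.item())               # 0.5 (2 of 4 correct)
```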

Finally, the trained model is saved.

torch.save(user_model, 'user_model.pt')
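Before deploying, it can be reassuring to check that the saved file loads back and gives the same predictions. Here is a minimal sketch using a small stand-in network (demo_model) and a temporary directory. On recent PyTorch releases, loading a fully pickled model needs weights_only=False, which should only be used with files you trust.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Small stand-in network; the tutorial's user_model would be saved the same way.
demo_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pt")
    torch.save(demo_model, path)  # pickles the whole module, not just the weights
    # weights_only=False allows unpickling a full module; use it only for trusted files.
    reloaded = torch.load(path, weights_only=False)

reloaded.eval()
x = torch.rand(2, 8)  # placeholder input
with torch.no_grad():
    same = torch.allclose(demo_model(x), reloaded(x))
print(same)
```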

3. Signing Up for Rezon

Creating an Account

Visit Rezon and sign up if you haven't already. Upon signing up, you'll receive an API key. Take note of it, as you'll need it when configuring the deployment.

4. Installing Rezon CLI

Documentation

Refer to the installation documentation at Rezon Docs.

Installation

Run the commands specified in the documentation to install the CLI.

5. Deploying Your Model

Initialization

Create a new folder for your project and move the trained model file (user_model.pt) to a folder of your choice.
Open a terminal and navigate to the newly created directory.
Run rezon init to generate a config file.

Configuration

Add the API key from the Rezon dashboard into your config file.
Specify the model framework (in our case, PyTorch).
Provide the absolute path to the trained model.
Define names for your input data variables, like "Number of times pregnant," "BMI," etc.

Deployment

Run rezon deploy from the CLI, specify a model name, and the model will be uploaded and dockerized on Rezon servers.

6. Running Predictions

Sample Input

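You can send normalized input to your model through the CLI by using rezon run. The values must be scaled the same way as the training data and listed in the same order as the dataset's feature columns. For example: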

rezon run [0.8, 0.983, 0.890, 0.353, 0.000, 0.900, 0.734, 0.883]
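To produce such a normalized vector from raw measurements, you can reuse the scaler fitted during training. The sketch below is self-contained, so it fits a stand-in scaler on a few illustrative rows. In practice you would use the tutorial's fitted scaler instead. The raw_sample values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative training rows (8 features each); in the tutorial this would be X_train.
X_train_demo = np.array([
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21],
])
scaler_demo = MinMaxScaler().fit(X_train_demo)

# Hypothetical raw measurements for one person, in the same column order.
raw_sample = np.array([[3, 120, 70, 30, 50, 30.0, 0.4, 40]])
scaled_sample = scaler_demo.transform(raw_sample)[0]

# Format the scaled values as a bracketed list with three decimals.
formatted = "[" + ", ".join(f"{v:.3f}" for v in scaled_sample) + "]"
print(formatted)
```

The printed list has the same shape as the argument in the example command above.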

Web App Interface

Alternatively, you can use the Rezon web app to run predictions and view explanations. The platform shows you the input-output contributions, helping you understand the factors affecting the predictions and detect bias.

Conclusion

With Rezon, the transition from developing your model to deploying it is seamless and comes with real-time explainability, auto-scaling, and observability features.

Thank you for following along, and have fun deploying your machine learning models with Rezon!