Q&A 2 How do you train and save multiple models for deployment?

2.1 Explanation

Once your dataset is loaded and preprocessed, the next step in the deployment pipeline is to train machine learning models and save them for reuse. Saving models allows you to:

- Avoid retraining every time the API is restarted
- Load models instantly in production
- Maintain version control and reproducibility

In this example, we’ll use the Titanic dataset to train seven scikit-learn classifiers, then save each one as a .joblib file in a models/ folder for later deployment.
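To make the version-control and reproducibility point concrete, here is a minimal sketch of saving a small JSON "sidecar" file next to each model. The helper name save_with_metadata, the metadata fields, and the synthetic data (a stand-in for the preprocessed Titanic features) are assumptions for illustration, not part of the original script.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

import joblib
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the five preprocessed Titanic features (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)


def save_with_metadata(model, name, models_dir):
    """Hypothetical helper: save a model plus a JSON file describing it."""
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)

    model_path = models_dir / f"{name}.joblib"
    joblib.dump(model, model_path)

    # Record what is needed to reproduce or safely reload the model later.
    metadata = {
        "name": name,
        "model_class": type(model).__name__,
        "sklearn_version": sklearn.__version__,
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path = models_dir / f"{name}.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return model_path, meta_path


# A temporary directory stands in for the models/ folder.
models_dir = tempfile.mkdtemp()
model_path, meta_path = save_with_metadata(model, "logistic_regression", models_dir)
saved_meta = json.loads(meta_path.read_text())
print(saved_meta["model_class"], saved_meta["sklearn_version"])
```

Recording the scikit-learn version is useful because a pickled estimator is only reliably loadable with the library version that produced it.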

2.2 Python Code

# scripts/train_n_save_models.py
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
import joblib

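# Load and preprocess dataset
df = pd.read_csv("data/titanic.csv")
# Drop rows with missing values in any column used below
df = df.dropna(subset=["Age", "Fare", "Embarked", "Sex", "Survived"])
# Encode categoricals as integer codes. Categories sort alphabetically, so with
# the standard Titanic values: Sex female=0, male=1; Embarked C=0, Q=1, S=2.
# The API must apply the same mapping to incoming requests.
df["Sex"] = df["Sex"].astype("category").cat.codes
df["Embarked"] = df["Embarked"].astype("category").cat.codes

X = df[["Pclass", "Sex", "Age", "Fare", "Embarked"]]
y = df["Survived"]

# Hold out 20% of the rows; X_test/y_test are reserved for evaluation
# and are not used further in this script
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)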

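# Define models to train
# random_state is fixed wherever the estimator is stochastic, so that
# retraining reproduces the same saved model
models = {
    "logistic_regression": LogisticRegression(max_iter=200),
    "random_forest": RandomForestClassifier(random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "svc": SVC(probability=True, random_state=42),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
}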

# Ensure models directory exists
os.makedirs("models", exist_ok=True)

# Train and save each model
for name, model in models.items():
    model.fit(X_train, y_train)
    joblib.dump(model, f"models/{name}.joblib")
    print(f"✅ Saved: models/{name}.joblib")

✅ Saved: models/logistic_regression.joblib
✅ Saved: models/random_forest.joblib
✅ Saved: models/gradient_boosting.joblib
✅ Saved: models/svc.joblib
✅ Saved: models/decision_tree.joblib
✅ Saved: models/knn.joblib
✅ Saved: models/naive_bayes.joblib
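
Before relying on the saved files, it can help to confirm that each one reloads correctly and to see how it scores on the held-out split. The sketch below is one way to do that; it assumes synthetic data in place of data/titanic.csv, a temporary directory in place of models/, and only two of the seven classifiers to keep it short.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed Titanic features (assumption).
X, y = make_classification(n_samples=300, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
}

# A temporary directory stands in for the models/ folder.
models_dir = Path(tempfile.mkdtemp())
results = {}

for name, model in models.items():
    model.fit(X_train, y_train)
    path = models_dir / f"{name}.joblib"
    joblib.dump(model, path)

    # Round trip: the reloaded model must predict exactly like the original.
    reloaded = joblib.load(path)
    assert np.array_equal(model.predict(X_test), reloaded.predict(X_test))

    # Score the reloaded model on data it never saw during training.
    results[name] = accuracy_score(y_test, reloaded.predict(X_test))
    print(f"{name}: hold-out accuracy = {results[name]:.3f}")
```

Scoring the reloaded object rather than the in-memory one means the number reflects what the API would actually serve.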

2.3 R Code

# No R version is included here because the deployment workflow relies on Python's joblib (.joblib) format.
# Alternative: in R, save models with saveRDS() and reload them with readRDS(), for example in a Shiny app.

✅ Takeaway: Save each trained model in a dedicated models/ folder using a consistent naming scheme. This enables fast, reliable deployment via your API.
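
As an illustration of why the consistent naming scheme pays off, an API could discover every model at startup from the file names alone. The following is a minimal sketch: the load_models helper is hypothetical, and the demo uses synthetic data and a temporary directory rather than the real models/ folder.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def load_models(models_dir):
    """Hypothetical helper: return {name: model} for each <name>.joblib file."""
    paths = sorted(Path(models_dir).glob("*.joblib"))
    return {path.stem: joblib.load(path) for path in paths}


# Demo setup: write two small models into a temporary folder (assumption).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
demo_dir = Path(tempfile.mkdtemp())
joblib.dump(GaussianNB().fit(X, y), demo_dir / "naive_bayes.joblib")
joblib.dump(
    DecisionTreeClassifier(random_state=0).fit(X, y),
    demo_dir / "decision_tree.joblib",
)

# The file stem becomes the lookup key, e.g. for a model-name request parameter.
loaded = load_models(demo_dir)
print(sorted(loaded))  # ['decision_tree', 'naive_bayes']
prediction = loaded["naive_bayes"].predict(X[:1])
```

Because joblib.load unpickles arbitrary Python objects, it should only be pointed at files you created or otherwise trust.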