Q&A 3 How do you evaluate models before deployment?
3.1 Explanation
Before deploying machine learning models, it's important to evaluate their performance on unseen test data. This helps you:
- Compare models based on accuracy, precision, recall, and F1 score
- Select the best model(s) for deployment
- Detect overfitting or underfitting
- Create a summary table for documentation or reporting
In this Q&A, we load previously saved models from the models/ folder, evaluate them on test data, and store the results in a single CSV file: evaluation_summary.csv.
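The evaluation script in 3.2 expects one fitted model per file in models/, saved with joblib and named after the model (for example, models/random_forest.joblib becomes the row "random_forest" in the summary). If you have not saved models yet, a minimal sketch of that saving step might look like the following; the file name, the choice of a single random forest, and the preprocessing (which mirrors the evaluation script so the split matches) are illustrative assumptions, not part of this Q&A's script.
# sketch: save one fitted model in the layout the evaluation script expects
# (model choice and file name are illustrative)
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/titanic.csv")
df = df.dropna(subset=["Age", "Fare", "Embarked", "Sex", "Survived"])
df["Sex"] = df["Sex"].astype("category").cat.codes
df["Embarked"] = df["Embarked"].astype("category").cat.codes

X = df[["Pclass", "Sex", "Age", "Fare", "Embarked"]]
y = df["Survived"].astype(int)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
joblib.dump(model, "models/random_forest.joblib")  # file name = model name in the summary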
3.2 Python Code
# scripts/evaluate_models.py
import os
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
# Paths
MODEL_DIR = "models"
DATA_PATH = "data/titanic.csv"
OUTPUT_FILE = "data/evaluation_summary.csv"
# Load and preprocess Titanic data
df = pd.read_csv(DATA_PATH)
df = df.dropna(subset=["Age", "Fare", "Embarked", "Sex", "Survived"])
df["Sex"] = df["Sex"].astype("category").cat.codes
df["Embarked"] = df["Embarked"].astype("category").cat.codes
df["Survived"] = df["Survived"].astype(int)
features = ["Pclass", "Sex", "Age", "Fare", "Embarked"]
X = df[features]
y = df["Survived"]
# Train/test split (assumes the same 80/20 split and random_state were used when the models were trained, so X_test remains unseen)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Store results
results = []
# Evaluate all saved models
for filename in os.listdir(MODEL_DIR):
    if filename.endswith(".joblib"):
        model_path = os.path.join(MODEL_DIR, filename)
        model = joblib.load(model_path)
        model_name = filename.replace(".joblib", "")
        y_pred = model.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        report = classification_report(y_test, y_pred, output_dict=True)
        # Use macro avg for simplicity
        precision = report["macro avg"]["precision"]
        recall = report["macro avg"]["recall"]
        f1 = report["macro avg"]["f1-score"]
        results.append({
            "Model": model_name,
            "Accuracy": round(acc, 4),
            "Precision": round(precision, 4),
            "Recall": round(recall, 4),
            "F1 Score": round(f1, 4)
        })
# Save results to CSV
results_df = pd.DataFrame(results)
results_df.to_csv(OUTPUT_FILE, index=False)
print(f"\nβ
Evaluation summary saved to: {OUTPUT_FILE} see results below:\n")
print(results_df)β
Evaluation summary saved to: data/evaluation_summary.csv. See results below:

                 Model  Accuracy  Precision  Recall  F1 Score
0                  knn    0.6853     0.6841  0.6867    0.6838
1                  svc    0.6364     0.6378  0.6109    0.6038
2  logistic_regression    0.7902     0.8057  0.7737    0.7784
3    gradient_boosting    0.7762     0.7858  0.7612    0.7652
4        random_forest    0.7832     0.7837  0.7742    0.7769
5          naive_bayes    0.7692     0.7734  0.7566    0.7600
6        decision_tree    0.6783     0.6746  0.6653    0.6664
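Once evaluation_summary.csv exists, it can drive the deployment decision. As a small sketch, you could pick the top-ranked model and reload it for deployment; sorting by F1 Score is just one reasonable criterion and is an assumption here, not a rule from this Q&A.
# sketch: pick the best model from the evaluation summary and reload it
import joblib
import pandas as pd

summary = pd.read_csv("data/evaluation_summary.csv")

# Sort by F1 Score (illustrative criterion) and take the top row
best = summary.sort_values("F1 Score", ascending=False).iloc[0]
print(f"Best model: {best['Model']} (F1 Score = {best['F1 Score']})")

# Reload the corresponding saved model, e.g. for deployment or further checks
best_model = joblib.load(f"models/{best['Model']}.joblib")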
3.3 R Code
# For a Python-based deployment workflow, use Python for evaluation.
# For R-based workflows, use caret::confusionMatrix() or metrics from modelr or yardstick.
Takeaway: Always evaluate your models and store the results before deployment. This ensures you deploy with confidence and clarity.