How to Use MLflow for Machine Learning Experiments

MLflow is an open-source platform designed to manage the entire machine learning lifecycle. It helps researchers and engineers track experiments, package models, manage versions, and deploy models efficiently.

If you train multiple models, perform hyperparameter tuning, or compare architectures, MLflow becomes extremely valuable because it allows you to log and visualize experiments automatically.

1. What MLflow Is Used For

MLflow contains four major components:

Component       Purpose
--------------  ----------------------------------------------------
Tracking        Log experiments, metrics, parameters, and artifacts
Projects        Package ML code for reproducible execution
Models          Standard format to store and share ML models
Model Registry  Manage model versions and deployment stages

Most users start with MLflow Tracking because it is the easiest way to monitor experiments.

2. Installing MLflow

pip install mlflow

To start the MLflow user interface:

mlflow ui

Then open the dashboard in your browser:

http://localhost:5000

The UI provides a visual interface for comparing different experiment runs.

3. Basic MLflow Example

Below is a minimal example demonstrating how to track a machine learning experiment.

import mlflow
import mlflow.sklearn

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    # Train a simple classifier
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    # Evaluate on the held-out split
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)

    # Log the configuration, the result, and the model itself
    mlflow.log_param("model", "LogisticRegression")
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")

Once this run completes, MLflow has recorded:

  • the logged parameter ("model")
  • the accuracy metric
  • the serialized model artifact
  • execution metadata such as the run ID, start time, and source file

4. What Appears in the MLflow Interface

Each experiment run stores several types of information:

Parameters


model = LogisticRegression
max_iter = 100

Metrics


accuracy = 0.94
loss = 0.32

Artifacts


model.pkl
plots
datasets
logs

5. Logging Additional Artifacts

You can log many other outputs generated during training.

Saving plots

mlflow.log_artifact("confusion_matrix.png")

Logging hyperparameters


mlflow.log_param("learning_rate", 0.001)
mlflow.log_param("batch_size", 32)

Logging multiple metrics


mlflow.log_metric("precision", precision)
mlflow.log_metric("recall", recall)

6. Organizing Experiments

You can group experiments into logical categories.

mlflow.set_experiment("CTR_prediction_models")

Each training execution will then appear as a new run within this experiment.

7. Loading a Saved Model

Models stored in MLflow can be loaded easily:

model = mlflow.sklearn.load_model("runs:/RUN_ID/model")

8. Model Registry

MLflow also allows you to register and version models.


mlflow.register_model(
    "runs:/RUN_ID/model",
    "CTRModel"
)

This creates versions such as:


CTRModel
  ├── Version 1
  ├── Version 2
  └── Version 3

Each version can move between stages:

  • Staging
  • Production
  • Archived

9. Typical MLflow Workflow


dataset
   ↓
train model
   ↓
log parameters
log metrics
log artifacts
   ↓
compare runs in MLflow UI
   ↓
register best model
   ↓
deploy model

10. Hyperparameter Search Example

MLflow is especially helpful when running multiple experiments.


# MyModel and train() are placeholders for your own model class and training loop
for lr in [0.01, 0.001, 0.0001]:
    with mlflow.start_run():
        model = MyModel(lr=lr)
        acc = train(model)
        mlflow.log_param("learning_rate", lr)
        mlflow.log_metric("accuracy", acc)

The MLflow interface will automatically display all runs side-by-side so that you can identify the best configuration.

11. MLflow with Deep Learning

MLflow integrates with many ML frameworks including:

  • TensorFlow
  • PyTorch
  • Scikit-learn
  • XGBoost

Example for PyTorch:

mlflow.pytorch.log_model(model, "model")

12. MLflow with Pipelines

MLflow becomes even more powerful when integrated into machine learning pipelines.


Step 1: preprocess data
Step 2: train model
Step 3: evaluate model
Step 4: log results to MLflow

Tools such as Kubeflow or Airflow can automate this workflow.

13. Typical Project Structure


project/
│
├── train.py
├── MLproject
├── conda.yaml
├── data/
├── models/
└── mlruns/

The mlruns folder is MLflow's default local tracking store; it holds every run's parameters, metrics, and artifacts.

14. When MLflow Is Most Useful

MLflow becomes extremely useful when you:

  • compare multiple machine learning models
  • perform hyperparameter tuning
  • track experiments over time
  • store models with version control
  • deploy models in production environments

Build a Complete MLflow Pipeline in Just ~40 Lines

One of the most interesting aspects of MLflow is how little code is required to build a fully tracked machine learning workflow. With a small script, it is possible to create a pipeline that automatically logs experiments, tracks performance, and allows visual comparison between different model configurations.

In a compact implementation (around 40 lines of Python), you can design a workflow that performs the entire lifecycle of a typical machine learning experiment. Such a pipeline usually includes several key stages:

  • Dataset loading
    The pipeline begins by loading and preparing a dataset. This step may include reading data from files, performing basic preprocessing, and splitting the dataset into training and testing subsets.
  • Model training
    A machine learning model is then trained on the prepared data. This could be a classical algorithm such as logistic regression or a more advanced deep learning model depending on the problem you are solving.
  • Hyperparameter search
    Instead of training only a single configuration, the pipeline can loop over multiple hyperparameter values (for example learning rates or model depths). Each configuration creates a new MLflow run, making it easy to explore the impact of different settings.
  • Automatic experiment logging
    During each run, MLflow automatically records parameters, evaluation metrics, and model artifacts. This removes the need to manually store results in spreadsheets or external logs.
  • Experiment comparison
    Finally, all runs appear inside the MLflow dashboard. From there you can compare metrics, sort experiments by performance, and quickly identify the best model configuration.

This type of lightweight pipeline is extremely useful for experimentation, research, and rapid prototyping. Instead of manually managing results, MLflow provides a structured and visual way to analyze your experiments.

In a future post, we will walk through the implementation of such a pipeline step by step and show how to build a complete MLflow experiment workflow in just a few lines of code.
