Building a Churn Prediction ML Pipeline with Kubeflow
This guide walks you through creating and deploying a machine learning pipeline for churn prediction using Kubeflow Pipelines. We will write our components in Python, package them in Docker containers, deploy them to a Kubernetes cluster, and run and monitor the pipeline using the command line.
Step 1: Define the Project Structure
project_root/
├── components/
│   ├── preprocess/
│   │   ├── preprocess.py
│   │   └── Dockerfile
│   ├── train/
│   │   ├── train.py
│   │   └── Dockerfile
│   └── evaluate/
│       ├── evaluate.py
│       └── Dockerfile
└── pipeline.py
Step 2: Write Component Scripts
preprocess.py
def preprocess():
    print("Preprocessing data...")
    with open("/data/processed_data.txt", "w") as f:
        f.write("cleaned_data")

if __name__ == '__main__':
    preprocess()
train.py
def train():
    print("Training model...")
    with open("/model/model.txt", "w") as f:
        f.write("trained_model")

if __name__ == '__main__':
    train()
evaluate.py
def evaluate():
    print("Evaluating model...")
    with open("/metrics/metrics.txt", "w") as f:
        f.write("accuracy: 95%")

if __name__ == '__main__':
    evaluate()
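These scripts are deliberately minimal placeholders that just write marker files. In a real churn project, preprocess.py would load and clean customer data. The sketch below is one hedged example of what that might look like, assuming a CSV at /data/raw/customers.csv with a churn label column; the path and column names are placeholders, and if you change the output filename you would update file_outputs in pipeline.py to match.

# A more realistic preprocess step (sketch). Assumes a CSV of customer records
# at /data/raw/customers.csv with a "churn" label column; both are placeholders.
import pandas as pd

def preprocess():
    df = pd.read_csv("/data/raw/customers.csv")

    # Drop rows with missing values and obvious identifier columns.
    df = df.dropna()
    df = df.drop(columns=["customer_id"], errors="ignore")

    # One-hot encode categorical features, keeping the label column intact.
    label = df.pop("churn")
    features = pd.get_dummies(df)
    features["churn"] = label

    # Write the cleaned dataset where the downstream steps expect it.
    features.to_csv("/data/processed_data.csv", index=False)
    print(f"Wrote {len(features)} cleaned rows")

if __name__ == "__main__":
    preprocess()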
Step 3: Dockerize Each Component
# Dockerfile for preprocess component
FROM python:3.9-slim
WORKDIR /app
COPY preprocess.py .
RUN mkdir /data
ENTRYPOINT ["python", "preprocess.py"]
Build and push the image for each component (shown here for preprocess; repeat for train and evaluate):
docker build -t gcr.io/YOUR_PROJECT/preprocess:latest .
docker push gcr.io/YOUR_PROJECT/preprocess:latest
Step 4: Define Your Pipeline in pipeline.py
import kfp
import kfp.dsl as dsl

# Note: dsl.ContainerOp is part of the KFP v1 SDK (kfp < 2.0).
@dsl.pipeline(
    name='Churn Prediction Pipeline',
    description='Preprocess data, train a churn model, and evaluate it.'
)
def churn_pipeline():
    preprocess = dsl.ContainerOp(
        name='Preprocess',
        image='gcr.io/YOUR_PROJECT/preprocess:latest',
        file_outputs={'processed_data': '/data/processed_data.txt'}
    )

    train = dsl.ContainerOp(
        name='Train',
        image='gcr.io/YOUR_PROJECT/train:latest',
        file_outputs={'model': '/model/model.txt'}
    ).after(preprocess)

    evaluate = dsl.ContainerOp(
        name='Evaluate',
        image='gcr.io/YOUR_PROJECT/evaluate:latest',
        file_outputs={'metrics': '/metrics/metrics.txt'}
    ).after(train)

if __name__ == '__main__':
    kfp.compiler.Compiler().compile(churn_pipeline, 'churn_pipeline.yaml')
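Note that the pipeline above only orders the steps with .after(); each container writes to its own local filesystem, so the train step never actually receives the preprocess output. One hedged way to wire data between steps with the KFP v1 SDK is sketched below; the --data flag is an assumption about how train.py would be extended, not part of the original scripts.

# Hedged sketch (KFP v1 SDK): wire the preprocess output into the train step
# by passing it as a command-line argument. Assumes train.py parses a --data flag.
import kfp.dsl as dsl

@dsl.pipeline(name='Churn Prediction Pipeline (wired)')
def churn_pipeline_wired():
    preprocess = dsl.ContainerOp(
        name='Preprocess',
        image='gcr.io/YOUR_PROJECT/preprocess:latest',
        file_outputs={'processed_data': '/data/processed_data.txt'}
    )
    # Referencing preprocess.outputs creates the dependency, so .after() is not
    # needed here. file_outputs passes the file *contents* as a small string
    # parameter; larger datasets usually go through a shared volume or object
    # storage instead.
    train = dsl.ContainerOp(
        name='Train',
        image='gcr.io/YOUR_PROJECT/train:latest',
        arguments=['--data', preprocess.outputs['processed_data']],
        file_outputs={'model': '/model/model.txt'}
    )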
Step 5: Deploy to Kubernetes Cluster
Make sure you have a running Kubernetes cluster with Kubeflow Pipelines installed, and that your Docker images have been pushed to a container registry the cluster can pull from (e.g., Google Container Registry).
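Before uploading anything, it can help to confirm that the Kubeflow Pipelines API is reachable from your machine. A minimal sketch, assuming the endpoint is exposed to you (for example through an ingress or kubectl port-forward); the host value is a placeholder:

# Quick connectivity check against the Kubeflow Pipelines API.
import kfp

client = kfp.Client(host='http://YOUR_KUBEFLOW_ENDPOINT')
print(client.list_experiments())  # returns without raising if the API is reachable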
Step 6: Upload and Trigger Pipeline via CLI
- Install the Kubeflow Pipelines SDK:

pip install kfp

- Compile the pipeline to YAML:

python pipeline.py

- Connect to your Kubeflow Pipelines endpoint:

import kfp

client = kfp.Client(host='http://YOUR_KUBEFLOW_ENDPOINT')

- Create an experiment and run the pipeline:

experiment = client.create_experiment(name='Churn-Prediction')
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='churn-pipeline-run',
    pipeline_package_path='churn_pipeline.yaml'
)
print(f"Pipeline run ID: {run.id}")
Step 7: Monitor and Track the Pipeline
- Open your Kubeflow UI in the browser.
- Navigate to the Pipelines section.
- Click on the Churn-Prediction experiment and select the run named churn-pipeline-run.
- Track the status of each component (Preprocess, Train, Evaluate).
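If you prefer to check on a run from the command line instead of the UI, the run ID printed in Step 6 can be queried with the same client. A minimal sketch, assuming the same endpoint as before:

# Check the current status of a run by its ID (e.g. from a separate session).
import kfp

client = kfp.Client(host='http://YOUR_KUBEFLOW_ENDPOINT')
run_detail = client.get_run('RUN_ID_FROM_STEP_6')
print(run_detail.run.status)  # 'Running', 'Succeeded', 'Failed', ...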
Step 8: Check Output Artifacts and Results
- From the UI, click each component step to view logs and output paths.
- You can download files such as:
  - /data/processed_data.txt – cleaned data
  - /model/model.txt – serialized model
  - /metrics/metrics.txt – model evaluation result
- These files are available in the output artifacts panel of each component run.
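The accuracy value above is only a raw text artifact. If you want it to appear as a proper metric in the Kubeflow Pipelines UI, KFP v1 recognizes a JSON file written to /mlpipeline-metrics.json. A hedged sketch of how evaluate.py could emit it (the 0.95 value mirrors the placeholder output; depending on your KFP version you may also need to declare the file as an output artifact of the step):

# Write evaluation metrics in the JSON format the Kubeflow Pipelines UI understands.
import json

metrics = {
    "metrics": [
        {"name": "accuracy", "numberValue": 0.95, "format": "PERCENTAGE"}
    ]
}
with open("/mlpipeline-metrics.json", "w") as f:
    json.dump(metrics, f)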
Next Steps
- Automate triggering with Argo Events or scheduled (recurring) runs (see the sketch below).
- Persist outputs with persistent volume claims (PVCs).
- Add monitoring, dashboards, or notifications.
- Extend the pipeline with model deployment or explainability steps.
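For the scheduled runs mentioned in the first item, the KFP v1 client can create a recurring run directly from the compiled YAML. A minimal sketch; Kubeflow's cron expressions use six fields (seconds first), and the schedule and job name here are placeholders:

# Schedule the compiled pipeline to run every day at 02:00 (cluster time).
import kfp

client = kfp.Client(host='http://YOUR_KUBEFLOW_ENDPOINT')
experiment = client.create_experiment(name='Churn-Prediction')
client.create_recurring_run(
    experiment_id=experiment.id,
    job_name='churn-pipeline-nightly',
    cron_expression='0 0 2 * * *',
    pipeline_package_path='churn_pipeline.yaml'
)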