Tutorial: Activation Functions in Computer Vision

This tutorial explores the primary activation functions used in computer vision tasks: their mathematical formulations, properties, use cases, and differences. We'll include code fragments to visualize each function and to compare them on a non-linear classification problem (the moons dataset). The code uses Python with TensorFlow/Keras and NumPy for implementation, Matplotlib for visualization, and scikit-learn for generating the dataset.

Table of Contents

  1. Introduction to Activation Functions
  2. Common Activation Functions
  3. Implementation and Comparison
  4. Differences and Use Cases
  5. Conclusion

1. Introduction to Activation Functions

Activation functions introduce non-linearity into neural networks, enabling them to model complex patterns like those found in images. In computer vision, activation functions are critical in convolutional neural networks (CNNs), helping to extract features (edges, textures) and make decisions (classification, detection). Each function has unique properties that affect training dynamics, convergence, and performance.
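
To see why the non-linearity itself is what adds expressive power, the minimal NumPy sketch below (with arbitrary example weights) shows that two stacked linear layers without an activation collapse into a single linear transformation, while inserting a ReLU between them does not.

import numpy as np

# Two "layers" represented as weight matrices (arbitrary example values)
W1 = np.array([[1.0, -2.0], [0.5, 3.0]])
W2 = np.array([[2.0, 0.0], [-1.0, 1.0]])
x = np.array([1.0, -1.0])

# Without an activation, two linear layers are equivalent to one layer W2 @ W1
no_activation = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(no_activation, collapsed))  # True: no extra expressive power

# With ReLU in between, the composition is no longer a single linear map
with_relu = W2 @ np.maximum(0, W1 @ x)
print(with_relu)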

2. Common Activation Functions

Below, we describe the main activation functions, their mathematical definitions, properties, and typical applications in computer vision, along with code to visualize their behavior.

ReLU (Rectified Linear Unit)

Definition: f(x) = max(0, x)
Outputs the input directly if positive; otherwise, outputs zero.

Properties:

  • Range: [0, ∞)
  • Pros: Fast to compute, mitigates the vanishing gradient problem, produces sparse activations.
  • Cons: "Dying ReLU" problem (neurons whose inputs stay negative output zero and stop receiving gradient updates).
  • Use Cases: Hidden layers in CNNs (e.g., VGG, ResNet), feature extraction in image classification.

Code Fragment: Visualize ReLU function.

import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    return np.maximum(0, x)

x = np.linspace(-10, 10, 100)
y = relu(x)

plt.figure(figsize=(6, 4))
plt.plot(x, y, label='ReLU')
plt.title('ReLU Activation Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid(True)
plt.legend()
plt.savefig('relu_plot.png')

Example: In ResNet, ReLU follows each convolution (and batch normalization) layer, introducing sparse activations that help the network build up features from edges to high-level object shapes.
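
As a rough illustration of this pattern (the layer sizes and input shape here are only examples, not the actual ResNet configuration), a Keras convolutional block with ReLU might look like:

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative Conv -> BatchNorm -> ReLU block, the ordering used in ResNet-style CNNs
inputs = keras.Input(shape=(224, 224, 3))   # example RGB image size
x = layers.Conv2D(64, kernel_size=3, padding='same')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)            # ReLU applied after the convolution
block = keras.Model(inputs, x)
block.summary()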

Sigmoid

Definition: f(x) = 1 / (1 + e^(-x))
Maps inputs to a range between 0 and 1.

Properties:

  • Range: (0, 1)
  • Pros: Interpretable as probabilities, smooth gradient.
  • Cons: Vanishing gradient for large positive/negative inputs, not zero-centered.
  • Use Cases: Output layers for binary classification, early CNNs.

Code Fragment: Visualize Sigmoid function.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 100)
y = sigmoid(x)

plt.figure(figsize=(6, 4))
plt.plot(x, y, label='Sigmoid')
plt.title('Sigmoid Activation Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid(True)
plt.legend()
plt.savefig('sigmoid_plot.png')

Example: In binary image classification (e.g., cat vs. dog), sigmoid is used in the output layer to produce a probability score.
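
As a minimal sketch of this idea (the logit values below are made up for illustration), the sigmoid maps the raw score produced by the last layer to a probability, which is then thresholded at 0.5 to get the predicted class:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical raw scores (logits) from the final layer for three images
logits = np.array([-2.3, 0.4, 3.1])
probs = sigmoid(logits)             # probability of the positive class (e.g., "dog")
labels = (probs > 0.5).astype(int)  # threshold at 0.5 for the final decision
print(probs, labels)                # ~[0.09, 0.60, 0.96] -> [0, 1, 1]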

Tanh

Definition: f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Maps inputs to a range between -1 and 1.

Properties:

  • Range: (-1, 1)
  • Pros: Zero-centered, stronger gradients than sigmoid (compared numerically in the sketch at the end of this subsection).
  • Cons: Still suffers from vanishing gradient for extreme inputs.
  • Use Cases: Early CNN layers, recurrent networks, feature normalization.

Code Fragment: Visualize Tanh function.

def tanh(x):
    return np.tanh(x)

x = np.linspace(-10, 10, 100)
y = tanh(x)

plt.figure(figsize=(6, 4))
plt.plot(x, y, label='Tanh')
plt.title('Tanh Activation Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid(True)
plt.legend()
plt.savefig('tanh_plot.png')

Example: In older CNN architectures, tanh was used to normalize feature maps before passing them to deeper layers.
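
The "stronger gradients than sigmoid" property listed above can be checked numerically. The sketch below plots the analytical derivatives, tanh'(x) = 1 - tanh(x)^2 (maximum 1.0) and sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)) (maximum 0.25):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)
sig = 1 / (1 + np.exp(-x))

tanh_grad = 1 - np.tanh(x) ** 2   # peaks at 1.0 around x = 0
sigmoid_grad = sig * (1 - sig)    # peaks at 0.25 around x = 0

plt.figure(figsize=(6, 4))
plt.plot(x, tanh_grad, label="tanh'(x)")
plt.plot(x, sigmoid_grad, label="sigmoid'(x)")
plt.title('Gradient Comparison: Tanh vs Sigmoid')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.grid(True)
plt.legend()
plt.savefig('tanh_vs_sigmoid_gradient.png')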

ELU (Exponential Linear Unit)

Definition: f(x) = x if x > 0 else α(e^x - 1)
where α is a hyperparameter (typically 1.0).

Properties:

  • Range: (-α, ∞)
  • Pros: Allows negative outputs, reduces vanishing gradient, smooth for negative inputs.
  • Cons: Slower to compute because of the exponential; α must be chosen as a hyperparameter.
  • Use Cases: Modern CNNs, GANs, image generation.

Code Fragment: Visualize ELU function.

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-10, 10, 100)
y = elu(x)

plt.figure(figsize=(6, 4))
plt.plot(x, y, label='ELU')
plt.title('ELU Activation Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid(True)
plt.legend()
plt.savefig('elu_plot.png')

Example: In GANs, ELU helps stabilize training by allowing negative activations, improving generated image quality.
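
When α needs to differ from the default of 1.0, Keras exposes ELU both as the activation string 'elu' and as a standalone layer with a configurable alpha. The sketch below (layer sizes are arbitrary) shows both options:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(16, activation='elu'),   # string form, default alpha = 1.0
    layers.Dense(16),
    layers.ELU(alpha=0.5),                # layer form with an explicit alpha
    layers.Dense(1, activation='sigmoid')
])
model.summary()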

3. Implementation and Comparison

We’ll implement a small neural network to solve a non-linear classification problem (the moons dataset) using each activation function. The moons dataset consists of two interleaving half-circles, so the classes cannot be separated by a linear decision boundary. We’ll compare the performance of ReLU, Sigmoid, Tanh, and ELU.

Code Fragment: Full implementation with training and comparison.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

# Generate moons dataset
X, y = make_moons(n_samples=1000, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Function to create and train model
def create_and_train_model(activation, X_train, y_train, X_test, y_test):
    model = keras.Sequential([
        keras.Input(shape=(2,)),
        layers.Dense(16, activation=activation),
        layers.Dense(16, activation=activation),
        layers.Dense(1, activation='sigmoid')  # Sigmoid for binary classification
    ])

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit(X_train, y_train, epochs=50, batch_size=32,
                        validation_data=(X_test, y_test), verbose=0)
    return history

# Train models with different activation functions
activations = ['relu', 'sigmoid', 'tanh', 'elu']
histories = {}

for activation in activations:
    print(f"Training model with {activation} activation...")
    histories[activation] = create_and_train_model(activation, X_train, y_train, X_test, y_test)

# Plot validation accuracy
plt.figure(figsize=(12, 8))
for activation in activations:
    plt.plot(histories[activation].history['val_accuracy'], label=f'{activation} (val)')
    
plt.title('Validation Accuracy for Different Activation Functions')
plt.xlabel('Epoch')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.grid(True)
plt.savefig('activation_comparison.png')

# Plot decision boundary for ReLU model (example)
def plot_decision_boundary(model, X, y, activation):
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()], verbose=0)  # suppress per-batch progress output
    Z = (Z > 0.5).astype(int).reshape(xx.shape)
    
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu, edgecolors='k')
    plt.title(f'Decision Boundary with {activation} Activation')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.savefig(f'decision_boundary_{activation}.png')

# Example: Plot decision boundary for ReLU model
relu_model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
relu_model.compile(optimizer='adam', loss='binary_crossentropy')
relu_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
plot_decision_boundary(relu_model, X, y, 'ReLU')

Explanation:

  • The code generates a moons dataset with 1000 samples.
  • A neural network with two hidden layers (16 neurons each) is trained for each activation function.
  • The output layer uses sigmoid for binary classification.
  • Validation accuracy is plotted to compare performance (a quick numeric summary is sketched below).
  • A decision boundary is visualized for the ReLU model to show how it separates the classes.
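
To turn the validation-accuracy plot into a quick numeric summary, the short continuation below reads the final validation accuracy out of each history object returned above (exact numbers will vary from run to run):

# Print the final validation accuracy reached with each activation function
for activation in activations:
    final_acc = histories[activation].history['val_accuracy'][-1]
    print(f"{activation:>8}: final validation accuracy = {final_acc:.3f}")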

4. Differences and Use Cases

| Activation | Range   | Zero-Centered | Gradient Behavior             | Computation | Best For                              |
|------------|---------|---------------|-------------------------------|-------------|---------------------------------------|
| ReLU       | [0, ∞)  | No            | No vanishing gradient (x > 0) | Fast        | CNN hidden layers, feature extraction |
| Sigmoid    | (0, 1)  | No            | Vanishing gradient            | Moderate    | Binary classification output          |
| Tanh       | (-1, 1) | Yes           | Vanishing gradient            | Moderate    | Early CNN layers, normalization       |
| ELU        | (-α, ∞) | Partially     | Reduced vanishing gradient    | Slower      | Modern CNNs, GANs                     |

Key Differences:

  • Gradient Behavior: ReLU and ELU mitigate vanishing gradients better than Sigmoid and Tanh, making them suitable for deep networks.
  • Output Range: Sigmoid and Tanh are bounded, useful for specific tasks (e.g., probabilities, normalization), while ReLU and ELU allow unbounded positive outputs.
  • Zero-Centered: Tanh is zero-centered, aiding optimization in some cases, while ReLU and Sigmoid are not.
  • Computation: ReLU is the fastest (simple thresholding), while ELU is slower due to the exponential operation (see the rough micro-benchmark sketch below).
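
The computation-cost difference can be illustrated with a rough NumPy micro-benchmark (timings depend heavily on hardware and array size, so treat the numbers as indicative only):

import timeit
import numpy as np

x = np.random.randn(1_000_000)

relu_time = timeit.timeit(lambda: np.maximum(0, x), number=100)
elu_time = timeit.timeit(lambda: np.where(x > 0, x, np.exp(x) - 1), number=100)

print(f"ReLU: {relu_time:.3f}s  ELU: {elu_time:.3f}s  (100 runs on 1M elements)")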

Use Case Examples:

  • ReLU: Used in ResNet for image classification, as it enables training of very deep networks.
  • Sigmoid: Used in logistic regression-style outputs for binary image classification (e.g., detecting defects in manufacturing).
  • Tanh: Used in older architectures or when feature normalization is needed (e.g., in LSTMs for video analysis).
  • ELU: Used in GANs for generating high-quality images, as it stabilizes training with negative activations.

5. Conclusion

Activation functions are essential for enabling neural networks to solve complex computer vision tasks. ReLU is the default choice for most CNNs due to its simplicity and effectiveness. Sigmoid is ideal for binary outputs, Tanh for normalization, and ELU for advanced tasks like GANs. The choice depends on the task, network depth, and computational constraints. Experimentation (as shown in the code) helps determine the best function for a specific problem.

Run the Code:
To run the code fragments, ensure you have Python with TensorFlow, NumPy, Matplotlib, and scikit-learn installed. The code saves the individual function plots (relu_plot.png, sigmoid_plot.png, tanh_plot.png, elu_plot.png), the comparison plot (activation_comparison.png), and the decision boundary plot (decision_boundary_ReLU.png).
