Understanding and Using the Generalized Pareto Distribution (GPD)
The Generalized Pareto Distribution (GPD) is a probability distribution used in Extreme Value Theory to model values that exceed a certain high threshold. It is widely used in finance, insurance, hydrology, and environmental science.
📘 What is the GPD?
The GPD models the distribution of excess values over a threshold. That is, if we set a threshold u, the GPD models the distribution of X − u | X > u: the amount by which X exceeds u, given that it does exceed u.
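A quick numerical check helps show why the GPD suits excesses. If X itself follows a GPD with shape ξ and scale σ, then X − u given X > u is again a GPD with the same ξ and scale σ + ξu (the threshold-stability property). The sketch below uses arbitrarily chosen illustrative values for ξ, σ and u (assumptions, not from any dataset). It compares the conditional survival function S(u + y) / S(u) with scipy's `genpareto` survival function at the adjusted scale.

```python
import numpy as np
from scipy.stats import genpareto

# Illustrative values (assumptions, not from any dataset)
xi, sigma = 0.3, 1000.0          # shape and scale of X
u = 500.0                        # a threshold inside the support of X
y = np.linspace(0.0, 5000.0, 6)  # excess amounts to check

# Conditional survival of the excess: P(X - u > y | X > u) = S(u + y) / S(u)
conditional_sf = genpareto.sf(u + y, xi, scale=sigma) / genpareto.sf(u, xi, scale=sigma)

# Threshold stability: the excess is GPD with the same shape and scale sigma + xi * u
stable_sf = genpareto.sf(y, xi, scale=sigma + xi * u)

print(np.allclose(conditional_sf, stable_sf))  # True
```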
🔣 Probability Density Function (PDF)
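f(x) = (1 / σ) * (1 + ξ * x / σ)^(-1/ξ - 1)   for ξ ≠ 0
f(x) = (1 / σ) * exp(-x / σ)                  for ξ = 0 (the limit as ξ → 0, an exponential distribution)
where:
- ξ: shape parameter (controls the heaviness of the tail)
- σ > 0: scale parameter (spread)
- Support: x ≥ 0 if ξ ≥ 0; 0 ≤ x ≤ −σ/ξ if ξ < 0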
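To connect the formula to code, here is a minimal hand-written version of the density with the location fixed at 0. It is checked against `scipy.stats.genpareto.pdf`, whose shape argument `c` uses the same sign convention as ξ. The function name `gpd_pdf` is hypothetical. The ξ = 0 branch uses the exponential limit (1/σ)·exp(−x/σ), and points outside the support get density 0 (for ξ < 0 the endpoint −σ/ξ itself is treated as outside).

```python
import numpy as np
from scipy.stats import genpareto

def gpd_pdf(x, xi, sigma):
    """GPD density of an excess x (location fixed at 0); hypothetical helper."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x = np.asarray(x, dtype=float)
    if xi == 0:
        # Exponential limit as xi -> 0
        inside = x >= 0
        dens = np.exp(-np.where(inside, x, 0.0) / sigma) / sigma
    else:
        z = 1 + xi * x / sigma
        # Support: x >= 0 and z > 0 (i.e. x < -sigma/xi when xi < 0)
        inside = (x >= 0) & (z > 0)
        safe_z = np.where(inside, z, 1.0)  # placeholder outside the support avoids invalid powers
        dens = safe_z ** (-1 / xi - 1) / sigma
    return np.where(inside, dens, 0.0)

x = np.array([-1.0, 0.0, 500.0, 3000.0, 5000.0])
for xi in (0.5, 0.0, -0.25):
    # scipy's shape parameter c plays the role of xi
    print(xi, np.allclose(gpd_pdf(x, xi, 1000.0), genpareto.pdf(x, xi, scale=1000.0)))
```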
🛠️ Fitting GPD to Synthetic Insurance Claims (Python Example)
Let’s simulate a small dataset of insurance claims, set a threshold, and fit a Generalized Pareto Distribution using scipy.
🔢 Step 1: Simulate Data
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import genpareto
# Seed for reproducibility
np.random.seed(42)
# Simulated insurance claims
claims = np.concatenate([
    np.random.exponential(scale=1000, size=20),  # regular claims
    np.random.exponential(scale=5000, size=5)    # large claims
])
print("Claims sample:", np.round(claims, 2))
🎯 Step 2: Define a High Threshold
threshold = 3000  # Illustrative high threshold
excesses = claims[claims > threshold] - threshold
print("Number of excesses:", excesses.size)
print("Excesses over threshold:", np.round(excesses, 2))
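The threshold of 3000 above is an illustrative choice. A common diagnostic for choosing u is the empirical mean excess function e(u) = mean(X − u | X > u). If X itself is GPD(ξ, σ) with ξ < 1, then e(u) = (σ + ξu) / (1 − ξ), which is linear in u. In practice, one looks for the level above which a plot of e(u) against u is roughly a straight line. The helper name `mean_excess` is hypothetical. The exponential test sample is simulated purely for illustration; for that sample e(u) should stay near the scale of 1000.

```python
import numpy as np

def mean_excess(data, thresholds):
    """Empirical mean excess e(u) = mean(X - u | X > u) for each candidate threshold u."""
    data = np.asarray(data, dtype=float)
    result = []
    for u in thresholds:
        exceed = data[data > u] - u
        # NaN when no observation lies above u
        result.append(exceed.mean() if exceed.size > 0 else np.nan)
    return np.array(result)

# Illustrative data: exponential with scale 1000 (xi = 0), so e(u) should hover near 1000
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1000, size=5000)
candidate_us = np.array([0.0, 500.0, 1000.0, 2000.0])
me = mean_excess(sample, candidate_us)
print(np.round(me))
```

With the claims from Step 1 you could call `mean_excess(claims, np.linspace(0, claims.max(), 50))` and plot the result. With only 25 claims, expect a very noisy curve.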
🔧 Step 3: Fit the GPD
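# Fix the location at 0 because we fit the excesses (X - u), not the raw claims
shape, loc, scale = genpareto.fit(excesses, floc=0)
print(f"Shape (ξ): {shape:.3f}")
print(f"Scale (σ): {scale:.3f}")
# Note: with only a handful of excesses, these MLE estimates are highly uncertain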
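Because the fit above uses only a few excesses, it can help to check how `genpareto.fit(..., floc=0)` behaves when the true parameters are known. This sketch draws a larger synthetic sample from a GPD with assumed values ξ = 0.3 and σ = 1000 and then refits it. With thousands of points the estimates should land close to the truth. Tiny samples like the one above can give very different values from run to run.

```python
import numpy as np
from scipy.stats import genpareto

true_xi, true_sigma = 0.3, 1000.0  # assumed "true" parameters for the simulation
rng = np.random.default_rng(1)

# Large synthetic sample of excesses drawn from a known GPD
sample = genpareto.rvs(true_xi, loc=0, scale=true_sigma, size=5000, random_state=rng)

# Maximum likelihood fit with the location fixed at 0, as in Step 3
xi_hat, _, sigma_hat = genpareto.fit(sample, floc=0)
print(f"xi_hat = {xi_hat:.3f} (true {true_xi}), sigma_hat = {sigma_hat:.1f} (true {true_sigma})")
```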
📊 Step 4: Visualize Fit
x = np.linspace(0, max(excesses), 100)
pdf = genpareto.pdf(x, shape, loc=0, scale=scale)
plt.hist(excesses, bins=10, density=True, alpha=0.6, label="Histogram of Excesses")
plt.plot(x, pdf, 'r-', label="GPD Fit")
plt.xlabel("Excess Over Threshold")
plt.ylabel("Density")
plt.title("GPD Fit to Excess Insurance Claims")
plt.legend()
plt.grid(True)
plt.show()
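With only a few excesses, a histogram says little about fit quality. A quantile-quantile (QQ) comparison is a common alternative: sort the excesses and pair them with the fitted GPD quantiles at plotting positions (i − 0.5)/n. The helper `gpd_qq` is a hypothetical name, and this plotting-position formula is one common choice among several. Points close to the 45° line suggest a reasonable fit.

```python
import numpy as np
from scipy.stats import genpareto

def gpd_qq(excesses, shape, scale):
    """Return (theoretical, empirical) quantile pairs for a GPD QQ plot (loc = 0)."""
    empirical = np.sort(np.asarray(excesses, dtype=float))
    n = empirical.size
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    theoretical = genpareto.ppf(probs, shape, loc=0, scale=scale)
    return theoretical, empirical

# Sanity check: data placed exactly at the model's quantiles fall on the line
probs = (np.arange(1, 6) - 0.5) / 5
perfect = genpareto.ppf(probs, 0.2, scale=100.0)
theo, emp = gpd_qq(perfect[::-1], 0.2, 100.0)  # input order does not matter
print(np.allclose(theo, emp))  # True
```

With the shape and scale from Step 3, `theo, emp = gpd_qq(excesses, shape, scale)` followed by `plt.scatter(theo, emp)` and a reference line y = x gives the plot.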
🧠 Interpreting Parameters
Parameter | Role | Interpretation |
---|---|---|
ξ (Shape) | Tail heaviness | ξ > 0 → heavy tail (e.g., large risks); ξ = 0 → exponential tail; ξ < 0 → bounded tail (maximum excess −σ/ξ) |
σ (Scale) | Spread of excesses | Larger σ = more variability in extreme values |
μ (Location) | Threshold baseline (often 0) | Shift of the distribution; fixed at 0 when fitting excesses (floc=0 in Step 3) |
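The table's statements can be checked directly with scipy's `genpareto`, whose shape argument uses the same sign convention as ξ. The sketch computes the support and the mean of an excess for three illustrative shapes with an assumed σ = 1000. For ξ < 1 the mean is σ / (1 − ξ), and for ξ < 0 the support ends at −σ/ξ.

```python
from scipy.stats import genpareto

sigma = 1000.0  # illustrative scale
results = {}
for xi in (0.5, 0.0, -0.25):
    lower, upper = genpareto.support(xi, loc=0, scale=sigma)
    mean = genpareto.mean(xi, loc=0, scale=sigma)
    results[xi] = (lower, upper, mean)
    print(f"xi={xi:+.2f}: support=[{lower:.0f}, {upper}], mean={mean:.1f}")
```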
📈 Understanding the PDF
The PDF of the GPD describes how probability density is spread over excess values. For example:
- If the PDF is high near zero, most excesses are small.
- If the PDF decays slowly (ξ > 0, a power-law tail), large excesses are far more likely than under an exponential tail (ξ = 0).
- If ξ < 0, the density is confined to 0 ≤ x ≤ −σ/ξ, so no excess can exceed −σ/ξ.
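To make the three cases concrete, the sketch compares the probability that an excess is larger than ten times the scale, S(10σ), for one heavy-tailed, one exponential, and one bounded shape. The shape values and σ = 1000 are illustrative assumptions. For ξ ≠ 0 the survival function is S(x) = (1 + ξx/σ)^(−1/ξ), so ξ = 0.5 gives 6^(−2) = 1/36. For ξ = 0 the result is e^(−10). For ξ = −0.25 it is 0, because 10σ lies beyond the upper limit 4σ.

```python
import numpy as np
from scipy.stats import genpareto

sigma = 1000.0
level = 10 * sigma  # excess level to probe
tail_probs = {xi: genpareto.sf(level, xi, loc=0, scale=sigma) for xi in (0.5, 0.0, -0.25)}
for xi, p in tail_probs.items():
    print(f"xi={xi:+.2f}: P(excess > {level:.0f}) = {p:.3g}")
```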
📁 Optional: Save the Data in a CSV (Kaggle/Colab)
import pandas as pd
# Save the data to CSV.
# '/kaggle/working/' is Kaggle's writable output folder; on Colab use e.g. '/content/claims.csv'
csv_path = '/kaggle/working/claims.csv'
df = pd.DataFrame({'claims': claims})
df.to_csv(csv_path, index=False)
# Reload the data
df_loaded = pd.read_csv(csv_path)
✅ Summary
- Use GPD for modeling values above a high threshold (extremes).
- Fit shape and scale using MLE (e.g., scipy.stats.genpareto.fit).
- Interpret shape to understand tail behavior (risk of extremes).
This approach is powerful for risk analysis, reinsurance modeling, climate extremes, and more.
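Putting the summary steps together, a fitted GPD is typically used to estimate tail probabilities beyond the threshold. The peaks-over-threshold (POT) estimate is P(X > x) ≈ (n_u / n) · S_GPD(x − u), where n_u is the number of observations above u. The function name `pot_tail_prob` and the simulated exponential data are illustrative assumptions. Because the true tail probability e^(−5) for that data is known, the estimate is easy to sanity-check.

```python
import numpy as np
from scipy.stats import genpareto

def pot_tail_prob(data, threshold, x):
    """Peaks-over-threshold estimate of P(X > x); intended for x >= threshold."""
    data = np.asarray(data, dtype=float)
    excesses = data[data > threshold] - threshold
    shape, _, scale = genpareto.fit(excesses, floc=0)  # MLE with location fixed at 0
    rate = excesses.size / data.size                   # empirical P(X > threshold)
    return rate * genpareto.sf(np.asarray(x, dtype=float) - threshold, shape, loc=0, scale=scale)

# Illustrative check on exponential data (scale 1000), where P(X > 5000) = exp(-5)
rng = np.random.default_rng(2)
data = rng.exponential(scale=1000, size=20000)
estimate = pot_tail_prob(data, threshold=2000, x=5000)
print(f"POT estimate: {float(estimate):.5f}, true value: {np.exp(-5):.5f}")
```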