Case study on the ENSO effect

🧠 I built a Reinforcement Learning system using PSPO + LoRA, trained on ENSO climate dynamics — here's what I learned.

Most RL tutorials give you CartPole. I wanted something real.

So I built a full pipeline: a custom Proximal Softmax Policy Optimization (PSPO) trainer, a LoRA-adapted policy network, and an environment simulating ENSO (El Niño-Southern Oscillation) — one of the most complex coupled ocean-atmosphere systems on Earth.

Here's the breakdown.

🌊 The Problem Domain — ENSO

ENSO drives global weather patterns. El Niño and La Niña phases cause droughts, floods, and economic disruption across continents. Predicting and responding to them requires long-horizon reasoning — exactly the kind of task where standard RL struggles.

The RL agent had to learn optimal climate-monitoring interventions (from passive observation to full atmospheric reanalysis injection) across three phases: Neutral, El Niño, and La Niña. The 5-dimensional state space encodes sea surface temperature, the Southern Oscillation Index, thermocline depth, the trade wind index, and the Oceanic Niño Index.
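For concreteness, here's a hypothetical snapshot of that state vector (the ordering, units, and discrete action count below are my assumptions, not the project's exact encoding):

```python
import numpy as np

# [SST anomaly (degC), SOI, thermocline depth anomaly (m), trade wind index, ONI]
state = np.array([1.8, -1.2, -15.0, -0.8, 1.5])  # an El Nino-like snapshot

# Discrete interventions, ordered by intensity:
# 0 = passive observation ... N-1 = full atmospheric reanalysis injection
NUM_ACTIONS = 5  # illustrative count
```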

⚙️ Why PSPO instead of vanilla PPO?

Standard PPO uses a fixed clipping ratio. PSPO adds a softmax temperature parameter τ that scales the probability ratio before clipping:

r = exp((log π_new − log π_old) / τ)

With τ = 1.5, the policy updates are smoother — preventing overconfident steps in stochastic environments like ENSO where a single El Niño episode can cause catastrophic reward collapse.
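Here's a minimal NumPy sketch of that surrogate (the clipping threshold ε = 0.2 and the pessimistic-min loss form are my assumptions, carried over from standard PPO):

```python
import numpy as np

def pspo_surrogate(logp_new, logp_old, advantages, tau=1.5, eps=0.2):
    # Temperature-scaled probability ratio: tau > 1 softens each update
    ratio = np.exp((logp_new - logp_old) / tau)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # PPO-style pessimistic surrogate: take the worse of the two terms,
    # then average over the batch (this is the objective to maximize)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```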

PSPO also incorporates adaptive KL scheduling: if the KL divergence exceeds 1.5× the target, the learning rate drops 10%; if it falls below 0.5× the target, the rate increases 5%. This mirrors techniques used in RLHF for large language models — which is exactly the point. PSPO was designed with LLM fine-tuning in mind, and this project stress-tests those ideas in a climate control domain.
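A sketch of that scheduler logic (the KL target value and the LR bounds here are illustrative, not the project's actual settings):

```python
def adapt_lr(lr, kl, kl_target=0.01, lr_min=1e-5, lr_max=1e-2):
    # KL overshoot: the update was too aggressive, back off the LR by 10%
    if kl > 1.5 * kl_target:
        lr *= 0.9
    # KL undershoot: updates are too timid, speed up by 5%
    elif kl < 0.5 * kl_target:
        lr *= 1.05
    return max(lr_min, min(lr, lr_max))
```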

🔧 Why LoRA for a Policy Network?

LoRA (Low-Rank Adaptation) is typically used to fine-tune LLMs efficiently. The insight here: the same principle applies to RL policy networks.

Instead of updating all weights W (5→64→5 MLP), only two small matrices are trained per layer:

  • A ∈ ℝ^{r×d_in}
  • B ∈ ℝ^{d_out×r}

The effective weight delta is ΔW = (α/r) · B · A — with rank r=4 and α=16, the scale factor is 4×, and only ~0.4% of parameters are trainable. Base weights are frozen, preserving any pre-trained knowledge while adapting to the new ENSO task.

B is initialized to zero (so the low-rank delta starts at exactly zero), and A with small Gaussian noise. This is critical — it means the fine-tuned policy starts identical to the base policy and diverges gradually, which pairs perfectly with PSPO's conservative update philosophy.
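Putting the pieces together, a minimal NumPy sketch of such a layer (the initialization scales and the bias-free forward pass are my assumptions):

```python
import numpy as np

class LoRALayer:
    """Frozen base weights W plus a trainable low-rank delta (alpha/r) * B @ A."""
    def __init__(self, d_in, d_out, r=4, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (d_out, d_in))  # base weights, kept frozen
        self.A = rng.normal(0.0, 0.01, (r, d_in))     # small Gaussian init
        self.B = np.zeros((d_out, r))                 # zero init: delta starts at 0
        self.scale = alpha / r                        # 16 / 4 = 4x

    def forward(self, x):
        # Until B is updated, the adapted output equals the base output
        return (self.W + self.scale * self.B @ self.A) @ x
```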

📈 Longer Context via GAE

One of the core design goals was longer temporal reasoning. ENSO events unfold over months, not steps. The agent needs to connect a warming SST anomaly today to a reward collapse 20 steps later.

I implemented Generalized Advantage Estimation (GAE) with λ=0.95 and a context window of k=10 steps. This blends multi-step TD errors via a backward recursion:

A_t = δ_t + γλ · A_{t+1},  where δ_t = r_t + γ·V(s_{t+1}) − V(s_t)
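In code, that backward recursion looks like this (a sketch: γ = 0.99 and zero bootstrapping at episode end are my assumptions, and the k=10 window truncation is omitted for brevity):

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    # Backward pass: A_t = delta_t + gamma * lam * A_{t+1}
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0  # bootstrap 0 at episode end
        delta = rewards[t] + gamma * next_v - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages
```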

The longer rollout means the agent learns to anticipate phase transitions, not just react to immediate rewards. Neutral-phase episodes yield rewards of ~+15 to +20. El Niño episodes can collapse to −120. The policy learns to recognize the early SST warning signs and shift toward higher intervention actions before the crash.

📊 What the Results Show

After 200 training episodes:

  • Best reward: +20.70 (neutral phase, full stability)
  • El Niño average reward: ~−100 (high variance, hard exploration problem)
  • Adaptive LR ranged from 3e-4 → 9e-4 as KL stabilized
  • LoRA delta norm converged near zero — base policy largely preserved

The reward variance is high, and that's the honest result. ENSO is hard. The El Niño phase represents a genuinely adversarial environment that the policy struggles to stabilize without more episodes or a wider network. But the PSPO + LoRA framework holds — updates remain stable, KL stays bounded, and the adapter learns without catastrophic interference.

🛠️ The Stack

Everything is pure Python + NumPy — no PyTorch, no gym, no shortcuts:

  • Custom LoRALayer with finite-difference gradient estimation (see the sketch below)
  • Bjerknes coupled ocean-atmosphere dynamics for the environment
  • GAE, PSPO surrogate, and adaptive LR scheduler from scratch
  • Results exported to JSON, with a full interactive HTML visualization showing the architecture diagram, training curves, phase timeline, and a live episode simulator

The visualization lets you tune τ, ε, λ, and LoRA rank in real time and run a simulated ENSO episode with terminal output — useful for teaching the algorithm to anyone unfamiliar with RL.
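For reference, finite-difference gradient estimation can be sketched like this (central differences; the exact scheme the project uses may differ):

```python
import numpy as np

def finite_diff_grad(loss_fn, params, eps=1e-5):
    # Central differences: perturb each parameter up and down,
    # and measure the resulting change in the loss
    grad = np.zeros_like(params)
    for i in range(params.size):
        bump = np.zeros_like(params)
        bump.flat[i] = eps
        grad.flat[i] = (loss_fn(params + bump) - loss_fn(params - bump)) / (2 * eps)
    return grad
```

It's O(2·|params|) loss evaluations per update — viable only because LoRA keeps the trainable parameter count tiny.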

🔑 Key Takeaways

  1. PSPO's temperature scaling genuinely helps in high-variance environments. The softer ratio prevents the policy from overcorrecting during El Niño shocks.
  2. LoRA isn't just for LLMs. Parameter-efficient adapters on small policy networks give you a clean separation between "what the model already knows" and "what it's learning for this task."
  3. Context matters in RL. A longer GAE window meaningfully changes what the agent learns to value. For slow-moving, high-stakes domains (climate, finance, healthcare), k=10 or more is worth the compute.
  4. Reward shaping is where domain knowledge lives. The ENSO reward function encodes climatological intuition: penalize SST extremes quadratically, reward stability exponentially, and add a small cost to high-intervention actions to prevent trivial solutions (see the sketch after this list).
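A hypothetical shaping consistent with that description (every coefficient here is illustrative, not the project's actual reward function):

```python
import numpy as np

def enso_reward(sst_anomaly, action_level, num_actions=5):
    # Quadratic penalty on SST extremes (both El Nino and La Nina directions)
    penalty = -2.0 * sst_anomaly ** 2
    # Exponential bonus that peaks when the system sits near neutral
    stability_bonus = 5.0 * np.exp(-sst_anomaly ** 2)
    # Small linear cost on intervention strength, ruling out trivial max-action policies
    intervention_cost = -0.5 * action_level / (num_actions - 1)
    return penalty + stability_bonus + intervention_cost
```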

If you're working on RL, climate AI, or parameter-efficient fine-tuning and want to dig into the code or the visualization — happy to share. Drop a comment or connect.

https://github.com/Mathin26/ENSO-Project

#ReinforcementLearning #MachineLearning #ClimateAI #RLHF #LoRA #Python #DeepLearning #ENSO #AIResearch
