Case study: PSPO + LoRA on ENSO climate dynamics
🧠 I built a Reinforcement Learning system using PSPO + LoRA, trained on ENSO climate dynamics — here's what I learned.
Most RL tutorials give you CartPole. I wanted something real.
So I built a full pipeline: a custom Proximal Softmax Policy Optimization (PSPO) trainer, a LoRA-adapted policy network, and an environment simulating ENSO (El Niño-Southern Oscillation) — one of the most complex coupled ocean-atmosphere systems on Earth.
Here's the breakdown.
🌊 The Problem Domain — ENSO
ENSO drives global weather patterns. El Niño and La Niña phases cause droughts, floods, and economic disruption across continents. Predicting and responding to them requires long-horizon reasoning — exactly the kind of task where standard RL struggles.
The RL agent had to learn optimal climate monitoring interventions (from passive observation to full atmospheric reanalysis injection) across three phases: Neutral, El Niño, and La Niña, with a 5-dimensional state space encoding sea surface temperature, the Southern Oscillation Index, thermocline depth, trade wind index, and the Oceanic Niño Index.
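For concreteness, here's a minimal sketch of that 5-dimensional state. The function name, units, and example values are my own illustrations, not the project's actual code:

```python
import numpy as np

# Hypothetical layout of the 5-D state described above; field order,
# units, and the example values are assumptions for illustration.
PHASES = ("Neutral", "El Nino", "La Nina")

def make_state(sst_anomaly_c, soi, thermocline_depth_m, trade_wind_index, oni):
    """Pack the five ENSO indicators into the agent's observation vector."""
    return np.array([sst_anomaly_c, soi, thermocline_depth_m,
                     trade_wind_index, oni], dtype=np.float64)

# Example: a warm SST anomaly plus a negative SOI hints at El Nino onset.
state = make_state(1.2, -0.8, 140.0, -0.5, 0.9)
```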
⚙️ Why PSPO instead of vanilla PPO?
Standard PPO clips a raw probability ratio inside a fixed range ε. PSPO adds a softmax temperature parameter τ that scales the log-probability ratio before exponentiation and clipping:
r = exp((log π_new − log π_old) / τ)
With τ = 1.5, the policy updates are smoother — preventing overconfident steps in stochastic environments like ENSO where a single El Niño episode can cause catastrophic reward collapse.
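Here's a minimal NumPy sketch of the tempered, clipped surrogate. The clip range ε = 0.2 is an assumed default (the project exposes ε as a tunable), and the function name is mine:

```python
import numpy as np

def pspo_surrogate(logp_new, logp_old, advantages, tau=1.5, eps=0.2):
    """Clipped surrogate objective with a temperature-scaled ratio."""
    # Tempered ratio: r = exp((log pi_new - log pi_old) / tau)
    ratio = np.exp((logp_new - logp_old) / tau)
    # PPO-style clipping, applied to the tempered ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Objective to maximize: the pessimistic (min) bound
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

With τ > 1, the ratio is pulled toward 1, so fewer samples hit the clip boundary and each update moves the policy less.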
PSPO also incorporates adaptive KL scheduling: if KL divergence exceeds 1.5× the target, the learning rate drops 10%. If it falls below 0.5× target, the rate increases 5%. This mirrors techniques used in RLHF for large language models — which is exactly the point. PSPO was designed with LLM fine-tuning in mind, and this project stress-tests those ideas in a climate control domain.
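A sketch of that schedule, assuming the thresholds and multipliers described above (the function name is mine):

```python
def adapt_learning_rate(lr, kl, kl_target):
    """Adaptive KL scheduling: slow down on overshoot, speed up on undershoot."""
    if kl > 1.5 * kl_target:
        lr *= 0.90   # policy moved too far this update: cut the rate 10%
    elif kl < 0.5 * kl_target:
        lr *= 1.05   # updates are overly timid: raise the rate 5%
    return lr
```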
🔧 Why LoRA for a Policy Network?
LoRA (Low-Rank Adaptation) is typically used to fine-tune LLMs efficiently. The insight here: the same principle applies to RL policy networks.
Instead of updating the full weight matrices of the 5→64→5 MLP, only two small low-rank matrices, A and B, are trained per layer.
The effective weight delta is ΔW = (α/r) · B · A — with rank r=4 and α=16, the scale factor is 4×, and only ~0.4% of parameters are trainable. Base weights are frozen, preserving any pre-trained knowledge while adapting to the new ENSO task.
B is initialized to zero and A with small Gaussian noise, so ΔW starts at exactly zero. This is critical: the fine-tuned policy starts identical to the base policy and diverges gradually, which pairs perfectly with PSPO's conservative update philosophy.
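A minimal NumPy sketch of one LoRA-adapted layer under those settings. The class name is mine, and the random base weights stand in for a pre-trained layer:

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank delta (alpha/r) * B @ A."""
    def __init__(self, in_dim, out_dim, r=4, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (out_dim, in_dim))  # frozen stand-in for pre-trained weights
        self.A = rng.normal(0.0, 0.01, (r, in_dim))       # small Gaussian init
        self.B = np.zeros((out_dim, r))                   # zero init, so Delta W = 0 at start
        self.scale = alpha / r                            # 16 / 4 = 4.0

    def forward(self, x):
        delta_w = self.scale * self.B @ self.A  # only A and B are trained
        return (self.W + delta_w) @ x
```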
📈 Longer Temporal Context via GAE
One of the core design goals was longer temporal reasoning. ENSO events unfold over months, not steps. The agent needs to connect a warming SST anomaly today to a reward collapse 20 steps later.
I implemented Generalized Advantage Estimation (GAE) with λ=0.95 and a context window of k=10 steps. This blends multi-step TD errors:
A_t = δ_t + γλ · A_{t+1}
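In code, the recursion runs backwards over the rollout. A minimal sketch, assuming γ = 0.99 (the post doesn't state the discount) and a rollout that ends at a terminal state; the k=10 window truncation is omitted for brevity:

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Backward-recursive GAE: A_t = delta_t + gamma * lam * A_{t+1}."""
    T = len(rewards)
    advantages = np.zeros(T)
    next_adv, next_value = 0.0, 0.0  # zero bootstrap past the terminal step
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        next_adv = delta + gamma * lam * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages
```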
The longer rollout means the agent learns to anticipate phase transitions, not just react to immediate rewards. Neutral-phase episodes yield rewards of ~+15 to +20. El Niño episodes can collapse to −120. The policy learns to recognize the early SST warning signs and shift toward higher intervention actions before the crash.
📊 What the Results Show
After 200 training episodes, the reward variance remains high, and that's the honest result. ENSO is hard. The El Niño phase is a genuinely adversarial regime that the policy struggles to stabilize without more episodes or a wider network. But the PSPO + LoRA framework holds: updates remain stable, KL stays bounded, and the adapter learns without catastrophic interference.
🛠️ The Stack
Everything is pure Python + NumPy: no PyTorch, no gym, no shortcuts.
The visualization lets you tune τ, ε, λ, and LoRA rank in real time and run a simulated ENSO episode with terminal output — useful for teaching the algorithm to anyone unfamiliar with RL.
🔑 Key Takeaways
If you're working on RL, climate AI, or parameter-efficient fine-tuning and want to dig into the code or the visualization — happy to share. Drop a comment or connect.
#ReinforcementLearning #MachineLearning #ClimateAI #RLHF #LoRA #Python #DeepLearning #ENSO #AIResearch