How Did Google DeepMind Perform During the 2025 East Pacific Hurricane Season? AI Models Discussion 12-19-2025
This is the second blog post examining Google DeepMind's forecast performance during the 2025 season; the first looked at the Atlantic. Here we turn to the East Pacific, which is also part of the NHC's area of responsibility, and see whether the performance there was as successful as in the Atlantic.
As a reminder, the 6 forecasts/models that we’ll look at are:
GDMN (Google DeepMind Ensemble Mean)
AVNO (Operational GFS deterministic)
AEMN (Operational GFS Ensemble Mean)
HFSA (Operational HAFS-A)
HFSB (Operational HAFS-B)
OFCL (Official NHC Forecast)
Track
Figure 1 shows the overall mean track error for the East Pacific in 2025. Google DeepMind performed better than all other guidance, including the NHC official forecast. The dynamical models (GFS, GFS ensemble, HAFS-A, and HAFS-B) were a little worse and similar to one another.
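For context on what "mean track error" means here: it is the great-circle distance between each forecast position and the corresponding best-track position, averaged over a homogeneous sample of cases at each lead time. The sketch below illustrates the idea only; the arrays, function names, and sample are hypothetical, not the NHC's actual verification code or rules.

```python
import numpy as np

# Illustrative sketch of a mean track-error calculation at one lead time.
# `fcst` and `obs` are made-up (lat, lon) pairs in degrees, one row per case.
EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def great_circle_nm(lat1, lon1, lat2, lon2):
    """Haversine distance in nautical miles between two points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_NM * np.arcsin(np.sqrt(a))

def mean_track_error(forecast, best_track):
    """Mean great-circle error (n mi) over a homogeneous sample of cases."""
    errors = great_circle_nm(forecast[:, 0], forecast[:, 1],
                             best_track[:, 0], best_track[:, 1])
    return errors.mean()

# Example: three hypothetical 120-h forecast/best-track position pairs.
fcst = np.array([[15.0, -110.0], [16.2, -118.5], [17.8, -125.0]])
obs  = np.array([[15.4, -111.2], [16.0, -119.3], [18.5, -126.1]])
print(f"Mean 120-h track error: {mean_track_error(fcst, obs):.1f} n mi")
```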
One of the main drivers of Google DeepMind's superior performance was its skill during Hurricane Kiko, a long-lived storm that tracked from the East Pacific into the Central Pacific and contributed roughly one-third of the Day-5 sample for the entire season. GDMN notably outperformed all other models on track for Kiko (Figure 2).
Intensity
Figure 3 shows the 2025 East Pacific intensity forecast errors for all models. Google DeepMind performed well: it was comparable to the NHC forecast for most forecast hours (though slightly worse for Days 3-4) and to HAFS-A at 120 h. It's worth noting that all guidance leveled off near 10 knots of error, which is generally considered close to, or even below, the observational uncertainty for TCs without aircraft data. Most intensity guidance did well in the East Pacific in 2025, and Google DeepMind was no exception. One slight issue is that Google DeepMind had a negative (weak) bias relative to the NHC and HAFS forecasts (Figure 4); at Day 5, that bias was actually worse than the GFS's. This is something to improve in the future.
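For the intensity side, the error shown in Figure 3 is the mean absolute difference between forecast and best-track maximum winds, while the bias in Figure 4 is the mean signed difference, with negative values meaning forecasts that are too weak. A minimal sketch, using made-up numbers purely to illustrate a weak bias, looks like this:

```python
import numpy as np

# Illustrative sketch of intensity verification at a single lead time.
# `forecast_kt` and `best_track_kt` are hypothetical maximum winds (kt)
# for the same homogeneous sample of cases.
def intensity_stats(forecast_kt, best_track_kt):
    """Return (mean absolute error, mean signed bias) in knots."""
    diff = np.asarray(forecast_kt, dtype=float) - np.asarray(best_track_kt, dtype=float)
    mae = np.abs(diff).mean()   # the kind of error curve shown in Figure 3
    bias = diff.mean()          # negative => forecasts too weak (Figure 4)
    return mae, bias

# Example with made-up Day-5 values.
fcst = [85, 100, 60, 45]
obs  = [95, 115, 65, 40]
mae, bias = intensity_stats(fcst, obs)
print(f"MAE: {mae:.1f} kt, bias: {bias:+.1f} kt")
```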
Examining the individual storms, we see that GDMN had a notable negative (weak) bias for Kiko (Figure 5), which likely explains part of the weak bias in its overall seasonal statistics.
Overall, the East Pacific statistics are encouraging for DeepMind: it was clearly the best model for track and one of the best for intensity, despite a negative bias that will hopefully be alleviated in future versions.