How Did Google DeepMind Perform During the 2025 Atlantic Hurricane Season? AI Models Discussion 12-15-2025

Andy Hazelton

Now that the hurricane season is coming to an end, it seems like a good time to look back on how some of the AI forecast guidance performed. At some point, we’ll do a deep dive comparing the relative performance of all the various AI models, but first it seems worthwhile to look at one AI tool that has already proven valuable and been incorporated into the NHC forecast suite: Google DeepMind. Recall that Google worked with NHC this summer to make its forecast data available in real time. While the model itself is still something of a black box, real-time output is an excellent way to evaluate its performance. We’ll start with the Atlantic and examine the East Pacific in a later post.

The six forecasts/models we’ll look at in this post are:

GDMN (Google DeepMind Ensemble Mean)

AVNO (Operational GFS deterministic)

AEMN (Operational GFS Ensemble Mean)

HFSA (Operational HAFS-A)

HFSB (Operational HAFS-B)

OFCL (Official NHC Forecast)

Track

Figure 1 shows the average track errors for the Atlantic basin in 2025. Across the season as a whole, GDMN (Google DeepMind) was notably better than the other forecast models, and even outperformed the NHC official forecast. We included the GFS ensemble mean (AEMN) in the comparison because GDMN is also an ensemble mean, and averaging across members tends to smooth out the errors of individual runs. AEMN did much better than the deterministic GFS (comparable to HAFS-A and HAFS-B), but GDMN was still the clear winner.

Figure 1: Average track forecast errors from the Atlantic basin for the 2025 season for all models analyzed (including the NHC forecast).
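The post doesn’t describe the verification code behind Figure 1, but the basic idea of an average track error can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: forecast and best-track positions are already matched by lead time, the error is the great-circle (haversine) distance in nautical miles, and the helper names (`great_circle_nmi`, `mean_track_error`, `ensemble_mean_position`) are hypothetical, not NHC’s or Google’s verification tools. The last part illustrates why an ensemble mean tends to smooth out the errors of individual runs.

```python
import math
from collections import defaultdict

EARTH_RADIUS_NMI = 3440.065  # mean Earth radius in nautical miles


def great_circle_nmi(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NMI * math.asin(math.sqrt(a))


def mean_track_error(pairs):
    """Average track error by lead time (hypothetical helper).

    pairs: iterable of (lead_hours, (fcst_lat, fcst_lon), (obs_lat, obs_lon)).
    Returns {lead_hours: mean error in nmi}, ordered by lead time.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for lead, (flat, flon), (olat, olon) in pairs:
        sums[lead] += great_circle_nmi(flat, flon, olat, olon)
        counts[lead] += 1
    return {lead: sums[lead] / counts[lead] for lead in sorted(sums)}


def ensemble_mean_position(members):
    """Plain average of member lat/lon; a simplification that breaks near the dateline."""
    lats = [lat for lat, _ in members]
    lons = [lon for _, lon in members]
    return sum(lats) / len(lats), sum(lons) / len(lons)


if __name__ == "__main__":
    truth = (25.0, -75.0)
    # Two hypothetical members that miss by the same distance in opposite directions.
    members = [(25.0, -74.0), (25.0, -76.0)]
    member_errors = [great_circle_nmi(*m, *truth) for m in members]
    mean_pos = ensemble_mean_position(members)
    print(f"mean member error:   {sum(member_errors) / len(member_errors):.1f} nmi")
    print(f"ensemble-mean error: {great_circle_nmi(*mean_pos, *truth):.1f} nmi")
```

In the toy example, each member misses by roughly 54 nmi, but the errors point in opposite directions, so the averaged position lands on the observed one. Real verification involves details this sketch skips, such as making sure every model is verified over the same set of cases.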

Hurricane Imelda was one of the more challenging track cases of 2025 because of its binary interaction with Hurricane Humberto. As a result, there were some large long-range track errors, even in the NHC forecast (Figure 2). However, DeepMind again did better than all the other models at all lead times, with most of the ensemble-mean forecasts correctly keeping the storm off the Southeast coast.

Figure 2: Average track forecast errors from the Atlantic basin for Hurricane Imelda for all models analyzed (including the NHC forecast).

For Hurricane Melissa, the highest-impact case of 2025, GDMN was once again the best performer (Figure 3), outperforming NHC at all lead times except 120 h.

Figure 3: Average track forecast errors from the Atlantic basin for Hurricane Melissa for all models analyzed (including the NHC forecast).

Intensity

In previous generations of AI-based forecasts, intensity skill was extremely limited compared to track skill. The composite results from 2025 show that DeepMind has changed this paradigm: Figure 4 shows that GDMN outperformed all the other models, and even beat NHC except at the shortest lead times.

Figure 4: Average intensity forecast errors from the Atlantic basin for the 2025 season for all models analyzed (including the NHC forecast).

Figure 5 shows the intensity bias, which differs from the absolute error because it retains the sign of the error. GDMN had a slight negative (weak) bias relative to HAFS and NHC, though it was much less biased than the operational GFS. If further development of AI techniques like GDMN improves forecasts of rapid intensification, both the bias and the absolute errors should come down even more.

Figure 5: Average intensity forecast bias from the Atlantic basin for the 2025 season for all models analyzed (including the NHC forecast).
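To make the difference between absolute error and bias concrete, here is a minimal Python sketch. The function name `intensity_stats` and the sample numbers are hypothetical, not 2025 data; it assumes forecast and observed maximum winds (in knots) have already been matched by lead time, and computes both statistics from the same set of errors.

```python
from collections import defaultdict


def intensity_stats(errors):
    """Mean absolute error and mean signed bias by lead time (hypothetical helper).

    errors: iterable of (lead_hours, forecast_kt, observed_kt).
    Returns {lead_hours: (mae_kt, bias_kt)}; a negative bias means the
    forecasts were, on average, too weak.
    """
    abs_sum, signed_sum, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for lead, fcst, obs in errors:
        err = fcst - obs
        abs_sum[lead] += abs(err)   # magnitude only: feeds the absolute error
        signed_sum[lead] += err     # keeps the sign: feeds the bias
        count[lead] += 1
    return {
        lead: (abs_sum[lead] / count[lead], signed_sum[lead] / count[lead])
        for lead in sorted(count)
    }


if __name__ == "__main__":
    stats = intensity_stats([
        (48, 110, 100), (48, 90, 100),  # errors of +10 and -10 kt cancel
        (72, 90, 100), (72, 80, 90),    # both forecasts 10 kt too weak
    ])
    print(stats)  # {48: (10.0, 0.0), 72: (10.0, -10.0)}
```

Both lead times have the same 10-kt absolute error, but only the 72-h forecasts are biased, which is why the error and bias plots tell different parts of the story.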

GDMN’s pressure forecasts (Figure 6) did not perform as well as its wind forecasts, with HAFS-A and HAFS-B showing a smaller pressure bias. This suggests that the AI model may not be handling the pressure-wind relationship in a physically consistent manner, which could be another avenue for further refinement.

Figure 6: Average pressure forecast bias from the Atlantic basin for the 2025 season for all models analyzed (including the NHC forecast).
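The post doesn’t say how wind-pressure consistency should be measured, but one simple diagnostic is to convert each forecast minimum pressure into an implied maximum wind using an empirical wind-pressure relationship, then look at the residual. The sketch below assumes a generic power-law form, V = a (p_env - p_min)^b; the coefficients, the 1010-hPa environmental pressure, and the function names are illustrative placeholders, not a fitted relationship and not anything used by GDMN.

```python
def implied_vmax_kt(pmin_hpa, a=6.7, b=0.644, p_env_hpa=1010.0):
    """Max wind (kt) implied by an assumed power law V = a * (p_env - p_min) ** b.

    The default coefficients are illustrative placeholders only.
    """
    deficit = max(p_env_hpa - pmin_hpa, 0.0)  # no pressure deficit -> no implied wind
    return a * deficit ** b


def wind_pressure_residuals(forecasts, **kwargs):
    """Forecast Vmax minus the Vmax implied by the forecast Pmin (hypothetical helper).

    forecasts: iterable of (vmax_kt, pmin_hpa) pairs.
    Residuals that sit consistently away from zero suggest the wind and pressure
    forecasts are not mutually consistent under the assumed relationship.
    """
    return [vmax - implied_vmax_kt(pmin, **kwargs) for vmax, pmin in forecasts]


if __name__ == "__main__":
    # Hypothetical forecast pairs with the same pressure but different winds.
    pairs = [(100, 950), (140, 950)]
    for (vmax, pmin), r in zip(pairs, wind_pressure_residuals(pairs)):
        print(f"Vmax {vmax} kt, Pmin {pmin} hPa -> residual {r:+.1f} kt")
```

A model whose winds verify well while its pressures are biased would show residuals that drift away from zero as storms intensify. In practice the relationship also depends on storm size, latitude, and environmental pressure, so any real check would need a better-calibrated fit than this placeholder.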

Interestingly, for Hurricane Melissa (Figure 7), GDMN actually had a smaller negative (weak) bias than even HAFS and the NHC forecasts, despite the storm’s extreme intensification. Some of this may be because GDMN’s track was more accurate, so it captured the timing of rapid intensification (RI) and of weakening over land better than the other models did.

Figure 7: Average intensity forecast bias from the Atlantic basin for Hurricane Melissa for all models analyzed (including the NHC forecast).

Overall, these verification results in the Atlantic are extremely promising for the future of AI in hurricane forecasting. Google DeepMind is already becoming part of the forecast process, and hopefully other AI models will soon show similar skill and also be incorporated into operational forecasting. Next, we’ll look at the East Pacific to see whether the results there are similar to those in the Atlantic.