
Now I Really Won That AI Bet

Scott Alexander

Astral Codex Ten, July 8, 2025

Abstract

This blog post details the resolution of a bet made in June 2022 on whether AI would master image compositionality by June 2025. The bet required generating images from five complex prompts, with success defined as accurate depiction of all elements in at least three of the five. Initial attempts using DALL-E2 in 2022 failed, and Google Imagen results in September 2022, though initially claimed as a win, were deemed insufficient by an independent evaluation. Subsequent evaluations with DALL-E3 and Midjourney in January 2024 also fell short. While unconfirmed reports suggested Google Imagen 3 might have passed the test by late 2024, definitive success was achieved with ChatGPT 4o in June 2025, fulfilling the bet’s criteria. The post presents this progress as evidence against the notion of AI as merely a “stochastic parrot” incapable of genuine understanding, arguing that increased pattern-matching depth equates to a form of understanding. It concludes with a discussion of remaining limitations, such as AI’s struggle with very complex prompts, hypothesizing that this stems from difficulty maintaining prompt information, analogous to limits on human working memory, and suggesting that future improvements may hinge on advances in AI agency and planning. – AI-generated abstract.
