GPT-5.4 Pro Hits IQ 150 as AI Capability Becomes a Macro Variable
OpenAI’s latest model scores higher than 99.96% of humans on a public IQ benchmark — a jump that is no longer just a lab milestone. With CPI, FOMC minutes and PPI all due this week, AI capability growth is beginning to behave like an economic signal.
From 136 to 150: OpenAI Breaks Its Own Record
OpenAI’s GPT-5.4 Pro has reached an IQ score of 150 on TrackingAI’s public Mensa-style benchmark, a sharp step up from the 136 that its o3 model posted on the Mensa Norway test last year. A score of 150 falls in a range popularly associated with figures like Albert Einstein and Richard Feynman. On a human test, it would suggest fast abstraction, strong pattern recognition, and the ability to work through complex multi-step problems with limited guidance.
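The "99.96% of humans" figure follows from simple arithmetic on the normal curve. The sketch below assumes the common deviation-IQ convention (mean 100, standard deviation 15). The article does not state TrackingAI's exact scaling, so treat this as an illustration under that assumption rather than the benchmark's own method.

```python
import math

# Assumption: human IQ scores are normally distributed with mean 100 and
# standard deviation 15 (the common deviation-IQ convention).
MEAN = 100.0
SD = 15.0


def percentile(iq: float, mean: float = MEAN, sd: float = SD) -> float:
    """Return the share of the population (in %) scoring below `iq`."""
    z = (iq - mean) / sd
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 100.0 * cdf


for score in (136, 150):
    print(f"IQ {score}: higher than {percentile(score):.2f}% of the population")
```

Under these assumptions the script prints roughly 99.18% for a score of 136 and 99.96% for 150. The move from 136 to 150 therefore shifts the result from about 1 in 120 people to about 1 in 2,300.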
GPT-5.4 was introduced by OpenAI as its most capable and efficient frontier model for professional work, with improvements in coding, tool use, and computer use, and a context window of up to one million tokens. OpenAI also said GPT-5.4 achieved a new state of the art on GDPval and exceeded human performance on OSWorld-Verified — two separate benchmarks pointing in the same direction.
| Model | Developer | Test | IQ Score |
|---|---|---|---|
| GPT-5.4 Pro | OpenAI | TrackingAI / Mensa-style | 150 |
| o3 | OpenAI | Mensa Norway | 136 |
| Claude (latest) | Anthropic | TrackingAI public board | |
| Gemini | Google | TrackingAI public board | |
A move from 136 to 150 compresses a complex capability shift into a single portable signal. For businesses, it feeds directly into decisions around automation, software budgets and headcount planning. For markets, it adds a variable alongside rates, inflation and growth expectations.
Public Benchmarks Have Limits — But the Curve Is Still Moving
IQ-style tests remain imperfect instruments for measuring frontier models. They compress a narrow slice of cognitive performance into a single number, obscuring variation across reasoning types, context handling, creativity and real-world problem-solving. Scores are sensitive to test design, training exposure, and pattern familiarity — making them a noisy proxy for general capability.
The methodology raises familiar questions: prompt structure, reproducibility, training-set contamination, and format familiarity. Those concerns were visible when o3 hit 136, and they remain active now.
Even so, the broader pattern has become harder to dismiss. One isolated benchmark result can be explained away. A cluster of gains across public IQ-style testing, coding, browser use, desktop navigation and knowledge-work performance carries more analytical weight.
“Investors do not need to accept every premise behind an IQ-style test to recognise that a jump of this size suggests acceleration rather than drift.”
Enterprise buyers also do not need to believe IQ equals general intelligence to see that systems with stronger pattern recognition, stronger tool use and stronger long-horizon task handling are moving toward economically useful territory. This points toward systems that can search, plan, verify, navigate and produce real work across extended contexts.
AI Capability Is Beginning to Overlap with the Economic Week Ahead
The week ahead runs through macro. Markets are focused on FOMC minutes, CPI and PPI — all due within days. But beneath that surface, a second economic track is taking shape, and OpenAI sits near its centre.
Capability growth in frontier AI increasingly intersects with capital allocation. A model that pushes higher on public reasoning tests while also improving in coding, search and computer use changes how businesses think about workflow redesign. It changes what enterprise buyers expect from copilots and agents. It changes how quickly organisations move from experimentation to deployment.
Jack Dorsey recently described Block moving “from hierarchy to intelligence,” using AI to take over coordination work once handled by management layers. That direction is becoming a commercial pattern, not an outlier.
The effects run through document workflows, spreadsheet workflows, customer support, research tasks, browser automation, internal operations, code generation and verification loops. As for where spending flows next, the answer extends beyond model subscription revenue into cloud demand, chips, data centres, networking, power and software licences.
Is the growth in intelligence itself beginning to behave like a macro variable? Faster capability gains can alter enterprise spending plans, tighten competitive pressure across white-collar functions, support higher infrastructure outlays and strengthen the case for AI-linked capital expenditure even in a slower nominal growth environment.
When TrackingAI shows GPT-5.4 Pro at 150, the number falls within a market that already views OpenAI as more than a lab — it is a platform company, a deployment company, an infrastructure customer and a signal generator for adjacent sectors. The score is compact, legible and easy to circulate. Its deeper relevance comes from the same place as the company’s broader product push: the frontier is still climbing, and the economic footprint of that climb is becoming harder to keep in a separate category.
