Hallucinations in AI
In the tech-pocalypse desert of 2023, only the AI oasis can save us.
Since the 1950s, AI has been a story of the boy who cried wolf – its capabilities were repeatedly over-promised and under-delivered. Early AI pioneers Herbert Simon and Marvin Minsky anticipated AGI before 2000. Even in 2021, AI products were generally niche (Siri, Alexa), protracted (autonomous vehicles, robotics), or domain-specific (recommender systems, Face ID, Google Translate, personalized ads).
AI seemed stuck in the realm of big tech and academia. For the past decade, most people focused instead on more profitable endeavors: earning paychecks from the big tech profit pool, buying tokens in the crypto run-up, building SaaS companies that’d trade at 30x ARR, collecting management fees on ever-bigger VC funds.
When ChatGPT came online six months ago, even those working in the industry were caught off guard by its rapid growth. While tech as an industry is bigger than ever, the AI frontier is being driven by a much smaller fraction of the ecosystem than earlier technological shifts like the internet and software. How many people had a hand in building ChatGPT, Claude, and Bard? The transformer architecture was published in 2017, but very few people worked to implement it at scale. OpenAI and Anthropic combined employ fewer than 800 people.
With a sudden phase change driven by very few people, many technologists fear that the new world we’re entering will leave them behind. This fear creates hallucinations including hope (the LLM market deployment phase will unfold in a way that benefits me), cope (my market position isn’t so bad), and mope (the game is already over, and I lost).
Today, these hallucinations propel hyperbolic narratives around foundation model FUD, open source outperformance, incumbent invincibility, investor infatuation, and doomer declarations. It’s hard to know what to believe and who to trust. Every public narrative should be read with skepticism. You should take this article with a grain of salt, too (FF has invested in Scale and in a large foundation model provider).
I’ll address a few of these narratives, which are always shared with a high degree of certainty even though the trajectory of the industry is anything but certain.
Foundation models have already plateaued
If you work in AI but are not one of the few hundred researchers who helped release a frontier LLM in the past six months, you have a strong incentive to believe those models are not the terminal winners.
This is why you see a lot of companies raising $50m+ seed rounds to build “better” foundation models, where the story is something like “OpenAI and Anthropic are great, but we need [cheaper pricing, more openness, less filtering, better architecture, more localization, better domain-specificity].”
But starting a new language model company in 2023 requires coping with your non-winning market position. You might tell yourself that the current models are the proverbial end of history, that there will be many winning models, or that the winners have no moat.
A common refrain I hear is that GPT-5 won’t be that much better than GPT-4, implying we’ve reached the end of the LLM capability S-curve. I think this is a form of cope. If you believe GPT-4 is the end of what’s possible to achieve with current LLM architecture, then two conclusions follow: 1) there will be many comparable LLM providers as others catch up, and 2) LLMs will be minimally disruptive to the current state of business affairs.
Even with conservative extrapolation, it’s hard to imagine that we won’t have much more sophisticated LLMs within five years. But let’s pretend the skeptics are right and GPT-5 is a smaller leap than GPT-4 was. Even then, GPT-4 has immense latent productive value that has yet to be mined. Training on proprietary data, reducing inference costs, and building the “last mile” of UX will drive a step-function improvement in what models can do, unlocking enormous economic surplus.
Humans are slow to adopt new tools. It took boomers a decade to use Google en masse. I am still figuring out how to best use GPT-4. But between new model capabilities and software products built around current LLM technology, we won’t look back on 2023 as a plateau.
Open source models will dominate
Open source has made incredible strides over the past few months, from Whisper forks to LLaMA. That said, many people have strong reasons to hope open source models win:
Developers who don’t like relying on Big Companies
VCs and AI infra startups that profit in a “many models” world
Founders and researchers who feel like side characters of the AI story written in 2023
Big tech companies without a hyperscaler to capture infra spend (if I can’t win, I at least want you to lose)
People who want an ungovernable product (some benign, some malevolent)
Culture warriors who want a politically unbiased product
I’m not sure if the sum of this demand means that open source wins. Debating the merits of OSS publicly is delicate: open source software is a protected class. It is unbecoming to say mean things about it.
But I have a lingering feeling that open source will not be the dominant modality for frontier, productionized models. Clayton Christensen famously thought the iPhone’s closed approach would never work:
“The transition from proprietary architecture to open modular architecture just happens over and over again. It happened in the personal computer… You also see modularity organized around the Android operating system that is growing much faster than the iPhone. So I worry that modularity will do its work on Apple.”
The current open source AI discussion has echoes of the early iPhone vs Android debate. Modular, open source approaches undoubtedly democratize technologies (Android dominates in emerging markets). But integrated products almost always capture the value. The iPhone has just 21% market share in terms of shipments, but 50% of revenue and 82% of profits.
Look at the list of operating systems. There are hundreds. How many ended up mattering? The power law distribution of operating system adoption was even more extreme than smartphones. Even within open source operating systems, there was only room for one power law winner in Linux. Almost every technology market has a power law distribution of outcomes: smartphones, social media networks, cloud infrastructure, even SaaS.
Regulators may block open source frontier models on safety grounds, but even on market forces alone, open source is in a weak position for most applications. Open source provides healthy competitive pressure: models from Facebook, Databricks, Hugging Face and others will keep the pricing power and neutrality of closed models in check. But the pitch for OSS models would make more sense if the large foundation model companies were abusing their power with extortionary pricing or extreme censorship. So far, I haven’t seen much of either.
There are some contexts in which open source makes a lot of sense:
For larger enterprises, on-prem open source models could become a critical modality: enterprises want to control their destiny and keep data private. In the data warehouse context, for example, some companies choose Databricks over Snowflake for its data interoperability, compared with Snowflake’s more locked-in product.
For certain applications, local processing with near-zero latency is more important than model quality (think real-time speech-to-text). These applications can use open-source models locally to trade off accuracy and breadth for speed.
But in the context of consumer-facing LLM apps and SaaS, it is hard to beat the quality of closed-source, well-maintained, integrated products. On rigorous eval sets, frontier models still dominate open source. As expectations rise for LLM reliability, robustness, and flexibility, I anticipate that open source approaches won’t be the primary modality for applications, and that they will capture an even smaller share of the value.
Only incumbents will win in AI
The AI moper has convinced himself the game is already over, and only incumbents will capture value. It is an easy argument to make: the biggest wins of the past six months clearly favored incumbents. Microsoft’s GitHub Copilot and its OpenAI partnership are driving shareholder excitement, NVIDIA is on a tear via a GPU demand spike, and Midjourney benefits Google’s cloud platform.
Transitioning to the cloud was hard: very few 20th century software companies did it successfully, leaving room for new entrants like Salesforce, Atlassian, and NetSuite. In contrast, LLMs are surprisingly easy for incumbents to integrate. Across many software categories, it seems like incumbents scooped up the obvious opportunities: Notion rolled out an AI document editor, Epic rolled out an AI-powered EHR, Intercom and Zendesk rolled out AI customer support.
The LLM API form factor means quick integration, and incumbents’ wide distribution means they show the fastest wins. In the short term, AI seems to be a sustaining innovation, not a disruptive one. But I think there are failures of imagination at the product ideation level: it is much easier to modify what’s already working than to think from scratch about what new opportunities exist.
Even with GPT-4 capabilities (not to mention subsequent models), there will be entirely new product form factors that don’t follow the formula of incumbent software workflow + AI. They will require radically different user experiences to win customers. This is white space for new companies with built-in counter-positioning against incumbents, who can’t re-architect their entire product UX.
In just a few months of production-quality LLMs, promising subcategories emerged: enterprise search (Glean), legal automation (Harvey, Casetext), marketing automation (Typeface, Jasper), among others. Many of these apps will succeed. New categories take time: in the productivity wave of the 2010s, Figma and Notion took years to build products ready for prime time.
To say that only incumbents will win is lazy thinking: it justifies a mopey attitude of doing nothing since the game is over. The counterfactual seems preposterous: there will be a 20-year platform shift and no new startups win? Unlikely.
There are many VC-investable AI opportunities
On the opposite side from the gloomers are the VC cheerleaders, who hope for investable startups. For a long time, venture capital was synonymous with technological progress: in periods of rapid tech adoption, VCs made a lot of money.
But VCs aren’t the primary shareholders in the current leading AI companies. How many VCs hold equity in NVIDIA or Microsoft? Even among the startups with scale, Midjourney was bootstrapped, and OpenAI has only raised a small fraction of its capital from VCs.
Venture capitalists primarily invest in early stage companies, so they need to tell narratives about the opportunities for young startups: new foundation models, vector databases, LLM-driven applications. Some of these narratives could prove correct! But VCs have a strong predisposition to hallucinate market structures in the quest for AI returns.
Many firms have been quick to develop LLM infra market maps (1, 2, 3), which shoehorn AI into a 2010s SaaS framework: lots of point solutions that constitute an “AI stack”, each of which captures some value.
This seems shaky. Much of the infra today is built to compensate for the limitations of LLMs: complementary modalities, increased monitoring, chained commands, increased memory. But foundation models are delivered via API precisely to abstract infrastructure away from developers. Incremental model updates cannibalize the complementary infra stack that has been built around them. If only a handful of LLMs dominate, then a robust infrastructure ecosystem matters less.
The app layer should be more favorable for VCs, since many new entrants need venture capital to get off the ground. LLMs will unlock new software categories in stodgy industries, from legal to medical to services. But because software incumbents can take advantage of LLM APIs themselves, the app-level opportunities for startups will be limited to new product paradigms.
If there aren’t many VC-investable opportunities, then VCs will have no economics in the most important platform shift of the last several decades. At the infra layer, VCs deeply want the “fragmented AI stack” thesis to be true, both as a coping mechanism and to demonstrate to LPs and founders that they’re on top of it.
Undoubtedly, some of these VC-backed companies will be large, standalone companies. They just need to be evaluated with a cautious lens given the incentives at play. I worry that the VC deployment velocity will outpace the venture-backed wins.
AI needs to be slowed down
There are many people whose incentive is to slow LLM development down: coping non-winning foundation model developers, the hopeful CCP, workers who risk job displacement, AI doomers.
Nth place LLM developers: The “Pause Giant AI Experiments” open letter reads as thinly veiled competitive FUD. The motives were too clearly self-interested: many of the signatories were AI developers behind the frontier, and wanted time to catch up.
The CCP: A dead giveaway that China is behind the US is that its state reporting focuses on the number of LLMs being released, rather than their quality or ubiquity. Like the signatories of the open letter, China would love for the US to slow down on LLMs, to narrow its competitive disadvantage.
Workers who could be displaced: There are many people who have a professional reason to slow AI down out of fear of being replaced: copywriters, call center employees, clerical workers.
When GPT-4 fails on particular tasks in math and logic, people are eager to call it out. This type of content always seems to go viral. It is comforting to know that the AI is not yet better than us at everything.
AI doomers: I take AI takeoff risk seriously – we simply do not know how substitutive or adversarial it will become 5-10 years from now. At the same time, the AI doomer argument is unfalsifiable: an apocalyptic prophecy is impossible to prove wrong. The messengers’ value systems also create their own set of biases; I’ll leave this analysis as an exercise to the reader.
The slowdown message should be received skeptically as long as there is no empirical evidence of feedback loops tight enough to lead to an AI takeoff. Aggressive alarmism too early will discredit AI safety; claiming an AI apocalypse before there is one will desensitize people to real risks. In the meantime, the productivity benefits to humanity are too great to wait.
During the early internet era of the mid-90s – and crypto in the early 2010s – very few people showed up in the first few years, but almost all of them made money.
AI seems like the inverse. After the ChatGPT launch, everyone has shown up to participate in the AI game: researchers, developers, VCs, startups. But unlike former technological shifts, it is possible that very few new players stand to win in the deployment era of LLMs. Unlike crypto, the value prop of LLMs is immediately obvious, driving rapid demand; unlike the internet, LLM products leverage cheap ubiquitous computing to deploy quickly. The few AI players in pole position are winning at lightning speed.
Because the AI revolution is so unevenly distributed, there are very few people defining the frontier – this unfortunately means that the rest of the technology industry is some combination of uninformed and biased. Most talkers are playing catch-up, hallucinating a self-serving worldview that’s a mix of despondence (mope), denial (cope), and salesmanship (hope).
I haven’t been able to cover all the AI narratives I regularly come across, but whenever you hear one shared with certainty, alarm bells should go off in your head. Nothing is certain this early in the cycle. The truth lies in what isn’t discussed publicly, and you have to seek it out yourself via conversation and experience. The only trustworthy source is first-hand.
Thanks to Axel Ericsson, Scott Nolan, Melisa Tokmak, Philip Clark, Yasmin Razavi, Cat Wu, Divyahans Gupta, Brandon Camhi, Lachy Groom, and Roon for their thoughts and feedback on this article.