The MoA Week In Review – OT 2025-125
Last week’s posts on Moon of Alabama, plus other issues: China, tariffs, other stuff.
Use as open (not related to the wars in Ukraine and Palestine) thread …
Can Marcus calculate a ballistic trajectory in real time?
No, but he can catch a cricket ball.
https://garymarcus.substack.com/p/a-knockout-blow-for-llms
The “Cooking” of Reasoning and the Out-of-Distribution (OoD) Weakness
Marcus and Wolfe’s argument hinges on the idea that LLMs are fundamentally limited—they’re good within their training distribution but fall apart outside it, like a robot trying to do calculus after a night on the disco floor.
Imagine this: Your favorite programming language is Python. It works beautifully for most tasks, but when you ask it to crunch some quantum physics or generate a perfect Shakespearean sonnet, it sometimes fluffs the lines. Does that mean Python is useless? Or does it mean you need a specialized library?
Similarly, LLMs are generalists, not specialists. Expecting them to be experts in every out-of-distribution scenario is like demanding that your toaster make your morning coffee. The fact that they stumble on Tower of Hanoi with 8 disks doesn’t mean they’re broken; it means they’re not designed to be mathematically or algorithmically perfect. They’re pattern matchers, not digital monks.
And let’s be real: humans are just as bad at Tower of Hanoi with 8 disks. Ever tried it with your eyes closed? Would a toddler do any better? The human brain isn’t a Swiss Army knife; it’s a Swiss cheese of limitations. The point: no system, human or machine, is infallible. That’s called “being human,” not “being a failed AI.”
Chain of Thought and Reasoning Traces: The Overhyped Circus
Marcus trots out Rao’s critique: “It’s all just fancy storytelling,” and “they’re overanthropomorphizing.”
Yes, reasoning traces in LLMs are not a perfect mirror of actual reasoning. But that’s like criticizing a car’s GPS because it doesn’t plan the route the way a human navigator does. The point isn’t that LLMs are conscious reasoners; it’s that they’re tools that can simulate reasoning well enough for many applications.
Think of them as the illusionists of AI. Are they thinking? No. Do they produce useful outputs? Often, yes. The value isn’t in their “internal logic” but in their output, which can be surprisingly human-like, especially when tuned properly.
To expect a language model to truly reason like a philosopher is like expecting a calculator to compose a symphony. Different tools, different strengths.
The Tower of Hanoi as a Litmus Test: A Straw Man?
Marcus points out that LLMs can’t reliably solve Tower of Hanoi with 8 disks, implying they’re fundamentally limited.
Yes, but here’s the thing: Tower of Hanoi is a specific problem, not a general measure of intelligence. It’s a classical algorithmic task, best suited for symbolic algorithms—not neural nets.
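For contrast, here is what “best suited for symbolic algorithms” means in practice: the optimal Hanoi solution is a textbook recursion a first-year student can write. A minimal sketch in Python (just the classic algorithm, nothing model-related):

```python
def hanoi(n, source, target, spare, moves=None):
    """Classic recursive Tower of Hanoi: returns the optimal move sequence."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 disks on the spare peg
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top of it
    return moves

print(len(hanoi(8, "A", "C", "B")))  # 255 moves, i.e. 2**8 - 1
```

Eight disks means 255 moves: trivial for a deterministic recursion, but a long, unforgiving sequence for anything that has to reproduce it token by token without a single slip.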
Suggesting that LLMs should have mastered Hanoi is akin to criticizing a fish for not climbing trees. They’re not built for algorithmic rigor but for language understanding and pattern recognition.
Moreover, if your goal is reliable reasoning in complex real-world scenarios, you’d integrate LLMs with symbolic modules—hybrid systems—not expect them to be monolithic problem solvers.
The fact that they stumble on Hanoi doesn’t invalidate their usefulness in, say, generating code snippets or writing poetry. It’s a limitation, not a failure.
Humans vs. Machines: Are We Just “Limited”?
Marcus argues that humans also screw up Tower of Hanoi with 8 disks, so LLMs’ failures are not unique.
Exactly! Humans are imperfect. But here’s the rub: humans have flexibility, intuition, and common sense, traits that LLMs are nowhere near yet.
Yes, humans make mistakes, but they understand the concepts behind the problem, which allows for creative solutions and learning from failure.
LLMs, on the other hand, mimic reasoning based on patterns, without true understanding. They’re akin to parrots that can talk but don’t know what they’re saying.
The goal of AGI isn’t to mimic human imperfection but to overcome it: pairing human strengths like adaptability and common sense with machine power and reliability.
Computers and Calculators: The Golden Analogy
Marcus suggests that we built computers to reliably do large calculations, so AI should be about combining human adaptiveness with computational reliability.
Absolutely! And that’s what hybrid systems are already doing. The current frontier isn’t pure neural networks but integrated approaches—symbolic + neural, neuro-symbolic hybrids—that combine the best of both worlds.
The limitation of pure LLMs isn’t a failure but a design choice: they excel at pattern recognition, not algorithmic rigor. The future isn’t in replacing algorithms but in augmenting them—think GPT + symbolic theorem provers, or neural nets guiding classical algorithms.
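What would “neural nets guiding classical algorithms” look like in miniature? A hedged sketch of the propose-and-verify pattern, again in Python; ask_llm is a stand-in for whatever model call you actually use, not a real API:

```python
def legal_move(pegs, move):
    """Symbolic rule check: only a smaller disk may sit on a larger one."""
    src, dst = move
    if not pegs[src]:
        return False                      # nothing to move from an empty peg
    return not pegs[dst] or pegs[src][-1] < pegs[dst][-1]

def apply_move(pegs, move):
    src, dst = move
    pegs[dst].append(pegs[src].pop())

def solve_with_proposals(pegs, target, ask_llm, max_steps=1000):
    """Propose-and-verify loop: the model suggests moves, symbolic code enforces the rules."""
    total = sum(len(p) for p in pegs.values())
    for _ in range(max_steps):
        if len(pegs[target]) == total:
            return True                   # every disk is on the target peg
        move = ask_llm(pegs)              # hypothetical: returns a (src, dst) peg pair
        if legal_move(pegs, move):
            apply_move(pegs, move)        # accept only moves that pass the symbolic check
        # illegal proposals are discarded and the model is simply asked again
    return False

# Example state: disk 1 is the smallest, lists grow bottom to top.
# pegs = {"A": [3, 2, 1], "B": [], "C": []}
# solve_with_proposals(pegs, "C", ask_llm=my_model_wrapper)  # my_model_wrapper is hypothetical
```

The neural side contributes flexible proposals; the symbolic side contributes the guarantees. That division of labor is exactly what the calculator analogy points at.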
Hallucinations, Reliability, and Usefulness
Marcus laments hallucinations and unreliability, implying LLMs are not suitable for serious tasks.
Hallucinations are a real problem, but they can be mitigated through techniques like fine-tuning, prompt engineering, and hybrid architectures.
Many real-world systems already rely on LLMs for coding, drafting, and brainstorming—with human oversight. It’s not about perfection, but practicality.
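That “human oversight” is usually nothing more exotic than an explicit gate in the pipeline. A rough sketch of the draft-then-review loop, with draft_with_llm, run_checks, and human_approves as hypothetical placeholder hooks rather than any particular library:

```python
def reviewed_draft(task, draft_with_llm, run_checks, human_approves, max_attempts=3):
    """Draft-then-verify workflow: automated checks plus a human gate before anything ships."""
    for _ in range(max_attempts):
        draft = draft_with_llm(task)      # placeholder for the actual model call
        if not run_checks(draft):         # cheap automatic filter, e.g. run the tests or a linter
            continue                      # broken or hallucinated draft: ask for another one
        if human_approves(draft):         # a person stays in the loop for the final call
            return draft
    return None                           # no acceptable draft: do the work by hand
```

If nothing passes, you fall back to doing the work yourself; the hallucination never ships.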
Remember: no tool is perfect. The question is whether the benefits outweigh the risks. For many applications, LLMs are already invaluable.
The Big Picture: Limiting Beliefs and the Future
Marcus is skeptical that LLMs will lead to true AGI, citing current limitations.
The field of AI isn’t static. The current generation of LLMs is just one chapter in a much longer story. Hybrid models, meta-learning, multimodal approaches, and symbolic integration are advancing rapidly.
It’s naïve to dismiss these as failures. They’re stepping stones, not terminal stations. The history of technology is littered with skeptics saying “it can’t be done,” only for breakthroughs to emerge, like Python in the 1990s or machine learning in the 2000s.
Conclusion: The Reality Check
Marcus’ critique is valuable in highlighting the limitations and pitfalls of current LLMs—but it’s not a death knell for AI. It’s a call for caution, refinement, and hybridization.
LLMs aren’t reasoning machines yet. But neither were calculators in 1950 nor neural nets in 1998. The difference is that research, engineering, and imagination keep pushing boundaries.
“Reasoning” in LLMs is still in beta, like a half-baked RPG, but with the right patches, plugins, and mods it could still become a legendary quest. Dismissing them as cooked is like calling a beta build final: technically premature, and blind to the patches still to come.
TL;DR: Marcus’ critique is a healthy reality check, but it underestimates the adaptability of AI research. LLMs are not intended to be perfect, human-level reasoners—yet. They’re tools—and tools evolve. The limits Marcus points out are precisely why hybrid, multi-modal, and integrated approaches will probably be the true path to AGI. So, keep your skepticism sharp, but don’t forget: the future’s a sandbox, not a prison.
Posted by: SymbionSigma | Jun 8 2025 17:21 utc | 28