Logic Fails, AI Now Sees Patterns
LLMs don't truly understand—they predict patterns. Weak at strict calculation, strong at nuance via attention, shifting from logic to inference.
Why you think an AI Bot suddenly understands what you mean
We live with a persistent misunderstanding about what computers do. Ask a random passerby how a computer works, and the answer almost always revolves around ‘calculating’. We see the processor as a superior mathematician making binary calculations at lightning speed: zeros and ones, strict logic, black or white. If A happens, B follows. This image of the computer as the ultimate calculator has been correct for decades. But it is precisely this image that now stands in our way of understanding why artificial intelligence suddenly feels so human.
The reality is surprising: the modern Large Language Models (LLMs) that amaze us daily are actually quite poor at that traditional calculation work. Ask a language model for a complex sum and, as of mid-2025, there is a good chance it will stumble. Where they excel, however, is something that previously seemed impossible for a machine: capturing nuance. This is not a triumph of faster processors alone, but of a fundamental shift in strategy: we have switched from logical reasoning to probabilistic inference.
Looking through a straw, the old way
To appreciate this leap, we must go back to how ‘older’ systems read text. Until recently, language models worked strictly sequentially. You can compare this to reading a book through a straw. You see word one, then word two, then word three.
Suppose a computer reads the sentence: “The man who forgot his keys, could not enter his house.”
A traditional model had to hold on to "The man" while it worked its way through "who forgot his keys". By the time the model arrived at "could not enter his house", the context of those first words had often already faded from memory. This serial process is slow and error-prone with long texts. The model did not 'understand' the sentence as a whole, but as a series of dominoes that had to fall one by one.
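The fading-memory problem can be sketched in a few lines of Python. This is a deliberately crude illustration, not how any real model works: the decay factor of 0.5 is invented for the example, and real sequential models fade context gradually rather than by a fixed multiplier.

```python
# Toy sketch of strictly sequential reading: each new word halves the
# influence of everything read before it, so by the end of the sentence
# the subject at the start has almost vanished from memory.

def read_sequentially(words, decay=0.5):
    memory = {}  # word -> remaining influence
    for word in words:
        for seen in memory:
            memory[seen] *= decay  # older context fades at every step
        memory[word] = 1.0  # the newest word dominates
    return memory

sentence = "The man who forgot his keys could not enter his house".split()
memory = read_sequentially(sentence)
print(memory["The"])    # tiny fraction: the subject has nearly faded away
print(memory["house"])  # 1.0: the most recent word dominates
```

By the last word, "The" has been decayed ten times, which is exactly the domino effect described above: the start of the sentence is all but gone when the model needs it most.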
The Transformer, everything on the table at once
The revolution started in 2017 with the introduction of the Transformer architecture [1]. This changed everything. Instead of looking through a straw, a Transformer lays the entire text open on the table at once.
The secret behind this is called the Attention mechanism. This is the engine that makes it possible to see not only words, but also the invisible threads between them. When an LLM processes a sentence, it calculates for every word how relevant all other words in that sentence are. It does not look at the order, but at the relationship.
Take the classic example of ambiguity: “The bank stood on the bank of the river.”
An old model would hesitate: does "bank" mean a financial institution or the strip of land along a river? A Transformer, however, sees both occurrences and the word "river" all at once. Via the Attention mechanism, "river" is assigned a heavy weight in relation to the second "bank", giving it its geographical meaning, while the first "bank" keeps its reading as an institution. The AI Bot resolves the ambiguity directly, without human intervention. This ability to hold context over long distances in a text is why these models can suddenly summarize long articles without losing the thread.
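The weighting step can be sketched in plain Python. The 2-dimensional vectors below are invented by hand purely for illustration (real models learn thousands of dimensions), but the mechanics are the attention idea in miniature: score every candidate word against the query word, then let a softmax turn the scores into weights.

```python
import math

def softmax(scores):
    # Turn raw scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hand-picked toy vectors: "bank" and "river" point in similar directions.
vectors = {
    "bank":  [0.9, 0.3],
    "river": [0.8, 0.4],
    "money": [0.1, 0.9],
}

def attention_weights(query, keys):
    # Dot product of the query vector with each key vector = relevance score.
    scores = [sum(q * k for q, k in zip(vectors[query], vectors[key]))
              for key in keys]
    return dict(zip(keys, softmax(scores)))

weights = attention_weights("bank", ["river", "money"])
print(weights)  # "river" gets the heavier weight, pulling "bank" toward geography
```

Because "river" scores higher than "money", it receives the larger share of the attention weight, which is exactly the disambiguation described above.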
Predicting is not calculating
Yet the architecture is only half the story. The real paradigm shift lies in the goal of the machine. This is where it becomes abstract for many people.
Traditional computers are built for determinism. They execute binary logic. 2 + 2 must always be 4. There is no room for discussion or interpretation. It is right or it is wrong.
LLMs work fundamentally differently. Their goal is not precision, but probability. An AI assistant is, at its core, an insanely advanced prediction machine. It does not try to calculate the 'right' answer based on logical laws; it tries to predict which word is statistically most likely to follow the sequence of words it has just seen.
The machine does not reason: “Because A is true, B must be true” (causal deduction). Instead, it sees patterns: “In 99% of the texts an LLM is trained on, word B follows after this context.” This form of automation is based on correlation, not on understanding cause and effect. That sounds like a weakness, but with language, which is by definition fluid and ambiguous, this is precisely an enormous strength. Language cannot be captured in binary rules, but it can be in statistical patterns.
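A toy version of "predict the most likely next word" makes the correlation-not-deduction point concrete. The three-sentence corpus below is made up for the example; real LLMs learn from billions of documents and use neural networks rather than raw counts, but the goal is the same: pick the statistically most frequent continuation.

```python
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . "
          "the cat sat on the sofa . "
          "the dog sat on the mat .").split()

# Count every observed (word -> next word) pair in the corpus.
follows = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    follows[word][nxt] += 1

def predict_next(word):
    # No deduction, only correlation: the most frequent continuation wins.
    return follows[word].most_common(1)[0][0]

print(predict_next("cat"))  # "sat": it followed "cat" in every training sentence
print(predict_next("dog"))  # "sat": same pattern, same prediction
```

The model never reasons about cats or dogs; it has merely seen "sat" follow both, and that statistical pattern is the entire prediction.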
The language of numbers, Embeddings
But how does a computer 'see' those patterns? After all, it does not read Dutch or English. Before an AI Bot does anything with your text, a translation takes place into the only language machines truly speak: numbers.
This process is called creating Embeddings [2]. Every word (or part of a word) is converted into a long series of numbers, a so-called vector. You can view this as coordinates in a gigantic, multidimensional space.
The fascinating thing is that the meaning of words is captured in their position in that space. Words that resemble each other in meaning, such as "dog" and "pup", get coordinates that lie close together. Words that have nothing to do with each other lie far apart. The LLM then performs lightning-fast matrix calculations on these vectors. This is where powerful GPUs (graphics processing units) come into play [3]. They can execute these complex, parallel calculations on a scale that was previously unthinkable.
By calculating with the distance and direction between these vectors, the model can make connections that seem logical to us, but are purely mathematical proximity for the machine. It does not ‘get’ what a pup is, but it knows mathematically exactly where ‘pup’ is located relative to ‘dog’.
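The distance-and-direction idea can be shown with cosine similarity, a standard measure of how closely two vectors point in the same direction. The 3-dimensional vectors below are hand-made for the example (real embeddings have hundreds or thousands of learned dimensions):

```python
import math

# Invented toy embeddings: "dog" and "pup" deliberately point alike.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "pup": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm

print(cosine(embeddings["dog"], embeddings["pup"]))  # close to 1: similar meaning
print(cosine(embeddings["dog"], embeddings["car"]))  # much lower: unrelated
```

For the machine this is pure geometry: "pup" is close to "dog" not because it knows what a puppy is, but because their coordinates nearly coincide.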
The Illusion of Intelligence
Is this then intelligence? That is a philosophical question, but technically speaking we are looking at a hall of mirrors of statistics. Because the model is trained on an unimaginable amount of text (practically the entire public internet), the predictions have become so accurate that they are no longer distinguishable from human insight.
The power of this technology lies in the combination: the Transformer architecture that enables parallel processing, coupled with the probabilistic model that lets go of rigid logic. This allows us to collaborate with systems that not only execute our commands, but also seem to sense our intentions.
The next step in innovation is not that these models learn to calculate even better, but that we learn better how we can steer these statistical engines. We have not built a super calculator, but a universal translator of human patterns. And in that translation lies the key to solving issues that were always unreachable for binary logic.
Related signals
- World Models, Beyond Autoregressive Illusions
- AI Tokens: The Secret Economics
- AI Costs Plummet 1000% As Tech Giants Race to Zero
References
[1] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. arXiv. 2017. https://arxiv.org/abs/1706.03762
[2] Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed Representations of Words and Phrases and their Compositionality. arXiv. 2013. https://arxiv.org/abs/1310.4546
[3] NVIDIA. What Is a GPU and How Does It Work? NVIDIA Blog. https://www.nvidia.com/en-us/geforce/news/what-is-a-gpu/