FWIW, LLMs don't do maths. They predict what they think the next token should be, based on the tokens you give them to start from.
If they are doing maths under the hood, they've snuck an actual calculator in there and pretended it's "AI".
"We did it, Patrick! We made a technological breakthrough!"
> FWIW, LLMs don't do maths. They predict what they think the next token should be, based on the tokens you give them to start from.
>
> If they are doing maths under the hood, they've snuck an actual calculator in there and pretended it's "AI".
That's not entirely correct. They kinda "do maths". I tried to google what OP is referring to, and there are a bunch of papers showing how LLMs develop internal circuits to handle numbers. (I didn't find that specific one, though.) Of course everything is prediction with LLMs, but it seems they do form some kind of model of how to do base-10 maths. Sure, they're bad at it and nowhere near a real calculator. And you're right: what people usually do is give them tool access, either to a proper calculator or, more often, a Python sandbox, with a prompt telling the model to write a Python snippet for any arithmetic. But the usual models can also add and multiply smaller numbers without anything in the background. That's not really an achievement, though; they can simply memorize the basic multiplication tables.
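For what it's worth, this is roughly what that tool-access pattern looks like in code. Everything here (the `<python>...</python>` tag convention, the function names, the prompt wiring) is made up for illustration and isn't any particular vendor's API; it's just a minimal sketch of "model writes a snippet, a sandbox runs it, the number gets pasted back":

```python
# Minimal sketch of the "give the model a Python sandbox" pattern.
# The <python>...</python> tag convention and all function names here are
# invented for illustration; real systems each have their own tool-call format.
import io
import re
import contextlib

def run_in_sandbox(snippet: str) -> str:
    """Pretend sandbox: run the snippet and capture whatever it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(snippet, {"__builtins__": {"print": print}})  # toy isolation only
    return buf.getvalue().strip()

def answer_with_tool(model_output: str) -> str:
    """If the model emitted a <python>...</python> block, run it and splice the result in."""
    match = re.search(r"<python>(.*?)</python>", model_output, re.DOTALL)
    if not match:
        return model_output  # no tool call, just trust the raw text
    result = run_in_sandbox(match.group(1))
    return f"The result is {result}."

# Instead of doing arithmetic "in its head", the model is prompted to emit code:
fake_model_output = "<python>print(1234 * 5678)</python>"
print(answer_with_tool(fake_model_output))  # -> The result is 7006652.
```

The point is that the multiplication is done by the Python interpreter, not by the model; the model only has to learn to write the snippet.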
Correct me if I'm wrong but what you're describing still sounds like a probabilistic output, right? Meaning it's not the same output every time (meaning it can't actually be doing math).
The randomization comes into effect later. The model weights themselves don't change; they're just numbers that get multiplied. So it will always be, say, exactly 94% certain that 5 times 12 equals 60. (Numbers entirely made up by me, and I'm oversimplifying.)
I think what you mean is the sampler, and for example the temperature setting. That's added on top and switches things up, occasionally making the LLM output something that isn't its highest-confidence token. And you're right: cranking up the temperature makes it output more random answers. But if you use, for example, ChatGPT on default settings, it should almost always give the correct answer to very basic arithmetic questions with low-ish numbers. I've never seen it do anything else. And you can always set temperature to zero, and the sampler will give you deterministic output, so always the same answer for the same input.
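To make the temperature point concrete, here's a toy sampler. The token scores are invented by me, and real systems add top-k/top-p filtering and other tricks on top, so treat this as a sketch of the idea only:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float) -> str:
    """Toy sampler: turn raw scores into probabilities, then pick one token."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token -> deterministic.
        return max(logits, key=logits.get)
    # Softmax with temperature: higher temperature flattens the distribution,
    # so lower-confidence tokens get picked more often.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    threshold = random.random() * sum(weights.values())
    for tok, w in weights.items():
        threshold -= w
        if threshold <= 0:
            return tok
    return tok  # floating-point safety net

# Made-up scores for the token that follows "5 times 12 equals":
logits = {"60": 9.0, "72": 4.0, "600": 2.0}
print(sample_next_token(logits, temperature=0))    # always "60"
print(sample_next_token(logits, temperature=2.0))  # usually "60", occasionally not
```

At temperature 0 this collapses into "always take the top token", which is why the same prompt keeps giving the same answer; crank the temperature up and the weaker candidates start slipping through now and then.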
But with that said, I also tried decimal numbers, large values, proper equations, trigonometry and division, and ChatGPT is definitely not able to give reliable answers there. It's kind of surprising (to me) that it sometimes seems to pull it off, or at least have some vague idea where to go. But seems to me, elementary school level is the limit.
What the papers say is that there's more going on inside, so they don't just memorize or resort to random guessing. But I've only skimmed those papers, so I don't know the exact details; it seems they try to form some "understanding" of how addition works. We know they're not specifically made to be calculators, and from my experience they're not good at it. But they're not always just rolling dice either.
(And transformer-based large language models (plus added memory) are Turing complete, so... theoretically they could be an accurate calculator 😂 just an absurdly idiotic and wasteful one...)
Ultimately all of this is hard to compare to how a human does maths. I also memorized my multiplication tables, but other than that I do several steps in my head, pretty much how I learned it in school. An LLM, not so much; we'd have to properly read the papers to find out how they do it, but it has probably inferred different ways to arrive at answers... Unless we're talking about the "reasoning" modes, but I don't think they do proper reasoning as of today.
> It's kind of surprising (to me) that it sometimes seems to pull it off, or at least have some vague idea where to go. But seems to me, elementary school level is the limit.
Could it be that it just has those specific example calculations in its training material? So it's just regurgitating some elementary school textbook?
I think it's both, but as I said, I didn't read the papers properly; they seem to describe that there's more to it. But obviously LLMs are made to regurgitate stuff, and they're fed with textbooks and homework assignments and scientific papers. There is some effort to force them not to just memorize stuff, but obviously(?) that's the first thing they do. I don't see why they wouldn't just memorize and regurgitate what they can, and simultaneously come up with a way to predict results that are less common in the training dataset. Whether that's proper math or not is a different story.
that's why I put it in quotes :)