Since LLMs essentially decide on one character at a time, I wonder if they would have better accuracy if asked to tell you the sum backwards. That's how we teach kids to add: right to left, carry the 1.

[–] hexaflexagonbear@hexbear.net 11 points 1 year ago

I think this is essentially what they did. The point of the paper is that they made an architecture that makes the LLM more aware of an individual digit's position in a number. It helped with addition, multiplication, and even sorting.

[–] HexLlama@hexbear.net 5 points 1 year ago (1 children)

It's technically true that it decides one token at a time, but it also takes previous tokens into account.

[–] ta00000@hexbear.net 4 points 1 year ago* (last edited 1 year ago)

That's why it's easier. If you're going left to right, you have to figure out not only the sum of the first digit position but also whether there's a 1 to carry into it from the digits to its right. Going right to left, you only have to focus on one single-digit addition at a time, and you already know if there's a carry by looking at the previous addition.
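
To make that concrete, here's a rough Python sketch of the right-to-left version. The function name `add_right_to_left` is just made up for illustration, and it has nothing to do with how the paper or any LLM actually implements this. It emits the sum's digits least-significant first, so each output digit only needs the current pair of digits plus the carry from the step before.

```python
def add_right_to_left(a: str, b: str) -> str:
    """Add two non-negative decimal strings.

    Returns the digits of the sum in reverse order (ones place first),
    i.e. the order in which they become known during schoolbook addition.
    """
    # Pad the shorter number with leading zeros so the columns line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    reversed_digits = []
    # Walk the columns from the rightmost (ones) to the leftmost.
    for i in range(width - 1, -1, -1):
        total = int(a[i]) + int(b[i]) + carry
        reversed_digits.append(str(total % 10))  # digit for this column
        carry = total // 10                      # 0 or 1, used by the next column
    if carry:
        reversed_digits.append("1")              # final carry adds one more digit
    return "".join(reversed_digits)


reversed_sum = add_right_to_left("478", "256")
print(reversed_sum)        # digits of the sum, ones place first
print(reversed_sum[::-1])  # flipped back into normal reading order
```

Going left to right, the very first output digit can depend on every column to its right. In 4999 + 5001 the leading digit changes because of a carry that starts all the way at the ones place, so you'd have to work through the whole number before writing anything down.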