The fix is not that hard; it's a matter of reputation: just have the chatbot answer "I don't know" when its confidence in an answer isn't high enough.
This has been tried; it helps, but it's not enough by itself. It's one of the mitigation steps I was thinking of. And companies do work very hard to reduce hallucinations; just look at Microsoft's newest thing.
From that article:
“Trying to eliminate hallucinations from generative AI is like trying to eliminate hydrogen from water,” said Os Keyes, a PhD candidate at the University of Washington who studies the ethical impact of emerging tech. “It’s an essential component of how the technology works.”
Text-generating models hallucinate because they don’t actually “know” anything. They’re statistical systems that identify patterns in a series of words and predict which words come next based on the countless examples they are trained on.
It follows that a model’s responses aren’t answers, but merely predictions of how a question would be answered were it present in the training set. As a consequence, models tend to play fast and loose with the truth. One study found that OpenAI’s ChatGPT gets medical questions wrong half the time.
I think you misunderstand how LLMs work. The model doesn't have a confidence; it's not as if it looks at its data and says "hmm, yes, most sources say Paris is the capital of France, so that's the answer". It "just" assigns a weight to every possible next token based on its internal statistics, one of those tokens gets picked, and the process starts anew.
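If that's unclear, here's a minimal sketch of that loop in Python. Everything in it is made up for illustration: the tiny vocabulary, and the `next_token_logits` stand-in that a real LLM would replace with a trained neural network:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["Paris", "London", "Berlin", "<eos>"]

def next_token_logits(context):
    # Stand-in for a real model: one unnormalized score per vocabulary
    # token. A real LLM computes these with a neural network
    # conditioned on the whole context.
    return rng.normal(size=len(VOCAB))

def sample_next(context, temperature=1.0):
    logits = next_token_logits(context) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    # No lookup, no fact-checking: just a draw from a distribution.
    return rng.choice(VOCAB, p=probs)

context = ["The", "capital", "of", "France", "is"]
while (token := sample_next(context)) != "<eos>":
    context.append(token)
print(" ".join(context))
```

The per-token probabilities do exist, but they measure "how plausible is this word here", not "is this statement true", which is why thresholding them only gets you so far.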
Teaching the model to say "I don't know" helps a bit, and was lauded as "The Solution" a year or two ago, but it turned out not to help that much. Then you've got grounded generation, RAG, CoT, and so on, all with the goal of making the LLM more reliable. None of them solves the problem, because, as the PhD candidate said, it's inherent in how LLMs work.
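To make the RAG point concrete, here's a toy sketch. Again, everything here is invented for illustration: the documents, the `embed` stand-in (real pipelines use a trained embedding model), and the prompt template:

```python
import numpy as np

DOCS = [
    "Invoice policy: all invoices are due within 30 days.",
    "Travel policy: economy class only for flights under 6 hours.",
    "Security policy: rotate passwords every 90 days.",
]

def embed(text):
    # Stand-in embedding: hash words into a fixed-size vector.
    # A real RAG pipeline uses a trained embedding model here.
    v = np.zeros(64)
    for word in text.lower().split():
        v[hash(word) % 64] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def retrieve(query, k=1):
    # Rank documents by cosine similarity to the query.
    q = embed(query)
    scores = [float(q @ embed(d)) for d in DOCS]
    return [DOCS[i] for i in np.argsort(scores)[-k:]]

query = "When are invoices due?"
context = "\n".join(retrieve(query))
# The retrieved text is merely prepended to the prompt. The model is
# *asked* to answer from it, but nothing in the sampling loop forces
# compliance - which is why RAG reduces hallucination, not removes it.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

The retrieval side can be made very good; the generation side is still the same next-token loop, so the reliability ceiling stays.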
And no, local LLMs aren't better; they're actually much worse. The big companies are throwing billions at trying to solve this. And no, it's not because "that makes the LLM look dumb" that they haven't solved it.
Early on I looked into making a business of providing local AI to businesses, especially RAG. But no model I tried - even with the documents being part of the context - came close to being reliable enough. They all hallucinated too much. I still check on this now and then out of my own interest, and while it's gotten a lot better, it's still a big issue. Which is why you see it in the news again and again.
This is the single biggest hurdle keeping the big companies from turning their AIs from a curiosity that assists a human into a full-fledged autonomous knowledge system they can sell to customers. You can bet your dangleberries they're trying everything they can to solve this.
And if you think you have the solution that every researcher, developer, and machine learning engineer has missed, then please go prove it and collect some fat checks.