this post was submitted on 23 Mar 2026
75 points (74.5% liked)
Technology
Training data isn't stored in the model. During training, the model processes that data and uses it to adjust its parameters, or weights (usually several billion to more than a hundred billion for commercial models), which are spread across many layers, hidden dimensions, and attention heads. These weights, together with the architecture, are what get baked into the model during training.
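To get a feel for where those billions of parameters live, here's a rough back-of-the-envelope count for a decoder-only transformer. The configuration numbers are hypothetical (loosely 7B-class), not from any specific model, and the formula ignores biases and layer norms:

```python
# Rough parameter count for a decoder-only transformer.
# All configuration numbers are illustrative, not from a real model.

def transformer_params(n_layers, d_model, vocab_size):
    """Approximate parameter count, ignoring biases and norms."""
    per_layer = (
        4 * d_model * d_model    # attention: Q, K, V, output projections
        + 8 * d_model * d_model  # MLP: up- and down-projections (4x expansion)
    )
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings

# Hypothetical 7B-class configuration
total = transformer_params(n_layers=32, d_model=4096, vocab_size=32000)
print(f"{total / 1e9:.1f}B parameters")  # prints "6.6B parameters"
```

The point is that essentially all of the count comes from the weight matrices themselves; there's nowhere in there for a copy of the training data to hide.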
Inference is what happens when the model generates text from an input; by that point the weights are frozen, so it doesn't actually retain all the information it was trained on. The context window refers to how many tokens (words or word fragments) it can attend to at a time.
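The context window behaves like a sliding buffer: only the most recent N tokens are visible to the model, and everything older simply falls out. A minimal sketch, using naive whitespace splitting as a stand-in for a real subword tokenizer (BPE, SentencePiece, etc.):

```python
# Sketch of a sliding context window: the model only "sees" the most
# recent max_context tokens; anything earlier is dropped entirely.
# Whitespace splitting stands in for a real subword tokenizer here.

def build_context(history, max_context):
    tokens = history.split()       # stand-in for a real tokenizer
    return tokens[-max_context:]   # keep only the last max_context tokens

history = "the quick brown fox jumps over the lazy dog"
print(build_context(history, max_context=4))
# → ['over', 'the', 'lazy', 'dog']
```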
For every token it processes, it runs a series of calculations on embedding vectors, passing them through several layers in which each token is weighed against the context of all the other tokens in the context window. This involves heavy matrix multiplication and is very compute-intensive. Think roughly 1-4 GB of RAM for every billion parameters (depending on numeric precision), plus several more GB of RAM for the context window. There's just no way it could hold its entire training dataset in RAM at once.
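The 1-4 GB-per-billion figure follows directly from bytes per parameter, and the context window cost from the per-layer key/value cache. A quick sketch of the arithmetic, with illustrative figures (real deployments vary with quantization, batching, and overhead):

```python
# Back-of-the-envelope memory estimate: weights plus KV cache.
# All figures are illustrative approximations.

def weight_memory_gb(n_params_b, bytes_per_param):
    # fp32 = 4 bytes/param, fp16 = 2, int8 = 1
    return n_params_b * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers, d_model, context_len, bytes_per_value=2):
    # 2 tensors (K and V) per layer, one d_model vector per token each
    return 2 * n_layers * context_len * d_model * bytes_per_value / 1e9

for bytes_per_param, name in [(4, "fp32"), (2, "fp16"), (1, "int8")]:
    print(f"7B weights in {name}: {weight_memory_gb(7, bytes_per_param):.0f} GB")
# → 28 GB, 14 GB, 7 GB

print(f"KV cache, 8k context: {kv_cache_gb(32, 4096, 8192):.1f} GB")
# → 4.3 GB
```

So a 7B model at fp16 already wants ~14 GB for weights alone, before any context: storing terabytes of training text in memory on top of that is simply not in the design.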
You would need to integrate retrieval-augmented generation (RAG) to fetch the relevant data into the context window before generating a response, but that's not at all the same as the model containing all that knowledge in a stateful manner.
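The RAG flow can be sketched in a few lines: retrieve the most relevant documents first, then prepend them to the prompt. The scoring below is naive keyword overlap purely for illustration; real systems use embedding similarity search over a vector store:

```python
# Minimal RAG sketch: retrieve relevant text, then stuff it into the
# prompt. Keyword-overlap scoring is a toy stand-in for embedding search.

documents = [
    "The context window limits how many tokens a model sees at once.",
    "Bananas are botanically berries.",
    "RAG fetches external documents and adds them to the prompt.",
]

def retrieve(query, docs, top_k=1):
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:top_k]

def build_prompt(query):
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("what does RAG do with documents?"))
```

Note that the retrieved text only lives in the context window for that one response; the model's weights are untouched, which is exactly why this isn't the same as the model "knowing" the data.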