It is related, inasmuch as it’s all generated from the same prompt and the “answer” will be statistically likely to follow from the “reasoning” text. But it is only likely to follow, which is why you can sometimes see a lot of unrelated or incorrect guff in “reasoning” steps that’s misinterpreted as deliberate lying by ai doomers.
I will confess that I don’t know what shapes the multiple “let me just check” or correction steps you sometimes see. It might just be a response stream that is shaped like self-checking. It is also possible that the response stream is fed through a separate llm session when then pushes its own responses into the context window before the response is finished and sent back to the questioner, but that would boil down to “neural networks pattern matching on each other’s outputs and generating plausible response token streams” rather than any sort of meaningful introspection.
I would expect the actual systems used by the likes of openai to be far more full of hacks and bodges and work-arounds and let’s-pretend prompts that either you or I could imagine.
Some people casting their eyes over this monster of a paper have less than positive thoughts about it. I’m not going to try and summarise the summaries here, but the threads aren’t long (and are vastly shorter than the paper) so reading them wouldn’t take long.
Dr. Cat Hicks on mastodon: https://mastodon.social/@grimalkina/114690973548997443
Ashley Juavinett on bluesky: https://bsky.app/profile/analog-ashley.bsky.social/post/3lru5sua3fk25