Great article, thanks for sharing it, OP.
For example, the Anthropic researchers who located the concept of the Golden Gate Bridge within Claude didn’t just identify the regions of the model that lit up when the bridge was on Claude’s mind. They took a profound next step: They tweaked the model so that the weights in those regions were 10 times stronger than they’d been before. This form of “clamping” the model weights meant that even if the Golden Gate Bridge was not mentioned in a given prompt, or was not somehow a natural answer to a user’s question on the basis of its regular training and tuning, the activations of those regions would always be high.
The result? Clamping those weights enough made Claude obsess about the Golden Gate Bridge. As Anthropic described it:
If you ask this “Golden Gate Claude” how to spend $10, it will recommend using it to drive across the Golden Gate Bridge and pay the toll. If you ask it to write a love story, it’ll tell you a tale of a car who can’t wait to cross its beloved bridge on a foggy day. If you ask it what it imagines it looks like, it will likely tell you that it imagines it looks like the Golden Gate Bridge.
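For anyone wondering what "clamping" looks like mechanically, here is a toy sketch in NumPy. It is not Anthropic's code, and every name and number in it is made up for illustration. As I understand the published work, what gets pinned is a learned feature's activation (to a multiple of the largest value seen for it) rather than the model's weights, so that is what the sketch does.

```python
import numpy as np

def clamp_feature(activations, feature_idx, max_seen, multiplier=10.0):
    """Return a copy of `activations` with one feature pinned to
    multiplier * max_seen at every token position (hypothetical helper)."""
    clamped = activations.copy()
    clamped[..., feature_idx] = multiplier * max_seen
    return clamped

# Made-up data: 4 token positions, 8 features, values in [0, 1).
rng = np.random.default_rng(0)
feature_acts = rng.random((4, 8))

BRIDGE = 3  # hypothetical index of the "Golden Gate Bridge" feature
max_seen = feature_acts[:, BRIDGE].max()

steered = clamp_feature(feature_acts, BRIDGE, max_seen)
```

After this, the "bridge" feature is high at every position no matter what the prompt was, while every other feature is left alone. In a real model those steered activations would then be fed back into the forward pass, which is the part that changes the output.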
Okay, now imagine you're Elon Musk and you really want to change hearts and minds on the topic of, for example, white supremacy. AI chatbots have the potential to fundamentally change how a wide swath of people perceive reality.
If we think the reality distortion bubble is bad now (MAGAsphere, etc.), how bad will things get when people implicitly trust the output of these models, and the underlying process by which a model decides how to present information is weighted toward particular ideologies? The rest of the article explores how chatbots attempt to build a profile of the user and serve different content based on that profile. Combine the two and it becomes even easier to identify the people most susceptible to mis/disinformation and deliver it to them in a cheery tone.
How might we, as a society, create a process for overseeing these "tools"? We need a cohesive approach that can be explained to policymakers in a way that calls them to action on this issue.