doctor0710

joined 4 weeks ago
doctor0710@lemmy.zip 6 points 2 days ago

As I mentioned in my other comment, it looks like this dataset contains various strings that trigger refusals: https://huggingface.co/datasets/mlabonne/harmful_behaviors

doctor0710@lemmy.zip 12 points 2 days ago

Also, you might want to look into the Heretic project, which aims to remove safeguards from local models, since those safeguards might be similar to what's in the larger versions. Figuring out which phrases they test the safeguards with might give some decent results.

[–] doctor0710@lemmy.zip 22 points 2 days ago (10 children)

Asking questions about Chinese politics and/or Tiananmen Square stops most China-based AI models, like Qwen and whatever is on Huawei phones. They aren't that high-traffic yet, but they are certainly on the list of "all AI models".

[–] doctor0710@lemmy.zip 30 points 6 days ago (4 children)

In the video game Wolfenstein: The New Order there are mentions from a character who might be loosely based on her, or at least the details are very similar. Look up "Ramona's diary" if you'd like to read more.