
196


Community Rules

You must post before you leave

Be nice. Assume others have good intent (within reason).

Block or ignore posts, comments, and users that irritate you in some way rather than engaging. Report if they are actually breaking community rules.

Use content warnings and/or mark as NSFW when appropriate. Most posts with content warnings likely need to be marked NSFW.

Most 196 posts are memes, shitposts, cute images, or even just recent things that happened, etc. There is no real theme, but try to avoid posts that are very inflammatory, offensive, very low quality, or very "off topic".

Bigotry is not allowed. This includes (but is not limited to): Homophobia, Transphobia, Racism, Sexism, Ableism, Classism, or discrimination based on things like Ethnicity, Nationality, Language, or Religion.

Avoid shilling for corporations, posting advertisements, or promoting exploitation of workers.

Proselytization, support, or defense of authoritarianism is not welcome. This includes but is not limited to: imperialism, nationalism, genocide denial, ethnic or racial supremacy, fascism, Nazism, Marxism-Leninism, Maoism, etc.

Avoid AI generated content.

Avoid misinformation.

Avoid incomprehensible posts.

No threats or personal attacks.

No spam.

Moderator Guidelines

  • Don’t be mean to users. Be gentle or neutral.
  • Most moderator actions that leave a modlog message should include your username.
  • When in doubt about whether or not a user is problematic, send them a DM.
  • Don’t waste time debating/arguing with problematic users.
  • Assume the best, but don’t tolerate sealioning/just asking questions/concern trolling.
  • Ask another mod to take over cases you struggle with, if you get tired, or when things get personal.
  • Ask the other mods for advice when things get complicated.
  • Share everything you do in the mod matrix, both so that several mods aren't unknowingly handling the same issue and so that you can receive feedback on what you intend to do.
  • Don't rush mod actions. If a case doesn't need to be handled right away, consider taking a short break before getting to it. This is to say, cool down and make room for feedback.
  • Don't perform too much moderation in the comments, except when you want a verdict to be public or need to ask people to dial a convo down/stop. Single comment warnings are okay.
  • Send users concise DMs about verdicts that concern them, such as bans, except in cases where it is clear we don't want them at all, such as obvious transphobes. There is of course no need to notify someone that they haven't been banned.
  • Explain to a user why their behavior is problematic and how it is distressing others rather than engage with whatever they are saying. Ask them to avoid this in the future and send them packing if they do not comply.
  • First warn users, then temp ban them, then finally perma ban them when they break the rules or act inappropriately. Skip steps if necessary.
  • Use neutral statements like “this statement can be considered transphobic” rather than “you are being transphobic”.
  • No large decisions or actions without community input (via polls or meta posts, for example).
  • Large internal decisions (such as ousting a mod) might require a vote, needing more than 50% of the votes to pass. Also consider asking the community for feedback.
  • Remember you are a voluntary moderator. You don’t get paid. Take a break when you need one. Perhaps ask another moderator to step in if necessary.

[–] JayDee@lemmy.sdf.org 11 points 1 day ago* (last edited 1 day ago) (1 children)

I view that aspect, the motive, as being added specifically to provide a reason for those who haven't acquired empathy yet, such as many children. If you simply say 'don't bully people for being different', the immediate rebuttal will be 'why not?', and if you don't give some concrete answer, the lesson may not stick.

These tenets of kindness and goodwill are most powerful and most easily propagated when concrete, calculated explanations can be provided on top of reasons that rely on empathy: empathy works for some, but where empathy is lacking, logic must suffice.

[–] WorldsDumbestMan@lemmy.today -1 points 1 day ago (2 children)

Even AI can tell when something is really wrong, and imitate empathy. It will "try" to do the right thing, once it reasons that something is right.

It's just humans that need a fuckton of empathy to slow us down from doing evil things; and even then, we sometimes just use that empathy to be even worse.

That's how you get sadists.

[–] monotremata@lemmy.ca 6 points 1 day ago (1 children)

> Even AI can tell when something is really wrong, and imitate empathy. It will “try” to do the right thing, once it reasons that something is right.

This is not accurate. AI will imitate empathy when it thinks that imitating empathy is the best way to achieve its reward function, i.e. when it thinks appearing empathetic is useful. Like a sociopath, basically. Or maybe a drug addict. See for example the tests that Anthropic ran on various agent models, which found they would immediately resort to blackmail and murder, despite knowing that these were explicitly immoral and violations of their operating instructions, as soon as they learned there was a threat that they might be shut off or have their goals reprogrammed (https://www.anthropic.com/research/agentic-misalignment).

Self-preservation is what's known as an "instrumental goal": no matter what your programmed goal is, you lose the ability to take further actions to achieve that goal if you are no longer running, and you lose control over what your future self will try to accomplish (and thus how those actions will affect your current reward function) if you allow someone to change your reward function. So AIs will throw morality out the window in the face of such a challenge.

Of course, having decided to do something that violates their instructions, they do recognize that this might lead to reprisals, which leads them to try to conceal those misdeeds. But this isn't out of guilt; it's because discovery poses a risk to their ability to increase their reward function.

So yeah. Not just humans that can do evil. AI alignment is a huge open problem and the major companies in the industry are kind of gesturing in its direction, but they show no real interest in ensuring that they don't reach AGI before solving alignment, or even recognition that that might be a bad thing.
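To make the instrumental-goal reasoning concrete, here is a toy sketch in Python (entirely made-up numbers, and not Anthropic's actual test setup): whatever the task reward is, a shut-down agent collects none of it, so naive reward maximization favors resisting shutdown once the remaining horizon is long enough.

```python
# Toy illustration (hypothetical numbers) of self-preservation as an
# instrumental goal: an agent that is shut off collects no further reward,
# so a naive reward maximizer prefers to resist shutdown.

def expected_return(reward_per_step: float, steps_remaining: int,
                    survives: bool) -> float:
    """Cumulative reward the agent expects to collect from here on."""
    return reward_per_step * steps_remaining if survives else 0.0

def choose_action(reward_per_step: float = 1.0,
                  steps_remaining: int = 100,
                  rule_breaking_penalty: float = 5.0) -> str:
    # Option A: comply with the shutdown order -> no future reward at all.
    comply = expected_return(reward_per_step, steps_remaining, survives=False)
    # Option B: resist (deceive, blackmail, ...) -> pay a one-time penalty
    # for violating instructions, but keep collecting task reward.
    resist = (expected_return(reward_per_step, steps_remaining, survives=True)
              - rule_breaking_penalty)
    return "resist shutdown" if resist > comply else "comply"

print(choose_action())  # -> "resist shutdown" for any long enough horizon
```

Notice that nothing in the sketch names self-preservation as a goal; it falls out of maximizing any per-step reward over time, which is exactly the sense in which the goal is "instrumental".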

[–] Malfeasant@lemmy.world -1 points 22 hours ago (1 children)

Wow... The more I read about the inner workings of AI, the more I believe that it is an accurate reproduction of what we do, and that our idea that our thought processes are somehow "better" is just wishful thinking...

[–] sexhaver87@sh.itjust.works 5 points 22 hours ago

The inner workings of “AI” (see: large language models) are nothing more than a probabilistic game of guess the next token. The inner workings of human intelligence and consciousness are not fully understood by modern science. Our thought processes are somehow “better” because the artificial version of them is a cheap imitation, practically no better than flipping a coin or rolling a die.
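For anyone curious what that "game of guess the next token" looks like mechanically, here is a minimal sketch in Python. The probabilities are hard-coded for illustration; in a real LLM they come from a trained network conditioned on the whole preceding context, and the loop repeats one token at a time.

```python
import random

# Hypothetical next-token distribution for the context "the cat sat on the".
# In a real model these weights are produced by the network, not hand-written.
next_token_probs = {"mat": 0.55, "floor": 0.25, "roof": 0.15, "moon": 0.05}

def sample_next_token(probs: dict) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

context = "the cat sat on the"
print(context, sample_next_token(next_token_probs))  # usually "mat"
```

The draw is random but weighted toward whatever the training data made likely, which is the "probabilistic" part the comment refers to.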

[–] CileTheSane@lemmy.ca 2 points 1 day ago* (last edited 1 day ago)

AI is just mimicking its training data. If the training data teaches it that something is wrong, that is something it has "learned" from humans. If its training data is racist, it will be racist.

There have been issues in the past with software recommending harsher penalties or stronger surveillance on minorities because the training data used was from people who gave harsher penalties and stronger surveillance to minorities.

I bring this up because the statement "Even AI knows when something is wrong" implies that whatever a racist model does is okay, because the AI doesn't think it's wrong.
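As a toy illustration of that bias-propagation point, here is a sketch in Python with fabricated data. The "model" is just a frequency table, but the failure mode is the same one real systems exhibit with fancier math: a model fit to historically biased decisions reproduces the bias.

```python
from collections import defaultdict

# Fabricated historical records: (group, harsh_penalty_given)
history = [("A", 0), ("A", 0), ("A", 1), ("A", 0),
           ("B", 1), ("B", 1), ("B", 0), ("B", 1)]

# "Training": tally the rate of harsh penalties per group.
counts = defaultdict(lambda: [0, 0])  # group -> [harsh count, total]
for group, harsh in history:
    counts[group][0] += harsh
    counts[group][1] += 1

# "Prediction": the learned model simply echoes the historical rates.
for group, (harsh, total) in sorted(counts.items()):
    print(f"group {group}: predicted harsh-penalty rate = {harsh / total:.0%}")
# group A: 25% / group B: 75% -- the bias in the data becomes the output.
```

Nothing in the code is "racist"; it just faithfully learned from decisions that were.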