hedgehog

joined 2 years ago
[–] hedgehog@ttrpg.network 3 points 4 hours ago

To be clear, I agree that the line you quoted is almost assuredly incorrect. If they changed it to "thousands of deepfake apps powered by open source technology" then I'd still be dubious, simply because it seems weird that there would be thousands of unique apps that all do the same thing, but that would at least be plausible. Most likely they misread something like https://techxplore.com/news/2025-05-downloadable-deepfake-image-generators.html, took "model variant" (which in this context generally means LoRA) to mean "app," and jumped too hard on the "everything is an open source app" bandwagon.

I did some research - browsing https://github.com/topics/deepfakes (which has 153 total repos listed, many of which are focused on deepfake detection), searching DDG, clicking through to related apps from GitHub repos, etc.

In terms of actual open source deepfake apps, let's assume that "app" means, at minimum, a piece of software you can run locally on consumer-targeted hardware (generally at least an Nvidia desktop GPU), and that it counts regardless of whether you have to write custom code to use it (so long as the code is included), use the CLI, hit an API, use a GUI app, a web browser, or a phone app. Considering only apps whose primary use case is creating deepfakes by face swapping videos, there are still several:

  • Roop
  • Roop Unleashed
  • Rope
  • Rope Live
  • VisoMaster
  • DeepFaceLab
  • DeepFaceLive
  • Reactor UI
  • inswapper
  • REFace
  • Refacer
  • Faceswap
  • deepfakes_faceswap
  • SimSwap

If you included forks of all those repos, then you'd definitely get into the thousands.

If you count video generation applications that can imitate people using, at minimum, Img2Img and 1 Lora OR 2 Loras, then these would be included as well:

  • Wan2GP
  • HunyuanVideoGP
  • FramePack Studio
  • FramePack eichi

And if you count the tools that integrate those, then these probably all count:

  • ComfyUI
  • Invoke AI
  • SwarmUI
  • SDNext
  • Automatic1111 SD WebUI
  • Fooocus
  • SD WebUI Forge
  • MetaStable
  • EasyDiffusion
  • StabilityMatrix
  • MochiDiffusion

If the potential criminals use easier ready-made (commercial) web-services instead of buying a RTX 5090, learning ComfyUI, dealing with the steep learning curve etc, we’d know we have to primarily fight those apps and services, not necessarily the generative AI tools.

This is the part where, to be able to answer that, someone would need to go and actually test out the deepfake apps and compare their outputs. I know that they get used for deepfakes because I've seen the outputs, but as far as I know, every single major platform - e.g., Kling, Veo, Runway, Sora - has safeguards in place to prevent nudity and sexual content. I'd be very surprised if they were being used en masse for this.

In terms of the SaaS apps used by people seeking to create nonconsensual, sexually explicit deepfakes... my guess is those are actually not really part of the figure that's being referenced in this article. It really seems like they're talking about doing video gen with LoRAs rather than doing face swaps.

[–] hedgehog@ttrpg.network 3 points 12 hours ago (2 children)

Without searching for them myself to confirm, it’s plausible, especially if you take it to mean “apps leveraging open source AI technology.”

There are a ton of open source AI repos, many of which provide video related capabilities. The number of true open source AI models is very slim, but “Open weight” AI models are commonly referred to as open source, and from the perspective of building your app, fine tuning the model, or creating Loras for it, open weight is good enough.

Some Loras come with details on the training data set, so even if the base model is only open weights, the Lora can still be open source.

Until recently, Civitai had Loras for famous people, e.g., Emma Watson, and apparently just regular people. There was a post here last week, I think (or maybe to some other community), to 404 Media, about those being taken down thanks to credit card processors drawing a line in the sand at deepfake imagery.

ComfyUI is a self hostable AI platform (and there are also many hosts that offer it) that lets you build a workflow from multiple nodes, each of which generally integrates some open source AI tech that was otherwise released. For example, there are nodes that add the capabilities to perform:

  • image generation with Stable Diffusion, Flux, Hidream, etc
  • TTS with KokoroTTS, Piper, F5 TTS, etc
  • video generation with AnimateDiff, Cog, Wan2.1, Hunyuan, FramePack, FantasyTalking, Float
  • video modification, i.e., LatentSync, which takes a video and lipsyncs it to a provided audio file
  • image manipulation, i.e., controlnet, img2img, inpainting, outpainting, or even specific tasks like “remove the background” or “change the face to this other face”

If you think of a deepfake as just a video of a recognizable person doing a thing, you can create a deepfake by:

  • taking an existing video and swapping the face in each frame
  • faceswap video specific approaches, i.e., Roop.
  • an image to video workflow, i.e., with Wan: “the person dances.” You can expand the options available with Wan by using Loras.
  • a text to video workflow, where you use a Lora for that person
  • an image+audio to video workflow, i.e., with FantasyTalking/Float, creating a lipsync to an audio file you provide
  • a video+audio to video workflow with LatentSync to make it look like they said something different, particularly using a TTS (like F5 TTS) that does voice cloning to generate the new audio

My suspicion is that most of the AI apps that are available online are just repackaging these open source technologies, but are not open source themselves. There are certainly some, of course, though the ones I know of are more generic and not deepfake specific (ComfyUI, SwarmUI, Invoke AI, Automatic1111, Forge, Fooocus, n8n, FramePack Studio, FramePack Eichi, Wan2GP, etc.).

This isn’t a licensing issue, as many open source projects are licensed with MIT or Apache licenses, which don’t require you to open source derivative products. Even if they used the GPL, it wouldn’t be required for a SaaS web app. Only the AGPL would protect against that, and even then, only the changes to the AGPL library would need to be shared; the front end app could still be proprietary.

The other issue could be them not knowing what “app” means. If you think of a Lora as an app, then the sentence might be accurate. I don’t know for sure that there were thousands of Loras for people that published their training data, but I wouldn’t be surprised if that were the case.

[–] hedgehog@ttrpg.network 1 points 6 days ago* (last edited 6 days ago)

I think the best way to handle this would be to just encode everything and upload all files. If I wanted some amount of history, I'd use some file system with automatic snapshots, like ZFS.

If I wanted to do what you've outlined, I would probably use rclone with filtering for the extension types or something along those lines.

If I wanted to do this with Git specifically, though, this is what I would try first:

First, add lossless extensions (*.flac, *.wav) to my repo's .gitignore

Second, schedule a job on my local machine that:

  1. Watches for changes to the local file system (e.g., with inotifywait or fswatch)
  2. For any new lossless file, if there isn't already an accompanying lossy file (identified by being collocated and having the exact same filename, sans extension, with an accepted extension, e.g., .mp3 or .ogg - possibly also with a confirmation that the codec is up to my standards via a call to ffprobe, avprobe, mediainfo, exiftool, or something similar), encodes the file to my preferred lossy format.
  3. Uses git status --porcelain to check whether there have been any changes.
  4. If so, run git add --all && git commit --message "Automatic commit" && git push
  5. Optionally, automatically craft a better commit message by checking which files have been changed, generating text like Added album: "Satin Panthers - EP" by Hudson Mohawke or Removed album: "Brat" by Charli XCX; Added album "Brat and it's the same but there's three more songs so it's not" by Charli XCX
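
As a rough sketch of steps 2 through 4 (I haven't run this against a real library, so treat it as a starting point): the Python below finds lossless files with no lossy sibling, encodes them, and commits only if git status --porcelain reports anything. The function names, the extension sets, and the ffmpeg settings (libmp3lame at -q:a 2) are my own assumptions, not anything from the thread; swap in whatever encoder and quality you prefer. It compares extensions as-is, so an uppercase .MP3 would not count as a match on a case-sensitive file system.

```python
import subprocess
from pathlib import Path

# Assumed extension sets; adjust to taste.
LOSSLESS = {".flac", ".wav"}
LOSSY = {".mp3", ".ogg"}


def missing_lossy(root: Path) -> list[Path]:
    """Return lossless files under root with no lossy sibling of the same stem."""
    missing = []
    for src in sorted(root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in LOSSLESS:
            continue
        # A sibling is the same path with a lossy extension swapped in.
        if not any(src.with_suffix(ext).exists() for ext in LOSSY):
            missing.append(src)
    return missing


def has_changes(porcelain: str) -> bool:
    """git status --porcelain prints one line per changed path, nothing if clean."""
    return bool(porcelain.strip())


def sync_once(root: Path) -> None:
    """One pass of steps 2-4. Needs ffmpeg and git on PATH; not called here."""
    for src in missing_lossy(root):
        dst = src.with_suffix(".mp3")
        # -n tells ffmpeg never to overwrite an existing output file.
        subprocess.run(
            ["ffmpeg", "-n", "-i", str(src),
             "-codec:a", "libmp3lame", "-q:a", "2", str(dst)],
            check=True,
        )
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=root, check=True, capture_output=True, text=True,
    ).stdout
    if has_changes(status):
        subprocess.run(["git", "add", "--all"], cwd=root, check=True)
        subprocess.run(
            ["git", "commit", "--message", "Automatic commit"],
            cwd=root, check=True,
        )
        subprocess.run(["git", "push"], cwd=root, check=True)
```

You would call sync_once from whatever the file watcher triggers (or from a plain cron job), passing the path of the repo's working copy.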

Third, schedule a job on my ~~remote machine~~ server that runs git pull at regular intervals.

One issue with this approach is that if you delete a file (as opposed to moving it), the space is not recovered on your local or your server. If space on your server is a concern, you could work around that by running something like the answer here (adjusting the depth to an appropriate amount for your use case):

git fetch --depth=1
git reflog expire --expire-unreachable=now --all
git gc --aggressive --prune=all

Another potential issue is that what I described above involves having an intermediary Git remote to push to and pull from, e.g., one running on a hosted Git forge, like GitHub, Codeberg, etc. This could result in getting copyright complaints or something along those lines, though.

Alternatively, you could use your server as the git server (or check out forgejo if you want a Git forge as well), but then you can't use the above trick to prune file history and save space from deleted files (on the server, at least - you could on your local, I think). If you then check out your working copy in a way such that Git can use hard links, you should at least be able to avoid needing to store two copies on your server.

~~The other thing to check out, if you take this approach, is git lfs.~~ EDIT: Actually, I take that back - you probably don't want to use Git LFS.

[–] hedgehog@ttrpg.network 2 points 1 week ago (1 children)

Are you talking about a warning for a self signed cert or for not using HTTPS?

[–] hedgehog@ttrpg.network 3 points 1 week ago

It was already known before the whistleblower that:

  1. Siri inputs (all STT at that time, really) were processed off device
  2. Siri had false activations

The “sinister” thing that we learned was that Apple was reviewing those activations to see if they were false, with the stated intent (as confirmed by the whistleblower) of using them to reduce false activations.

There are also black box methods to verify that data isn’t being sent and that particular hardware (like the microphone) isn’t being used, and there are people who look for vulnerabilities as a hobby. If the microphones on the most/second most popular phone brand (iPhone, Samsung) were secretly recording all the time, evidence of that would be easy to find and would be a huge scoop - why haven’t we heard about it yet?

Snowden and Wikileaks dumped a huge amount of info about governments spying, but nothing in there involved always on microphones in our cell phones.

To be fair, an individual phone is a single compromise away from actually listening to you, so it still makes sense to avoid having sensitive conversations within earshot of a wirelessly connected microphone. But generally that’s not the concern most people should have.

Advertising tracking is much more sinister and complicated and harder to wrap your head around than “my phone is listening to me” and as a result makes for a much less glamorous story, but there are dozens, if not hundreds or thousands, of stories out there about how invasive advertising companies’ methods are, about how they know too much, etc. Think about what LLMs do with text. The level of prediction that they can do. That’s what ML algorithms can do with your behavior.

If you’re misattributing what advertisers know about you to the phone listening and reporting back, then you’re not paying attention to what they’re actually doing.

So yes - be vigilant. Just be vigilant about the right thing.

[–] hedgehog@ttrpg.network 5 points 1 week ago (2 children)

proven by a whistleblower from apple

Assuming you have an iPhone. And even then, the whistleblower you’re referencing was part of a team who reviewed utterances by users with the “Hey Siri” wake word feature enabled. If you had Siri disabled entirely or had the wake word feature disabled, you weren’t impacted at all.

This may have been limited to impacting only users who also had some option like “Improve Siri and Dictation” enabled, but it’s not clear. Today, the Privacy Policy explicitly says that Apple can have employees review your interactions with Siri and Dictation (my understanding is the reason for the settlement is that they were not explicit that human review was occurring). I strongly recommend disabling that setting, particularly if you have a wake word enabled.

If you have wake words enabled on your phone or device, your phone has to listen to be able to react to them. At that point, of course the phone is listening. Whether it’s sending the info back somewhere is a different story, and there isn’t any evidence that I’m aware of that any major phone company does this.

[–] hedgehog@ttrpg.network 2 points 1 week ago (1 children)

Sure - Wikipedia says it better than I could hope to:

As English-linguist Larry Andrews describes it, descriptive grammar is the linguistic approach which studies what a language is like, as opposed to prescriptive, which declares what a language should be like.[11]: 25  In other words, descriptive grammarians focus analysis on how all kinds of people in all sorts of environments, usually in more casual, everyday settings, communicate, whereas prescriptive grammarians focus on the grammatical rules and structures predetermined by linguistic registers and figures of power. An example that Andrews uses in his book is fewer than vs less than.[11]: 26  A descriptive grammarian would state that both statements are equally valid, as long as the meaning behind the statement can be understood. A prescriptive grammarian would analyze the rules and conventions behind both statements to determine which statement is correct or otherwise preferable. Andrews also believes that, although most linguists would be descriptive grammarians, most public school teachers tend to be prescriptive.[11]: 26

[–] hedgehog@ttrpg.network 4 points 1 week ago (4 children)

You might be interested in reading up on the debate of “Prescriptive vs Descriptive” approaches in a linguistics context.

[–] hedgehog@ttrpg.network 2 points 1 week ago

You should try watching the live action series next - I bet you’d love it.

[–] hedgehog@ttrpg.network 3 points 1 week ago

The one I grabbed to test was the ROG Azoth.

I also checked my Iris and Moonlander - both cap out at 6, but I believe I can update that to be higher with QMK or add a config key via Oryx on the Moonlander to turn it on.

[–] hedgehog@ttrpg.network 4 points 1 week ago (2 children)

Per this thread from 2009, the limit was conditional upon using a particular keyboard descriptor documented elsewhere in the spec, but keyboards are not required to use that descriptor.

I tested just now on one of my mechanical keyboards, on macOS, connected via USB-C, using the Online Key Rollover Test, and was able to get 44 keys registered at the same time.
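
For anyone wondering where the 6 comes from: I'm assuming the descriptor that thread refers to is the HID boot-protocol keyboard report, which is 8 bytes: one modifier bitmask, one reserved byte, and six key-code slots. Here's a minimal Python sketch of decoding one; the function names are made up for illustration, and a keyboard that ships its own report descriptor isn't bound by this layout.

```python
# Modifier bits in byte 0 of a boot-protocol keyboard report, lowest bit first.
MODIFIERS = ["LCtrl", "LShift", "LAlt", "LGui", "RCtrl", "RShift", "RAlt", "RGui"]

# Usage ID 0x01 is ErrorRollOver: the keyboard fills every slot with it
# when more keys are held than the report has room for.
ERROR_ROLLOVER = 0x01


def parse_boot_report(report: bytes) -> tuple[list[str], list[int]]:
    """Split an 8-byte boot report into modifier names and pressed key codes."""
    if len(report) != 8:
        raise ValueError("boot keyboard reports are exactly 8 bytes")
    mods = [name for bit, name in enumerate(MODIFIERS) if report[0] & (1 << bit)]
    # Byte 1 is reserved; bytes 2..7 hold up to six key usage IDs, 0 = empty.
    keys = [code for code in report[2:8] if code != 0]
    return mods, keys


def is_rollover_error(report: bytes) -> bool:
    """True when all six key slots carry the ErrorRollOver usage."""
    return len(report) == 8 and all(code == ERROR_ROLLOVER for code in report[2:8])


# Left Shift held together with usage IDs 0x04 ("a") and 0x05 ("b").
mods, keys = parse_boot_report(bytes([0x02, 0x00, 0x04, 0x05, 0, 0, 0, 0]))
```

With only six slots after the modifier byte, a seventh simultaneous non-modifier key has nowhere to go, which is why a keyboard has to use a different descriptor to report more.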

 

This only applies when the homophone is spoken or part of an audible phrase, so written text is safe.

It doesn’t change reality, just how people interpret something said aloud. You could change “Bare hands” to be interpreted as “Bear hands,” for example, but the person wouldn’t suddenly grow bear hands.

You can only change the meaning of the homophones.

It’s not all or nothing. You can change how a phrase is interpreted for everyone, or:

  • You can affect only a specific instance of a phrase - including all recordings of it, if you want - but you need to hear that instance - or a recording of it - to do so. If you hear it live, you can affect everyone else’s interpretation as it’s spoken.
  • You can choose not to affect how it is perceived by people when they say it aloud, and only when they hear it.
  • You can affect only the perception of particular people for a given phrase, but you must either be pointing at them (pictures work) or be able to refer to them with five or fewer words, at least one of which is a homophone. For example, “my aunt.” Note that if you do this, both interpretations of the homophone are affected, if relevant (e.g., “my ant”).
  • You can make it so there’s a random chance (in 5% intervals, from 5% to 95%) that a phrase is misinterpreted.
 

cross-posted from: https://lemmy.world/post/19716272

Meta fed its AI on almost everything you’ve posted publicly since 2007

 

The video teaser yesterday about this was already DMCAed by Nintendo, so I don’t think this video will be up long.
