Stable Diffusion


Discuss matters related to our favourite AI Art generation technology


This is a copy of the /r/stablediffusion wiki to help people who need access to that information.


Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help you enjoy Stable Diffusion, whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated but not necessary; being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, nor provided by Stability AI.

#Stable Diffusion

Local Installation

Active community repos/forks to install on your PC and keep everything local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your Stable Diffusion skills, whether you are a beginner or an expert.

DreamBooth

How to train a custom model, plus resources for doing so.

Models

Models specially trained toward certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Either bots you can self-host, or bots you can use directly on websites and services such as Discord, Reddit, etc.

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, Gimp, etc.

Other useful tools

#Community

Games

  • PictionAIry: (Video | 2-6 Players) - The image-guessing game where AI does the drawing!

Podcasts

Databases or Lists

Still updating this with more links as I collect them all here.

FAQ

How do I use Stable Diffusion?

  • Check out our guides section above!

Will it run on my machine?

  • Stable Diffusion requires a GPU with at least 4 GB of VRAM to run locally; beefier cards (10-, 20-, or 30-series Nvidia) are needed to generate high-resolution or high-step images. Alternatively, anyone can run it online through DreamStudio or by hosting it on their own GPU compute cloud server. (A quick local check script follows this list.)
  • Only Nvidia cards are officially supported.
  • AMD support is available unofficially.
  • Apple M1 support is available unofficially.
  • Intel-based Macs currently do not work with Stable Diffusion.
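
If you're not sure how much VRAM your GPU has, a quick way to check is with PyTorch. This is a minimal sketch, assuming PyTorch with CUDA support is already installed; it only reports what it finds and is not an official compatibility checker.

```python
# Minimal sketch: check whether a CUDA GPU is visible and how much VRAM it has.
# Assumes PyTorch is installed with CUDA support.
import torch

if not torch.cuda.is_available():
    print("No CUDA GPU detected; consider DreamStudio or a cloud GPU instead.")
else:
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 4:
        print("Below the ~4 GB minimum for running Stable Diffusion locally.")
    else:
        print("Meets the minimum; more VRAM helps with high-resolution or high-step images.")
```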

How do I get a website or resource added here?

If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.


Abstract

Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that natively supports multimodal understanding and generation. BAGEL is a unified, decoder-only model pretrained on trillions of tokens curated from large-scale interleaved text, image, video, and web data. When scaled with such diverse multimodal interleaved data, BAGEL exhibits emerging capabilities in complex multimodal reasoning. As a result, it significantly outperforms open-source unified models in both multimodal generation and understanding across standard benchmarks, while exhibiting advanced multimodal reasoning abilities such as free-form image manipulation, future frame prediction, 3D manipulation, and world navigation. In the hope of facilitating further opportunities for multimodal research, we share the key findings, pretraining details, and data creation protocol, and release our code and checkpoints to the community. The project page is at https://bagel-ai.org/.

Paper: https://arxiv.org/abs/2505.14683

Code: https://github.com/bytedance-seed/BAGEL

Demo: https://demo.bagel-ai.org/

Project Page: https://bagel-ai.org/

Model: https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT
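
For anyone who wants the released weights locally, here is a minimal sketch of downloading the checkpoint listed above with huggingface_hub (an assumption: `pip install huggingface_hub` and an arbitrary local folder name; the actual inference code lives in the BAGEL repo linked above).

```python
# Minimal sketch: download the BAGEL-7B-MoT checkpoint from Hugging Face.
# Assumes huggingface_hub is installed; see the BAGEL repo for real inference code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ByteDance-Seed/BAGEL-7B-MoT",  # the model repo listed above
    local_dir="BAGEL-7B-MoT",               # arbitrary local folder (assumption)
)
print("Checkpoint downloaded to:", local_dir)
```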


Abstract

While generative artificial intelligence has advanced significantly across text, image, audio, and video domains, 3D generation remains comparatively underdeveloped due to fundamental challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. To this end, we present Step1X-3D, an open framework addressing these challenges through: (1) a rigorous data curation pipeline processing >5M assets to create a 2M high-quality dataset with standardized geometric and textural properties; (2) a two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator with a diffusion-based texture synthesis module; and (3) the full open-source release of models, training code, and adaptation modules. For geometry generation, the hybrid VAE-DiT component produces TSDF representations by employing perceiver-based latent encoding with sharp edge sampling for detail preservation. The diffusion-based texture synthesis module then ensures cross-view consistency through geometric conditioning and latent-space synchronization. Benchmark results demonstrate state-of-the-art performance that exceeds existing open-source methods, while also achieving competitive quality with proprietary solutions. Notably, the framework uniquely bridges the 2D and 3D generation paradigms by supporting direct transfer of 2D control techniques (e.g., LoRA) to 3D synthesis. By simultaneously advancing data quality, algorithmic fidelity, and reproducibility, Step1X-3D aims to establish new standards for open research in controllable 3D asset generation.

Technical Report: https://arxiv.org/abs/2505.07747

Code: https://github.com/stepfun-ai/Step1X-3D

Demo: https://huggingface.co/spaces/stepfun-ai/Step1X-3D

Project Page: https://stepfun-ai.github.io/Step1X-3D/

Models: https://huggingface.co/stepfun-ai/Step1X-3D
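
The geometry stage above outputs TSDF (truncated signed distance function) representations. As a quick illustration of what a TSDF is (entirely independent of the Step1X-3D code), here is a tiny numpy sketch building one for a unit sphere on a voxel grid:

```python
# Tiny illustration of a TSDF: signed distance to a surface, clipped to a narrow band.
# The "surface" here is just a unit sphere on a 64^3 voxel grid; illustrative only.
import numpy as np

res, trunc = 64, 0.1                         # voxel grid resolution and truncation band
xs = np.linspace(-1.5, 1.5, res)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 1.0      # signed distance to a unit sphere
tsdf = np.clip(sdf, -trunc, trunc)           # truncate far values to the narrow band

print(tsdf.shape, float(tsdf.min()), float(tsdf.max()))   # (64, 64, 64) -0.1 0.1
```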


Abstract

Animation has gained significant interest in the film and TV industry in recent years. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they are far less effective at handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artistic styles, violations of the laws of physics, and exaggerated motions. In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation benchmark. Supported by the data processing pipeline with over 10M high-quality samples, the generation model incorporates a spatiotemporal mask module to facilitate key animation production functions such as image-to-video generation, frame interpolation, and localized image-guided animation. We also collect an evaluation benchmark of 948 diverse animation videos, with specifically developed metrics for animation video generation. Our entire project is publicly available at the links below.

Paper: https://arxiv.org/abs/2412.10255

Code: https://github.com/bilibili/Index-anisora/tree/main

Hugging Face: https://huggingface.co/IndexTeam/Index-anisora

Modelscope: https://www.modelscope.cn/organization/bilibili-index

Project Page: https://komiko.app/video/AniSora
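
The spatiotemporal mask module is what lets a single model cover image-to-video, frame interpolation, and localized image-guided animation: known frames or regions are marked in a conditioning mask over (time, height, width). The toy sketch below only illustrates that masking idea and is not AniSora's implementation.

```python
# Toy conditioning masks over (frames, height, width):
# 1 marks pixels given as guidance, 0 marks pixels the model must generate.
# Illustrative only; not AniSora's actual code.
import numpy as np

frames, h, w = 16, 64, 64   # toy video size: 16 frames of 64x64

i2v_mask = np.zeros((frames, h, w), dtype=np.float32)
i2v_mask[0] = 1.0                        # image-to-video: condition on the first frame

interp_mask = np.zeros((frames, h, w), dtype=np.float32)
interp_mask[0] = interp_mask[-1] = 1.0   # frame interpolation: first and last frames given

local_mask = np.zeros((frames, h, w), dtype=np.float32)
local_mask[:, 16:48, 16:48] = 1.0        # localized guidance: a fixed region in every frame

print(i2v_mask.mean(), interp_mask.mean(), local_mask.mean())
```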


Change Log for SD.Next

Highlights for 2025-04-28

Another major release with over 120 commits!
Highlights include the new Nunchaku inference engine, which runs FLUX.1 with 3-5x higher performance, and a new FramePack extension for high-quality I2V and FLF2V video generation with unlimited duration!

What else?

  • New UI History tab
  • New models: Flex.2, LTXVideo-0.9.6, WAN-2.1-14B-FLF2V; new schedulers: UniPC and LCM FlowMatch; new features: CFGZero
  • Major updates to: NNCF, OpenVINO, ROCm, ZLUDA
  • Cumulative fixes since last release

ReadMe | ChangeLog | Docs | WiKi | Discord


Abstract

Diffusion transformers have demonstrated remarkable generation quality, albeit requiring longer training iterations and numerous inference steps. In each denoising step, diffusion transformers encode the noisy inputs to extract the lower-frequency semantic component and then decode the higher frequency with identical modules. This scheme creates an inherent optimization dilemma: encoding low-frequency semantics necessitates reducing high-frequency components, creating tension between semantic encoding and high-frequency decoding. To resolve this challenge, we propose a new Decoupled Diffusion Transformer (DDT) with a decoupled design: a dedicated condition encoder for semantic extraction alongside a specialized velocity decoder. Our experiments reveal that a more substantial encoder yields performance improvements as model size increases. For ImageNet 256×256, our DDT-XL/2 achieves a new state-of-the-art performance of 1.31 FID (nearly 4× faster training convergence compared to previous diffusion transformers). For ImageNet 512×512, our DDT-XL/2 achieves a new state-of-the-art FID of 1.28. Additionally, as a beneficial by-product, our decoupled architecture enhances inference speed by enabling the sharing of self-conditioning between adjacent denoising steps. To minimize performance degradation, we propose a novel statistical dynamic programming approach to identify optimal sharing strategies.

Paper: https://arxiv.org/abs/2504.05741

Code: https://github.com/MCG-NJU/DDT

Demo: https://huggingface.co/spaces/MCG-NJU/DDT
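
The inference speed-up from sharing the self-condition relies on the encoder's semantic features changing slowly between adjacent denoising steps, so they can be cached and reused rather than recomputed. The sketch below shows only that reuse pattern with dummy stand-ins for the condition encoder and velocity decoder; the paper picks which steps to recompute via a statistical dynamic-programming search, whereas this toy simply recomputes every other step.

```python
# Illustrative-only sketch of reusing the encoder output across adjacent denoising steps.
# `encode` and `decode` are dummy stand-ins, not DDT's modules.
import numpy as np

def encode(x, t):
    # stand-in for the condition encoder (extracts "semantic" features)
    return np.tanh(x).mean(axis=-1, keepdims=True) * (1.0 - t)

def decode(x, cond, t):
    # stand-in for the velocity decoder (one denoising update)
    return x - 0.1 * (x - cond)

def sample(x, steps=50, reuse_every=2):
    cond = None
    for i, t in enumerate(np.linspace(1.0, 0.0, steps)):
        if cond is None or i % reuse_every == 0:
            cond = encode(x, t)      # recompute the condition only on selected steps
        x = decode(x, cond, t)       # the decoder still runs at every step
    return x

print(sample(np.random.randn(4, 8)).shape)   # (4, 8)
```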


Abstract

Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often struggle with complex scenes and fine-grained details. Inspired by the self-reflection capabilities emergent in large language models, we propose ReflectionFlow, an inference-time framework enabling diffusion models to iteratively reflect upon and refine their outputs. ReflectionFlow introduces three complementary inference-time scaling axes: (1) noise-level scaling to optimize latent initialization; (2) prompt-level scaling for precise semantic guidance; and most notably, (3) reflection-level scaling, which explicitly provides actionable reflections to iteratively assess and correct previous generations. To facilitate reflection-level scaling, we construct GenRef, a large-scale dataset comprising 1 million triplets, each containing a reflection, a flawed image, and an enhanced image. Leveraging this dataset, we efficiently perform reflection tuning on the state-of-the-art diffusion transformer FLUX.1-dev by jointly modeling multimodal inputs within a unified framework. Experimental results show that ReflectionFlow significantly outperforms naive noise-level scaling methods, offering a scalable and compute-efficient solution toward higher-quality image synthesis on challenging tasks.

Paper: https://arxiv.org/abs/2504.16080

Code: https://github.com/Diffusion-CoT/ReflectionFlow

GenRef: https://huggingface.co/collections/diffusion-cot/reflectionflow-release-6803e14352b1b13a16aeda44

Project Page: https://diffusion-cot.github.io/reflection2perfection/
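
Reflection-level scaling is, at its core, an iterative generate-reflect-refine loop. The sketch below shows only that control flow; `generate`, `reflect`, and `score` are dummy placeholders rather than the ReflectionFlow API.

```python
# Control-flow sketch of reflection-level scaling: generate -> reflect -> refine.
# All three helpers are dummy placeholders, not the ReflectionFlow implementation.
import random

def generate(prompt, feedback=None):
    # stand-in for the diffusion model, optionally conditioned on a reflection
    return f"image({prompt!r}, fixes={feedback!r})"

def reflect(prompt, image):
    # stand-in for the reflection model that critiques the last result
    return f"add details missing from {image}"

def score(prompt, image):
    # stand-in for a verifier that rates prompt-image alignment
    return random.random()

def reflection_loop(prompt, rounds=3):
    best_image, best_score, feedback = None, -1.0, None
    for _ in range(rounds):
        image = generate(prompt, feedback)   # refine using the previous critique
        s = score(prompt, image)
        if s > best_score:
            best_image, best_score = image, s
        feedback = reflect(prompt, image)    # actionable reflection for the next round
    return best_image

print(reflection_loop("a cat reading a newspaper"))
```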


This was bound to happen. No centralized VC-backed company could stay uncensored like that forever.

CivitAI has already gathered all the growth it could; now it has to focus on making a profit, and therefore on pleasing credit card companies and advertisers.

Terms of Service Update

We’ve updated Section 9.6 of our Terms of Service (ToS) to explicitly prohibit content depicting:

  • Incest
  • Self-harm, including depictions of anorexia or bulimia
  • Content that promotes hate, harm, or extremist ideologies
  • Bodily excretions and related fetishes, including:
    • Urine
    • Vomit
    • Menstruation
    • Diapers

And to:

  • Require all mature content (X, XXX) to include generation metadata. You'll be prompted to add metadata when uploading new content.

Additionally, the following content depicted in any mature or suggestive context (X, XXX) is explicitly prohibited:

  • Firearms aimed at or pointed towards individuals.
  • Mind-altered states, including being drunk, drugged, under hypnosis, or mind control.
  • Depiction of illegal substances or regulated products (e.g. narcotics, pharmaceuticals).
