[–] luciferofastora@feddit.org 4 points 1 week ago* (last edited 1 week ago) (5 children)

That's image generation, not an LLM (language/text generation), but the point stands

[–] vivalapivo@lemmy.today 8 points 1 week ago (4 children)

Hate to break it to you, but today's image generation runs through LLMs

[–] luciferofastora@feddit.org 1 points 1 week ago* (last edited 1 week ago) (3 children)

(Multimodal) GPT ≠ "pure" LLM. GPT-4o uses an LLM for the language parts and has voice processing and generation built in, but it hands image generation to a technically distinct (though well-integrated) model called "GPT Image 1".

You can't really train or run image generation with the same approach as natural language, because images aren't natural language. A binary string doesn't follow the same patterns as human speech.
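
To illustrate the split, here's a rough sketch of that two-model pipeline using the official `openai` Python client. The model names are the publicly documented ones, but treat the exact calls as an approximation of the idea, not a description of OpenAI's internals:

```python
# Sketch: an LLM writes the prompt, a separate image model renders it.
# Assumes the official `openai` Python client and an OPENAI_API_KEY env var.
import base64
from openai import OpenAI

client = OpenAI()

# Step 1: the language model turns a vague request into a concrete image prompt.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Write a one-sentence image prompt for a cozy reading nook."}],
)
image_prompt = chat.choices[0].message.content

# Step 2: a technically distinct model (GPT Image 1) does the actual rendering.
result = client.images.generate(model="gpt-image-1", prompt=image_prompt)
with open("nook.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```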

[–] BluesF@lemmy.world 4 points 1 week ago (1 children)

Just curious, does the LLM generate a text prompt for the image model, or is there a deeper integration at the embedding level/something else?

[–] luciferofastora@feddit.org 2 points 1 week ago

According to CometAPI:

Text prompts are first tokenized into word embeddings, while image inputs—if provided—are converted into patch embeddings [...] These embeddings are then concatenated and processed through shared self‑attention layers.

I haven't found any other sources to back that up, because most platforms seem more concerned with how to access it than how it works under the hood.
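
If you want a feel for what that description would mean in practice, here's a toy sketch in plain `torch` (all names and sizes are invented for illustration, and this is definitely not OpenAI's actual code): text tokens and image patches get embedded separately, concatenated, and run through one shared self-attention layer.

```python
# Toy sketch of "concatenated embeddings through shared self-attention".
import torch
import torch.nn as nn

d_model, vocab_size, patch_size = 64, 1000, 16

token_embed = nn.Embedding(vocab_size, d_model)          # word embeddings
patch_embed = nn.Conv2d(3, d_model, kernel_size=patch_size,
                        stride=patch_size)               # patch embeddings
attn = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                  batch_first=True)      # shared self-attention

tokens = torch.randint(0, vocab_size, (1, 12))           # a 12-token prompt
image = torch.randn(1, 3, 64, 64)                        # a 64x64 RGB image

text_emb = token_embed(tokens)                           # (1, 12, 64)
img_emb = patch_embed(image).flatten(2).transpose(1, 2)  # (1, 16, 64): 4x4 patches
fused = torch.cat([text_emb, img_emb], dim=1)            # (1, 28, 64)

out = attn(fused)  # both modalities attend to each other in the same layer
print(out.shape)   # torch.Size([1, 28, 64])
```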
