The main maintainer of curl recently encountered a similar thing. Some users had used their own models to find and report hundreds of potential errors (and were open about using those tools when asked). After review, the maintainers incorporated around 40% of the suggested fixes, some addressing actual bugs and some being semantic quality-of-life improvements. He was surprised that AI could actually be useful for something like that.
But throughout the process, there was a human reviewing and checking the work. At no point were these fixes just taken as gospel, and the reporters themselves were using their own specialized models for this task. I think introducing AI-powered analysis isn't necessarily a bad thing, but relying on public models and cutting humans out of the review and application process is a recipe for disaster.
Oof, that's rough.