I agree with your points 2-4, but I've observed on my own website that the crawlers that don't respect robots.txt won't, and the crawlers that do respect it will.
Of course it's voluntary, but if entities like OpenAI say they will respect it then presumably they really will.
Truly. But this was just something I downloaded off the net and wanted to repurpose for my own needs, so rearchitecting it to use Grid or Flex was way more effort than I wished to put in.
I've used a self-hosted Llama 3 to answer some questions about CSS and centering a div that I was having trouble with (I'm not a web dev by profession, nor am I aspiring to be one). You have to prod at it a few times to get it to tell you something useful, which it ultimately did.
That's about as far as I can work with it: asking and re-asking it very common questions that have been discussed and answered 700 times over (but the answer to which is unknown to me, specifically) in the hopes of getting something actually useful. So to that end, of course it can give me an example implementation of common leetcode questions in C, but it cannot reliably do something that requires a bit more originality.
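For anyone fighting the same centering battle, the answer to that sort of question is usually the standard Flexbox pattern. This is just a generic sketch (the class names are placeholders, not from any real site):

```css
/* Center a child horizontally and vertically inside its container.
   Class names are placeholders; adapt to your own markup. */
.parent {
  display: flex;
  justify-content: center; /* horizontal centering */
  align-items: center;     /* vertical centering */
  min-height: 100vh;       /* the container needs a height to center within */
}

/* Grid alternative: .parent { display: grid; place-items: center; } */
```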
breaking my back to let people know how much i'll publicly defend nazis, specifically, which is suddenly very important to me.
A lot of dbzer0 users are cool people
...while babies and young children are rising up like never before!
support the state-sanctioned genocide or be unemployed. is that deal being made right now?
a lot of people are dying b/c of how much he sucks, though. that is true :(
Whoa, I remember this site and article, both
I used to sit and monitor my server access logs. You can tell by the access patterns. Many of the well-behaved bots announce themselves in their user agents, so you can see when they're on your site. I could see them crawl the main body of my website but not go to a subdomain, which is clearly linked from the homepage but disallowed in my robots.txt.
On the other hand, spammy bots that are trying to attack you will often have access patterns that probe your website for common configurations of CMSes like WordPress. They don't tend to crawl.
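For anyone who hasn't poked at it, robots.txt is just a plain-text file at the site root that compliant crawlers fetch before crawling; nothing in it is enforced. A minimal sketch (the paths and the blocked user agent are only for illustration):

```
# Served at https://example.com/robots.txt (placeholder hostname).
# Compliant crawlers fetch this before crawling; it is purely voluntary.

User-agent: *
Disallow: /private/

# Block one specific crawler by its announced user agent, e.g. OpenAI's GPTBot.
User-agent: GPTBot
Disallow: /
```

One wrinkle worth knowing: robots.txt applies only to the host it is served from, so a subdomain needs its own copy at its own root.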
Google also provides a tool to test robots.txt, for example.
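You can also check rules locally; Python's standard library ships a robots.txt parser. A quick sketch (the URL and user agents are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse a site's robots.txt (placeholder hostname).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether a given user agent may fetch a given URL under the parsed rules,
# e.g. False for a crawler that is fully disallowed, True otherwise.
print(rp.can_fetch("GPTBot", "https://example.com/private/page.html"))
print(rp.can_fetch("SomeCrawler", "https://example.com/index.html"))
```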