
Is returning a 403 based on the user agent worth a blog post? Also, can't Bytespider just change their user agent to Byte-Spider? Or just make their user agent a random string? It will be a never-ending arms race that requires constant code updates to keep chasing that bot by user agent. You're probably better off whitelisting the known user agents and blocking everything else.

Also, does it really require a specific "gem"? This is HTTP request filtering; the router (as in the real router, the metal box with network cables) can probably do it by itself these days.
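To make the blocklist-versus-whitelist point concrete, here is a minimal sketch in Python. It is not the code from the blog post; the pattern lists and function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern lists for illustration; neither comes from the blog post.
BLOCKED_PATTERNS = [re.compile(r"bytespider", re.IGNORECASE)]
ALLOWED_PATTERNS = [
    re.compile(r"mozilla", re.IGNORECASE),
    re.compile(r"googlebot", re.IGNORECASE),
]


def status_blocklist(user_agent: str) -> int:
    """Return 403 only when the user agent matches a known-bad pattern."""
    if any(p.search(user_agent) for p in BLOCKED_PATTERNS):
        return 403
    return 200


def status_allowlist(user_agent: str) -> int:
    """Return 200 only when the user agent matches a known-good pattern."""
    if any(p.search(user_agent) for p in ALLOWED_PATTERNS):
        return 200
    return 403
```

With these lists, a renamed "Byte-Spider/1.0" slips past the blocklist but is rejected by the allowlist. Neither approach helps once a bot sends a browser-like user agent, since the header is entirely under the client's control.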



For me the interesting part is that the crawler is going berserk. A single crawler should never be the cause of 80% of traffic.

Also, why wouldn't they respect the 403? Crawlers just go to anything they can find. It is not a targeted attack.


It might not be, but I couldn't find much about the topic, so I figured I'd write it up and share. And you're right that this may be a bit of whack-a-mole. For now, though, I've cut my bandwidth down, which means I may be able to downgrade my Cloudinary plan to a lower tier. That's a big win for me, since it accounts for something like 20-30% of my total operating cost.



