We do know, just ask anyone who runs a more popular site or does anything where abuse can be monetized (shopping, reviews, etc.). Avoiding that due to obscurity isn’t an answer because it’s saying you’re safe until something, possibly outside of your control, causes the bots to descend and give you an extra 500M requests with no chance of revenue.
I’m with OP: I don’t like this but the alternatives all look like the death of the open web.
The person you're responding to already said they ran a modestly sized site. What actual scale opens one up to abuse? If only the top 1% of sites need it, then it seems silly to say "everyone" needs it.
Stack Overflow was outside of the Cloudflare network for years, and anti-abuse was maybe 3 or 4 full-time jobs – much of which still needs to be done, because Cloudflare's anti-bot protection hasn't actually stopped it. Most UGC sites are not as big as Stack Overflow was at its peak.
I'm referring specifically to the activities of Charcoal (https://charcoal-se.org/) and their Stack Exchange staff counterparts, taken together. This is about large-scale platform abuse, of the sort that Cloudflare is alleged to prevent (but doesn't, really), not the more mundane (and laborious) task of manual quality control.
errr... so anything related to UGC now has a lower bound of 3-4 FTE? Sure, I'll hire a team of content moderators next time I think about putting a comment form under my blog...
Yes? Cloudflare doesn't replace moderators. At all. It only allegedly filters bot generated content, it doesn't filter user generated content and doesn't even intend to.
Please read their last sentence again and think about how much it understates the difference between stack overflow in its prime and a normal website. Also the "much of which still needs to be done".
The internet discourse focused on the big ISPs which refused to deploy CDN nodes and then said they needed to double-charge for peering capacity. Most smaller ISPs deployed those Open Connect nodes either because they weren’t as greedy or felt that their customers had alternatives.
> Why not Anthropic? They’re a very rare company capable of charging $200 per month per seat level fee across the corporate workforce.
Because no part of this statement is accurate. They’d like investors to believe it’s very rare but they have multiple strong competitors, most of whom have much better financials, and the entire sector is worried that the open models are going to effectively cap rates below what they need to pay off their massive investments. Lastly, they’re not universally must-have even in software development, which is one of the domains best suited for LLMs. Most corporate work lacks similar correctness oracles, and we’re already seeing major corporate customers reconsider the cost/benefit ratio.
None of that means they’re doomed but a lot of stars need to align for them to keep their valuation up. They don’t need to go out of business for investors to lose money buying in at the peak.
Musk chose to make the political aspects unavoidable but I don’t think anything related to major investors’ reactions to one of the most hotly awaited IPOs here is primarily a political move. I’ve been on HN for a while and people here have always had a soft spot for SpaceX — there are probably grown adults now who were born after their parents were speculating about the company here! — and the valuations of SpaceX and Tesla have been the topic of discussion for many years, too. Toss in AI and X and it’d be more surprising if it wasn’t getting a lot of chatter.
Tesla has a P/E wildly out of line with the rest of their sector and is facing strong competition with a largely absentee CEO who has a history of making very bad decisions over the objections of more skilled staff (politics, of course, but also things like how the Cybertruck is so expensive to make and own). At some point that bubble is going to pop so I can understand a pension fund being more focused on long term returns passing on them.
The problem is that it’s both slow and treacherous. They can work on performance but the convenience is really undercut by it being wrong a significant amount of the time and incomplete even more frequently so I have to review the search results anyway.
I think there’s a lot of that dream — note how many of them became AI experts after striking out in cryptocurrency — but also a huge undercurrent of desperation. The rich guys who run most of the economy have made it clear that they want mass layoffs and that LLMs are the tool they’ll use to get there, so these guys are hoping that if they get on board early enough they’ll be the people doing it to everyone else rather than the targets. I’m not sure how successful that’ll be but it’s somewhat understandable how people might find themselves thinking that’s the best option available in the current economy.
I think there’s a solid argument for global auth middleware, where this is a problem if you use the path for exceptions like health-checks or a login endpoint.
The saving grace here is that people are most commonly doing this for reasons other than as a defense - serving static files efficiently, combining multiple services, caching, DDoS protection, etc. There are certainly some directly exposed FastAPI instances but it’s been against the grain for decades.
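A rough sketch of the global auth middleware concern, assuming the bug is of the kind where the framework rebuilds a URL from the client-supplied Host header and the middleware checks the path of that rebuilt URL. The exemption list, function names, and crafted header are all hypothetical, not taken from any real framework:

```python
from urllib.parse import urlsplit

# Hypothetical exemption list for illustration only.
EXEMPT_PATHS = frozenset({"/health", "/login"})


def path_from_rebuilt_url(host_header: str, routed_path: str) -> str:
    """Path obtained by rebuilding a URL from the client-supplied Host header."""
    return urlsplit(f"http://{host_header}{routed_path}").path


def needs_auth(path: str) -> bool:
    """True unless the path is on the exemption list."""
    return path not in EXEMPT_PATHS


# The router would dispatch on "/admin", but the rebuilt URL parses differently
# because the crafted Host value smuggles in its own path and a "?".
crafted_host = "example.com/health?"
print(path_from_rebuilt_url(crafted_host, "/admin"))
print(needs_auth(path_from_rebuilt_url(crafted_host, "/admin")))
print(needs_auth("/admin"))
```

The rebuilt URL reports "/health", so an exemption check against it waves the request through, while the same check against the path the router actually uses still demands auth. Checking the routed path, or rejecting odd Host values before they get that far, avoids the mismatch.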
Or probably the most straightforward one, which is SSL termination. Most backend software usually has very bad support for HTTPS communication, while it's typically extensively documented for something like nginx. It also catches some other strangeness like making it easier to update the certificate.
The biggest risk is incorrect usage of the default_server directive; the proper way to handle it isn't usually taught in most "here's how you use nginx" tutorials. Most usually just have you edit the default server blocks.
Tldr that covers 99% of all cases: you want 2 default server blocks, one on port 80 and one on port 443. The one on port 80 should only return 444 (an internal nginx status code that stops the connection immediately with no response), while the one on port 443 should use ssl_reject_handshake to terminate the SSL connection as quickly as possible without causing strange errors (you also need a self-signed certificate because otherwise openssl refuses to do protocol negotiation correctly, but the cert doesn't actually do anything). After that, specify your actual domains as separate server blocks using server_name (including a separate one for each to do the port 80->443 redirect).
Arguably this should be the default configuration shipped by distros, but it isn't for some reason, which doesn't help matters.
If I’m reading https://github.com/nginx/nginx/pull/966 right (not a given on my phone), just having nginx in front would help because it’s now filtering the characters which make this attack possible.
But you have to be super careful about defining the mitigations for this one, as for example Cloudflare passes malicious headers as-is without extra configuration, leaving hosts vulnerable when they are assumed to be protected.
Yes, you always want to test any mitigation, but Cloudflare and AWS ALBs both blocked non-DNS characters in host headers with no additional configuration when I tested it. It would be surprising if Cloudflare didn’t because the Host header is how they know which customer to route a request to.
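For anyone who wants the same kind of filtering at the application layer, here is a minimal sketch of rejecting non-DNS characters in a Host value. The pattern and function name are my own assumptions, not what Cloudflare, AWS, or nginx actually run, and it deliberately ignores IPv6 literals and hostnames containing underscores:

```python
import re

# Assumed allow-list: letters, digits, dots, and hyphens, plus an optional
# numeric port. Not a full RFC-grade hostname validator.
_HOST_RE = re.compile(r"[A-Za-z0-9.-]{1,253}(:[0-9]{1,5})?")


def is_plausible_host(value: str) -> bool:
    """True if the Host value contains only DNS-style characters and an optional port."""
    return _HOST_RE.fullmatch(value) is not None


for candidate in ("example.com", "example.com:8443", "example.com/health?", ""):
    print(repr(candidate), is_plausible_host(candidate))
```

Using `fullmatch` rather than `match` matters here: it means a slash, question mark, or trailing newline anywhere in the value causes a rejection instead of being silently ignored.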
Enterprises can, but then they have to show their auditors that this has been done in a way which is robust and can’t be bypassed, and they have to build the kind of reports people need to be convinced of that — nothing is ever “just” in enterprise IT.
Longer term, you also have to be careful about building things around details which could change at any time. OpenAI and Anthropic have a ton of pressure to start banking huge profits and they very closely monitor customer activity. A time-honored strategy in this space is to shuffle the features enterprise customers depend on but which aren’t deal-breakers for most other customers into expensive enterprise plans. There’s possibly some counter pressure from companies like Google which have healthier finances but I wouldn’t count on that since they also have MBAs who’d be all too happy to invent pretexts to hike their prices to match.