Building an automated analysis tool that helps identify the use of GPL-licensed SDKs in mobile apps, promoting license compliance and supporting sustainable open-source development.
I opted for Bunny Shield precisely to combat bots, in particular ones that spoof user agents and rotate through millions of IPs. It works great, detecting the vast majority of bots and challenging them. It's much more user-friendly than Cloudflare too, which typically resorts to challenging everyone (not that CF was ever an option, due to various concerns).
I also added various rate limits, such as 1 RPS on my expensive SSR pages, after which a visitor gets challenged. Again, this blocks bots without harming power users much.
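A token bucket is one common way to implement that kind of per-IP limit. A minimal sketch (the rate and burst values are illustrative, not Bunny Shield's actual mechanism):

```python
import time
from collections import defaultdict

class RateLimiter:
    """Token bucket per client IP: ~1 request/second sustained, small burst."""
    def __init__(self, rate=1.0, burst=3):
        self.rate, self.burst = rate, burst
        # each IP starts with a full bucket
        self.buckets = defaultdict(lambda: (burst, time.monotonic()))

    def allow(self, ip):
        tokens, last = self.buckets[ip]
        now = time.monotonic()
        # refill tokens based on elapsed time, capped at the burst size
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True   # serve the page
        self.buckets[ip] = (tokens, now)
        return False      # send a challenge instead
```

Allowing a small burst keeps real users clicking around from being challenged, while sustained scraping drains the bucket within a few requests.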
* globally distributed caching of content. This reduces the static load on our servers and bandwidth usage to essentially zero, and since content is served from the endpoint closest to the client, they get lower latency. This includes logged-in, user-specific content as well.
* shared precaching of common libraries (e.g. jQuery) for faster client load times
* automated minification of JS, CSS, and HTML, along with image optimization (serving an image size and resolution suited to the device the user is viewing from) for faster loads
* always-up mode (even if my server is down for some reason, static content keeps being served)
* detailed analytics and reporting on usage / visitors
There are a lot more, but those are a few that come to mind.
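The first point mostly comes down to what `Cache-Control` headers the origin sends. A sketch of the kind of policy implied above; the paths and TTLs here are illustrative, not taken from the actual setup:

```python
# Illustrative origin-side caching policy for an edge CDN.
def cache_headers(path, logged_in):
    if path.startswith("/static/"):
        # fingerprinted assets: let the edge cache them for a year
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if logged_in:
        # per-user pages: briefly edge-cacheable, keyed on the session cookie
        return {"Cache-Control": "public, s-maxage=60", "Vary": "Cookie"}
    # anonymous pages: short edge TTL, serve stale while revalidating
    return {"Cache-Control": "public, s-maxage=300, stale-while-revalidate=60"}
```

`s-maxage` applies only to shared caches like the CDN, and `Vary: Cookie` is one (coarse) way to let the edge hold per-user responses without leaking them between users.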
How are you measuring this? Does your solution rely on user agent or device fingerprinting? Curious to know what tools are available today and how accurate they are.
I'm popular in Europe; there's no reason for people from Singapore, Russia, Brazil, and literally every other country in the world to all start visiting very old articles and comment permalinks en masse.
Honeypot links are the only thing that helps, but I'm ending up with massive IP block lists that slow things down.
This is not what I want to do with my time. I can't afford the expensive specialised tools. I'm just a solo entrepreneur on a shoestring budget. I just want to improve the website for my 3k real users and 10k real daily guests, not for bots.
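For what it's worth, a long linear firewall rule chain is the slow part; a hashed set with expiry keeps lookups O(1) and the list from growing forever. A toy sketch (kernel-side, `ipset` or nftables sets do the same thing much better):

```python
import time

class BanList:
    """Honeypot-fed ban list with O(1) lookups and lazy expiry,
    instead of an ever-growing linear firewall rule chain."""
    def __init__(self, ttl=86400):
        self.ttl = ttl
        self.bans = {}  # ip -> expiry timestamp

    def ban(self, ip):
        self.bans[ip] = time.monotonic() + self.ttl

    def is_banned(self, ip):
        exp = self.bans.get(ip)
        if exp is None:
            return False
        if exp < time.monotonic():
            del self.bans[ip]  # lazily drop stale entries on lookup
            return False
        return True
```

With rotating botnet IPs, a TTL matters as much as the lookup speed: most addresses never come back, so permanent bans just bloat the table.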
> It is nothing special. We keep X number of machines in a warm pool.
I'd love to better understand the unit economics here. Specifically, whether cost is a meaningful factor.
The reason I ask is that many startups we've seen focus heavily on optimizing their technology to reduce cold-start (boot) times. As you pointed out, perceived latency can also be improved by maintaining a warm pool of VMs.
Given that, I'm trying to determine whether it's more effective to invest in deeper technical optimizations, or to address the cold start problem by keeping a warm pool.
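The warm-pool side of that trade-off is conceptually simple. A minimal sketch (the `boot` function and pool size are placeholders; real pools replenish asynchronously and health-check their workers):

```python
import queue

class WarmPool:
    """Keep N pre-booted workers ready; hand one out instantly,
    then top the pool back up."""
    def __init__(self, boot, size=3):
        self.boot = boot                 # expensive cold-boot function
        self.pool = queue.SimpleQueue()
        for _ in range(size):
            self.pool.put(boot())        # pay the cold-start cost up front

    def acquire(self):
        try:
            vm = self.pool.get_nowait()  # warm hit: no boot latency
        except queue.Empty:
            vm = self.boot()             # pool drained: cold start anyway
        self.pool.put(self.boot())       # replenish (in practice, in the background)
        return vm
```

The unit economics question then becomes: N idle VMs cost `N * hourly_rate` continuously, whereas shaving boot time is a one-off engineering cost, which is presumably why the answer differs per startup.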
> There are dozens of projects like this emerging right now. They all share the same challenge: establishing credibility.
Care to elaborate on the kind of "credibility" to be established here? All these bazillion sandboxing tools use the same underlying frameworks for isolation (e.g., eBPF, Landlock, VMs, cgroups, namespaces) that are already credible.
The problem is that those underlying frameworks can very easily be misconfigured. I need to know that the higher level sandboxing tools were written by people with a deep understanding of the primitives that they are building on, and a very robust approach to testing that their assumptions hold and they don't have any bugs in their layer that affect the security of the overall system.
Most people are building on top of Apple's sandbox-exec which is itself almost entirely undocumented!
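As a hypothetical illustration of how easy the higher-level layer is to get wrong even when the primitive is sound: a path-allowlist check that compares path strings instead of what they resolve to can be escaped with a single symlink. (Everything below is invented for illustration, not from any particular tool.)

```python
import os
import tempfile

root = tempfile.mkdtemp()  # stand-in for the sandbox's permitted directory

def naive_check(path):
    # Buggy: compares the literal path string, not what it resolves to.
    return os.path.abspath(path).startswith(root + os.sep)

def safer_check(path):
    # Resolve symlinks first, then compare against the permitted root.
    real = os.path.realpath(path)
    return real == root or real.startswith(root + os.sep)

# An attacker drops a symlink inside the allowed dir pointing outside it.
link = os.path.join(root, "escape")
os.symlink(tempfile.gettempdir(), link)

assert naive_check(link)      # the buggy check lets the escape through
assert not safer_check(link)  # resolving symlinks catches it
```

Neither cgroups nor Landlock is at fault in a bug like this; it lives entirely in the tool's own layer, which is exactly where the credibility question sits.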
I'm sure 100% of them are vibe coded. We were all wondering where this new era of software was, and now it's here: a bunch of nominally different tools that all claim to do the same thing.
I'm thinking the LocalLLM crowd should put their LLMs to work trying to demolish these sandboxes.
And this is exactly why we see noise on HN/Reddit when a supply-chain attack breaks out, yet no actual breaches get reported: enterprises are protected by internal mirroring.
I'm assuming you're talking about agents like claude-code and open-code, which rely on GPT-style models (i.e., large language models).
The reason they don't detect these risks is primarily that the risks are emergent and happen overnight (literally, in the case of axios, which was compromised at night). Axios has a good reputation. It is by definition impossible for a pre-trained LLM to keep up with time-sensitive changes.
I mean that agents can scan the code to find anything "suspicious". After all, security vendors that claim to "detect" malware in packages are relying on LLMs for detection.
An LLM is not a suitable substitute for purpose-built SAST software in my opinion. In my experience, they are great at looking at logs, error messages, sifting through test output, and that sort of thing. But I don't think they're going to be too reliable at detecting malware via static analysis. They just aren't built for that.
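A toy sketch of the difference: purpose-built scanners run deterministic pattern rules over the source, which behave the same every time, unlike model judgment. The patterns below are invented examples loosely inspired by npm supply-chain incidents, not any real product's ruleset:

```python
import re

# Each rule: (regex over the package source, human-readable reason).
SUSPICIOUS = [
    (r"child_process.*(curl|wget)", "shells out to download something"),
    (r"eval\s*\(\s*atob\s*\(", "evaluates base64-decoded code"),
    (r"process\.env\.(NPM_TOKEN|AWS_SECRET)", "reads credential env vars"),
]

def scan(source):
    """Return the reasons for every rule the source trips."""
    return [why for pattern, why in SUSPICIOUS if re.search(pattern, source)]
```

Real SAST tools go far beyond regexes (ASTs, data flow, install-hook diffing), but the point stands: the output is reproducible and auditable, which is what you want from a malware gate.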