Hacker News | philipodonnell's comments

Is the difficulty that in high-entropy situations, you can’t really tell whether it’s because the model is uncertain, or because the options are so semantically similar that it doesn’t matter which one you choose? Like pure synonyms.


If two (or more) tokens are synonymous with each other and share high probability (49.9% each for a total of 99.8%), that's still low entropy. Not as low as a single high-probability token, but low enough for us to consider this a low-entropy token distribution.
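To see why two ~50% synonyms still count as low entropy, here's a quick sketch (the numbers are the ones from the comment; the 50k vocabulary size is an illustrative assumption):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two near-synonymous tokens at 49.9% each, remaining mass elsewhere.
synonyms = [0.499, 0.499, 0.002]
print(entropy_bits(synonyms))  # ~1.02 bits

# For scale: a uniform distribution over a 50k-token vocabulary.
vocab_size = 50_000
print(entropy_bits([1 / vocab_size] * vocab_size))  # ~15.6 bits
```

Roughly 1 bit versus a ~15.6-bit maximum, so the synonym case sits near the low end of the range.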

You can't look at a single token distribution, though. There are many legitimate high-confidence, high-accuracy cases in which many tokens could come next. For example, the first token of a paragraph. You need to look at pools of entropies over segments of the output or the whole output sequence.
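One way to implement the pooling the parent describes (the function names are illustrative, not from any particular library): compute a per-position entropy and average it over a sliding window of the output.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of one token's probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def rolling_mean_entropy(per_token_dists, window=4):
    """Mean entropy over a sliding window of token positions, so a
    single legitimately-uncertain token (e.g. start of a paragraph)
    doesn't dominate the signal."""
    ents = [entropy_bits(d) for d in per_token_dists]
    return [sum(ents[i:i + window]) / window
            for i in range(len(ents) - window + 1)]
```

A sustained run of high windowed entropy is then a more meaningful flag than any single high-entropy token.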

Although there's a correlation between uncertainty and hallucinations or inaccuracies, there's no guarantee. This is a challenging area; we're monitoring the latest literature and contributing where we can.


Although it's a different kind of writing, what you wrote here has some interesting parallels with the article.


I’ve been experimenting with LiveStoreJS, which uses a custom SQLite WASM binary for event sync, so for simplicity I’ve also used it for regular application data in the browser and found no issues (yet). It surprised me that a full database engine in memory could perform well versus native JS objects at scale, but perhaps scale is when it starts to shine. Just be wary of size limits beyond 16-20 MB.


> I’ve been experimenting with LiveStoreJS which uses a custom SQLite WASM binary for event sync

I'm not sure whether this might be helpful to you, but 3.52 will include a revamped "kvvfs" which (A) also works (non-persistently) in Worker threads and (B) supports callbacks to asynchronously send all db page writes to the client.

<https://sqlite.org/wasm/doc/trunk/kvvfs.md>


Of all the interface modalities available, CLIs seem like the most natural for copilots to work with: lots of examples in the training data, a universal interface for help, a good match for the sequential nature of token generation, similar syntax across OSs… I can see them replacing skills, MCP, et al. from the model’s perspective.


How do they prove this? It sounds like the plaintiffs basically claimed they were rejected a bunch of times, and since the resumes had recognizable indicators of protected classes, they must have been discriminated against?

Don’t get me wrong, I do this work, and Workday’s statement of “we don’t use protected classes”, instead of “we test our models to prove they are unbiased when given recognizable indicators of protected classes”, is pretty telling. Because it’s hard, and if you’d solved it you’d be proud. If you don’t control for it, it WILL discriminate. See Amazon’s experiment a decade ago.

I’m just really curious how all this plays out in front of a judge.
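The kind of testing described above can be sketched as a counterfactual check: score otherwise-identical resumes that differ only in a protected-class indicator and measure the gap. Everything here is hypothetical; `score_resume` is a toy stand-in for a real model, not anything Workday or anyone else ships.

```python
def score_resume(resume):
    """Hypothetical scorer. This toy version only looks at years of
    experience, so it is invariant to name changes by construction;
    a learned model would need the test below to show that."""
    return min(resume["years_experience"] / 10, 1.0)

def counterfactual_gap(base_resume, indicator_field, values):
    """Max score difference across resume variants that differ only in
    one protected-class indicator (e.g. a name associated with a group)."""
    scores = [score_resume({**base_resume, indicator_field: v}) for v in values]
    return max(scores) - min(scores)

resume = {"name": "Alex Smith", "years_experience": 7}
gap = counterfactual_gap(resume, "name", ["Alex Smith", "Lakisha Washington"])
print(gap)  # a fair-by-construction scorer shows a gap of 0.0
```

A real audit would run this over many resume templates and many indicators, and treat any nonzero gap as a red flag to investigate.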


You should just need the AGENTS.md, right?


Anyone have a link to the actual report?



This is a great use case for an algorithmic difficulty ramp that can really dial in the curve to keep people playing longer over multiple sessions.
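A minimal sketch of such a ramp, assuming a simple feedback controller that nudges difficulty toward a target win rate (the names, target, and step size are all illustrative):

```python
def adjust_difficulty(difficulty, recent_win_rate, target=0.7, step=0.05):
    """Nudge difficulty (0..1) so the player's recent win rate drifts
    toward the target: winning too often -> harder; losing too often
    -> easier."""
    if recent_win_rate > target:
        return min(1.0, difficulty + step)
    if recent_win_rate < target:
        return max(0.0, difficulty - step)
    return difficulty
```

Called once per session (or per level) with a rolling win rate, this keeps players near the "challenging but winnable" zone that tends to sustain engagement.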


This is abusive use of computer technology. Don't optimise for addiction. If you're optimising for anything, it should be enjoyment.


I’ve built lots of pre-LLM data processing pipelines like this, and the more I read about people putting “agents” into this kind of context, the less they resemble agents as the Anthropics of the world define them and the more they just resemble functions. I wonder if eventually there won’t be a distinction, and it’ll just be a way to make processing and branching nodes in a pipeline less deterministic when you need more flexibility than pure code rules can give you.
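The "agent as just another function" framing can be sketched like this: a pipeline of dict-in/dict-out nodes where one node happens to wrap an LLM call. All names here are hypothetical, and `llm_call` is a stand-in for whatever model client you'd actually use.

```python
from typing import Callable

Node = Callable[[dict], dict]

def rule_node(record: dict) -> dict:
    """Deterministic, code-rule node."""
    record["category"] = "invoice" if "total" in record else "other"
    return record

def make_llm_node(llm_call: Callable[[str], str]) -> Node:
    """Wrap a (hypothetical) LLM call so it slots into the pipeline
    exactly like any other function: dict in, dict out. The only
    difference from rule_node is that its behavior is non-deterministic."""
    def node(record: dict) -> dict:
        record["summary"] = llm_call(str(record))
        return record
    return node

def run_pipeline(record: dict, nodes: list[Node]) -> dict:
    for node in nodes:
        record = node(record)
    return record
```

From the pipeline's point of view there's no "agent" abstraction at all, just a node whose output is fuzzier than the others.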


Isn’t this an arbitrage opportunity? Offer to pay a fraction of the cost per token, accept that your tokens will only be processed when there’s spare batch capacity, then resell that at a markup to people who need non-time-sensitive inference?
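The spread in that idea is simple arithmetic; the prices below are made up purely to illustrate the shape of the margin, not quotes from any provider.

```python
# Illustrative (made-up) prices per million tokens.
batch_cost = 1.25    # what the reseller pays on a discounted batch tier
resale_price = 2.00  # what latency-tolerant customers pay the reseller

margin_per_m = resale_price - batch_cost
margin_pct = margin_per_m / resale_price
print(margin_per_m, margin_pct)  # 0.75 0.375
```

The catch is that the margin only survives if providers don't already sell that discounted tier directly to end users.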


You may have already noticed that many providers have separate, much lower prices for offline inference.

