> That's why their point is what the subheadline says, that the moat is the system, not the model.
I'm skeptical; they provided a tiny piece of code and a hint to the possible problem, and their system found the bug using a small model.
That is hardly useful, is it? In order to get the same result, they had to know both where the bug is and what the bug is.
All these companies in the business of "reselling tokens, but with a markup" aren't going to last long. The only strategy is "get bought out and cash out before the bubble pops".
HN was a neat community fifteen years ago, but like all things cool made by early adopters, it will eventually attract a following hoping to be somewhere, to exist among people doing things. The tragedy of such followings is that they bring with them their toxicity, their immunity to their own poison, and drown out what they depend on until the early adopters early adopt away.
The real slop is all this lazy concern farming from an ant mill that is powerless to do anything except validate its own hand wringing.
>And people keep claiming the token providers are running inference at a profit.
Not everyone gets $1K of usage, and you don't know how fat the per-token margins are. It's like saying the local buffet place is losing money because you eat $100 worth of takeout for $30.
In addition to the usage-distribution aspects others have called out: $1K is not the actual cost, just API pricing being compared to subscription pricing. It is quite possible that the API has large operating margins and, say, costs only $100 to deliver $1K worth of API credits.
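To make the margin arithmetic concrete, here is a tiny sketch; the $100 cost-to-serve and $1K list-price figures are the hypothetical numbers from the comment above, not real ones:

```python
# Illustrative only: if delivering $1K of API credits costs the provider
# $100 to serve, the gross margin on those tokens is 90%.
def gross_margin(list_price: float, cost_to_serve: float) -> float:
    """Fraction of revenue kept after serving costs."""
    return (list_price - cost_to_serve) / list_price

# Hypothetical numbers: a "$1K of usage" subscriber may only cost $100 to serve.
print(gross_margin(1000.0, 100.0))  # 0.9
```

Under those assumed numbers, a plan that "loses" $970 against API list prices might lose only $70 in real serving costs, which is why comparing plan price to API price says little about profitability.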
Yes, and when we say things like that, we are not talking about plans. Running inference at a profit means API token use is run profitably. What's happening at the plan level is a huge unknown: we know there is subsidy happening, but in aggregate it's impossible to know whether it's profitable or not.
This is great for RPGs; I made up a small cut-down RPG ruleset for my 6yo and was going to try to 3D print some figurines, but...
This way, I can get my kid to make his own monsters; while he can't run Blender to produce his own monsters, using these paper templates is sufficient for him.
Consent is absolutely important, but that does not mean that every single thing in the entire world requires explicit consent. You did not ask me for consent to use my words in your comment. That does not mean you're a bad person.
Free use is an important part of intellectual property law. If it did not exist, the powerful could, for example, stifle public criticism by declaring that they do not consent to you using their words or likeness. The ability to do that is important for society. It is also just generally important for creating works inspired by others, which is virtually every work. There have to be lines between cases where attribution is required and cases where it is not.
> You did not ask me for consent to use my words in your comment.
I am not representing your words as mine. I am not profiting from your words. I am not gaining anything by attributing your words to you.
> There have to be lines between cases where attribution is required and cases where it is not.
You are blurring the lines between "using a quote or likeness" and "giving credit to". I am skeptical that you don't know the difference between the two.
Regardless, any "perspective" that disregards the need to acquire consent is invalid. Even if you are going to ignore it, you have to acknowledge that you don't feel you need any consent from the people you are taking from.
This whole "silence is consent" attitude is baffling.
You made an incredibly strong statement that is much broader than what we are talking about. I am pointing out various cases where I think that broadness is incorrect, I am not equating the two.
I do not think that, if you read, say, https://steveklabnik.com/writing/when-should-i-use-string-vs... , and then later, a friend asks you "hey, should I use String or &str here?" that you need my consent to go "at the start, just use String" instead of "at the start, just use String, like Steve Klabnik says in https://steveklabnik.com/writing/when-should-i-use-string-vs... ". And if they say "hey that's a great idea, thank you" I don't think you're a bad person if you say "you're welcome" without "you should really be saying welcome to Steve Klabnik."
It is of course nice if you happen to do so, but I think framing it as a consent issue is the wrong way to think about it.
We recognize that this is different than simply publishing the exact contents of the blog post on your blog and calling it yours, because it is! To me, an LLM is a transformative derivative work, not an exact copy. Because my words are not in there, they are not being copied.
But again, I am not telling anyone else that they must agree with me. Simply stating my own relationship with my own creative output.
He doesn't have solid points: he conflates fair use with free use (?), ignores thousands of years of attribution history, and equates normal human-to-human learning with corporate LLMs training on original content (without consent). Great presentation, like you said, to cover the logical defects.
I did say "free use" instead of "fair use," yeah. That's my mistake, thank you for the correction. If I could edit my original comment, I would, mea culpa. Typos happen.
Fair use of training data hasn’t yet been settled in court. People here are treating it like it has been. But no amount of wishful thinking or moral arguments will change a verdict saying it’s fine for training data to be used as it has been.
Until that question is settled, it’s disingenuous to dismiss his points out of hand as conflating fair use or ignoring consent.
However, I don't feel comfortable suggesting that this is settled just yet; one district judge's opinion does not mean that future cases won't disagree, and we may at some point get explicit legislation one way or the other.
I think the court dropped the ball here. On the one hand, I think they were right that using existing works--copyrighted or otherwise--to train a model was transformative fair use. On the other hand, Anthropic and others trained their models on illicit copies of the works; they (more often than not) didn't pay the copyright holders.
There's a doctrine in Fourth Amendment law called "fruit of the poisonous tree." The general rule is that prosecutors don't get to present evidence in a criminal trial that they gained unlawfully. It's excluded. The jury never gets to see it even if it provides incontrovertible evidence of guilt. The point is to discourage law enforcement from violating the rights of the accused during the investigative process, and to obtain a warrant as the Amendment requires.
It seems to me that the same logic ought to be applied to these companies. They want to make money by building the best models they can. That's fine! They should be able to use all the source data they can legitimately obtain to feed their training process. But if they refuse to do so and resort to piracy, they mustn't be allowed to claim that they then used it fairly in the transformative process.
"[T]he test requires that we contemplate the likely result were the
conduct to be condoned as a fair use — namely to steal a work you could otherwise buy (a book, millions of books) so long as you at least loosely intend to make further copies for a purportedly transformative use (writing a book review with excerpts, training LLMs, etc.), without any accountability."
See also p. 31:
"The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained 'forever' for 'general purpose' even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience."
Despite this consideration, the court still found for Anthropic on the question of fair use.
I don't see how that opposes what I said; that's part of the "training on pirated data is not fair use." That said, I am not a lawyer. From those pages:
> The copies used to train specific LLMs were justified as a fair use.
This is (in my understanding) because those were not the pirated copies.
> The copies used to convert purchased print library copies into digital library copies were justified, too, though for a different fair use.
Buying a book and then digitizing it for purposes of training is fair use.
> The downloaded pirated copies used to build a central library were not justified by a fair use.
Piracy is not fair use, you quoted this part as well.
In the conclusions section at the end of page 31:
> This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason. But it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies.
Training is fair use. Pirating is not fair use, and therefore, you can't train on that either.
I think that's a reasonable way to interpret the court's order, but unfortunately the judge didn't really articulate the consequences of training on pirated copies not being fair use as clearly as I would have liked. Does that mean they're simply liable for infringement of those works, or does it mean that they'd be enjoined from using them altogether to train the model? The genie was out of the bottle; how could it be put back in?
Anthropic settled the case with the publishers just a few months later, leaving the question mostly unsettled still.
I was just enumerating some of the issues with the "solid" points OP made. Actually addressing them would take too long and be an exercise in futility, here, on HN, in April 2026. Why would I put in the effort, only for my comment to be flagged and sent to the void? Or worse, persisted forever and used for training without my consent?
And yes, you are right, the legal and moral question of fair use in training data hasn't been settled yet; we agree here.
Where are you going with this line of thought? That making a copy of someone's work, using it for profit and not crediting them doesn't "take" anything from them?
I find that these discussions at the intersection of art and law tend to blur technical and familiar uses of words. So it's important to specify what was actually taken here because otherwise the discussion becomes muddy.
"making a copy of someone's work, using it for profit and not crediting them" wasn't really the scenario being discussed in this thread -- is that what you meant by "taking"?
Steve had made the point:
Not every single thing in the entire world requires explicit consent.
But actually taking someone else's verbatim work and selling it as your own is one of those instances where consent would be required, because many people see a clear line between someone selling another author's work and the author not getting a dollar because of that.
That doesn't preclude other instances where explicit consent is not required. For example, do I need your consent to learn from your work and produce similar work of my own? Am I required to credit you in my work for having learned from you? Am I taking from you if I don't share my profits with you?
Some rights holders would say yes, actually. Which, I don't agree with. I think it's important that we not require the artist's explicit consent for all things, because listening to some rights holders (e.g. Disney), they have very expansive ideas about what kind of control they are owed by society over their creations.
Therefore, I think if you're going to claim something has been taken, you should specify what exactly.
I don't think the poster has a viewpoint that "refuses consent"; their viewpoint is that the writing they put up for others to view is for others to view, regardless of how it is viewed. They seem to be giving consent, not refusing it, no?
> This is what I was responding to. I do not understand your thinking in this post.
I thought it was clear from "refuses to include any sort of consent" that I am talking specifically about holding an opinion that refuses to include consideration for consent, not one that refuses consent for usage.
> In practice this doesn't work though, the Mastercard-Visa duopoly is an example,
MC/Visa duopoly is an example of lock-in via network effects. Not sure that that applies to a product that isn't affected by how many other people are running it.