janalsncm · 2026-06-09T01:34:16 1780968856

Is there an excess of teachers in Alaska?

I can understand why rural schools would need H1Bs. They would probably need to pay a premium to attract teachers from out of state, not to mention Alaska. And rural schools are the least able to actually do that.

Maybe if the current admin really wants to keep the $100k fee, they can extend an olive branch by either waiving the fee or helping to fund American teachers to move to fill those jobs.

throwaway85825 · 2026-06-09T01:38:38 1780969118

There are as many teachers in Alaska as they're willing to pay for.

janalsncm · 2026-06-08T21:59:29 1780955969

If Apple is running the inference from Apple iPhones and Apple data centers then Apple has operational control. Google’s influence ends the moment they hand the weights over to Apple.

impulser_ · 2026-06-08T22:43:35 1780958615

They are using Google Cloud.

https://security.apple.com/blog/expanding-pcc/?linkId=100000...

"Now, we are collaborating with Google and NVIDIA to run new Apple Intelligence workloads on Google Cloud, extending our industry-leading PCC privacy commitments to third-party data centers for the first time."

materielle · 2026-06-09T00:06:35 1780963595

That’s not so special, though? There’s a difference between Google infra running Google services and any F500 company running its own services on GCP.

It’s a bit wacky to think about because Apple will operate Google-owned software on GCP. But it should be sandboxed just the same.

I’m not making a normative privacy argument here. Just pointing out that this is cloud business as usual. Perhaps it’s interesting Apple is doing it, but basically everything else is already using either AWS or GCP at this point.

airstrike · 2026-06-09T00:46:52 1780966012

I think the difference is scale. This is Apple, so it's an enormous number of devices. And for the user, going from the local model to cloud models is a seamless experience.

So the question about which model Apple was going to use and where has been highly anticipated, especially by the likes of OpenAI and Anthropic. Imagine if either one could say they have Apple as their customer?

Apple certainly has the cash to burn if they wanted to train their own model, but it also always seemed out of their core competency. This is a major win for Google.

So "business as usual" but with huge implications for the AI ecosystem in general.

dofm · 2026-06-08T22:48:15 1780958895

That is news — I guess not very surprising that they'd need more data centres than before.

But again, there is no Apple-to-Google transfer in the inference in the sense of the comment I was originally replying to (I am not suggesting you're implying otherwise, obviously).

But I stand happily corrected where I said they aren't in the picture at all.

That is an interesting press release because it outlines what they would have had to do with any data centre they were outsourcing to.

impulser_ · 2026-06-08T22:49:41 1780958981

This is probably why Google had to rent compute from SpaceX. They needed to free up NVIDIA GPUs for Apple so they probably moved internal workloads to SpaceX compute.

ezfe · 2026-06-09T00:49:30 1780966170

iCloud already uses Google Cloud, so that still doesn't change the operational boundaries of where data goes.

LoganDark · 2026-06-09T01:02:05 1780966925

I hope they are still using PCC hardware rather than running private data through third-party servers.

dofm · 2026-06-08T22:05:51 1780956351

Right — I suppose I mis-phrased my first sentence a bit, because I guess it can be interpreted as me saying the boundary is blurred, when what I was trying to write is: in operation there is nothing crossing any boundary; Google are not in the picture.

janalsncm · 2026-06-06T08:27:11 1780734431

Not just that, I think a lot of people are going to waste their time losing the battle (and make no mistake, they will lose) fighting against AI writing without ever asking themselves what makes writing good in the first place.

There’s good AI writing and bad organic writing. But it’s easier to point out a few LLM-isms than to actually identify the problems with text.

blharr · 2026-06-06T14:09:26 1780754966

> There's good AI writing

Sure, but the LLM-isms in AI writing are mentally exhausting to see in every way at this point.

The whole point of reading, frankly, is to understand the voice of other people. When you pass that through a distorted filter that makes everyone sound the same... it's bad, lossy, frustrating communication.

It's also dishonest to publish something that is direct output without your own wording. Digital catfishing at best.

The only good AI writing is providing the prompt, because the question is way more interesting, and way more constructive to learning, than the answer.

janalsncm · 2026-06-06T20:35:05 1780778105

The point of writing is to convey an idea to another person or yourself at a future date. Authenticity has nothing to do with it. I frankly do not care about the “authentic voice” of the author of a random blog. I want to know if they have any interesting ideas.

wj · 2026-06-06T22:42:48 1780785768

I think because so much of an idea is shaped by the language used to convey it, it may be hard to separate the person from the LLM.

I think gp may want to know if a <person> has an interesting idea rather than <person + llm>.

janalsncm · 2026-06-08T18:30:47 1780943447

In that case, you can’t achieve that by writing things out by hand. A person could (and many do) use an LLM to generate an idea, then write it in their own voice. Or, they could use an LLM to write out an idea they came up with themselves.

In other words, since the idea generation component can be completely independent from the writing component, what you’re asking is not possible in practice.

janalsncm · 2026-06-06T08:20:50 1780734050

I don’t think it’s absolutely embarrassing. First of all, the point of the author writing at all is to aid understanding, not produce prose. So from that standpoint, what would be embarrassing would be to include incorrect facts that suggest a fundamental misunderstanding of the topic.

From my read, it is fine. The brief history of LLMs is complicated since every single component has papers introducing enhancements. So it’s easy to ignore them or get bogged down with details.

The author appears to be a security researcher learning about LLMs for the purpose of defending against common attacks. So this piece is that person giving themselves a crash course on the topic. The fact that they cleaned up their notes with an LLM is frankly completely irrelevant.

janalsncm · 2026-06-06T05:26:51 1780723611

When I first started working with LLMs in 2019, AI was in no way synonymous with LLMs. I personally realized pretty quickly that they’d eventually be able to write software that compiles. Not necessarily good software, but software that passes a minimum threshold.

Then again, there were all sorts of hallucination-adjacent issues, which are still present but rarer as models get bigger. Wondering about the consequences for software engineering as an industry was a little bit of an “overpopulation on Mars” problem, since GPT-2 could barely string a paragraph together.

Another factor is the industry’s continued insistence on evaluating the ability to write software using leetcode. Well, Claude is probably the best leetcoder in the world now, but since our industry never figured out better evaluation criteria for candidates, of course we are backed into a corner.

janalsncm · 2026-06-05T10:14:25 1780654465

You should write your readmes by hand. You’ll learn a lot more that way, and it’ll help to ground the project.

spacebacon · 2026-06-05T11:09:44 1780657784

It’s not as if they were one-shot. Five prior repos, two published preprints on SSRN, and thousands of hours back my research, which is right there for you to peer review and use freely.

janalsncm · 2026-06-05T10:12:25 1780654345

“Semiotic awareness” is not standard ML terminology. By its dictionary definition, “semiotic” simply means “relating to symbols”, so it’s a bit grandiose to say you gave Qwen “awareness of symbols” when in reality it’s a marginal improvement, if even true.

Also to say that a philosopher that died 100 years ago inspired a new attention head is another instance of GPT off his rocker again. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.

janalsncm · 2026-06-05T09:57:52 1780653472

Your method appears to be similar to LoRA but simply less expressive. Some kind of manipulation to layers 7, 14, and 21. Did you compare with other layers? This is obviously extremely specific to a particular backbone.

Also, your documents use a ton of nonstandard jargon which only serves to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.

More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?

spacebacon · 2026-06-05T10:54:44 1780656884

It is not LoRA. LoRA fine tunes capabilities into the model. SRT Adapter is a small overlay on a frozen model whose purpose is to make internal reasoning observable. It surfaces what the model is activating at moments of high divergence.

The layers 7, 14, and 21 were chosen after probing. They showed the strongest regime signals. We did compare other layers. The term semiotic awareness is just shorthand for detecting and modulating higher order meaning patterns. If the term is unhelpful I will drop it.

The capability gains are often marginal on standard benchmarks. The intended value is observability and steerability without retraining the backbone.

janalsncm · 2026-06-05T02:40:30 1780627230

It’s a data point. I could imagine in a hardware constrained setting we might not care about training on enormous token counts, and on smaller devices it’s great if we can simplify the architecture.

I agree that this isn’t proof that it scales to trillions of tokens, but this does show a scaled up experiment would be worth a shot.

Philpax · 2026-06-05T03:32:49 1780630369

The Chinchilla scaling laws give you a minimum for the number of tokens you should be using for a given size: if you can't meet what they suggest for that size, you should shrink the size, as, otherwise, the capacity of the model is going to waste.

I do agree that it is a datapoint, but GP's point is that this model was undertrained, so it's hard to draw the same conclusions from it that we would from other research.
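
To make the token-budget point concrete, here is a rough sketch. It assumes the commonly cited rule of thumb of about 20 training tokens per parameter, which is only an approximation of the Chinchilla results and not an exact law. The function names and the default ratio are illustrative assumptions.

```python
# Assumption: ~20 tokens per parameter, a popular approximation of the
# Chinchilla compute-optimal ratio. The real fitted values vary.
TOKENS_PER_PARAM = 20


def suggested_tokens(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Approximate token count suggested for a model with n_params parameters."""
    return n_params * tokens_per_param


def max_params_for_budget(n_tokens: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Largest model size (in parameters) that a token budget supports under the heuristic."""
    return n_tokens // tokens_per_param


if __name__ == "__main__":
    # A 1B-parameter model would want roughly 20B tokens under this heuristic.
    print(suggested_tokens(1_000_000_000))
    # With only 10B tokens available, the heuristic suggests shrinking to ~500M parameters.
    print(max_params_for_budget(10_000_000_000))
```

If the available tokens fall well short of `suggested_tokens` for a given size, the model is likely undertrained in the sense described above, and shrinking it toward `max_params_for_budget` is the usual remedy.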

janalsncm · 2026-06-02T09:15:03 1780391703

A helpful middle ground I’ve found is to build out the architecture you want, but stub out the tedious function implementations you don’t want to do yourself.

And by stub out I mean write the function signature yourself, including parameters it’ll accept and return types. Add a comment if necessary about what it will do.
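
A minimal sketch of what that can look like in Python. The `Order` type and both function names are made up for illustration. The tedious function is left as a typed stub with a docstring, while the part you care about is written by hand.

```python
from dataclasses import dataclass


@dataclass
class Order:
    item_id: str
    quantity: int
    unit_price_cents: int


def parse_order_line(line: str) -> Order:
    """Parse a CSV line like "sku-1,2,499" into an Order.

    Stub: the signature and return type are fixed by hand; the body is
    left to be filled in later (by you or by a model).
    """
    raise NotImplementedError


def order_total_cents(orders: list[Order]) -> int:
    """Hand-written part: total cost of all orders, in cents."""
    return sum(o.quantity * o.unit_price_cents for o in orders)
```

The signatures pin down the architecture, so whatever fills in `parse_order_line` has to fit the types the rest of the code already relies on.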