I had that experience with the last house we rented before we bought.
We were quiet, predictable, don't-rock-the-boat tenants, and the rando owner mentioned that they valued that enough that raising rents wasn't worth the potential risk of new tenants who might cause them more hassle.
Analogous to that, this is something that I'd been wondering about with respect to hardware prices as silicon is reallocated from consumers to data centers: how am I to make heavy use of frontier (edit: i.e., cloud/data center-provided) AI models if I can't easily buy a machine worth using it on?
Yes, I get that and meant to imply it with "frontier". But my question was more about the fact that my biggest uses by far of data center-provided AI, in terms of tokens, have been in things like experimenting with agentic coding on my desktops. If I'm just tapping away at my phone with some questions in a chat window, my AI needs are much lower. Compete with me for the HW resources that I need to make full use of your data center-provided AI supply, and I'll just drop my demand accordingly.
(I know, I know... the answer is probably that they expect me to just move my software development to the cloud, too. Joy!)
I'm reminded of The Hobbit with the phrase "Good morning" in the first chapter:
> "Good Morning!" said Bilbo, and he meant it. The sun was shining, and the grass was very green. But Gandalf looked at him from under long bushy eyebrows that stuck out further than the brim of his shady hat.
> "What do you mean?" he said. "Do you wish me a good morning, or mean that it is a good morning whether I want it or not; or that you feel good this morning; or that it is a morning to be good on?"
> "All of them at once," said Bilbo. "And a very fine morning for a pipe of tobacco out of doors, into the bargain.
From the linked post, it didn't read like a separate KV cache was needed:
> The draft models seamlessly utilize the target model's activations and share its KV cache, meaning they don't have to waste time recalculating context the larger model has already figured out.
That's great news. That has not been the case with other MTP implementations like Qwen3.5, but I see the section in the article saying Google introduced some architectural optimizations to make this possible.
Ah. We're back to the days of Emacs' old `M-x psychoanalyze-pinhead`, then. (Psychoanalyze-pinhead ran the Eliza chat-bot and fed it bizarre quotations collected from the Zippy the Pinhead comics.)
At least for the CPU/GPU split, llama.cpp recently added a `--fit` parameter (might default to on now?) that pairs with a `--fitc CONTEXTSIZE` parameter. That new feature will automatically look at your available VRAM and try to figure out a good CPU/GPU split for large models that leaves enough room for the context size that you request.
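To make the idea concrete, here's a rough sketch of the kind of arithmetic an automatic split has to do. This is not llama.cpp's actual algorithm: the function name, the parameters, and the assumption that each offloaded layer costs its weights plus its own slice of KV cache are all made up for illustration.

```python
def layers_on_gpu(free_vram_bytes: int,
                  n_layers: int,
                  bytes_per_layer: int,
                  ctx_size: int,
                  kv_bytes_per_token_per_layer: int,
                  reserve_bytes: int = 0) -> int:
    """Hypothetical estimate of how many layers fit in VRAM.

    Assumes (for illustration only) that every offloaded layer needs its
    weights plus its own share of the KV cache for the requested context.
    """
    # VRAM cost of one offloaded layer: weights + KV cache for ctx_size tokens
    cost_per_layer = bytes_per_layer + ctx_size * kv_bytes_per_token_per_layer
    # Leave some headroom for scratch buffers, the desktop, etc.
    budget = free_vram_bytes - reserve_bytes
    # Clamp to [0, n_layers]; whatever doesn't fit stays on the CPU
    return max(0, min(n_layers, budget // cost_per_layer))


MiB = 1024 ** 2
GiB = 1024 ** 3
gpu_layers = layers_on_gpu(8 * GiB, 40, 300 * MiB, 8192, 4096, 512 * MiB)
```

The point is just that asking for a bigger context raises the per-layer cost, so fewer layers land on the GPU and more of the model stays on the CPU.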
Hardware floating point was rare before the 486 DX and Pentiums. Not to mention that Integer<->FP conversion was slow. And division of any kind has always been slow. So you'd see a lot of fixed-point math approximations with power-of-two divisors so that you can shift-right.
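For anyone who hasn't seen the trick, here's a minimal sketch of it in Python. The 8-bit fraction and the helper names are my own choices for illustration; real code of that era would have been C or assembly with whatever fraction width suited the problem.

```python
FRAC_BITS = 8            # 8 fractional bits: stored value = real value * 256
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a real number to fixed point (done once, ahead of time)."""
    return int(round(x * ONE))


def to_float(a: int) -> float:
    """Convert back to a float, only for display or checking."""
    return a / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values.

    The raw product carries 2 * FRAC_BITS fractional bits, so divide by 256
    to rescale. Because 256 is a power of two, that is just a right shift.
    """
    return (a * b) >> FRAC_BITS


def scale_by_ratio(value: int, ratio_fixed: int) -> int:
    """Scale a plain integer by a fractional ratio using integers only."""
    return (value * ratio_fixed) >> FRAC_BITS


product = to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25)))
scaled = scale_by_ratio(200, to_fixed(0.3))   # approximates 200 * 0.3
```

Everything in the hot path is an integer multiply and a shift; the only divisions and float conversions are in the setup and display helpers.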
I think the point the GP was trying to make is that the GitHub UI ought to be able to allow you to submit a branch with multiple well-organized commits and review each commit separately with its own PR. The curation of the commits that you'd do for stacked PRs could just as easily be done with commits on a single branch; some of us don't just toss random WIP and fixup commits on a branch and leave it to GitHub to squash at the end. I.e., it's the GitHub UI rather than Git that has been lacking.
(FWIW, I'm dealing with this sort of thing at work right now - working on a complex branch, rewriting history to keep it as a sequence of clean testable and reviewable commits, with a plan to split them out to individual PRs when I finish.)
> I think the point the GP was trying to make is that the GitHub UI ought to be able to allow you to submit a branch with multiple well-organized commits and review each commit separately with its own PR.
That's what this feature is, conceptually. In practice, it does seem slightly more cumbersome due to the fact that they're building it on top of the existing, branch-based PR system, but if you want to keep it to one commit, you can (and that's how I've been working with PRs for a while now regardless, honestly).
They confirmed in other comments here that you don't have to use the CLI, just like you don't have to use gh in general to make pull requests, it's just that they think the experience is nicer with it. This is largely a forge-side UI change.
> I think the point the GP was trying to make is that the GitHub UI ought to be able to allow you to submit a branch with multiple well-organized commits and review each commit separately with its own PR
So the point he's trying to make is that the GitHub UI should support Stacked PRs but call them something else because he doesn't like the name?
Cave Johnson here. I'll be honest, we're throwing science at the wall here to see what sticks. No idea what it'll do. Probably nothing. Best-case scenario, you might get some superpowers.
I've been using the 1M window at work through our enterprise plan as I'm beginning to adopt AI in my development workflow (via Cline). It seems to have been holding up pretty well until about 700k+. Sometimes it would continue to do okay past that, sometimes it started getting a bit dumb around there.
(Note that I'm using it in more of a hands-on pair-programming mode, and not in a fully-automated vibecoding mode.)