I started using marimo for the reactive execution, after being spoiled by Observable and Pluto.jl. Being able to plug directly into Altair charts and tables was a huge boon. Then I discovered anywidget, which has been a game changer.
Now I use Claude to generate anywidgets for the controls I need and just focus on the heavy lifting in Python; it's great. Being able to have this all run in one flow with pair should make it 10x smoother.
As an example I get spreadsheets sent by clients that all have different file types, formatting, names, and business rules. I had Claude build me a widget to define a set of data-cleaning steps (merge x+y fields, split with regex, etc.). Now this task that used to take a lot of manual work and iteration is just upload a spreadsheet, preview and select my cleaning steps, run my algorithm and wait for it to come out the other side (with labelled progress bars). When it's done I get a table element and some interactive Altair charts to click on to filter and fine-tune, then I can just export the table and send it.
This task used to be done manually by a team; then I turned it into 1-2 hours with Jupyter. Marimo let me turn it into 5-15 minutes. Visual inspection of the results by a human is a requirement, so it's not completely automatable, but a 15-minute turnaround every few weeks feels good enough.
Anyways, marimo rocks. The _only_ thing missing is the easy deploy for internal-users story as I cannot use molab (yet?).
Hey, thanks and glad to hear the marimo + anywidget combo has been an unlock (I'm also the creator of anywidget). Clearly I'm biased, but custom widgets are a powerful primitive (marrying web & data ecosystems), and it's exciting to see coding tools making it even more accessible to build them out for specific or one-off tasks.
Re: deployment, we hear you & stay tuned. You can provide input here [1].
Side note: if you're curious, I have an RFC out for widget composition (widgets within widgets) [2]. Should be shipping soon.
It's essentially a table layout with a plus button at the bottom. When you click it adds a new step as a row, then you pick the operation, the input columns and output column name.
If you want to add another step you click the plus again and add another row the same way. Each row can access any table field or output field defined above it in the DAG.
Then, in Python, a for loop runs over the steps in order and updates the data frame (well, functionally: each step returns a new one). It uses a dictionary of function mappings and resolves input fields with kwargs.
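A minimal sketch of that dispatch loop, using plain dict rows in place of the data frame (the operation names and signatures here are hypothetical, not the author's actual code):

```python
import re

# Hypothetical operation registry: each function receives a row dict
# plus keyword arguments naming its input and output fields.
def merge_fields(row, inputs, output, sep=" "):
    row[output] = sep.join(str(row[f]) for f in inputs)
    return row

def split_regex(row, inputs, output, pattern):
    m = re.search(pattern, str(row[inputs[0]]))
    row[output] = m.group(1) if m else None
    return row

OPERATIONS = {"merge": merge_fields, "split": split_regex}

def run_steps(rows, steps):
    """Apply each cleaning step in order. Because steps run top to
    bottom, later steps can read fields produced by earlier ones --
    the DAG ordering the widget enforces."""
    for step in steps:
        op = OPERATIONS[step["op"]]
        kwargs = {k: v for k, v in step.items() if k != "op"}
        rows = [op(dict(row), **kwargs) for row in rows]
    return rows
```

For example, a merge step followed by a split step that reads the merged field:

```python
rows = run_steps(
    [{"first": "Ada", "last": "Lovelace"}],
    [
        {"op": "merge", "inputs": ["first", "last"], "output": "name"},
        {"op": "split", "inputs": ["name"], "output": "initial", "pattern": r"^(\w)"},
    ],
)
# rows[0] now has name="Ada Lovelace" and initial="A"
```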
The mental model I am using for online writing is that it is analogous to the spectrum of `pretending <-> acting`. The worst writing (AI or otherwise) looks, sounds, and feels like pretense, like a kid who tucks a towel into his shirt and runs around pretending to be a superhero. Meanwhile, acting, true acting, is invisible; it is a synonym for _being_[1].
That said, a lot of the AI writing feels "procedural", in the sense that most corporate writing (whitepapers, press releases, etc) feel procedural (i.e. the result of a constructed procedure). Before AI, the constructed procedure was basically that a piece of writing passes through a bunch of people (e.g. engineering -> management -> marketing -> website/email), and the output is a bland, forgettable pablum designed to (1) be SEO-friendly, (2) be spam-filter friendly, (3) be easy to ingest, (4) look superficially trustworthy and authoritative (e.g. inflated page count, extra jargon, numbers, plots), (5) look like it belongs to the "scene" or "industry" by imitating all the other corporate writings out there[2].
AI is interesting, in the same way that computers or the internet or an encyclopedia are interesting: how people choose to use it tells you a lot about them. All of those technologies can be used to compensate for a lack of skill (it helps one pretend), or they can be used to forge a skill (it helps one become).
One has to pretend before they can act (I guess? Feels intuitively correct to me). So perhaps AI (and the web, and the computer, and the encyclopedia) is only harmful to the extent that it does not nudge a person towards becoming[3]? And if so, that's a _cultural_ limitation, not a technological one.
[1]: I am not an actor, and so I might be wrong, but that is the impression I get from just watching and analyzing the acting in various films.
[2]: this becomes frustrating when you get criticized for producing something that "reads like $famousSomething", and then you get criticized again for producing something that "does not read like $typeOfFamousSomething".
[3]: No clue how you (plural -- let's bring back "yous") will convince your boss that you did not take the shortcut, because you were trying to "become more".
You can't expect a company to support you in directly hurting them. This reminds me of a guy I know that sued his bank regarding a loan, but he used a terrible lawyer. The bank pretty quickly took their own action, including stopping all payments from his account (as they claimed that the money was theirs).
It's not just any company, it's Meta, and the channels they administrate come with a set of responsibilities and principles, one of which is not to break them by arbitrary, willful removal of totally legal ads.
Isn’t that their defense against responsibility for their customers’ content? Having some broad filtering for legal requirements or scams is one thing but if they’re doing this it seems like support for cases alleging that they have editorial control and therefore responsibility.
Legally, they don't need to choose. Section 230 limits provider liability for moderating user content and also limits provider liability for not moderating user content. I think the intent of Section 230 was to apply liability to the users making the content, not the service provider transmitting it; however, IIRC, current jurisprudence makes it very hard to compel service providers to identify users in civil cases, so civil liability is hard to pursue unless the user identifies themself in their content.
It's not a question of whether they're a common carrier or not; they don't need to be, and typically, they don't try to be.
That's true. I haven't been keeping up with FB lawsuits but from what I gather of HN sentiment, FB is not open and never has been. Any FB exec claiming to be open is probably just doing exactly what you said, and they'll probably find a way to spin it to include this exclusion as part of their "openness."
I reviewed 118 conversations with Claude since March 6, all on real work projects.
Each conversation was processed to assess the level of frustration and its source, then evaluated with Gemma 4 and Claude Opus for spot checking. I have a tool I use to manage my work trees, so most work is done on branches prefixed with ad-hoc/feature/explore or similar, and the data was tagged with branch names.
43% of my Claude Code sessions (Opus 4.6, high reasoning) ended with signals of frustration. 73% of total chat time (by total messages) was spent in conversations which were eventually ranked as frustrating.
Median time to frustration was 25 messages, and on average, each message from Claude has about a 5% baseline chance of being frustrating. Frustration by chat length actually matches this 5% baseline of IID Bernoulli trials -- which is surprising and interesting, as this should not be IID at all.
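For the curious, that IID comparison is just the geometric CDF; a quick sketch (p = 0.05 taken from the baseline estimate above):

```python
# Under an IID Bernoulli model with per-message frustration probability p,
# the chance an n-message chat contains at least one frustrating reply
# is the geometric CDF: 1 - (1 - p) ** n.
def p_frustrated_by(n: int, p: float = 0.05) -> float:
    return 1 - (1 - p) ** n

# e.g. a 25-message chat under this model:
print(round(p_frustrated_by(25), 2))  # about 0.72
```

Comparing this curve against the observed frustration rate binned by chat length is how you'd check the IID fit.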
Frustration types:
- Wrong answers – 14% of sessions, 31% of frustration
- Instruction Following – 11% of sessions, 25% of frustration
- Overcomplication – 8% of sessions, 18% of frustration
- Destructive Actions (e.g. requesting to delete something or commit a change to prod) – 3% of sessions, 8% of frustration
- Non-responsive (service outages leading to non-response) 2% of sessions
- Miscommunication 2% of sessions
- Failed execution 2% of sessions
Half of frustrations happened in the first or last 20% of a chat by length. I interpret early frustrations to be recoverable, late frustrations to be terminal.
Early frustrations (sessions averaged 45 turns):
- 30% overcomplicating the problem
- 30% instruction following issues
- 30% wrong answers
- 10% destructive actions
Late frustrations (sessions averaged 12 turns -- i.e. terminal context early):
- 36% Wrong answers, with repetition
- 21% instruction following, with repeated correction from user (me)
- 14% Service interruptions/outages
- 7% failed execution
- 7% communication - Claude is unable to articulate some result, or understand the problem correctly.
Late frustrations led to the highest levels of frustration, 29% of the time.
I'm a data scientist -- my most frustrating work with Claude was data cleaning/repair issues (a complex backfill), with 75% of those sessions marked frustrating due to overcomplicating, instruction following, or destructive actions.
The best (least frustrating) workflows for DS were code-review, scoped feature work (with tickets), data validation, and config/setup tasks and automation.
Ad-hoc query work ended up in between -- ad-hoc requests were generally bootstrapping queries or doing rough analysis on good data.
Side note: all of my interactions with the /buddy feature were flagged as high frustration ("furious"). That was a false positive over mock arguing with it, but did provide a neat calibration signal. Those sessions were removed entirely from the analysis after classification.
I lived in apartments for a long time, then moved into a house. I thought my cat, who had never seen stairs, would take some adjusting. Nope, he looked up them, wiggled his butt, then ran full tilt to the top. He ran full tilt down them, too.
One of our cats has arthritis and before we got her treatment she didn’t like them, but she’s perfectly happy now.
It’s always amazed me how much capability baby animals have right when they’re born, when they have near zero experience with their muscles and balance and senses. Or even just the instinct of a cat to chase a string is universal.
There’s something intrinsic to the structure of brains that seems to pre-encode a lot of evolutionarily useful content without a training phase.
I’d love to take a course on just this topic and what we know about it.
To be fair, it's not like baby animals pop into existence at birth, starting from scratch at that moment; they've been growing/incubating for quite some time. Who knows, maybe that's the actual "training phase" for the animals, as what you say is true: they seem to have a lot of instincts already at birth, while human babies seem to have almost "popped into existence at birth" without a whole lot of instincts yet, compared to other animals at least.
They’ll have heard noises, experienced gyroscopic forces and gravity. But a calf being born and standing up within minutes to an hour is pretty neat. Same with vision, going from no sensory input to seeing.
Apparently piglets have full motor control in 8 hours after birth. I went to a local agricultural museum the other day and saw some week old piglets climbing over each other and nursing with no problems.
As I said, I would love to have the time and go back to school to learn way more about all of this. Nature and evolution are pretty amazing.
Also illustrates an adaptability-ability trade-off. A human baby is supplied a SOTA brain and sensors and actuators it can make sense of given time. A deer baby is preprogrammed to handle its sensors and actuators. In time, the human baby surpasses the deer baby in general ability.
I generally do like the model, it’s not a great agent though.
It’s good for summarization tasks, small tool use, and has pretty good world knowledge, though it does hallucinate.