Nice, I recently found something like this was possible too. GPT-5.5 one-shotted the basic game, but then I added some AI-generated graphics/sounds/music and asked it to wire them up.
It kind of blows my mind that I can go from 'I want a fun way to help him learn vocabulary, and I loved Total Annihilation as a kid' to 'here's a game that he finds genuinely fun and that helps him learn something' in a few prompts.
Yes, but it's important to note that just because a lot of aid is ineffective doesn't mean it all is. If you want to give to very poor people and be confident that most of the money (85%+) actually gets to them, I encourage you to take a look at https://www.givedirectly.org/. Full disclosure: I'm an unpaid trustee of the UK sister charity.
It depends on what you’re comparing it against. For $20, OpenAI is still probably the best value for SOTA models. In terms of limits, you can use GPT-5.4 instead of 5.5. The intelligence feels similar, but it’s cheaper. You can also experiment with other harnesses like pi. It’s lightweight but capable enough, and its token usage is definitely much more efficient.
Interesting. I don't have to use PowerPoint much, but I hate it when I do. I don't want the LLM to write the words, but I do want it to make things look nice. So does this work well now?
My pipeline for this is VS Code + prompts + markdown templates + GitHub Copilot -> markdown docs -> pandoc to produce .docx -> Copilot in Word for “nice” formatting -> Copilot in PowerPoint for nice decks. LLMs all the way down.
I find it’s easier to version control and diff the .md artefacts, so those remain my authoritative source.
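For anyone wanting to script the md -> pandoc step, here is a minimal sketch, assuming pandoc is installed and on the PATH. The file names and the `build_pandoc_command` / `convert` helpers are made up for illustration; `--reference-doc` is pandoc's option for borrowing styles from an existing .docx.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pandoc_command(md_path: str, docx_path: str,
                         reference_doc: Optional[str] = None) -> List[str]:
    """Build the argument list for a markdown -> .docx conversion.

    pandoc infers the output format from the .docx extension.
    """
    cmd = ["pandoc", md_path, "-o", docx_path]
    if reference_doc is not None:
        # Reuse the styles (fonts, headings) of an existing Word document.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd


def convert(md_path: str, docx_path: str,
            reference_doc: Optional[str] = None) -> None:
    """Run pandoc; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(build_pandoc_command(md_path, docx_path, reference_doc),
                   check=True)


if __name__ == "__main__":
    # Hypothetical file names; nothing is run unless called as a script.
    print(" ".join(build_pandoc_command("notes.md", "notes.docx", "house-style.docx")))
```

Keeping the command builder separate from the runner means the .md files stay the source of truth and the .docx can be regenerated at any point.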
I was doing something like this, and then realized, at least with Claude, that it’s so much better at HTML that it’s better to get an HTML-first deck together, which could then be turned into a PPT template and/or PDF directly, depending on needs.
It saved me a fair number of design-tweak steps in the md -> pandoc part of the workflow. Realistically, hand editing Claude’s HTML is also easy in most cases, so I didn’t feel like I lost much (for the generative cases). Similarly, when the source is mostly something I’ve written directly, it’ll be in markdown, and I’ve found the faster path is md -> (LLM-translated HTML deck) -> PDF.
If you don't want an LLM to write the words, surely you also want to decide on the data and graphs to show by yourself? Isn't that 90% of a presentation? The "looking nice" part doesn't matter as much, it could be black text on a white background and it would be fine.
The important part is the presentation matching your presenting cadence, which is something LLM-generated presentations never get right. I don't have a problem with people generating presentations, but most of the time they just end up reading whatever is on the screen when presenting.
With a little bit of work, it works very well. You can generate PowerPoint directly with Codex or Claude Cowork. There is also Canva support for these tools, and it has its own AI integration. Another useful tool in this space is the Gemini integration in Google Slides.
If you are a bit technical, reveal.js is actually really nice for this. I one-shotted a PDF export for it that uses a headless browser. I've used that a few times now.
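A rough sketch of how such an export can work, not necessarily the one described above: reveal.js switches to its print layout when `print-pdf` is in the query string, and Chrome/Chromium can print a page to PDF in headless mode. The binary name `chromium`, the local URL, and the helper names are assumptions for illustration.

```python
import subprocess
from typing import List
from urllib.parse import urlsplit, urlunsplit


def print_pdf_url(deck_url: str) -> str:
    """Add reveal.js's print-pdf flag to the deck URL's query string."""
    parts = urlsplit(deck_url)
    query = "print-pdf" if not parts.query else parts.query + "&print-pdf"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def build_export_command(deck_url: str, pdf_path: str,
                         browser: str = "chromium") -> List[str]:
    """Build a headless-browser command that prints the deck to a PDF."""
    return [
        browser,                        # assumed binary name; varies by OS
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        print_pdf_url(deck_url),
    ]


def export_deck(deck_url: str, pdf_path: str, browser: str = "chromium") -> None:
    """Run the export; the deck must already be served at deck_url."""
    subprocess.run(build_export_command(deck_url, pdf_path, browser), check=True)


if __name__ == "__main__":
    print(" ".join(build_export_command("http://localhost:8000/index.html", "deck.pdf")))
```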
What works well for me is to take an existing presentation and then some raw input and generate a new presentation in the same style as the old one from the raw input. After that, I can go in and tweak individual slides.
Another thing I did recently was take somebody's existing pitch deck and fix it with a one-line prompt: "this deck is a bit meh, pimp it!" That worked unreasonably well. I like using shitty prompts like that. Codex often manages to do the right thing if you don't overthink your prompts.
It was a classic deck from somebody who used way too much text and only bullets. Codex did a great job of presenting the content in a simpler, better-structured way: pulling out key facts and highlighting those, simplifying text, etc. Doing that manually would have taken hours.
I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between them, to help teach fractions to a child.
It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
Please retain the Vega-Lite spec as an option! It's incredibly useful because you can tweak the chart or change it completely at a later date using e.g. the Vega-Lite editor.
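To illustrate why having the spec is handy: a Vega-Lite chart is just a JSON document, so changing the chart later is a small edit to that document. A minimal sketch with made-up data (the values and the `with_mark` helper are purely illustrative):

```python
import json
from typing import Any, Dict

# A small Vega-Lite bar chart spec with invented example data.
spec: Dict[str, Any] = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "description": "Made-up example data",
    "data": {
        "values": [
            {"category": "A", "value": 28},
            {"category": "B", "value": 55},
            {"category": "C", "value": 43},
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative"},
    },
}


def with_mark(chart: Dict[str, Any], mark: str) -> Dict[str, Any]:
    """Return a copy of the spec with a different mark type (e.g. 'line')."""
    return {**chart, "mark": mark}


if __name__ == "__main__":
    # Paste the printed JSON into the Vega-Lite editor to render it.
    print(json.dumps(with_mark(spec, "line"), indent=2))
```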
Could this be because they've found the 1m context uneconomical (i.e. it costs too much to serve, or burns through users' quota too quickly, causing complaints), and so they're no longer targeting it as a goal?
Thanks, interesting. Does this make it more surprising that the other benchmarks have improved? I'm not sure I understand the benchmarks well enough, but I'm wondering whether with agentic workflows it's possible to get away with a smaller, more focussed context (and hence lower cost) whilst achieving the same or better performance, because of agentic models' ability to decide what to put in context as they work.
Thank you! Noticed you're interested in similar areas. I've also previously done some work on maths problem generation. Similar to letterpaths, the core lib can then be used to power games/other educational apps. As I'm sure you've found as well, it's surprisingly difficult to generate random maths problems aligned to a curriculum!
Core lib is UK-focussed:
https://github.com/RobinL/maths-game-problem-generator
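To give a flavour of why curriculum alignment is fiddly, here is a toy sketch. It is not the API of the library linked above; the `AdditionRule` fields and the example rule are invented for illustration. Even for simple addition, "random" has to respect constraints such as a maximum total and whether carrying is allowed.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AdditionRule:
    """Hypothetical curriculum constraint for an addition question."""
    max_total: int      # the answer must not exceed this
    allow_carry: bool   # whether column addition may involve carrying


def has_carry(a: int, b: int) -> bool:
    """True if adding a and b in columns requires carrying."""
    while a > 0 and b > 0:
        if a % 10 + b % 10 >= 10:
            return True
        a //= 10
        b //= 10
    return False


def generate_addition(rule: AdditionRule, rng: random.Random) -> Tuple[int, int]:
    """Rejection-sample a pair (a, b) that satisfies the rule."""
    while True:
        a = rng.randint(1, rule.max_total - 1)
        b = rng.randint(1, rule.max_total - a)  # keeps a + b <= max_total
        if rule.allow_carry or not has_carry(a, b):
            return a, b


if __name__ == "__main__":
    rule = AdditionRule(max_total=20, allow_carry=False)  # invented example rule
    a, b = generate_addition(rule, random.Random())
    print(f"{a} + {b} = ?")
```

Real curricula layer many more rules on top of this (number bonds, place value, question wording), which is where most of the difficulty comes from.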
Thanks for taking the time to look. My biggest focus right now is my own Numerikos. I hope I can make a better math learning platform. Math games are fun too. There are some nice ideas in the examples you have shared here.
The letters are based on how they're formed in the UK primary school curriculum.
Each letter is a JSON file that defines the bezier curves according to a schema.
They were created by (1) drawing the letters freehand, yielding essentially a dot-to-dot, and then (2) using an approximation/smoothing algorithm to convert that into beziers. Finally, I went through touching up/fixing each letter by hand, using a purpose-built editor.
So I would say overall it's more time consuming than challenging.
That still leaves the problem of joining letters together. For that I leant heavily on AI to propose an algorithm, although it required a lot of back and forth to get something even semi-decent. At the moment it's probably 'good enough', but there's still lots of room for improvement.
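For anyone curious what "a JSON that defines the bezier curves" might look like in practice, here is a minimal sketch. The field names (`letter`, `segments`, `p0`..`p3`) are invented for illustration and are not the actual schema; the evaluation itself is the standard cubic Bezier formula.

```python
import json
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical letter definition: one cubic segment, four control points.
LETTER_JSON = """
{
  "letter": "n",
  "segments": [
    {"p0": [0, 0], "p1": [0, 1], "p2": [1, 1], "p3": [1, 0]}
  ]
}
"""


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    w0, w1, w2, w3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


def sample_letter(letter: Dict[str, Any], steps: int = 10) -> List[Point]:
    """Turn every segment of a letter into a polyline of sampled points."""
    points: List[Point] = []
    for seg in letter["segments"]:
        ctrl = [tuple(seg[key]) for key in ("p0", "p1", "p2", "p3")]
        for i in range(steps + 1):
            points.append(cubic_bezier(ctrl[0], ctrl[1], ctrl[2], ctrl[3], i / steps))
    return points


if __name__ == "__main__":
    for x, y in sample_letter(json.loads(LETTER_JSON), steps=4):
        print(f"{x:.3f}, {y:.3f}")
```

Sampled points like these are what a tracing game would animate along or compare a child's stroke against.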
On the countries quiz, you should be able to move and zoom on the globe using click and drag (or pinch and drag on mobile). Letter constellations uses shaders. Both of those are only tested on Chrome, so that might be the issue.
I work on an open source record linkage library called Splink. The list of features I want to add has always been far larger than the time I have. I enjoy working with LLMs because they have enabled me to work through my backlog much faster.
I think the primary reason I enjoy working with them is that coding, for me, has always been a means to an end. I like the product of it more than the process. This quote sums it up quite well for me:
> People are really worried about their jobs. And I just want to remind them that the purpose of your job and the tasks and tools that you use to do your job are related, not the same. I've been doing my job for 33 years. I'm the longest running tech CEO in the world, 34 years. And the tools that I've used to do my job has changed continuously in the last 34 years, and sometimes quite dramatically, you know, over the course of a couple, two, three years.
It's a vocab building game, playable here (desktop only): https://rupertlinacre.com/vocab_annihilation/