DeepMind documentary: The Thinking Game
A documentary from Google DeepMind: The Thinking Game. An interesting behind-the-scenes history of DeepMind, with high-level overviews of AlphaGo, AlphaZero, AlphaFold, and more.
Exasperated prompting
I’ve seen a couple of papers suggesting that LLM output quality can be sensitive to tone and language style: “good manners” (paper) and “appropriate language” (paper).
This week, I was wrestling with some “vibe coding” to get a tricky integration working. After several hours of prompting, I was still coming up blank. In frustration, I typed a prompt that questioned the overall approach. The response from Gemini 3 (in VS Code) was uncharacteristically short: it basically stated the root cause and said it had fixed it. It also mostly ignored my questions — but the solution worked perfectly.
That made me wonder whether questioning the overall approach is what pushed it to the correct solution, rather than just iterating further on the same broken thread.
Prompt I used:
Thanks! That fixes the error, but the clicked option still doesn't get set as the selected value (for either the custom or built-in options). I'm trying to add custom options into the menu so that I can upgrade this feature to support Divi 5 (it was easy in Divi 4 to add the options). We've spent a lot of time trying to get this working, but it hasn't yet. Do you think we are going about it the right way? Should we just try to set the option via the dom (using our custom onchange handler)? Or is there a better approach that I'm not thinking of? Or should we keep pushing on trying to inject our custom options using the way Divi does it now? Here's the current debug log: ...
One to test at some point: try deliberately switching into “step back and question the approach” mode earlier, rather than only doing it after getting stuck.
Appropriate language
I (re)read the paper linked above: Mind the Gap: Divergence and Adaptation Strategies in Human-LLM Assistant vs Human-Human Interactions (paper).
Notes / takeaways:
- In their study, users who knew they were chatting with a bot wrote in a noticeably different style than users chatting with a human online: less polite and formal, somewhat less grammatically correct, and slightly less lexically diverse.
- They tested how that style shift affected downstream performance (intent classification), even when the underlying “meaning” of the query wasn’t changed.
- The model performed best on the human-human communication style, and performance degraded for both (a) the human-chatbot style and (b) an exaggeratedly polite/formal/grammatically correct variant.
- Mitigation-wise: training on a range of communication styles worked well; preprocessing user queries did not (and caused a small degradation, possibly due to subtle information loss).
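Here's a rough sketch of what that first mitigation could look like for an intent classifier: augmenting the training set with "chatting to a bot" style variants of each utterance. The specific perturbations are my own illustrative guesses at the kinds of style shift the paper describes, not its actual procedure.

```python
import random
import re

# Politeness/formality markers to strip; an illustrative list, not the paper's.
POLITENESS = re.compile(
    r"\b(please|thanks|thank you|could you|would you)\b[ ,]*", re.IGNORECASE
)

def chatbotify(text: str, rng: random.Random) -> str:
    """Rewrite a human-human style utterance into a terser 'talking to a bot' style."""
    text = POLITENESS.sub("", text)  # less polite/formal
    text = text.rstrip(".!?")        # sloppier punctuation
    if rng.random() < 0.5:
        text = text.lower()          # sloppier capitalisation
    return text.strip()

def augment(dataset: list[tuple[str, str]], seed: int = 0) -> list[tuple[str, str]]:
    """Return the original (utterance, intent) pairs plus one styled variant of
    each, so the classifier is trained on a range of communication styles."""
    rng = random.Random(seed)
    return list(dataset) + [(chatbotify(u, rng), intent) for u, intent in dataset]

print(augment([("Could you please check my order status?", "order_status")]))
```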
Practical implication (as a user): it may be worth writing to an LLM more like you'd write to another human, assuming the model was trained heavily on human-human interactions (as many current LLMs presumably were).
Politeness
I also re-read this paper on politeness / good manners and LLM output: https://arxiv.org/ftp/arxiv/papers/2402/2402.14531.pdf
Notes / takeaways:
- They tested several LLMs on three tasks: summarization, language understanding, and stereotypical bias detection.
- Prompts were modified across 8 politeness levels, in English, Japanese, and Chinese.
- Outputs varied noticeably by politeness level and language.
- For English, the best results were typically achieved with moderate politeness; both impolite and excessively polite prompts tended to do worse.
- Differences were smaller for the newest model included in the study (GPT-4).
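As a rough illustration of the experimental setup, here's a tiny harness that runs the same task at a few politeness levels and collects the outputs for comparison. The wrapper phrasings are my own examples (the paper defines 8 levels per language), and `ask` is a stand-in for whatever model call you'd use.

```python
# Same task prompt, wrapped at several politeness levels.
TASK = "Summarize the following paragraph in one sentence: {text}"

POLITENESS_WRAPPERS = {
    "impolite":    "Do this now, no excuses. {task}",
    "neutral":     "{task}",
    "polite":      "Could you please help me with this? {task}",
    "over-polite": "I would be deeply honoured if you might graciously consider "
                   "this humble request. {task}",
}

def ask(prompt: str) -> str:
    # Stub: swap in a real LLM call here.
    return f"<model output for: {prompt[:48]}...>"

def sweep(text: str) -> dict[str, str]:
    """Run the same task at every politeness level and collect the outputs."""
    task = TASK.format(text=text)
    return {level: ask(wrapper.format(task=task))
            for level, wrapper in POLITENESS_WRAPPERS.items()}

for level, output in sweep("LLMs appear sensitive to prompt tone.").items():
    print(f"{level:>12}: {output}")
```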
Again, the user-level implication is similar: respectful, normal human-to-human style seems like a good default.
GPT 5.2
OpenAI have released GPT 5.2: https://openai.com/index/introducing-gpt-5-2/
My quick notes from the announcement:
- Claimed improvements in knowledge-work tasks (e.g. spreadsheets, presentations, etc.).
- Described as a better coding model than GPT 5.1, with stronger front-end development and more complex or unconventional UI work (including 3D elements).
- Fewer hallucinations, plus significant improvements in long-context handling.
- Improved vision processing (notably chart reasoning and software interface understanding), with a stronger grasp of how elements are positioned within an image.
- For UI screenshot vision tasks, they note that enabling a Python tool and using maximum reasoning effort significantly improves results (see the sketch after this list).
- Improved tool-calling: better tool selection and higher workflow completion rates (e.g. support ticket resolution).
- Improved scientific, mathematical, and general reasoning.
- Updated knowledge cut-off: Aug 31, 2025.
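For the UI screenshot point above, here's a minimal sketch of that configuration via the Responses API. The parameter shapes are standard Responses API usage, but the `gpt-5.2` model name and `high` being the maximum reasoning effort are assumptions based on the announcement.

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.2",                        # assumed model name, per the announcement
    reasoning={"effort": "high"},           # assumed to be the maximum effort setting
    tools=[{"type": "code_interpreter",     # the hosted Python tool
            "container": {"type": "auto"}}],
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text",
             "text": "Which button in this screenshot opens the settings panel?"},
            {"type": "input_image",
             "image_url": "https://example.com/screenshot.png"},
        ],
    }],
)

print(response.output_text)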
Availability/cost notes:
- Available in ChatGPT and via the API.
- API pricing mentioned: $1.75 in / $14 out per 1M tokens, vs $1.25 in / $10 out for GPT 5.1, with a claim that agents can be cheaper overall due to token efficiency.
- There’s also a more powerful (and more expensive) GPT 5.2 Pro variant.
OpenAI Responses compaction
OpenAI have added a “compaction” feature to the Responses endpoint: https://platform.openai.com/docs/api-reference/responses/compact
The idea: reduce a long-running conversation down to a smaller object that can be used as input to continue the conversation while preserving earlier state, saving tokens.
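I haven't tried it yet, and the request shape below is a guess (check the linked docs for the real one): the assumption is that you compact an existing response chain, then continue from the compacted object instead of replaying the full history.

```python
# Unverified sketch; the compact path and payload are assumptions from the docs URL.
import os
import requests

API = "https://api.openai.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Assumed: compact an existing long-running response chain into a smaller object.
compacted = requests.post(f"{API}/responses/resp_abc123/compact", headers=HEADERS).json()

# Continue from the compacted state instead of resending the full history,
# saving input tokens. (`previous_response_id` is an existing Responses API
# parameter; pointing it at a compacted object's id is my assumption.)
follow_up = requests.post(
    f"{API}/responses",
    headers=HEADERS,
    json={
        "model": "gpt-5.2",
        "previous_response_id": compacted["id"],
        "input": "Great, now summarize what we decided.",
    },
).json()
print(follow_up)
```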
COLT monotonicity problem
OpenAI released a paper with a novel proof of the previously unsolved COLT monotonicity problem, with the mathematical work performed by GPT 5.2 Pro (and minimal human prompting to keep it on track). Discussion here: https://openai.com/index/gpt-5-2-for-science-and-math/
Google GenTabs (Disco)
Google released GenTabs: https://blog.google/technology/google-labs/gentabs-gemini-3/
It’s an experiment in augmenting a browser with on-the-fly generated apps. It’s waitlisted and (at least initially) US-only. Looks very cool.