Mossop Labs

AI Field Notes #01 – Prompting style, GPT 5.2, and assorted links

DeepMind documentary: The Thinking Game, a documentary from Google DeepMind. An interesting behind-the-scenes history of DeepMind, with high-level overviews of AlphaGo, AlphaZero, AlphaFold, etc. Exasperated prompting: I’ve seen a couple of papers...

Systems That Learn

Every developer knows that software changes over time. It grows, breaks, adapts. But what’s happening now — with systems that test, document, and even repair themselves — feels like something deeper. We are beginning to build systems that learn. They don’t just...

Self-Healing Tests and Automated Bug Fixing

Every software developer knows the sinking feeling that comes with a red line in a test report. Something broke. Maybe it’s your fault; maybe it’s not. Either way, the work starts — find what changed, fix it, and get everything green again. But what if the tests...

Synthetic Users and Interface Design

The real test of any feature isn’t whether it works — it’s whether people can use it. You can build the most technically perfect setting in the world, but if users can’t find it, understand it, or trust what it does, it might as well not exist. That’s why good...

AI-Assisted Feature Development

Once a system can reliably test and document a feature, the next question naturally follows: could it also help build one? Testing and development have long been treated as separate stages — one verifies, the other creates. But if an automated testing system can...

Tests as Documentation

Somewhere along the way, I started making my manually written tests do more than just verify features. I started using them to document how features worked, where buttons lived, and what users could expect to see on their screens. What began as a technical artifact...

Evaluation and Validation

When an AI agent finishes writing an acceptance test, there’s a question that hangs in the air: is it right? The test may run successfully. The browser may click through the expected sequence of screens. The Codeception output may glow with green “OK” lines. But a...

Stability and Context Drift

No system stays still for long. WordPress updates, browsers change, and Divi — the page builder extended by many of my plugins — continues to evolve. Each new release brings refinements, bug fixes, and interface tweaks. And while that’s good news for users, it’s a...

Speed and Scale

If intelligence is the engine of this project, then time is its fuel. And early on, I was burning through a lot of it. Running acceptance tests is slow work. A single test can take minutes to complete, especially on a complex system like WordPress. Add the Divi page...

Case Study: Tencent’s XUAT-Copilot

It was around the time my own system first began to work — when the AI agent could reliably generate, run, and refine acceptance tests — that I paused to see who else might be exploring similar territory. The few resources I found mainly just offered prompts to get...