The Feedback Loop

When an AI agent writes a step’s worth of test code, that step is only a hypothesis: a theory about how the world should respond. It’s not until the code actually runs that we find out whether the hypothesis holds true. This moment, when code meets reality, is where...

Generating Step Code

Once the AI agent decides what it needs to do next, it still has one crucial task left: to turn that decision into code — the executable instructions that actually perform the action in the browser. This is where the test comes to life, each line converting the...

Setting Goals and Sub-Goals

Each acceptance test we task our AI agent with creating needs a high-level goal: a short description of what the test should achieve. At first glance, it might seem enough to say, “Write a test that checks if the gallery arrows can be hidden.” But the precision of...

Sensing the Environment

Every intelligent system needs a world to operate in. For Voyager, that world was Minecraft: a sandbox of forests, caves, and creatures. For our testing agent, the world is a WordPress site, and it is just as rich: a landscape of menus, buttons, fields, and pages,...

Translating Voyager to Testing

What made Voyager remarkable wasn’t that it played Minecraft — it’s that it learned through interaction. It explored an environment, noticed what worked, and built on its successes. That same principle applies surprisingly well to software testing. A WordPress site...

An AI Learns to Play Minecraft

In 2023, a group of researchers released a paper called Voyager: An Open-Ended Embodied Agent with Large Language Models that, on the surface, was about an AI playing Minecraft. That might sound trivial, perhaps just another entertaining application of AI to some in-game task,...

Teaching an AI to Write Acceptance Tests

At some stage in software development, you discover that the real challenge isn’t coding new things; it’s coping with deployments and the worry about whether what used to work still does. The software industry’s solution is automated acceptance testing. These tests are...