Every intelligent system needs a world to operate in. For Voyager, that world was Minecraft — a sandbox of forests, caves, and creatures. For our testing agent, the world is a WordPress site and it is just as rich: a landscape of menus, buttons, fields, and pages, each of which can change in response to its actions. Where a human tester sees a website, the AI agent sees an environment — one it can explore, manipulate, and learn from.

And just as Voyager could “see” its surroundings — its inventory, position, and time of day — our AI tester must also be able to perceive the state of its world. Without perception, it’s blind. Without feedback, it’s guessing. So the question becomes: what does our agent actually see when it looks at a website?

It turns out, quite a lot.

The Browser as the World

When a Codeception test runs, it does so through a browser controlled by ChromeDriver. Each time the agent takes an action — clicking a button, entering text, saving settings — the browser responds by updating what’s displayed. That browser window is the agent’s front line of perception.

It can take screenshots, capturing a visual snapshot of what the page looks like at any moment. But unlike a human user, it isn't limited to pixels and colors: it can also read the underlying HTML structure of the page — the invisible scaffolding that describes every button, heading, and form field.

In a normal Codeception setup, these artifacts — the screenshot and HTML dump — are only saved when something goes wrong. They’re debugging tools, meant to show the developer what the page looked like when a test failed. But in our case, we want the agent to be aware of its environment even when everything is going right.

To make that possible, I used Codeception’s _after() method — the function that runs automatically at the end of every test. Now, whether the test passes or fails, it always captures the page’s HTML content and a screenshot of the browser window. This means that, for the agent, every completed step is an opportunity to look around and take stock of the world it just acted upon.
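As a rough sketch, the hook in a Cest class might look something like this (the class name, actor name, and file handling are illustrative; makeScreenshot() and grabPageSource() come from Codeception's WebDriver module):

```php
<?php

class SettingsCest
{
    // Runs automatically after every test in this Cest, pass or fail.
    public function _after(AcceptanceTester $I)
    {
        // Filesystem-safe label for this capture.
        $label = 'state-' . date('Ymd-His');

        // Screenshot of the current browser window,
        // saved by the WebDriver module under tests/_output/debug/.
        $I->makeScreenshot($label);

        // Raw HTML of the current page, written to the output directory.
        $html = $I->grabPageSource();
        file_put_contents(codecept_output_dir($label . '.html'), $html);
    }
}
```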

Hidden Layers of Feedback

In addition to what’s visible on the screen, there are the hidden layers of the environment: WordPress’s debug log and the browser’s JavaScript console. These capture the undercurrent of activity — warnings, errors, or developer messages that might not appear in the interface itself.

While they’re not always needed, they serve as an additional source of feedback — the equivalent of a faint hum in the background that can alert the agent to problems even when everything looks fine on-screen.

To surface errors in the debug log, I added a check to Codeception's _after() method that asserts the debug log is empty. If it isn't, the check triggers a failure that gets reported in Codeception's output. The reasoning is that my features shouldn't produce debug log output during correct operation, so any entries in the debug log are a sign that something has gone wrong and needs to be addressed.
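A minimal sketch of that check, added to the same _after() hook, might look like the following (the debug log path and the availability of the Asserts module's assertEmpty() are assumptions about the suite's setup):

```php
<?php

// Inside the same _after(AcceptanceTester $I) hook shown earlier.

// Assumed location of WordPress's debug log (requires WP_DEBUG_LOG enabled).
$debugLog = '/var/www/html/wp-content/debug.log';

if (file_exists($debugLog)) {
    $contents = trim(file_get_contents($debugLog));

    // Any entries in the debug log fail the test and appear
    // in Codeception's output alongside the other artifacts.
    $I->assertEmpty($contents, "Unexpected debug log output:\n" . $contents);
}
```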

While I don’t currently capture the JavaScript console by default, it is available to the tests via Codeception commands, should the test need to validate the console’s contents.
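For example, a test could pull the console entries and fail on anything severe. This is only a sketch: it assumes grabBrowserLogs() is available in the WebDriver module, that ChromeDriver is configured to expose the browser log, and that the Asserts module provides fail().

```php
<?php

// Inside a test method; $I is the AcceptanceTester backed by the WebDriver module.
$logs = $I->grabBrowserLogs();

foreach ((array) $logs as $entry) {
    // Each entry is expected to be an array with 'level' and 'message' keys.
    if (($entry['level'] ?? '') === 'SEVERE') {
        $I->fail('JavaScript console error: ' . $entry['message']);
    }
}
```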

Representing the World for an AI Agent

But perception isn’t just about access — it’s about representation. Large language models, even the most advanced, still have practical limits. They can read text and interpret images, but only up to a point. A single webpage can contain thousands of lines of HTML, sprawling stylesheets, embedded scripts, and iframes. Handing all that raw data to an AI risks overwhelming it, either in terms of token count or in terms of its ability to focus and draw good conclusions.

So before the agent sees the page, we simplify it.

This preprocessing step trims away the noise: scripts are removed, styles are inlined, iframes flattened, and unnecessary attributes stripped out. What remains is a distilled version of the page — lightweight, readable, but still representative of the real structure and functionality.
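As an illustrative sketch only (the real preprocessing also inlines styles and flattens iframes), a DOMDocument pass in PHP could strip scripts and prune attributes like this; the list of attributes to keep is an assumption:

```php
<?php

/**
 * Produce a simplified version of a page's HTML for the agent:
 * drop scripts and styles, and keep only a small set of attributes.
 */
function simplify_html(string $rawHtml): string
{
    $dom = new DOMDocument();
    // Suppress warnings from real-world, imperfect markup.
    @$dom->loadHTML($rawHtml, LIBXML_NOERROR | LIBXML_NOWARNING);

    $xpath = new DOMXPath($dom);

    // Remove elements that add noise but no testable structure.
    foreach ($xpath->query('//script | //style | //noscript') as $node) {
        $node->parentNode->removeChild($node);
    }

    // Keep only attributes that help identify and interact with elements.
    $keep = ['id', 'name', 'type', 'href', 'value', 'class', 'role', 'aria-label'];
    foreach ($xpath->query('//*') as $element) {
        foreach (iterator_to_array($element->attributes) as $attr) {
            if (!in_array($attr->name, $keep, true)) {
                $element->removeAttribute($attr->name);
            }
        }
    }

    return $dom->saveHTML();
}
```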

The aim isn’t to give the agent an exhaustive model of the world, but a useful one — enough to understand where it is, what’s visible, and what can be acted upon next. Think of it like a map: simplified, but faithful to the terrain.

This balance matters because it keeps the agent’s mental picture of the environment clear and consistent. Too little detail, and it can’t reason effectively. Too much, and it drowns in irrelevant data. The sweet spot is a page that captures the essence of the current state — what a human tester might see and think about — without overwhelming the system.

Building the AI’s Sensory World

In this way, the AI’s world takes shape:

  • A WordPress site as the landscape.
  • Browser actions as its means of movement.
  • HTML and screenshots as its vision.

This is the environment in which it will learn, act, and build. Each test run isn’t just a verification of functionality — it’s an exploration of this small, structured universe.

And just as Voyager’s ability to observe its environment was the foundation of its learning, this careful attention to state will form the foundation of ours.