No system stays still for long.
WordPress updates, browsers change, and Divi — the page builder extended by many of my plugins — continues to evolve. Each new release brings refinements, bug fixes, and interface tweaks. And while that’s good news for users, it’s a quiet headache for any automated system that’s trying to interact with it.
This gradual erosion — the slow drift between what the AI agent expects to see and what it actually encounters — is what I call context drift. Accordingly, we need strategies to keep our AI-driven test generation stable over time.
System Detection
Divi is a great example. The jump from Divi 4 to Divi 5 brought major internal changes, from the way settings are stored to the layout of the builder interface itself. A Codeception test that clicked confidently through Divi 4’s menus would struggle in Divi 5 — not because the feature stopped working, but because the underlying selectors and UI logic changed.
To address this, I implemented system detection. The test environment now identifies whether it’s running on Divi 4 or Divi 5 and automatically loads a different set of instructions and a separate step library for each.
This makes the agent’s behavior version-aware. If a test starts on Divi 4 and later switches to Divi 5 — as might happen when testing a migration — the agent switches configuration mid-test, changing both its reference library and its internal guidance. The process is seamless and, more importantly, stable.
By letting the agent understand which world it’s in, the system avoids trying to apply knowledge that no longer fits.
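As a rough illustration, here is what that detection might look like as a Codeception helper. The probe against `window.DiviBuilder.version`, the file paths, and the method names are all assumptions made for this sketch, not the project's actual code.

```php
<?php
namespace Helper;

// Hypothetical helper: detects which Divi major version the test is running
// against and points the generator at the matching instructions and step library.
class DiviVersion extends \Codeception\Module
{
    // Returns "4" or "5". The JS probe is illustrative; any reliable
    // version marker in the page or the database would do.
    public function grabDiviMajorVersion(): string
    {
        $webDriver = $this->getModule('WebDriver');

        $version = $webDriver->executeJS(
            'return (window.DiviBuilder && window.DiviBuilder.version) || "4";'
        );

        return explode('.', (string) $version)[0];
    }

    // Loads the per-version assets. The directory layout is an assumption.
    public function loadVersionProfile(string $major): array
    {
        $base = codecept_data_dir("divi{$major}");

        return [
            'instructions' => $base . '/instructions.md',
            'stepLibrary'  => $base . '/step-library.json',
        ];
    }
}
```

A scenario could call `grabDiviMajorVersion()` once at the start and again after a migration step, reloading the profile whenever the answer changes.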
Helper Methods
Some stability problems don’t come from version drift but from quirks of the interface itself. Certain UI elements are simply tricky to automate — not just for the agent, but often for human testers too.
One example is the color picker in Divi 5. It looks simple, but it hides a subtle behavior: if the field isn't blurred (that is, if focus isn't removed) before closing the picker, the chosen color doesn't save. The agent usually verifies, correctly, that the value has been entered in the field inside the color picker modal and that the picker has closed, but it fails to notice that the color is not retained once the picker is closed.
Because each color field has a unique identifier, caching or reuse doesn’t help much here. The same bug appears again and again in slightly different forms.
My solution was to create a helper method — a custom Codeception command that wraps this behavior correctly. The helper handles the focus, click, and blur events automatically, ensuring the color always saves. Once that helper exists, the agent doesn’t need to think about the color picker anymore. It just calls the helper.
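As a sketch, such a helper might look roughly like this, assuming the Codeception WebDriver module; the `setDiviColor` name and the selector handling are invented for illustration rather than taken from the real suite.

```php
<?php
namespace Helper;

use Facebook\WebDriver\WebDriverKeys;

// Hypothetical helper wrapping the Divi 5 color picker: focus the field,
// type the color, then blur the field before the picker closes so the
// value is actually committed.
class DiviColorPicker extends \Codeception\Module
{
    public function setDiviColor(string $fieldSelector, string $hexColor): void
    {
        $webDriver = $this->getModule('WebDriver');

        $webDriver->click($fieldSelector);                // focus the field and open the picker
        $webDriver->fillField($fieldSelector, $hexColor); // enter the value

        // The step the agent kept missing: remove focus so Divi saves the color.
        $webDriver->executeJS(
            'document.querySelector(arguments[0]).blur();',
            [$fieldSelector]
        );

        // One possible way to dismiss the picker once the value is committed.
        $webDriver->pressKey('body', WebDriverKeys::ESCAPE);
    }
}
```

A generated test then calls something like `$I->setDiviColor('#accent-color-field', '#ff6600')` (the selector is, again, made up) and never has to reason about the blur quirk itself.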
This kind of abstraction might seem like cheating — manually adding code to handle what the AI couldn’t. But in reality, it’s no different from paving over a pothole in the road. You’re not changing the journey; you’re just making it smoother. And if the color picker changes again in the future, there’s only one piece of code to update.
Over time, these helpers become a toolkit for resilience — the equivalent of giving the AI better tools for the job rather than asking it to be endlessly adaptable.
Robust Prompting
System detection and helper methods go a long way, but perhaps the most powerful stabilizer of all is robust prompting: the quiet craft of teaching the AI how to think clearly and consistently.
When generation gets stuck on a step — maybe it keeps clicking the wrong button, or selects the wrong element because multiple options look similar — I dig into the issue. I try to understand not only what went wrong, but why. Was the model relying too much on hidden CSS classes? Was it ignoring visible button text? Was it assuming that similar-looking panels worked the same way when they didn’t?
For each insight, I add a note to an instructions file — a kind of standing set of principles that guide the AI’s behavior. Over time, this file has grown into a collection of small but powerful rules.
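To give a sense of what these notes look like, here are a few rules of that kind, paraphrased for illustration rather than quoted from the actual file:

```
- Prefer visible button text over hidden CSS classes when deciding what to click.
- When several panels look alike, confirm the panel heading before touching its controls.
- Never assume a setting has saved; verify it by reopening the panel or checking the preview.
```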
These instructions don’t hard-code behavior; they shape reasoning. They help the model generalize from one test to another, making decisions that hold up even as the interface evolves.
It’s a process of teaching by reflection — turning each past failure into guidance for future success.
Staying Relevant
Even with good prompting and separate libraries, time takes its toll. Within a single system — say, Divi 5 — updates still occur. UI labels shift. Controls get renamed. Steps that worked a month ago might no longer match anything.
To prevent the library from filling up with outdated or misleading examples, any step that’s reused but fails validation is automatically removed. This keeps the vector database fresh and relevant, ensuring that the AI’s memory doesn’t become a museum of broken code.
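Here is a minimal sketch of that pruning rule, with an invented StepStore interface standing in for whatever vector database is actually used; only the shape of the logic matters.

```php
<?php
// Sketch of the pruning rule: a reused library step that fails validation is
// deleted from the vector store so it cannot mislead future generations.
// The StepStore interface is invented for illustration.
interface StepStore
{
    public function delete(string $stepId): void;
}

class StepReuseValidator
{
    public function __construct(private StepStore $store)
    {
    }

    public function recordResult(string $stepId, bool $passed): void
    {
        if (!$passed) {
            // Outdated or misleading example: prune it rather than let the
            // library turn into a museum of broken code.
            $this->store->delete($stepId);
        }
    }
}
```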
In this way, the system maintains a kind of living memory — always learning, always pruning.
Stable Progress
Stability isn’t about freezing a system in time. It’s about helping it adapt without losing its footing.
Between version detection, helper methods, robust prompting, and self-cleaning memory, the AI learns not just how to generate tests, but how to stay reliable as the world around it changes.