SEO Automation: What to Automate, and How

When I run into a repetitive task, my first reflex is to ask whether it deserves my attention and my time, and I instinctively weigh what I would gain by spending those resources on work that can’t be automated. That, it seems to me, is the heart of the matter: having the judgment to know what to automate and what not to.

Automating tasks, and even complex processes, has never been easier thanks to AI. The superpower is so great that it’s easy to slip into wanting to automate everything. Who, in this age of vibe coding, hasn’t built a “Ferrari” to handle tasks a “bicycle” would do perfectly well?

When you set out to automate, three things count: what to automate, how far, and with what safety net. They decide whether an automation frees you up, wastes your time, or even hands you a brand-new problem to maintain.

In the lines that follow I’ll share my criteria and the step-by-step process I follow to build automations: from scripts to agents, and from agents to loops that improve themselves.

Automate to stop doing, not to show off

There’s a name for the work worth automating. In reliability engineering, Google calls it toil: work that is manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as what you manage grows. It’s not “bad work”; it’s necessary work that doesn’t make you better by repeating it. Google’s SRE teams set themselves an explicit ceiling: keep toil under 50% of their time, so the other half stays reserved for the work that reduces future toil.

Translated to SEO: pushing two hundred ALT tags by hand is toil. Exporting Search Console and pasting it into a sheet every Monday is toil. Combing through five thousand URLs hunting for broken canonicals is toil. Deciding which cluster you bet the quarter on is not: that’s judgment, and judgment doesn’t get delegated to a deterministic script.

That’s why I don’t automate blindly. I automate for a concrete, measurable reason: to gain time and speed, to move more business, or—the one I care about most—to take a load off my mind so I can think at a higher level. If an automation gives me none of that, it’s surplus.

And from my early years in the field I keep a warning from Bill Gates in The Road Ahead (1995) etched in my memory: automation applied to an efficient operation magnifies the efficiency; applied to an inefficient operation, it magnifies the inefficiency. Automating a broken process only brings you the disaster faster. Judgment first; the machine second.

What to automate (and what not to)

My heuristic comes straight from the definition of toil. I automate when three things hold at once:

It repeats. If I do it once a year it doesn’t pay off; if I do it every day, it does.
It has clear rules. I can describe the “how” without it hinging on my case-by-case intuition.
The result is verifiable. I can objectively check whether it came out right.
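
The three filters can be written down as a small check. This is only a sketch: the `Task` fields, the `build_minutes` parameter, and the idea of reading “it repeats” as “a year of manual time exceeds the build time” are assumptions made for illustration, not a formula from the text.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runs_per_year: int        # how often the task repeats
    minutes_per_run: float    # manual time spent on each run
    has_clear_rules: bool     # can the "how" be written down?
    is_verifiable: bool       # can the result be checked objectively?


def worth_automating(task: Task, build_minutes: float) -> bool:
    """Return True only when all three filters hold at once.

    Assumption: "it repeats" is approximated as one year of manual
    time being larger than the time needed to build the automation.
    """
    yearly_manual_minutes = task.runs_per_year * task.minutes_per_run
    repeats_enough = yearly_manual_minutes > build_minutes
    return repeats_enough and task.has_clear_rules and task.is_verifiable


# Hypothetical tasks and timings, purely for illustration.
weekly_export = Task("GSC export to a sheet", 52, 20, True, True)
cluster_bet = Task("Pick the cluster for the quarter", 4, 120, False, False)

print(worth_automating(weekly_export, build_minutes=180))  # True: 1040 > 180
print(worth_automating(cluster_bet, build_minutes=180))    # False: no clear rules
```

The second task fails even though it costs plenty of time: without clear rules and a verifiable result, the hours saved are not the deciding factor.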

With those three filters, the SEO work that responds best is predictable: technical audits, keyword research and clustering, quality control of content and schema, internal linking, and data extraction via API. I apply it systematically with agents.

What I don’t automate matters just as much: the strategic decision (where to compete or what to prioritize), the narrative and angle of a piece, relationships, and any judgment that needs business context. An AI can have an opinion, sure, but there the value lies in my judgment, and delegating it means losing exactly what sets me apart. That boundary, moreover, is what connects this work to getting cited by AI search engines: there’s little point in automating production if you lose the judgment that makes you a reliable source.

Ahrefs summed it up not long ago: the bottleneck is no longer “can the software do this?” (today the answer is almost always yes), but “what repetitive work are you still doing by hand and can you turn it into an automated workflow instead?” That’s exactly the question. Mine goes one rung further: once you decide to automate something, how you build it matters as much as the what.

The automation ladder

Not all automation is equal. There’s a ladder, and each rung demands more engineering than the one below.

Rung 1. Scripts: the deterministic stuff

When a task has fixed rules and zero ambiguity, a script is more than enough. It’s the most reliable automation there is, because it does exactly the same thing every time. A Python script that pulls from the Search Console API can page through the results and get you far more than the thousand rows the interface lets you export. It doesn’t need “intelligence”: it needs to be well written. Most of what I automate lives here: pushing metadata in bulk, purging tracking parameters, checking for broken links, downloading data from GA4/GSC/BigQuery, analyzing the losers and winners of a Core Update… Deterministic and boring, which is exactly what you want.
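
To make “deterministic and boring” concrete, here is a minimal sketch of one of those tasks, purging tracking parameters from URLs, using only the standard library. The parameter list (`utm_*`, `gclid`, `fbclid`, `msclkid`) is an assumption; adjust it to whatever your own analytics setup appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend or trim for your own setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid", "msclkid"}


def strip_tracking(url: str) -> str:
    """Return the URL without tracking parameters, keeping the others in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_KEYS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


urls = [
    "https://example.com/blog/?utm_source=newsletter&utm_medium=email",
    "https://example.com/shop?color=red&gclid=abc123&page=2",
    "https://example.com/about",
]
for url in urls:
    print(strip_tracking(url))
# https://example.com/blog/
# https://example.com/shop?color=red&page=2
# https://example.com/about
```

The same input always produces the same output, which is what makes a script like this safe to run over thousands of URLs unattended.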

The beginner’s mistake is wanting an AI agent for something a for loop solves. If the rules are clear, the model only adds cost, latency, and a new way to get it wrong. On top of that, these well-built scripts become powerful tools for the agents on the next rung.
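
As an illustration of “something a for loop solves”, the sketch below checks canonical tags across a set of pages with nothing but the standard library. The `pages` dictionary stands in for a hypothetical crawl output, and the rule that every page should carry exactly one self-referencing canonical is an assumption; sites that canonicalize on purpose to another URL would need a different rule.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonicals.append(attributes.get("href") or "")


def canonical_issue(url: str, html: str) -> Optional[str]:
    """Return a short description of the problem, or None if the tag looks fine."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "missing canonical"
    if len(finder.canonicals) > 1:
        return "multiple canonicals"
    if finder.canonicals[0] != url:
        return f"canonical points to {finder.canonicals[0] or '(empty)'}"
    return None


# Hypothetical crawl output: URL -> raw HTML.
pages = {
    "https://example.com/a": '<head><link rel="canonical" href="https://example.com/a"></head>',
    "https://example.com/b": '<head><link rel="canonical" href="https://example.com/a"></head>',
    "https://example.com/c": "<head><title>No canonical</title></head>",
}

for url, html in pages.items():
    issue = canonical_issue(url, html)
    if issue:
        print(f"{url}: {issue}")
# https://example.com/b: canonical points to https://example.com/a
# https://example.com/c: missing canonical
```

A function this small is also easy to hand to an agent later as a tool, since its inputs and outputs are unambiguous.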

Rung 2. Agents: what can be taught (with a safety net)

The jump to agents happens when the task no longer has closed rules but can still be taught: drafting a first version, classifying search intents, auditing a page against a set of criteria, or benchmarking ourselves against the competition in the SERPs. An agent handles the ambiguity a script won’t tolerate. The problem is it also gets things wrong in ways a script can’t, and sometimes it does so with great confidence.

That’s why a loose agent isn’t automation: it’s a gamble. Automation starts when you surround it with three things:

Judges. Another model evaluates the first one’s output against a rubric. The technique has a name, LLM-as-a-judge, and one nuance worth knowing: judges carry their own biases (position, verbosity, self-enhancement), so you design them carefully rather than improvising them.
Gates. Deterministic checks that block anything failing to clear the bar before it reaches production. Applied to a model’s output, they’re the equivalent of CI/CD quality gates over code.
Feedback loops. The agent generates, the judge evaluates and critiques, and that critique feeds back in for a second attempt. Anthropic calls this pattern evaluator-optimizer: one LLM call generates the response while another provides evaluation and feedback, in a loop.
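
The three pieces fit together roughly as in the sketch below. It is not Anthropic’s implementation, nor a description of any particular production setup: `generate` and `judge` are placeholders for model calls, replaced here by fake functions so the loop can run without an API, and the gate rule (a title of 1 to 60 characters, not all caps) is an assumed example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    score: float     # 0.0 to 1.0 against the rubric
    critique: str    # what to fix on the next attempt


def passes_gate(title: str) -> bool:
    """Deterministic gate: no model involved, just hard rules (assumed here)."""
    return 0 < len(title) <= 60 and not title.isupper()


def evaluator_optimizer(
    generate: Callable[[str, str], str],   # (task, critique) -> draft
    judge: Callable[[str], Verdict],       # draft -> verdict
    task: str,
    min_score: float = 0.8,
    max_attempts: int = 3,
) -> Optional[str]:
    critique = ""
    for _ in range(max_attempts):
        draft = generate(task, critique)
        if not passes_gate(draft):
            critique = "Title must be 1-60 characters and not all caps."
            continue
        verdict = judge(draft)
        if verdict.score >= min_score:
            return draft
        critique = verdict.critique        # feed the critique back in
    return None  # nothing cleared the bar: escalate to a human


# Fake model calls so the sketch runs offline.
def fake_generate(task: str, critique: str) -> str:
    if not critique:
        return "THE ULTIMATE GUIDE TO SEO AUTOMATION FOR ABSOLUTELY EVERYONE IN 2025"
    if "keyword" in critique:
        return "SEO automation: what to automate and how"
    return "A guide to automating things"


def fake_judge(draft: str) -> Verdict:
    if "SEO automation" in draft:
        return Verdict(0.9, "")
    return Verdict(0.4, "Include the target keyword 'SEO automation'.")


result = evaluator_optimizer(fake_generate, fake_judge, "Write a title tag")
print(result)  # SEO automation: what to automate and how
```

The first draft is stopped by the gate, the second by the judge, and the third clears both. Capping the attempts matters: when the loop runs out, returning `None` hands the case to a person instead of shipping something that never passed.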

There’s an idea from Anthropic here that should be framed on the wall of every SEO team using agents: an agent’s result is the final state, not what the agent says. Example: a booking agent can finish by saying “your flight is booked,” but the real result is whether the reservation exists in the database. In SEO it’s identical: an agent claiming “meta updated” is worth nothing; what’s worth something is the meta actually having changed in the database. That’s why my gates check the state, not the story.
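
A gate that checks the state instead of the story can be very small. The sketch below uses an in-memory SQLite database as a stand-in for the CMS; the `posts` table, its columns, and the `meta_gate` function are hypothetical names chosen for the example.

```python
import sqlite3


def meta_gate(conn: sqlite3.Connection, post_id: int, expected: str) -> bool:
    """Pass only if the stored meta description equals the expected one."""
    row = conn.execute(
        "SELECT meta_description FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row is not None and row[0] == expected


# In-memory stand-in for the CMS database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, meta_description TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'old description')")

agent_report = "meta updated"          # the story: what the agent claims
new_meta = "SEO automation explained"  # the state we expect to find

print(meta_gate(conn, 1, new_meta))    # False: the claim is not backed by the row

conn.execute("UPDATE posts SET meta_description = ? WHERE id = 1", (new_meta,))
print(meta_gate(conn, 1, new_meta))    # True: the state actually changed
```

Note that `agent_report` is never read by the gate. That is the point: the check depends only on what is stored.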

Rung 3. Self-improving loops: the system that writes the agents

The top rung isn’t “more agents.” It’s an agent that learns from its own mistakes without me rewriting the prompt every time. Research has been at this for years: Reflexion has the agent verbally reflect on feedback and store those lessons in an episodic memory for next time; Self-Refine uses one and the same model as generator, critic, and refiner, iterating on its own output. Nothing magical: it’s the previous rung’s feedback loop, closed in on itself.
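
To show what “store those lessons for next time” can look like in code, here is a sketch loosely inspired by the Reflexion idea. It is not the paper’s implementation: `LessonMemory`, `run_with_reflection`, and the fake `act`, `check`, and `reflect` functions are all hypothetical, with the fakes standing in for model calls so the example runs offline.

```python
from typing import Callable, Optional


class LessonMemory:
    """Episodic memory: short verbal lessons kept between attempts and runs."""

    def __init__(self, max_lessons: int = 5) -> None:
        self.max_lessons = max_lessons
        self.lessons: list[str] = []

    def add(self, lesson: str) -> None:
        if lesson and lesson not in self.lessons:
            self.lessons.append(lesson)
            self.lessons = self.lessons[-self.max_lessons:]

    def as_prompt(self) -> str:
        return "\n".join(f"- {lesson}" for lesson in self.lessons)


def run_with_reflection(
    act: Callable[[str, str], str],       # (task, lessons) -> output
    check: Callable[[str], str],          # output -> "" if OK, else feedback
    reflect: Callable[[str, str], str],   # (output, feedback) -> lesson
    task: str,
    memory: LessonMemory,
    max_attempts: int = 3,
) -> Optional[str]:
    for _ in range(max_attempts):
        output = act(task, memory.as_prompt())
        feedback = check(output)
        if not feedback:
            return output
        memory.add(reflect(output, feedback))  # keep the lesson for next time
    return None


# Fake model calls so the sketch runs offline.
def fake_act(task: str, lessons: str) -> str:
    page = task.split("/")[-1].capitalize()
    return f"<title>{page} | Acme</title>" if "brand" in lessons else f"<title>{page}</title>"


def fake_check(output: str) -> str:
    return "" if "Acme" in output else "Title is missing the brand name."


def fake_reflect(output: str, feedback: str) -> str:
    return "Always append the brand to titles."


memory = LessonMemory()
first = run_with_reflection(fake_act, fake_check, fake_reflect, "title for /shoes", memory)
second = run_with_reflection(
    fake_act, fake_check, fake_reflect, "title for /boots", memory, max_attempts=1
)
print(first)   # <title>Shoes | Acme</title>
print(second)  # <title>Boots | Acme</title>
```

The first task needs two attempts; the second succeeds on its single allowed attempt because the lesson learned on the first task is already in memory. Nobody rewrote the prompt in between.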

The summit, as I see it, is a loop that iterates until it solves the whole task, clears the specs and the QA you set it, and—above all—writes the runbooks for the next agents. It’s the direction of spec-driven development, where intent becomes the source of truth instead of the code. And it’s, almost to the letter, what Anthropic is after with its Agent Skills: reusable instructions organized like an onboarding guide you’d create for a new team member, with the stated goal that agents themselves create, edit, and evaluate those skills on their own.

That’s the ceiling: not writing the script, nor even the agent, but the system that writes and improves the agents.

And a rule that saves me grief: the simplest solution that works wins. You don’t climb the ladder for sport; you climb a rung when the one below falls short, and not one more. (Anthropic gives the same advice with its agents: find the simplest solution and only increase complexity when it demonstrably improves results.)

What I build it with

I don’t work with a single stack: I use whatever fits each problem. Day to day I lean on AI coding assistants. I use Codex more for putting scripts together and debugging code, and Claude more for building the agentic processes and feedback loops—though I’ll admit that when I run out of Anthropic’s expensive tokens, I’m back in Codex.

When I pile up a lot of scripts to manage a single entity, such as a WordPress blog or a headless CMS like Strapi, I package them into an MCP server. That turns a heap of loose tools into something an agent uses reliably, without me explaining every time what each function is called or what it expects to receive.

On the agent rung, I build the judge and the gate myself. The judge is another model call with its own rubric; the gate is a Python script without a drop of AI that checks the real state before calling anything good. I’ve got them running in my own workflow: one scores what comes out, the other blocks if the change isn’t truly in the database.

To kick off something new, I start with Claude and the superpowers:brainstorming skill. It forces me to describe the problem well and to leave the specs unambiguous before touching code. And at the very top lives a loop that iterates until it meets those specs and the QA I set it, and that keeps writing the runbooks for the next agents. Those runbooks, in my case, are skills: I write them once, I reuse them, and—more and more—it’s the agents themselves who refine them.

Frequently asked questions

What can you automate in SEO with AI?

Anything that repeats with clear rules and that you can check afterward: reviewing a site’s technical health, clustering keywords, generating drafts and briefs, monitoring quality, internal linking, or pulling data via API. What decides the strategy stays with you.

What should you NOT automate?

Judgment calls: where to compete, what to prioritize, the angle of each piece, dealing with people. If a task depends on your judgment, automating it strips away exactly what people choose you for.

Do I need to know how to code to automate my SEO?

It helps, but it’s not essential. With assistants like Claude Code, Codex, or Cursor you can write Python scripts leaning on the AI even if you start with little background. What you can’t outsource is the judgment of what to automate.

No-code or code?

No-code is fine to get started and for simple or one-off flows. The trouble comes when something breaks and you can’t open it up: you’re left waiting on the vendor. As soon as a flow is critical or repeats a lot, I prefer my own code.

What are “judges,” “gates,” and “feedback loops” when automating with agents?

A judge is a model that evaluates another’s output against a rubric; a gate is a check that blocks anything failing to clear the bar; a feedback loop reinjects the critique for a new attempt. They’re the safety net. Without them, turning an agent loose in production isn’t automating: it’s crossing your fingers.

What is MCP and why does it matter for SEO?

MCP (Model Context Protocol) is an open protocol that lets an AI assistant use your tools and data. For an SEO, it means connecting the AI directly to your audits, APIs, and pipelines.
