Knowing When Your AI Agent Should Stop and Ask

The part nobody talks about isn't whether the agent can do the work. It's whether it knows when to leave you alone — and when not to.

Almost every business AI agent I've seen dies in the same place. The demo is great. Someone wires up a model, gives it a task, and it does the task. Everyone nods. Then it goes into the real world, and within a week it's quietly abandoned. Not because it got the task wrong. Because using it turned out to be a second job. You had to sit next to it, approve every step, re-read everything it touched, and babysit it through anything ambiguous. At that point you haven't automated the work. You've hired an intern who can't be left alone.

I've come to think the buyer's real objection is almost never "can it do the task." By the time anyone's evaluating an agent, they've watched it do the task in a sandbox. The objection underneath is loss of control. What happens when I'm not watching? What does it do at 2am when it hits something it wasn't sure about? Does it stop, or does it guess and keep going? That's the question that actually decides whether an agent ships, and it's a question about trust, not capability.

A useful agent is a good employee, not a good demo

Think about what you actually want from a good employee. You don't want someone who pings you before every keystroke — that person is exhausting and you'd do the work yourself. You also don't want someone who runs off and makes an irreversible call on something they should've flagged. What you want is the thing in between: they handle the recurring work on their own, and they come find you exactly when there's a real judgment call. The whole value is in the judgment about when to interrupt you.

That's the skill. Not doing the work — the model is already good enough at the work. The skill is knowing when to stop and ask.

Once I framed it that way, the design problem got a lot clearer. An agent that's deployable in a business isn't one that's more capable. It's one where the "when do I interrupt the human" decision is made deliberately, per task, instead of left to either extreme. So that's the thing I built into the runtime I run my own business on.

One real job: the morning lead radar

Let me make this concrete with a job I actually run every day, because the abstract version of this argument is cheap and the specific one is where the design decision lives.

I run a fleet of agents on a schedule through agent-afk, my open-source runtime. It's a standalone TypeScript CLI, daemon, and Telegram bot built on the Anthropic SDK, and the part that matters here is that it runs an agent outside any editor, as its own process, headless, on a cron schedule. No human sitting in the loop watching it work. It just runs.

One of those jobs is what I call my daily lead radar. It fires at 8:00am, every day, before I've had coffee. It goes and does the unglamorous warm-first prospecting legwork — the kind of survey-and-shortlist work that's real but that I will absolutely not do consistently by hand at 8am. And here's the design decision that turned out to be load-bearing: that job's notify policy is set to "always." Every single morning, when it finishes, it pushes its results to me on Telegram. I open my phone, I read what it found, I decide what to do. The agent did the legwork unattended; I do the judgment.

Now contrast it with the job that runs thirty minutes later. At 8:30am I have a daily outreach pulse — the follow-on work that acts on the state of things. That job's notify policy is "failure-only." It runs silently. I don't hear from it. No ping, no summary, no "here's what I did." The only time it ever reaches my phone is if something actually needs a human — if it fails, or hits a wall it can't get past on its own. On a normal day, the most reassuring signal I get from it is silence.

That contrast is the whole thesis, sitting in two lines of config. The radar surfaces results because the results are the point — they're decisions I want to make. The pulse stays quiet because, on a good day, there's no decision for me to make and a ping would just be noise I'd learn to ignore. The "when do I interrupt you" question isn't answered globally with a slider. It's answered per job, on purpose, by whoever set the job up. That's the part that makes the difference between an agent you trust and an agent you turn off.

It's not one job — it's a fleet, each escalating on its own terms

The radar and the pulse are just the pair I lead with, because if you do go-to-market or sales or ops for a living, those are the ones you feel. But the same notify-policy decision runs through the whole fleet:

Daily lead radar — 8:00am, notify always. Surface the prospects; I want to look.
Daily outreach pulse — 8:30am, notify on failure only. Silent unless it's stuck.
Daily content draft — 7:00am, notify always. It hands me a draft to react to.
Hourly PR review — every hour, notify on failure only. It reviews quietly and only pulls me in when something's actually wrong.
Nightly self-maintenance — 8:00pm, notify on failure only. Housekeeping I never need to see unless it breaks.

Five jobs, two policies, one decision repeated five times: is this a job whose output I want to review, or a job that should only ever interrupt me when it genuinely needs me? Each one escalates on its own terms, straight to my phone, and the unattended majority never makes a sound.

Two things make this honest rather than hand-wavy, and they're worth being precise about. First: the tool calls inside these jobs run unattended by design. This is not an approve-every-action setup — I'm not tapping "allow" on my phone all day. The agent does its work without me. It interrupts only for a genuine blocking question or a terminal state. Second: when it does need me, it's not a dead webhook. There's an elicitation router that takes the agent's real blocking question and surfaces it to Telegram — a real two-way channel where I answer from my phone and the agent picks up where it left off, not merely a notification that something failed.

Why this is the thing that makes agents deployable

The reason most agents stay stuck in demo purgatory is that interruption is all-or-nothing — you're either watching constantly or flying blind. Make "interrupt the human" an explicit, per-job decision instead of a reflex — and the whole thing flips. Now you can hand recurring work to an agent and genuinely walk away, because you've told it, job by job, exactly when "walk away" stops applying.

That's what makes an agent something you'd actually deploy in a business. Not raw capability. The discipline of knowing when to be quiet and when to come find you.

This is also exactly the gap I spend my time closing. GRAIsol is my solo agent-systems practice, and the whole job is getting teams from "we should be using agents" to "agents are running in our business, and we trust them enough to not watch." Sometimes that's a bespoke agent build; sometimes it's an Agent Adoption Sprint to get the first unattended jobs running and earning trust. Either way, the work is rarely about making the agent smarter. It's about getting the escalation contract right.

If you want this kind of unattended-but-trustworthy automation running in your own business — work that gets handled while you sleep and only reaches you when it actually matters — let's talk.

agent-afk is open source (Apache-2.0) at github.com/griffinwork40/agent-afk. It's model-agnostic and self-hostable — bring your own key, run Claude, GPT, or local models. More at agentafk.com.

Knowing When Your AI Agent Should Stop and Ask