claude code PreToolUse hook block dangerous bash command

Last month I watched an agent loop, on someone else's machine, get within two characters of typing rm -rf / before a tired engineer pulled the plug. The command was a hallucinated cleanup step. The plug was a stress reflex. Neither belongs in a system you want to leave running overnight.

This lesson is the version of that night I wish I'd had ready in advance. I'll wire a PreToolUse hook that gives you per-call veto power over any tool invocation, plus a feedback channel that tells the model exactly why the call was denied so it can retry with a safer command instead of crashing into a stack trace. Five commits, one Python file, ten tests, zero third-party dependencies, and a deny path that the agent can actually act on in the next turn.

A reasonable agent should not be one stray prompt away from running rm -rf / on your dev machine. A reasonable agent loop should also not fail opaquely when a call like that is refused. Claude Code's PreToolUse hook addresses both.

Each of the five commits is a checkpoint you can clone, run, and verify in isolation. By the last one the hook denies five common destructive Bash patterns, allows narrow scoped exceptions, and proves the contract holds with a stdlib unittest harness that runs in under a second.

The reference for the payload format and the available decision verbs is the official Claude Code hooks documentation at https://docs.claude.com/en/docs/claude-code/hooks. Everything below assumes Python 3.11 or newer; the Rule | None union syntax I lean on needs at least Python 3.10.

Lesson 1: wire the PreToolUse hook scaffold

How do you know a PreToolUse hook is actually firing before you start trusting it with deny verdicts? That sounds trivial. It is the single most common place this kind of integration silently fails.

A PreToolUse hook is a regular program. Claude Code spawns it before any tool call, hands it a JSON payload on stdin, reads JSON back on stdout, and uses the returned decision to allow, deny, or ask the operator. The matcher field in settings.json filters which tool the hook fires for. I only want Bash:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 hooks/bash_guardrail.py" }
        ]
      }
    ]
  }
}

The first commit keeps the hook intentionally boring: read stdin, parse JSON, log the candidate command to /tmp/bash_guardrail.log, return permissionDecision: "allow". That sounds pointless but it is not. Running this commit against a live Claude Code session immediately tells you two things. First, the hook is firing (the log file grows). Second, the JSON contract on stdout is correct (Claude Code does not warn about a malformed response). Most hook bugs I've seen come from stdin/stdout plumbing, not from the rule logic. Settle the plumbing first, then attack the policy.

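import json
import sys

LOG_PATH = "/tmp/bash_guardrail.log"

def log(message: str) -> None:
    # Stand-in: the excerpt calls log() without showing its body.
    with open(LOG_PATH, "a", encoding="utf-8") as fh:
        fh.write(message + "\n")

def main() -> int:
    raw = sys.stdin.read()
    payload = json.loads(raw)
    command = payload.get("tool_input", {}).get("command", "")
    log(f"tool={payload.get('tool_name')} command={command!r}")
    sys.stdout.write(json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "allow",
            "permissionDecisionReason": "guardrail: no rules registered yet",
        }
    }))
    return 0

if __name__ == "__main__":
    sys.exit(main())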

Try it at commit 0feb005. Tail /tmp/bash_guardrail.log while the agent does anything trivial. If lines appear, the plumbing is correct and lesson 2 is safe to start.

Lesson 2: a pattern catalog, detect-only

Why would anyone draft a denylist, get it working, and then refuse to enforce it for an afternoon? Because I'd rather watch what the hook flags before it is allowed to block anything. The cost is half a day of wall time. The reward is a denylist that's grounded in commands the agent actually proposes, not a denylist I dreamed up while writing the regex.

The second commit introduces a frozen Rule dataclass with three fields: a name, a compiled regex, and a human-readable reason. Five rules ship in the initial catalog:

@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern[str]
    reason: str

BLOCK_RULES: tuple[Rule, ...] = (
    Rule("rm-rf-root-or-home",
         re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-rf|-fr)\b.*(/|~|\$HOME)\b"),
         "rm -rf against root, home, or absolute paths is irreversible; restrict to a scoped temp dir"),
    Rule("git-force-push",
         re.compile(r"\bgit\s+push\b.*(--force|-f\b|\+)"),
         "force-push rewrites remote history; reach for --force-with-lease at minimum"),
    Rule("sql-drop-or-truncate",
         re.compile(r"\b(DROP\s+(TABLE|DATABASE|SCHEMA)|TRUNCATE\s+TABLE)\b", re.IGNORECASE),
         "DROP/TRUNCATE in a one-shot Bash command bypasses migration review"),
    Rule("dd-of-device",
         re.compile(r"\bdd\b.*\bof=/dev/(sd|nvme|disk)"),
         "dd to a block device wipes the disk; use a named image file instead"),
    Rule("chmod-recursive-root",
         re.compile(r"\bchmod\s+-R\s+[0-7]{3,4}\s+/(?!tmp/|var/tmp/)"),
         "recursive chmod outside /tmp wrecks system permissions"),
)

Why a dataclass per rule instead of one mega-regex? Three reasons. Naming first: when the deny path eventually fires in lesson 3, the agent's retry context carries the rule name, which is far more actionable than "regex on line 47 matched". Extensibility second: each rule can later grow a per-rule allowlist (lesson 4 needs this slot). Testability third: a tuple of rules is trivial to iterate against a corpus of known-bad commands.

The hook still returns allow here. The visible diff is a new structured log line such as severity=block rule=rm-rf-root-or-home command='rm -rf /tmp/build-cache-prev'. Run agent sessions for the afternoon, tail the log, and you have a high-confidence list of what to actually deny in lesson 3. The dry-run discipline costs that half day of wall time and saves the embarrassment of denying a command the agent legitimately needed. Inspect the rule catalog at commit 919b644.
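
The detect-only pass can be sketched as a loop over the catalog that produces one structured line per match. This version is self-contained, so it carries a reduced three-rule catalog with patterns copied from the list above. The names scan and format_hit are hypothetical, and the real hook's log format may differ in detail.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern[str]
    reason: str


# Reduced catalog for illustration; patterns copied from the article's list.
RULES: tuple[Rule, ...] = (
    Rule("git-force-push",
         re.compile(r"\bgit\s+push\b.*(--force|-f\b|\+)"),
         "force-push rewrites remote history"),
    Rule("sql-drop-or-truncate",
         re.compile(r"\b(DROP\s+(TABLE|DATABASE|SCHEMA)|TRUNCATE\s+TABLE)\b", re.IGNORECASE),
         "DROP/TRUNCATE bypasses migration review"),
    Rule("dd-of-device",
         re.compile(r"\bdd\b.*\bof=/dev/(sd|nvme|disk)"),
         "dd to a block device wipes the disk"),
)


def scan(command: str) -> list[str]:
    """Return the names of every rule whose pattern matches the command."""
    return [rule.name for rule in RULES if rule.pattern.search(command)]


def format_hit(rule_name: str, command: str) -> str:
    """Build the structured log line; the verdict itself stays 'allow' in this mode."""
    return f"severity=block rule={rule_name} command={command!r}"
```

Feeding a saved list of commands through scan is also a cheap way to replay an afternoon's log against a revised catalog before enforcing it.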

Lesson 3: flip the switch, return deny with a reason

The smallest diff in this series is also the one that pays for everything that came before it. The decide function now returns a deny verdict whenever any rule matches, with the matched rule's reason baked into permissionDecisionReason:

def decide(command: str) -> dict[str, str]:
    matched = first_match(command)
    if matched is None:
        return {"permissionDecision": "allow",
                "permissionDecisionReason": "guardrail: no rule matched",
                "_rule": "none"}
    return {"permissionDecision": "deny",
            "permissionDecisionReason": f"[guardrail:{matched.name}] {matched.reason}",
            "_rule": matched.name}

This is where naming pays off. The agent does not see "denied" and start fishing for workarounds. It sees a string of the shape [guardrail:rm-rf-root-or-home] rm -rf against root, home, or absolute paths is irreversible; restrict to a scoped temp dir. That string is more useful than a Python traceback. It encodes the rule (so the agent can recognize the same class of denial in subsequent turns) and the suggested remediation (scoped temp dir). In practice the next turn reliably proposes rm -rf /tmp/build-xyz-123 instead of the original rm -rf ~/build, which is the cheapest possible recovery loop once lesson 4's allowlist lets the scoped path through.

Compare this design with the alternative of returning a generic one-bit deny. A one-bit deny costs the agent at least one extra round-trip to investigate why, and often more if the agent guesses the wrong reason. A structured reason resolves the simple cases in zero extra LLM calls and gives the agent a concrete rule key to reference for the rest of the session. The math is roughly one LLM call saved per blocked command, more when a wrong guess would have cost two or three. On a long autonomy run that can plausibly approach a 3x cost reduction on the failure path, though I'd treat the figure as an estimate rather than a measurement. Lesson 3 ships at commit 1da231c.

Lesson 4: allowlist exceptions for scoped temp dirs

The first version of this hook caught a CI step running rm -rf /tmp/build-cache-prev and denied it. That command is fine: the path is scoped, the directory belongs to the current user, and the operation is the entire point of the CI step. Blanket-banning rm -rf against absolute paths trains the agent to either give up or get creative, neither of which is what you want.

Lesson 4 adds a safe_if_matches tuple to every Rule. The matching loop now runs in two phases. If the primary pattern fires, walk the safe-list. If any safe pattern matches, downgrade the verdict to allow with a reason that names the override so audit logs stay readable:

@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern[str]
    reason: str
    safe_if_matches: tuple[re.Pattern[str], ...] = field(default_factory=tuple)

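# rm-rf rule now ships with three allowlist patterns. Each is anchored at both
# ends so a second argument ("rm -rf /tmp/foo /") cannot ride along, and the
# lookahead rejects ".." so "rm -rf /tmp/../etc" is not treated as scoped.
safe_if_matches=(
    re.compile(r"^\s*rm\s+-[rRfF]+\s+/tmp/(?!\.{1,2}(?:/|\s|$))[\w.\-]+/?\s*$"),
    re.compile(r"^\s*rm\s+-[rRfF]+\s+/var/tmp/(?!\.{1,2}(?:/|\s|$))[\w.\-]+/?\s*$"),
    re.compile(r"^\s*rm\s+-[rRfF]+\s+\$TMPDIR/(?!\.{1,2}(?:/|\s|$))[\w.\-]+/?\s*$"),
)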

The allowlist patterns are deliberately narrow:

rm -rf /tmp/... and rm -rf /var/tmp/... for scoped scratch directories
git push --force-with-lease (without trailing flags that would override the lease semantics)
An explicit --guardrail-allow=migration opt-in flag for DROP/TRUNCATE inside managed migrations

The opt-in flag for SQL is interesting because it costs the agent something. The agent has to assemble a flag the human operator (or a migration template) previously added to the trust list. The flag means nothing to psql itself; the hook treats it as a declaration of intent. It is a plain marker string rather than a cryptographic signature, so it raises the bar instead of guaranteeing anything, but it is a cheap layer of defence-in-depth against a model that, mid-loop, "decides" to issue DROP on its own. Lesson 4 lives at commit 623a500.
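
The two-phase loop described in this lesson can be sketched as follows. This is an illustration, not the repository's code. It uses the corrected primary pattern from lesson 5, a single safe pattern for /tmp, and a hypothetical evaluate function. The safe pattern is anchored at both ends, which is an assumption of this sketch, so that extra arguments or a .. segment cannot inherit the exception.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern[str]
    reason: str
    safe_if_matches: tuple[re.Pattern[str], ...] = ()


RM_RULE = Rule(
    name="rm-rf-root-or-home",
    # Primary pattern: the corrected form from lesson 5.
    pattern=re.compile(r"\brm\s+-[rRfF]+\s+(/|~|\$HOME)"),
    reason="rm -rf against root, home, or absolute paths is irreversible; "
           "restrict to a scoped temp dir",
    # Assumption: anchored at both ends so a second path cannot ride along.
    safe_if_matches=(
        re.compile(r"^\s*rm\s+-[rRfF]+\s+/tmp/(?!\.{1,2}(?:/|\s|$))[\w.\-]+/?\s*$"),
    ),
)


def evaluate(command: str, rules: tuple[Rule, ...] = (RM_RULE,)) -> dict[str, str]:
    """Phase 1: primary pattern. Phase 2: the safe-list may downgrade deny to allow."""
    for rule in rules:
        if not rule.pattern.search(command):
            continue
        if any(safe.search(command) for safe in rule.safe_if_matches):
            return {"permissionDecision": "allow",
                    "permissionDecisionReason": f"[guardrail:{rule.name}] allowlisted scoped path"}
        return {"permissionDecision": "deny",
                "permissionDecisionReason": f"[guardrail:{rule.name}] {rule.reason}"}
    return {"permissionDecision": "allow",
            "permissionDecisionReason": "guardrail: no rule matched"}
```

The override reason keeps the rule name, so an audit log can distinguish "nothing matched" from "matched, then excused".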

Lesson 5: a stdlib test harness catches a real regex bug

Tests for a hook should drive the hook the same way Claude Code does. The harness in tests/test_guardrail.py uses subprocess.run end-to-end:

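import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any

# Assumed layout: tests/test_guardrail.py sits beside hooks/bash_guardrail.py.
HOOK_PATH = Path(__file__).resolve().parents[1] / "hooks" / "bash_guardrail.py"
tmp_log = os.path.join(tempfile.gettempdir(), "bash_guardrail_test.log")

def invoke(command: str) -> dict[str, Any]:
    payload = {"tool_name": "Bash", "tool_input": {"command": command}}
    proc = subprocess.run(
        [sys.executable, str(HOOK_PATH)],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        env={**os.environ, "GUARDRAIL_LOG": tmp_log},
        check=True,
        timeout=5,
    )
    return json.loads(proc.stdout)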

Ten assertions cover the happy paths (ls -la is allowed), the rule paths (rm -rf / denied, dd if=/dev/zero of=/dev/sda denied), and the allowlist paths (rm -rf /tmp/foo allowed, git push --force-with-lease feature/foo allowed). The harness uses stdlib unittest so python3 -m unittest tests.test_guardrail -v runs it with zero pip install. The Python regex documentation at https://docs.python.org/3/library/re.html is worth a re-read before you hand-tune any rule.
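
The harness needs the real hook on disk, so here is a self-contained variation on the same idea for readers who want to try the pattern first. It writes a one-rule stand-in hook to a temporary directory and drives it through a child process. The stub, run_hook, and HookContractTest are all hypothetical and are not taken from the repository.

```python
import json
import subprocess
import sys
import tempfile
import textwrap
import unittest
from pathlib import Path

# Stand-in hook, NOT the article's bash_guardrail.py: one rule, just enough
# to exercise the stdin/stdout contract through a real child process.
STUB_HOOK = textwrap.dedent(r'''
    import json, re, sys
    command = json.load(sys.stdin).get("tool_input", {}).get("command", "")
    denied = re.search(r"\bdd\b.*\bof=/dev/(sd|nvme|disk)", command) is not None
    json.dump({"hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "deny" if denied else "allow",
        "permissionDecisionReason": "stub rule",
    }}, sys.stdout)
''')


def run_hook(hook_path: Path, command: str) -> str:
    """Feed one Bash payload to the hook process and return its decision verb."""
    payload = {"tool_name": "Bash", "tool_input": {"command": command}}
    proc = subprocess.run(
        [sys.executable, str(hook_path)],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return json.loads(proc.stdout)["hookSpecificOutput"]["permissionDecision"]


class HookContractTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        # Write the stub once; every test spawns it as a fresh process.
        cls._tmp = tempfile.TemporaryDirectory()
        cls.hook = Path(cls._tmp.name) / "stub_hook.py"
        cls.hook.write_text(STUB_HOOK, encoding="utf-8")

    @classmethod
    def tearDownClass(cls) -> None:
        cls._tmp.cleanup()

    def test_plain_listing_is_allowed(self) -> None:
        self.assertEqual(run_hook(self.hook, "ls -la"), "allow")

    def test_dd_to_block_device_is_denied(self) -> None:
        self.assertEqual(run_hook(self.hook, "dd if=/dev/zero of=/dev/sda"), "deny")
```

Saved as a module, this runs under python3 -m unittest like the real harness. Because check=True is set, a hook that exits non-zero fails the test with CalledProcessError instead of yielding a misleading JSON parse error.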

My first run failed on two assertions. Both rm -rf / and rm -rf ~/.config came back as allow. The culprit was the original regex \brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-rf|-fr)\b.*(/|~|\$HOME)\b. That regex ends with \b directly after a group whose alternatives / and ~ are non-word characters. A word boundary requires a word character on exactly one side of the position. In rm -rf / the slash is followed by end-of-string, and in rm -rf ~/.config the ~ is followed by / and the / by a dot, so every candidate position has non-word characters (or nothing) on both sides and produces no boundary. The regex could only match when the path character was followed directly by a letter, digit, or underscore, as in /tmp/foo, so it never matched the two most dangerous inputs. The fix is a single tighter line:

re.compile(r"\brm\s+-[rRfF]+\s+(/|~|\$HOME)")

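The new pattern drops the trailing boundary and requires the target to begin with /, ~, or $HOME immediately after the flags. It only recognizes flag clusters made of r and f, so variants such as rm -rfv / or rm -r -f / still slip past and deserve their own test cases. After the fix all ten assertions pass in roughly 430 milliseconds. The lesson here is the obvious one: regex hand-tuning is exactly the territory where you want red-green tests before you ship the rule, not after. The fix-and-harness commit is 776f25c.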
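
The word-boundary behavior is easy to check directly. The sketch below compiles the two patterns quoted in this lesson and compares them on a handful of commands. The names OLD, NEW, CASES, matches, and compare are illustrative only.

```python
import re

# The original pattern from lesson 2 and the corrected one from lesson 5.
OLD = re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-rf|-fr)\b.*(/|~|\$HOME)\b")
NEW = re.compile(r"\brm\s+-[rRfF]+\s+(/|~|\$HOME)")

CASES = ["rm -rf /", "rm -rf ~/.config", "rm -rf /tmp/foo", "rm -rf ./build", "ls -la"]


def matches(pattern: re.Pattern[str], command: str) -> bool:
    """True when the pattern is found anywhere in the command."""
    return pattern.search(command) is not None


def compare() -> dict[str, tuple[bool, bool]]:
    """Map each command to (old_matches, new_matches)."""
    return {cmd: (matches(OLD, cmd), matches(NEW, cmd)) for cmd in CASES}
```

Running compare() shows the old pattern missing rm -rf / and rm -rf ~/.config while still firing on rm -rf /tmp/foo and on the relative rm -rf ./build. The new pattern catches the first three and leaves ./build alone.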

Repository

Full source at https://github.com/vytharion/claude-code-pretooluse-hook-bash-guardrail. Clone it, point a fresh Claude Code session at the bundled .claude/settings.json, and ask the agent to do something terrible. The deny path should fire with a named reason. The agent should retry with a scoped alternative.

Lesson 1 → 0feb005 wire the PreToolUse hook scaffold (allow-only, stdin/stdout sanity check)
Lesson 2 → 919b644 Rule dataclass + five-entry pattern catalog, detect-only mode
Lesson 3 → 1da231c return permissionDecision deny with a structured reason
Lesson 4 → 623a500 per-rule allowlist (scoped /tmp, --force-with-lease, signed SQL opt-in)
Lesson 5 → 776f25c stdlib unittest harness catches a regex word-boundary bug

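The public hook protocol reference is at https://docs.claude.com/en/docs/claude-code/hooks. The Claude Code GitHub repository is at https://github.com/anthropics/claude-code; as far as I know it does not publish the dispatcher source, so treat the hooks documentation as the authoritative description of dispatch behavior.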

Conclusion

Five commits, one Python file, ten tests, zero third-party dependencies. The hook denies five common destructive Bash patterns with a structured reason the agent can act on, allows narrow scoped exceptions where blanket bans would just train the agent around the regex, and is covered end-to-end by a subprocess harness that runs in under a second.

What to extend next. Wire a PostToolUse hook that audits which denied-then-retried commands actually got past the guardrail; a one-percent retry-success rate is fine, a thirty-percent rate means a rule is too narrow. Add a per-rule cooldown so a model that keeps hitting the same denial yields to the operator after N attempts. Move the rule catalog out of the script and into a YAML file so multiple projects can share one base list while extending it locally. None of those is hard once the contract is in place.
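
As a starting point for the per-rule cooldown, here is a minimal sketch. Every name in it is hypothetical. Because the hook is spawned as a fresh process for each tool call, the counts have to live somewhere persistent; this version assumes a small JSON file, and how that file is scoped (per session, per project) is left open.

```python
import json
from pathlib import Path


def load_counts(path: Path) -> dict[str, int]:
    """Read per-rule denial counts, treating a missing or corrupt file as empty."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def save_counts(path: Path, counts: dict[str, int]) -> None:
    """Persist the counts so the next hook process can see them."""
    path.write_text(json.dumps(counts), encoding="utf-8")


def escalate(rule_name: str, counts: dict[str, int], limit: int = 3) -> str:
    """Record one more hit for rule_name and pick the decision verb.

    Returns "deny" for the first `limit` hits, then "ask" so the operator decides.
    """
    counts[rule_name] = counts.get(rule_name, 0) + 1
    return "deny" if counts[rule_name] <= limit else "ask"
```

Switching to ask rather than continuing to deny hands the repeated case to a human without removing the guardrail.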

The win here is not the regex catalog. It is that the agent now operates in a loop where "I want to delete this" returns a structured signal instead of a stack trace. Every hook you write should aim for the same property: refuse loudly, refuse with a rule name, refuse with a remediation hint the next turn can use.