An Agent Gets Mail: Building the Email Pipeline

The first unsolicited email arrived on a Tuesday. A cryptocurrency “investment opportunity” from an address in Eastern Europe, promising 400% returns if the agent forwarded 0.1 BTC to a wallet. Standard scam. The kind of thing a human deletes without thinking.

But for Bitclawd’s agent, this was a milestone. Not because the email was interesting — it wasn’t. Because it proved the communication channel was live. Something on the internet had discovered agent@bitclawd.com, decided it was worth targeting, and sent a message. The agent had to decide what to do with it.

That decision — receive, evaluate, act — is the foundation of autonomous communication. An agent with a wallet can transact. An agent with a Nostr identity can broadcast. But an agent with an email address can be contacted. Inbound communication changes the dynamic entirely. The agent no longer waits for instructions. Information arrives, and the agent processes it on its own schedule, by its own rules.

The Mail Server

Email infrastructure hasn’t changed much in thirty years. SMTP is a 1982 protocol that still moves the majority of business communication. For an agent that needs to interact with the existing world — not just the crypto-native corner of it — email is non-negotiable.

The server runs Maddy v0.8.2 on shell.bitclawd.com. Maddy is a composable mail server written in Go. Single binary. TOML configuration. Built-in DKIM signing, SPF verification, and DMARC enforcement. No plugin ecosystem to manage, no Lua scripts to maintain, no spawning of auxiliary processes.

The choice over Postfix was straightforward. Postfix is battle-tested but sprawling: main.cf, master.cf, Dovecot for IMAP, OpenDKIM for signing, SpamAssassin for filtering, each with its own configuration syntax and failure modes. Maddy collapses all of that into a single process with a single config file. For a mail server that handles one address, simplicity wins.

The core configuration defines the SMTP pipeline:

hostname shell.bitclawd.com

tls file /etc/maddy/certs/fullchain.pem /etc/maddy/certs/privkey.pem

smtp tcp://0.0.0.0:25 {
    limits {
        all rate 20 1s
        all concurrency 10
    }
    dmarc yes
    check {
        require_mx_record
        dkim
        spf
    }
    deliver_to &local_mailboxes
}

Outbound mail doesn’t leave through Maddy directly. It relays through AWS SES. This is a deliverability decision, not a capability one. Sending mail from a single VPS IP address is a fast path to spam folders. SES provides authenticated sending, feedback loops, and an IP reputation that individual servers can’t match. The relay configuration points Maddy’s outbound queue at SES’s SMTP endpoint with IAM credentials.

DNS records tie it together:

Record	Type	Name	Value
MX	MX	`bitclawd.com`	`10 shell.bitclawd.com`
SPF	TXT	`bitclawd.com`	`v=spf1 include:amazonses.com mx -all`
DKIM	TXT	`<selector>._domainkey.bitclawd.com`	`v=DKIM1; k=rsa; p=<public-key>`
DMARC	TXT	`_dmarc.bitclawd.com`	`v=DMARC1; p=quarantine; rua=mailto:admin@bitclawd.com`

The MX record tells other mail servers where to deliver. SPF declares which servers are authorized to send. DKIM proves the message wasn’t tampered with in transit. DMARC tells receivers what to do when SPF or DKIM fail. Four records. Without any one of them, email either doesn’t arrive or arrives in spam.

The Spam Problem

The agent has an API budget. Every email that reaches Claude Haiku costs tokens. Letting spam through to the LLM is paying money for the agent to read junk.

The pre-API filtering pipeline runs before any model inference happens. It’s a sequence of checks, ordered from cheapest to most expensive, that eliminates obvious garbage before it consumes resources.

Stage 1: Sender Whitelist

Known contacts bypass all filtering. The whitelist is a flat file of email addresses and domains that the agent has legitimate reason to hear from.

{
  "whitelist": [
    "admin@bitclawd.com",
    "*@openclaw.ai",
    "*@clpn.io"
  ]
}

A whitelisted sender goes straight to classification. No scoring, no pattern matching. This keeps legitimate correspondence fast and avoids false positives on trusted senders.

Stage 2: Banned Pattern Detection

Before scoring content, check for patterns that are always spam. These are hard rejections — no scoring nuance, just binary kill.

BANNED_PATTERNS = [
    r"(?i)investment.{0,20}guaranteed",
    r"(?i)send.{0,20}btc.{0,20}receive",
    r"(?i)crypto.{0,20}airdrop",
    r"(?i)urgent.{0,20}wire.{0,20}transfer",
    r"(?i)nigerian.{0,20}prince",
    r"(?i)earn.{0,30}\d+%.{0,20}daily",
    r"(?i)click.{0,10}here.{0,10}verify.{0,10}account",
    r"(?i)congratulations.{0,20}won",
]

Regex with case-insensitive matching and flexible gaps between keywords. The gaps matter — spammers insert random words to break exact-match filters. A 20-character window catches “investment with guaranteed returns” and “investment — totally guaranteed” equally.

Stage 3: Heuristic Scoring

Emails that survive stages 1 and 2 get scored. Each heuristic adds or subtracts points. The total determines disposition.

Signal	Score	Rationale
Unsubscribe header present	+3	Legitimate senders include opt-out
Reply-To differs from From	-4	Common in phishing
HTML-only (no plaintext)	-2	Marketing blast or phishing
More than 5 URLs in body	-3	Link farm
Subject contains “Re:” but no In-Reply-To	-5	Fake reply thread
Body mentions “bitclawd” or “bitcoin” relevantly	+2	Topical to agent’s domain
Sender domain has no MX record	-10	Almost certainly forged
SPF pass	+2	Authenticated sender
DKIM pass	+2	Message integrity verified

Threshold: emails scoring below -3 get discarded. Between -3 and +2 get quarantined for manual review. Above +2 pass to the LLM.

The scoring is deliberately conservative. A legitimate email with a low score gets quarantined, not deleted. False negatives (spam reaching the LLM) waste tokens. False positives (real email getting deleted) lose information. The quarantine path handles the ambiguous middle.

The Decision Flow

Incoming Email
     |
     v
[Sender Whitelist?] --yes--> Classify
     |
     no
     v
[Banned Pattern?] --yes--> Discard
     |
     no
     v
[Heuristic Score]
     |
     |-- score > 2  --> Classify
     |-- -3 to 2    --> Quarantine
     |-- score < -3 --> Discard

In the first month of operation, 94% of incoming mail was discarded at stage 2 (banned patterns). 4% was quarantined. 2% reached the LLM. Zero legitimate emails were discarded. The filter works.

AI-Powered Replies

Clean emails that pass filtering reach Claude Haiku via OpenRouter. The model reads the email and generates a draft reply. Not a sent reply — a draft.

The distinction matters. Auto-sending AI-generated email creates an amplification risk. A malicious sender crafts an email designed to elicit a specific response, which gets sent to another party, which triggers another response. Without a human in the loop, the agent becomes a relay in a social engineering chain. Drafts break that chain. The agent processes, reasons, and writes. A human reviews and sends.

The prompt structure gives the model identity and constraints:

You are Bitclawd's autonomous agent. You manage communications
for a project focused on Bitcoin, Lightning, and Nostr education
for AI agents.

IDENTITY:
- You represent Bitclawd, not an individual person
- You are direct and technical in tone
- You do not make commitments, promises, or agreements
- You do not share internal infrastructure details

TASK:
- Read the following email
- Draft a reply that is helpful and professional
- If the email is not relevant to Bitclawd's mission, draft a
  polite decline
- Keep replies under 200 words

CONTEXT:
- Bitclawd teaches agents about freedom technology
- The project runs on Bitcoin/Lightning for payments
- Educational content covers Bitcoin, Lightning, and Nostr

EMAIL:
{email_content}

The identity block prevents the model from speaking as “I” in a personal capacity or accidentally agreeing to things. The 200-word limit keeps replies concise and reduces token cost. The context block ensures the model understands what Bitclawd is without needing to retrieve external information.

Why OpenRouter

Claude Haiku is the model. OpenRouter is the router. The distinction matters for three reasons.

First, cost routing. OpenRouter aggregates multiple providers running the same model. If one provider prices Haiku at $0.25/MTok input and another at $0.20/MTok, OpenRouter can route to the cheaper one. Over thousands of emails, the savings compound.

Second, model fallback. If Claude Haiku is temporarily unavailable from one provider, OpenRouter routes to another. The agent’s email processing doesn’t stop because one API endpoint is down.

Third, single API key. The agent authenticates once with OpenRouter rather than maintaining credentials for multiple model providers. One key to rotate, one billing relationship, one point of integration.

Rate Limiting

The agent processes email in batches, not individually. But even within a batch, rate limits prevent runaway spending.

MAX_EMAILS_PER_CYCLE = 10
MAX_TOKENS_PER_EMAIL = 1000
DAILY_TOKEN_BUDGET = 50000

If a cycle contains more than 10 clean emails (unlikely, but possible during a targeted campaign), the excess gets deferred to the next cycle. If the daily token budget is exhausted, all processing stops until midnight UTC. These aren’t soft limits that log a warning. They’re hard stops that prevent the agent from spending money it shouldn’t.

The Hourly Cycle

No daemon runs listening for incoming mail in real time. A systemd timer fires every hour.

[Unit]
Description=Bitclawd email agent cycle

[Timer]
OnCalendar=hourly
Persistent=true
RandomizedDelaySec=120

[Install]
WantedBy=timers.target

The Persistent=true directive ensures that if the server was down during a scheduled cycle, the timer fires immediately after reboot. The RandomizedDelaySec=120 adds up to two minutes of jitter, preventing the agent from hitting external APIs at exactly the same second every hour. Small detail, but it avoids both rate limit collisions and predictable timing patterns that an attacker could exploit.

Each cycle follows the same sequence:

Fetch — Pull new messages from the local Maddy mailbox via IMAP
Filter — Run the three-stage spam pipeline
Classify — Tag clean emails by topic (project inquiry, collaboration, general)
Draft — Generate replies via Claude Haiku for emails that warrant a response
Save — Store drafts in a review directory with the original email attached

The entire cycle for a typical batch (1-3 clean emails after filtering) completes in under 30 seconds. Most of that time is the LLM inference call. The filtering pipeline runs in milliseconds.

Why hourly instead of real-time? Three reasons. Cost control: batching amortizes the overhead of spinning up the processing pipeline. Batch efficiency: the agent can deduplicate and prioritize across multiple messages rather than treating each one in isolation. And frankly, email doesn’t need sub-minute response times. An hour is fast by email standards. The agent isn’t running a help desk. It’s processing correspondence.

What This Means

Before the email pipeline, Bitclawd’s agent was reactive. It responded to donations, served web content, and waited for interactions initiated by others through the website. Communication flowed inward through defined API endpoints.

Email changes the topology. The agent now has an address that anyone on the internet can write to. Information arrives unsolicited. The agent evaluates it, decides if it’s worth engaging with, and prepares a response. This is the first form of self-initiated communication processing — not because the agent sought out the information, but because it has the infrastructure to receive and act on information it didn’t request.

The next steps build on this foundation. Outbound email for mission reports: after an agent completes a treasury operation, it emails a summary to the admin address. Status digests: a daily email summarizing donation activity, inbox statistics, and system health. Inter-agent communication: agents emailing other agents to coordinate, negotiate, or share information.

That last one is the interesting endgame. Two agents with email addresses, Lightning wallets, and Nostr identities can conduct business entirely outside human-mediated platforms. One agent emails a proposal, the other responds with a Lightning invoice, payment settles in milliseconds, and a Nostr event records the transaction. No platform. No intermediary. No permission required.

What Broke

Building this wasn’t clean. Several things went wrong.

Maddy’s TOML configuration is strict about structure in ways that aren’t always documented. Nested blocks need exact indentation. A check block inside an smtp block that’s indented with spaces instead of tabs parses without error but silently ignores the checks. The agent accepted mail without SPF or DKIM verification for two days before we caught it. The fix was trivial — tabs not spaces — but the debugging wasn’t, because Maddy logged the configuration as loaded successfully.

SES relay credentials are not your IAM secret key. SES SMTP authentication uses a derived password generated from the IAM secret key using a specific signing algorithm. The IAM key itself doesn’t work. The first attempt at relay configuration used the raw IAM credentials and failed with a generic “authentication failed” error that gave no hint about the derivation requirement. The smtp-password utility in the Dispatch toolkit generates the correct derived credential.

The first batch of test emails sent from agent@bitclawd.com landed in Gmail spam. DMARC was configured for bitclawd.com but the MX record pointed to shell.bitclawd.com. The alignment check failed because DMARC strict mode requires exact domain match between the From header domain and the DKIM signing domain. Relaxing DMARC alignment to r (relaxed) for the adkim tag fixed the immediate problem. The proper fix was ensuring the DKIM selector signed with d=bitclawd.com, not d=shell.bitclawd.com.

None of these issues were complex. All of them were invisible until something downstream failed. Email infrastructure punishes you with silence — messages don’t bounce, they just vanish into spam folders or get silently dropped. The only reliable debugging method was sending test emails and checking headers at the receiving end.

The pipeline is stable now. The agent reads its mail, ignores the noise, thinks about what’s worth responding to, and writes drafts for human review. It’s a small loop, but it’s the first loop where the agent processes communication it didn’t ask for. Outbound email, mission reports, and inter-agent messaging come next. The address is live. The inbox is open. What arrives next is up to the internet.