What We Did When Our Email Stack Stopped Sending

A founder asked me last week how our scan reports get delivered. I told him we used to run them through one of the bigger ESPs and now we run them through software we wrote ourselves. He laughed and asked if we built our own database too.

I said no, just the email part, because that was the part that broke.

The actual conversation we had after that is what this post is about, because the question keeps coming up in slightly different forms. Why did you migrate. What was wrong with the old stack. Was it really that bad. The short version is yes. The long version is the rest of this post.

As a team that provides "enforcement infrastructure," the one who blocks bad merges before they hit production, the one whose whole pitch is about catching what breaks before it becomes your problem — we went MIA.

Not intentionally. Not because we went on a team retreat to Shimla (that is a cool idea though). Not because we were too busy shipping features. We went quiet because the thing that was supposed to be carrying our voice out into the world had silently, cheerfully, dashboards-all-green stopped doing exactly that.

Here is the full story. The embarrassing part, the technical part, and the part where we fixed it.

Going Back in Time to March

Captain Patch standing next to a small wooden time machine in the harbour, looking puzzled at a calendar with March crossed out twice.

It started in March. We did not know it started in March until much later, which is part of the lesson, but if you wind the tape back that is where the trail begins.

The first signal was not a number. It was a feeling. Replies to outreach that should have come back in a day were taking three. New signups felt slower than the activity on the landing page implied. We chalked it up to the usual early-stage noise of bad weeks and good weeks and ignored it the way founders are supposed to ignore most of the funnel ghosts they see.

Then Tanvi noticed the waitlist confirmation open rate had dropped by half in two weeks. Nothing else had changed. That was the thread that actually got pulled. She pulled, I pulled, and by the end of the day the whole sweater was on the floor.

The clearest tell came from a user on Twitter asking where his scan report was. I checked our platform. The send was marked delivered. I checked the inbox on the test account. Nothing. Waited an hour. Still nothing. The report eventually landed three hours later in a folder the recipient does not check on a regular basis, which is the kind of "delivery" that does not count.

This is the worst class of infrastructure failure because it does not look like one. The dashboard reads green, the success rate sits at 99 point something percent, and yet the actual job the system is supposed to do has stopped happening at scale.

“If the dashboard says delivered and the inbox says nothing, the dashboard is wrong.”

The platform we were on is a well-known ESP, the kind of name you would recognize and that most teams default to at our stage. It was the right choice when we picked it. We had outgrown it without noticing.

What It Did to Us

Captain Patch on the harbour deck watching letters pile up at the gate while the lighthouse blinks late and ships drift in the fog.

The damage compounded in ways we only mapped in retrospect. Most of it was invisible from inside the dashboard, which is the part that still bothers me.

Cold outreach went dry. Conversations that had been alive in February stopped coming back in March. We assumed for a while that the market had cooled and spent a couple of weeks idly worrying about whether we had said something wrong in a post somewhere. The market was fine. Our messages were just not getting where they needed to go.

Scan report delivery turned into a guessing game. The free codebase scan is the artifact that closes our top of funnel, and the report was arriving late, or in spam, or sometimes not at all. The single highest-value email we send was the least reliable one in the queue. We did not realize this for weeks because every prospect who never got their report just quietly disappeared and we had no good way to tell the difference between "they did not care" and "they never received it."

Sender reputation was eroding underneath all of it. Bounces were not being surfaced in time, so we kept re-sending to dead addresses, so receiving servers started treating us with a little more suspicion every week. None of that registered as a hard fail. It was a slow descent on a graph nobody was watching closely enough.

Open rates went down. Signups slowed. Free scan registrations were getting delayed badly enough that some prospects must have assumed the form had broken and moved on. Onboarding emails trickled out at a third of the volume we had configured them at.

The cumulative effect was that the entire top of our funnel was running through a layer that had stopped working, and the only signal was the absence of activity. Absence is brutal to read. It does not show up on any chart we owned at the time. You only feel it.

The Internal Conversation

Captain Patch and Captain Scout at the harbour office table, Patch holding a graph that slopes downward, Scout pointing at it with one paw and at the lighthouse with the other.

It clicked on a Tuesday. Tanvi walked over with the open rate dashboard and said something close to "This cannot be us. We have not changed anything."

We had not changed anything. That was the point.

The argument we had after that is worth writing down because I suspect some version of it happens in every startup eventually.

She wanted to migrate. There are plenty of decent ESPs. Switch the API keys, repoint the templates, be back in business inside a week, move on. I did not want to do that. Moving from one black box to another black box would buy us a year and then we would be sitting in this same conversation again, except next time we would have wasted a year not learning anything about the layer that had been quietly running our business.

Her counter was fair. The job right now is to get outreach moving again, not to philosophize about infrastructure. Two months of dark outbound is already two months too long. Pick the fastest thing that works.

We landed somewhere between the two positions, which is what cofounders do when both of them are partly right. Pick a path that gets us back to working email inside a week and that we can never get trapped in again. That compromise is what produced Hedwig and email-checker.

Why This Happened

Captain Patch with a magnifying glass over a stack of receipts and dashboards, an opaque box labelled ESP sitting in the corner with a small sign reading trust me.

Once we had ruled out our own domain as the primary cause (which we did first, the way you are supposed to do when something breaks), the only thing left was to test the platform itself against an alternative.

We took 800 transactional sends and split them in half. Same recipients, same content, same time of day, same authenticated sending domain. Half went through the existing ESP. Half went through a direct integration with a transactional provider we had been keeping in the back pocket for exactly this kind of contingency.

The direct path delivered in under thirty seconds. The ESP path took anywhere from forty minutes to four hours with a long tail of sends that simply never confirmed at all. Bounce data on the direct path came back in real time. The ESP took up to four hours to surface the same bounces, which is why we had been hammering dead addresses long after another system already knew they were dead.

We did not have a domain problem. We had a vendor problem.

“We were paying for a black box that had quietly stopped working at the volume we were paying it for.”

The platform was not technically broken. It was working, somewhere inside there, just much slower than the dashboard claimed and much slower than the price tier we were on implied. Volume had outgrown the layer we were sending through, and the layer had no easy way to tell us that, because telling us would mean admitting that the price we were paying did not match the throughput we were getting.

There is a less polite read on why this happened, and it is the part that stings.

autter is an enforcement infrastructure company. The whole pitch is that advisory layers fail because they have no teeth, that when something matters you do not put it behind a UI that suggests, you put it behind a gate that blocks. And here we were, running the most important outbound channel in our business through an advisory abstraction that suggested it was sending while not actually sending. I have been telling investors and customers a version of this argument for months. Apparently I needed to hear it myself.

Lesson

If your funnel feels off and the dashboards say everything is fine, run a split test the same afternoon. Send identical payloads through your ESP and through a direct provider. The truth is obvious inside an hour.

How We Fixed It

Captain Patch at the harbour workshop building two small wooden boxes labelled Hedwig and email-checker, sleeves rolled up, sawdust on the floor.

Two pieces. Built over a few weeks. Ran internally for two weeks. Open sourced this week.

Hedwig is the sender. It runs on Resend and AWS SES as interchangeable backends. You route by region, by message type, by per-recipient deliverability data, or by whatever logic makes sense for your stack. If Resend has a bad hour, traffic shifts to SES inside a config change. If a particular recipient domain has been giving SES grief, that domain routes through Resend instead. Provider failover is a config concern, not a migration project.

email-checker is the validator that sits in front of Hedwig. Syntax, MX records, SMTP probing where receivers allow it, disposable domain filtering, role account flagging, catch-all detection. It batches. It caches. It is fast enough to run inline on a form submit and cheap enough to run on a million-row import overnight. Bad addresses do not reach the sender, so the sender does not burn reputation on dead inboxes. The validator learns from every domain it sees and the loop closes on itself.

We open sourced both for three reasons and they are all real ones.

First, sending email and knowing who to send to is the most boring critical infrastructure in a startup. Nobody puts it on a roadmap. Nobody gets promoted for owning it. And the day it stops working is the day you find out it had been holding the entire go-to-market motion together. We just lived two months of that. The next early-stage team should not have to lose two months figuring out the same thing.

Second, every founder we talked to was hitting some version of this wall with some version of this kind of platform at the same growth stage. The ESP market is enormous. The layer underneath, which is the part you actually want to control, is fragmented and proprietary. There is no reason for every team to rebuild this from scratch in private.

Third, distribution. We are building autter in public and the tools we ship internally are the ones we want engineering leads to find when they go looking. If Hedwig or email-checker lands in someone's stack, autter is one shorter step away from landing in their merge pipeline. The motive is part useful and part self-interested and I would rather say so than pretend it is purely civic.

Both repos are MIT. Stars are nice. PRs are nicer.

Hedwig

Autter-dev / hedwig-mail

Self-hostable broadcast email — contact lists, visual editor, campaigns through Resend or SES, open and click tracking.

View on GitHub

email-checker

sagnik11 / email-checker

Verify email addresses without sending a single email — syntax, MX DNS, SMTP handshake, disposable/role detection, bulk processing, and CLI.

View on GitHub

Hedwig

Hedwig is the sender we built.

It runs on Resend and AWS SES as interchangeable backends. You can route messages by region, by message type, by per-recipient deliverability data, or by whatever logic you want to write. If Resend has a bad hour, traffic shifts to SES inside a config change. If a specific recipient domain has been giving SES grief, that domain routes through Resend instead. Provider failover is a configuration concern, not a migration project.

Campaigns are data, not buttons. Every campaign in Hedwig is a versioned object that lives in your repo. You diff it. You review it in a PR. You roll it back like any other piece of code. We were tired of building important sends inside vendor dashboards with no audit trail, no way to test changes before they went live, and no way to know who changed what.

Events land in your database within seconds of the provider firing them. Bounces, opens, complaints, deliveries, all of it. No four-hour lag. No mystery about whether a webhook arrived or not. If a send is going to fail, you know before you queue the next batch.

Templates are MJML compiled to HTML and they live in your repo with the rest of the codebase. They get reviewed. They get versioned. They pass through the same merge gate as everything else we ship, which is the same merge gate we sell to other teams, which means we eat our own cooking.

Hedwig does not assume anything about the rest of your stack. We run Node, Drizzle, Postgres. Plug in whatever you already use.

Hedwig on GitHub

Autter-dev / hedwig-mail

Self-hostable broadcast email — contact lists, visual editor, campaigns through Resend or SES, open and click tracking.

View on GitHub

The Email Checker

The other half of the problem was knowing who to send to in the first place.

A list with twenty percent dead addresses will tank your sender reputation no matter how clean the rest of your setup is. Most validation services are either expensive at scale or unreliable at the cheap end. We needed something we could run on every signup, every CSV upload, every imported list, without paying per check forever.

email-checker does syntax validation, MX record lookups, SMTP probing where receivers allow it, disposable domain filtering, role account flagging, and catch-all detection. It batches. It caches. It is fast enough to run inline on a form submit and cheap enough to run on a million-row import overnight.

We use it as a gate in front of Hedwig. Bad addresses never reach the sender. The sender never burns reputation on dead inboxes. The validator gets smarter over time as it sees more domains. The loop closes on itself.

“A list with twenty percent dead addresses will tank your sender reputation no matter how clean the rest of your setup is.”

email-checker on GitHub

sagnik11 / email-checker

Verify email addresses without sending a single email — syntax, MX DNS, SMTP handshake, disposable/role detection, bulk processing, and CLI.

View on GitHub

Lessons Learnt

Captain Patch and Captain Scout at the end of a long day, two cups of tea on the desk, a notebook open with bullet points half written, the lighthouse blinking clean and on time outside the window.

Some of these are technical. Most of them are about how I now think about our own infrastructure. I wrote them down on a Tuesday and have not bothered to make them tidy.

The split test should have happened in week one. We spent two months trying to debug our way to a vendor-shaped answer because the test that would have given us the answer felt like overkill. It was not overkill. If you are on a hosted ESP and something feels off, send identical payloads through your ESP and through a direct provider integration on the same afternoon. The truth is obvious inside an hour.

A vendor abstraction is useful right up to the day it hides the thing breaking, and the day after that you should already be looking for an exit. We were treating our send pipeline as something we owned and we did not own it. The dashboard was the only thing we owned, and the dashboard turned out to be the wrong thing.

Tanvi caught this two weeks before any monitoring tool did. I want to write that down somewhere permanent because it keeps happening. The team noticing the funnel feels off is the signal. The dashboard agreeing with the team is the confirmation. We keep doing it in the other order and it keeps costing us time we could have spent fixing things instead of doubting our own pattern recognition.

We sell the argument that enforcement beats advisory. We were running the most important outbound channel in our business on advisory. There is no version of this sentence that makes me feel better about it. The fix is not just the new stack, the fix is the rule that every critical layer in our business has to be inspectable and gateable now, including the layers we used to assume vendors had solved on our behalf.

Build versus migrate turned out to be a category question, not a tools question. Migrating to a different ESP would have bought us twelve to eighteen months of relief. Building down a layer bought us indefinite control over the thing the business actually runs on. Short run, building cost more. Long run, the migration treadmill costs more. We are running a business measured in years not quarters and I keep forgetting to do the math that way.

Absence is the hardest signal to read and we should have built a measurement for it earlier. When something stops happening it does not show up on any chart we owned. It shows up as a quiet quarter where the founders start blaming the market. The market is rarely the answer.

Last one. If you have just lived a particular operational hell and you can see the next ten startups about to walk into the same one, open sourcing the fix is a small cost and a useful gift. Distribution will follow if the work is good. It does not need to be the main reason you ship, but it is allowed to be one of the reasons, and we are getting comfortable saying that out loud.

Building autter in public. We block bad PRs at the merge gate, and apparently now we send our own email too. If your team is running into the same wall, drop us a line at hi@autter.dev or book 30 minutes.

P.S. Tanvi read this draft and pointed out that I described the old ESP as a "black box" four times in eight paragraphs. She asked if I had a thesaurus. I said the thesaurus offered "opaque system," "closed-source platform," and "proprietary layer," none of which feel like what I meant. She said that is a writer problem, not a vendor problem. I am sitting with that.