Nothing fires twice: idempotency for business automations

I wired four systems into one flow for a real services business: Pipedrive as the CRM, a job-management platform for scheduling engineers, an accounts package for invoicing, and an e-sign tool for contracts. A deal reaches a stage, a job is created, a contract goes out, an invoice follows. In testing it worked every time. In production it occasionally worked twice, which is far worse than not working at all.

A failed automation is annoying. A duplicated one is expensive. Two jobs means two engineers booked for one visit. Two invoices means a confused customer and a credit note. Two emails means you look like a spammer to the exact person you were trying to impress. Nobody notices at the time, because both copies look perfectly valid.

At-least-once is the deal you actually signed

Webhooks do not promise to fire once. They promise to fire at least once, and most providers' documentation says so if you read far enough down. Pipedrive retries when your endpoint is slow to acknowledge. n8n will replay executions after a restart or a configured retry. Stripe is admirably loud about it, telling you outright to expect the same event more than once.

Then there are the humans. Someone clicks Save twice because the spinner hesitated. Someone drags a deal to a stage, has second thoughts, drags it back, then forward again. Each of those is a legitimate trigger as far as your automation is concerned. Testing fires once because you fired it once. Production eventually fires everything twice.

The 400 millisecond lesson

An early version of my flow created a job whenever a deal hit a particular pipeline stage. The handler was careful, or so I thought: it checked whether a job already existed for the deal before creating one. One afternoon two webhooks arrived for the same deal 400 milliseconds apart. Both handlers ran the existence check. Both found nothing, because neither had finished writing. Both created a job.
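That race is easy to reproduce on a laptop. What follows is a minimal sketch, not the handler from the flow above: the in-memory `jobs` list stands in for the job-management platform, `naive_handler` is a hypothetical name, and a `threading.Barrier` plays the part of the 400 millisecond overlap by holding both handlers between their check and their write.

```python
import threading

jobs = []                       # stand-in for the job-management platform
barrier = threading.Barrier(2)  # holds both handlers until each has run its check

def naive_handler(deal_id):
    # Step 1: the existence check. Nothing has been written yet,
    # so both handlers see "no job for this deal".
    already_exists = any(job["deal_id"] == deal_id for job in jobs)
    # Simulated overlap: neither handler proceeds until both have checked.
    barrier.wait()
    # Step 2: the create, acting on a check that is already stale.
    if not already_exists:
        jobs.append({"deal_id": deal_id})

threads = [threading.Thread(target=naive_handler, args=(1234,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(jobs))  # 2
```

Each handler did exactly what it was told. The bug lives in the gap between the two steps, not in either step.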

The only reason I know the precise shape of that failure is that every automated decision was logged: what fired, what the handler checked, what it decided, what it created. The log showed two creation calls with the same deal ID, timestamps 400 milliseconds apart, each having passed the check before the other landed. Without the log it would have been "the system did something weird", unfalsifiable and unfixable. With it, the race was provable in thirty seconds.

What actually held up

Check-then-create is not enough, because the check and the create are not atomic. The fixes that survived contact with production were structural.

First, idempotency keys derived from the source record. The key for "create a job from this deal" is the deal's own ID, not the webhook timestamp or a random ID minted per execution. Two firings for the same deal produce the same key, and the second is refused. Keys built from timestamps are worse than no keys, because they make every duplicate look unique.
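One way to make "the second is refused" atomic is to let a database uniqueness constraint do the refusing. The sketch below assumes an SQLite table used purely as a key store; the table name, the `claim` and `job_key` helpers, the key format and the two timestamps are all illustrative, not taken from any of the tools mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY constraint is what refuses the duplicate, not application code.
conn.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY)")

def claim(key):
    """Return True if this key is new, False if an earlier run already claimed it."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute("INSERT INTO idempotency_keys (key) VALUES (?)", (key,))
        return True
    except sqlite3.IntegrityError:
        return False

def job_key(deal_id):
    # Derived from the source record only: same deal, same key, every time.
    return f"create-job:deal:{deal_id}"

first = claim(job_key(1234))
second = claim(job_key(1234))
print(first, second)  # True False

# The anti-pattern: a timestamp in the key makes each duplicate look unique.
# Both timestamps are made-up values for two deliveries of the same event.
ts_first = claim("create-job:deal:1234:2024-01-01T12:00:00.000")
ts_second = claim("create-job:deal:1234:2024-01-01T12:00:00.400")
print(ts_first, ts_second)  # True True
```

The handler creates the job only when `claim` returns True. Because the check and the write are a single insert, there is no gap for a second delivery to slip through.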

Second, ordering. Write the foreign ID back before you move the record. When the job is created, its ID goes onto the Pipedrive deal first, and only then does the deal advance to the stage that confirms creation. A second trigger looks at the deal, sees a job ID already sitting there, and skips. The write-back is the lock. Get the order wrong (move the deal first, populate the ID after) and there is a window where a retry sees a triggered stage with no ID, and happily creates job number two.
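The ordering can be sketched with a dictionary standing in for the Pipedrive deal. The field names (`job_id`, `stage`), the stage values and the `create_job` helper are hypothetical; a real handler would make API calls where this one assigns to a dict.

```python
# Stand-in for the CRM deal; field names and stage values are illustrative.
deal = {"id": 1234, "stage": "ready_for_job", "job_id": None}
created_jobs = []

def create_job(deal_id):
    # Stand-in for the job-management platform's create call.
    job_id = f"job-{len(created_jobs) + 1}"
    created_jobs.append(job_id)
    return job_id

def on_stage_trigger(deal):
    # A job ID already on the deal means an earlier run got here first.
    if deal["job_id"] is not None:
        return "skipped"
    job_id = create_job(deal["id"])
    deal["job_id"] = job_id        # 1. write the foreign ID back first
    deal["stage"] = "job_created"  # 2. only then advance the stage
    return "created"

results = [on_stage_trigger(deal), on_stage_trigger(deal)]
print(results)       # ['created', 'skipped']
print(created_jobs)  # ['job-1']
```

This protects against any retry or replay that arrives after the write-back has landed. Two deliveries that overlap before the ID is written still need an atomic claim on the idempotency key, so the two measures work together rather than replacing each other.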

Third, gates. A record advances only when the evidence from the previous step exists on the record itself. A deal cannot reach "contract sent" without an envelope ID from the e-sign tool, and cannot reach "invoiced" without an invoice number from the accounts package. State is not which column a human dragged the card to; state is what evidence the record carries, and the automation checks the evidence, never the card.
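A gate can be as small as a lookup table and one function. In this sketch the stage names come from the examples above, while the field names (`envelope_id`, `invoice_number`) and the `may_advance` helper are assumptions made for illustration.

```python
# Evidence each stage demands before a record may enter it.
REQUIRED_EVIDENCE = {
    "contract_sent": ["envelope_id"],
    "invoiced": ["invoice_number"],
}

def may_advance(record, target_stage):
    """Return (allowed, missing_fields) based on the record's own evidence."""
    required = REQUIRED_EVIDENCE.get(target_stage, [])
    # A field counts as evidence only if it is present and non-empty.
    missing = [field for field in required if not record.get(field)]
    return (len(missing) == 0, missing)

# A human dragged the card, but the e-sign tool never returned an envelope.
dragged = {"id": 1234, "stage": "contract_sent", "envelope_id": None}
print(may_advance(dragged, "contract_sent"))  # (False, ['envelope_id'])

# The same deal once the envelope ID has been written back.
signed = {"id": 1234, "stage": "contract_sent", "envelope_id": "env-001"}
print(may_advance(signed, "contract_sent"))   # (True, [])
```

Note that the function never reads `record["stage"]`. As written, a stage with no entry in the table is allowed through; a stricter version could refuse unknown stages instead.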

Fourth, the decision log. Every automated action records what fired it, what it checked, and what it did, in one place. This costs almost nothing to build in n8n or a small webhook handler, and it converts every duplicate from an argument into a query.
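As a rough illustration of how little this needs, here is a sketch that keeps the log in a list and serialises each entry as one JSON line. The entry fields, the `log_decision` and `deals_with_duplicate_creates` helpers, and the sample events are all hypothetical; in practice the entries would go to a table or a file.

```python
import json
from collections import Counter
from datetime import datetime, timezone

decision_log = []

def log_decision(deal_id, fired_by, checked, decision, created=None):
    """Record what fired, what was checked, what was decided and what was created."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "deal_id": deal_id,
        "fired_by": fired_by,
        "checked": checked,
        "decision": decision,
        "created": created,
    }
    decision_log.append(entry)
    return json.dumps(entry)  # one line per decision, ready to append to a file

def deals_with_duplicate_creates(log):
    """The query that replaces the argument: which deals were created more than once?"""
    counts = Counter(e["deal_id"] for e in log if e["decision"] == "created")
    return [deal_id for deal_id, n in counts.items() if n > 1]

# Made-up events: two overlapping deliveries for deal 1234, one clean run for 5678.
log_decision(1234, "webhook:deal.updated", "job exists? no", "created", "job-1")
log_decision(1234, "webhook:deal.updated", "job exists? no", "created", "job-2")
log_decision(5678, "webhook:deal.updated", "job exists? no", "created", "job-3")

print(deals_with_duplicate_creates(decision_log))  # [1234]
```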

Ask the question before go-live

The design question: what happens when this fires twice in the same second? Not twice a minute apart, when the first run has comfortably finished. Twice in the same second, mid-write. If the honest answer is "two of everything", the automation is not done, however well the demo went. Asking before go-live costs an hour of thought. Asking afterwards costs an incident review, a credit note and an apology.
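The question can also be asked in code. The sketch below fires a handler twice at once and counts what it created. It assumes a single-process handler guarded by an in-memory claim store; `claim`, `handler` and `fire_simultaneously` are hypothetical names, and a deployed automation would need the claim to live in a shared store such as a database uniqueness constraint.

```python
import threading

claimed = set()
claim_lock = threading.Lock()
jobs = []

def claim(key):
    # The lock makes check-and-add one indivisible step within this process.
    with claim_lock:
        if key in claimed:
            return False
        claimed.add(key)
        return True

def handler(deal_id):
    # Create only if this run is the first to claim the deal-derived key.
    if claim(f"create-job:deal:{deal_id}"):
        jobs.append({"deal_id": deal_id})

def fire_simultaneously(handler, deal_id, times=2):
    """Release `times` copies of the handler at the same instant."""
    barrier = threading.Barrier(times)

    def run():
        barrier.wait()  # all threads start the handler together
        handler(deal_id)

    threads = [threading.Thread(target=run) for _ in range(times)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

fire_simultaneously(handler, 1234)
print(len(jobs))  # 1
```

Point the same harness at a handler that checks and then creates in two separate steps, and the count will sometimes come back as 2. That intermittent result is the warning sign.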

The five questions

Before you switch on any webhook automation, whether it lives in n8n, an off-the-shelf automation platform or a hand-rolled endpoint, ask these:

If this fires twice in the same second, does the second run create anything?
What is the idempotency key, and is it derived from the source record rather than from time?
Is the created record's ID written back to the source before the source changes state?
What evidence must exist before the record may advance, and does the automation check the evidence rather than the stage?
When a duplicate happens anyway, can you prove exactly what fired and what each run decided?

If you can answer all five, ship it. If you cannot, it worked in testing, and that is all it did.