Email Verification at Scale: Why MX Checking Beats Regex

Why a regex isn't enough to validate email addresses, and how to run real deliverability checks without building SMTP infrastructure.

Seek API Team

Email validation sounds trivially simple: check that the string has an @, a domain, a .com. Done. Ship it.

Except it isn’t done, and that regex is wrong more often than you think.

Why regex validation fails in production

A regex can tell you that a string looks like an email address. It cannot tell you:

  • Whether the domain actually exists
  • Whether the domain has mail servers configured (MX records)
  • Whether the specific mailbox exists
  • Whether the address has been abandoned or deactivated
  • Whether it’s a role address like noreply@, info@, or abuse@
  • Whether it’s a known disposable / throwaway domain
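To make the gap concrete, here is a minimal sketch using a typical "good enough" pattern. The regex is an illustrative assumption, not one taken from any particular library. It accepts an address at a domain that can never resolve (the .invalid TLD is reserved) and rejects a quoted local part that RFC 5322 allows.

```python
import re

# A common style of email regex (illustrative assumption, not from a specific library).
SIMPLE_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def looks_like_email(candidate: str) -> bool:
    """Return True when the string matches the simple pattern above."""
    return SIMPLE_EMAIL_RE.match(candidate) is not None


samples = [
    "user@example.com",                 # well-formed; says nothing about the mailbox
    "nobody@no-such-domain.invalid",    # reserved TLD, can never receive mail
    "typo@gmial.con",                   # typo in domain and TLD
    '"quoted local"@example.com',       # valid per RFC 5322, uncommon in practice
]

for address in samples:
    print(f"{address!r}: regex says {looks_like_email(address)}")
```

The first three pass and the last one fails, which is the opposite of what a deliverability check should report for the middle two.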

When you send marketing email to unverified addresses, you get:

Hard bounces — permanent delivery failures because the address doesn’t exist. Above ~2%, ESPs like SendGrid, Mailchimp, and Postmark start throttling or suspending your account.

Soft bounces — temporary failures. At scale, these degrade your sender reputation.

Spam traps — old, abandoned addresses repurposed by ISPs to catch bad senders. Hitting a spam trap can get your domain blacklisted.

A clean email list is not a nice-to-have. It’s the foundation of deliverability.

What real email validation does

Proper validation runs in layers:

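Layer 1: Syntax check. Does the string conform to the address grammar in RFC 5322? That grammar is more permissive than most regex patterns, so a strict regex can reject addresses that are actually valid.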

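Layer 2: DNS/MX lookup. Does the domain exist? Does it have mail exchange (MX) records configured? If there are no MX records, SMTP falls back to the domain’s A/AAAA record (RFC 5321), so an address is unreachable only when the domain has neither, or when it publishes a null MX.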
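The routing decision in this layer can be sketched as a pure function. The function name and the (preference, host) tuple shape are assumptions for illustration; the records themselves would come from a DNS library (for example dnspython's `dns.resolver.resolve(domain, "MX")`), which is left out here so the logic stays self-contained.

```python
from typing import List, Tuple


def mail_hosts(
    domain: str,
    mx_records: List[Tuple[int, str]],
    has_address_record: bool,
) -> List[str]:
    """Return hosts to try for delivery, best first. An empty list means no route.

    mx_records holds (preference, hostname) pairs as returned by a DNS lookup.
    """
    # RFC 7505 "null MX": a single record whose target is "." means
    # the domain explicitly accepts no mail.
    if len(mx_records) == 1 and mx_records[0][1].rstrip(".") == "":
        return []
    if mx_records:
        # Lower preference values are tried first.
        return [host.rstrip(".") for _, host in sorted(mx_records)]
    # RFC 5321 section 5.1: with no MX records, the domain's own
    # A/AAAA record acts as an implicit MX.
    return [domain] if has_address_record else []


print(mail_hosts("example.com", [(20, "b.example.com."), (10, "a.example.com.")], True))
```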

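Layer 3: SMTP probe. Connect to the mail server and begin a delivery without sending a message body: issue MAIL FROM and RCPT TO, then check whether the server accepts the recipient. A 550 reply (mailbox unavailable) usually means the address doesn’t exist; a 551 means the user is not local to that server. A temporary 4xx reply, or a catch-all server that accepts every recipient, leaves the result inconclusive.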
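A rough sketch of such a probe with Python's standard smtplib follows. The HELO name, the sender address, and the mapping from reply codes to labels are assumptions chosen for illustration; production checks need retries, rate limiting, and catch-all detection on top of this.

```python
import smtplib


def classify_rcpt(code: int) -> str:
    """Map an SMTP RCPT TO reply code to a coarse verdict (illustrative mapping)."""
    if 200 <= code < 300:
        return "accepted"        # may still be a catch-all server
    if 400 <= code < 500:
        return "retry_later"     # greylisting or another temporary failure
    if code in (550, 551, 553):
        return "rejected"        # mailbox unavailable, not local, or name not allowed
    return "unknown"             # other 5xx replies are often policy blocks


def probe(mx_host: str, address: str,
          helo_name: str = "verifier.example.org",   # hypothetical identity
          sender: str = "probe@example.org",         # hypothetical envelope sender
          timeout: float = 10.0) -> str:
    """Open an SMTP session, issue MAIL FROM and RCPT TO, and never send DATA."""
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo(helo_name)
            smtp.mail(sender)
            code, _message = smtp.rcpt(address)
        return classify_rcpt(code)
    except (smtplib.SMTPException, OSError):
        # Covers refused connections, timeouts, and blocked outbound port 25.
        return "unknown"
```

Treating 4xx replies as "retry later" rather than as failures is what keeps greylisting servers from producing false negatives.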

Layer 4: Risk scoring. Is this a disposable email service (Mailinator, Guerrilla Mail, etc.)? Is this a role address? Has this address appeared in breach databases?
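The disposable and role checks reduce to set lookups once the lists exist. The sketch below uses tiny hard-coded samples as placeholders; a real list would hold thousands of domains and need regular updates. The output keys mirror the response fields shown later in this post, but the function itself is hypothetical.

```python
# Tiny illustrative samples; a maintained list would be far larger.
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com"}
ROLE_LOCAL_PARTS = {"noreply", "no-reply", "info", "abuse", "admin", "support", "postmaster"}


def risk_flags(address: str) -> dict:
    """Flag disposable domains and role-style local parts for one address."""
    # rpartition splits on the last "@", so quoted local parts containing "@" still work.
    local_part, _, domain = address.rpartition("@")
    return {
        "isDisposable": domain.lower() in DISPOSABLE_DOMAINS,
        "isRoleAddress": local_part.lower() in ROLE_LOCAL_PARTS,
    }


print(risk_flags("info@example.com"))
print(risk_flags("someone@mailinator.com"))
```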

Running all four layers yourself requires:

  • DNS resolution libraries
  • SMTP socket connections (often blocked at the ISP level)
  • A maintained database of disposable email providers (10,000+ domains)
  • Robust error handling for tarpits and greylisting

Or you call a worker that does it in 0.8 seconds.

Using the Email Validator worker

curl -X POST https://api.seek-api.com/v1/workers/email-validator/jobs \
  -H "X-Api-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com"}'

Response:

{
  "email": "user@example.com",
  "valid": true,
  "deliverabilityScore": 0.94,
  "mxRecords": ["mail.example.com"],
  "smtpCheck": "accepted",
  "isDisposable": false,
  "isRoleAddress": false,
  "isFree": false,
  "suggestion": null
}

The suggestion field catches common typos: gmail.con → gmail.com, outlok.com → outlook.com.
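How the worker computes suggestions is not documented here. As an illustration of the general idea only, a fuzzy match against a short list of popular domains can be sketched with the standard library's difflib. The domain list and the 0.8 cutoff are assumptions.

```python
import difflib
from typing import Optional

# Short illustrative list; a real one would cover many more providers.
COMMON_DOMAINS = ["gmail.com", "outlook.com", "yahoo.com", "hotmail.com"]


def suggest_domain(domain: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest well-known domain, or None if there is no near match."""
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None  # already correct, nothing to suggest
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for typo in ["gmail.con", "outlok.com", "example.com"]:
    print(typo, "->", suggest_domain(typo))
```

A high cutoff matters here: set too low, the function starts "correcting" legitimate company domains into consumer ones.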

Bulk validation workflow

For a list of 50,000 emails from a form or data import:

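import asyncio
import httpx

API_URL = "https://api.seek-api.com/v1/workers/email-validator/jobs"
MAX_IN_FLIGHT = 100  # cap on simultaneous requests; tune to your plan's rate limits

async def validate_email(client, email, api_key, limiter):
    # Submit one validation job and return its job UUID.
    async with limiter:
        r = await client.post(
            API_URL,
            headers={"X-Api-Key": api_key},
            json={"email": email},
        )
    r.raise_for_status()  # surface 4xx/5xx errors instead of a KeyError below
    return r.json()["job_uuid"]

async def bulk_validate(emails, api_key):
    limiter = asyncio.Semaphore(MAX_IN_FLIGHT)
    async with httpx.AsyncClient(timeout=30.0) as client:
        job_uuids = await asyncio.gather(
            *[validate_email(client, e, api_key, limiter) for e in emails]
        )
    return job_uuids

Because each job is independent, validations run in parallel rather than one after another. The semaphore caps how many requests are in flight at once: opening 50,000 connections simultaneously would exhaust the HTTP client’s connection pool and may hit API rate limits. With 100 requests in flight and the worker’s average run time of 0.8 seconds, 50,000 validations finish in a matter of minutes, where serial execution would take about 11 hours (50,000 × 0.8 s ≈ 40,000 s).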

What to do with results

After bulk validation, segment your list:

Score / flag            Action
> 0.90                  Send freely
0.70 – 0.90             Send with monitoring
0.50 – 0.70             Double opt-in before sending
< 0.50                  Suppress immediately
isDisposable: true      Remove and flag the signup
suggestion != null      Offer correction to the user
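The segmentation rules can be applied mechanically to each result. In this sketch the order of precedence (flags before score) and the handling of exact boundary values such as 0.90 are assumptions, and the action labels are hypothetical names. The field names follow the response example earlier in the post.

```python
def segment(result: dict) -> str:
    """Map one validation result to an action label."""
    # Flags take priority over the numeric score (an assumed ordering).
    if result.get("isDisposable"):
        return "remove_and_flag"
    if result.get("suggestion"):
        return "offer_correction"
    score = result["deliverabilityScore"]
    if score > 0.90:
        return "send"
    if score >= 0.70:
        return "send_with_monitoring"
    if score >= 0.50:
        return "double_opt_in"
    return "suppress"


sample = {
    "email": "user@example.com",
    "deliverabilityScore": 0.94,
    "isDisposable": False,
    "suggestion": None,
}
print(segment(sample))
```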

Typical results on a cold or aged list: 15–30% of addresses fail at the MX or SMTP layer. Removing them before a send materially improves deliverability rates and protects sender reputation.

Pricing

Email validation costs $0.001 per address. Validating 100,000 emails costs $100. Compared to re-enabling a suppressed SendGrid account or recovering from a domain blacklisting, that’s cheap insurance.
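For budgeting other list sizes, the arithmetic is a single multiplication. The helper below is a hypothetical convenience that uses the per-address price quoted above, with Decimal to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

PRICE_PER_ADDRESS = Decimal("0.001")  # USD, the per-address price quoted in this post


def validation_cost(address_count: int) -> Decimal:
    """Return the total cost in USD of validating address_count emails."""
    return PRICE_PER_ADDRESS * address_count


for count in (50_000, 100_000):
    print(f"{count:,} addresses: ${validation_cost(count):.2f}")
```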