
Async Jobs for AI Pipelines: Why Synchronous APIs Won't Cut It

Synchronous HTTP calls break down the moment your AI pipeline takes more than a few seconds. Here's why async job queues are the right architecture — and how to implement them.

Sarah Lin · Platform Engineer

AI pipelines are slow. There’s no way around it. Whether you’re calling GPT-4o, running a multi-step RAG pipeline, or processing a batch of documents, the execution time is measured in seconds — and often tens of seconds.

This creates a fundamental mismatch with the standard HTTP request/response model. Most web APIs are designed to respond in milliseconds. Your frontend, your orchestrator, your serverless function — they all have timeouts that were set before AI workloads existed.

In this article, we’ll look at why synchronous AI pipelines fail at scale, what the async job queue pattern looks like, and how to implement it cleanly on Seek API.


The synchronous API anti-pattern

Here’s what a naive synchronous AI pipeline looks like:

// Client
const result = await fetch('/api/summarize', {
  method: 'POST',
  body: JSON.stringify({ url }),
});
const data = await result.json();
// → This hangs for 15 seconds. Then times out.

The problems compound quickly:

Client-side timeouts. Browser fetch defaults to no timeout, but many HTTP clients default to 30s. Your AI pipeline routinely exceeds that.

Server timeouts. API Gateway cuts off Lambda-backed endpoints at 29 seconds, regardless of the function's own timeout. Serverless and edge platforms impose similar caps, and many reverse proxies time out at 60-90 seconds. Your pipeline doesn't care.

Retry storms. When a request times out, clients retry. Three retries of a 15-second operation mean 45 seconds of duplicate work, and the original call plus three duplicates costs you four times the LLM API spend of a single run.

Poor user experience. The user sees a spinner for 15 seconds, then an error. There’s no progress, no feedback, no way to check later.


The async job pattern

The solution is to decouple job submission from job completion. This is a 30-year-old pattern from message queues, applied to AI APIs:

Client                     Server
  |                           |
  |-- POST /jobs -----------→ |   (submit, instant)
  |← 200 { job_uuid } ------  |
  |                           |
  |                    [work happening]
  |                           |
  |-- GET /jobs/uuid ------→  |   (poll, ~2s later)
  |← 200 { status: PENDING }  |
  |                           |
  |-- GET /jobs/uuid ------→  |   (poll again)
  |← 200 { status: completed, result: {...} }

The key insight: submission and execution are separate concerns. The submission endpoint returns instantly. The work happens asynchronously. The client polls until ready.

This maps perfectly to how AI APIs actually behave:

  • Bounded latency: Each poll returns in milliseconds. No single request ever hangs for 15 seconds.
  • Natural retry behavior: If a poll fails, retry the poll — not the expensive work.
  • Progress visibility: You can return intermediate status (PENDING → PROCESSING → completed).
  • Fire-and-forget: Clients can submit and check back later, even from a mobile app with a spotty connection.

Implementing async jobs with Seek API

On Seek API, every worker call is async by default. Here’s the full flow:

1. Submit the job

curl -X POST https://api.seek-api.com/v1/workers/gpt-summarizer/jobs \
  -H "X-Api-Key: sk_prod_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/long-article",
    "maxLength": 500
  }'

Response (immediate, ~50ms):

{
  "job_uuid": "job_f3a2b1c4d5e6",
  "worker_id": "gpt-summarizer",
  "status": "PENDING",
  "submitted_at": "2026-03-05T10:24:39.483Z"
}
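The same submission can be wrapped in a small JavaScript helper. This is a sketch assuming Node 18+ (global fetch); the key falls back to a SEEK_API_KEY environment variable, which is a convention chosen here, not part of the API:

```javascript
// Submit a job to a worker and return the immediate PENDING response.
async function submitJob(workerId, payload, apiKey = process.env.SEEK_API_KEY) {
  const res = await fetch(
    `https://api.seek-api.com/v1/workers/${workerId}/jobs`,
    {
      method: 'POST',
      headers: { 'X-Api-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    }
  );
  if (!res.ok) throw new Error(`Submit failed: HTTP ${res.status}`);
  return res.json(); // { job_uuid, worker_id, status, submitted_at }
}
```

Later examples in this article call submitJob(workerId, input) in exactly this shape.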

2. Poll for completion

curl https://api.seek-api.com/v1/jobs/job_f3a2b1c4d5e6 \
  -H "X-Api-Key: sk_prod_xxxx"

After 2-3 seconds, you’ll get:

{
  "job_uuid": "job_f3a2b1c4d5e6",
  "status": "completed",
  "duration_ms": 2340,
  "cost_usd": 0.003,
  "response_json": {
    "title": "The Future of Infrastructure",
    "summary": "A deep analysis of how serverless computing is reshaping...",
    "keyPoints": ["Point 1", "Point 2", "Point 3"],
    "wordCount": 3420
  }
}

3. Handle the result

In your application, you’d poll with exponential backoff:

async function pollJob(jobUuid, apiKey, maxAttempts = 30) {
  for (let i = 0; i < maxAttempts; i++) {
    const res = await fetch(`https://api.seek-api.com/v1/jobs/${jobUuid}`, {
      headers: { 'X-Api-Key': apiKey },
    });
    const job = await res.json();

    if (job.status === 'completed') return job.response_json;
    if (job.status === 'failed')    throw new Error(job.error ?? 'Job failed');

    // Exponential backoff: 1s, 2s, 4s, max 8s
    const delay = Math.min(1000 * Math.pow(2, i), 8000);
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error('Job timed out after polling');
}

Multi-step AI pipelines

The async job model becomes even more powerful for multi-step pipelines, where the output of one job feeds the input of the next.

Here’s a real-world example: a lead enrichment pipeline that:

  1. Scrapes a company website
  2. Extracts key information
  3. Runs an AI analysis

// Assumes submitJob/pollJob helpers like the ones above, with the API key bound.
async function enrichLead(websiteUrl) {
  // Step 1: Scrape the website
  const scrapeJob = await submitJob('website-scraper', { url: websiteUrl });
  const scrapeResult = await pollJob(scrapeJob.job_uuid);

  // Step 2: Extract structured data
  const extractJob = await submitJob('data-extractor', {
    html: scrapeResult.html,
    schema: ['company_name', 'industry', 'tech_stack', 'team_size'],
  });
  const extractResult = await pollJob(extractJob.job_uuid);

  // Step 3: AI analysis
  const analysisJob = await submitJob('lead-scorer', {
    company: extractResult,
    criteria: ['b2b', 'funded', 'tech-enabled'],
  });
  const analysis = await pollJob(analysisJob.job_uuid);

  return { ...extractResult, analysis };
}

Each step runs on a dedicated worker with the right resources. Steps are isolated — a failure in step 2 doesn’t affect the step 1 result. You can retry individual steps without re-running the full pipeline.
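Retrying an individual step can be a thin wrapper around the submit-and-poll pair. A sketch (the helper name, attempt count, and pause are arbitrary choices):

```javascript
// Retry a step function up to `attempts` times, pausing between tries,
// so a transient failure in one step doesn't force re-running the pipeline.
async function withRetry(stepFn, attempts = 3, pauseMs = 2000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await stepFn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, pauseMs));
    }
  }
  throw lastError;
}

// Usage: retry step 2 of enrichLead without touching the step 1 result.
// const extractResult = await withRetry(() =>
//   submitJob('data-extractor', { html: scrapeResult.html, schema })
//     .then((job) => pollJob(job.job_uuid))
// );
```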


When to go parallel

For pipelines where steps are independent, run them in parallel:

async function analyzeMultiplePages(urls) {
  // Submit all jobs simultaneously
  const jobs = await Promise.all(
    urls.map((url) => submitJob('page-analyzer', { url }))
  );

  // Poll them all in parallel
  const results = await Promise.all(
    jobs.map((job) => pollJob(job.job_uuid))
  );

  return results;
}

You get near-linear throughput scaling: analyzing 10 URLs in parallel takes roughly the same wall-clock time as analyzing one serially.
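For large batches, unbounded Promise.all can trip LLM rate limits. A chunked variant keeps parallelism bounded; this sketch assumes the submitJob/pollJob helpers from earlier, and the batch size of 10 is an arbitrary choice:

```javascript
// Fan out at most `batchSize` jobs at a time, preserving result order.
async function analyzeInBatches(urls, batchSize = 10) {
  const results = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // Submit the whole chunk, then poll the whole chunk in parallel.
    const jobs = await Promise.all(
      batch.map((url) => submitJob('page-analyzer', { url }))
    );
    const batchResults = await Promise.all(
      jobs.map((job) => pollJob(job.job_uuid))
    );
    results.push(...batchResults);
  }
  return results;
}
```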


Handling failures gracefully

Async jobs fail. Networks fail. LLM APIs hit rate limits. Your pipeline needs to handle this cleanly.

Categories of failure

Status      Meaning                 Action
failed      Job threw an error      Check the error field; retry if transient.
timeout     Job exceeded timeout    Increase the timeout or optimize the worker.
cancelled   Manually cancelled      Resubmit if needed.
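Building on those categories, a small classifier can decide whether a retry is worthwhile, using the lowercase status strings the pollJob helper checks. The transience heuristic here is an assumption, not part of the Seek API:

```javascript
// Crude transience check: rate-limit and network errors are usually
// worth retrying; logic errors in the worker are not.
const isTransient = (message = '') =>
  /rate limit|429|timed? ?out|ECONNRESET|network/i.test(message);

// Map a terminal job to a retry decision and a human-readable reason.
function classifyFailure(job) {
  switch (job.status) {
    case 'failed':
      return { retry: isTransient(job.error), reason: job.error ?? 'unknown error' };
    case 'timeout':
      return { retry: false, reason: 'increase the job timeout or optimize the worker' };
    case 'cancelled':
      return { retry: true, reason: 'cancelled manually; resubmit if still needed' };
    default:
      return { retry: false, reason: `unexpected status: ${job.status}` };
  }
}
```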

Retry-safe workers

Design your workers to be idempotent — running them twice with the same input should produce the same output. This makes retries safe:

export const handler = async (input) => {
  // Check cache first (using input hash as key)
  const cacheKey = hashInput(input);
  const cached = await cache.get(cacheKey);
  if (cached) return cached;

  // Do work
  const result = await doExpensiveWork(input);

  // Cache result
  await cache.set(cacheKey, result, { ttl: 3600 });

  return result;
};

Webhooks vs polling

Polling is simple and universally compatible. But for long-running jobs (5-60 minutes), you may prefer webhooks to avoid polling loops.

Seek API (coming Q2 2026) will support webhook callbacks on job completion:

{
  "url": "https://example.com/webhook",
  "maxResults": 500,
  "__callback_url": "https://your-server.com/hooks/job-done"
}

For now, polling with exponential backoff is the recommended approach and works well for jobs up to a few minutes.


Summary

Synchronous HTTP and AI pipelines are fundamentally mismatched. The async job pattern — submit, get a UUID, poll — addresses each of these problems:

  • No timeouts
  • Natural retry behavior
  • Fire-and-forget capability
  • Near-linear parallelism

The overhead is minimal (a few extra HTTP calls) and the resilience gains are massive. Whether you’re building a one-off enrichment script or a production AI pipeline processing thousands of documents per day, async jobs are the right default.