
How to Process 10,000 URLs in Parallel Without Running a Server

Run batch jobs against massive URL lists using async workers — no queue infrastructure, no worker processes, no ops overhead. Just API calls.

Seek API Team

You have a spreadsheet with 10,000 URLs. You need to:

  • Visit each one
  • Extract specific data
  • Store the result

If you’ve done this before, you know the usual path: spin up a queue (Redis + Bull, or SQS), deploy a worker process, handle retries and failures, monitor progress, scale up compute… and then tear it all down when you’re done.

There’s a much simpler model.

The problem with DIY batch processing

Large batch jobs require:

  1. Queue infrastructure to distribute work
  2. Worker processes to consume from the queue
  3. Concurrency management (don’t hammer the source)
  4. Retry logic for transient failures
  5. Dead-letter handling for permanent failures
  6. Monitoring to know when you’re done
  7. Compute capacity that can handle the peak

For a one-off analysis, this is massively over-engineered. For a recurring job, it still requires ongoing ops. And none of this is your core product.

Async workers: a simpler model

Seek API runs workers in a distributed job execution platform. When you submit a job:

  • It enters a queue managed by the platform
  • It executes with appropriate concurrency
  • It retries on transient failures
  • The result is stored and available when complete

You submit thousands of jobs simultaneously and check for results. No infrastructure.

Submitting 10,000 jobs

import fetch from 'node-fetch';
import { readFileSync } from 'fs';

const urls = readFileSync('urls.txt', 'utf8').trim().split('\n');
const API_KEY = process.env.SEEKAPI_KEY;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function submitJob(url) {
  const res = await fetch('https://api.seek-api.com/v1/workers/webpage-extractor/jobs', {
    method: 'POST',
    headers: { 'X-Api-Key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url })
  });
  if (!res.ok) throw new Error(`Submit failed (${res.status}) for ${url}`);
  const data = await res.json();
  return data.job_uuid;
}

// Rate limit submission to 50 req/s (platform limit)
async function submitBatch(urls, ratePerSecond = 50) {
  const jobIds = [];
  for (let i = 0; i < urls.length; i++) {
    const uuid = await submitJob(urls[i]);
    jobIds.push({ url: urls[i], uuid });
    if ((i + 1) % ratePerSecond === 0) await sleep(1000);
    if ((i + 1) % 1000 === 0) console.log(`Submitted ${i + 1} / ${urls.length}`);
  }
  return jobIds;
}

const jobs = await submitBatch(urls);
console.log(`All ${jobs.length} jobs submitted`);

At 50 submissions per second, 10,000 jobs take ~200 seconds (about 3.3 minutes) to submit.
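The loop above also serializes the HTTP round-trips themselves, so each second's 50 submissions wait on one another. If the rate limit applies to request starts rather than to strictly sequential calls (an assumption worth confirming against the docs), a sketch that fires each second's quota in parallel looks like this — `submitJob` is passed in as a parameter to keep the sketch self-contained:

```javascript
// Submit in parallel chunks of `ratePerSecond`, pausing 1s between chunks.
// `submitJob(url)` is assumed to resolve to a job UUID, as in the script above.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function submitBatchParallel(urls, submitJob, ratePerSecond = 50) {
  const jobIds = [];
  for (let i = 0; i < urls.length; i += ratePerSecond) {
    const chunk = urls.slice(i, i + ratePerSecond);
    const uuids = await Promise.all(chunk.map(submitJob));
    chunk.forEach((url, j) => jobIds.push({ url, uuid: uuids[j] }));
    if (i + ratePerSecond < urls.length) await sleep(1000);
  }
  return jobIds;
}
```

In the script above you would call `submitBatchParallel(urls, submitJob)`; the sequential version remains the conservative choice if you'd rather stay safely under the limit.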

Polling for results

Don’t poll each job individually in sequence — that defeats the purpose. Instead, poll in batches and collect results as they complete:

async function collectResults(jobs) {
  const pending = new Map(jobs.map(j => [j.uuid, j.url]));
  const results = [];

  while (pending.size > 0) {
    // Check every pending job each cycle, 100 at a time,
    // so jobs beyond the first 100 aren't starved
    const uuids = [...pending.keys()];
    for (let i = 0; i < uuids.length; i += 100) {
      const batch = uuids.slice(i, i + 100);

      await Promise.all(
        batch.map(async (uuid) => {
          const res = await fetch(`https://api.seek-api.com/v1/jobs/${uuid}`, {
            headers: { 'X-Api-Key': API_KEY }
          }).then(r => r.json());

          if (res.status === 'completed') {
            results.push({ url: pending.get(uuid), data: res.result });
            pending.delete(uuid);
          } else if (res.status === 'failed') {
            results.push({ url: pending.get(uuid), error: res.error });
            pending.delete(uuid);
          }
        })
      );
    }

    console.log(`${pending.size} jobs remaining...`);
    if (pending.size > 0) await sleep(5000);
  }

  return results;
}

const results = await collectResults(jobs);
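Failed jobs land in the same array with an `error` field instead of `data`. A small helper (illustrative, not part of the API) splits them out so you can log the failures or feed their URLs back into a second submission pass:

```javascript
// Partition collected results into successes and failures.
// A result is a failure if it carries an `error` field.
function partitionResults(results) {
  const ok = results.filter((r) => !r.error);
  const failed = results.filter((r) => r.error);
  return { ok, failed };
}
```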

Using webhooks instead of polling

For 10K jobs, polling is manageable. For 100K+, use webhooks to avoid unnecessary API calls:

// Include webhook URL in each job submission
body: JSON.stringify({ 
  url,
  webhook: 'https://your-server.com/hooks/job-complete'
})

Your webhook endpoint receives the result as soon as each job finishes. No polling. No missed completions. Process each result as it arrives.

import express from 'express';
const app = express();
app.use(express.json()); // required so req.body is parsed JSON

app.post('/hooks/job-complete', (req, res) => {
  const { job_uuid, status, result, original_input } = req.body;
  if (status === 'completed') processResult(original_input.url, result);
  res.sendStatus(200);
});
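A webhook endpoint is a public URL, so it's worth verifying that requests actually come from the platform. If Seek API signs webhook payloads with a shared secret (the HMAC-SHA256 scheme below is an assumption — check the webhook docs for the actual header name and mechanism), verification might look like:

```javascript
import { createHmac, timingSafeEqual } from 'crypto';

// Verify an HMAC-SHA256 signature over the raw request body.
// The signing scheme here is an assumption, not a documented API.
function verifySignature(rawBody, signatureHex, secret) {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex || '', 'hex');
  // Constant-time compare to avoid leaking the signature via timing
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Reject the request with a 401 when verification fails, before doing any work with the payload.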

Throughput expectations

Job count    Avg job time    Total wall time (parallel)
100          5s              ~10s
1,000        5s              ~30s
10,000       5s              ~2–3 min
100,000      5s              ~15–20 min

The platform handles parallelism. You just submit and wait.

Saving results

Stream results to a file or database as they come in:

import { createWriteStream } from 'fs';
const output = createWriteStream('results.jsonl');

// Inside collectResults, write each result as it completes:
if (res.status === 'completed') {
  output.write(JSON.stringify({ url: pending.get(uuid), data: res.result }) + '\n');
  pending.delete(uuid);
}

JSONL (one JSON object per line) is ideal for large result sets — it’s streamable, appendable, and every major data tool reads it.

What this replaces

With this approach, you eliminate:

  • A Redis instance
  • A queue worker service
  • An autoscaling policy
  • A dead-letter queue handler
  • A monitoring dashboard for queue depth
  • A retry configuration

For a batch analysis that runs once a month, maintaining that infrastructure doesn’t make sense. For one that runs every day, the ops overhead compounds. Workers are the simpler, cheaper path in both cases.

Complete script

A complete, ready-to-run script for bulk processing is in the Seek API GitHub examples repository. It includes chunked submission, webhook support, and progress tracking.