✦ Currently in private beta

Stop debugging background job failures in the dark.

SpanHQ gives you full execution traces for every BullMQ job — every retry, every child job, every failure — with one npm install.

Follow @HqSpan

The Problem

When HTTP requests fail, you have traces.
When background jobs fail, you have nothing.

Without SpanHQ

Job ID: a3f9c2 — FAILED (attempt 3/3)
Error: Cannot read properties of undefined
// What triggered this job? Unknown.
// Which downstream jobs never ran? Unknown.
// How long has this been happening? Unknown.
// Time to debug: 2+ hours

No context about parent jobs
Can't see downstream impact
Retry history lost on completion
Silent failures for hours

With SpanHQ

Trace: user_signup [6 jobs · 1 failure]
├─ ✅ SendWelcomeEmail 120ms
├─ ✅ CreateStripeCustomer 890ms (retry 2)
│  └─ ✅ SyncPlan 45ms
├─ ✅ NotifySlack 67ms
└─ ❌ GenerateReport DEAD
   └─ ⚠️ EmailReport SKIPPED

See full job execution chain instantly
Every retry logged with exact error
Know exactly what was skipped downstream
Find root cause in seconds

Setup

One npm install. Two lines of code.

Works with your existing BullMQ code. No migration. No YAML. No infra changes.

Install the package

npm install @spanhq/bullmq

Wrap your queue and worker

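import { createTracer } from '@spanhq/bullmq'

// Create one tracer, authenticated with your SpanHQ API key
const tracer = createTracer({
  apiKey: process.env.SPANHQ_API_KEY
})

// yourQueue and yourWorker are your existing BullMQ Queue and Worker instances
const queue  = tracer.instrumentQueue(yourQueue)
const worker = tracer.instrumentWorker(yourWorker)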

See your full job execution history

That's it. No config files. No YAML. No changes to your existing job handlers.

Works in under 5 minutes

Features

Everything you need to understand your jobs

Full Execution Trace Tree

See every parent-child job relationship visually. Know instantly which job triggered which, even across async boundaries.
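
As a rough illustration of what building a trace tree involves (not SpanHQ's actual implementation), the sketch below groups flat job records by a hypothetical `parentId` field and renders them as an indented tree. The `JobRecord` shape and the `renderTrace` helper are assumptions made for this example only.

```typescript
// Hypothetical record shape for illustration; not SpanHQ's data model.
interface JobRecord {
  id: string;
  name: string;
  status: "completed" | "dead" | "skipped";
  parentId?: string; // undefined for top-level jobs in the trace
}

// Render a flat list of job records as an indented tree, one line per job.
// Jobs whose parent is missing from the list are not rendered.
function renderTrace(jobs: JobRecord[]): string[] {
  // Index children by parent id so each lookup is a single Map access.
  const childrenOf = new Map<string | undefined, JobRecord[]>();
  for (const job of jobs) {
    const siblings = childrenOf.get(job.parentId) ?? [];
    siblings.push(job);
    childrenOf.set(job.parentId, siblings);
  }

  const lines: string[] = [];
  const visit = (parentId: string | undefined, prefix: string): void => {
    const children = childrenOf.get(parentId) ?? [];
    children.forEach((job, index) => {
      const isLast = index === children.length - 1;
      lines.push(`${prefix}${isLast ? "└─" : "├─"} ${job.name} [${job.status}]`);
      // Children of the last sibling get blank padding; others keep the rail.
      visit(job.id, prefix + (isLast ? "   " : "│  "));
    });
  };
  visit(undefined, "");
  return lines;
}
```

Calling `renderTrace(records).join("\n")` on the records of one trace gives a text view similar to the "With SpanHQ" example earlier on this page.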

Complete Retry History

Every attempt is logged with exact timing and error, so you can see which attempt failed and why.
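
To make the idea of a per-attempt history concrete, here is a minimal sketch that turns a list of attempt records into readable lines. The `AttemptRecord` shape and `summarizeAttempts` helper are hypothetical names for this example and do not describe how SpanHQ stores retries.

```typescript
// Hypothetical per-attempt record; field names are assumptions.
interface AttemptRecord {
  attempt: number;    // 1-based attempt number
  startedAt: number;  // epoch milliseconds
  finishedAt: number; // epoch milliseconds
  error?: string;     // present only when the attempt failed
}

// Produce one summary line per attempt: number, duration, and outcome.
function summarizeAttempts(history: AttemptRecord[]): string[] {
  return history.map((a) => {
    const duration = a.finishedAt - a.startedAt;
    const outcome = a.error ? `failed: ${a.error}` : "ok";
    return `attempt ${a.attempt} (${duration}ms) ${outcome}`;
  });
}
```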

NEW

AI Failure Diagnosis

When a job dies, AI analyzes the stack trace and tells you the root cause and exact fix. No more guessing.

Instant Dead Job Alerts

Get Slack and email alerts the moment a job exhausts all retries. Not when a customer complains — immediately.
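
A job is commonly treated as dead once its failed attempts reach the configured limit. The sketch below shows that check in isolation. The `FailedJobInfo` type mirrors the `attemptsMade` and `opts.attempts` fields on a BullMQ job, while the `isDeadJob` name and the fallback of one attempt are assumptions for this example, not SpanHQ's actual logic.

```typescript
// Minimal structural type covering only the fields this check reads.
interface FailedJobInfo {
  attemptsMade: number;
  opts: { attempts?: number };
}

// Returns true when the job has used up every configured attempt.
function isDeadJob(job: FailedJobInfo): boolean {
  // Assumption: a job with no explicit `attempts` option gets a single try.
  const maxAttempts = job.opts.attempts ?? 1;
  return job.attemptsMade >= maxAttempts;
}
```

A check like this could run inside a worker's `failed` listener to decide whether a failure warrants an alert or is just an intermediate retry.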

Works With Existing Code

No changes to your job handlers. Wrap your existing Worker and Queue. Every job automatically traced.

Built for BullMQ

Not a generic tool repurposed for queues. Built from the ground up for BullMQ's event system, retry mechanics, and job lifecycle.

The Market Problem

Real developers. Real frustrations. We didn't invent this problem — Reddit did.

These are real comments from r/node. This is the pain SpanHQ was built to solve.

r/node · Posted by u/IllMaintenance8243 · 7.8K views · 21 comments

How do you debug BullMQ job failures in production?

I've been struggling with background jobs failing silently and spent hours digging through logs last week to find a simple retry issue. Curious how others handle this — do you have any tools or techniques that actually work?

⬆ 9 💬 21 comments

⬆7⬇

u/Bharath720

The biggest thing that helped me was making failures impossible to ignore. With BullMQ I usually add a global failed handler and push errors to logs or alerts immediately. Also make sure retries and backoff are visible, because silent retries can hide the real issue for a long time.

u/IllMaintenance8243 OP

This is exactly the pain I'm solving. I'm building a tool that makes the full job tree visible — every retry, every child job, exactly what failed and when.

⬆3⬇

u/ccb621

Telemetry. Every consumer emits a canonical log and trace. OpenTelemetry will help pinpoint errors.

u/IllMaintenance8243 OP

OpenTelemetry is powerful but the setup for BullMQ specifically is brutal. I'm building something that works out of the box for BullMQ with zero manual instrumentation.

⬆1⬇

u/Particular_Budget946

BullMQ's failed event combined with a dead letter queue pattern saved me a lot of pain. I log the job name, id, data, and error to a table on every failure. Also worth setting up Bull Board so you can see stuck and failed jobs at a glance.

u/IllMaintenance8243 OP

That means we have to set up the dashboard manually every time for different projects, right? That's what SpanHQ eliminates.

⬆1⬇

u/Obvious-Treat-4905

Silent failures in background jobs are the worst. Having proper retries + alerting makes a huge difference. Also adding structured logging around job start/fail helps a lot when debugging. Without visibility, you're basically guessing every time.

The Product

This is what SpanHQ looks like

Not mockups. Not wireframes. Real product screenshots.

Trace List View

See every trace at a glance — name, status, duration, and job count. Filter by completed, failed, or dead.

Analytics Dashboard

Real-time stats: total traces, success rate, avg duration, and dead job count with throughput and failure charts.

Trace DAG Visualizer

See the full execution tree — every parent, child, retry, and failure across your entire job chain. Click any node to see logs, metadata, and stack traces.

Be first to know when we launch.

Join the waitlist. No spam. One email when we're ready.
