✦ Currently in private beta

Stop debugging background job failures in the dark.

SpanHQ gives you full execution traces for every BullMQ job — every retry, every child job, every failure — with one npm install.


When HTTP requests fail, you have traces.
When background jobs fail, you have nothing.

Without SpanHQ
Job ID: a3f9c2 — FAILED (attempt 3/3)
Error: Cannot read properties of undefined
// What triggered this job? Unknown.
// Which downstream jobs never ran? Unknown.
// How long has this been happening? Unknown.
// Time to debug: 2+ hours
  • No context about parent jobs
  • Can't see downstream impact
  • Retry history lost on completion
  • Silent failures for hours
With SpanHQ
Trace: user_signup [6 jobs · 1 failure]
├─ ✅ SendWelcomeEmail 120ms
├─ ✅ CreateStripeCustomer 890ms (retry 2)
│  └─ ✅ SyncPlan 45ms
├─ ✅ NotifySlack 67ms
└─ ❌ GenerateReport DEAD
   └─ ⚠️ EmailReport SKIPPED
  • See full job execution chain instantly
  • Every retry logged with exact error
  • Know exactly what was skipped downstream
  • Find root cause in seconds

One npm install. Two lines of code.

Works with your existing BullMQ code. No migration. No YAML. No infra changes.

1. Install the package
npm install @spanhq/bullmq
2. Wrap your queue and worker
import { createTracer } from '@spanhq/bullmq'

const tracer = createTracer({ apiKey: process.env.SPANHQ_API_KEY })
const queue = tracer.instrumentQueue(yourQueue)
const worker = tracer.instrumentWorker(yourWorker)
3. See your full job execution history

That's it. No config files. No YAML. No changes to your existing job handlers.
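End to end, a minimal setup might look like the sketch below. The createTracer, instrumentQueue, and instrumentWorker calls are the ones from step 2; the queue name, handler, and Redis connection are illustrative placeholders.

import { Queue, Worker } from 'bullmq'
import { createTracer } from '@spanhq/bullmq'

// Placeholder Redis connection for illustration.
const connection = { host: 'localhost', port: 6379 }

const tracer = createTracer({ apiKey: process.env.SPANHQ_API_KEY })

// Wrap your existing queue and worker; handlers stay untouched.
const queue = tracer.instrumentQueue(new Queue('user_signup', { connection }))
const worker = tracer.instrumentWorker(
  new Worker('user_signup', async (job) => {
    // your existing handler logic runs unchanged here
  }, { connection })
)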

Works in under 5 minutes

Everything you need to understand your jobs

Full Execution Trace Tree

See every parent-child job relationship visually. Know instantly which job triggered which, even across async boundaries.
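BullMQ expresses those parent-child relationships through its FlowProducer. For reference, a chain like the user_signup example above might be enqueued like this (queue names, job names, and the connection are illustrative):

import { FlowProducer } from 'bullmq'

const flow = new FlowProducer({ connection: { host: 'localhost', port: 6379 } })

// One parent with two children; the whole chain shows up as a single trace tree.
await flow.add({
  name: 'user_signup',
  queueName: 'signups',
  children: [
    { name: 'SendWelcomeEmail', queueName: 'emails', data: { userId: 1 } },
    { name: 'CreateStripeCustomer', queueName: 'billing', data: { userId: 1 } },
  ],
})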

Complete Retry History

Every attempt logged with exact timing and error. Never again wonder which retry introduced the bug.
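Retries themselves stay configured on the BullMQ side; SpanHQ just records every attempt. For reference, a standard BullMQ retry setup, using the queue from the earlier sketch (job name and data are illustrative):

// Three attempts total; retries back off exponentially at 1s, then 2s.
await queue.add('CreateStripeCustomer', { userId: 1 }, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 1000 },
})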

NEW

AI Failure Diagnosis

When a job dies, AI analyzes the stack trace and tells you the root cause and exact fix. No more guessing.

Instant Dead Job Alerts

Get Slack and email alerts the moment a job exhausts all retries. Not when a customer complains — immediately.

Works With Existing Code

No changes to your job handlers. Wrap your existing Worker and Queue. Every job automatically traced.

Built for BullMQ

Not a generic tool repurposed for queues. Built from the ground up for BullMQ's event system, retry mechanics, and job lifecycle.

Real developers. Real frustrations. We didn't invent this problem — Reddit did.

These are real comments from r/node. This is the pain SpanHQ was built to solve.

r/node · Posted by u/IllMaintenance8243 · 7.8K views · 21 comments

How do you debug BullMQ job failures in production?

I've been struggling with background jobs failing silently and spent hours digging through logs last week to find a simple retry issue. Curious how others handle this โ€” do you have any tools or techniques that actually work?

u/Bharath720

The biggest thing that helped me was making failures impossible to ignore. With BullMQ I usually add a global failed handler and push errors to logs or alerts immediately. Also make sure retries and backoff are visible, because silent retries can hide the real issue for a long time.

u/IllMaintenance8243 OP

This is exactly the pain I'm solving. I'm building a tool that makes the full job tree visible — every retry, every child job, exactly what failed and when.
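For readers who want the manual route in the meantime, the "global failed handler" pattern described above usually looks something like this sketch (the connection and the alert helper are placeholders):

import { QueueEvents } from 'bullmq'

const events = new QueueEvents('user_signup', { connection: { host: 'localhost', port: 6379 } })

// Make failures impossible to ignore: log and alert on every one.
events.on('failed', ({ jobId, failedReason }) => {
  console.error(`Job ${jobId} failed: ${failedReason}`)
  notifyOnCall(jobId, failedReason) // hypothetical alerting helper
})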

u/ccb621

Telemetry. Every consumer emits a canonical log and trace. OpenTelemetry will help pinpoint errors.

u/IllMaintenance8243 OP

OpenTelemetry is powerful but the setup for BullMQ specifically is brutal. I'm building something that works out of the box for BullMQ with zero manual instrumentation.
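For a sense of what that manual instrumentation involves, wrapping a single job handler with the OpenTelemetry API looks roughly like this sketch (doWork stands in for the real handler):

import { trace, SpanStatusCode } from '@opentelemetry/api'

const otelTracer = trace.getTracer('bullmq-jobs')

// Wrap each handler in a span so every failure carries context.
async function handle(job) {
  return otelTracer.startActiveSpan(`job ${job.name}`, async (span) => {
    try {
      return await doWork(job) // placeholder for the real handler logic
    } catch (err) {
      span.recordException(err)
      span.setStatus({ code: SpanStatusCode.ERROR })
      throw err
    } finally {
      span.end()
    }
  })
}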

u/Particular_Budget946

BullMQ's failed event combined with a dead letter queue pattern saved me a lot of pain. I log the job name, id, data, and error to a table on every failure. Also worth setting up Bull Board so you can see stuck and failed jobs at a glance.

u/IllMaintenance8243 OP

That means we have to set up the dashboard manually every time for different projects, right? That's what SpanHQ eliminates.
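The dead-letter pattern from this comment can be sketched as follows, assuming the worker and connection from the quick-start above (db.insert and the queue names are placeholders):

import { Queue } from 'bullmq'

const deadLetters = new Queue('dead-letters', { connection })

worker.on('failed', async (job, err) => {
  if (!job) return
  // Log name, id, data, and error on every failure, as the comment suggests.
  await db.insert('job_failures', { name: job.name, id: job.id, data: job.data, error: err.message })
  // Out of retries: park the job in the dead-letter queue.
  if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
    await deadLetters.add(job.name, job.data)
  }
})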

u/Obvious-Treat-4905

Silent failures in background jobs are the worst. Having proper retries + alerting makes a huge difference. Also adding structured logging around job start/fail helps a lot when debugging. Without visibility, you're basically guessing every time.

This is what SpanHQ looks like

Not mockups. Not wireframes. Real product screenshots.

[Screenshot: SpanHQ traces list view]

Trace List View

See every trace at a glance — name, status, duration, and job count. Filter by completed, failed, or dead.

[Screenshot: SpanHQ analytics dashboard]

Analytics Dashboard

Real-time stats: total traces, success rate, avg duration, and dead job count with throughput and failure charts.

[Screenshot: SpanHQ DAG trace visualization]

Trace DAG Visualizer

See the full execution tree — every parent, child, retry, and failure across your entire job chain. Click any node to see logs, metadata, and stack traces.

Be first to know when we launch.

Join the waitlist. No spam. One email when we're ready.