Observability · DevOps · Next.js

Prometheus Metrics in Next.js Without the Overhead

5 min read

Integrating prom-client with Next.js for HTTP request metrics, latency histograms, and production observability that doesn't slow down your app.

Beyond console.log

When something breaks in production, console.log won't save you. You need metrics: request rates, error rates, latency distributions. You need to know what's happening now and how it compares to yesterday.

Prometheus has become the de facto standard for application metrics. Integrating it with Next.js means understanding each tool's constraints and finding patterns that satisfy both.

What to Measure

The foundation is the RED method: Rate, Errors, Duration. For every service, track how many requests it receives, how many fail, and how long they take.

Rate tells you about load. Errors tell you about reliability. Duration tells you about user experience. Together, they paint a complete picture of service health.

Beyond RED, application-specific metrics capture business logic: active users, emails sent, payments processed, cache hit rates. These vary by application but follow the same patterns.
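To make this concrete, here is a minimal sketch of RED instrumentation with prom-client, e.g. in a shared lib/metrics module. The metric and label names are illustrative choices for this example, not anything prom-client prescribes.

```ts
// lib/metrics.ts (hypothetical path): RED metrics with prom-client.
import { Counter, Histogram } from 'prom-client';

// Rate and Errors: one counter; failures show up via the status_code label.
export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests handled',
  labelNames: ['method', 'route', 'status_code'],
});

// Duration: latency distribution in seconds (bucket tuning comes later).
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency in seconds',
  labelNames: ['method', 'route'],
});
```

Application-specific metrics follow the same shape: a Counter for things that only go up, a Gauge for things that go up and down, a Histogram for distributions.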

The Singleton Challenge

Next.js development mode hot-reloads code. Each reload re-runs module initialization and tries to register the same metrics again, and prom-client rejects duplicate metric names. The solution is the singleton pattern: store the registry globally so it survives reloads.

This is a general pattern for stateful modules in Next.js. Anything that accumulates state across requests—database connections, metric registries, caches—needs singleton treatment.
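A minimal sketch of that pattern, again assuming a hypothetical lib/metrics module: the registry is cached on globalThis so a dev-server reload reuses it instead of creating a new one.

```ts
// lib/metrics.ts (hypothetical path): cache the registry on globalThis so
// `next dev` hot reloads reuse it instead of building a fresh one each time.
import { Registry, collectDefaultMetrics } from 'prom-client';

const globalForMetrics = globalThis as unknown as { metricsRegistry?: Registry };

function createRegistry(): Registry {
  const registry = new Registry();
  collectDefaultMetrics({ register: registry }); // CPU, memory, event loop lag
  return registry;
}

export const registry = (globalForMetrics.metricsRegistry ??= createRegistry());
```

The same guard applies to the metric objects themselves: create them inside the cached initializer, or check registry.getSingleMetric(name) before constructing, so a reload never registers the same name twice.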

Middleware Limitations

Next.js middleware runs on the Edge Runtime, which lacks Node.js APIs. The Prometheus client library requires Node.js. Metrics collection can't happen directly in middleware.

The workaround is passing timing information through headers or context, then recording metrics in API routes that run on Node.js. This adds complexity but works reliably.
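A sketch of that hand-off, assuming an arbitrary header name (x-request-start) and the histogram from the earlier lib/metrics sketch:

```ts
// middleware.ts: Edge Runtime, so no prom-client here; it only stamps a
// start time onto the forwarded request headers.
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  const headers = new Headers(request.headers);
  headers.set('x-request-start', Date.now().toString());
  return NextResponse.next({ request: { headers } });
}

// app/api/example/route.ts: Node.js runtime, so metrics can be recorded.
import { httpRequestDuration } from '@/lib/metrics';

export async function GET(request: Request) {
  const started = Number(request.headers.get('x-request-start') ?? Date.now());
  const response = Response.json({ ok: true });
  httpRequestDuration.observe(
    { method: 'GET', route: '/api/example' },
    (Date.now() - started) / 1000,
  );
  return response;
}
```

In practice you would wrap this in a small helper rather than repeat the timing code in every route handler.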

Label Cardinality

Prometheus stores time series for each unique combination of metric name and labels. High cardinality—many unique label values—creates massive storage and query overhead.

The classic mistake is using user IDs as labels. With a million users, you create a million time series per metric. Prometheus breaks down. Use aggregated labels instead: user type, subscription tier, geographic region.

This constraint shapes metric design. You can't answer "how many requests did user X make?" with Prometheus. You can answer "how many requests did premium users make?" The difference matters.
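For example, a counter keyed by subscription tier stays at a handful of time series no matter how many users you have. The names below are illustrative:

```ts
// Bounded cardinality: 'plan' has only a handful of possible values.
import { Counter } from 'prom-client';

const apiRequestsByPlan = new Counter({
  name: 'api_requests_by_plan_total',
  help: 'API requests grouped by subscription plan',
  labelNames: ['plan'], // 'free' | 'pro' | 'enterprise'
});

apiRequestsByPlan.inc({ plan: 'pro' });

// Don't do this: a user_id label creates one time series per user.
// labelNames: ['user_id']
```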

Histogram Buckets

Latency histograms need bucket boundaries that match your actual distribution. Default buckets rarely fit.

If your P99 latency is 200ms but your buckets stop at 100ms, you lose resolution in the tail. If your service is fast and P50 is 5ms, coarse buckets obscure meaningful variation.

Measure first, then configure buckets. A histogram with wrong buckets is worse than no histogram—it gives false confidence.
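Revisiting the earlier histogram sketch with explicit buckets: the boundaries below are placeholders and should come from measured latencies, not guesses.

```ts
import { Histogram } from 'prom-client';

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency in seconds',
  labelNames: ['method', 'route'],
  // Fine resolution where most requests land, coarser in the tail.
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});
```

prom-client also ships linearBuckets and exponentialBuckets helpers if you would rather generate boundaries than list them by hand.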

Securing Metrics

Metrics expose internal details: endpoint names, error rates, infrastructure topology. Don't make them public.

Require authentication for the metrics endpoint. Or restrict access to internal networks. Or both. The scraper that collects metrics can authenticate; random internet users shouldn't see your error rates.
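One way to do this is an App Router route handler that checks a bearer token before returning the registry's output; the environment variable name and the lib/metrics import are assumptions of this sketch.

```ts
// app/api/metrics/route.ts: authenticated metrics endpoint (sketch).
import { registry } from '@/lib/metrics';

export const runtime = 'nodejs'; // prom-client needs the Node.js runtime

export async function GET(request: Request) {
  const auth = request.headers.get('authorization');
  if (auth !== `Bearer ${process.env.METRICS_BEARER_TOKEN}`) {
    return new Response('Unauthorized', { status: 401 });
  }
  return new Response(await registry.metrics(), {
    headers: { 'Content-Type': registry.contentType },
  });
}
```

The Prometheus scrape config can supply the same token through its authorization settings, so the scraper gets in and nobody else does.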

Dashboard Design

Raw metrics aren't useful. Dashboards transform them into insight.

The request rate graph shows traffic patterns. The error rate graph shows reliability. The latency heatmap shows distribution over time. Together, they tell a story.

Effective dashboards answer questions without requiring query construction. "Is the service healthy?" should be answerable at a glance. "Why is it slow?" should require drilling down, but the path should be obvious.

Alerting Philosophy

Alerts should be actionable. An alert that fires but requires no response trains operators to ignore alerts.

Alert on symptoms, not causes. "Error rate above 5%" is actionable—investigate and fix. "CPU above 80%" might not be—high CPU during expected load is fine. Symptom-based alerts avoid false positives from benign causes.

Alert thresholds need tuning. Too sensitive creates alert fatigue. Too lax misses real problems. Start conservative and adjust based on experience.

The Observability Shift

Metrics transform debugging from "the app feels slow" to "P99 latency increased 40% after yesterday's deploy, caused by a new N+1 query on the /api/products endpoint."

The difference is precision. Feelings are vague; numbers are specific. Specific problems have specific solutions. Vague problems lead to guessing.

This shift requires investment—instrumenting code, configuring collection, building dashboards, tuning alerts. The investment pays off when something breaks at 2 AM and you can identify the cause in minutes instead of hours.

The Broader Principle

Observability isn't about tools. It's about understanding your system under production conditions. Metrics are one pillar; logs and traces are others.

The goal is answering questions about system behavior without deploying new code or adding debugging. When something unexpected happens, the data to understand it should already exist.

Production systems are too complex to understand through code reading alone. Observability bridges the gap between what the code says and what the system does.
