Monitoring Web Servers


A deep dive into web server metrics, how they're structured, visualized, and where things often go wrong.

Once, I debugged a production issue that only happened on Tuesdays. Everything looked fine at first glance: CPU, memory, and request count were all normal. But users complained about slow responses. The culprit turned out to be an unnoticed spike in “active connections”: the server was hitting its connection limit, and requests were queuing up behind it. A missing alert on a non-standard metric led to hours of frustration.

The Standard Metrics

Web servers like Nginx, Apache, and Caddy expose a set of built-in metrics that cover most operational concerns:

- Request rate: requests handled per second
- Error rate: responses in the 4xx and 5xx ranges
- Response time (latency)
- Active connections and their states (reading, writing, waiting)
- Bytes sent and received

These are a great start, but they don’t always tell the full story.
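As a concrete example, Nginx’s stub_status endpoint serves these numbers as a small plaintext page. Here’s a sketch of pulling them into a dictionary; the field names follow stub_status’s documented layout:

```python
import re

def parse_stub_status(text: str) -> dict:
    """Parse the plaintext page served by Nginx's stub_status endpoint."""
    metrics = {"active_connections": int(
        re.search(r"Active connections:\s+(\d+)", text).group(1))}
    accepts, handled, requests = re.search(
        r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)", text).groups()
    reading, writing, waiting = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text).groups()
    metrics.update(accepts=int(accepts), handled=int(handled), requests=int(requests),
                   reading=int(reading), writing=int(writing), waiting=int(waiting))
    return metrics

# Example stub_status output, copied from the format Nginx documents
sample = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""
print(parse_stub_status(sample)["active_connections"])  # 291
```

The same idea applies to Apache’s mod_status `?auto` output; only the parsing changes.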

Beyond the Defaults: Custom Metrics

Sometimes, what you really need isn’t included out of the box. Here are a few examples of non-standard metrics that have saved me before:

- Active connections measured against the configured limit (saturation, not just a raw count)
- Request queue depth: how many requests are waiting for a free worker
- Upstream/backend response time, separate from total response time
- TLS handshake failures
- Open file descriptors versus the process limit

Projects that expose these:

- nginx-module-vts, which adds per-vhost traffic status to Nginx
- Apache’s mod_status with ExtendedStatus enabled
- Prometheus exporters such as nginx-prometheus-exporter and apache_exporter
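A metric like active connections is most useful as a saturation ratio against the configured limit rather than as a raw count. A minimal sketch of that idea; the limit and the 0.8 threshold here are illustrative assumptions, not values any server ships with:

```python
def connection_saturation(active: int, limit: int) -> float:
    """Fraction of the server's connection capacity currently in use."""
    return active / limit

def should_alert(active: int, limit: int, threshold: float = 0.8) -> bool:
    """Fire once saturation crosses the (illustrative) 0.8 threshold."""
    return connection_saturation(active, limit) >= threshold

# 900 active connections against a limit of 1024: already in trouble,
# even though the raw count alone looks unremarkable
print(should_alert(900, 1024))  # True
```

Had an alert like this existed, my Tuesday bug would have paged someone long before users noticed.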

Understanding Metric Data Structures

Metrics typically follow a structured format. Here’s how Prometheus structures time-series data:

http_requests_total{method="GET",status="200",host="example.com"} 1234

This tells us:

- The metric name is http_requests_total, a counter of HTTP requests served
- The labels (method, status, host) break that counter down by dimension
- The value, 1234, is the cumulative count for that particular label combination

Other formats include JSON (used by the ELK stack) and plaintext stats like Varnish’s varnishstat output.
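A minimal sketch of pulling the pieces out of a line in Prometheus’s exposition format. This is deliberately simplified: it ignores escape sequences and would break on commas inside label values:

```python
import re

# Simplified pattern: metric name, optional {label="value",...}, sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)

def parse_sample(line: str):
    """Split one exposition-format line into (name, labels, value)."""
    m = SAMPLE_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable sample line: {line!r}")
    labels = {}
    if m.group("labels"):
        for pair in m.group("labels").split(","):
            key, val = pair.split("=", 1)
            labels[key] = val.strip('"')
    return m.group("name"), labels, float(m.group("value"))

name, labels, value = parse_sample(
    'http_requests_total{method="GET",status="200",host="example.com"} 1234'
)
print(name, labels["status"], value)  # http_requests_total 200 1234.0
```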

Here are the common metric types you’ll run into:

- Counter: a value that only ever goes up (e.g. http_requests_total)
- Gauge: a value that can go up or down (e.g. active connections, memory in use)
- Histogram: observations bucketed by size, typically request durations
- Summary: like a histogram, but with quantiles precomputed on the client

Visualization tools like Grafana, Kibana, and Datadog use these structures to build time-series graphs, bar charts, and heatmaps, helping you make sense of the raw data collected over time.
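Because counters like http_requests_total only ever go up, dashboards graph their per-second rate rather than the raw value. A sketch of that arithmetic, including the counter-reset case a server restart causes (this loosely mirrors how Prometheus’s rate() treats resets):

```python
def rate(v0: float, v1: float, seconds: float) -> float:
    """Per-second rate between two samples of a monotonic counter.

    If the counter reset between samples (v1 < v0), assume it restarted
    from zero, loosely mirroring how Prometheus's rate() handles resets.
    """
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / seconds

# Two samples of http_requests_total taken 60 seconds apart
print(rate(1234, 1834, 60))  # 10.0 requests/sec

# Counter reset (e.g. the server restarted between scrapes)
print(rate(1000, 50, 10))    # 5.0
```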

How Things Go Wrong

Misconfigurations and bad practices can lead to blind spots. Here are some common mistakes:

- Monitoring only the defaults and missing saturation metrics like connection limits (my Tuesday bug, above)
- Watching averages instead of percentiles, so tail latency hides in plain sight
- Label cardinality explosions (e.g. a user ID as a label) that slow or break the metrics backend
- Alerting on everything, so the one alert that matters drowns in the noise
- Dashboards nobody looks at, with no alerts wired to them at all
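One mistake worth demonstrating is label cardinality: putting an unbounded value such as a user ID in a label means a new time series per user. A quick sketch of measuring distinct values per label; the sample labels are made up:

```python
from collections import defaultdict

def label_cardinality(samples):
    """Count distinct values seen for each label name across samples."""
    seen = defaultdict(set)
    for labels in samples:
        for key, val in labels.items():
            seen[key].add(val)
    return {key: len(vals) for key, vals in seen.items()}

# Hypothetical samples: method is bounded, user_id is not
samples = [
    {"method": "GET", "user_id": "u1001"},
    {"method": "GET", "user_id": "u1002"},
    {"method": "POST", "user_id": "u1003"},
]
print(label_cardinality(samples))  # {'method': 2, 'user_id': 3}
```

Here method will never exceed a handful of values, while user_id grows with every user; that second number is the one that eventually takes the metrics backend down.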

Final Thoughts

Good monitoring isn’t just about collecting numbers; it’s about knowing which numbers matter. Go beyond the defaults, visualize wisely, and keep an eye on what isn’t being monitored. Your future self will thank you when a weird Tuesday bug pops up again.