Observability Stack

Problem Statement

Operations teams need correlated logs, dashboards, and alert examples that make incidents observable before they become customer-facing outages. A stack that only starts containers is not enough; it also needs meaningful sample data and incident exercises.

Solution Overview

This project defines a local observability environment. It combines Elasticsearch, Logstash, Kibana, Filebeat, and Grafana with alert rules, sample logs, and an incident simulation script. It is built to demonstrate practical monitoring workflows, not to serve as a production-sized cluster.

Architecture Overview

Application/System Logs -> Filebeat -> Logstash -> Elasticsearch -> Kibana
                                                       |
                                                       v
                                                    Grafana

Incident Scenario -> Sample Logs -> Alert Rules -> Operator Review
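The Logstash stage is where raw log lines become structured fields that Kibana and Grafana can query. The project's Logstash pipeline configuration is not shown here. The Bash sketch below is therefore only an illustration, and it assumes the `[YYYY-MM-DD HH:MM:SS] LEVEL message` format used in the sample logs. `parse_log_line` is a hypothetical helper: it splits one line into timestamp, level, and message, roughly what a grok pattern would extract.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): split one log line in the
# assumed sample format "[YYYY-MM-DD HH:MM:SS] LEVEL message" into
# tab-separated timestamp, level, and message fields.
parse_log_line() {
  local line="$1"
  local re='^\[([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})\] ([A-Z]+) (.*)$'
  if [[ $line =~ $re ]]; then
    printf '%s\t%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    # Lines that do not match are reported on stderr with a non-zero status.
    printf 'unparsed: %s\n' "$line" >&2
    return 1
  fi
}

parse_log_line '[2026-04-29 04:18:28] ERROR Database connection pool exhausted'
```

Returning a non-zero status for unmatched lines mirrors how a grok filter tags parse failures rather than silently dropping them.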

Core components:

  • docker-compose.yml defines the observability services.
  • alerting/alert_rules.yml records alert intent and severity.
  • logs/ contains representative operational logs.
  • scenarios/incident_simulation.sh emits incident activity.
  • examples/ contains sample alert and log outputs.
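The contents of `scenarios/incident_simulation.sh` are not reproduced here. The Bash sketch below is a hypothetical stand-in that emits the same warn, error, and recovery sequence shown in the example output, one timestamped line per step. The function names and the `SIM_DELAY` variable (pause between steps, default 0) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical incident simulation: emits the warn -> error -> recovery
# sequence from the sample logs, one timestamped line per step.
# SIM_DELAY (an assumed variable, default 0) sets the pause between steps.

emit() {
  # Print "[YYYY-MM-DD HH:MM:SS] LEVEL message" using the local clock.
  printf '[%s] %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2"
}

simulate_db_pool_incident() {
  local delay="${SIM_DELAY:-0}"
  emit WARN  "Database connection pool nearing capacity"; sleep "$delay"
  emit ERROR "Database connection pool exhausted";        sleep "$delay"
  emit ERROR "Database query timeout occurred";           sleep "$delay"
  emit INFO  "Database connections restored"
}

simulate_db_pool_incident
```

Redirecting this output into a file that Filebeat is configured to read would send the lines through the ingestion pipeline. The exact path under `logs/` depends on the project's Filebeat configuration, which is not shown.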

How to Run

cd observability-stack

# Validate the compose model.
make test

# Start the stack.
make run

# Run the incident simulation.
make demo

# Stop the stack.
docker compose down

With the stack running locally, the services are available at:

  • Kibana: http://localhost:5601
  • Grafana: http://localhost:3000
  • Elasticsearch: http://localhost:9200
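Containers usually take some time to become ready after `make run`, so a script can poll each service before running the demo. The Bash sketch below uses the standard health endpoints: Elasticsearch `/_cluster/health`, Kibana `/api/status`, and Grafana `/api/health`. It assumes `curl` is installed and that this project's compose file does not require authentication. That is an assumption, since Elasticsearch 8 enables security by default. `wait_for_http` and `check_stack` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Poll a URL until it returns a 2xx/3xx status or the attempts run out.
# Assumes curl is installed and the endpoint needs no credentials.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f: fail on HTTP >= 400, -sS: quiet but show errors, -o: discard body.
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    # Only wait if another attempt follows.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}

# Check the three services listed above in order (unauthenticated access assumed).
check_stack() {
  wait_for_http "http://localhost:9200/_cluster/health" &&
    wait_for_http "http://localhost:5601/api/status" &&
    wait_for_http "http://localhost:3000/api/health"
}
```

Running `check_stack` after `make run` and before `make demo` would keep the simulation from emitting logs before Elasticsearch can accept them.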

Example Output

[2026-04-29 04:18:23] WARN Database connection pool nearing capacity
[2026-04-29 04:18:28] ERROR Database connection pool exhausted
[2026-04-29 04:18:33] ERROR Database query timeout occurred
[2026-04-29 04:18:44] INFO Database connections restored

Additional examples are available in examples/alert-output.txt and examples/sample-log.txt.
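The rule format in `alerting/alert_rules.yml` is not shown here. As a rough illustration of how a rule maps to the symptoms above, the Bash sketch below counts ERROR entries in log input and reports `FIRING` once a threshold is reached. The function name and the default threshold of 2 are hypothetical. Unlike most production alert rules, it also ignores time windows.

```shell
#!/usr/bin/env bash
# Hypothetical threshold rule: read log lines on stdin, count ERROR entries,
# and report FIRING when the count reaches the threshold (default 2).
evaluate_error_rule() {
  local threshold="${1:-2}" count
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0.
  count="$(grep -cE '^\[[^]]+] ERROR ' || true)"
  if (( count >= threshold )); then
    echo "FIRING: $count ERROR lines (threshold $threshold)"
  else
    echo "OK: $count ERROR lines (threshold $threshold)"
  fi
}

evaluate_error_rule 2 <<'EOF'
[2026-04-29 04:18:23] WARN Database connection pool nearing capacity
[2026-04-29 04:18:28] ERROR Database connection pool exhausted
[2026-04-29 04:18:33] ERROR Database query timeout occurred
[2026-04-29 04:18:44] INFO Database connections restored
EOF
```

With the four example lines above, the two ERROR entries meet the threshold and the function prints `FIRING: 2 ERROR lines (threshold 2)`.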

Real-World Use Case

A platform team can use this project to explain how logs move through an ingestion pipeline, how alert rules map to operational symptoms, and how incident exercises create evidence for on-call readiness reviews.