# Observability Stack

## Problem Statement

Operations teams need correlated logs, dashboards, and alert examples that make incidents observable before they become customer-facing outages. A stack that only starts containers is not enough; it also needs meaningful sample data and incident exercises.

## Solution Overview

This project defines a local observability environment with Elasticsearch, Logstash, Kibana, Grafana, Filebeat, alert rules, sample logs, and an incident simulation script. It is built to demonstrate practical monitoring workflows rather than a production-sized cluster.

## Architecture Overview

```
Application/System Logs -> Filebeat -> Logstash -> Elasticsearch -> Kibana
                                                        |
                                                        v
                                                     Grafana

Incident Scenario -> Sample Logs -> Alert Rules -> Operator Review
```

Core components:

- `docker-compose.yml` defines the observability services.
- `alerting/alert_rules.yml` records alert intent and severity.
- `logs/` contains representative operational logs.
- `scenarios/incident_simulation.sh` emits incident activity.
- `examples/` contains sample alert and log outputs.

## How to Run

```bash
cd observability-stack

# Validate the compose model.
make test

# Start the stack.
make run

# Run the incident simulation.
make demo

# Stop the stack.
docker compose down
```

When running locally:

- Kibana: `http://localhost:5601`
- Grafana: `http://localhost:3000`
- Elasticsearch: `http://localhost:9200`

## Example Output

```text
[2026-04-29 04:18:23] WARN Database connection pool nearing capacity
[2026-04-29 04:18:28] ERROR Database connection pool exhausted
[2026-04-29 04:18:33] ERROR Database query timeout occurred
[2026-04-29 04:18:44] INFO Database connections restored
```

Additional examples are available in [examples/alert-output.txt](examples/alert-output.txt) and [examples/sample-log.txt](examples/sample-log.txt).

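
To get a quick read on the sample lines in the example output without opening Kibana, a small shell sketch can tally entries per severity level. This is only an illustration: the `count_by_level` helper is hypothetical and not part of the repository, and it assumes every line follows the `[date time] LEVEL message` layout shown in the example output.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): count log lines per level.
# Assumes the "[date time] LEVEL message" layout from the example output.
count_by_level() {
  awk '{
    # With default whitespace splitting, "[date" and "time]" are fields 1-2,
    # so the severity level is field 3.
    counts[$3]++
  }
  END {
    for (level in counts) print level, counts[level]
  }' "$@" | LC_ALL=C sort
}

# The four sample lines from the example output.
sample_log='[2026-04-29 04:18:23] WARN Database connection pool nearing capacity
[2026-04-29 04:18:28] ERROR Database connection pool exhausted
[2026-04-29 04:18:33] ERROR Database query timeout occurred
[2026-04-29 04:18:44] INFO Database connections restored'

printf '%s\n' "$sample_log" | count_by_level
# Prints:
# ERROR 2
# INFO 1
# WARN 1
```

The function also accepts file paths as arguments, so it could be pointed at a file under `logs/`, provided that file uses the same line layout.
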
## Real-World Use Case

A platform team can use this project to explain how logs move through an ingestion pipeline, how alert rules map to operational symptoms, and how incident exercises create evidence for on-call readiness reviews.
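
One way to make the link between an alert rule and an operational symptom concrete is a threshold check over log lines. The sketch below is an assumption-heavy illustration, not the contents of `alerting/alert_rules.yml`: the `check_error_threshold` function, the `db_errors` rule name, the `critical` severity label, and the threshold value are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical rule check (names and values are illustrative assumptions,
# not taken from alerting/alert_rules.yml).
# Reads log lines on stdin and reports whether ERROR lines reach a threshold.
check_error_threshold() {
  local threshold="$1"
  local errors
  # grep -c prints the number of matching lines; it exits non-zero when that
  # number is 0, so "|| true" keeps the function usable under "set -e".
  errors="$(grep -c -F '] ERROR ' || true)"
  if [ "$errors" -ge "$threshold" ]; then
    echo "ALERT severity=critical rule=db_errors count=$errors threshold=$threshold"
  else
    echo "OK count=$errors threshold=$threshold"
  fi
}

# Two ERROR lines against a threshold of 2 trigger the alert line.
printf '%s\n' \
  '[2026-04-29 04:18:28] ERROR Database connection pool exhausted' \
  '[2026-04-29 04:18:33] ERROR Database query timeout occurred' \
  | check_error_threshold 2
```

In an on-call readiness review, the alert line produced by a check like this can sit next to the log lines that triggered it, which shows how the symptom (repeated database errors) maps to the rule.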