2026-04-29 23:14:14 +00:00
# Observability Stack
2026-04-29 23:30:30 +00:00
## Problem Statement
Operations teams need correlated logs, dashboards, and alert examples that make incidents observable before they become customer-facing outages. A stack that only starts containers is not enough; it also needs meaningful sample data and incident exercises.
## Solution Overview
This project defines a local observability environment with Elasticsearch, Logstash, Kibana, Grafana, Filebeat, alert rules, sample logs, and an incident simulation script. It is built to demonstrate practical monitoring workflows rather than a production-sized cluster.
## Architecture Overview

```
Application/System Logs -> Filebeat -> Logstash -> Elasticsearch -> Kibana
                                                         |
                                                         v
                                                      Grafana

Incident Scenario -> Sample Logs -> Alert Rules -> Operator Review
```

Core components:
- `docker-compose.yml` defines the observability services.
- `alerting/alert_rules.yml` records alert intent and severity.
- `logs/` contains representative operational logs.
- `scenarios/incident_simulation.sh` emits incident activity.
- `examples/` contains sample alert and log outputs.
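
As an illustration of what "emitting incident activity" can look like, here is a minimal sketch that prints log lines in the bracketed-timestamp format used by the sample output in this README. It is not the contents of `scenarios/incident_simulation.sh`; the function names `emit_log` and `simulate_db_incident` are hypothetical.

```bash
# Hypothetical sketch, not the project's actual simulation script.

# Print one log line as "[YYYY-MM-DD HH:MM:SS] LEVEL message".
# An optional third argument overrides the timestamp, which helps with replays.
emit_log() {
  local level message ts
  level="$1"
  message="$2"
  ts="${3:-$(date '+%Y-%m-%d %H:%M:%S')}"
  printf '[%s] %s %s\n' "$ts" "$level" "$message"
}

# Emit a short database-exhaustion sequence: warning, two errors, recovery.
simulate_db_incident() {
  emit_log WARN  "Database connection pool nearing capacity"
  emit_log ERROR "Database connection pool exhausted"
  emit_log ERROR "Database query timeout occurred"
  emit_log INFO  "Database connections restored"
}
```

Redirecting the output of `simulate_db_incident` to a file that Filebeat watches would feed the sequence into the pipeline; which paths Filebeat watches depends on the project's own configuration.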
## How to Run

```bash
cd observability-stack

# Validate the compose model.
make test

# Start the stack.
make run

# Run the incident simulation.
make demo

# Stop the stack.
docker compose down
```

When running locally, the services are available at:
- Kibana: `http://localhost:5601`
- Grafana: `http://localhost:3000`
- Elasticsearch: `http://localhost:9200`
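
A quick way to confirm the services answer on these ports is to probe each one's health or status endpoint. The sketch below assumes `curl` is installed and that the stack runs without authentication; if security is enabled, the requests return an error status and the service is reported as not reachable. The function names `health_url` and `check_stack` are hypothetical.

```bash
# Map a service name to its health or status URL on the local ports above.
health_url() {
  case "$1" in
    elasticsearch) echo "http://localhost:9200/_cluster/health" ;;
    kibana)        echo "http://localhost:5601/api/status" ;;
    grafana)       echo "http://localhost:3000/api/health" ;;
    *)             echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Report whether each service answers with a successful HTTP status.
check_stack() {
  local service
  for service in elasticsearch kibana grafana; do
    if curl --silent --fail --max-time 5 --output /dev/null "$(health_url "$service")"; then
      echo "$service: up"
    else
      echo "$service: not reachable"
    fi
  done
}
```

Running `check_stack` after `make run` prints one line per service. Elasticsearch and Kibana can take a while to start, so an early "not reachable" may only mean the service is still booting.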
## Example Output

```text
[2026-04-29 04:18:23] WARN Database connection pool nearing capacity
[2026-04-29 04:18:28] ERROR Database connection pool exhausted
[2026-04-29 04:18:33] ERROR Database query timeout occurred
[2026-04-29 04:18:44] INFO Database connections restored
```

Additional examples are available in [examples/alert-output.txt](examples/alert-output.txt) and [examples/sample-log.txt](examples/sample-log.txt).
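
To get a quick summary of output like this without opening Kibana, the lines can be tallied by level on the command line. This is a minimal sketch that assumes every line follows the `[date time] LEVEL message` shape shown above, so the level is the third whitespace-separated field; the function name `count_levels` is hypothetical.

```bash
# Count log lines per level for input shaped like "[date time] LEVEL message".
# Reads the files given as arguments, or standard input when none are given.
count_levels() {
  awk '{ counts[$3]++ } END { for (level in counts) print level, counts[level] }' "$@" \
    | LC_ALL=C sort
}
```

For the four sample lines above, `count_levels` prints `ERROR 2`, `INFO 1`, and `WARN 1`, one per line. Lines in a different format would be counted under whatever their third field happens to be, so the result is only meaningful for logs that match this layout.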
## Real-World Use Case
A platform team can use this project to explain how logs move through an ingestion pipeline, how alert rules map to operational symptoms, and how incident exercises create evidence for on-call readiness reviews.