
# Observability Stack

## Problem Statement

Operations teams need correlated logs, dashboards, and alert examples that make incidents observable before they become customer-facing outages. A stack that only starts containers is not enough; it also needs meaningful sample data and incident exercises.

## Solution Overview

This project defines a local observability environment with Elasticsearch, Logstash, Kibana, Grafana, Filebeat, alert rules, sample logs, and an incident simulation script. It is built to demonstrate practical monitoring workflows rather than a production-sized cluster.

## Architecture Overview

```
Application/System Logs -> Filebeat -> Logstash -> Elasticsearch -> Kibana
                                                       |
                                                       v
                                                    Grafana

Incident Scenario -> Sample Logs -> Alert Rules -> Operator Review
```

Core components:

- `docker-compose.yml` defines the observability services.
- `alerting/alert_rules.yml` records alert intent and severity.
- `logs/` contains representative operational logs.
- `scenarios/incident_simulation.sh` emits incident activity.
- `examples/` contains sample alert and log outputs.
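
To make the "Sample Logs -> Alert Rules -> Operator Review" path concrete, here is a minimal sketch of how a rule could be evaluated against log lines. The schema of `alerting/alert_rules.yml` is not shown in this README, so the `AlertRule` fields, the rule names, and the `evaluate` helper below are hypothetical illustrations, not the project's actual rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule shape; the real alert_rules.yml schema may differ."""
    name: str
    severity: str
    match: str      # substring that must appear in a log line
    threshold: int  # minimum number of matching lines before the rule fires


def evaluate(rules, lines):
    """Return (severity, rule name, hit count) for every rule that fires."""
    fired = []
    for rule in rules:
        hits = sum(1 for line in lines if rule.match in line)
        if hits >= rule.threshold:
            fired.append((rule.severity, rule.name, hits))
    return fired


# Hypothetical rules, evaluated against lines in the style of the sample logs.
RULES = [
    AlertRule("db_pool_exhausted", "critical", "connection pool exhausted", 1),
    AlertRule("db_error_burst", "warning", "ERROR", 3),
]

LINES = [
    "[2026-04-29 04:18:23] WARN Database connection pool nearing capacity",
    "[2026-04-29 04:18:28] ERROR Database connection pool exhausted",
    "[2026-04-29 04:18:33] ERROR Database query timeout occurred",
    "[2026-04-29 04:18:44] INFO Database connections restored",
]

for severity, name, hits in evaluate(RULES, LINES):
    print(f"{severity.upper()} {name} ({hits} matching line(s))")
```

With these assumed rules, only `db_pool_exhausted` fires, because the sample contains two `ERROR` lines and the burst rule requires three.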

## How to Run

```bash
cd observability-stack

# Validate the compose model.
make test

# Start the stack.
make run

# Run the incident simulation.
make demo

# Stop the stack.
docker compose down
```

When the stack is running locally, the services are reachable at:

- Kibana: `http://localhost:5601`
- Grafana: `http://localhost:3000`
- Elasticsearch: `http://localhost:9200`
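
A quick way to confirm that the three services are answering is to probe each URL. The sketch below uses only the Python standard library; the helper names `http_status` and `check_endpoints` are made up for this example and are not part of the project. It treats any HTTP response as "answering", including an error status such as 401, which Elasticsearch may return if security is enabled in your compose configuration.

```python
import urllib.error
import urllib.request

# URLs taken from the list above.
ENDPOINTS = {
    "Kibana": "http://localhost:5601",
    "Grafana": "http://localhost:3000",
    "Elasticsearch": "http://localhost:9200",
}


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The service answered, just not with a success status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def check_endpoints(endpoints=ENDPOINTS, probe=http_status):
    """Map each service name to its status code (None means unreachable)."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling `check_endpoints()` after `make run` returns a dictionary keyed by service name. The `probe` parameter exists so the check can be exercised without a running stack.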

## Example Output

```text
[2026-04-29 04:18:23] WARN Database connection pool nearing capacity
[2026-04-29 04:18:28] ERROR Database connection pool exhausted
[2026-04-29 04:18:33] ERROR Database query timeout occurred
[2026-04-29 04:18:44] INFO Database connections restored
```

Additional examples are available in [examples/alert-output.txt](examples/alert-output.txt) and [examples/sample-log.txt](examples/sample-log.txt).
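
As an illustration of what can be extracted from lines in this format, the following sketch parses the timestamp, level, and message, then measures the time from the first `ERROR` to the next `INFO` entry. The regular expression assumes the `[YYYY-MM-DD HH:MM:SS] LEVEL message` layout shown above, and the function names are hypothetical; the project's Logstash pipeline may parse these lines differently.

```python
import re
from datetime import datetime

# Assumed layout: "[YYYY-MM-DD HH:MM:SS] LEVEL message"
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "level": match.group("level"),
        "message": match.group("message"),
    }


def error_window_seconds(events):
    """Seconds from the first ERROR to the first later INFO, or None."""
    first_error = next((e for e in events if e["level"] == "ERROR"), None)
    if first_error is None:
        return None
    recovery = next(
        (e for e in events
         if e["level"] == "INFO" and e["timestamp"] > first_error["timestamp"]),
        None,
    )
    if recovery is None:
        return None
    return (recovery["timestamp"] - first_error["timestamp"]).total_seconds()


SAMPLE_LOG = """\
[2026-04-29 04:18:23] WARN Database connection pool nearing capacity
[2026-04-29 04:18:28] ERROR Database connection pool exhausted
[2026-04-29 04:18:33] ERROR Database query timeout occurred
[2026-04-29 04:18:44] INFO Database connections restored
"""

events = [e for e in map(parse_line, SAMPLE_LOG.splitlines()) if e is not None]
print(error_window_seconds(events))  # prints 16.0
```

For the sample above, the window runs from 04:18:28 to 04:18:44, which is 16 seconds.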

## Real-World Use Case

A platform team can use this project to explain how logs move through an ingestion pipeline, how alert rules map to operational symptoms, and how incident exercises create evidence for on-call readiness reviews.