
Observability Stack

A monitoring and logging stack for enterprise infrastructure observability, built on the ELK stack (Elasticsearch, Logstash, Kibana) and Grafana. It includes sample data ingestion, alerting rules, and incident simulation scenarios.

Overview

The Observability Stack provides a complete monitoring solution with:

  • Elasticsearch: Distributed search and analytics engine for logs and metrics
  • Logstash: Data processing pipeline for log ingestion and transformation
  • Kibana: Visualization and exploration interface for Elasticsearch data
  • Grafana: Advanced metrics dashboarding and alerting platform
  • Sample Logs: Realistic log data for testing and demonstration
  • Alerting: Automated incident detection and notification rules
  • Incident Simulation: Scenarios for testing monitoring and response procedures

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Log Sources   │    │   Logstash      │    │   Elasticsearch │
│   (Applications │───►│   (Ingestion &  │───►│   (Storage &    │
│    / Systems)   │    │    Processing)  │    │    Analytics)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Alerting      │    │   Kibana        │    │   Grafana       │
│   Rules         │    │   (Dashboards & │    │   (Metrics &    │
│                 │    │    Exploration) │    │    Dashboards)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Quick Start

Prerequisites

  • Docker and Docker Compose
  • At least 4GB RAM available
  • Ports 5601 (Kibana), 9200 (Elasticsearch), 3000 (Grafana) available

Setup

cd observability-stack

# Start the observability stack
docker-compose up -d

# Wait for services to be ready (may take 2-3 minutes)
sleep 180

# Verify services are running
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:5601/api/status"
curl -X GET "localhost:3000/api/health"

Access Interfaces

Once the services are up, the interfaces are reachable at:

  • Kibana: http://localhost:5601
  • Elasticsearch: http://localhost:9200 (user elastic, password from ELASTIC_PASSWORD)
  • Grafana: http://localhost:3000 (admin password from GF_SECURITY_ADMIN_PASSWORD)

Project Structure

observability-stack/
├── docker-compose.yml          # Service orchestration
├── logstash/                   # Logstash configuration
│   ├── pipeline/               # Processing pipelines
│   └── config/                 # Logstash settings
├── elasticsearch/              # Elasticsearch configuration
│   └── config/                 # Cluster settings
├── kibana/                     # Kibana configuration
│   └── config/                 # Kibana settings
├── grafana/                    # Grafana configuration
│   ├── provisioning/           # Dashboards and datasources
│   └── dashboards/             # Dashboard definitions
├── logs/                       # Sample log data
│   └── sample.log              # Realistic application logs
├── alerting/                   # Alert configuration
│   └── alert_rules.yml         # Alert definitions
├── scenarios/                  # Incident simulation
│   └── incident_simulation.sh  # Simulation scripts
└── README.md

Services Configuration

Elasticsearch

Configuration: elasticsearch/config/elasticsearch.yml

Key settings:

  • Single-node cluster for development
  • Memory limits and heap sizing
  • Security enabled with basic authentication
  • CORS enabled for Kibana access

Data Indices:

  • logs-*: Application and system logs
  • metrics-*: System and application metrics
  • alerts-*: Alert and incident data
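
These patterns can be given consistent mappings up front with a composable index template (a sketch; requires Elasticsearch 7.8+, and the settings shown are illustrative):

curl -X PUT "localhost:9200/_index_template/logs-template" \
  -H "Content-Type: application/json" \
  -d '{
    "index_patterns": ["logs-*"],
    "template": {
      "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
    }
  }'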

Logstash

Pipelines: logstash/pipeline/

  • apache_logs: Apache/Nginx access log processing
  • system_logs: System log parsing and enrichment
  • application_logs: Custom application log processing
  • metrics_pipeline: Metrics data processing

Input Sources:

  • Filebeat agents
  • TCP/UDP syslog inputs
  • HTTP endpoints for metrics
  • Docker container logs
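
As a concrete illustration, a minimal pipeline wiring a TCP input to Elasticsearch might look like the sketch below (the file name is hypothetical; the actual pipelines under logstash/pipeline/ may differ):

# Hypothetical example pipeline; add user/password to the output
# block if Elasticsearch security is enabled
cat > logstash/pipeline/99-example.conf <<'EOF'
input {
  tcp {
    port  => 5000
    codec => json_lines   # one JSON event per line
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
EOF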

Kibana

Dashboards:

  • Log analysis dashboard
  • System metrics overview
  • Application performance dashboard
  • Security events dashboard

Saved Objects:

  • Index patterns for log data
  • Visualizations for common metrics
  • Search queries for troubleshooting
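
Index patterns can also be created programmatically through Kibana's saved objects API (a sketch for Kibana 7.x; the kbn-xsrf header is required):

curl -X POST "localhost:5601/api/saved_objects/index-pattern" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -d '{"attributes": {"title": "logs-*", "timeFieldName": "@timestamp"}}'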

Grafana

Data Sources:

  • Elasticsearch for logs and metrics
  • Prometheus (if available)
  • InfluxDB for time-series data
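
Data sources are wired in through files under grafana/provisioning/; a sketch of an Elasticsearch datasource definition (the index pattern and time field are assumptions based on the logs-* convention above):

cat > grafana/provisioning/datasources/elasticsearch.yml <<'EOF'
apiVersion: 1
datasources:
  - name: Elasticsearch
    type: elasticsearch
    access: proxy
    url: http://elasticsearch:9200
    database: "logs-*"            # index pattern to query
    jsonData:
      timeField: "@timestamp"
EOF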

Dashboards:

  • Infrastructure overview
  • Application performance
  • System resources
  • Custom business metrics

Log Ingestion

Sample Data

The stack includes realistic sample logs for testing:

# Ingest sample logs via the Logstash HTTP input on port 8080;
# --data-binary preserves the newlines between log entries
curl -X POST "localhost:8080" \
  -H "Content-Type: application/json" \
  --data-binary @logs/sample.log

Log Formats Supported

  • Apache/Nginx: Combined log format
  • Syslog: RFC 3164/5424 compliant
  • JSON: Structured application logs
  • Custom: Configurable parsing rules

Data Enrichment

Logstash pipelines add:

  • GeoIP location data
  • User agent parsing
  • Timestamp normalization
  • Host metadata enrichment
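
These stages map onto standard Logstash filter plugins; a representative filter block (a sketch assuming classic, non-ECS grok field names such as clientip and agent):

# Hypothetical enrichment filter; field names depend on your grok patterns
cat > logstash/pipeline/10-enrichment.conf <<'EOF'
filter {
  grok      { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  geoip     { source => "clientip" }                      # GeoIP location data
  useragent { source => "agent" target => "user_agent" }  # user agent parsing
  date      { match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"] }
}
EOF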

Alerting and Monitoring

Alert Rules

Located in alerting/alert_rules.yml:

alert_rules:
  - name: "High CPU Usage"
    condition: "cpu_usage > 90"
    duration: "5m"
    severity: "critical"
    channels: ["email", "slack"]

  - name: "Disk Space Low"
    condition: "disk_usage > 85"
    duration: "10m"
    severity: "warning"
    channels: ["email"]

  - name: "Service Down"
    condition: "service_status == 'down'"
    duration: "2m"
    severity: "critical"
    channels: ["email", "pagerduty"]

Alert Channels

  • Email: SMTP-based notifications
  • Slack: Real-time messaging
  • PagerDuty: Incident management integration
  • Webhook: Custom HTTP endpoints

Incident Simulation

Available Scenarios

cd scenarios

# Simulate disk space exhaustion
./incident_simulation.sh --type disk-full --severity critical

# Simulate service failure
./incident_simulation.sh --type service-down --service nginx

# Simulate network latency
./incident_simulation.sh --type network-latency --delay 500ms

# Simulate high CPU usage
./incident_simulation.sh --type high-cpu --cores 4

Scenario Types

  • disk-full: Filesystem capacity exhaustion
  • service-down: Application service failures
  • network-latency: Network performance degradation
  • high-cpu: CPU utilization spikes
  • memory-leak: Memory consumption growth
  • log-flood: Excessive log generation
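
After running a scenario, you can confirm that alert documents landed in the alerts-* indices described above (a sketch; the severity field follows the alert rule definitions):

curl -s "localhost:9200/alerts-*/_search?q=severity:critical&pretty"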

Dashboards and Visualization

Kibana Dashboards

Pre-configured dashboards for:

  1. Log Analysis
    • Log volume over time
    • Error rate trends
    • Top error messages
    • Geographic request distribution
  2. System Monitoring
    • CPU and memory usage
    • Disk I/O statistics
    • Network traffic
    • System load averages
  3. Application Performance
    • Response time distributions
    • Request rate metrics
    • Error percentages
    • User session analytics

Grafana Dashboards

Advanced visualization panels:

  • Infrastructure Overview: Multi-system resource usage
  • Application Metrics: Custom business KPIs
  • Alert Status: Active alerts and trends
  • Capacity Planning: Resource utilization forecasting

API Endpoints

Elasticsearch APIs

# Cluster health
GET /_cluster/health

# Index statistics
GET /_cat/indices?v

# Search logs
GET /logs-*/_search
{
  "query": {
    "match": {
      "message": "ERROR"
    }
  }
}

Kibana APIs

# Get dashboard list
GET /api/saved_objects/_find?type=dashboard

# Export visualizations
GET /api/saved_objects/visualization/{id}

Grafana APIs

# Get dashboard list
GET /api/search?query=*

# Alert status
GET /api/alerts

Configuration Management

Environment Variables

# Elasticsearch
ES_JAVA_OPTS="-Xms1g -Xmx1g"
ELASTIC_PASSWORD="elastic"

# Logstash
LS_JAVA_OPTS="-Xms512m -Xmx512m"

# Grafana
GF_SECURITY_ADMIN_PASSWORD="admin"

Scaling Configuration

For production deployment:

# Note: 'deploy' settings take effect under Docker Swarm
# (docker stack deploy) or with docker-compose --compatibility;
# a multi-node Elasticsearch cluster also needs discovery settings
version: '3.8'
services:
  elasticsearch:
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 4G
          cpus: '2.0'

Security Considerations

Authentication

  • Elasticsearch basic authentication enabled
  • Grafana admin credentials configured
  • Kibana anonymous access disabled

Network Security

  • Services bound to localhost only
  • Internal network for service communication
  • TLS encryption for external access (production)

Data Protection

  • Elasticsearch encryption at rest
  • Log data retention policies
  • Backup and recovery procedures

Troubleshooting

Common Issues

Elasticsearch Won't Start:

# Check memory allocation
docker-compose logs elasticsearch

# Verify Java heap settings
docker-compose exec elasticsearch ps aux
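
On Linux hosts, a common cause of startup failure is the kernel's mmap limit; Elasticsearch requires vm.max_map_count of at least 262144:

# Check and, if needed, raise the limit on the Docker host
sysctl vm.max_map_count
sudo sysctl -w vm.max_map_count=262144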

Logstash Pipeline Errors:

# Check pipeline configuration
docker-compose logs logstash

# Validate pipeline syntax
docker-compose exec logstash logstash -t -f /usr/share/logstash/pipeline/

Kibana Connection Issues:

# Verify Elasticsearch connectivity
curl -u elastic:elastic "localhost:9200/_cluster/health"

# Check Kibana logs
docker-compose logs kibana

Performance Tuning

Elasticsearch:

  • Increase heap size for larger datasets
  • Configure shard allocation
  • Enable index optimization

Logstash:

  • Adjust worker threads
  • Configure batch sizes
  • Enable persistent queues
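
For example, queue and worker behavior is controlled in the Logstash settings file (a sketch; values are illustrative and the exact layout under logstash/config/ may differ):

cat >> logstash/config/logstash.yml <<'EOF'
pipeline.workers: 4          # adjust worker threads
pipeline.batch.size: 250     # events per batch
queue.type: persisted        # enable persistent queues
queue.max_bytes: 1gb
EOF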

Grafana:

  • Configure query caching
  • Set dashboard refresh intervals
  • Optimize panel queries

Development and Testing

Adding New Dashboards

  1. Create dashboard JSON in grafana/dashboards/
  2. Update provisioning configuration
  3. Restart Grafana service

Custom Alert Rules

  1. Define rules in alerting/alert_rules.yml
  2. Update alerting configuration
  3. Test rules with simulation scenarios

Log Pipeline Development

  1. Add pipeline configuration in logstash/pipeline/
  2. Test with sample data
  3. Validate parsing with Kibana

Backup and Recovery

Data Backup

# Elasticsearch snapshot
curl -X PUT "localhost:9200/_snapshot/backup/snapshot_$(date +%Y%m%d_%H%M%S)" \
  -H "Content-Type: application/json" \
  -d '{"indices": "*"}'

Configuration Backup

# Backup all configurations
tar -czf backup_$(date +%Y%m%d).tar.gz \
  logstash/ elasticsearch/ kibana/ grafana/

Contributing

  1. Follow existing configuration patterns
  2. Test changes with simulation scenarios
  3. Update documentation for new features
  4. Ensure backward compatibility

License

Enterprise Internal Use Only