feat: Add comprehensive enterprise Linux infrastructure portfolio with Ansible, Python, and ELK stack

This commit is contained in:
Mateusz Suski
2026-04-29 23:14:14 +00:00
parent 2313efac88
commit 7757020014
33 changed files with 6165 additions and 0 deletions
+147
@@ -0,0 +1,147 @@
# Architecture Overview
## Enterprise Infrastructure Portfolio Architecture
This document provides a high-level overview of the architecture and design principles implemented across the three main projects in this portfolio.
## Overall Architecture
```
┌───────────────────────────────────────────────────────────────┐
│                     Enterprise Portfolio                      │
├───────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│ │ Infra Simulator │  │ Migration       │  │ Observability   │ │
│ │ (Ansible/Docker │  │ Validation      │  │ Stack           │ │
│ │ Container Sim)  │  │ (Python CLI)    │  │ (ELK/Grafana)   │ │
│ └─────────────────┘  └─────────────────┘  └─────────────────┘ │
├───────────────────────────────────────────────────────────────┤
│ Infrastructure Simulation │ Validation Framework │ Monitoring │
└───────────────────────────────────────────────────────────────┘
```
## Project Architectures
### 1. Enterprise Infrastructure Simulator
**Architecture Pattern:** Container-based Infrastructure Simulation
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Ansible        │    │  Docker         │    │  Simulation     │
│  Controller     │◄──►│  Containers     │◄──►│  Scripts        │
│                 │    │  (Linux Nodes)  │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Inventory      │    │  Playbooks      │    │  Scenarios      │
│  Management     │    │  (Provision/    │    │  (Scaling/      │
│                 │    │   Patch/        │    │   Failures)     │
│                 │    │   Harden/       │    │                 │
│                 │    │   Decommission) │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
**Key Components:**
- **Ansible Controller:** Central orchestration for infrastructure operations
- **Docker Containers:** Simulated Linux nodes with realistic configurations
- **Simulation Scripts:** Automated scaling and failure injection
- **Inventory System:** Dynamic host management and grouping
- **Playbook Library:** Modular automation for different lifecycle phases
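
As a rough illustration of the inventory system's host grouping, the sketch below renders an Ansible INI inventory from a mapping of groups to hosts. The function name, the container names, the loopback address, and the SSH ports are all hypothetical; only the `ansible_host` and `ansible_port` inventory variables are standard Ansible. The project's actual inventory generation may work differently.

```python
from typing import Dict, List, Tuple


def render_inventory(groups: Dict[str, List[Tuple[str, int]]]) -> str:
    """Render an INI inventory from {group: [(host_name, ssh_port), ...]}."""
    lines: List[str] = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        for name, ssh_port in sorted(groups[group]):
            # Assumption: each simulated node publishes SSH on a host port.
            lines.append(f"{name} ansible_host=127.0.0.1 ansible_port={ssh_port}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)


# Hypothetical node names and ports, for illustration only.
inventory_text = render_inventory({
    "web": [("infra-sim-web-1", 2201), ("infra-sim-web-2", 2202)],
    "db": [("infra-sim-db-1", 2211)],
})
print(inventory_text)
```

Sorting groups and hosts keeps the generated file stable between runs, which makes inventory changes easy to review in version control.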
### 2. Migration Validation Framework
**Architecture Pattern:** Data Collection and Comparison Pipeline
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  CLI Interface  │    │  Data           │    │  Validation     │
│  (cli.py)       │◄──►│  Collectors     │◄──►│  Engine         │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  JSON           │    │  Comparison     │    │  HTML           │
│  Snapshots      │    │  Logic          │    │  Reports        │
│  (Before/After) │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
**Key Components:**
- **CLI Interface:** Command-line tool for migration workflow orchestration
- **Data Collectors:** Specialized modules for system data extraction
- **Validation Engine:** Snapshot comparison and difference analysis
- **Report Generator:** HTML output with change visualization
- **JSON Storage:** Structured data persistence for before/after states
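
To make the validation engine's snapshot comparison more concrete, here is a minimal sketch that diffs two flat before/after snapshots. It assumes snapshots are plain dictionaries loaded from JSON; the function name and the snapshot keys are hypothetical and are not taken from the framework's code.

```python
from typing import Any, Dict


def diff_snapshots(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group the differences between two flat snapshots by kind."""
    added = {key: after[key] for key in after.keys() - before.keys()}
    removed = {key: before[key] for key in before.keys() - after.keys()}
    changed = {
        key: {"before": before[key], "after": after[key]}
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshot contents, for illustration only.
pre = {"mount:/data": "ext4", "service:nginx": "running", "service:cron": "running"}
post = {"mount:/data": "xfs", "service:nginx": "running", "service:chronyd": "running"}
print(diff_snapshots(pre, post))
```

Nested snapshots would need a recursive comparison or a flattening step first; this sketch deliberately handles only one level.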
### 3. Observability Stack
**Architecture Pattern:** Distributed Monitoring and Logging
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Logstash       │    │  Elasticsearch  │    │  Kibana         │
│  (Ingestion)    │◄──►│  (Storage)      │◄──►│  (Visualization)│
└─────────────────┘    └─────────────────┘    └─────────────────┘
         ▲                      ▲                      ▲
         │                      │                      │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Sample Logs    │    │  Alert Rules    │    │  Grafana        │
│  (Data Sources) │    │  (Conditions)   │    │  (Dashboards)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
**Key Components:**
- **Logstash Pipelines:** Data ingestion and transformation
- **Elasticsearch Cluster:** Distributed search and analytics
- **Kibana Dashboards:** Real-time visualization and exploration
- **Grafana Integration:** Advanced metrics and alerting
- **Alerting Engine:** Automated incident detection and notification
## Design Principles
### Infrastructure as Code
- All infrastructure defined in code (Ansible, Docker Compose, Python)
- Version-controlled configurations and automation
- Reproducible environments and deployments
### Modular Architecture
- Separated concerns across projects and components
- Reusable modules and playbooks
- Clear interfaces between systems
### Enterprise Standards
- Realistic naming conventions and structures
- Production-quality error handling and logging
- Security hardening and compliance considerations
### Observability First
- Comprehensive logging and monitoring
- Automated alerting and incident response
- Performance metrics and health checks
## Technology Stack
- **Containerization:** Docker, Docker Compose
- **Configuration Management:** Ansible
- **Programming Language:** Python 3.8+
- **Monitoring Stack:** ELK Stack (Elasticsearch, Logstash, Kibana)
- **Visualization:** Grafana
- **CI/CD:** Gitea Actions
- **Documentation:** Markdown
## Security Considerations
- Container security scanning integration
- Ansible vault for secrets management
- Network segmentation in Docker Compose
- Least privilege access principles
- Audit logging and compliance reporting
## Scalability and Performance
- Horizontal scaling through container orchestration
- Efficient data collection and processing
- Optimized Elasticsearch indexing
- Resource-aware automation scripts
+329
@@ -0,0 +1,329 @@
# Runbooks and Operational Procedures
This document contains operational runbooks for deploying, managing, and troubleshooting the Enterprise Infrastructure Portfolio projects.
## Table of Contents
1. [Infrastructure Simulator Operations](#infrastructure-simulator-operations)
2. [Migration Validation Procedures](#migration-validation-procedures)
3. [Observability Stack Management](#observability-stack-management)
4. [Troubleshooting Guide](#troubleshooting-guide)
## Infrastructure Simulator Operations
### Starting the Infrastructure
```bash
cd enterprise-infra-simulator
make up
```
**Expected Outcome:**
- Docker containers for simulated Linux nodes are created
- Ansible inventory is populated
- Basic services are running on all nodes
**Verification:**
```bash
docker ps | grep infra-sim
ansible -i inventory/hosts.ini all -m ping
```
### Patching Operations
```bash
cd enterprise-infra-simulator
make patch
```
**Procedure:**
1. Backup current container states
2. Apply security patches via Ansible
3. Validate service availability
4. Generate patch report
**Rollback:**
```bash
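# Stop and remove the current containers
docker-compose down
# Bring the stack up detached with zero node replicas so the command does not block the shell
docker-compose up -d --scale node=0
# Recreate the nodes from a clean state
make up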
```
### Hardening Operations
```bash
cd enterprise-infra-simulator
ansible-playbook -i inventory/hosts.ini playbooks/harden.yml
```
**Hardening Steps:**
- Disable unnecessary services
- Configure firewall rules
- Set secure SSH configurations
- Apply CIS benchmarks
### Scaling Operations
```bash
cd enterprise-infra-simulator
./scripts/simulate_scaling.sh up 3
```
**Scaling Parameters:**
- Direction: up/down
- Count: number of nodes to add/remove
- Type: web/app/db
### Failure Simulation
```bash
cd enterprise-infra-simulator
./scripts/simulate_failure.sh --type network --duration 300
```
**Failure Types:**
- network: Network partition
- disk: Disk space exhaustion
- service: Service crashes
- node: Complete node failure
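
The failure types above map naturally onto a validated command-line interface. The sketch below shows one way to express the same options with Python's `argparse`; it is not the shell script's implementation, and the 300-second default and the assumption that the duration is in seconds are guesses based on the example invocation.

```python
import argparse
from typing import List, Optional

# Failure types as listed in this runbook.
FAILURE_TYPES = ("network", "disk", "service", "node")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="simulate_failure")
    parser.add_argument("--type", dest="failure_type", choices=FAILURE_TYPES, required=True)
    # Assumption: duration is in seconds and defaults to 300.
    parser.add_argument("--duration", type=int, default=300)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.duration <= 0:
        raise ValueError("--duration must be a positive number of seconds")
    return args


args = parse_args(["--type", "network", "--duration", "300"])
print(args.failure_type, args.duration)
```

Restricting `--type` with `choices` rejects an unknown failure type before any fault is injected.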
### Decommissioning
```bash
cd enterprise-infra-simulator
make destroy
```
**Decommission Steps:**
1. Graceful service shutdown
2. Data backup and export
3. Configuration cleanup
4. Container removal
## Migration Validation Procedures
### Pre-Migration Snapshot
```bash
cd migration-validation-framework
python cli.py snapshot --env production --label pre-migration
```
**Data Collected:**
- Mount points and filesystem usage
- Running services and their states
- Disk usage statistics
- Network configurations
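
A collector for one of the items above could look roughly like the following. This is a sketch using only the standard library: the function names and the snapshot layout are hypothetical, and the framework's real collectors may gather more data or structure it differently.

```python
import json
import os
import shutil
from datetime import datetime, timezone
from typing import Any, Dict


def collect_disk_usage(path: str) -> Dict[str, int]:
    """Return total, used, and free bytes for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return {"total_bytes": usage.total, "used_bytes": usage.used, "free_bytes": usage.free}


def take_snapshot(env: str, label: str, path: str = ".") -> Dict[str, Any]:
    """Build a JSON-serializable snapshot for one environment and label."""
    return {
        "env": env,
        "label": label,
        "taken_at": datetime.now(timezone.utc).isoformat(),
        "disk_usage": {os.path.abspath(path): collect_disk_usage(path)},
    }


snapshot = take_snapshot("production", "pre-migration")
print(json.dumps(snapshot, indent=2))
```

Keeping every value JSON-serializable means the snapshot can be written to disk unchanged and reloaded later for comparison.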
### Post-Migration Validation
```bash
python cli.py snapshot --env production --label post-migration
python cli.py compare pre-migration post-migration
```
**Validation Checks:**
- Service availability verification
- Filesystem integrity
- Configuration consistency
- Performance metrics comparison
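
The service availability check, for example, can be reduced to a comparison of service states between the two snapshots. The sketch below assumes each snapshot exposes a mapping of service name to state and that `"running"` marks a healthy service; both the data shape and the function name are assumptions.

```python
from typing import Dict, List


def find_service_regressions(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List services that were running before the migration but are not afterwards."""
    regressions: List[str] = []
    for name, state in sorted(before.items()):
        if state == "running" and after.get(name) != "running":
            regressions.append(f"{name}: running -> {after.get(name, 'missing')}")
    return regressions


# Hypothetical service states, for illustration only.
pre_services = {"sshd": "running", "nginx": "running", "cron": "stopped"}
post_services = {"sshd": "running", "nginx": "failed"}
print(find_service_regressions(pre_services, post_services))
```

Services that were already stopped before the migration are ignored, so the result only contains changes the migration could have caused.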
### Report Generation
```bash
python cli.py report --comparison-id <comparison-id> --format html
```
**Report Contents:**
- Executive summary
- Detailed change log
- Risk assessment
- Recommendations
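
As an illustration of the report step, the sketch below renders a summary line and a change table as HTML. The function name and the `item`/`before`/`after` keys are hypothetical, and the real report generator may use a template engine instead. Values are passed through `html.escape` so that collected strings cannot inject markup into the report.

```python
import html
from typing import Dict, List


def render_report(title: str, changes: List[Dict[str, str]]) -> str:
    """Render a minimal HTML report with a summary and one table row per change."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(change["item"]),
            html.escape(change["before"]),
            html.escape(change["after"]),
        )
        for change in changes
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        f"<p>{len(changes)} change(s) detected</p>\n"
        "<table>\n<tr><th>Item</th><th>Before</th><th>After</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )


# Hypothetical change record, for illustration only.
report = render_report(
    "Migration Validation Report",
    [{"item": "service:nginx", "before": "running", "after": "<stopped>"}],
)
print(report)
```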
## Observability Stack Management
### Starting the Stack
```bash
cd observability-stack
docker-compose up -d
```
**Service Startup Order:**
1. Elasticsearch
2. Logstash
3. Kibana
4. Grafana
### Log Ingestion Testing
```bash
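# Send sample logs from observability-stack/ (--data-binary preserves newlines; -d would strip them)
curl -X POST "http://localhost:8080" -H "Content-Type: application/json" --data-binary @logs/sample.log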
```
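
For repeatable ingestion tests, the same request can be scripted so that each log line is sent as its own event. This sketch assumes the endpoint on port 8080 is a Logstash HTTP input that accepts one JSON document per request, and that wrapping each line in a `message` field suits the pipeline; both are assumptions, as are the function names.

```python
import json
import urllib.request
from typing import Iterable, List


def build_events(lines: Iterable[str]) -> List[bytes]:
    """Turn raw log lines into one JSON-encoded event per non-empty line."""
    events: List[bytes] = []
    for line in lines:
        line = line.strip()
        if line:
            events.append(json.dumps({"message": line}).encode("utf-8"))
    return events


def send_events(events: Iterable[bytes], url: str = "http://localhost:8080") -> None:
    """POST each event; urlopen raises URLError if the endpoint is unreachable."""
    for body in events:
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()


# Usage (requires the stack to be running):
#   with open("logs/sample.log") as handle:
#       send_events(build_events(handle))
```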
### Alert Configuration
```bash
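# Load alert rules (the endpoint path depends on the Grafana version; adjust it to match your deployment)
curl -X POST "http://localhost:3000/api/alerts" -H "Authorization: Bearer <token>" -H "Content-Type: application/json" -d @alerting/alert_rules.json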
```
### Incident Simulation
```bash
cd observability-stack
./scenarios/incident_simulation.sh --type disk-full --severity critical
```
**Incident Types:**
- disk-full: Simulate disk space exhaustion
- service-down: Service failure simulation
- high-cpu: CPU utilization spike
- network-latency: Network performance degradation
## Troubleshooting Guide
### Common Issues
#### Ansible Connection Failures
**Symptoms:**
- `UNREACHABLE` errors in Ansible output
- SSH connection timeouts
**Resolution:**
```bash
# Check container status
docker ps | grep infra-sim
# Verify SSH keys
ansible -i inventory/hosts.ini all -m ping --private-key ~/.ssh/id_rsa
# Restart containers
make destroy && make up
```
#### Elasticsearch Cluster Issues
**Symptoms:**
- Kibana shows "No living connections"
- Logstash pipeline failures
**Resolution:**
```bash
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
# Restart services
docker-compose restart elasticsearch logstash kibana
```
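
The cluster health response can also be interpreted in a script before deciding whether a restart is needed. The sketch below reads the `status`, `number_of_nodes`, and `unassigned_shards` fields that the `_cluster/health` API returns; the function name and the message wording are illustrative only.

```python
from typing import Any, Dict


def summarize_health(health: Dict[str, Any]) -> str:
    """Map a _cluster/health response to a one-line summary."""
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return f"OK: {nodes} node(s), all shards assigned"
    if status == "yellow":
        # Yellow: every primary is assigned, but some replicas are not.
        return f"WARN: {nodes} node(s), {unassigned} unassigned replica shard(s)"
    if status == "red":
        # Red: at least one primary shard is unassigned.
        return f"CRIT: {nodes} node(s), {unassigned} unassigned shard(s)"
    return f"UNKNOWN: unexpected status {status!r}"


# Sample response fragment, for illustration only.
print(summarize_health({"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}))
```

A single-node cluster commonly reports yellow because replica shards have no second node to be placed on, so yellow alone does not necessarily call for a restart.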
#### Python Import Errors
**Symptoms:**
- ModuleNotFoundError in migration framework
- Collector failures
**Resolution:**
```bash
# Install dependencies
pip install -r requirements.txt
# Check Python path
python -c "import sys; print(sys.path)"
```
#### Docker Resource Constraints
**Symptoms:**
- Container startup failures
- Out of memory errors
**Resolution:**
```bash
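# Check Docker disk usage
docker system df
# Remove unused containers, networks, and ALL unused images (destructive; review first)
docker system prune -a
# Check per-container memory and CPU usage
docker stats --no-stream
# Raise limits on a running container. daemon.json has no global
# "memory" or "cpu-count" keys, so limits are set per container
# (or in the Docker Desktop resource settings on macOS/Windows).
docker update --memory 4g --memory-swap 4g --cpus 2 <container-name>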
```
### Log Locations
- **Ansible:** `enterprise-infra-simulator/ansible.log`
- **Docker:** `docker logs <container-name>`
- **Elasticsearch:** `observability-stack/logs/elasticsearch.log`
- **Migration Framework:** `migration-validation-framework/logs/validation.log`
### Performance Monitoring
```bash
# Infrastructure monitoring
ansible -i inventory/hosts.ini all -m shell -a "top -b -n1 | head -20"
# Elasticsearch metrics
curl -X GET "localhost:9200/_cluster/stats?pretty"
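# Python performance (run from migration-validation-framework/; output sorted by cumulative time)
python -m cProfile -s cumtime cli.py snapshot --env production --label <label>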
```
### Backup and Recovery
#### Infrastructure Backup
```bash
cd enterprise-infra-simulator
docker-compose exec ansible ansible-playbook /playbooks/backup.yml
```
#### Data Backup
```bash
cd observability-stack
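# Register the repository from the host (the file path resolves on the host, not in the container), then snapshot
curl -X PUT "localhost:9200/_snapshot/backup" -H "Content-Type: application/json" -d @backup_config.json
curl -X PUT "localhost:9200/_snapshot/backup/<snapshot-name>?wait_for_completion=true"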
```
#### Migration Data Backup
```bash
cd migration-validation-framework
python cli.py backup --destination /backup/location
```
## Emergency Procedures
### Complete System Reset
```bash
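# Run from the repository root. Stop all services and remove their volumes (subshells keep the working directory unchanged)
(cd observability-stack && docker-compose down -v)
(cd enterprise-infra-simulator && make destroy)
# Clean up remaining unused volumes (affects every unused volume on this host)
docker volume prune -f
# Restart from a clean state
(cd enterprise-infra-simulator && make up)
(cd observability-stack && docker-compose up -d)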
```
### Incident Response
1. **Assess Impact:** Check monitoring dashboards
2. **Isolate Issue:** Use failure simulation scripts to reproduce
3. **Implement Fix:** Apply appropriate runbook procedure
4. **Validate Recovery:** Run validation framework
5. **Document Incident:** Update runbooks with lessons learned
## Maintenance Schedules
- **Daily:** Log rotation and cleanup
- **Weekly:** Security patching and updates
- **Monthly:** Performance optimization and capacity planning
- **Quarterly:** Architecture review and modernization