feat: Add comprehensive enterprise Linux infrastructure portfolio with Ansible, Python, and ELK stack
CI Pipeline / lint-ansible (push) Waiting to run
CI Pipeline / test-python (push) Waiting to run
CI Pipeline / validate-docker (push) Waiting to run
CI Pipeline / security-scan (push) Waiting to run
CI Pipeline / documentation (push) Waiting to run
CI Pipeline / integration-test (push) Blocked by required conditions
@@ -0,0 +1,147 @@
# Architecture Overview

## Enterprise Infrastructure Portfolio Architecture

This document provides a high-level overview of the architecture and design principles implemented across the three main projects in this portfolio.

## Overall Architecture

```
┌───────────────────────────────────────────────────────────────────────────┐
│                           Enterprise Portfolio                            │
├───────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐ │
│ │ Infra Simulator     │  │ Migration           │  │ Observability       │ │
│ │ (Ansible/Docker     │  │ Validation          │  │ Stack               │ │
│ │  container sim)     │  │ (Python CLI)        │  │ (ELK/Grafana)       │ │
│ └─────────────────────┘  └─────────────────────┘  └─────────────────────┘ │
├───────────────────────────────────────────────────────────────────────────┤
│Infrastructure Simulation  Validation Framework          Monitoring        │
└───────────────────────────────────────────────────────────────────────────┘
```

## Project Architectures

### 1. Enterprise Infrastructure Simulator

**Architecture Pattern:** Container-based Infrastructure Simulation

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Ansible     │    │     Docker      │    │   Simulation    │
│   Controller    │◄──►│   Containers    │◄──►│     Scripts     │
│                 │    │  (Linux Nodes)  │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Inventory    │    │    Playbooks    │    │    Scenarios    │
│   Management    │    │  (Provision/    │    │    (Scaling/    │
│                 │    │   Patch/        │    │     Failures)   │
│                 │    │   Harden/       │    │                 │
│                 │    │   Decommission) │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

**Key Components:**

- **Ansible Controller:** Central orchestration for infrastructure operations
- **Docker Containers:** Simulated Linux nodes with realistic configurations
- **Simulation Scripts:** Automated scaling and failure injection
- **Inventory System:** Dynamic host management and grouping
- **Playbook Library:** Modular automation for different lifecycle phases

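To make the inventory system's role-based grouping concrete, here is a minimal sketch that renders an INI inventory of the kind `ansible -i inventory/hosts.ini` reads. The `SimNode` fields, node names, IP addresses and the `ansible_user` value are hypothetical placeholders, not taken from the simulator's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SimNode:
    """A simulated Linux node (hypothetical fields for illustration)."""
    name: str
    role: str  # e.g. "web", "app", "db"
    ip: str


def render_inventory(nodes: Iterable[SimNode], ssh_user: str = "ansible") -> str:
    """Render an Ansible INI inventory with one group per role."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node.role].append(node)

    lines = []
    for role in sorted(groups):
        lines.append(f"[{role}]")
        for node in sorted(groups[role], key=lambda n: n.name):
            lines.append(f"{node.name} ansible_host={node.ip}")
        lines.append("")

    # Shared connection settings apply to every host via the built-in "all" group.
    lines.append("[all:vars]")
    lines.append(f"ansible_user={ssh_user}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        SimNode("web-01", "web", "172.20.0.11"),
        SimNode("db-01", "db", "172.20.0.21"),
        SimNode("web-02", "web", "172.20.0.12"),
    ]
    print(render_inventory(demo))
```

Grouping by role lets playbooks target `web`, `app` or `db` hosts separately, while `[all:vars]` keeps shared connection settings in one place.
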
### 2. Migration Validation Framework

**Architecture Pattern:** Data Collection and Comparison Pipeline

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  CLI Interface  │    │      Data       │    │   Validation    │
│    (cli.py)     │◄──►│   Collectors    │◄──►│     Engine      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│      JSON       │    │   Comparison    │    │      HTML       │
│    Snapshots    │    │      Logic      │    │     Reports     │
│  (Before/After) │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

**Key Components:**

- **CLI Interface:** Command-line tool for migration workflow orchestration
- **Data Collectors:** Specialized modules for system data extraction
- **Validation Engine:** Snapshot comparison and difference analysis
- **Report Generator:** HTML output with change visualization
- **JSON Storage:** Structured data persistence for before/after states

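The split between collectors and the validation engine can be sketched as a registry of collector functions whose output is bundled into one labelled JSON snapshot. This is an illustrative assumption about how such a pipeline could be wired, not the framework's actual module layout; `collector`, `take_snapshot`, `save_snapshot` and the `<label>.json` file naming are hypothetical.

```python
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Dict

# A collector returns JSON-serialisable data about one aspect of the host.
Collector = Callable[[], Dict[str, Any]]

# Hypothetical registry: snapshot section name -> collector function.
COLLECTORS: Dict[str, Collector] = {}


def collector(name: str) -> Callable[[Collector], Collector]:
    """Decorator that registers a collector under a snapshot section name."""
    def register(func: Collector) -> Collector:
        COLLECTORS[name] = func
        return func
    return register


@collector("system")
def collect_system() -> Dict[str, Any]:
    """Example collector: basic host identity from the standard library."""
    return {"hostname": platform.node(), "kernel": platform.release()}


def take_snapshot(label: str, env: str) -> Dict[str, Any]:
    """Run every registered collector and bundle the results with metadata."""
    return {
        "label": label,
        "env": env,
        "taken_at": datetime.now(timezone.utc).isoformat(),
        "data": {name: func() for name, func in sorted(COLLECTORS.items())},
    }


def save_snapshot(snapshot: Dict[str, Any], directory: Path) -> Path:
    """Write the snapshot to <directory>/<label>.json and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{snapshot['label']}.json"
    path.write_text(json.dumps(snapshot, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_snapshot(take_snapshot("pre-migration", "production"), Path(tmp))
        print(saved.name, json.loads(saved.read_text())["data"]["system"])
```

Adding a new kind of data then only means registering another decorated function; the snapshot format and the comparison step stay unchanged.
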
### 3. Observability Stack

**Architecture Pattern:** Distributed Monitoring and Logging

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Logstash     │    │  Elasticsearch  │    │     Kibana      │
│   (Ingestion)   │◄──►│    (Storage)    │◄──►│ (Visualization) │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         ▲                      ▲                      ▲
         │                      │                      │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Sample Logs   │    │   Alert Rules   │    │     Grafana     │
│ (Data Sources)  │    │  (Conditions)   │    │  (Dashboards)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

**Key Components:**

- **Logstash Pipelines:** Data ingestion and transformation
- **Elasticsearch Cluster:** Distributed search and analytics
- **Kibana Dashboards:** Real-time visualization and exploration
- **Grafana Integration:** Advanced metrics and alerting
- **Alerting Engine:** Automated incident detection and notification

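A minimal sketch of what an alerting engine does with a threshold rule: fire only when a metric stays above its limit for several consecutive samples, similar in spirit to the pending period of a Grafana alert rule. The `AlertRule` fields and the `disk_used_pct` metric name are hypothetical and do not reflect the format of the stack's alert rule files.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical threshold rule: fire when `metric` exceeds `threshold`
    for `for_samples` consecutive samples."""
    name: str
    metric: str
    threshold: float
    for_samples: int
    severity: str = "warning"


def evaluate(rule: AlertRule, samples: Sequence[dict]) -> bool:
    """Return True if the latest `for_samples` readings all exceed the threshold."""
    values: List[float] = [s[rule.metric] for s in samples if rule.metric in s]
    if len(values) < rule.for_samples:
        return False  # not enough data yet to decide
    recent = values[-rule.for_samples:]
    return all(v > rule.threshold for v in recent)


if __name__ == "__main__":
    rule = AlertRule("disk-almost-full", "disk_used_pct", 90.0, for_samples=3, severity="critical")
    readings = [{"disk_used_pct": p} for p in (85.0, 91.5, 93.0, 95.2)]
    print(rule.name, "firing" if evaluate(rule, readings) else "ok")
```

Requiring several consecutive breaches trades a little detection latency for far fewer alerts caused by a single noisy sample.
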
## Design Principles

### Infrastructure as Code

- All infrastructure defined in code (Ansible, Docker Compose, Python)
- Version-controlled configurations and automation
- Reproducible environments and deployments

### Modular Architecture

- Separated concerns across projects and components
- Reusable modules and playbooks
- Clear interfaces between systems

### Enterprise Standards

- Realistic naming conventions and structures
- Production-quality error handling and logging
- Security hardening and compliance considerations

### Observability First

- Comprehensive logging and monitoring
- Automated alerting and incident response
- Performance metrics and health checks

## Technology Stack

- **Containerization:** Docker, Docker Compose
- **Configuration Management:** Ansible
- **Programming Language:** Python 3.8+
- **Monitoring Stack:** ELK Stack (Elasticsearch, Logstash, Kibana)
- **Visualization:** Grafana
- **CI/CD:** Gitea Actions
- **Documentation:** Markdown

## Security Considerations

- Container security scanning integration
- Ansible Vault for secrets management
- Network segmentation in Docker Compose
- Least-privilege access principles
- Audit logging and compliance reporting

## Scalability and Performance

- Horizontal scaling through container orchestration
- Efficient data collection and processing
- Optimized Elasticsearch indexing
- Resource-aware automation scripts
@@ -0,0 +1,329 @@
# Runbooks and Operational Procedures

This document contains operational runbooks for deploying, managing, and troubleshooting the Enterprise Infrastructure Portfolio projects.

## Table of Contents

1. [Infrastructure Simulator Operations](#infrastructure-simulator-operations)
2. [Migration Validation Procedures](#migration-validation-procedures)
3. [Observability Stack Management](#observability-stack-management)
4. [Troubleshooting Guide](#troubleshooting-guide)
5. [Emergency Procedures](#emergency-procedures)
6. [Maintenance Schedules](#maintenance-schedules)

## Infrastructure Simulator Operations

### Starting the Infrastructure

```bash
cd enterprise-infra-simulator
make up
```

**Expected Outcome:**

- Docker containers for simulated Linux nodes are created
- Ansible inventory is populated
- Basic services are running on all nodes

**Verification:**

```bash
docker ps | grep infra-sim
ansible -i inventory/hosts.ini all -m ping
```

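When many nodes are running, the ping output can be summarised programmatically. The sketch below assumes Ansible's default stdout callback, which starts each host's ad-hoc result with a line such as `web-01 | SUCCESS => {` or `db-01 | UNREACHABLE! => {`; other callback plugins format output differently. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Dict, List

# Header line printed per host by Ansible's default ad-hoc output, e.g.
# "web-01 | SUCCESS => {" or "db-01 | UNREACHABLE! => {".
HOST_STATUS = re.compile(r"^(?P<host>\S+) \| (?P<status>[A-Z]+)!? =>", re.MULTILINE)


def parse_ping_output(output: str) -> Dict[str, str]:
    """Map each host to the status word Ansible reported for it."""
    return {m["host"]: m["status"] for m in HOST_STATUS.finditer(output)}


def failed_hosts(output: str) -> List[str]:
    """Hosts whose status is anything other than SUCCESS, sorted by name."""
    return sorted(h for h, s in parse_ping_output(output).items() if s != "SUCCESS")


def run_ping(inventory: str = "inventory/hosts.ini") -> str:
    """Run the verification command from this runbook and return its stdout.

    check=False because ansible exits non-zero when any host fails, and the
    output is still worth parsing in that case.
    """
    result = subprocess.run(
        ["ansible", "-i", inventory, "all", "-m", "ping"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Run from `enterprise-infra-simulator/`, `failed_hosts(run_ping())` returns the nodes that need the Ansible connection troubleshooting steps.
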
### Patching Operations

```bash
cd enterprise-infra-simulator
make patch
```

**Procedure:**

1. Back up current container states
2. Apply security patches via Ansible
3. Validate service availability
4. Generate patch report

**Rollback:**

```bash
# Recreate the nodes from their images; patches applied inside the running
# containers are discarded (data kept in named volumes is not)
cd enterprise-infra-simulator
docker-compose down
make up
```

### Hardening Operations

```bash
cd enterprise-infra-simulator
ansible-playbook -i inventory/hosts.ini playbooks/harden.yml
```

**Hardening Steps:**

- Disable unnecessary services
- Configure firewall rules
- Set secure SSH configurations
- Apply CIS benchmarks

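As an illustration of checking the SSH part of the hardening step, this sketch audits `sshd_config` text against a small baseline. The baseline values are illustrative hardening choices, not the contents of `playbooks/harden.yml` or an authoritative CIS mapping. It relies on the documented `sshd_config` rule that the first value obtained for a keyword wins; for simplicity it stops at the first `Match` block and does not follow `Include` directives.

```python
from typing import Dict, List

# Illustrative SSH hardening baseline (keyword -> required value, lower-case).
SSH_BASELINE: Dict[str, str] = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "x11forwarding": "no",
    "maxauthtries": "4",
}


def parse_sshd_config(text: str) -> Dict[str, str]:
    """Return the effective global value of each keyword.

    sshd uses the first value obtained for a keyword, so later duplicates are
    ignored. Parsing stops at the first Match block, whose settings only
    apply conditionally.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)
    return settings


def audit_sshd_config(text: str, baseline: Dict[str, str] = SSH_BASELINE) -> List[str]:
    """Describe every baseline keyword whose effective value differs."""
    current = parse_sshd_config(text)
    findings = []
    for keyword, expected in baseline.items():
        # "<unset>" means the compiled-in default applies; flag it for review.
        actual = current.get(keyword, "<unset>")
        if actual != expected:
            findings.append(f"{keyword}: expected {expected}, found {actual}")
    return findings
```
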
### Scaling Operations

```bash
cd enterprise-infra-simulator
./scripts/simulate_scaling.sh up 3
```

**Scaling Parameters:**

- Direction: up/down
- Count: number of nodes to add/remove
- Type: web/app/db

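How a scaling request could translate into concrete container names is sketched below. The `infra-sim-<type>-<index>` naming scheme is an assumption (suggested only by the `grep infra-sim` check in this runbook), and `plan_scaling` is a hypothetical helper, not part of `simulate_scaling.sh`.

```python
import re
from typing import List, Tuple

# Hypothetical container naming scheme: infra-sim-<type>-<index>
NAME_PATTERN = re.compile(r"^infra-sim-(?P<type>[a-z]+)-(?P<index>\d+)$")


def plan_scaling(
    existing: List[str], direction: str, count: int, node_type: str
) -> Tuple[List[str], List[str]]:
    """Return (names_to_create, names_to_remove) for one scaling request.

    Scaling up continues numbering after the highest existing index; scaling
    down removes the highest-numbered (newest) nodes of that type first.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    if count < 1:
        raise ValueError("count must be a positive integer")

    indices = sorted(
        int(match["index"])
        for match in map(NAME_PATTERN.match, existing)
        if match and match["type"] == node_type
    )
    if direction == "up":
        start = (indices[-1] if indices else 0) + 1
        return [f"infra-sim-{node_type}-{i}" for i in range(start, start + count)], []

    if count > len(indices):
        raise ValueError(f"cannot remove {count} {node_type} nodes; only {len(indices)} exist")
    return [], [f"infra-sim-{node_type}-{i}" for i in reversed(indices[-count:])]
```

Removing the newest nodes first keeps the longest-running ones, which is usually the safer default when scaling down.
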
### Failure Simulation

```bash
cd enterprise-infra-simulator
./scripts/simulate_failure.sh --type network --duration 300
```

**Failure Types:**

- network: Network partition
- disk: Disk space exhaustion
- service: Service crashes
- node: Complete node failure

### Decommissioning

```bash
cd enterprise-infra-simulator
make destroy
```

**Decommission Steps:**

1. Graceful service shutdown
2. Data backup and export
3. Configuration cleanup
4. Container removal

## Migration Validation Procedures

### Pre-Migration Snapshot

```bash
cd migration-validation-framework
python cli.py snapshot --env production --label pre-migration
```

**Data Collected:**

- Mount points and filesystem usage
- Running services and their states
- Disk usage statistics
- Network configurations

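A sketch of a filesystem collector for the mount-point and disk-usage data listed above, assuming a Linux host where `/proc/mounts` is readable. The set of pseudo-filesystems to skip is an illustrative choice, and the function names are hypothetical rather than the framework's own collectors.

```python
import shutil
from typing import Dict, List, Optional

# Kernel pseudo-filesystems that hold no user data (illustrative list).
PSEUDO_FS = {"proc", "sysfs", "devtmpfs", "devpts", "cgroup", "cgroup2",
             "mqueue", "securityfs", "debugfs", "tracefs", "pstore", "bpf"}


def parse_mounts(text: str) -> List[Dict[str, object]]:
    """Parse /proc/mounts-style lines: device, mount point, type, options, ..."""
    mounts: List[Dict[str, object]] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] in PSEUDO_FS:
            continue
        mounts.append({
            "device": fields[0],
            # The kernel writes a space inside a path as the octal escape \040.
            "mountpoint": fields[1].replace("\\040", " "),
            "fstype": fields[2],
            "options": fields[3].split(","),
        })
    return mounts


def usage_of(mountpoint: str) -> Optional[Dict[str, int]]:
    """Total/used/free bytes for a mount point, or None if it cannot be read."""
    try:
        usage = shutil.disk_usage(mountpoint)
    except OSError:
        return None
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def collect_filesystems(mounts_file: str = "/proc/mounts") -> List[Dict[str, object]]:
    """Combine mount table entries with their current disk usage."""
    with open(mounts_file, encoding="utf-8") as fh:
        mounts = parse_mounts(fh.read())
    for entry in mounts:
        entry["usage"] = usage_of(str(entry["mountpoint"]))
    return mounts
```

Keeping parsing separate from the `/proc/mounts` read makes the collector testable on machines that are not the migration target.
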
### Post-Migration Validation

```bash
python cli.py snapshot --env production --label post-migration
python cli.py compare pre-migration post-migration
```

**Validation Checks:**

- Service availability verification
- Filesystem integrity
- Configuration consistency
- Performance metrics comparison

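The comparison step can be sketched as a recursive diff over two snapshots, assuming they are nested JSON objects such as `{"services": {...}, "mounts": {...}}`. Each difference gets a dotted path so it can be reported individually; lists are compared as whole values. `diff_snapshots` is a hypothetical helper, not the framework's comparison logic.

```python
from typing import Any, Dict, List


def diff_snapshots(before: Any, after: Any, path: str = "") -> List[Dict[str, Any]]:
    """Recursively compare two JSON-like structures and list the differences.

    Each difference records a dotted path, a change type (added, removed,
    changed) and the old/new values.
    """
    if isinstance(before, dict) and isinstance(after, dict):
        changes: List[Dict[str, Any]] = []
        for key in sorted(set(before) | set(after)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in after:
                changes.append({"path": sub, "type": "removed", "old": before[key], "new": None})
            elif key not in before:
                changes.append({"path": sub, "type": "added", "old": None, "new": after[key]})
            else:
                changes.extend(diff_snapshots(before[key], after[key], sub))
        return changes
    # Leaves (and lists) are compared as whole values.
    if before != after:
        return [{"path": path, "type": "changed", "old": before, "new": after}]
    return []
```

A service missing after the migration shows up as a `removed` entry, and a state change such as `running` to `stopped` as `changed`, which maps directly onto the service-availability check.
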
### Report Generation

```bash
python cli.py report --comparison-id <comparison-id> --format html
```

**Report Contents:**

- Executive summary
- Detailed change log
- Risk assessment
- Recommendations

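A minimal sketch of rendering the change-log part of an HTML report, assuming changes arrive as `{path, type, old, new}` records. The executive summary is reduced to per-type counts and the risk assessment is left out; `render_report` and the CSS class names are hypothetical.

```python
import html
from typing import Dict, Iterable

# Hypothetical CSS class per change type, for colouring rows.
ROW_CLASS = {"added": "add", "removed": "del", "changed": "chg"}


def render_report(title: str, changes: Iterable[Dict[str, object]]) -> str:
    """Render {path, type, old, new} change records as a standalone HTML page.

    Every value goes through html.escape so data collected from hosts
    (paths, service names) cannot inject markup into the report.
    """
    changes = list(changes)
    counts = {kind: sum(1 for c in changes if c["type"] == kind) for kind in ROW_CLASS}
    summary = ", ".join(f"{n} {kind}" for kind, n in counts.items())
    rows = "\n".join(
        "<tr class=\"{cls}\"><td>{path}</td><td>{type}</td><td>{old}</td><td>{new}</td></tr>".format(
            cls=ROW_CLASS.get(str(c["type"]), ""),
            path=html.escape(str(c["path"])),
            type=html.escape(str(c["type"])),
            old=html.escape(str(c["old"])),
            new=html.escape(str(c["new"])),
        )
        for c in changes
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n<p>Summary: {summary}</p>\n"
        "<table>\n<tr><th>Path</th><th>Change</th><th>Before</th><th>After</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )
```
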
## Observability Stack Management

### Starting the Stack

```bash
cd observability-stack
docker-compose up -d
```

**Service Startup Order:**

1. Elasticsearch
2. Logstash
3. Kibana
4. Grafana

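Because Logstash and Kibana depend on Elasticsearch, a script can wait for the cluster before starting or testing them. This sketch polls `GET /_cluster/health` until the cluster reaches at least `yellow`, assuming plain HTTP on `localhost:9200` as the runbook's `curl` commands do (a cluster with security enabled would also need TLS and credentials). The fetch function is injectable so the polling logic can be tested without a running cluster; all names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Elasticsearch health colours, worst to best.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def fetch_health(base_url: str = "http://localhost:9200") -> Dict[str, object]:
    """GET /_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as response:
        return json.load(response)


def wait_for_cluster(
    fetch: Callable[[], Dict[str, object]] = fetch_health,
    min_status: str = "yellow",
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
) -> str:
    """Poll cluster health until it reaches `min_status`; return the status seen."""
    deadline = time.monotonic() + timeout_s
    last = "unreachable"
    while True:
        try:
            last = str(fetch().get("status", "unknown"))
            if STATUS_RANK.get(last, -1) >= STATUS_RANK[min_status]:
                return last
        except (OSError, ValueError):
            # URLError and refused connections are OSError subclasses; ValueError
            # covers a non-JSON body while the node is still starting.
            last = "unreachable"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster not {min_status} after {timeout_s}s (last: {last})")
        time.sleep(interval_s)
```

`yellow` is the default threshold because a single-node cluster stays `yellow` whenever indices are configured with replicas, which cannot be allocated on the same node as their primaries.
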
### Log Ingestion Testing

```bash
cd observability-stack
# Send sample logs to the Logstash HTTP input
# (--data-binary sends the file unchanged; -d would strip its newlines)
curl -X POST "localhost:8080" -H "Content-Type: application/json" --data-binary @logs/sample.log
```

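If `logs/sample.log` is plain text rather than JSON, each line can be wrapped in a JSON event before sending. This sketch assumes the Logstash `http` input is listening on port 8080 (the plugin's default port) and decodes `application/json` bodies; it sends one event per request. The event field names and helper functions are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Iterable, List


def lines_to_events(lines: Iterable[str], source: str) -> List[dict]:
    """Wrap each non-empty raw log line in a JSON-serialisable event."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        {"message": line.rstrip("\n"), "source": source, "sent_at": now}
        for line in lines
        if line.strip()
    ]


def post_event(event: dict, url: str = "http://localhost:8080") -> int:
    """POST one event as a JSON document and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, open `logs/sample.log`, pass the file object to `lines_to_events` with a `source` label, and call `post_event` for each resulting event.
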
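### Alert Configuration

```bash
cd observability-stack
# Load an alert rule through Grafana's alerting provisioning API
# (assumes alert_rules.json holds one rule in Grafana's provisioning format)
curl -X POST "localhost:3000/api/v1/provisioning/alert-rules" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d @alerting/alert_rules.json
```
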
### Incident Simulation

```bash
cd observability-stack
./scenarios/incident_simulation.sh --type disk-full --severity critical
```

**Incident Types:**

- disk-full: Simulate disk space exhaustion
- service-down: Service failure simulation
- high-cpu: CPU utilization spike
- network-latency: Network performance degradation

## Troubleshooting Guide

### Common Issues

#### Ansible Connection Failures

**Symptoms:**

- `UNREACHABLE` errors in Ansible output
- SSH connection timeouts

**Resolution:**

```bash
# Check container status
docker ps | grep infra-sim

# Verify SSH keys
ansible -i inventory/hosts.ini all -m ping --private-key ~/.ssh/id_rsa

# Restart containers
make destroy && make up
```

#### Elasticsearch Cluster Issues

**Symptoms:**

- Kibana shows "No living connections"
- Logstash pipeline failures

**Resolution:**

```bash
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Restart services
docker-compose restart elasticsearch logstash kibana
```

#### Python Import Errors

**Symptoms:**

- `ModuleNotFoundError` in the migration framework
- Collector failures

**Resolution:**

```bash
# Install dependencies into the interpreter that runs the framework
cd migration-validation-framework
python -m pip install -r requirements.txt

# Check Python path
python -c "import sys; print(sys.path)"
```

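A quick way to see which imports are missing in the interpreter that actually runs the framework is `importlib.util.find_spec`, which returns `None` for top-level modules that cannot be found. The module list in the demo is a hypothetical example; note that import names can differ from package names on PyPI (for example `yaml` is installed as `PyYAML`).

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: replace with the packages the framework actually imports.
    required = ["json", "jinja2", "yaml"]
    missing = missing_modules(required)
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    print("missing:", ", ".join(missing) if missing else "none")
```

Printing `sys.executable` helps catch the common case where `pip` installed the requirements into a different interpreter than the one running `cli.py`.
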
#### Docker Resource Constraints

**Symptoms:**

- Container startup failures
- Out of memory errors

**Resolution:**

```bash
# Check Docker disk usage
docker system df

# Clean up unused resources
# (-a also removes every image not used by a container, so later starts re-pull them)
docker system prune -a

# Check live memory/CPU use per container
docker stats --no-stream

# daemon.json has no global memory or CPU limit options. On Docker Desktop, raise
# the VM's memory under Settings > Resources; on Linux, limit containers individually:
docker update --memory 2g --memory-swap 2g <container-name>
```

### Log Locations

- **Ansible:** `enterprise-infra-simulator/ansible.log`
- **Docker:** `docker logs <container-name>`
- **Elasticsearch:** `observability-stack/logs/elasticsearch.log`
- **Migration Framework:** `migration-validation-framework/logs/validation.log`

### Performance Monitoring

```bash
# Infrastructure monitoring (run from enterprise-infra-simulator/)
ansible -i inventory/hosts.ini all -m shell -a "top -b -n1 | head -20"

# Elasticsearch metrics
curl -X GET "localhost:9200/_cluster/stats?pretty"

# Python performance (run from migration-validation-framework/), sorted by cumulative time
python -m cProfile -s cumulative cli.py snapshot
```

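To profile a single collector instead of the whole CLI run, `cProfile` and `pstats` can be driven from Python. `slow_collector` is a hypothetical stand-in for whichever function is being investigated.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top: int = 10, **kwargs):
    """Run func under cProfile and return (result, report), where report lists
    the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_collector(n: int) -> int:
    """Hypothetical stand-in for a collector; replace with the real function."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(slow_collector, 200_000)
    print(value)
    print(report)
```
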
### Backup and Recovery

#### Infrastructure Backup

```bash
cd enterprise-infra-simulator
docker-compose exec ansible ansible-playbook /playbooks/backup.yml
```

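#### Data Backup

```bash
cd observability-stack
# Register the snapshot repository. Run curl from the host so it can read
# backup_config.json; the repository location must be listed in path.repo.
curl -X PUT "localhost:9200/_snapshot/backup" -H "Content-Type: application/json" -d @backup_config.json

# Take a snapshot in the registered repository
curl -X PUT "localhost:9200/_snapshot/backup/<snapshot-name>?wait_for_completion=true"
```
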
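The runbook does not show the contents of `backup_config.json`; for a shared-filesystem repository, Elasticsearch expects a body of the form `{"type": "fs", "settings": {"location": ...}}`. This sketch generates that body and a timestamped snapshot name. The `/usr/share/elasticsearch/backups` location and the `nightly` prefix are hypothetical, and the location must also be listed under `path.repo` in the Elasticsearch configuration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def repository_config(location: str) -> dict:
    """Request body for PUT _snapshot/<repository> with a shared-filesystem repository.

    `location` must be a path listed under path.repo in elasticsearch.yml.
    """
    return {"type": "fs", "settings": {"location": location}}


def snapshot_name(prefix: str = "nightly", when: Optional[datetime] = None) -> str:
    """Timestamped snapshot name; Elasticsearch requires lower-case snapshot names."""
    when = when or datetime.now(timezone.utc)
    return f"{prefix.lower()}-{when:%Y.%m.%d-%H%M%S}"


def write_repository_config(path: Path, location: str) -> Path:
    """Write the repository body to a file such as backup_config.json."""
    path.write_text(json.dumps(repository_config(location), indent=2) + "\n")
    return path


if __name__ == "__main__":
    print(json.dumps(repository_config("/usr/share/elasticsearch/backups")))
    print(snapshot_name())
```

A name from `snapshot_name()` can be used as the final path segment when creating a snapshot with `PUT _snapshot/backup/<name>`.
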
#### Migration Data Backup

```bash
cd migration-validation-framework
python cli.py backup --destination /backup/location
```

## Emergency Procedures

### Complete System Reset

```bash
# Run from the portfolio root; subshells keep each `cd` from affecting the next
# command. WARNING: `down -v` and `volume prune` permanently delete stored data.

# Stop all services
(cd observability-stack && docker-compose down -v)
(cd enterprise-infra-simulator && make destroy)

# Clean up volumes
docker volume prune -f

# Restart from clean state
(cd enterprise-infra-simulator && make up)
(cd observability-stack && docker-compose up -d)
```

### Incident Response

1. **Assess Impact:** Check monitoring dashboards
2. **Isolate Issue:** Use failure simulation scripts to reproduce
3. **Implement Fix:** Apply the appropriate runbook procedure
4. **Validate Recovery:** Run the validation framework
5. **Document Incident:** Update runbooks with lessons learned

## Maintenance Schedules

- **Daily:** Log rotation and cleanup
- **Weekly:** Security patching and updates
- **Monthly:** Performance optimization and capacity planning
- **Quarterly:** Architecture review and modernization