@@ -1,6 +1,6 @@
# Enterprise Infrastructure Simulator Makefile

.PHONY: help up down patch destroy status logs clean test
.PHONY: help run demo up down patch destroy status logs clean test

# Default target
help: ## Show this help message
@@ -9,6 +9,13 @@ help: ## Show this help message
	@echo "Available commands:"
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf " %-15s %s\n", $$1, $$2}'

run: ## Run the default simulator workflow
	ansible-playbook -i inventory/hosts.ini playbooks/provision.yml

demo: ## Run a failure-and-patch demonstration
	./scripts/simulate_failure.sh service 30 web
	ansible-playbook -i inventory/hosts.ini playbooks/patch.yml

# Infrastructure management
up: ## Start the infrastructure simulation
	@echo "Starting enterprise infrastructure simulation..."
@@ -144,7 +151,7 @@ format: ## Format code and configuration
# Security
harden: ## Apply security hardening
	@echo "Applying security hardening..."
	ansible-playbook -i inventory/hosts.ini playbooks/harden.yml
	ansible-playbook -i inventory/hosts.ini playbooks/hardening.yml

security-scan: ## Run security scans
	@echo "Running security scans..."
@@ -163,4 +170,4 @@ help-failure: ## Show failure simulation commands
	@echo " make fail-network DURATION=60 - Network failure for 60s"
	@echo " make fail-disk DURATION=120 - Disk exhaustion for 120s"
	@echo " make fail-service DURATION=30 - Service failure for 30s"
	@echo " make fail-node DURATION=300 - Node failure for 300s"
	@echo " make fail-node DURATION=300 - Node failure for 300s"

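The `help` target builds its command list by scanning the Makefile for `target: ## description` lines. The sketch below reproduces that pipeline as a standalone bash function so you can try it against any file. The function name `make_help` is hypothetical. Inside a recipe, Make requires `$$` for a literal `$`, so the sketch writes a plain `$` instead. The sketch also uses a greedy `.*` where the recipe has `.*?`, because POSIX extended regular expressions do not define `*?` as a lazy quantifier.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the Makefile `help` recipe outside of make.

make_help() {
  local makefile="$1"
  # 1. Keep lines shaped like "target: ... ## description". Recipe lines start
  #    with a tab, so they never match the leading [a-zA-Z_-]+ pattern.
  # 2. Sort the matches, which orders them by target name.
  # 3. Split each line on ":<anything>## " and print an aligned two-column table.
  grep -E '^[a-zA-Z_-]+:.*## .*$' "$makefile" \
    | sort \
    | awk 'BEGIN { FS = ":.*## " } { printf " %-15s %s\n", $1, $2 }'
}

# Usage, from the repository root:
# make_help Makefile
```

Because `sort` runs before `awk`, the new `demo` and `run` targets appear in alphabetical order, not in the order they are declared in the Makefile.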
@@ -1,268 +1,74 @@
# Enterprise Infrastructure Simulator

A container-based simulation environment for enterprise Linux infrastructure operations. This project provides Ansible automation for provisioning, patching, hardening, and decommissioning of simulated Linux nodes, along with scripts for scaling and failure simulation.
## Problem Statement

## Overview
Infrastructure teams need a safe place to rehearse lifecycle operations before applying them to production fleets. Patch windows, hardening changes, scale events, and node failures all carry operational risk when they are tested only during real incidents.

The Enterprise Infrastructure Simulator creates a realistic environment for testing and demonstrating infrastructure automation at scale. It uses Docker containers to simulate multiple Linux nodes and provides comprehensive Ansible playbooks for enterprise operations.
## Solution Overview

## Architecture
This project models common Linux infrastructure operations with Ansible playbooks and shell-based simulations. It keeps the automation readable and auditable while producing example evidence that resembles a real change record.

- **Container Simulation:** Docker-based Linux nodes with realistic configurations
- **Ansible Automation:** Modular playbooks for infrastructure lifecycle management
- **Dynamic Inventory:** Automated host discovery and grouping
- **Simulation Scripts:** Automated scaling and failure injection
- **Scenario Management:** Pre-defined operational scenarios
## Architecture Overview

## Quick Start
```
Operator -> Make/CLI -> Ansible Inventory -> Playbooks -> Linux Nodes
| |
v v
Scenarios Reports/Logs
```

### Prerequisites
Core components:

- Docker and Docker Compose
- Ansible 2.9+
- Make
- `inventory/hosts.ini` defines managed node groups.
- `playbooks/` contains provisioning, patching, hardening, and decommissioning workflows.
- `scripts/` injects scaling and failure conditions.
- `scenarios/` documents operational exercises.
- `examples/` stores representative outputs for review.

### Setup
## How to Run

```bash
# Clone and navigate to project
cd enterprise-infra-simulator

# Start the infrastructure
make up
# Validate playbook syntax.
make test

# Verify deployment
ansible -i inventory/hosts.ini all -m ping
```
# Provision the simulated estate.
make run

## Available Operations

### Infrastructure Management

```bash
# Provision new nodes
ansible-playbook -i inventory/hosts.ini playbooks/provision.yml

# Apply security patches
# Apply security patches.
make patch

# Harden systems
ansible-playbook -i inventory/hosts.ini playbooks/harden.yml
# Apply host hardening.
make harden

# Decommission nodes
ansible-playbook -i inventory/hosts.ini playbooks/decommission.yml

# Destroy infrastructure
make destroy
# Run the failure and patch demo.
make demo
```

### Simulation Operations
Direct Ansible commands are also supported:

```bash
# Scale up infrastructure
./scripts/simulate_scaling.sh up 5

# Simulate network failure
./scripts/simulate_failure.sh --type network --duration 300

# Run operational scenario
ansible-playbook -i inventory/hosts.ini scenarios/scaling_event.yml
ansible-playbook -i inventory/hosts.ini playbooks/provision.yml
ansible-playbook -i inventory/hosts.ini playbooks/patch.yml
ansible-playbook -i inventory/hosts.ini playbooks/hardening.yml
```

## Project Structure
## Example Output

```
enterprise-infra-simulator/
├── inventory/              # Ansible inventory files
│   └── hosts.ini           # Dynamic host inventory
├── playbooks/              # Ansible automation playbooks
│   ├── provision.yml       # Node provisioning
│   ├── patch.yml           # Security patching
│   ├── harden.yml          # Security hardening
│   └── decommission.yml    # Node decommissioning
├── scripts/                # Simulation and utility scripts
│   ├── simulate_scaling.sh # Infrastructure scaling
│   └── simulate_failure.sh # Failure injection
├── scenarios/              # Operational scenarios
│   └── scaling_event.yml   # Scaling scenario
├── docker-compose.yml      # Container orchestration
├── Makefile                # Build automation
└── README.md
```text
PLAY RECAP *********************************************************************
web01 : ok=21 changed=7 unreachable=0 failed=0 skipped=3 rescued=0 ignored=1
db01 : ok=18 changed=4 unreachable=0 failed=0 skipped=5 rescued=0 ignored=1
lb01 : ok=16 changed=3 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0

Patch status: SUCCESS
Updates applied: 12
Reboot required: false
```

## Inventory Management
Additional sample evidence is available in [examples/patch-output.txt](examples/patch-output.txt) and [examples/failure-simulation.txt](examples/failure-simulation.txt).

The simulator uses dynamic inventory with the following groups:
## Real-World Use Case

- `webservers`: Web application servers
- `databases`: Database servers
- `loadbalancers`: Load balancing infrastructure
- `monitoring`: Monitoring and logging servers

## Playbooks

### Provision Playbook
- Creates Docker containers with base Linux configurations
- Installs required packages and services
- Configures basic networking and security
- Registers nodes in inventory

### Patch Playbook
- Updates system packages
- Applies security patches
- Restarts services as needed
- Generates patch reports

### Harden Playbook
- Implements CIS security benchmarks
- Configures firewall rules
- Hardens SSH configuration
- Disables unnecessary services

### Decommission Playbook
- Gracefully stops services
- Exports configuration and data
- Removes containers
- Cleans up inventory

## Simulation Scripts

### Scaling Simulation
```bash
./scripts/simulate_scaling.sh [up|down] [count] [type]
```

Parameters:
- `direction`: up/down
- `count`: Number of nodes to add/remove
- `type`: Node type (web/db/lb/monitor)

### Failure Simulation
```bash
./scripts/simulate_failure.sh --type [failure_type] --duration [seconds]
```

Failure Types:
- `network`: Network connectivity issues
- `disk`: Disk space exhaustion
- `service`: Service failures
- `node`: Complete node outages

## Scenarios

Pre-defined operational scenarios for testing:

- **Scaling Event:** Automated scaling during traffic spikes
- **Disaster Recovery:** Node failure and recovery procedures
- **Maintenance Window:** Scheduled patching and updates
- **Security Incident:** Breach simulation and response

## Configuration

### Environment Variables

```bash
# Number of initial nodes
INFRA_NODE_COUNT=3

# Node types to deploy
INFRA_NODE_TYPES=web,db,lb

# Simulation parameters
SIMULATION_DURATION=3600
SIMULATION_INTENSITY=medium
```

### Docker Configuration

Container resources and networking are configured in `docker-compose.yml`:

```yaml
services:
  infra-node:
    image: ubuntu:20.04
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
```

## Monitoring and Logging

- Ansible execution logs: `ansible.log`
- Container logs: `docker logs <container-name>`
- Simulation logs: `logs/simulation.log`

## Troubleshooting

### Common Issues

**Ansible Connection Failures:**
```bash
# Check container status
docker ps | grep infra-sim

# Verify SSH connectivity
ansible -i inventory/hosts.ini all -m ping
```

**Container Resource Issues:**
```bash
# Check Docker resources
docker system df

# Clean up containers
docker system prune
```

**Simulation Script Errors:**
```bash
# Check script permissions
chmod +x scripts/*.sh

# Verify dependencies
./scripts/simulate_failure.sh --help
```

## Development

### Adding New Playbooks

1. Create playbook in `playbooks/` directory
2. Follow Ansible best practices
3. Test with `--check` mode
4. Update documentation

### Custom Scenarios

1. Define scenario in `scenarios/` directory
2. Include required variables
3. Test with dry-run
4. Document operational procedures

## Security Considerations

- Containers run with limited privileges
- SSH keys are generated per deployment
- Firewall rules are applied automatically
- Security scanning integrated in CI/CD

## Performance Optimization

- Container resource limits prevent resource exhaustion
- Ansible parallel execution for faster operations
- Efficient failure simulation without full outages
- Optimized Docker layer caching

## Contributing

1. Follow existing code structure and naming conventions
2. Add comprehensive documentation
3. Include tests for new functionality
4. Update runbooks for operational changes

## License

Enterprise Internal Use Only
A platform team can use this project to demonstrate how routine operating procedures are encoded, reviewed, and tested before production change windows. The same patterns apply to regulated Linux estates where patch evidence, hardening controls, and incident drills must be repeatable.

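The inventory is described as static host groups in `inventory/hosts.ini`, and the removed README text names groups such as `webservers`, `databases`, and `loadbalancers`. The file itself is not part of this commit. The sketch below therefore assumes a plain INI layout with one host alias per line, using the host names `web01`, `db01` and `lb01` from the example output, and lists the members of one group. `hosts_in_group` is a hypothetical helper, not part of the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the host aliases listed under one [group] section
# of an INI-style Ansible inventory. Assumes one host per line, alias first.

hosts_in_group() {
  local inventory="$1" group="$2"
  awk -v want="[$group]" '
    NF == 0 { next }                          # skip blank lines
    $1 ~ /^[#;]/ { next }                     # skip comments
    /^\[/ { in_group = ($1 == want); next }   # section header: [name], [name:vars], ...
    in_group { print $1 }                     # first field is the host alias
  ' "$inventory"
}

# Example inventory using the host names from the sample output (layout assumed).
example="$(mktemp)"
cat > "$example" <<'EOF'
[webservers]
web01

[databases]
db01

[loadbalancers]
lb01
EOF
hosts_in_group "$example" webservers   # prints: web01
rm -f "$example"
```

Sections such as `[webservers:vars]` count as separate headers, so their variable lines are not reported as hosts. For anything beyond a quick check, `ansible-inventory -i inventory/hosts.ini --graph` asks Ansible itself to resolve the groups.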
@@ -0,0 +1,30 @@
# Enterprise Infrastructure Simulator Architecture

## Components

- Operator interface: `make` targets and direct Ansible commands.
- Inventory: static host groups in `inventory/hosts.ini`.
- Automation: lifecycle playbooks in `playbooks/`.
- Simulation scripts: controlled failure and scaling events in `scripts/`.
- Evidence: logs, reports, scenario notes, and examples.

## Data Flow

```
Operator
-> Make target or shell script
-> Ansible inventory
-> lifecycle playbook
-> managed Linux node
-> log/report artifact
```

Failure drills follow a parallel flow:

```
Operator -> simulate_failure.sh -> target node/service -> health check -> patch/hardening playbook -> evidence
```

## Notes

The project favors explicit playbooks over hidden orchestration so the operational intent is visible during review. In a production implementation, the same workflows would typically run from a CI runner or automation controller with credentials supplied by a secret manager.
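The data flow ends in a log or report artifact, and the notes expect the same workflows to run from a CI runner. The minimal bash sketch below covers that last step. It assumes a hypothetical `run_with_evidence` wrapper and a `logs/` directory (the removed README mentions `logs/simulation.log`). The wrapper timestamps each run, mirrors its output to the terminal and returns the playbook's own exit status, so a CI job still fails when the playbook does.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one lifecycle step, append a timestamped record of
# its output to a log file, and return the step's own exit status.

run_with_evidence() {
  local logfile="$1"
  shift
  mkdir -p "$(dirname "$logfile")"
  {
    printf '%s - Running: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
    "$@" 2>&1
  } | tee -a "$logfile"
  # Without this, the function would return tee's status (normally 0).
  return "${PIPESTATUS[0]}"
}

# Usage (needs Ansible and the project inventory):
# run_with_evidence "logs/patch-$(date +%F).log" \
#   ansible-playbook -i inventory/hosts.ini playbooks/patch.yml
```

`PIPESTATUS` is bash-specific and must be read immediately after the pipeline. In a CI runner, the same effect could come from `set -o pipefail`.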
@@ -0,0 +1,8 @@
2026-04-29 02:13:41 - Starting failure simulation: service 30 web
2026-04-29 02:13:41 - Simulating service failures on containers: web
2026-04-29 02:13:42 - Stopping services in container enterprise-web-1
2026-04-29 02:13:44 - Health probe failed: http://web01/health returned 503
2026-04-29 02:14:12 - Cleaning up failure simulation
2026-04-29 02:14:13 - Restarted nginx in enterprise-web-1
2026-04-29 02:14:18 - Health probe recovered: http://web01/health returned 200
2026-04-29 02:14:18 - Failure simulation completed successfully
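In the log above, the outage runs from the `Health probe failed` line to the `Health probe recovered` line. The hypothetical `outage_seconds` helper below computes that window. It assumes every line starts with `YYYY-MM-DD HH:MM:SS` and that the drill does not cross midnight. For the log above the window is 34 seconds (02:13:44 to 02:14:18).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the seconds between the first "Health probe failed"
# line and the next "Health probe recovered" line. Assumes each line starts with
# "YYYY-MM-DD HH:MM:SS" and that the drill does not cross midnight.

outage_seconds() {
  awk '
    function to_secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /Health probe failed/    && !down { down = 1; start = to_secs($2) }
    /Health probe recovered/ && down  { print to_secs($2) - start; found = 1; exit }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Usage: outage_seconds examples/failure-simulation.txt
```

The helper exits non-zero when no recovery line follows the failure. This lets a drill script flag a service that never came back.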
@@ -0,0 +1,33 @@
PLAY [Apply Security Patches and Updates] **************************************

TASK [Update package cache] *****************************************************
changed: [web01]
changed: [db01]
ok: [lb01]

TASK [Check for available updates] **********************************************
ok: [web01] => {"stdout": "9"}
ok: [db01] => {"stdout": "4"}
ok: [lb01] => {"stdout": "0"}

TASK [Apply security updates only] **********************************************
changed: [web01]
changed: [db01]
ok: [lb01]

TASK [Verify critical services] *************************************************
ok: [web01] => (item=systemd-journald)
ok: [web01] => (item=cron)
ok: [db01] => (item=systemd-journald)
ok: [lb01] => (item=cron)

PLAY RECAP *********************************************************************
web01 : ok=19 changed=6 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1
db01 : ok=18 changed=5 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1
lb01 : ok=15 changed=1 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0

Patch report
Status: SUCCESS
Window: 02:00-04:00 UTC
Reboot required: false
Notification: infra-team@example.com
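A patch run should finish without unreachable or failed hosts. The sketch below checks this automatically. It assumes the recap format shown above (`host : key=value ...`), scans the `PLAY RECAP` section and fails if any host reports `unreachable` or `failed` greater than zero. `recap_is_clean` is a hypothetical name. The sketch deliberately tolerates `ignored=`, which counts task failures the playbook chose to ignore (as on `web01` and `db01` above).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if no host in the PLAY RECAP section
# reports unreachable>0 or failed>0. Offending "host: key=value" pairs are printed.

recap_is_clean() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && $2 == ":" {
      for (i = 3; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "unreachable" || kv[1] == "failed") && kv[2] + 0 > 0) {
          printf "%s: %s\n", $1, $i
          bad = 1
        }
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage: recap_is_clean examples/patch-output.txt && echo "no unreachable or failed hosts"
```

The check keys on the standalone `:` field, so the trailing `Patch report` lines (for example `Status: SUCCESS`) are skipped.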
@@ -0,0 +1,21 @@
# Scenario: Simulate Failure and Patch

## Description

Validate that a service-level failure can be detected, recovered, and followed by a controlled patch workflow. This mirrors a maintenance window where a degraded node is stabilized before package updates are applied.

## Commands

```bash
cd enterprise-infra-simulator
./scripts/simulate_failure.sh service 30 web
ansible-playbook -i inventory/hosts.ini playbooks/patch.yml
ansible-playbook -i inventory/hosts.ini playbooks/hardening.yml --check
```

## Expected Result

- The simulation records a temporary service failure.
- The service is restored after cleanup.
- The patch playbook completes without unreachable hosts.
- Hardening check mode reports no destructive changes.
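The scenario and the new `demo` target call `simulate_failure.sh` with three positional arguments: failure type, duration in seconds and target. The older README text documents `--type`/`--duration` flags instead. The script itself is not part of this commit, so the following is only a hypothetical sketch of validating the positional form. The function name `parse_failure_args` is an assumption. The accepted types come from the `make help-failure` output.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: validate the positional arguments used by `make demo`
# (./scripts/simulate_failure.sh service 30 web). Not the project's real script.

FAILURE_TYPES="network disk service node"   # types listed by `make help-failure`

parse_failure_args() {
  if [ "$#" -ne 3 ]; then
    echo "usage: simulate_failure.sh <type> <duration-seconds> <target>" >&2
    return 2
  fi
  local ftype="$1" duration="$2" target="$3"

  # Reject failure types that `make help-failure` does not advertise.
  case " $FAILURE_TYPES " in
    *" $ftype "*) ;;
    *) echo "unknown failure type: $ftype" >&2; return 2 ;;
  esac

  # Duration must be a whole number of seconds greater than zero.
  case "$duration" in
    ''|*[!0-9]*) echo "invalid duration: $duration" >&2; return 2 ;;
  esac
  if [ "$duration" -le 0 ]; then
    echo "invalid duration: $duration" >&2
    return 2
  fi

  # Same wording as the first line of the sample failure log.
  echo "Starting failure simulation: $ftype $duration $target"
}
```

For example, `parse_failure_args service 30 web` prints `Starting failure simulation: service 30 web`. Unknown types, non-numeric or zero durations, and missing arguments are rejected with exit status 2.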