Polish infrastructure portfolio projects
ci / validate (push) Waiting to run

This commit is contained in:
Mateusz Suski
2026-04-29 23:30:30 +00:00
parent b0537b4bff
commit 8783892241
34 changed files with 762 additions and 1226 deletions
+10 -3
View File
@@ -1,6 +1,6 @@
# Enterprise Infrastructure Simulator Makefile
.PHONY: help up down patch destroy status logs clean test
.PHONY: help run demo up down patch destroy status logs clean test
# Default target
help: ## Show this help message
@@ -9,6 +9,13 @@ help: ## Show this help message
@echo "Available commands:"
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf " %-15s %s\n", $$1, $$2}'
run: ## Run the default simulator workflow
ansible-playbook -i inventory/hosts.ini playbooks/provision.yml
demo: ## Run a failure-and-patch demonstration
./scripts/simulate_failure.sh service 30 web
ansible-playbook -i inventory/hosts.ini playbooks/patch.yml
# Infrastructure management
up: ## Start the infrastructure simulation
@echo "Starting enterprise infrastructure simulation..."
@@ -144,7 +151,7 @@ format: ## Format code and configuration
# Security
harden: ## Apply security hardening
@echo "Applying security hardening..."
ansible-playbook -i inventory/hosts.ini playbooks/harden.yml
ansible-playbook -i inventory/hosts.ini playbooks/hardening.yml
security-scan: ## Run security scans
@echo "Running security scans..."
@@ -163,4 +170,4 @@ help-failure: ## Show failure simulation commands
@echo " make fail-network DURATION=60 - Network failure for 60s"
@echo " make fail-disk DURATION=120 - Disk exhaustion for 120s"
@echo " make fail-service DURATION=30 - Service failure for 30s"
@echo " make fail-node DURATION=300 - Node failure for 300s"
@echo " make fail-node DURATION=300 - Node failure for 300s"
+44 -238
View File
@@ -1,268 +1,74 @@
# Enterprise Infrastructure Simulator
A container-based simulation environment for enterprise Linux infrastructure operations. This project provides Ansible automation for provisioning, patching, hardening, and decommissioning of simulated Linux nodes, along with scripts for scaling and failure simulation.
## Problem Statement
## Overview
Infrastructure teams need a safe place to rehearse lifecycle operations before applying them to production fleets. Patch windows, hardening changes, scale events, and node failures all carry operational risk when they are tested only during real incidents.
The Enterprise Infrastructure Simulator creates a realistic environment for testing and demonstrating infrastructure automation at scale. It uses Docker containers to simulate multiple Linux nodes and provides comprehensive Ansible playbooks for enterprise operations.
## Solution Overview
## Architecture
This project models common Linux infrastructure operations with Ansible playbooks and shell-based simulations. It keeps the automation readable and auditable while producing example evidence that resembles a real change record.
- **Container Simulation:** Docker-based Linux nodes with realistic configurations
- **Ansible Automation:** Modular playbooks for infrastructure lifecycle management
- **Dynamic Inventory:** Automated host discovery and grouping
- **Simulation Scripts:** Automated scaling and failure injection
- **Scenario Management:** Pre-defined operational scenarios
## Architecture Overview
## Quick Start
```
Operator -> Make/CLI -> Ansible Inventory -> Playbooks -> Linux Nodes
| |
v v
Scenarios Reports/Logs
```
### Prerequisites
Core components:
- Docker and Docker Compose
- Ansible 2.9+
- Make
- `inventory/hosts.ini` defines managed node groups.
- `playbooks/` contains provisioning, patching, hardening, and decommissioning workflows.
- `scripts/` injects scaling and failure conditions.
- `scenarios/` documents operational exercises.
- `examples/` stores representative outputs for review.
### Setup
## How to Run
```bash
# Clone and navigate to project
cd enterprise-infra-simulator
# Start the infrastructure
make up
# Validate playbook syntax.
make test
# Verify deployment
ansible -i inventory/hosts.ini all -m ping
```
# Provision the simulated estate.
make run
## Available Operations
### Infrastructure Management
```bash
# Provision new nodes
ansible-playbook -i inventory/hosts.ini playbooks/provision.yml
# Apply security patches
# Apply security patches.
make patch
# Harden systems
ansible-playbook -i inventory/hosts.ini playbooks/harden.yml
# Apply host hardening.
make harden
# Decommission nodes
ansible-playbook -i inventory/hosts.ini playbooks/decommission.yml
# Destroy infrastructure
make destroy
# Run the failure and patch demo.
make demo
```
### Simulation Operations
Direct Ansible commands are also supported:
```bash
# Scale up infrastructure
./scripts/simulate_scaling.sh up 5
# Simulate network failure
./scripts/simulate_failure.sh --type network --duration 300
# Run operational scenario
ansible-playbook -i inventory/hosts.ini scenarios/scaling_event.yml
ansible-playbook -i inventory/hosts.ini playbooks/provision.yml
ansible-playbook -i inventory/hosts.ini playbooks/patch.yml
ansible-playbook -i inventory/hosts.ini playbooks/hardening.yml
```
## Project Structure
## Example Output
```
enterprise-infra-simulator/
├── inventory/ # Ansible inventory files
│ └── hosts.ini # Dynamic host inventory
├── playbooks/ # Ansible automation playbooks
│ ├── provision.yml # Node provisioning
│ ├── patch.yml # Security patching
│ ├── harden.yml # Security hardening
│ └── decommission.yml # Node decommissioning
├── scripts/ # Simulation and utility scripts
│ ├── simulate_scaling.sh # Infrastructure scaling
│ └── simulate_failure.sh # Failure injection
├── scenarios/ # Operational scenarios
│ └── scaling_event.yml # Scaling scenario
├── docker-compose.yml # Container orchestration
├── Makefile # Build automation
└── README.md
```text
PLAY RECAP *********************************************************************
web01 : ok=21 changed=7 unreachable=0 failed=0 skipped=3 rescued=0 ignored=1
db01 : ok=18 changed=4 unreachable=0 failed=0 skipped=5 rescued=0 ignored=1
lb01 : ok=16 changed=3 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0
Patch status: SUCCESS
Updates applied: 12
Reboot required: false
```
## Inventory Management
Additional sample evidence is available in [examples/patch-output.txt](examples/patch-output.txt) and [examples/failure-simulation.txt](examples/failure-simulation.txt).
The simulator uses dynamic inventory with the following groups:
## Real-World Use Case
- `webservers`: Web application servers
- `databases`: Database servers
- `loadbalancers`: Load balancing infrastructure
- `monitoring`: Monitoring and logging servers
## Playbooks
### Provision Playbook
- Creates Docker containers with base Linux configurations
- Installs required packages and services
- Configures basic networking and security
- Registers nodes in inventory
### Patch Playbook
- Updates system packages
- Applies security patches
- Restarts services as needed
- Generates patch reports
### Harden Playbook
- Implements CIS security benchmarks
- Configures firewall rules
- Hardens SSH configuration
- Disables unnecessary services
### Decommission Playbook
- Gracefully stops services
- Exports configuration and data
- Removes containers
- Cleans up inventory
## Simulation Scripts
### Scaling Simulation
```bash
./scripts/simulate_scaling.sh [up|down] [count] [type]
```
Parameters:
- `direction`: up/down
- `count`: Number of nodes to add/remove
- `type`: Node type (web/db/lb/monitor)
### Failure Simulation
```bash
./scripts/simulate_failure.sh --type [failure_type] --duration [seconds]
```
Failure Types:
- `network`: Network connectivity issues
- `disk`: Disk space exhaustion
- `service`: Service failures
- `node`: Complete node outages
## Scenarios
Pre-defined operational scenarios for testing:
- **Scaling Event:** Automated scaling during traffic spikes
- **Disaster Recovery:** Node failure and recovery procedures
- **Maintenance Window:** Scheduled patching and updates
- **Security Incident:** Breach simulation and response
## Configuration
### Environment Variables
```bash
# Number of initial nodes
INFRA_NODE_COUNT=3
# Node types to deploy
INFRA_NODE_TYPES=web,db,lb
# Simulation parameters
SIMULATION_DURATION=3600
SIMULATION_INTENSITY=medium
```
### Docker Configuration
Container resources and networking are configured in `docker-compose.yml`:
```yaml
services:
infra-node:
image: ubuntu:20.04
deploy:
replicas: 3
resources:
limits:
memory: 512M
cpus: '0.5'
```
## Monitoring and Logging
- Ansible execution logs: `ansible.log`
- Container logs: `docker logs <container-name>`
- Simulation logs: `logs/simulation.log`
## Troubleshooting
### Common Issues
**Ansible Connection Failures:**
```bash
# Check container status
docker ps | grep infra-sim
# Verify SSH connectivity
ansible -i inventory/hosts.ini all -m ping
```
**Container Resource Issues:**
```bash
# Check Docker resources
docker system df
# Clean up containers
docker system prune
```
**Simulation Script Errors:**
```bash
# Check script permissions
chmod +x scripts/*.sh
# Verify dependencies
./scripts/simulate_failure.sh --help
```
## Development
### Adding New Playbooks
1. Create playbook in `playbooks/` directory
2. Follow Ansible best practices
3. Test with `--check` mode
4. Update documentation
### Custom Scenarios
1. Define scenario in `scenarios/` directory
2. Include required variables
3. Test with dry-run
4. Document operational procedures
## Security Considerations
- Containers run with limited privileges
- SSH keys are generated per deployment
- Firewall rules are applied automatically
- Security scanning integrated in CI/CD
## Performance Optimization
- Container resource limits prevent resource exhaustion
- Ansible parallel execution for faster operations
- Efficient failure simulation without full outages
- Optimized Docker layer caching
## Contributing
1. Follow existing code structure and naming conventions
2. Add comprehensive documentation
3. Include tests for new functionality
4. Update runbooks for operational changes
## License
Enterprise Internal Use Only
A platform team can use this project to demonstrate how routine operating procedures are encoded, reviewed, and tested before production change windows. The same patterns apply to regulated Linux estates where patch evidence, hardening controls, and incident drills must be repeatable.
@@ -0,0 +1,30 @@
# Enterprise Infrastructure Simulator Architecture
## Components
- Operator interface: `make` targets and direct Ansible commands.
- Inventory: static host groups in `inventory/hosts.ini`.
- Automation: lifecycle playbooks in `playbooks/`.
- Simulation scripts: controlled failure and scaling events in `scripts/`.
- Evidence: logs, reports, scenario notes, and examples.
## Data Flow
```
Operator
-> Make target or shell script
-> Ansible inventory
-> lifecycle playbook
-> managed Linux node
-> log/report artifact
```
Failure drills follow a parallel flow:
```
Operator -> simulate_failure.sh -> target node/service -> health check -> patch/hardening playbook -> evidence
```
## Notes
The project favors explicit playbooks over hidden orchestration so the operational intent is visible during review. In a production implementation, the same workflows would typically run from a CI runner or automation controller with credentials supplied by a secret manager.
@@ -0,0 +1,8 @@
2026-04-29 02:13:41 - Starting failure simulation: service 30 web
2026-04-29 02:13:41 - Simulating service failures on containers: web
2026-04-29 02:13:42 - Stopping services in container enterprise-web-1
2026-04-29 02:13:44 - Health probe failed: http://web01/health returned 503
2026-04-29 02:14:12 - Cleaning up failure simulation
2026-04-29 02:14:13 - Restarted nginx in enterprise-web-1
2026-04-29 02:14:18 - Health probe recovered: http://web01/health returned 200
2026-04-29 02:14:18 - Failure simulation completed successfully
@@ -0,0 +1,33 @@
PLAY [Apply Security Patches and Updates] **************************************
TASK [Update package cache] *****************************************************
changed: [web01]
changed: [db01]
ok: [lb01]
TASK [Check for available updates] **********************************************
ok: [web01] => {"stdout": "9"}
ok: [db01] => {"stdout": "4"}
ok: [lb01] => {"stdout": "0"}
TASK [Apply security updates only] **********************************************
changed: [web01]
changed: [db01]
ok: [lb01]
TASK [Verify critical services] *************************************************
ok: [web01] => (item=systemd-journald)
ok: [web01] => (item=cron)
ok: [db01] => (item=systemd-journald)
ok: [lb01] => (item=cron)
PLAY RECAP *********************************************************************
web01 : ok=19 changed=6 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1
db01 : ok=18 changed=5 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1
lb01 : ok=15 changed=1 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0
Patch report
Status: SUCCESS
Window: 02:00-04:00 UTC
Reboot required: false
Notification: infra-team@example.com
@@ -0,0 +1,21 @@
# Scenario: Simulate Failure and Patch
## Description
Validate that a service-level failure can be detected, recovered, and followed by a controlled patch workflow. This mirrors a maintenance window where a degraded node is stabilized before package updates are applied.
## Commands
```bash
cd enterprise-infra-simulator
./scripts/simulate_failure.sh service 30 web
ansible-playbook -i inventory/hosts.ini playbooks/patch.yml
ansible-playbook -i inventory/hosts.ini playbooks/hardening.yml --check
```
## Expected Result
- The simulation records a temporary service failure.
- The service is restored after cleanup.
- The patch playbook completes without unreachable hosts.
- Hardening check mode reports no destructive changes.