Initial CV-aligned infrastructure portfolio

Rework portfolio around Linux operations, Zabbix monitoring, migration validation, and ELK/Grafana log observability.

Add AAP-style LVM resize workflow, Zabbix server/proxy/agent automation assets, Linux/AIX monitoring templates, and updated validation CI.
Mateusz Suski
2026-05-04 17:37:24 +00:00
commit 35e6b139fc
114 changed files with 6422 additions and 0 deletions
@@ -0,0 +1,30 @@
# Incident Response Runbook
## Filesystem Alert
1. Confirm current usage and growth trend.
2. Check whether the host is Linux or AIX and use the correct runbook.
3. Validate application ownership of the filesystem.
4. Clean known temporary paths or request LVM expansion when approved.
5. Attach before/after evidence to the incident ticket.
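
Step 1 can be sketched in Python as follows. This is a minimal illustration, not part of the portfolio's tooling: the function names and the `(hours, used_bytes)` sample format are assumptions, and the trend is a simple two-point slope rather than a real forecast.

```python
import shutil


def usage_percent(path="/"):
    """Return used space, in percent, for the filesystem that holds `path`."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def hours_until_full(samples, capacity_bytes):
    """Estimate hours until the filesystem is full.

    `samples` is a list of (hours, used_bytes) pairs in time order
    (hypothetical format). Only the first and last samples are used.
    Returns None when usage is flat or shrinking.
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0 or u1 <= u0:
        return None
    rate = (u1 - u0) / (t1 - t0)  # bytes per hour
    return (capacity_bytes - u1) / rate
```

A short estimate is an argument for requesting LVM expansion rather than relying on cleanup alone; the resulting numbers can go into the ticket as before/after evidence.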
## Agent Unreachable
1. Confirm whether data loss affects one host, one proxy, or one network segment.
2. Check proxy queue and last seen timestamp.
3. Validate agent service state and firewall path.
4. For active checks, confirm that `ServerActive` points at the intended server or proxy and that the agent `Hostname` matches the host name configured in Zabbix.
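
The active-check comparison in step 4 could look roughly like this. It is a sketch under stated assumptions: the agent configuration is read as plain `Key=Value` text, `ServerActive` is treated as a comma-separated list, and the helper names, the example host, and the example proxy address are hypothetical.

```python
def parse_agent_conf(text):
    """Parse `Key=Value` lines from agent-config-style text, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def active_check_problems(settings, expected_target, frontend_host_name):
    """Return a list of mismatches that commonly break active checks.

    Limitation: an agent without an explicit Hostname derives its name
    elsewhere; this sketch simply reports that case as a mismatch.
    """
    problems = []
    raw = settings.get("ServerActive", "")
    targets = [t.strip() for t in raw.split(",") if t.strip()]
    if expected_target not in targets:
        problems.append(f"ServerActive does not list {expected_target}")
    if settings.get("Hostname") != frontend_host_name:
        problems.append("Hostname differs from the host name configured in Zabbix")
    return problems
```

An empty list means both values line up; anything else is worth attaching to the incident ticket.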
## Proxy Backlog
1. Check server reachability from the proxy.
2. Check proxy DB filesystem usage.
3. Confirm whether config sync recently changed.
4. Reduce noise by temporarily disabling non-critical discovery rules if required.
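
The reachability check in step 1 can be approximated with a plain TCP connect run from the proxy host. This sketch assumes the server listens on the default trapper port 10051; the function name is hypothetical. A successful connect shows only that the network path is open, not that the Zabbix server is healthy.

```python
import socket


def tcp_reachable(host, port=10051, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```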
## Unsupported Items
1. Identify affected template and item key.
2. Check whether item is Linux-specific or AIX-specific.
3. Validate agent version and custom user parameters.
4. Roll back template change if canary host group is affected.
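
One way to make the rollback decision in step 4 less subjective is to compare unsupported item keys on the canary hosts before and after the template change. The sketch below assumes those keys have already been collected into per-host sets; the data shape, the function name, and the zero-tolerance default are illustrative assumptions.

```python
def should_roll_back(before, after, max_new_unsupported=0):
    """Decide whether a template change should be rolled back.

    `before` and `after` map a canary host name to the set of item keys
    that were unsupported at that point (hypothetical input format).
    Returns (decision, newly_unsupported_keys_by_host).
    """
    new_by_host = {}
    for host, keys in after.items():
        new_keys = keys - before.get(host, set())
        if new_keys:
            new_by_host[host] = new_keys
    total_new = sum(len(keys) for keys in new_by_host.values())
    return total_new > max_new_unsupported, new_by_host
```

The second return value lists exactly which keys regressed on which host, which feeds directly into steps 1 to 3.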
@@ -0,0 +1,29 @@
# Zabbix Maintenance Runbook
## Server Checks
- Confirm Zabbix server process and web frontend availability.
- Check database health, free space, and slow queries.
- Review cache usage, poller utilization, and housekeeper activity.
- Confirm recent values are arriving for representative Linux and AIX hosts.
## Proxy Checks
- Confirm proxy last seen timestamp.
- Check proxy queue and delayed values.
- Validate proxy database size and filesystem usage.
- Confirm active/passive connectivity based on proxy mode.
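
The last-seen check can be expressed as a small staleness filter. This sketch assumes the last-seen timestamps have already been exported into a mapping of proxy name to timezone-aware `datetime`; the five-minute threshold and the proxy names in the usage example are arbitrary.

```python
from datetime import datetime, timedelta, timezone


def stale_proxies(last_seen, now, max_age=timedelta(minutes=5)):
    """Return the sorted names of proxies not seen within `max_age`."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_age)


# Usage with hypothetical proxy names:
now = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
seen = {
    "proxy-site-a": now - timedelta(minutes=1),
    "proxy-site-b": now - timedelta(minutes=30),
}
stale = stale_proxies(seen, now)  # only proxy-site-b exceeds the threshold
```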
## Template Maintenance
- Import templates in a controlled window.
- Watch unsupported items after import.
- Validate a small canary host group before wider rollout.
- Document changed triggers and thresholds.
## Common Failure Modes
- Agent unreachable: check DNS, firewall, agent service, proxy route.
- Unsupported item: check key spelling, OS capability, agent version, user parameter.
- Proxy backlog: check WAN, DB size, proxy process, server availability.
- Alert noise: review trigger thresholds and dependency design.
@@ -0,0 +1,27 @@
# Zabbix Proxy Design
## Purpose
Zabbix proxies reduce dependency on direct connectivity between the central server and monitored hosts. They are useful for client networks, segmented environments, and remote sites, and they buffer collected data locally while the central server is unreachable, for example during a maintenance window.
## Active Proxy
- Proxy connects to the Zabbix server.
- Good for restricted networks where inbound access to the proxy is not allowed.
- Hosts can use active agent checks against the proxy.
- Main operational checks: proxy last seen, delayed values, local DB size, config sync.
## Passive Proxy
- Zabbix server connects to the proxy.
- Useful when the central server can reach the proxy network.
- Requires firewall rules from server to proxy.
- Main operational checks: proxy listener, network latency, poller load.
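
To pick the right checklist for a given proxy, the mode can be read from its configuration. The sketch below relies on the `ProxyMode` parameter of `zabbix_proxy.conf` (0 for active, which is the default, and 1 for passive); the function name and the `CHECKS` table are illustrative and simply restate the lists above.

```python
CHECKS = {
    "active": ["proxy last seen", "delayed values", "local DB size", "config sync"],
    "passive": ["proxy listener", "network latency", "poller load"],
}


def proxy_mode(conf_text):
    """Return 'active' or 'passive' based on ProxyMode in proxy config text."""
    mode = "0"  # ProxyMode defaults to 0 (active) when it is not set
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("ProxyMode="):
            mode = line.split("=", 1)[1].strip()
    if mode == "0":
        return "active"
    if mode == "1":
        return "passive"
    raise ValueError(f"unexpected ProxyMode value: {mode!r}")
```

`CHECKS[proxy_mode(text)]` then yields the operational checks that apply to that proxy.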
## Operational Signals
- Proxy queue growth.
- Unsupported items after template changes.
- Agent unreachable or active checks delayed.
- Proxy DB growth during WAN outage.
- Config sync failures after maintenance.
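
Queue growth, the first signal above, is a trend rather than a single value. A minimal sketch of one possible definition: the queue counts as growing when the most recent samples rise strictly. The sample source and the three-point window are assumptions, not something the design prescribes.

```python
def queue_is_growing(samples, min_points=3):
    """Return True if the last `min_points` queue samples rise strictly."""
    recent = samples[-min_points:]
    if len(recent) < min_points:
        return False  # not enough data to call it a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A strict comparison ignores plateaus; a production trigger would more likely tolerate small dips or use a rate threshold.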