Mateusz Suski 35e6b139fc
Initial CV-aligned infrastructure portfolio
Rework portfolio around Linux operations, Zabbix monitoring, migration validation, and ELK/Grafana log observability.

Add AAP-style LVM resize workflow, Zabbix server/proxy/agent automation assets, Linux/AIX monitoring templates, and updated validation CI.
2026-05-04 17:37:24 +00:00

# Incident Response Runbook
## Filesystem Alert
1. Confirm current usage and growth trend.
2. Check whether the host is Linux or AIX and use the correct runbook.
3. Validate application ownership of the filesystem.
4. Clean known temporary paths, or request LVM expansion once it has been approved.
5. Attach before/after evidence to the incident ticket.
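
The first and last steps above can be sketched in Python. This is a minimal illustration rather than part of the runbook's tooling: `snapshot` and `hours_until_full` are hypothetical helper names, and the linear projection is only one assumed way to express a growth trend. The percentage is computed from raw `shutil.disk_usage` figures, so it may differ slightly from what `df` reports on filesystems with reserved blocks.

```python
import shutil
from datetime import datetime, timezone


def snapshot(path="/"):
    """Return a timestamped usage record for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "taken_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "used_pct": round(100 * usage.used / usage.total, 1),
    }


def hours_until_full(used_then, used_now, total, hours_between):
    """Linear estimate of the hours left before the filesystem fills.

    Returns None when usage is flat or shrinking (assumed to mean "no trend").
    """
    growth_per_hour = (used_now - used_then) / hours_between
    if growth_per_hour <= 0:
        return None
    return (total - used_now) / growth_per_hour


if __name__ == "__main__":
    # Run once before cleanup and once after, then paste both records
    # into the incident ticket as before/after evidence.
    print(snapshot("/"))
```

Two snapshots taken a known interval apart give the inputs for `hours_until_full`, which helps decide between a cleanup and an LVM expansion request.
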
## Agent Unreachable
1. Confirm whether data loss affects one host, one proxy, or one network segment.
2. Check the proxy's queue and its last seen timestamp.
3. Validate agent service state and firewall path.
4. For active checks, confirm that `ServerActive` points to the correct server or proxy and that the agent's `Hostname` matches the host name configured in Zabbix.
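
The active-check step can be illustrated with a short Python sketch that parses agent configuration text and compares it with the expected values. The function names, the sample configuration, and the host names are hypothetical. The sketch assumes plain `Key=Value` lines, a comma-separated `ServerActive` list, and host names or IPv4 addresses with an optional `:port` suffix; other address formats are not handled.

```python
def parse_agent_conf(text):
    """Parse Key=Value lines from agent config text, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def active_check_problems(settings, expected_server, expected_hostname):
    """Return a list of mismatches that would break active checks."""
    problems = []
    servers = [s.strip() for s in settings.get("ServerActive", "").split(",") if s.strip()]
    # Accept either "host" or "host:port" entries for the expected server.
    if not any(s == expected_server or s.startswith(expected_server + ":") for s in servers):
        problems.append(f"ServerActive does not list {expected_server}")
    if settings.get("Hostname") != expected_hostname:
        problems.append(f"Hostname is {settings.get('Hostname')!r}, expected {expected_hostname!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical config excerpt; on a real host, read the agent config file instead.
    sample = "ServerActive=proxy01.example.net:10051\nHostname=app01\n"
    print(active_check_problems(parse_agent_conf(sample), "proxy01.example.net", "app01"))
```

An empty list means both settings agree with what the frontend expects; any entries can go straight into the incident notes.
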
## Proxy Backlog
1. Check that the Zabbix server is reachable from the proxy.
2. Check proxy DB filesystem usage.
3. Confirm whether config sync recently changed.
4. Reduce noise by temporarily disabling non-critical discovery rules if required.
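
The first two checks above could be scripted on the proxy host roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, port 10051 is the default Zabbix server port and may differ in a given deployment, and the 90% limit is an illustrative threshold rather than a value taken from this runbook.

```python
import shutil
import socket


def tcp_reachable(host, port=10051, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def used_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def backlog_findings(server_ok, db_used_pct, db_limit_pct=90.0):
    """Turn the two observations into a list of findings for the ticket."""
    findings = []
    if not server_ok:
        findings.append("proxy cannot reach the server; check network and firewall path")
    if db_used_pct >= db_limit_pct:
        findings.append(f"proxy DB filesystem is {db_used_pct:.0f}% used")
    return findings


if __name__ == "__main__":
    # "zabbix.example.net" and "/" are placeholders for the real server
    # address and the mount point that holds the proxy database.
    print(backlog_findings(tcp_reachable("zabbix.example.net"), used_percent("/")))
```

A successful TCP connection only shows that the port is open; it does not confirm that the server is accepting data from this proxy.
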
## Unsupported Items
1. Identify affected template and item key.
2. Check whether item is Linux-specific or AIX-specific.
3. Validate agent version and custom user parameters.
4. Roll back template change if canary host group is affected.
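
As a rough illustration of the triage above, the following Python sketch groups unsupported items by key and checks whether a canary host group is affected. The record layout (`host`, `os`, `groups`, `key`), the group name, and the function names are assumptions made for the example; in practice the records would come from an export of unsupported items.

```python
from collections import defaultdict


def summarize_unsupported(items):
    """Map each item key to the sorted OS families on which it is unsupported."""
    by_key = defaultdict(set)
    for item in items:
        by_key[item["key"]].add(item["os"])
    return {key: sorted(oses) for key, oses in by_key.items()}


def should_roll_back(items, canary_group):
    """True when any unsupported item belongs to a host in the canary group."""
    return any(canary_group in item["groups"] for item in items)


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    items = [
        {"host": "aix01", "os": "AIX", "groups": ["canary"], "key": "custom.app.queue"},
        {"host": "lnx01", "os": "Linux", "groups": ["prod"], "key": "vfs.fs.size[/,pused]"},
    ]
    print(summarize_unsupported(items))
    print(should_roll_back(items, "canary"))
```

A key that fails on only one OS family points at a platform-specific item or user parameter, while a key that fails everywhere suggests a problem with the template change itself.
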