Rework portfolio around Linux operations, Zabbix monitoring, migration validation, and ELK/Grafana log observability. Add AAP-style LVM resize workflow, Zabbix server/proxy/agent automation assets, Linux/AIX monitoring templates, and updated validation CI.
@@ -0,0 +1,30 @@
# Incident Response Runbook

## Filesystem Alert

1. Confirm current usage and growth trend.
2. Check whether the host is Linux or AIX and use the correct runbook.
3. Validate application ownership of the filesystem.
4. Clean known temporary paths or request LVM expansion when approved.
5. Attach before/after evidence to the incident ticket.
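
Steps 1 and 5 above can be sketched in Python as follows. This is a minimal illustration, not part of the runbook's tooling: the function names `usage_snapshot` and `hours_until_full` are hypothetical, and the trend is a plain linear extrapolation between the oldest and newest sample. The percentage may differ slightly from `df`, which accounts for reserved blocks.

```python
import shutil
import time


def usage_snapshot(path="/"):
    """Return a timestamped usage record that can be pasted into a ticket."""
    total, used, _free = shutil.disk_usage(path)
    return {
        "path": path,
        "taken_at": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "total_bytes": total,
        "used_bytes": used,
        "used_pct": round(100 * used / total, 1),
    }


def hours_until_full(samples, total_bytes):
    """Linear estimate from (epoch_seconds, used_bytes) samples.

    Returns None when usage is flat or shrinking, or when the samples
    do not span any time.
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0 or u1 <= u0:
        return None
    rate = (u1 - u0) / (t1 - t0)  # bytes per second
    return (total_bytes - u1) / rate / 3600
```

Calling `usage_snapshot` once before cleanup and once after gives the before/after pair for the ticket. The samples for `hours_until_full` would normally come from monitoring history rather than from ad hoc calls.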

## Agent Unreachable

1. Confirm whether data loss affects one host, one proxy, or one network segment.
2. Check proxy queue and last seen timestamp.
3. Validate agent service state and firewall path.
4. For active checks, confirm `ServerActive` and hostname match.
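
The `ServerActive` and hostname comparison in step 4 could be scripted roughly as below. This is a sketch under assumptions: the helper names are hypothetical, the parser handles only plain `key=value` lines (no `Include` directives), and port stripping covers only the simple `host:port` form. The host name comparison is case-sensitive.

```python
def parse_agent_conf(text):
    """Parse key=value lines from agent config text, skipping comments and blanks."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def active_check_problems(conf, expected_server, frontend_host_name):
    """Return a list of human-readable mismatches; an empty list means no findings."""
    problems = []
    raw = conf.get("ServerActive", "")
    endpoints = [e.strip() for e in raw.replace(";", ",").split(",") if e.strip()]
    # Drop a trailing :port only for the simple host:port form (not bare IPv6).
    hosts = [e.rsplit(":", 1)[0] if e.count(":") == 1 else e for e in endpoints]
    if expected_server not in hosts:
        problems.append(f"ServerActive={raw!r} does not list {expected_server!r}")

    hostname = conf.get("Hostname")
    if hostname is None:
        problems.append("Hostname is not set; the agent falls back to HostnameItem")
    elif hostname != frontend_host_name:
        problems.append(
            f"Hostname={hostname!r} differs from frontend host name {frontend_host_name!r}"
        )
    return problems
```

Here `expected_server` is the server or proxy the host is assigned to, and `frontend_host_name` is the host name as configured in the Zabbix frontend. Both are supplied by the operator.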

## Proxy Backlog

1. Check server reachability from proxy.
2. Check proxy DB filesystem usage.
3. Confirm whether config sync recently changed.
4. Reduce noise by temporarily disabling non-critical discovery rules if required.
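
For step 1, a plain TCP connect from the proxy host is often enough to separate a network problem from an application problem. The sketch below assumes the server listens on the default trapper port 10051; the function name `tcp_reachable` is hypothetical. A successful connect only shows that the port is open, not that the server is accepting proxy data.

```python
import socket


def tcp_reachable(host, port=10051, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run it on the proxy itself, since a check from any other host tests a different network path.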

## Unsupported Items

1. Identify affected template and item key.
2. Check whether item is Linux-specific or AIX-specific.
3. Validate agent version and custom user parameters.
4. Roll back template change if canary host group is affected.
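
One way to approach step 1 is to ask the Zabbix API for items in the "not supported" state and group them by key. The sketch below only builds the JSON-RPC request body for `item.get` and summarises a response. It assumes the documented item `state` value 1 means not supported, and it leaves out authentication and transport, which differ between Zabbix versions. The function names are hypothetical.

```python
from collections import Counter


def unsupported_items_request(request_id=1):
    """Build an item.get request body for enabled items that are not supported."""
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "output": ["itemid", "key_", "error"],
            "selectHosts": ["host"],
            "filter": {"state": 1},  # assumption: 1 = not supported
            "monitored": True,
        },
        "id": request_id,
    }


def count_by_key(items):
    """Count unsupported items per item key from the 'result' list of a response."""
    return Counter(item["key_"] for item in items)
```

A key that is unsupported on many hosts at once usually points to a template or user parameter change rather than to a single broken agent.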