# Incident Response Runbook
## Filesystem Alert
1. Confirm current usage and growth trend.
2. Check whether the host runs Linux or AIX, and follow the runbook for that platform.
3. Validate application ownership of the filesystem.
4. Clean up known temporary paths or, with approval, request an LVM expansion.
5. Attach before/after evidence to the incident ticket.
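
Step 1 can be approximated with a few lines of Python. The sketch below is illustrative only: the sample values are made up, the function names are hypothetical, and the straight-line projection is an assumption rather than part of this runbook. The percentage from `shutil.disk_usage` may also differ slightly from what `df` reports, because `df` accounts for reserved blocks.

```python
import shutil
from datetime import datetime, timedelta


def usage_percent(path="/"):
    """Return used space of the filesystem holding `path`, in percent."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def hours_until_full(samples):
    """Estimate hours until 100% from (timestamp, percent_used) samples.

    Draws a straight line through the first and last sample.
    Returns None when usage is flat or shrinking.
    """
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    elapsed_hours = (t1 - t0).total_seconds() / 3600.0
    if elapsed_hours <= 0 or p1 <= p0:
        return None
    rate_per_hour = (p1 - p0) / elapsed_hours
    return (100.0 - p1) / rate_per_hour


# Hypothetical samples: 70% used at midnight, 80% used ten hours later.
start = datetime(2024, 1, 1, 0, 0)
samples = [(start, 70.0), (start + timedelta(hours=10), 80.0)]
print(hours_until_full(samples))  # 20.0
```

Recording the output of `usage_percent` before and after cleanup is one way to produce the evidence that step 5 asks for.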
## Agent Unreachable
1. Confirm whether data loss affects one host, one proxy, or one network segment.
2. Check proxy queue and last seen timestamp.
3. Validate agent service state and firewall path.
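4. For active checks, confirm that `ServerActive` points to the correct server or proxy and that the agent's `Hostname` matches the host name registered on the server.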
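
Step 4 can be partly scripted. The following is a minimal sketch, assuming the agent configuration uses plain `Key=Value` lines with `#` comments and that `ServerActive` is a comma-separated list of addresses with optional `:port` suffixes. The function names, host names, and sample configuration are hypothetical, and IPv6 addresses are not handled.

```python
def parse_agent_conf(text):
    """Parse `Key=Value` lines, skipping blank lines and `#` comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def active_check_problems(text, expected_server, expected_hostname):
    """Return a list of mismatches; an empty list means both values agree."""
    settings = parse_agent_conf(text)
    problems = []
    # Drop an optional ":port" suffix from each ServerActive entry.
    targets = [
        entry.strip().rsplit(":", 1)[0]
        for entry in settings.get("ServerActive", "").split(",")
        if entry.strip()
    ]
    if expected_server not in targets:
        problems.append(f"ServerActive {targets} does not include {expected_server!r}")
    hostname = settings.get("Hostname")
    if hostname != expected_hostname:
        problems.append(f"Hostname {hostname!r} does not match {expected_hostname!r}")
    return problems


# Hypothetical configuration content, for illustration only.
sample_conf = """
# active checks go through the proxy
ServerActive=proxy01.example.net:10051
Hostname=app-host-01
"""
print(active_check_problems(sample_conf, "proxy01.example.net", "app-host-01"))  # []
```

The comparison is an exact string match, so a difference in letter case or a short name versus a fully qualified name is reported as a mismatch.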
## Proxy Backlog
1. Check server reachability from proxy.
2. Check proxy DB filesystem usage.
3. Confirm whether config sync recently changed.
4. Reduce noise by temporarily disabling non-critical discovery rules if required.
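
For step 1, a plain TCP connection attempt run on the proxy host gives a quick first answer. This is a sketch under stated assumptions: the function name is hypothetical, and the server address and port in the usage note are placeholders to replace with the values from your own proxy configuration. A successful connect shows only that the network path and listener are up, not that the server accepts data from this proxy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as `can_connect("monitoring.example.net", 10051)` uses an assumed host name and port; a `False` result points toward firewall, routing, or DNS checks before looking at the proxy itself.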
## Unsupported Items
1. Identify affected template and item key.
2. Check whether item is Linux-specific or AIX-specific.
3. Validate agent version and custom user parameters.
4. Roll back template change if canary host group is affected.
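
The rollback decision in step 4 can be sketched as a small filter over the list of unsupported items. Everything here is illustrative: the function name, the `(host, template, item_key)` tuple layout, and the sample hosts, templates, and keys are assumptions, and in practice the input would come from an export of unsupported items.

```python
from collections import defaultdict


def affected_canary_templates(unsupported_items, canary_hosts):
    """Map template name -> sorted item keys unsupported on at least one canary host."""
    hits = defaultdict(set)
    for host, template, item_key in unsupported_items:
        if host in canary_hosts:
            hits[template].add(item_key)
    return {template: sorted(keys) for template, keys in hits.items()}


# Hypothetical data, for illustration only.
canary_hosts = {"canary-linux-01", "canary-aix-01"}
unsupported_items = [
    ("canary-aix-01", "Template App Custom", "custom.app.queue"),
    ("prod-linux-17", "Template OS Base", "custom.disk.check"),
]
print(affected_canary_templates(unsupported_items, canary_hosts))
# {'Template App Custom': ['custom.app.queue']}
```

An empty result means no canary host is affected; any template that appears in the result is a candidate for rollback.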