# Zabbix Maintenance Runbook

## Server Checks

- Confirm the Zabbix server process is running and the web frontend is reachable.
- Check database health, free space, and slow queries.
- Review cache usage, poller utilization, and housekeeper activity.
- Confirm recent values are arriving for representative Linux and AIX hosts.
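
The cache, poller, and housekeeper review above can be sketched as a threshold check over the latest values of Zabbix internal items. The item keys are standard internal-check keys, but the 75% limits and the `over_threshold` helper are assumptions for illustration, not values taken from this runbook; fetching the latest values (for example through the API) is left out.

```python
# Hypothetical warning thresholds in percent; tune them to your environment.
THRESHOLDS = {
    "zabbix[process,poller,avg,busy]": 75.0,
    "zabbix[process,housekeeper,avg,busy]": 75.0,
    "zabbix[rcache,buffer,pused]": 75.0,
    "zabbix[wcache,history,pused]": 75.0,
}


def over_threshold(latest, thresholds=THRESHOLDS):
    """Return {key: value} for every metric at or above its threshold.

    `latest` maps internal item keys to their most recent numeric value.
    Keys missing from `latest` are skipped rather than treated as failures.
    """
    return {
        key: latest[key]
        for key, limit in thresholds.items()
        if key in latest and latest[key] >= limit
    }


if __name__ == "__main__":
    sample = {
        "zabbix[process,poller,avg,busy]": 82.5,
        "zabbix[rcache,buffer,pused]": 40.0,
    }
    for key, value in over_threshold(sample).items():
        print(f"review: {key} = {value}%")
```

An empty result means nothing crossed a limit; it does not prove the metrics were collected, so a missing key is worth checking separately.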

## Proxy Checks

- Confirm each proxy's last-seen timestamp is recent.
- Check proxy queue and delayed values.
- Validate proxy database size and filesystem usage.
- Confirm connectivity in the direction the proxy mode requires (active or passive).
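
A minimal sketch of the last-seen check, assuming proxy records shaped like the output of the Zabbix API `proxy.get` method, where `lastaccess` is a Unix timestamp returned as a string. The `name` field, the 300-second limit, and the `stale_proxies` helper are assumptions; the field holding the proxy name differs between Zabbix versions.

```python
import time


def stale_proxies(proxies, max_age_s=300, now=None):
    """Return (name, age_in_seconds) for proxies not seen within max_age_s.

    `proxies` is a list of dicts with a 'name' and a 'lastaccess' field.
    `now` can be injected for testing; it defaults to the current time.
    """
    now = time.time() if now is None else now
    stale = []
    for proxy in proxies:
        age = now - int(proxy["lastaccess"])
        if age > max_age_s:
            stale.append((proxy["name"], int(age)))
    return stale


if __name__ == "__main__":
    # Hypothetical records; in practice these would come from the API.
    records = [
        {"name": "proxy-a", "lastaccess": "1000"},
        {"name": "proxy-b", "lastaccess": "1290"},
    ]
    print(stale_proxies(records, max_age_s=300, now=1400))
```

A proxy that has never connected typically reports a `lastaccess` of 0, which this check flags as stale with a very large age.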

## Template Maintenance

- Import templates in a controlled change window.
- Watch for newly unsupported items after import.
- Validate a small canary host group before wider rollout.
- Document changed triggers and thresholds.
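
One way to watch for unsupported items on the canary group is to snapshot item states before and after the import and diff them. The sketch below assumes flattened records with `host`, `key_`, and `state` fields, where `state` 1 means "not supported" as in the Zabbix API item object. The `host` field and the `newly_unsupported` helper are assumptions; collecting the snapshots is not shown.

```python
def newly_unsupported(before, after):
    """Return sorted (host, key) pairs that became unsupported after import.

    `before` and `after` are iterables of dicts with 'host', 'key_' and
    'state' fields. State may be an int or a string, so it is normalized.
    """
    def unsupported(items):
        return {
            (item["host"], item["key_"])
            for item in items
            if str(item["state"]) == "1"
        }

    return sorted(unsupported(after) - unsupported(before))


if __name__ == "__main__":
    # Hypothetical canary snapshots taken before and after a template import.
    before = [
        {"host": "canary-01", "key_": "system.cpu.load", "state": "0"},
        {"host": "canary-01", "key_": "custom.check", "state": "1"},
    ]
    after = [
        {"host": "canary-01", "key_": "system.cpu.load", "state": "0"},
        {"host": "canary-01", "key_": "custom.check", "state": "1"},
        {"host": "canary-01", "key_": "vfs.fs.size[/data,pused]", "state": "1"},
    ]
    for host, key in newly_unsupported(before, after):
        print(f"{host}: {key}")
```

Diffing against the pre-import snapshot keeps items that were already unsupported out of the report, so only regressions introduced by the import are listed.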

## Common Failure Modes

- Agent unreachable: check DNS, firewall, agent service, proxy route.
- Unsupported item: check key spelling, OS capability, agent version, user parameter.
- Proxy backlog: check WAN, DB size, proxy process, server availability.
- Alert noise: review trigger thresholds and dependency design.
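
For the "agent unreachable" case, the first two checks (DNS, then firewall or agent service) can be separated with a small probe run from the server or proxy that polls the host. This is a sketch using only the Python standard library; `check_agent` is a hypothetical helper, and 10050 is the default Zabbix agent listen port, which may differ in your environment.

```python
import socket


def check_agent(host, port=10050, timeout=3.0):
    """Classify reachability of a Zabbix agent's TCP listener.

    Returns "dns_failed" if the name does not resolve, "tcp_failed" if the
    connection is refused, filtered, or times out, and "ok" otherwise.
    """
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "tcp_failed"


if __name__ == "__main__":
    # Hypothetical host name; replace with a monitored host.
    print(check_agent("linux-host-01.example.internal"))
```

An "ok" result only shows that something accepts TCP connections on that port. It does not confirm that the agent permits requests from the probing address, so agent logs and configuration remain the next step.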