Zabbix Maintenance Runbook
Server Checks
- Confirm the Zabbix server process is running and the web frontend is reachable.
- Check database health, free space, and slow queries.
- Review cache usage, poller utilization, and housekeeper activity.
- Confirm recent values are arriving for representative Linux and AIX hosts.
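The "recent values are arriving" check can be sketched as a freshness test over the newest value timestamp per host. This is a minimal illustration, not part of the runbook's tooling: the host names, the 600-second threshold, and the `stale_hosts` helper are all hypothetical, and the timestamps are assumed to have been collected beforehand (for example from each item's last-check time).

```python
import time


def stale_hosts(last_clocks, max_age_seconds=600, now=None):
    """Return the hosts whose newest value is older than max_age_seconds.

    last_clocks maps a host name to the Unix timestamp of its newest value.
    """
    now = time.time() if now is None else now
    return sorted(
        host
        for host, clock in last_clocks.items()
        if now - clock > max_age_seconds
    )


# Hypothetical sample: one Linux host and one AIX host.
sample = {"linux-app01": 1_700_000_000, "aix-db01": 1_699_998_000}
print(stale_hosts(sample, max_age_seconds=600, now=1_700_000_300))
```

Passing `now` explicitly keeps the check reproducible. In routine use it can be omitted so the current time is used.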
Proxy Checks
- Confirm each proxy's last-seen timestamp is recent.
- Check proxy queue and delayed values.
- Validate proxy database size and filesystem usage.
- Confirm active/passive connectivity based on proxy mode.
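The last-seen and connectivity checks above could be expressed roughly as follows. This is a sketch under assumptions: the thresholds and the `proxy_status` and `connection_to_test` helpers are hypothetical, and the port shown is the Zabbix default of 10051, which a given installation may override.

```python
def proxy_status(last_seen, now, warn_after=120, crit_after=600):
    """Classify a proxy by the age in seconds of its last-seen timestamp."""
    age = now - last_seen
    if age >= crit_after:
        return "critical"
    if age >= warn_after:
        return "warning"
    return "ok"


def connection_to_test(mode, default_port=10051):
    """Describe which side opens the connection for a given proxy mode.

    An active proxy connects out to the server; a passive proxy is
    polled by the server. default_port assumes an unchanged default.
    """
    if mode == "active":
        return ("proxy", "server", default_port)
    if mode == "passive":
        return ("server", "proxy", default_port)
    raise ValueError(f"unknown proxy mode: {mode!r}")


print(proxy_status(last_seen=1_000, now=1_090))
print(connection_to_test("passive"))
```

Knowing the direction matters when reading firewall rules: the side that initiates the connection is the one whose outbound path needs testing.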
Template Maintenance
- Import templates in a controlled window.
- Watch for items that become unsupported after the import.
- Validate a small canary host group before wider rollout.
- Document changed triggers and thresholds.
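One way to watch for unsupported items on the canary group is to snapshot the unsupported (host, item key) pairs before and after the import and compare them. The sketch below assumes those snapshots are already available as plain pairs. The `newly_unsupported` helper and the sample hosts are hypothetical; the item keys are ordinary Zabbix agent keys used only for illustration.

```python
def newly_unsupported(before, after):
    """Return (host, key) pairs that are unsupported only after the import."""
    return sorted(set(after) - set(before))


# Hypothetical snapshots taken on a canary host group.
before_import = [("linux-app01", "vfs.fs.size[/data,pfree]")]
after_import = [
    ("linux-app01", "vfs.fs.size[/data,pfree]"),
    ("aix-db01", "system.cpu.util[,iowait]"),
]

for host, key in newly_unsupported(before_import, after_import):
    print(f"{host}: {key}")
```

Items that were already unsupported before the import are filtered out, so the output points only at regressions introduced by the template change.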
Common Failure Modes
- Agent unreachable: check DNS, firewall, agent service, proxy route.
- Unsupported item: check key spelling, OS capability, agent version, user parameter.
- Proxy backlog: check WAN, DB size, proxy process, server availability.
- Alert noise: review trigger thresholds and dependency design.
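The failure modes above can also be kept as a small lookup table so that an on-call script or chat bot returns the same checklist. This is only an illustrative sketch: the `TRIAGE` table restates the bullets from this runbook, and the `checks_for` helper is hypothetical.

```python
# Symptom-to-checklist table, restating the failure modes listed above.
TRIAGE = {
    "agent unreachable": ["DNS", "firewall", "agent service", "proxy route"],
    "unsupported item": ["key spelling", "OS capability", "agent version", "user parameter"],
    "proxy backlog": ["WAN", "DB size", "proxy process", "server availability"],
    "alert noise": ["trigger thresholds", "dependency design"],
}


def checks_for(symptom):
    """Return the checklist for a symptom, or an empty list if it is unknown."""
    return TRIAGE.get(symptom.strip().lower(), [])


print(checks_for("Proxy backlog"))
```

Matching is case-insensitive and ignores surrounding whitespace, so "Proxy backlog" and "proxy backlog " resolve to the same entry.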