Rework portfolio around Linux operations, Zabbix monitoring, migration validation, and ELK/Grafana log observability. Add AAP-style LVM resize workflow, Zabbix server/proxy/agent automation assets, Linux/AIX monitoring templates, and updated validation CI.
# Zabbix Maintenance Runbook

## Server Checks

- Confirm that the Zabbix server process is running and the web frontend is reachable.
- Check database health, free space, and slow queries.
- Review cache usage, poller utilization, and housekeeper activity.
- Confirm that recent values are arriving for representative Linux and AIX hosts.
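
The last server check can be scripted against the Zabbix JSON-RPC API. This is a minimal sketch, not a supported tool. It assumes Zabbix 6.4 or later, where an API token can be sent in an `Authorization: Bearer` header, a frontend URL ending in `api_jsonrpc.php`, and a 10-minute freshness threshold. `MAX_AGE_SECONDS`, `build_item_request`, `call_api`, and `stale_hosts` are hypothetical names chosen for illustration.

```python
import json
import time
import urllib.request
from typing import Iterable, Optional

# Hypothetical threshold: flag hosts whose newest value is older than 10 minutes.
MAX_AGE_SECONDS = 600


def build_item_request(hostids: list, request_id: int = 1) -> dict:
    """JSON-RPC 2.0 payload for item.get, limited to enabled items on monitored hosts."""
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "hostids": hostids,
            "output": ["itemid", "hostid", "key_", "lastclock"],
            "monitored": True,
        },
        "id": request_id,
    }


def call_api(url: str, token: str, payload: dict, timeout: float = 10.0):
    """POST a payload to api_jsonrpc.php with a Bearer API token and return 'result'."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    if "error" in body:
        raise RuntimeError(f"Zabbix API error: {body['error']}")
    return body["result"]


def stale_hosts(items: Iterable[dict], now: Optional[float] = None,
                max_age: int = MAX_AGE_SECONDS) -> set:
    """Return host IDs whose most recent item value is older than max_age seconds."""
    now = time.time() if now is None else now
    newest = {}
    for item in items:
        # The API returns lastclock as a string; "0" means no value has arrived yet.
        clock = int(item.get("lastclock", 0))
        hostid = item["hostid"]
        newest[hostid] = max(newest.get(hostid, 0), clock)
    return {hostid for hostid, clock in newest.items() if now - clock > max_age}
```

For example, `stale_hosts(call_api(url, token, build_item_request(representative_ids)))` returns the representative hosts with no recent values. Hosts that return no items at all are not reported, so pick representatives known to have enabled items, and set the threshold above the longest update interval among them.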

## Proxy Checks

- Confirm that the proxy's last seen timestamp is recent.
- Check the proxy queue for delayed values.
- Validate proxy database size and filesystem usage.
- Confirm connectivity in the direction the proxy mode requires: an active proxy must reach the server (TCP 10051 by default), while a passive proxy must accept connections from the server.
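
The last seen check can be evaluated from `proxy.get` output in the same way. This is a hedged sketch: it reads the `lastaccess` field as a Unix timestamp, takes the proxy name from `name` (Zabbix 7.0) or `host` (earlier releases, as far as I know), and uses a hypothetical 5-minute threshold. `PROXY_MAX_AGE_SECONDS`, `build_proxy_request`, `proxy_label`, and `stale_proxies` are illustrative names; the payload can be sent with any JSON-RPC client.

```python
from typing import Iterable

# Hypothetical threshold: a proxy silent for more than 5 minutes needs attention.
PROXY_MAX_AGE_SECONDS = 300


def build_proxy_request(request_id: int = 1) -> dict:
    """JSON-RPC 2.0 payload for proxy.get returning all proxy fields."""
    return {"jsonrpc": "2.0", "method": "proxy.get",
            "params": {"output": "extend"}, "id": request_id}


def proxy_label(proxy: dict) -> str:
    """Prefer the 'name' field, then the older 'host' field, then the proxy ID."""
    return proxy.get("name") or proxy.get("host") or f"proxyid {proxy.get('proxyid', '?')}"


def stale_proxies(proxies: Iterable[dict], now: float,
                  max_age: int = PROXY_MAX_AGE_SECONDS) -> list:
    """Return sorted labels of proxies whose lastaccess is older than max_age seconds."""
    stale = []
    for proxy in proxies:
        last_seen = int(proxy.get("lastaccess", 0))
        if now - last_seen > max_age:
            stale.append(proxy_label(proxy))
    return sorted(stale)
```

Read a stale result together with the proxy mode: a stale passive proxy can mean the server cannot reach it, while a stale active proxy usually means it cannot reach the server.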

## Template Maintenance

- Import templates during a controlled change window.
- Watch for items that become unsupported after the import.
- Validate on a small canary host group before a wider rollout.
- Document changed triggers and thresholds.
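
Unsupported items on the canary group can be compared before and after an import. A minimal sketch, assuming `item.get` with `filter: {"state": 1}` (state 1 means not supported) scoped to the canary group's ID; `build_unsupported_request`, `count_by_key`, and `new_unsupported` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable


def build_unsupported_request(groupids: list, request_id: int = 1) -> dict:
    """JSON-RPC 2.0 payload for item.get listing unsupported items in the given host groups."""
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "groupids": groupids,
            "monitored": True,
            "filter": {"state": 1},  # 1 = not supported, 0 = normal
            "output": ["itemid", "key_", "error"],
            "selectHosts": ["host"],
        },
        "id": request_id,
    }


def count_by_key(items: Iterable[dict]) -> Counter:
    """Count unsupported items per item key."""
    return Counter(item["key_"] for item in items)


def new_unsupported(before: Counter, after: Counter) -> dict:
    """Keys whose unsupported count grew after the import, mapped to the increase."""
    return {key: after[key] - before[key] for key in after if after[key] > before[key]}
```

Capture `count_by_key` on the canary group before the import, repeat it once the new items have run for at least one update interval, and review the `error` text of every key `new_unsupported` reports before widening the rollout.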

## Common Failure Modes

- Agent unreachable: check DNS resolution, firewall rules, the agent service, and the route through the proxy.
- Unsupported item: check the item key spelling, whether the OS supports the metric, the agent version, and any `UserParameter` the key relies on.
- Proxy backlog: check the WAN link, proxy database size, proxy processes, and server availability.
- Alert noise: review trigger thresholds and trigger dependency design.
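
The agent-unreachable case can be narrowed down layer by layer from the server or proxy host. This sketch covers only DNS resolution and the TCP connection to the agent's passive-check port (10050 by default); it does not speak the Zabbix agent protocol, so an `ok` result should still be confirmed with `zabbix_get -s <host> -k agent.ping`. `AGENT_PORT` and `diagnose_agent` are hypothetical names.

```python
import socket

AGENT_PORT = 10050  # default ListenPort of the Zabbix agent for passive checks


def diagnose_agent(host: str, port: int = AGENT_PORT, timeout: float = 3.0) -> str:
    """Return the first failing layer: 'dns', 'tcp', or 'ok' if a TCP connection succeeds."""
    try:
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    # Try every resolved address (IPv4 and IPv6) before declaring a TCP failure.
    for family, socktype, proto, _canonname, address in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(address)
                return "ok"
        except OSError:
            continue
    return "tcp"
```

A `dns` result points at name resolution; `tcp` points at the firewall, the agent service, or the network route. Active checks travel the other way (agent to server or proxy, TCP 10051 by default), so they need a separate test from the agent host.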