portfolio/professional-infra/zabbix-monitoring-incident-response/docs/maintenance-runbook.md
Mateusz Suski 35e6b139fc
Initial CV-aligned infrastructure portfolio
Rework portfolio around Linux operations, Zabbix monitoring, migration validation, and ELK/Grafana log observability.

Add AAP-style LVM resize workflow, Zabbix server/proxy/agent automation assets, Linux/AIX monitoring templates, and updated validation CI.
2026-05-04 17:37:24 +00:00

Zabbix Maintenance Runbook

Server Checks

  • Confirm Zabbix server process and web frontend availability.
  • Check database health, free space, and slow queries.
  • Review cache usage, poller utilization, and housekeeper activity.
  • Confirm recent values are arriving for representative Linux and AIX hosts.
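
The cache and poller review can be sketched as a simple threshold filter. This is a minimal sketch, assuming the latest values of Zabbix internal items (such as `zabbix[rcache,buffer,pused]` and `zabbix[process,poller,avg,busy]`) have already been collected into a dictionary. The function name `over_limit`, the 75% limit, and the sample readings are illustrative assumptions, not values from a real server.

```python
# Hypothetical input: internal item key -> latest value in percent.
# The numbers below are made-up sample readings for illustration only.
def over_limit(readings, limit=75.0):
    """Return (key, value) pairs above the limit, highest first."""
    flagged = [(key, value) for key, value in readings.items() if value > limit]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


readings = {
    "zabbix[rcache,buffer,pused]": 41.2,
    "zabbix[wcache,history,pused]": 88.0,
    "zabbix[process,poller,avg,busy]": 79.5,
    "zabbix[process,housekeeper,avg,busy]": 12.0,
}

for key, value in over_limit(readings):
    print(f"{key}: {value:.1f}%")
```

The limit is a tuning choice; pick one that matches the triggers already defined on the server health template in use.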

Proxy Checks

  • Confirm each proxy's last seen timestamp is recent.
  • Check proxy queue and delayed values.
  • Validate proxy database size and filesystem usage.
  • Confirm connectivity in the direction the proxy mode requires: active proxies connect to the server, passive proxies accept connections from it.
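
The last seen check can be sketched as a comparison of timestamps against a silence limit. This is a minimal sketch, assuming proxy records have been fetched beforehand (for example through the Zabbix API) into dictionaries carrying a Unix timestamp. The field names `name` and `lastaccess`, the function name `silent_proxies`, and the 300 second limit are assumptions here; verify the field names against the API version in use.

```python
import time


def silent_proxies(proxies, now=None, max_silence=300):
    """Return names of proxies not seen within max_silence seconds.

    Each proxy is a dict with hypothetical keys 'name' and 'lastaccess'
    (Unix timestamp, possibly as a string). A value of 0 is treated as
    a very old timestamp, so a proxy that never reported is flagged.
    """
    now = time.time() if now is None else now
    return [
        proxy["name"]
        for proxy in proxies
        if now - int(proxy["lastaccess"]) > max_silence
    ]
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to check against recorded data.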

Template Maintenance

  • Import templates in a controlled window.
  • Watch for newly unsupported items after import.
  • Validate a small canary host group before wider rollout.
  • Document changed triggers and thresholds.
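
The unsupported item check on a canary group can be sketched as a set difference between two snapshots. This is a minimal sketch, assuming the unsupported items were exported before and after the import as `(host, item key)` pairs. The function name `newly_unsupported` and the sample hosts and keys are hypothetical.

```python
def newly_unsupported(before, after):
    """Return items unsupported after the import but not before it.

    Both arguments are iterables of (host, item_key) tuples.
    """
    return sorted(set(after) - set(before))


# Made-up snapshots for illustration only.
before = [("canary-lnx-01", "vfs.fs.size[/data,pused]")]
after = [
    ("canary-lnx-01", "vfs.fs.size[/data,pused]"),
    ("canary-aix-01", "system.cpu.util[,iowait]"),
]

for host, key in newly_unsupported(before, after):
    print(f"{host}: {key}")
```

Comparing against a pre-import snapshot separates items the new template broke from items that were already unsupported.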

Common Failure Modes

  • Agent unreachable: check DNS, firewall, agent service, proxy route.
  • Unsupported item: check key spelling, OS capability, agent version, user parameter.
  • Proxy backlog: check WAN link, proxy DB size, proxy process, server availability.
  • Alert noise: review trigger thresholds and dependency design.
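
The first two steps of the agent unreachable triage can be sketched with the standard library. This is a minimal sketch, assuming a passive agent listening on the default port 10050 and a check run from the server or proxy that polls the agent. The function name `triage_agent` is hypothetical, and a `tcp` result cannot tell a firewall block from a stopped agent service, so it only narrows the search.

```python
import socket


def triage_agent(host, port=10050, timeout=3.0):
    """Return 'dns', 'tcp', or 'ok' for the first stage that fails.

    'dns' means the name did not resolve; 'tcp' means the name resolved
    but no TCP connection could be opened (firewall, routing, or the
    agent service not listening); 'ok' means the port accepted a
    connection.
    """
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "tcp"
```

A call such as `triage_agent("agent-host.example")` (a placeholder name) returns one of the three labels; run it from the same network segment as the poller so the result reflects the route the monitoring traffic takes.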