Files
Mateusz Suski e851568c8c
lint / shell-yaml-ansible (push) Failing after 16s
Add standalone Bash incident check scripts
2026-05-11 18:49:00 +00:00

2.8 KiB

Linux Operations Bash Toolkit

Small, practical Bash scripts for Linux operations checks and incident triage. The scripts are sanitized examples inspired by production Linux operations work and avoid destructive actions or root-only assumptions.

Diagram

flowchart TD
  A["bash"] --> B["os-healthcheck"]
  A --> C["incident-checks"]
  A --> D["disk-full"]
  A --> E["veritas"]
  A --> F["gpfs"]
  B --> B1["Host diagnostics"]
  C --> C1["Standalone triage checks"]
  D --> D1["Incident workflow"]
  E --> E1["VxVM and VCS change flow"]
  F --> F1["Spectrum Scale expansion flow"]

Scripts

  • os-healthcheck/healthcheck.sh - general host health overview.
  • os-healthcheck/disk_check.sh - filesystem usage threshold check.
  • os-healthcheck/service_check.sh - critical service status check.
  • os-healthcheck/system_report.sh - writes a timestamped system report to /tmp.
  • os-healthcheck/network_troubleshoot.sh - local and optional remote network diagnostics.
  • incident-checks/ - standalone read-only incident checks for CPU, memory/OOM, services, SSH failures, TLS certificates, DNS, NTP, filesystems, inodes, and JVM diagnostics.

Usage

cd infra-run/scripts/bash/os-healthcheck

./healthcheck.sh
./disk_check.sh
./disk_check.sh 90
./service_check.sh
./service_check.sh sshd nginx zabbix-agent
./system_report.sh
./network_troubleshoot.sh
./network_troubleshoot.sh google.com

cd ../incident-checks
./check_high_cpu.sh
./check_high_memory_oom.sh --since "24 hours ago"
./check_service_restart_loop.sh --service sshd
./check_certificate_expiry.sh --host example.com

Standards

  • Scripts use Bash and should keep #!/usr/bin/env bash plus strict mode.
  • Read-only checks should report missing tools without hiding the problem.
  • Change-capable scripts must default to dry-run behavior and require explicit --execute.
  • Output should use OK, WARNING, and CRITICAL where practical.
  • Validate changed scripts with ./scripts/check-bash.sh from the repository root.

Exit Codes

disk_check.sh:

  • 0 - all filesystems are below the threshold.
  • 1 - one or more filesystems are at or above the threshold.
  • 2 - invalid threshold input.

service_check.sh:

  • 0 - all checked services are active.
  • 1 - at least one service is inactive, failed, missing, or cannot be checked.

network_troubleshoot.sh:

  • 0 - no obvious local, DNS, or connectivity issue detected.
  • 1 - DNS, interface, gateway, or target connectivity problems detected.

healthcheck.sh and system_report.sh are informational. They print warnings for missing tools where possible.

Notes

  • Requires Bash.
  • Designed for RHEL, Oracle Linux, and Ubuntu style systems.
  • Handles missing tools such as ss, traceroute, nc, and journalctl gracefully.
  • Does not require root and does not make system changes.