Linux Operations Bash Toolkit
Small, practical Bash scripts for Linux operations checks and incident triage. The scripts are sanitized examples inspired by production Linux operations work and avoid destructive actions or root-only assumptions.
Diagram
flowchart TD
A["bash"] --> B["os-healthcheck"]
A --> C["incident-checks"]
A --> D["disk-full"]
A --> E["veritas"]
A --> F["gpfs"]
B --> B1["Host diagnostics"]
C --> C1["Standalone triage checks"]
D --> D1["Incident workflow"]
E --> E1["VxVM and VCS change flow"]
F --> F1["Spectrum Scale expansion flow"]
Scripts
- os-healthcheck/healthcheck.sh - general host health overview.
- os-healthcheck/disk_check.sh - filesystem usage threshold check.
- os-healthcheck/service_check.sh - critical service status check.
- os-healthcheck/system_report.sh - writes a timestamped system report to /tmp.
- os-healthcheck/network_troubleshoot.sh - local and optional remote network diagnostics.
- incident-checks/ - standalone read-only incident checks for CPU, memory/OOM, services, SSH failures, TLS certificates, DNS, NTP, filesystems, inodes, and JVM diagnostics.
Usage
cd infra-run/scripts/bash/os-healthcheck
./healthcheck.sh
./disk_check.sh
./disk_check.sh 90
./service_check.sh
./service_check.sh sshd nginx zabbix-agent
./system_report.sh
./network_troubleshoot.sh
./network_troubleshoot.sh google.com
cd ../incident-checks
./check_high_cpu.sh
./check_high_memory_oom.sh --since "24 hours ago"
./check_service_restart_loop.sh --service sshd
./check_certificate_expiry.sh --host example.com
Standards
- Scripts use Bash and should keep #!/usr/bin/env bash plus strict mode.
- Read-only checks should report missing tools without hiding the problem.
- Change-capable scripts must default to dry-run behavior and require explicit --execute.
- Output should use OK, WARNING, and CRITICAL where practical.
- Validate changed scripts with ./scripts/check-bash.sh from the repository root.
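The dry-run and output conventions above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: the helper names report, run_change, and parse_args and the EXECUTE variable are hypothetical, and strict mode is assumed to mean set -euo pipefail.

```shell
#!/usr/bin/env bash
set -euo pipefail   # assumed meaning of "strict mode"

# Hypothetical helper: print a status line with an OK/WARNING/CRITICAL label.
report() {
    local level="$1"
    shift
    printf '%s: %s\n' "$level" "$*"
}

# Hypothetical helper: run a command only when EXECUTE=1, otherwise describe it.
run_change() {
    if [[ "${EXECUTE:-0}" -eq 1 ]]; then
        "$@"
    else
        report "OK" "dry-run, would run: $*"
    fi
}

# Dry-run is the default; only an explicit --execute enables changes.
parse_args() {
    EXECUTE=0
    local arg
    for arg in "$@"; do
        case "$arg" in
            --execute) EXECUTE=1 ;;
            *)
                report "CRITICAL" "unknown option: $arg" >&2
                return 2
                ;;
        esac
    done
}
```

A change-capable script built this way would call parse_args "$@" once and then wrap every modifying command in run_change, so that a run without --execute only prints what it would do.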
Exit Codes
disk_check.sh:
- 0 - all filesystems are below the threshold.
- 1 - one or more filesystems are at or above the threshold.
- 2 - invalid threshold input.
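As a rough sketch of how these three exit codes could be produced, the function below reads df -P style output on standard input. The function name check_usage, the default threshold of 80, and the accepted range of 1 to 100 are assumptions for illustration and may not match disk_check.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical function: reads `df -P` output on stdin.
# Returns 0 (all below), 1 (at or above threshold), or 2 (invalid threshold).
check_usage() {
    local threshold="${1:-80}"   # default value is an assumption
    # 10# forces base 10 so input such as "08" is not parsed as octal.
    if ! [[ "$threshold" =~ ^[0-9]+$ ]] ||
        (( 10#$threshold < 1 || 10#$threshold > 100 )); then
        echo "CRITICAL: invalid threshold: $threshold" >&2
        return 2
    fi
    threshold=$((10#$threshold))

    local status=0 fs blocks used avail pct mount
    while read -r fs blocks used avail pct mount; do
        [[ "$fs" == "Filesystem" ]] && continue   # skip the header row
        pct="${pct%\%}"                           # strip the trailing % sign
        [[ "$pct" =~ ^[0-9]+$ ]] || continue      # ignore rows without a usage value
        if (( pct >= threshold )); then
            echo "CRITICAL: $mount at ${pct}% (threshold ${threshold}%)"
            status=1
        else
            echo "OK: $mount at ${pct}%"
        fi
    done
    return "$status"
}
```

It would typically be fed from df -P, for example df -P | check_usage 90, with the function's return value used as the script's exit code.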
service_check.sh:
- 0 - all checked services are active.
- 1 - at least one service is inactive, failed, missing, or cannot be checked.
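The service exit codes could be derived along these lines. This sketch assumes a systemd host and uses systemctl is-active --quiet; the function name check_services is hypothetical, and the real service_check.sh may work differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical function: returns 0 if every named service is active, else 1.
check_services() {
    local status=0 svc
    # A missing systemctl means nothing can be checked, which maps to exit code 1.
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "CRITICAL: systemctl not found, cannot check services" >&2
        return 1
    fi
    for svc in "$@"; do
        # is-active exits non-zero for inactive, failed, and unknown units alike.
        if systemctl is-active --quiet "$svc"; then
            echo "OK: $svc is active"
        else
            echo "CRITICAL: $svc is not active"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_services sshd nginx zabbix-agent mirrors the usage shown earlier: every service is reported, and a single inactive one is enough to make the overall result 1.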
network_troubleshoot.sh:
- 0 - no obvious local, DNS, or connectivity issue detected.
- 1 - DNS, interface, gateway, or target connectivity problems detected.
healthcheck.sh and system_report.sh are informational. They print warnings for missing tools where possible.
Notes
- Requires Bash.
- Designed for RHEL, Oracle Linux, and Ubuntu style systems.
- Handles missing tools such as ss, traceroute, nc, and journalctl gracefully.
- Does not require root and does not make system changes.
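One way to handle a missing tool without hiding the problem is to test for it with command -v and print a warning before skipping the dependent check. The helper names have_tool and listening_sockets below are hypothetical and only illustrate the pattern.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed if the tool is on PATH, otherwise warn on stderr.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "WARNING: $1 not found, skipping dependent checks" >&2
    return 1
}

# Hypothetical check: list listening TCP sockets only when ss is available.
listening_sockets() {
    if have_tool ss; then
        ss -ltn
    fi
}
```

Because have_tool is only ever called inside an if condition, a missing tool produces a visible warning but does not abort the script under strict mode.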