2026-05-05 21:26:02 +00:00
# Linux Operations Bash Toolkit
Small, practical Bash scripts for Linux operations checks and incident triage. The scripts are sanitized examples inspired by production Linux operations work and avoid destructive actions or root-only assumptions.
2026-05-06 06:36:53 +00:00
## Diagram
``` mermaid
flowchart TD
A["bash"] --> B["os-healthcheck"]
2026-05-11 18:49:00 +00:00
A --> C["incident-checks"]
A --> D["disk-full"]
A --> E["veritas"]
A --> F["gpfs"]
2026-05-06 06:36:53 +00:00
B --> B1["Host diagnostics"]
2026-05-11 18:49:00 +00:00
C --> C1["Standalone triage checks"]
D --> D1["Incident workflow"]
E --> E1["VxVM and VCS change flow"]
F --> F1["Spectrum Scale expansion flow"]
2026-05-06 06:36:53 +00:00
```
2026-05-05 21:26:02 +00:00
## Scripts
2026-05-05 21:50:20 +00:00
- `os-healthcheck/healthcheck.sh` - general host health overview.
- `os-healthcheck/disk_check.sh` - filesystem usage threshold check.
- `os-healthcheck/service_check.sh` - critical service status check.
- `os-healthcheck/system_report.sh` - writes a timestamped system report to `/tmp` .
- `os-healthcheck/network_troubleshoot.sh` - local and optional remote network diagnostics.
2026-05-11 18:49:00 +00:00
- `incident-checks/` - standalone read-only incident checks for CPU, memory/OOM, services, SSH failures, TLS certificates, DNS, NTP, filesystems, inodes, and JVM diagnostics.
2026-05-05 21:26:02 +00:00
## Usage
``` bash
2026-05-05 21:50:20 +00:00
cd infra-run/scripts/bash/os-healthcheck
2026-05-05 21:26:02 +00:00
./healthcheck.sh
./disk_check.sh
./disk_check.sh 90
./service_check.sh
./service_check.sh sshd nginx zabbix-agent
./system_report.sh
./network_troubleshoot.sh
./network_troubleshoot.sh google.com
2026-05-11 18:49:00 +00:00
cd ../incident-checks
./check_high_cpu.sh
./check_high_memory_oom.sh --since "24 hours ago"
./check_service_restart_loop.sh --service sshd
./check_certificate_expiry.sh --host example.com
2026-05-05 21:26:02 +00:00
```
2026-05-10 11:11:03 +00:00
## Standards
- Scripts use Bash and should keep `#!/usr/bin/env bash` plus strict mode.
- Read-only checks should report missing tools without hiding the problem.
- Change-capable scripts must default to dry-run behavior and require explicit `--execute` .
- Output should use `OK` , `WARNING` , and `CRITICAL` where practical.
- Validate changed scripts with `./scripts/check-bash.sh` from the repository root.
2026-05-05 21:26:02 +00:00
## Exit Codes
`disk_check.sh` :
- `0` - all filesystems are below the threshold.
- `1` - one or more filesystems are at or above the threshold.
- `2` - invalid threshold input.
`service_check.sh` :
- `0` - all checked services are active.
- `1` - at least one service is inactive, failed, missing, or cannot be checked.
`network_troubleshoot.sh` :
- `0` - no obvious local, DNS, or connectivity issue detected.
- `1` - DNS, interface, gateway, or target connectivity problems detected.
`healthcheck.sh` and `system_report.sh` are informational. They print warnings for missing tools where possible.
## Notes
- Requires Bash.
- Designed for RHEL, Oracle Linux, and Ubuntu style systems.
- Handles missing tools such as `ss` , `traceroute` , `nc` , and `journalctl` gracefully.
- Does not require root and does not make system changes.