Add standalone Bash incident check scripts
lint / shell-yaml-ansible (push) Failing after 16s

This commit is contained in:
Mateusz Suski
2026-05-11 18:49:00 +00:00
parent 8a7b7c5abc
commit e851568c8c
27 changed files with 1623 additions and 6 deletions
+15 -6
View File
@@ -7,13 +7,15 @@ Small, practical Bash scripts for Linux operations checks and incident triage. T
```mermaid
flowchart TD
A["bash"] --> B["os-healthcheck"]
A --> C["disk-full"]
A --> D["veritas"]
A --> E["gpfs"]
A --> C["incident-checks"]
A --> D["disk-full"]
A --> E["veritas"]
A --> F["gpfs"]
B --> B1["Host diagnostics"]
C --> C1["Incident workflow"]
D --> D1["VxVM and VCS change flow"]
E --> E1["Spectrum Scale expansion flow"]
C --> C1["Standalone triage checks"]
D --> D1["Incident workflow"]
E --> E1["VxVM and VCS change flow"]
F --> F1["Spectrum Scale expansion flow"]
```
## Scripts
@@ -23,6 +25,7 @@ flowchart TD
- `os-healthcheck/service_check.sh` - critical service status check.
- `os-healthcheck/system_report.sh` - writes a timestamped system report to `/tmp`.
- `os-healthcheck/network_troubleshoot.sh` - local and optional remote network diagnostics.
- `incident-checks/` - standalone read-only incident checks for CPU, memory/OOM, services, SSH failures, TLS certificates, DNS, NTP, filesystems, inodes, and JVM diagnostics.
## Usage
@@ -37,6 +40,12 @@ cd infra-run/scripts/bash/os-healthcheck
./system_report.sh
./network_troubleshoot.sh
./network_troubleshoot.sh google.com
cd ../incident-checks
./check_high_cpu.sh
./check_high_memory_oom.sh --since "24 hours ago"
./check_service_restart_loop.sh --service sshd
./check_certificate_expiry.sh --host example.com
```
## Standards