85 lines
2.8 KiB
Markdown
85 lines
2.8 KiB
Markdown
# Linux Operations Bash Toolkit
|
|
|
|
Small, practical Bash scripts for Linux operations checks and incident triage. The scripts are sanitized examples inspired by production Linux operations work and avoid destructive actions or root-only assumptions.
|
|
|
|
## Diagram
|
|
|
|
```mermaid
|
|
flowchart TD
|
|
A["bash"] --> B["os-healthcheck"]
|
|
A --> C["incident-checks"]
|
|
A --> D["disk-full"]
|
|
A --> E["veritas"]
|
|
A --> F["gpfs"]
|
|
B --> B1["Host diagnostics"]
|
|
C --> C1["Standalone triage checks"]
|
|
D --> D1["Incident workflow"]
|
|
E --> E1["VxVM and VCS change flow"]
|
|
F --> F1["Spectrum Scale expansion flow"]
|
|
```
|
|
|
|
## Scripts
|
|
|
|
- `os-healthcheck/healthcheck.sh` - general host health overview.
|
|
- `os-healthcheck/disk_check.sh` - filesystem usage threshold check.
|
|
- `os-healthcheck/service_check.sh` - critical service status check.
|
|
- `os-healthcheck/system_report.sh` - writes a timestamped system report to `/tmp`.
|
|
- `os-healthcheck/network_troubleshoot.sh` - local and optional remote network diagnostics.
|
|
- `incident-checks/` - standalone read-only incident checks for CPU, memory/OOM, services, SSH failures, TLS certificates, DNS, NTP, filesystems, inodes, and JVM diagnostics.
|
|
|
|
## Usage
|
|
|
|
```bash
|
|
cd infra-run/scripts/bash/os-healthcheck
|
|
|
|
./healthcheck.sh
|
|
./disk_check.sh
|
|
./disk_check.sh 90
|
|
./service_check.sh
|
|
./service_check.sh sshd nginx zabbix-agent
|
|
./system_report.sh
|
|
./network_troubleshoot.sh
|
|
./network_troubleshoot.sh google.com
|
|
|
|
cd ../incident-checks
|
|
./check_high_cpu.sh
|
|
./check_high_memory_oom.sh --since "24 hours ago"
|
|
./check_service_restart_loop.sh --service sshd
|
|
./check_certificate_expiry.sh --host example.com
|
|
```
|
|
|
|
## Standards
|
|
|
|
- Scripts use Bash and should keep `#!/usr/bin/env bash` plus strict mode.
|
|
- Read-only checks should report missing tools without hiding the problem.
|
|
- Change-capable scripts must default to dry-run behavior and require explicit `--execute`.
|
|
- Output should use `OK`, `WARNING`, and `CRITICAL` where practical.
|
|
- Validate changed scripts with `./scripts/check-bash.sh` from the repository root.
|
|
|
|
## Exit Codes
|
|
|
|
`disk_check.sh`:
|
|
|
|
- `0` - all filesystems are below the threshold.
|
|
- `1` - one or more filesystems are at or above the threshold.
|
|
- `2` - invalid threshold input.
|
|
|
|
`service_check.sh`:
|
|
|
|
- `0` - all checked services are active.
|
|
- `1` - at least one service is inactive, failed, missing, or cannot be checked.
|
|
|
|
`network_troubleshoot.sh`:
|
|
|
|
- `0` - no obvious local, DNS, or connectivity issue detected.
|
|
- `1` - DNS, interface, gateway, or target connectivity problems detected.
|
|
|
|
`healthcheck.sh` and `system_report.sh` are informational. They print warnings for missing tools where possible.
|
|
|
|
## Notes
|
|
|
|
- Requires Bash.
|
|
- Designed for RHEL, Oracle Linux, and Ubuntu style systems.
|
|
- Handles missing tools such as `ss`, `traceroute`, `nc`, and `journalctl` gracefully.
|
|
- Does not require root and does not make system changes.
|