Compare commits


5 Commits

Author SHA1 Message Date
Mateusz Suski 65c7c82f0f Update README and add CHANGELOG with initial toolkits summary 2026-05-05 21:47:33 +00:00
Mateusz Suski 76e24796bb Add disk full incident response toolkit 2026-05-05 21:44:08 +00:00
Mateusz Suski 5dd8c34952 Add GPFS storage expansion toolkit 2026-05-05 21:40:46 +00:00
Mateusz Suski c42d8bfb8f Add Veritas VxVM and VCS storage expansion toolkit 2026-05-05 21:40:09 +00:00
Mateusz Suski 9fb291f834 Add initial Linux operations Bash toolkit with network diagnostics 2026-05-05 21:26:02 +00:00
40 changed files with 3136 additions and 7 deletions
@@ -0,0 +1,39 @@
# Changelog
## [Initial Version]
### Added
- Repository structure:
  - `infra-run`
  - `platform-projects`
  - `labs`
- Linux operations Bash toolkit:
  - healthcheck
  - disk usage checks
  - service checks
  - system reporting
- Disk full incident toolkit:
  - disk analysis
  - large files detection
  - deleted open files detection
  - safe cleanup suggestions
- Network troubleshooting script:
  - interface, routing, DNS, connectivity checks
- Veritas storage toolkit:
  - VxVM disk detection
  - diskgroup extension
  - volume/filesystem resize
  - VCS freeze/unfreeze workflow
- GPFS storage toolkit:
  - cluster validation
  - NSD planning
  - filesystem expansion
  - rebalance
- Runbook-style structure and step-based execution.
### Notes
- All scripts default to dry-run where change actions are present.
- Designed for safety and readability.
- No destructive actions without explicit confirmation.
@@ -1,10 +1,59 @@
# Portfolio
Personal infrastructure engineering portfolio focused on Linux operations, automation, monitoring and lab-based infrastructure projects.
This repository demonstrates real-world Linux infrastructure and operations experience through sanitized scripts, runbooks, and project structure. It focuses on production operations, incident response, troubleshooting, automation, and enterprise infrastructure patterns.
Main areas:
- Linux operations automation
- Infrastructure troubleshooting and runbooks
- Monitoring and observability
- Virtualization and clustering
- Kubernetes, Terraform and lab environments
## Core Project
### infra-run
`infra-run` is the core operational project in this repository. It contains Linux operations automation, incident response tooling, Bash-based operational scripts, and runbook-style workflows for pre-checks, controlled changes, troubleshooting, and post-change validation.
## Toolkits
### Linux Operations Toolkit
[infra-run/scripts/bash/](./infra-run/scripts/bash/)
General Linux operations scripts for host health checks, disk usage checks, service validation, and system reporting. The toolkit is written for practical operations checks on RHEL, Oracle Linux, and Ubuntu-style systems.
### Disk Full Incident Toolkit
[infra-run/scripts/bash/disk-full/](./infra-run/scripts/bash/disk-full/)
Production-style disk full incident workflow covering filesystem usage, inode pressure, large file discovery, deleted open files, top directory analysis, log cleanup review, and safe cleanup suggestions. The scenario reflects common incidents involving logs, temporary files, deleted files held open by processes, and inode exhaustion.
### Network Troubleshooting
[infra-run/scripts/bash/](./infra-run/scripts/bash/)
OS-level network diagnostics for interfaces, routes, DNS resolution, gateway reachability, listening sockets, and optional remote connectivity checks. The script is designed for first-pass troubleshooting during Linux operations incidents.
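The toolkit treats `nc` as an optional tool. As an illustration only, a TCP reachability check can fall back to Bash's built-in `/dev/tcp` redirection when `nc` is missing. This is a minimal sketch, assuming Bash and GNU coreutils `timeout`; `check_tcp` is a hypothetical helper, not the function used in `network_troubleshoot.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: check_tcp is a hypothetical helper, not part of network_troubleshoot.sh.
# Assumes Bash (for /dev/tcp) and GNU coreutils timeout.
set -o nounset

check_tcp() {
  local host="$1" port="$2" seconds="${3:-3}"
  if ! [[ "$port" =~ ^[0-9]+$ ]] || (( port < 1 || port > 65535 )); then
    printf 'CRITICAL: invalid port "%s"\n' "$port" >&2
    return 2
  fi
  # /dev/tcp/<host>/<port> is handled by Bash itself; timeout bounds the connect.
  timeout "$seconds" bash -c ': </dev/tcp/"$1"/"$2"' check_tcp "$host" "$port" 2>/dev/null
}

if check_tcp 127.0.0.1 22 2; then
  printf 'OK: 127.0.0.1:22 is reachable\n'
else
  printf 'WARNING: 127.0.0.1:22 is not reachable\n'
fi
```

Returning `2` for bad input and `1` for an unreachable target keeps the same 0/1/2 convention the toolkit scripts use.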
### Veritas Storage Toolkit
[infra-run/scripts/bash/veritas/](./infra-run/scripts/bash/veritas/)
Veritas VxVM and VCS storage expansion workflow covering new LUN detection, VxVM disk initialization, diskgroup extension, volume and filesystem resize, and VCS service group freeze/unfreeze handling. The approach is cluster-safe, dry-run by default, and organized around pre-check, change, and post-check steps.
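The Veritas steps map onto a short command sequence. The sketch below only prints a plan for peer review and never executes anything. It assumes the standard VxVM/VCS commands (`vxdctl`, `vxdisk`, `vxdisksetup`, `vxdg`, `vxresize`, `hagrp`, `haconf`, `vxprint`); the diskgroup, volume, disk, and service group names are hypothetical, and the repository toolkit's exact steps may differ. Check syntax against the documentation for the installed InfoScale release.

```bash
#!/usr/bin/env bash
# Sketch only: prints a review plan and never runs VxVM or VCS commands.
# Diskgroup, volume, disk, and service group names are hypothetical.
set -o errexit
set -o nounset
set -o pipefail

print_vxvm_grow_plan() {
  local dg="$1" vol="$2" disk="$3" grow_by="$4" sg="$5"
  local media="${dg}_${disk}"   # hypothetical media-name convention
  cat <<PLAN
# Pre-check: rescan devices and confirm the new LUN is visible
vxdctl enable
vxdisk -o alldgs list
# Freeze the service group so VCS does not react during the change
haconf -makerw
hagrp -freeze $sg -persistent
haconf -dump -makero
# Change: initialize the disk, add it to the diskgroup, grow volume and VxFS
vxdisksetup -i $disk
vxdg -g $dg adddisk $media=$disk
vxresize -g $dg $vol +$grow_by
# Post-check: unfreeze the service group and review the volume layout
haconf -makerw
hagrp -unfreeze $sg -persistent
haconf -dump -makero
vxprint -g $dg -ht $vol
PLAN
}

print_vxvm_grow_plan datadg datavol emc0_0123 50g app_sg
```

Printing the plan instead of running it keeps the change reviewable and matches the dry-run-by-default approach.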
### GPFS Storage Toolkit
[infra-run/scripts/bash/gpfs/](./infra-run/scripts/bash/gpfs/)
GPFS / IBM Spectrum Scale filesystem expansion workflow covering cluster validation, candidate disk discovery, NSD stanza planning, NSD creation, filesystem expansion, optional rebalance, post-checks, and change reporting.
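NSD stanza planning produces a file that the GPFS commands consume. As a sketch, assuming the documented `%nsd:` stanza attributes, the helper below renders one stanza for review. The printed stanza would typically be saved to a file and reviewed before `mmcrnsd -F <file>` and `mmadddisk <filesystem> -F <file>`, with `mmrestripefs <filesystem> -b` as the optional rebalance. `render_nsd_stanza` and the example device, NSD, and server names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: renders one NSD stanza for review; it does not call any GPFS command.
# Device, NSD name, and server names in the example are hypothetical.
set -o errexit
set -o nounset
set -o pipefail

render_nsd_stanza() {
  local device="$1" nsd="$2" servers="$3"
  local usage="${4:-dataAndMetadata}" failure_group="${5:-1}" pool="${6:-system}"
  # Same usage values accepted by the toolkit's usage_value_valid helper.
  case "$usage" in
    dataOnly|metadataOnly|dataAndMetadata) ;;
    *) printf 'CRITICAL: invalid usage "%s"\n' "$usage" >&2; return 2 ;;
  esac
  printf '%%nsd:\n'
  printf '  device=%s\n' "$device"
  printf '  nsd=%s\n' "$nsd"
  printf '  servers=%s\n' "$servers"
  printf '  usage=%s\n' "$usage"
  printf '  failureGroup=%s\n' "$failure_group"
  printf '  pool=%s\n' "$pool"
}

render_nsd_stanza /dev/mapper/mpath_new01 nsd_new01 gpfsnode1,gpfsnode2 dataOnly 2 data
```

The defaults mirror the toolkit's `USAGE=dataAndMetadata` and `STORAGE_POOL=system`; the failure group has no safe default in practice and should come from the storage layout.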
## Repository Structure
- `infra-run` - core operational automation, scripts, runbooks, and infrastructure operations examples.
- `platform-projects` - larger infrastructure topics including storage, clustering, monitoring, virtualization, and log analysis.
- `labs` - experimentation and lab work for Kubernetes, Terraform, Docker, networking, and CI/CD.
## Design Principles
- Safety first, with dry-run behavior by default.
- Pre-check, change, and post-check workflow.
- Real-world scenarios, not tutorials.
- Minimal but practical tooling.
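The dry-run default and the pre-check, change, and post-check sequence combine naturally. The sketch below is a hypothetical simplification (`run_step`, `run_workflow`, and the example step functions are not taken from the repository): pre-checks always run because they are read-only, a failed pre-check blocks the change, and the change itself is only printed unless `DRY_RUN=false`.

```bash
#!/usr/bin/env bash
# Sketch of the pre-check -> change -> post-check flow with dry-run by default.
# run_step and run_workflow are hypothetical helpers, not repository functions.
set -o errexit
set -o nounset
set -o pipefail

DRY_RUN="${DRY_RUN:-true}"

# Print change commands in dry-run mode; run them only when DRY_RUN=false.
run_step() {
  if [[ "$DRY_RUN" == "true" ]]; then
    printf 'DRY-RUN: %s\n' "$*"
    return 0
  fi
  printf 'RUN: %s\n' "$*"
  "$@"
}

# Pre-checks are read-only and always run; a failed pre-check blocks the change.
run_workflow() {
  local precheck="$1" change="$2" postcheck="$3"
  if ! "$precheck"; then
    printf 'CRITICAL: pre-check failed, change not attempted\n'
    return 1
  fi
  run_step "$change"
  "$postcheck"
}

# Example steps (hypothetical): replace with real checks and changes.
precheck()  { [[ -d / ]]; }
change()    { printf 'applying change\n'; }
postcheck() { printf 'OK: post-check completed\n'; }

run_workflow precheck change postcheck
```

Setting `DRY_RUN=false` in the environment makes `run_step` execute the change instead of printing it; the toolkits add a typed `EXECUTE` confirmation on top of that switch.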
## Notes
- Scripts are simplified and sanitized for portfolio use.
- Examples are based on real production operations patterns.
@@ -0,0 +1,51 @@
# Linux Operations Bash Toolkit
Small, practical Bash scripts for Linux operations checks and incident triage. The scripts are sanitized examples inspired by production Linux operations work and avoid destructive actions or root-only assumptions.
## Scripts
- `healthcheck.sh` - general host health overview.
- `disk_check.sh` - filesystem usage threshold check.
- `service_check.sh` - critical service status check.
- `system_report.sh` - writes a timestamped system report to `/tmp`.
- `network_troubleshoot.sh` - local and optional remote network diagnostics.
## Usage
```bash
./healthcheck.sh
./disk_check.sh
./disk_check.sh 90
./service_check.sh
./service_check.sh sshd nginx zabbix-agent
./system_report.sh
./network_troubleshoot.sh
./network_troubleshoot.sh google.com
```
## Exit Codes
`disk_check.sh`:
- `0` - all filesystems are below the threshold.
- `1` - one or more filesystems are at or above the threshold.
- `2` - invalid threshold input.
`service_check.sh`:
- `0` - all checked services are active.
- `1` - at least one service is inactive, failed, missing, or cannot be checked.
`network_troubleshoot.sh`:
- `0` - no obvious local, DNS, or connectivity issue detected.
- `1` - DNS, interface, gateway, or target connectivity problems detected.
`healthcheck.sh` and `system_report.sh` are informational. They print warnings for missing tools where possible.
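The exit codes above make the scripts easy to wrap. As a sketch, a cron job or monitoring hook could reduce any check to one status line while keeping its exit code; `check_label` and `run_check` are hypothetical helpers, and the labels are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: turn the documented exit codes into one status line, for example
# in a cron job. check_label and run_check are hypothetical helpers.
set -o nounset

# 0 = OK, 1 = problem detected, 2 = invalid input (disk_check.sh only).
check_label() {
  case "$1" in
    0) printf 'OK' ;;
    1) printf 'PROBLEM' ;;
    2) printf 'INVALID-INPUT' ;;
    *) printf 'UNKNOWN(%s)' "$1" ;;
  esac
}

# Run a check quietly, print a one-line summary, and preserve its exit code.
run_check() {
  local rc=0
  "$@" >/dev/null 2>&1 || rc=$?
  printf '%s: %s (exit %s)\n' "$(check_label "$rc")" "$*" "$rc"
  return "$rc"
}
```

For example, `run_check ./disk_check.sh 90` prints `PROBLEM: ./disk_check.sh 90 (exit 1)` when at least one filesystem is at or above 90%, and still returns `1` to the caller.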
## Notes
- Requires Bash.
- Designed for RHEL, Oracle Linux, and Ubuntu-style systems.
- Handles missing tools such as `ss`, `traceroute`, `nc`, and `journalctl` gracefully.
- Does not require root and does not make system changes.
@@ -0,0 +1,124 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
TIMESTAMP="${TIMESTAMP:-$(date +%Y%m%d_%H%M%S)}"
DRY_RUN="${DRY_RUN:-true}"
LOG_FILE="${LOG_FILE:-/tmp/disk_full_${TIMESTAMP}.log}"
WARN_THRESHOLD="${WARN_THRESHOLD:-80}"
CRIT_THRESHOLD="${CRIT_THRESHOLD:-90}"
EMERGENCY_THRESHOLD="${EMERGENCY_THRESHOLD:-95}"
log() {
local level="$1"
shift
local message="$*"
printf '%s: %s\n' "$level" "$message" | tee -a "$LOG_FILE"
}
ok() {
log "OK" "$@"
}
warning() {
log "WARNING" "$@"
}
critical() {
log "CRITICAL" "$@"
}
section() {
printf '\n== %s ==\n' "$1" | tee -a "$LOG_FILE"
}
require_cmd() {
local cmd="$1"
if command -v "$cmd" >/dev/null 2>&1; then
return 0
fi
warning "Command not available: $cmd"
return 1
}
run_cmd() {
if [[ "$#" -eq 0 ]]; then
critical "run_cmd called without a command"
return 2
fi
if [[ "$DRY_RUN" == "true" ]]; then
ok "DRY-RUN: $*"
return 0
fi
ok "RUN: $*"
"$@" 2>&1 | tee -a "$LOG_FILE"
}
confirm_execute() {
local target="${1:-disk-full remediation}"
if [[ "$DRY_RUN" == "true" ]]; then
ok "Safe mode enabled. No destructive actions will be taken."
return 0
fi
warning "Execution mode requested for: $target"
warning "Confirm the affected filesystem, application impact, backups, and change approval before continuing."
printf 'Type EXECUTE to continue: '
read -r confirmation
if [[ "$confirmation" != "EXECUTE" ]]; then
critical "Confirmation failed. Aborting."
exit 1
fi
ok "Execution confirmed by operator."
}
validate_path() {
local path="$1"
if [[ -z "$path" ]]; then
critical "Path cannot be empty"
return 2
fi
if [[ ! -e "$path" ]]; then
critical "Path does not exist: $path"
return 2
fi
}
usage_percent_number() {
local value="$1"
printf '%s\n' "${value%\%}"
}
status_for_percent() {
local percent="$1"
if (( percent >= EMERGENCY_THRESHOLD )); then
printf 'CRITICAL'
elif (( percent >= CRIT_THRESHOLD )); then
printf 'WARNING'
elif (( percent >= WARN_THRESHOLD )); then
printf 'WARNING'
else
printf 'OK'
fi
}
safe_find_prune_args() {
printf '%s\n' \
-path /proc -o \
-path /sys -o \
-path /dev -o \
-path /run -o \
-path /tmp/systemd-private-\*
}
@@ -0,0 +1,47 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
exit_code=0
section "Disk Space Overview"
if require_cmd df; then
df -h 2>&1 | tee -a "$LOG_FILE"
else
critical "df is required for disk overview"
exit 1
fi
section "Inode Overview"
df -i 2>&1 | tee -a "$LOG_FILE" || warning "Unable to collect inode usage"
section "Filesystems Sorted By Usage"
df -P -h | awk 'NR == 1 { next } { print $5, $6, $1, $2, $3, $4 }' | sort -rn | while read -r used mount fs size used_space avail; do
percent="$(usage_percent_number "$used")"
level="$(status_for_percent "$percent")"
printf '%s: %s used on %s (%s, size=%s used=%s avail=%s)\n' "$level" "$used" "$mount" "$fs" "$size" "$used_space" "$avail" | tee -a "$LOG_FILE"
done
section "Threshold Summary"
while read -r fs size used avail pct mount; do
percent="$(usage_percent_number "$pct")"
# Skip filesystems that report "-" instead of a usage percentage.
[[ "$percent" =~ ^[0-9]+$ ]] || continue
if (( percent >= EMERGENCY_THRESHOLD )); then
critical "$mount is ${pct} full on $fs (size=$size used=$used avail=$avail)"
exit_code=1
elif (( percent >= CRIT_THRESHOLD )); then
warning "$mount is ${pct} full on $fs (size=$size used=$used avail=$avail)"
elif (( percent >= WARN_THRESHOLD )); then
warning "$mount is ${pct} full on $fs (size=$size used=$used avail=$avail)"
else
ok "$mount is ${pct} full on $fs"
fi
done < <(df -P -h | awk 'NR > 1 { print $1, $2, $3, $4, $5, $6 }')
exit "$exit_code"
@@ -0,0 +1,63 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
SEARCH_PATH="/"
TOP_N=20
usage() {
printf 'Usage: %s [--path <path>] [--top <N>]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--path) SEARCH_PATH="${2:-}"; shift 2 ;;
--top) TOP_N="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if ! [[ "$TOP_N" =~ ^[0-9]+$ ]] || (( TOP_N < 1 )); then
critical "--top must be a positive integer"
exit 2
fi
validate_path "$SEARCH_PATH" || exit 2
require_cmd find || exit 1
require_cmd sort || exit 1
require_cmd head || exit 1
section "Largest Files Under $SEARCH_PATH"
warning "Read-only scan. Permission errors can be normal without root access."
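# find exits non-zero on permission errors, and sort can be killed by SIGPIPE
# once head has enough lines; neither should abort this read-only report.
find "$SEARCH_PATH" -xdev \
\( -path /proc -o -path /sys -o -path /dev -o -path /run \) -prune -o \
-type f -printf '%s\t%p\n' 2>/dev/null |
sort -rn |
head -n "$TOP_N" |
awk '
# Extra parameters make unit, size, and idx local to the function.
function human(bytes,    unit, size, idx) {
split("B KB MB GB TB PB", unit)
size = bytes + 0
idx = 1
while (size >= 1024 && idx < 6) {
size = size / 1024
idx++
}
return sprintf("%.1f%s", size, unit[idx])
}
{
# Split on the first tab only, so paths with spaces are printed unchanged.
tab = index($0, "\t")
printf "%10s %s\n", human(substr($0, 1, tab - 1)), substr($0, tab + 1)
}
' | tee -a "$LOG_FILE" || true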
ok "No files were modified."
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
section "Deleted But Open Files"
if ! require_cmd lsof; then
warning "lsof is not installed or not in PATH. Install lsof or run equivalent tooling with appropriate privileges."
exit 0
fi
warning "Read-only check. Full results may require elevated privileges."
deleted_output="$(lsof -nP +L1 2>/dev/null || true)"
if [[ -z "$deleted_output" ]]; then
ok "No deleted open files detected by lsof."
exit 0
fi
printf '%s\n' "$deleted_output" |
awk '
NR == 1 {
printf "%-20s %-10s %-12s %s\n", "PROCESS", "PID", "SIZE", "PATH"
next
}
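{
# lsof +L1 columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME.
# NAME starts at field 10 and may contain spaces, for example " (deleted)".
path = $10
for (i = 11; i <= NF; i++) {
path = path " " $i
}
printf "%-20s %-10s %-12s %s\n", $1, $2, $7, path
}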
' | tee -a "$LOG_FILE"
warning "Space from deleted files is released when the owning process closes the file or is safely restarted."
@@ -0,0 +1,51 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
SEARCH_PATH="/"
DEPTH=2
TOP_N=25
usage() {
printf 'Usage: %s [--path <path>] [--depth <N>] [--top <N>]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--path) SEARCH_PATH="${2:-}"; shift 2 ;;
--depth) DEPTH="${2:-}"; shift 2 ;;
--top) TOP_N="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if ! [[ "$DEPTH" =~ ^[0-9]+$ ]]; then
critical "--depth must be a non-negative integer"
exit 2
fi
if ! [[ "$TOP_N" =~ ^[0-9]+$ ]] || (( TOP_N < 1 )); then
critical "--top must be a positive integer"
exit 2
fi
validate_path "$SEARCH_PATH" || exit 2
require_cmd du || exit 1
require_cmd sort || exit 1
require_cmd head || exit 1
section "Top Directories Under $SEARCH_PATH"
warning "Read-only scan. Permission errors can be normal without root access."
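# du exits non-zero on unreadable directories, and sort can receive SIGPIPE
# when head stops reading; neither should abort this read-only report.
du -x -h --max-depth="$DEPTH" "$SEARCH_PATH" 2>/dev/null |
sort -hr |
head -n "$TOP_N" |
tee -a "$LOG_FILE" || true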
ok "No directories were modified."
@@ -0,0 +1,97 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
EXECUTE=false
LOG_PATH="/var/log"
DAYS_OLD=14
usage() {
printf 'Usage: %s [--path <path>] [--days-old <N>] [--execute]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--path) LOG_PATH="${2:-}"; shift 2 ;;
--days-old) DAYS_OLD="${2:-}"; shift 2 ;;
--execute) EXECUTE=true; DRY_RUN=false; shift ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if ! [[ "$DAYS_OLD" =~ ^[0-9]+$ ]] || (( DAYS_OLD < 1 )); then
critical "--days-old must be a positive integer"
exit 2
fi
validate_path "$LOG_PATH" || exit 2
require_cmd find || exit 1
require_cmd sort || exit 1
section "Large Log Files In $LOG_PATH"
# Unreadable directories make find exit non-zero; do not abort the review.
find "$LOG_PATH" -xdev -type f \( -name '*.log' -o -name '*log' -o -name 'messages*' -o -name 'syslog*' \) -size +100M -printf '%s\t%p\n' 2>/dev/null |
sort -rn |
awk '
# Extra parameters make unit, size, and idx local to the function.
function human(bytes,    unit, size, idx) {
split("B KB MB GB TB", unit)
size = bytes + 0
idx = 1
while (size >= 1024 && idx < 5) {
size = size / 1024
idx++
}
return sprintf("%.1f%s", size, unit[idx])
}
# Split on the first tab only, so paths with spaces are printed unchanged.
{ tab = index($0, "\t"); printf "%10s %s\n", human(substr($0, 1, tab - 1)), substr($0, tab + 1) }
' | tee -a "$LOG_FILE" || true
section "Old Rotated Logs Eligible For Review"
mapfile -t rotated_logs < <(
find "$LOG_PATH" -xdev -type f \
\( -name '*.gz' -o -name '*.1' -o -name '*.old' -o -name '*.bz2' -o -name '*.xz' \) \
-mtime +"$DAYS_OLD" -print 2>/dev/null | sort
)
if [[ "${#rotated_logs[@]}" -eq 0 ]]; then
ok "No old rotated logs found under $LOG_PATH with age greater than $DAYS_OLD days."
else
printf '%s\n' "${rotated_logs[@]}" | tee -a "$LOG_FILE"
fi
section "Suggested Cleanup Commands"
cat <<SUGGESTIONS | tee -a "$LOG_FILE"
# Review large active logs before truncating. Prefer application-aware log rotation:
logrotate -d /etc/logrotate.conf
# Remove old rotated logs only after retention approval:
$(basename "$0") --path "$LOG_PATH" --days-old "$DAYS_OLD" --execute
SUGGESTIONS
if [[ "$EXECUTE" != "true" ]]; then
ok "Safe mode. No logs were removed."
exit 0
fi
if [[ "${#rotated_logs[@]}" -eq 0 ]]; then
ok "Execution requested, but there are no eligible old rotated logs to remove."
exit 0
fi
confirm_execute "remove old rotated logs from $LOG_PATH"
for file in "${rotated_logs[@]}"; do
if [[ -f "$file" && ! -L "$file" ]]; then
run_cmd rm -f -- "$file"
else
warning "Skipped non-regular file or symlink: $file"
fi
done
ok "Old rotated log cleanup completed."
@@ -0,0 +1,78 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
EXECUTE=false
TRUNCATE_FILE=""
RESTART_SERVICE=""
usage() {
printf 'Usage: %s [--truncate-file <path>] [--restart-service <name>] [--execute]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--truncate-file) TRUNCATE_FILE="${2:-}"; shift 2 ;;
--restart-service) RESTART_SERVICE="${2:-}"; shift 2 ;;
--execute) EXECUTE=true; DRY_RUN=false; shift ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
section "Emergency Disk Full Quick Fix Options"
cat <<OPTIONS | tee -a "$LOG_FILE"
Possible actions after incident commander approval:
1. Truncate a verified active log file:
$0 --truncate-file /path/to/large.log --execute
2. Restart a specific service holding deleted files open:
$0 --restart-service service-name --execute
Review application impact before either action. Truncation preserves the file inode but destroys file contents.
OPTIONS
if [[ -z "$TRUNCATE_FILE" && -z "$RESTART_SERVICE" ]]; then
ok "No quick fix requested. Printed options only."
exit 0
fi
if [[ "$EXECUTE" != "true" ]]; then
warning "Quick fix arguments supplied without --execute. No changes made."
exit 0
fi
confirm_execute "emergency disk-full quick fix"
if [[ -n "$TRUNCATE_FILE" ]]; then
validate_path "$TRUNCATE_FILE" || exit 2
if [[ ! -f "$TRUNCATE_FILE" || -L "$TRUNCATE_FILE" ]]; then
critical "Refusing to truncate non-regular file or symlink: $TRUNCATE_FILE"
exit 2
fi
warning "Truncating file contents: $TRUNCATE_FILE"
: > "$TRUNCATE_FILE"
ok "Truncated $TRUNCATE_FILE"
fi
if [[ -n "$RESTART_SERVICE" ]]; then
if [[ "$RESTART_SERVICE" == *"/"* || "$RESTART_SERVICE" == *".."* ]]; then
critical "Invalid service name: $RESTART_SERVICE"
exit 2
fi
if require_cmd systemctl; then
run_cmd systemctl restart "$RESTART_SERVICE"
ok "Restart requested for service: $RESTART_SERVICE"
else
critical "systemctl is required to restart services"
exit 1
fi
fi
@@ -0,0 +1,90 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
BEFORE_FILE=""
exit_code=0
usage() {
printf 'Usage: %s [--before-file <df_output_file>]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--before-file) BEFORE_FILE="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
require_cmd df || exit 1
section "Post-Cleanup Disk Space"
df -h 2>&1 | tee -a "$LOG_FILE"
section "Post-Cleanup Inodes"
df -i 2>&1 | tee -a "$LOG_FILE" || warning "Unable to collect inode usage"
section "Critical Filesystem Check"
while read -r fs size used avail pct mount; do
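percent="$(usage_percent_number "$pct")"
# Skip filesystems that report "-" instead of a usage percentage.
[[ "$percent" =~ ^[0-9]+$ ]] || continue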
if (( percent >= EMERGENCY_THRESHOLD )); then
critical "$mount is still ${pct} full on $fs (size=$size used=$used avail=$avail)"
exit_code=1
elif (( percent >= CRIT_THRESHOLD )); then
warning "$mount remains high at ${pct} on $fs"
else
ok "$mount is ${pct} full on $fs"
fi
done < <(df -P -h | awk 'NR > 1 { print $1, $2, $3, $4, $5, $6 }')
if [[ -n "$BEFORE_FILE" ]]; then
section "Before And After Comparison"
if [[ ! -f "$BEFORE_FILE" ]]; then
warning "Before file not found: $BEFORE_FILE"
else
awk '
NR == FNR && FNR > 1 {
before[$6] = $5
next
}
FNR > 1 {
mount = $6
if (mount in before) {
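before_pct = before[mount]
after_pct = $5
gsub(/%/, "", before_pct)
gsub(/%/, "", after_pct)
# gsub leaves string values; add 0 so the comparison below is numeric.
before_pct += 0
after_pct += 0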
if (after_pct < before_pct) {
status = "OK"
result = "improved"
} else if (after_pct == before_pct) {
status = "WARNING"
result = "unchanged"
} else {
status = "WARNING"
result = "increased"
}
printf "%s: %s before=%s after=%s (%s)\n", status, mount, before[mount], $5, result
}
}
' "$BEFORE_FILE" <(df -P -h) | tee -a "$LOG_FILE"
fi
else
warning "No --before-file supplied. Improvement comparison skipped."
fi
if [[ "$exit_code" -eq 0 ]]; then
ok "Post-check completed without emergency-threshold filesystems."
else
critical "One or more filesystems remain at or above ${EMERGENCY_THRESHOLD}%."
fi
exit "$exit_code"
@@ -0,0 +1,84 @@
# Linux Disk Full Incident Toolkit
Production-style Bash toolkit for diagnosing and handling a disk full incident on Linux systems. It is intentionally conservative: default mode is safe, cleanup actions require `--execute` and an operator confirmation prompt, and the scripts do not assume root access.
## Why Disk Full Incidents Happen
- **Logs** - application, audit, system, or middleware logs can grow faster than rotation policy expects.
- **Temporary files** - failed jobs, installers, archives, and batch workloads often leave large files in `/tmp`, `/var/tmp`, or application work directories.
- **Deleted open files** - a process can keep writing to a file after it has been deleted, hiding disk usage from normal directory listings until the process closes the file.
- **Inode exhaustion** - a filesystem can fail writes even when space is available if it has too many small files and no free inodes.
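The deleted-open-files case is easy to reproduce safely. The sketch below, assuming Linux with `/proc` and GNU coreutils, writes a 1 MiB scratch file under a `mktemp -d` directory, holds it open on file descriptor 3, deletes it, and shows that the kernel still reports the inode as allocated until the descriptor is closed. `03_deleted_open_files.sh` looks for the same condition system-wide with `lsof +L1`; this demo inspects its own `/proc/$$/fd` entry instead.

```bash
#!/usr/bin/env bash
# Demonstration: a deleted file keeps its space while a descriptor is open.
# Uses a scratch file under a temporary directory; Linux /proc is assumed.
set -o errexit
set -o nounset
set -o pipefail

demo_dir="$(mktemp -d)"
scratch="$demo_dir/held-open.log"

# Write 1 MiB, then keep file descriptor 3 open on it.
head -c 1048576 /dev/zero > "$scratch"
exec 3<"$scratch"

rm -f -- "$scratch"   # the directory entry is gone ...
link_target="$(readlink "/proc/$$/fd/3")"
printf 'fd 3 -> %s\n' "$link_target"   # ... but the kernel marks it "(deleted)"
held_bytes="$(stat -L -c %s "/proc/$$/fd/3")"
printf 'still allocated: %s bytes\n' "$held_bytes"

exec 3<&-             # closing the descriptor releases the space
rmdir "$demo_dir"
```

In a real incident the equivalent of `exec 3<&-` is the owning process closing the file or being restarted, which is why the toolkit treats service restarts as an approved action rather than a default.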
## Safety Model
- Safe dry-run behavior is the default.
- No script blindly deletes files.
- Cleanup operations require `--execute` and confirmation.
- Missing optional commands are reported as `WARNING`.
- Output is formatted with `OK`, `WARNING`, and `CRITICAL` for incident notes.
- The scripts are designed to work without root, while warning when permissions may limit visibility.
## Scripts
- `00_env.sh` - shared configuration and helper functions.
- `01_disk_overview.sh` - `df -h`, `df -i`, sorted mount usage, and threshold highlights.
- `02_find_big_files.sh` - read-only largest-file discovery.
- `03_deleted_open_files.sh` - deleted but open file detection with `lsof` when available.
- `04_top_dirs.sh` - largest directory discovery with `du`.
- `05_log_cleanup.sh` - safe log cleanup analysis and optional old rotated log removal.
- `06_quick_fix.sh` - defensive emergency actions for verified truncation or service restart.
- `07_postcheck.sh` - validation after cleanup, with optional before/after comparison.
- `disk_full_runbook.sh` - guided incident workflow.
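`07_postcheck.sh` compares `Use%` per mount against a saved snapshot. The comparison has to be numeric: compared as strings, `95` sorts after `100`, so a drop from 100% to 95% would be reported as an increase. A minimal sketch of the per-mount decision, with `compare_usage` as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: classify a before/after Use% pair numerically.
# compare_usage is a hypothetical helper, not part of 07_postcheck.sh.
set -o errexit
set -o nounset
set -o pipefail

compare_usage() {
  local before="${1%\%}" after="${2%\%}"
  if ! [[ "$before" =~ ^[0-9]+$ && "$after" =~ ^[0-9]+$ ]]; then
    printf 'unknown\n'          # for example "-" on pseudo filesystems
  elif (( after < before )); then
    printf 'improved\n'
  elif (( after == before )); then
    printf 'unchanged\n'
  else
    printf 'increased\n'
  fi
}

compare_usage 100% 95%   # prints "improved"; a string compare would say "increased"
```

The snapshot passed to `--before-file` is parsed with a header line and the mount point in column 6, so capture it with `df -P -h > /tmp/df_before.txt` before cleanup and run `./07_postcheck.sh --before-file /tmp/df_before.txt` afterwards (the file path is only an example).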
## Example Usage
```bash
cd infra-run/scripts/bash/disk-full
./01_disk_overview.sh
./02_find_big_files.sh --path /var --top 20
./03_deleted_open_files.sh
./04_top_dirs.sh --path /var --depth 2
./05_log_cleanup.sh
./07_postcheck.sh
```
Run the guided read-only workflow:
```bash
./disk_full_runbook.sh --path /var --top 20 --depth 2
```
Review old rotated logs without deleting them:
```bash
./05_log_cleanup.sh --path /var/log --days-old 14
```
Remove old rotated logs only after approval:
```bash
./05_log_cleanup.sh --path /var/log --days-old 14 --execute
```
Emergency truncation of a verified active log:
```bash
./06_quick_fix.sh --truncate-file /var/log/app/verified-large.log --execute
```
Restart a specific service after confirming it is holding deleted files open:
```bash
./06_quick_fix.sh --restart-service app.service --execute
```
## Exit Codes
- `0` - OK
- `1` - operational issue detected or still critical
- `2` - invalid input
## Production Warning
Use this toolkit as an incident aid, not an autopilot. Confirm the affected filesystem, application ownership, retention requirements, backup expectations, and change approval before cleanup. In enterprise environments, coordinate service restarts and file truncation with application owners because both can destroy evidence or interrupt production workloads.
@@ -0,0 +1,67 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
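. "$SCRIPT_DIR/00_env.sh"
# Export the shared settings so the child scripts below append to the same log.
export TIMESTAMP LOG_FILE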
SEARCH_PATH="/"
TOP_N=20
DEPTH=2
EXECUTE=false
usage() {
printf 'Usage: %s [--path <path>] [--top <N>] [--depth <N>] [--execute]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--path) SEARCH_PATH="${2:-}"; shift 2 ;;
--top) TOP_N="${2:-}"; shift 2 ;;
--depth) DEPTH="${2:-}"; shift 2 ;;
--execute) EXECUTE=true; DRY_RUN=false; shift ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
section "Disk Full Incident Workflow"
cat <<FLOW | tee -a "$LOG_FILE"
Step 1. Disk overview
$SCRIPT_DIR/01_disk_overview.sh
Step 2. Find largest files
$SCRIPT_DIR/02_find_big_files.sh --path "$SEARCH_PATH" --top "$TOP_N"
Step 3. Check deleted but open files
$SCRIPT_DIR/03_deleted_open_files.sh
Step 4. Identify top directories
$SCRIPT_DIR/04_top_dirs.sh --path "$SEARCH_PATH" --depth "$DEPTH"
Step 5. Review safe log cleanup suggestions
$SCRIPT_DIR/05_log_cleanup.sh
Step 6. Optional emergency quick fix, only after approval
$SCRIPT_DIR/06_quick_fix.sh --truncate-file /path/to/verified.log --execute
$SCRIPT_DIR/06_quick_fix.sh --restart-service service-name --execute
Step 7. Post-check
$SCRIPT_DIR/07_postcheck.sh
FLOW
if [[ "$EXECUTE" == "true" ]]; then
warning "--execute was supplied to the runbook. Destructive actions are still not run automatically."
fi
section "Running Read-Only Incident Checks"
"$SCRIPT_DIR/01_disk_overview.sh" || warning "Disk overview reported critical usage"
"$SCRIPT_DIR/02_find_big_files.sh" --path "$SEARCH_PATH" --top "$TOP_N" || warning "Large-file scan reported an issue"
"$SCRIPT_DIR/03_deleted_open_files.sh" || warning "Deleted-open-file check reported an issue"
"$SCRIPT_DIR/04_top_dirs.sh" --path "$SEARCH_PATH" --depth "$DEPTH" || warning "Top-directory scan reported an issue"
"$SCRIPT_DIR/05_log_cleanup.sh" || warning "Log cleanup suggestion step reported an issue"
section "Next Manual Decision"
ok "Review findings, identify owner and retention requirements, then run a targeted cleanup script with --execute only if approved."
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
threshold="${1:-80}"
if [[ ! "$threshold" =~ ^[0-9]+$ ]] || (( threshold < 1 || threshold > 100 )); then
printf 'CRITICAL: invalid threshold "%s"; provide an integer from 1 to 100\n' "$threshold" >&2
exit 2
fi
status=0
warning_threshold=$(( threshold > 5 ? threshold - 5 : threshold ))
while read -r filesystem size used avail use_percent mountpoint; do
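usage="${use_percent%\%}"
# Skip filesystems that report "-" instead of a usage percentage.
[[ "$usage" =~ ^[0-9]+$ ]] || continue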
if (( usage >= threshold )); then
printf 'CRITICAL: %s mounted on %s is %s used; threshold is %s%% (%s free)\n' "$filesystem" "$mountpoint" "$use_percent" "$threshold" "$avail"
status=1
elif (( usage >= warning_threshold )); then
printf 'WARNING: %s mounted on %s is %s used; threshold is %s%%\n' "$filesystem" "$mountpoint" "$use_percent" "$threshold"
else
printf 'OK: %s mounted on %s is %s used\n' "$filesystem" "$mountpoint" "$use_percent"
fi
done < <(df -P -x tmpfs -x devtmpfs | awk 'NR > 1 {print $1, $2, $3, $4, $5, $6}')
exit "$status"
@@ -0,0 +1,114 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
TIMESTAMP="${TIMESTAMP:-$(date +%Y%m%d_%H%M%S)}"
DRY_RUN="${DRY_RUN:-true}"
LOG_FILE="${LOG_FILE:-/tmp/gpfs_extend_${TIMESTAMP}.log}"
FILESYSTEM="${FILESYSTEM:-}"
NSD_STANZA="${NSD_STANZA:-}"
FAILURE_GROUP="${FAILURE_GROUP:-}"
STORAGE_POOL="${STORAGE_POOL:-system}"
USAGE="${USAGE:-dataAndMetadata}"
log() {
local level="$1"
shift
local message="$*"
printf '%s: %s\n' "$level" "$message" | tee -a "$LOG_FILE"
}
ok() {
log "OK" "$@"
}
warning() {
log "WARNING" "$@"
}
critical() {
log "CRITICAL" "$@"
}
require_cmd() {
local cmd="$1"
if command -v "$cmd" >/dev/null 2>&1; then
ok "Command available: $cmd"
return 0
fi
critical "Required command not found: $cmd"
return 1
}
validate_gpfs_command() {
local cmd="$1"
if command -v "$cmd" >/dev/null 2>&1; then
return 0
fi
warning "GPFS command not available, skipping: $cmd"
return 1
}
run_cmd() {
if [[ "$#" -eq 0 ]]; then
critical "run_cmd called without a command"
return 2
fi
if [[ "$DRY_RUN" == "true" ]]; then
log "OK" "DRY-RUN: $*"
return 0
fi
log "OK" "RUN: $*"
"$@" 2>&1 | tee -a "$LOG_FILE"
}
run_readonly() {
if [[ "$#" -eq 0 ]]; then
critical "run_readonly called without a command"
return 2
fi
log "OK" "READ-ONLY: $*"
"$@" 2>&1 | tee -a "$LOG_FILE"
}
confirm_execute() {
local target="${1:-GPFS change}"
if [[ "$DRY_RUN" == "true" ]]; then
ok "Dry-run mode enabled. No changes will be made."
return 0
fi
warning "Execution mode requested for: $target"
warning "Coordinate this change with storage, GPFS, application, and change-management teams."
printf 'Type EXECUTE to continue: '
read -r confirmation
if [[ "$confirmation" != "EXECUTE" ]]; then
critical "Confirmation failed. Aborting."
exit 1
fi
ok "Execution confirmed by operator."
}
usage_value_valid() {
case "$1" in
dataOnly|metadataOnly|dataAndMetadata) return 0 ;;
*) return 1 ;;
esac
}
section() {
printf '\n== %s ==\n' "$1" | tee -a "$LOG_FILE"
}
@@ -0,0 +1,37 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
run_optional() {
local description="$1"
shift
section "$description"
if validate_gpfs_command "$1"; then
run_readonly "$@" || warning "$description command failed"
fi
}
section "GPFS / Spectrum Scale Cluster Overview"
ok "Log file: $LOG_FILE"
run_optional "GPFS daemon state on all nodes" mmgetstate -a
run_optional "Cluster definition" mmlscluster
run_optional "Cluster configuration" mmlsconfig
run_optional "Managers and quorum information" mmlsmgr
run_optional "NSD inventory" mmlsnsd
run_optional "Disk inventory for all filesystems" mmlsdisk all
run_optional "Filesystem definitions" mmlsfs all
run_optional "Mount state for all filesystems" mmlsmount all
section "Mounted GPFS filesystems from df"
if command -v df >/dev/null 2>&1; then
df -h -t gpfs 2>/dev/null | tee -a "$LOG_FILE" || df -h | awk 'NR == 1 || /gpfs|mmfs/' | tee -a "$LOG_FILE"
else
warning "df command not available"
fi
@@ -0,0 +1,103 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
usage() {
printf 'Usage: %s --fs <filesystem>\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--fs)
FILESYSTEM="${2:-}"
shift 2
;;
-h|--help)
usage
exit 0
;;
*)
critical "Unknown argument: $1"
usage
exit 2
;;
esac
done
if [[ -z "$FILESYSTEM" ]]; then
critical "Missing required --fs <filesystem>"
usage
exit 2
fi
missing=0
for cmd in mmgetstate mmlscluster mmlsfs mmlsdisk mmlsmount mmlsmgr df; do
require_cmd "$cmd" || missing=1
done
if [[ "$missing" -ne 0 ]]; then
exit 2
fi
issues=0
section "GPFS daemon state"
state_output="$(mmgetstate -a 2>&1 || true)"
printf '%s\n' "$state_output" | tee -a "$LOG_FILE"
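# Node rows start with the node number and end with the GPFS state column.
# Warn if any node is not "active" or if no node rows could be parsed at all.
if printf '%s\n' "$state_output" | awk '$1 ~ /^[0-9]+$/ { rows++; if ($NF != "active") bad = 1 } END { exit (bad || rows == 0) ? 0 : 1 }'; then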
warning "Not all GPFS nodes appear active"
fi
section "Target filesystem definition"
if mmlsfs "$FILESYSTEM" 2>&1 | tee -a "$LOG_FILE"; then
ok "Filesystem exists: $FILESYSTEM"
else
critical "Filesystem does not exist or cannot be queried: $FILESYSTEM"
exit 1
fi
section "Target filesystem mount state"
mount_output="$(mmlsmount "$FILESYSTEM" 2>&1 || true)"
printf '%s\n' "$mount_output" | tee -a "$LOG_FILE"
if printf '%s\n' "$mount_output" | grep -Eiq 'not mounted|no file systems were found|not found'; then
warning "Filesystem may not be mounted anywhere: $FILESYSTEM"
fi
section "Existing disks"
if ! mmlsdisk "$FILESYSTEM" 2>&1 | tee -a "$LOG_FILE"; then
critical "Unable to list disks for filesystem: $FILESYSTEM"
issues=1
fi
section "Filesystem capacity"
df -h 2>&1 | awk -v fs="$FILESYSTEM" 'NR == 1 || $0 ~ fs || $0 ~ /gpfs|mmfs/' | tee -a "$LOG_FILE"
section "Cluster health"
if command -v mmhealth >/dev/null 2>&1; then
health_output="$(mmhealth cluster show 2>&1 || true)"
printf '%s\n' "$health_output" | tee -a "$LOG_FILE"
if printf '%s\n' "$health_output" | grep -Eiq 'degraded|failed|down|error|unhealthy'; then
warning "Cluster health output indicates a degraded condition"
fi
else
warning "mmhealth command not available, skipping health check"
fi
section "Managers and quorum"
mmlsmgr 2>&1 | tee -a "$LOG_FILE" || {
critical "Unable to query GPFS manager/quorum information"
issues=1
}
if [[ "$issues" -eq 0 ]]; then
ok "Precheck completed for filesystem: $FILESYSTEM"
exit 0
fi
critical "Precheck found operational validation failures"
exit 1
@@ -0,0 +1,83 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
EXCLUDE_MOUNTED=false
EXCLUDE_EXISTING_NSD=false
usage() {
printf 'Usage: %s [--exclude-mounted] [--exclude-existing-nsd]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--exclude-mounted)
EXCLUDE_MOUNTED=true
shift
;;
--exclude-existing-nsd)
EXCLUDE_EXISTING_NSD=true
shift
;;
-h|--help)
usage
exit 0
;;
*)
critical "Unknown argument: $1"
usage
exit 2
;;
esac
done
for cmd in lsblk findmnt; do
require_cmd "$cmd" || exit 2
done
warning "Candidate devices are not automatically safe. Confirm every device with the storage and cluster teams before use."
existing_gpfs_devices=""
if [[ "$EXCLUDE_EXISTING_NSD" == "true" ]]; then
if command -v mmlsnsd >/dev/null 2>&1; then
# -m maps each NSD to its local device name, so /dev paths can be matched below.
existing_gpfs_devices="$(mmlsnsd -m 2>/dev/null || true)"
else
warning "mmlsnsd is unavailable; cannot exclude existing GPFS devices"
fi
fi
section "Block device inventory"
lsblk -dpno NAME,TYPE,SIZE,MODEL,SERIAL,MOUNTPOINT 2>&1 | tee -a "$LOG_FILE"
section "Candidate devices"
found=0
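lsblk_field() {
# Query one column for one device so that spaces in MODEL or empty columns
# cannot shift the remaining fields, as a single whitespace-split read would.
lsblk -dno "$2" "$1" 2>/dev/null | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' || true
}
while read -r name type; do
[[ "$type" == "disk" ]] || continue
size="$(lsblk_field "$name" SIZE)"
model="$(lsblk_field "$name" MODEL)"
serial="$(lsblk_field "$name" SERIAL)"
# Without -d, lsblk also lists mountpoints of partitions and other children of the disk.
mountpoint="$(lsblk -no MOUNTPOINT "$name" 2>/dev/null | awk 'NF {print; exit}' || true)"
if [[ "$EXCLUDE_MOUNTED" == "true" ]]; then
if [[ -n "$mountpoint" ]] || findmnt -rn --source "$name" >/dev/null 2>&1; then
continue
fi
fi
if [[ "$EXCLUDE_EXISTING_NSD" == "true" ]] && [[ -n "$existing_gpfs_devices" ]]; then
# -w stops /dev/sdb from matching /dev/sdb1 or /dev/sdbb.
if grep -Fqw -- "$name" <<< "$existing_gpfs_devices"; then
continue
fi
fi
printf 'OK: candidate=%s size=%s model=%s serial=%s mountpoint=%s\n' \
"$name" "${size:-unknown}" "${model:-unknown}" "${serial:-unknown}" "${mountpoint:-none}" | tee -a "$LOG_FILE"
found=1
done < <(lsblk -dpno NAME,TYPE)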
if [[ "$found" -eq 0 ]]; then
warning "No candidate devices found with the selected filters"
fi
@@ -0,0 +1,76 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
DEVICES=""
SERVERS=""
OUTPUT=""
usage() {
printf 'Usage: %s --fs <filesystem> --devices "/dev/sdb /dev/sdc" --servers "node1,node2" --failure-group <number> --pool <storage_pool> --usage <dataOnly|metadataOnly|dataAndMetadata> [--output <path>]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--fs) FILESYSTEM="${2:-}"; shift 2 ;;
--devices) DEVICES="${2:-}"; shift 2 ;;
--servers) SERVERS="${2:-}"; shift 2 ;;
--failure-group) FAILURE_GROUP="${2:-}"; shift 2 ;;
--pool) STORAGE_POOL="${2:-}"; shift 2 ;;
--usage) USAGE="${2:-}"; shift 2 ;;
--output) OUTPUT="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if [[ -z "$FILESYSTEM" || -z "$DEVICES" || -z "$SERVERS" || -z "$FAILURE_GROUP" || -z "$STORAGE_POOL" || -z "$USAGE" ]]; then
critical "Missing required input"
usage
exit 2
fi
if ! [[ "$FAILURE_GROUP" =~ ^-?[0-9]+$ ]]; then
critical "--failure-group must be an integer"
exit 2
fi
if ! usage_value_valid "$USAGE"; then
critical "--usage must be one of: dataOnly, metadataOnly, dataAndMetadata"
exit 2
fi
if [[ -z "$OUTPUT" ]]; then
OUTPUT="/tmp/gpfs_nsd_${FILESYSTEM}_${TIMESTAMP}.stanza"
fi
safe_fs="$(printf '%s' "$FILESYSTEM" | tr -c '[:alnum:]_' '_')"
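# Validate every device before opening the output file, so an invalid
# device cannot leave a partial stanza file behind.
for device in $DEVICES; do
if [[ "$device" != /dev/* ]]; then
critical "Device must be an absolute /dev path: $device"
exit 2
fi
done
{
printf '# Generated GPFS NSD stanza for filesystem %s\n' "$FILESYSTEM"
printf '# Review with storage and cluster teams before use.\n\n'
for device in $DEVICES; do
# Strip the directory and replace non-alphanumerics. Unlike "basename | tr",
# this does not turn the trailing newline into an extra underscore.
device_base="${device##*/}"
device_base="${device_base//[^[:alnum:]_]/_}"
nsd_name="nsd_${safe_fs}_${device_base}"
printf '%%nsd:\n'
printf ' device=%s\n' "$device"
printf ' nsd=%s\n' "$nsd_name"
printf ' servers=%s\n' "$SERVERS"
printf ' usage=%s\n' "$USAGE"
printf ' failureGroup=%s\n' "$FAILURE_GROUP"
printf ' pool=%s\n\n' "$STORAGE_POOL"
done
} > "$OUTPUT"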
ok "Generated NSD stanza: $OUTPUT"
warning "This script only writes a stanza file. It does not create NSDs or modify GPFS."
@@ -0,0 +1,59 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
usage() {
printf 'Usage: %s --fs <filesystem> --stanza <stanza_file> [--execute]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--fs) FILESYSTEM="${2:-}"; shift 2 ;;
--stanza) NSD_STANZA="${2:-}"; shift 2 ;;
--execute) DRY_RUN=false; shift ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if [[ -z "$FILESYSTEM" || -z "$NSD_STANZA" ]]; then
critical "Missing required --fs or --stanza"
usage
exit 2
fi
if [[ ! -r "$NSD_STANZA" ]]; then
critical "Stanza file does not exist or is not readable: $NSD_STANZA"
exit 2
fi
for cmd in mmlsfs mmcrnsd mmadddisk; do
require_cmd "$cmd" || exit 2
done
if ! mmlsfs "$FILESYSTEM" >/dev/null 2>&1; then
critical "Filesystem does not exist or cannot be queried: $FILESYSTEM"
exit 1
fi
warning "Adding NSDs must be coordinated with storage, GPFS, application, and change-management teams."
section "Planned GPFS changes"
if [[ "$DRY_RUN" == "true" ]]; then plan_label="DRY-RUN"; else plan_label="PLANNED"; fi
ok "$plan_label: mmcrnsd -F $NSD_STANZA"
ok "$plan_label: mmadddisk $FILESYSTEM -F $NSD_STANZA"
confirm_execute "create NSDs and add disks to $FILESYSTEM"
if [[ "$DRY_RUN" == "false" ]]; then
run_cmd mmcrnsd -F "$NSD_STANZA"
run_cmd mmadddisk "$FILESYSTEM" -F "$NSD_STANZA"
section "Post-add NSD inventory"
mmlsnsd 2>&1 | tee -a "$LOG_FILE" || warning "mmlsnsd command failed after execution"
section "Post-add filesystem disks"
mmlsdisk "$FILESYSTEM" 2>&1 | tee -a "$LOG_FILE" || warning "mmlsdisk command failed after execution"
fi
@@ -0,0 +1,56 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
BACKGROUND=false
usage() {
printf 'Usage: %s --fs <filesystem> [--execute] [--background]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--fs) FILESYSTEM="${2:-}"; shift 2 ;;
--execute) DRY_RUN=false; shift ;;
--background) BACKGROUND=true; shift ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if [[ -z "$FILESYSTEM" ]]; then
critical "Missing required --fs <filesystem>"
usage
exit 2
fi
for cmd in mmlsdisk mmrestripefs; do
require_cmd "$cmd" || exit 2
done
warning "Restripe/rebalance can be I/O intensive. Run only in an approved change window."
section "Current disk balance"
mmlsdisk "$FILESYSTEM" 2>&1 | tee -a "$LOG_FILE" || warning "Unable to show current disk state"
section "Planned rebalance"
if [[ "$BACKGROUND" == "true" ]]; then
if [[ "$DRY_RUN" == "true" ]]; then
ok "DRY-RUN: mmrestripefs $FILESYSTEM -b &"
else
confirm_execute "background restripe for $FILESYSTEM"
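# nohup plus a direct log redirect keeps the restripe alive if the session
# disconnects; a "| tee" pipeline would die with the terminal.
nohup mmrestripefs "$FILESYSTEM" -b >> "$LOG_FILE" 2>&1 < /dev/null &
ok "RUN: mmrestripefs $FILESYSTEM -b started in background (PID $!); output appended to $LOG_FILE"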
fi
else
if [[ "$DRY_RUN" == "true" ]]; then
ok "DRY-RUN: mmrestripefs $FILESYSTEM -b"
fi
confirm_execute "restripe for $FILESYSTEM"
if [[ "$DRY_RUN" == "false" ]]; then
run_cmd mmrestripefs "$FILESYSTEM" -b
fi
fi
@@ -0,0 +1,89 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
usage() {
printf 'Usage: %s --fs <filesystem>\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--fs) FILESYSTEM="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if [[ -z "$FILESYSTEM" ]]; then
critical "Missing required --fs <filesystem>"
usage
exit 2
fi
issues=0
run_check() {
local description="$1"
shift
section "$description"
if command -v "$1" >/dev/null 2>&1; then
"$@" 2>&1 | tee -a "$LOG_FILE" || {
critical "$description failed"
issues=1
}
else
warning "$1 command not available, skipping"
fi
}
run_check "GPFS daemon state" mmgetstate -a
run_check "Target filesystem mount state" mmlsmount "$FILESYSTEM"
run_check "Target filesystem disks" mmlsdisk "$FILESYSTEM"
run_check "NSD inventory" mmlsnsd
section "Filesystem capacity"
if command -v df >/dev/null 2>&1; then
df -h 2>&1 | awk -v fs="$FILESYSTEM" 'NR == 1 || $0 ~ fs || $0 ~ /gpfs|mmfs/' | tee -a "$LOG_FILE"
else
warning "df command not available, skipping"
fi
section "Cluster health"
if command -v mmhealth >/dev/null 2>&1; then
health_output="$(mmhealth cluster show 2>&1 || true)"
printf '%s\n' "$health_output" | tee -a "$LOG_FILE"
if printf '%s\n' "$health_output" | grep -Eiq 'degraded|failed|down|error|unhealthy'; then
critical "Cluster health output indicates an issue"
issues=1
fi
else
warning "mmhealth command not available, skipping"
fi
section "Recent GPFS journal entries"
if command -v journalctl >/dev/null 2>&1; then
journalctl -u 'gpfs*' -n 50 --no-pager 2>&1 | tee -a "$LOG_FILE" || warning "journalctl GPFS query failed"
else
warning "journalctl command not available, skipping"
fi
section "Recent kernel messages"
if command -v dmesg >/dev/null 2>&1; then
dmesg -T 2>/dev/null | tail -50 | tee -a "$LOG_FILE" || warning "dmesg query failed"
else
warning "dmesg command not available, skipping"
fi
if [[ "$issues" -eq 0 ]]; then
ok "Post-check completed without detected operational failures"
exit 0
fi
critical "Post-check detected one or more issues"
exit 1
@@ -0,0 +1,78 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
REPORT_FILE=""
usage() {
printf 'Usage: %s --fs <filesystem>\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--fs) FILESYSTEM="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
if [[ -z "$FILESYSTEM" ]]; then
critical "Missing required --fs <filesystem>"
usage
exit 2
fi
REPORT_FILE="/tmp/gpfs_extend_report_${FILESYSTEM}_${TIMESTAMP}.txt"
append_section() {
local title="$1"
shift
{
printf '\n== %s ==\n' "$title"
if command -v "$1" >/dev/null 2>&1; then
"$@" 2>&1 || printf 'WARNING: command failed: %s\n' "$*"
else
printf 'WARNING: command not available: %s\n' "$1"
fi
} >> "$REPORT_FILE"
}
{
printf 'GPFS / Spectrum Scale Filesystem Expansion Report\n'
printf 'Hostname: %s\n' "$(hostname 2>/dev/null || printf 'unknown')"
printf 'Date: %s\n' "$(date)"
printf 'Target filesystem: %s\n' "$FILESYSTEM"
} > "$REPORT_FILE"
append_section "GPFS daemon state" mmgetstate -a
append_section "Cluster definition" mmlscluster
append_section "Managers and quorum" mmlsmgr
append_section "Target filesystem mount state" mmlsmount "$FILESYSTEM"
append_section "Target filesystem disks" mmlsdisk "$FILESYSTEM"
append_section "NSD inventory" mmlsnsd
append_section "Filesystem capacity" df -h
if command -v mmhealth >/dev/null 2>&1; then
append_section "Cluster health" mmhealth cluster show
else
printf '\n== Cluster health ==\nWARNING: mmhealth command not available\n' >> "$REPORT_FILE"
fi
if command -v journalctl >/dev/null 2>&1; then
append_section "Recent GPFS journal entries" journalctl -u 'gpfs*' -n 50 --no-pager
fi
if command -v dmesg >/dev/null 2>&1; then
{
printf '\n== Recent kernel messages ==\n'
dmesg -T 2>/dev/null | tail -50 || printf 'WARNING: dmesg query failed\n'
} >> "$REPORT_FILE"
fi
ok "Generated report: $REPORT_FILE"
@@ -0,0 +1,136 @@
# GPFS / IBM Spectrum Scale Filesystem Expansion Toolkit
Safe, sanitized Bash examples for planning and executing a GPFS / IBM Spectrum Scale filesystem expansion. The scripts are written as portfolio-grade operational tooling for a Linux Infrastructure Engineer: conservative defaults, clear validation, dry-run behavior, and explicit operator confirmation before changes.
These scripts are examples. Exact GPFS commands, flags, quorum practices, failure-group design, and storage naming standards vary by Spectrum Scale version and site policy.
## Concepts
- **Cluster** - the Spectrum Scale administrative domain containing the nodes, daemon configuration, quorum policy, filesystems, and NSDs.
- **Node** - a server participating in the GPFS cluster. Nodes may be clients, NSD servers, quorum nodes, manager-capable nodes, or a mix of roles.
- **Quorum** - the voting mechanism that protects the cluster from split-brain conditions. Expansion work should not proceed during quorum instability.
- **Filesystem** - the GPFS namespace and data layout presented to clients, backed by one or more NSDs.
- **NSD** - Network Shared Disk, the GPFS abstraction for a disk or LUN that is served to the cluster.
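- **Failure group** - an integer that tells GPFS which disks share a failure domain, such as an enclosure, rack, site, controller pair, or storage array. When data or metadata is replicated, GPFS places the copies of a block in different failure groups.
- **Storage pool** - a named pool of NSDs used for placement and lifecycle policy, commonly `system` plus optional data pools. Metadata can only live in the `system` pool, so `metadataOnly` and `dataAndMetadata` NSDs must be assigned to it.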
- **Restripe/rebalance** - the operation that redistributes data after disks are added. It can be I/O intensive and should run only in an approved change window.
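An NSD stanza is where most of these concepts meet: one `%nsd` entry ties a device to an NSD name, its serving nodes, its usage, its failure group, and its pool. The sketch below prints entries in the same multi-line layout that `04_create_nsd_stanza.sh` writes. `print_nsd_stanza` is a hypothetical helper, and every device, NSD, and host name is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch only: print one %nsd stanza per call. All names are placeholders.
set -o errexit -o nounset -o pipefail

print_nsd_stanza() {
  local device="$1" nsd="$2" servers="$3" usage="$4" failure_group="$5" pool="$6"
  printf '%%nsd:\n'
  printf '  device=%s\n' "$device"
  printf '  nsd=%s\n' "$nsd"
  printf '  servers=%s\n' "$servers"
  printf '  usage=%s\n' "$usage"
  printf '  failureGroup=%s\n' "$failure_group"
  printf '  pool=%s\n' "$pool"
}

# Disks in different enclosures get different failure groups so replicas stay
# apart. The server order is swapped so each NSD server is listed first
# (primary) for one disk; that balancing is an assumed site practice.
print_nsd_stanza /dev/sdb nsd_gpfs01_sdb gpfsnsd01,gpfsnsd02 dataAndMetadata 10 system
print_nsd_stanza /dev/sdc nsd_gpfs01_sdc gpfsnsd02,gpfsnsd01 dataAndMetadata 20 system
```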
## Required Tools
The scripts call the following GPFS / Spectrum Scale commands:
- `mmgetstate`
- `mmlscluster`
- `mmlsfs`
- `mmlsdisk`
- `mmlsmount`
- `mmlsmgr`
- `mmlsnsd`
- `mmcrnsd`
- `mmadddisk`
- `mmrestripefs`
- `mmhealth` (optional; health checks are skipped when it is missing)
The toolkit also uses common Linux tools such as `df`, `lsblk`, `findmnt`, `journalctl`, and `dmesg` where available. Missing optional commands are reported as `WARNING` and skipped.
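The required-versus-optional split can be sketched as a single check. `check_tools` is a hypothetical helper written for illustration, not the `require_cmd` function from `00_env.sh`; it returns `2` when a required command is missing, matching the toolkit's "missing requirement" exit code, and only warns about optional ones.

```bash
#!/usr/bin/env bash
# Sketch only: required tools are fatal (return 2), optional tools only warn.
set -o nounset -o pipefail

check_tools() {
  # Usage: check_tools "req1 req2" "opt1 opt2"
  local required="$1" optional="$2" cmd missing=0
  for cmd in $required; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'CRITICAL: required command not found: %s\n' "$cmd"
      missing=1
    fi
  done
  for cmd in $optional; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'WARNING: optional command not available, skipping: %s\n' "$cmd"
    fi
  done
  (( missing == 0 )) || return 2
  return 0
}

# Example: bash is required; mmhealth is optional and only produces a WARNING if absent.
check_tools "bash" "mmhealth"
```

On a GPFS node the required list would be something like `"mmgetstate mmlsfs mmlsdisk"` and the optional list `"mmhealth journalctl dmesg"`.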
## Safety Model
- Default mode is dry-run.
- Real GPFS modifications require `--execute`.
- Destructive or high-impact steps also require typing `EXECUTE` at a confirmation prompt.
- Disk detection is read-only and never partitions, formats, wipes, or modifies devices.
- Device selection must always be confirmed with the storage team and cluster owners.
- The scripts do not assume production disk names.
Output uses a consistent status format:
- `OK`
- `WARNING`
- `CRITICAL`
Exit codes:
- `0` - OK
- `1` - operational validation failure
- `2` - invalid input or missing requirement
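A minimal sketch of how the dry-run default and the typed confirmation combine. `guarded_run` is hypothetical; the toolkit's real `run_cmd` and `confirm_execute` live in `00_env.sh` and may differ in detail. A refused or missing confirmation (including EOF on stdin) returns `2`, in line with the exit-code list above.

```bash
#!/usr/bin/env bash
# Sketch only: print the command in dry-run mode; otherwise require EXECUTE.
set -o nounset -o pipefail

DRY_RUN="${DRY_RUN:-true}"

guarded_run() {
  if [[ "$DRY_RUN" == "true" ]]; then
    printf 'DRY-RUN: %s\n' "$*"
    return 0
  fi
  local answer=""
  printf 'Type EXECUTE to continue: ' >&2
  # A closed stdin (EOF) leaves answer empty and counts as a refusal.
  read -r answer || true
  if [[ "$answer" != "EXECUTE" ]]; then
    printf 'CRITICAL: confirmation failed; no changes made\n' >&2
    return 2
  fi
  "$@"
}

# Default mode: only prints what would run.
guarded_run mmadddisk gpfs01 -F /tmp/example.stanza
```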
## Scripts
- `00_env.sh` - shared configuration and helper functions.
- `01_cluster_overview.sh` - read-only cluster overview.
- `02_precheck_gpfs.sh` - pre-expansion validation for a target filesystem.
- `03_detect_new_disks.sh` - read-only candidate block-device discovery.
- `04_create_nsd_stanza.sh` - generate an NSD stanza file.
- `05_add_nsd_to_filesystem.sh` - create NSDs and add disks to a filesystem, dry-run by default.
- `06_rebalance_filesystem.sh` - optional restripe/rebalance, dry-run by default.
- `07_postcheck_gpfs.sh` - post-change validation.
- `08_generate_report.sh` - text report for the change record.
- `gpfs_extend_runbook.sh` - guided order of operations plus safe read-only checks.
## Example Workflow
```bash
cd infra-run/scripts/bash/gpfs
./01_cluster_overview.sh
./02_precheck_gpfs.sh --fs gpfs01
./03_detect_new_disks.sh --exclude-mounted --exclude-existing-nsd
./04_create_nsd_stanza.sh \
--fs gpfs01 \
--devices "/dev/sdb /dev/sdc" \
--servers "gpfsnsd01,gpfsnsd02" \
--failure-group 10 \
--pool system \
--usage dataAndMetadata
```
Review the generated stanza with the storage and cluster teams. Confirm device identity, LUN masking, multipath naming, failure group placement, and site standards before continuing.
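A mechanical lint can catch typos before the human review. The sketch below checks only the multi-line `%nsd` layout that `04_create_nsd_stanza.sh` writes: every field is present, `usage` is one of the three values the script accepts, `failureGroup` is an integer, and metadata-bearing NSDs are in the `system` pool. `lint_stanza` is a hypothetical helper; it does not verify device identity, multipath naming, or failure-group design.

```bash
#!/usr/bin/env bash
# Sketch only: basic structural lint for a generated NSD stanza file.
set -o nounset -o pipefail

lint_stanza() {
  awk '
    BEGIN { nreq = split("device nsd servers usage failureGroup pool", req, " ") }
    function finish(   i) {
      if (!in_nsd) return
      for (i = 1; i <= nreq; i++)
        if (!(req[i] in seen)) { printf "CRITICAL: stanza %d is missing %s\n", n, req[i]; bad = 1 }
      if (("usage" in seen) && seen["usage"] !~ /^(dataOnly|metadataOnly|dataAndMetadata)$/) {
        printf "CRITICAL: stanza %d has unexpected usage=%s\n", n, seen["usage"]; bad = 1
      }
      if (("failureGroup" in seen) && seen["failureGroup"] !~ /^-?[0-9]+$/) {
        printf "CRITICAL: stanza %d has non-integer failureGroup=%s\n", n, seen["failureGroup"]; bad = 1
      }
      # Metadata can only be placed in the system pool.
      if (("usage" in seen) && seen["usage"] ~ /[Mm]etadata/ && ("pool" in seen) && seen["pool"] != "system") {
        printf "CRITICAL: stanza %d puts metadata in pool %s\n", n, seen["pool"]; bad = 1
      }
      split("", seen)   # portable way to clear the array
      in_nsd = 0
    }
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    /^%nsd:/ { finish(); n++; in_nsd = 1; next }
    in_nsd && index($0, "=") > 0 {
      line = $0
      sub(/^[ \t]+/, "", line); sub(/[ \t]+$/, "", line)
      eq = index(line, "=")
      seen[substr(line, 1, eq - 1)] = substr(line, eq + 1)
      next
    }
    END {
      finish()
      if (n == 0) { print "CRITICAL: no %nsd stanzas found"; bad = 1 }
      exit bad
    }
  ' "$1"
}

demo="$(mktemp)"
cat > "$demo" <<'EOF'
%nsd:
  device=/dev/sdb
  nsd=nsd_gpfs01_sdb
  servers=gpfsnsd01,gpfsnsd02
  usage=dataAndMetadata
  failureGroup=10
  pool=system
EOF
if lint_stanza "$demo"; then
  printf 'OK: %s passed basic lint\n' "$demo"
fi
rm -f "$demo"
```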
Dry-run the add step:
```bash
./05_add_nsd_to_filesystem.sh \
--fs gpfs01 \
--stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza
```
Execute only in an approved change window:
```bash
./05_add_nsd_to_filesystem.sh \
--fs gpfs01 \
--stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza \
--execute
```
Optional rebalance:
```bash
./06_rebalance_filesystem.sh --fs gpfs01
./06_rebalance_filesystem.sh --fs gpfs01 --execute --background
```
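A background restripe is only useful if it outlives the terminal that started it, because a plain `&` job can be killed by SIGHUP when the session ends. The following is a minimal sketch of a detached launch that records the PID and log location, assuming `nohup` from coreutils is available. `start_detached` is hypothetical, and `sleep` stands in for `mmrestripefs <filesystem> -b`.

```bash
#!/usr/bin/env bash
# Sketch only: start a long-running command detached from the terminal.
set -o nounset -o pipefail

start_detached() {
  local log="$1"
  shift
  # nohup ignores SIGHUP; stdin, stdout, and stderr are detached from the terminal.
  nohup "$@" >> "$log" 2>&1 < /dev/null &
  printf '%s\n' "$!"
}

log="$(mktemp)"
pid="$(start_detached "$log" sleep 2)"
printf 'OK: started PID %s, output appended to %s\n' "$pid" "$log"
# Later (same user): kill -0 "$pid" succeeds while the process still exists,
# assuming the PID has not been reused.
```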
Post-check and report:
```bash
./07_postcheck_gpfs.sh --fs gpfs01
./08_generate_report.sh --fs gpfs01
```
Runbook helper:
```bash
./gpfs_extend_runbook.sh \
--fs gpfs01 \
--devices "/dev/sdb /dev/sdc" \
--servers "gpfsnsd01,gpfsnsd02" \
--failure-group 10 \
--pool system \
--usage dataAndMetadata
```
## Operational Notes
- Do not run these scripts blindly on production clusters.
- Confirm disk and multipath identity with the storage team before creating NSDs.
- Validate quorum and manager health before expansion.
- Confirm application I/O risk and rollback procedures before `mmadddisk` or `mmrestripefs`.
- Confirm the Spectrum Scale version and local standards for stanza fields before executing changes.
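Part of the quorum and daemon-health check can be scripted. The sketch below flags nodes whose GPFS daemon is not `active` in `mmgetstate -a` output. The column layout it relies on (node number first, node name second, state last) is an assumption that should be checked against the installed Spectrum Scale version. `nodes_not_active` and the sample text are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: list node names whose GPFS state is anything other than "active".
set -o nounset -o pipefail

nodes_not_active() {
  # Assumed layout: data rows start with a numeric node number, column 2 is
  # the node name, and the last column is the GPFS state.
  awk '$1 ~ /^[0-9]+$/ && $NF != "active" { print $2 }'
}

sample='
 Node number  Node name        GPFS state
-------------------------------------------
       1      gpfsnsd01        active
       2      gpfsnsd02        down
'
printf '%s\n' "$sample" | nodes_not_active
```

On a live cluster the input would come from `mmgetstate -a` instead of the sample, and any output should stop the expansion until the nodes are investigated.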
@@ -0,0 +1,94 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
. "$SCRIPT_DIR/00_env.sh"
DEVICES=""
SERVERS=""
EXECUTE=false
usage() {
printf 'Usage: %s --fs <filesystem> --devices "/dev/sdb /dev/sdc" --servers "node1,node2" --failure-group <number> --pool <storage_pool> --usage <dataOnly|metadataOnly|dataAndMetadata> [--execute]\n' "$(basename "$0")"
}
while [[ "$#" -gt 0 ]]; do
case "$1" in
--fs) FILESYSTEM="${2:-}"; shift 2 ;;
--devices) DEVICES="${2:-}"; shift 2 ;;
--servers) SERVERS="${2:-}"; shift 2 ;;
--failure-group) FAILURE_GROUP="${2:-}"; shift 2 ;;
--pool) STORAGE_POOL="${2:-}"; shift 2 ;;
--usage) USAGE="${2:-}"; shift 2 ;;
--execute) EXECUTE=true; DRY_RUN=false; shift ;;
-h|--help) usage; exit 0 ;;
*) critical "Unknown argument: $1"; usage; exit 2 ;;
esac
done
section "Recommended GPFS Expansion Flow"
cat <<FLOW
Step 1: Cluster overview
$SCRIPT_DIR/01_cluster_overview.sh
Step 2: GPFS precheck
$SCRIPT_DIR/02_precheck_gpfs.sh --fs <filesystem>
Step 3: Detect candidate disks
$SCRIPT_DIR/03_detect_new_disks.sh --exclude-mounted --exclude-existing-nsd
Step 4: Generate NSD stanza
$SCRIPT_DIR/04_create_nsd_stanza.sh --fs <filesystem> --devices "/dev/sdb /dev/sdc" --servers "node1,node2" --failure-group <number> --pool <storage_pool> --usage <usage>
Step 5: Create NSDs and add disks to filesystem
$SCRIPT_DIR/05_add_nsd_to_filesystem.sh --fs <filesystem> --stanza <stanza_file> [--execute]
Step 6: Optional restripe/rebalance
$SCRIPT_DIR/06_rebalance_filesystem.sh --fs <filesystem> [--execute] [--background]
Step 7: Post-check
$SCRIPT_DIR/07_postcheck_gpfs.sh --fs <filesystem>
Step 8: Generate report
$SCRIPT_DIR/08_generate_report.sh --fs <filesystem>
FLOW
if [[ -z "$FILESYSTEM" ]]; then
warning "No --fs supplied. Printed runbook only."
exit 0
fi
if [[ "$EXECUTE" == "true" ]]; then
warning "--execute was supplied. Destructive steps still require the individual script confirmation prompt."
else
DRY_RUN=true
fi
section "Running Safe Read-Only Steps"
"$SCRIPT_DIR/01_cluster_overview.sh" || warning "Cluster overview reported warnings or failures"
"$SCRIPT_DIR/02_precheck_gpfs.sh" --fs "$FILESYSTEM" || warning "Precheck reported warnings or failures"
"$SCRIPT_DIR/03_detect_new_disks.sh" --exclude-mounted --exclude-existing-nsd || warning "Disk detection reported warnings or failures"
if [[ -n "$DEVICES" || -n "$SERVERS" || -n "$FAILURE_GROUP" ]]; then
if [[ -z "$DEVICES" || -z "$SERVERS" || -z "$FAILURE_GROUP" || -z "$STORAGE_POOL" || -z "$USAGE" ]]; then
warning "NSD stanza generation requires --devices, --servers, --failure-group, --pool, and --usage"
else
"$SCRIPT_DIR/04_create_nsd_stanza.sh" \
--fs "$FILESYSTEM" \
--devices "$DEVICES" \
--servers "$SERVERS" \
--failure-group "$FAILURE_GROUP" \
--pool "$STORAGE_POOL" \
--usage "$USAGE" || warning "NSD stanza generation reported failures"
fi
fi
section "Next Manual Step"
if [[ "$EXECUTE" == "true" ]]; then
warning "Run 05_add_nsd_to_filesystem.sh manually with --execute after reviewing the generated stanza."
else
ok "Review outputs and generated stanza. Add disks only through 05_add_nsd_to_filesystem.sh with --execute."
fi
@@ -0,0 +1,68 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
section() {
printf '\n== %s ==\n' "$1"
}
run_or_warn() {
local description="$1"
shift
if command -v "$1" >/dev/null 2>&1; then
"$@" || printf 'WARNING: %s command failed\n' "$description"
else
printf 'WARNING: %s command not available\n' "$1"
fi
}
top_processes() {
local sort_key="$1"
if command -v ps >/dev/null 2>&1; then
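# head may close the pipe early; under pipefail the resulting SIGPIPE must not abort the script.
ps -eo pid,ppid,comm,%cpu,%mem --sort="$sort_key" | head -n 11 || true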
else
printf 'WARNING: ps command not available\n'
fi
}
section "Host"
hostname
uptime
section "OS"
if [[ -r /etc/os-release ]]; then
. /etc/os-release
printf '%s\n' "${PRETTY_NAME:-Unknown Linux}"
else
printf 'WARNING: /etc/os-release not readable\n'
fi
uname -r
section "CPU Load"
if [[ -r /proc/loadavg ]]; then
awk '{print "1m="$1, "5m="$2, "15m="$3}' /proc/loadavg
else
uptime
fi
section "Memory"
run_or_warn "memory usage" free -h
section "Disk"
run_or_warn "disk usage" df -h -x tmpfs -x devtmpfs
section "Failed systemd Services"
if command -v systemctl >/dev/null 2>&1; then
systemctl --failed --no-pager || true
else
printf 'WARNING: systemctl command not available\n'
fi
section "Top CPU Processes"
top_processes "-%cpu"
section "Top Memory Processes"
top_processes "-%mem"
@@ -0,0 +1,148 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
target="${1:-}"
status=0
warnings=()
criticals=()
section() {
printf '\n[%s]\n' "$1"
}
warn() {
warnings+=("$1")
printf 'WARNING: %s\n' "$1"
}
critical() {
criticals+=("$1")
status=1
printf 'CRITICAL: %s\n' "$1"
}
have() {
command -v "$1" >/dev/null 2>&1
}
run_if_available() {
local command_name="$1"
shift
if have "$command_name"; then
"$@" || warn "$command_name command failed"
else
warn "$command_name command not available"
fi
}
section "LOCAL NETWORK"
if have ip; then
ip addr || warn "ip addr command failed"
printf '\nRouting table:\n'
ip route || warn "ip route command failed"
printf '\nDefault gateway:\n'
if ! ip route show default; then
critical "unable to query the default route"
elif ! ip route show default | grep -q '^default '; then
critical "default gateway not configured"
fi
else
warn "ip command not available"
fi
section "INTERFACES"
active_interfaces=0
if have ip; then
ip -br link || warn "interface state query failed"
active_interfaces="$(ip -br link 2>/dev/null | awk '$2 == "UP" && $1 != "lo" {count++} END {print count+0}')"
if (( active_interfaces == 0 )); then
critical "no active non-loopback interface detected"
else
printf 'OK: %s active non-loopback interface(s) detected\n' "$active_interfaces"
fi
else
warn "cannot inspect interface state without ip command"
fi
section "DNS"
if [[ -r /etc/resolv.conf ]]; then
cat /etc/resolv.conf
else
warn "/etc/resolv.conf not readable"
fi
dns_target="${target:-localhost}"
if have getent; then
if getent hosts "$dns_target" >/dev/null 2>&1; then
printf 'OK: DNS resolution succeeded for %s\n' "$dns_target"
getent hosts "$dns_target"
else
critical "DNS resolution failed for ${dns_target}"
fi
elif have nslookup; then
if nslookup "$dns_target"; then
printf 'OK: DNS resolution succeeded for %s\n' "$dns_target"
else
critical "DNS resolution failed for ${dns_target}"
fi
else
warn "no DNS lookup tool available"
fi
section "CONNECTIVITY"
if [[ -n "$target" ]]; then
if have ping; then
if ping -c 3 -W 2 "$target"; then
printf 'OK: ping succeeded for %s\n' "$target"
else
critical "ping failed for ${target}"
fi
else
warn "ping command not available"
fi
run_if_available traceroute traceroute "$target"
if have nc; then
if nc -vz -w 3 "$target" 443; then
printf 'OK: TCP 443 reachable on %s\n' "$target"
else
critical "TCP 443 connectivity failed for ${target}"
fi
elif have curl; then
if curl --head --silent --show-error --connect-timeout 5 "https://${target}" >/dev/null; then
printf 'OK: HTTPS connectivity succeeded for %s\n' "$target"
else
critical "HTTPS connectivity failed for ${target}"
fi
else
warn "no TCP connectivity test tool available (nc or curl)"
fi
else
printf 'OK: no target provided; skipped remote connectivity checks\n'
fi
section "PORTS"
if have ss; then
ss -tuln || warn "ss command failed"
else
warn "ss command not available"
fi
section "SUMMARY"
if (( ${#criticals[@]} > 0 )); then
printf 'CRITICAL: %s issue(s) detected\n' "${#criticals[@]}"
fi
if (( ${#warnings[@]} > 0 )); then
printf 'WARNING: %s warning(s) detected\n' "${#warnings[@]}"
fi
if (( status == 0 )); then
printf 'OK: no obvious DNS or connectivity problems detected\n'
fi
exit "$status"
@@ -0,0 +1,60 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
services=("$@")
service_exists() {
local service="$1"
systemctl list-unit-files "${service}.service" --no-legend 2>/dev/null | awk '{print $1}' | grep -qx "${service}.service"
}
pick_default_scheduler() {
if service_exists cron; then
printf 'cron'
elif service_exists crond; then
printf 'crond'
else
printf 'cron'
fi
}
pick_default_ssh() {
if service_exists sshd; then
printf 'sshd'
elif service_exists ssh; then
printf 'ssh'
else
printf 'sshd'
fi
}
if ! command -v systemctl >/dev/null 2>&1; then
printf 'CRITICAL: systemctl command not available; cannot check services\n' >&2
exit 1
fi
if (( ${#services[@]} == 0 )); then
services=("$(pick_default_ssh)" "$(pick_default_scheduler)")
fi
status=0
for service in "${services[@]}"; do
if ! service_exists "$service"; then
printf 'CRITICAL: %s service not found\n' "$service"
status=1
continue
fi
if systemctl is-active --quiet "$service"; then
printf 'OK: %s is active\n' "$service"
else
state="$(systemctl is-active "$service" 2>/dev/null || true)"
printf 'CRITICAL: %s is %s\n' "$service" "${state:-unknown}"
status=1
fi
done
exit "$status"
@@ -0,0 +1,81 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
host="$(hostname)"
timestamp="$(date '+%Y-%m-%d_%H%M%S')"
report="/tmp/system_report_${host}_${timestamp}.txt"
section() {
printf '\n== %s ==\n' "$1"
}
run_or_warn() {
local description="$1"
shift
if command -v "$1" >/dev/null 2>&1; then
"$@" || printf 'WARNING: %s command failed\n' "$description"
else
printf 'WARNING: %s command not available\n' "$1"
fi
}
{
section "Host"
hostname
section "Date"
date
section "Uptime"
uptime
section "OS"
if [[ -r /etc/os-release ]]; then
. /etc/os-release
printf '%s\n' "${PRETTY_NAME:-Unknown Linux}"
else
printf 'WARNING: /etc/os-release not readable\n'
fi
section "Kernel"
uname -r
section "CPU Load"
if [[ -r /proc/loadavg ]]; then
awk '{print "1m="$1, "5m="$2, "15m="$3}' /proc/loadavg
else
uptime
fi
section "Memory"
run_or_warn "memory usage" free -h
section "Disk"
run_or_warn "disk usage" df -h -x tmpfs -x devtmpfs
section "Failed systemd Services"
if command -v systemctl >/dev/null 2>&1; then
systemctl --failed --no-pager || true
else
printf 'WARNING: systemctl command not available\n'
fi
section "Listening Ports"
if command -v ss >/dev/null 2>&1; then
ss -tuln || printf 'WARNING: ss command failed\n'
else
printf 'WARNING: ss command not available\n'
fi
section "Recent Kernel Messages"
if command -v journalctl >/dev/null 2>&1; then
journalctl -k -n 50 --no-pager || printf 'WARNING: journalctl kernel log query failed\n'
else
printf 'WARNING: journalctl command not available\n'
fi
} > "$report"
printf 'System report written to: %s\n' "$report"
@@ -0,0 +1,238 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
DRY_RUN=true
TIMESTAMP="$(date +%Y%m%d_%H%M%S)"
LOG_FILE="${LOG_FILE:-/tmp/veritas_extend_${TIMESTAMP}.log}"
SERVICE_GROUP=""
DISKGROUP=""
VOLUME=""
MOUNTPOINT=""
SIZE=""
DISKS=""
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
log() {
local level="${1:-INFO}"
shift || true
local message="$*"
local line
line="$(printf '%s [%s] %s' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$message")"
printf '%s\n' "$line"
printf '%s\n' "$line" >> "$LOG_FILE"
}
ok() {
log "OK" "$*"
}
warning() {
log "WARNING" "$*"
}
critical() {
log "CRITICAL" "$*"
}
require_cmd() {
local cmd="$1"
if ! command -v "$cmd" >/dev/null 2>&1; then
critical "required command not found: $cmd"
return 1
fi
return 0
}
run_cmd() {
local description="$1"
shift
if (( "$#" == 0 )); then
critical "run_cmd called without a command"
return 2
fi
log "INFO" "$description"
log "INFO" "command: $*"
if [[ "$DRY_RUN" == "true" ]]; then
log "INFO" "DRY-RUN: command not executed"
return 0
fi
"$@" 2>&1 | tee -a "$LOG_FILE"
}
confirm_execute() {
local prompt="${1:-Type EXECUTE to continue with real changes}"
local answer=""
if [[ "$DRY_RUN" == "true" ]]; then
ok "dry-run mode active; confirmation not required"
return 0
fi
warning "real execution mode requested with --execute"
warning "$prompt"
printf 'Type EXECUTE to continue: '
# EOF on stdin must not trigger errexit silently; an empty answer fails the check below.
read -r answer || true
if [[ "$answer" != "EXECUTE" ]]; then
critical "confirmation failed; no changes made"
exit 2
fi
}
usage_common() {
cat <<'USAGE'
Common options:
--sg <service_group>
--dg <diskgroup>
--vol <volume>
--mount <mountpoint>
--size <+SIZE>
--disks "disk1 disk2"
--execute
--help
USAGE
}
parse_common_args() {
while (( "$#" > 0 )); do
case "$1" in
--sg)
if [[ -z "${2:-}" ]]; then
critical "missing value for --sg"
exit 2
fi
SERVICE_GROUP="${2:-}"
shift 2
;;
--dg)
if [[ -z "${2:-}" ]]; then
critical "missing value for --dg"
exit 2
fi
DISKGROUP="${2:-}"
shift 2
;;
--vol)
if [[ -z "${2:-}" ]]; then
critical "missing value for --vol"
exit 2
fi
VOLUME="${2:-}"
shift 2
;;
--mount)
if [[ -z "${2:-}" ]]; then
critical "missing value for --mount"
exit 2
fi
MOUNTPOINT="${2:-}"
shift 2
;;
--size)
if [[ -z "${2:-}" ]]; then
critical "missing value for --size"
exit 2
fi
SIZE="${2:-}"
shift 2
;;
--disks)
if [[ -z "${2:-}" ]]; then
critical "missing value for --disks"
exit 2
fi
DISKS="${2:-}"
shift 2
;;
--execute)
DRY_RUN=false
shift
;;
--help|-h)
usage_common
exit 0
;;
*)
critical "unknown argument: $1"
usage_common
exit 2
;;
esac
done
}
require_nonempty() {
local value="$1"
local name="$2"
if [[ -z "$value" ]]; then
critical "missing required argument: $name"
return 1
fi
return 0
}
require_inputs() {
local failed=0
local name
for name in "$@"; do
case "$name" in
sg) require_nonempty "$SERVICE_GROUP" "--sg" || failed=1 ;;
dg) require_nonempty "$DISKGROUP" "--dg" || failed=1 ;;
vol) require_nonempty "$VOLUME" "--vol" || failed=1 ;;
mount) require_nonempty "$MOUNTPOINT" "--mount" || failed=1 ;;
size) require_nonempty "$SIZE" "--size" || failed=1 ;;
disks) require_nonempty "$DISKS" "--disks" || failed=1 ;;
*) critical "internal error: unknown required input '$name'"; failed=1 ;;
esac
done
if (( failed != 0 )); then
usage_common
exit 2
fi
}
has_cmd() {
command -v "$1" >/dev/null 2>&1
}
capture_cmd() {
local description="$1"
shift
log "INFO" "$description"
log "INFO" "command: $*"
"$@" 2>&1 | tee -a "$LOG_FILE"
}
disk_status_line() {
local disk="$1"
vxdisk list "$disk" 2>/dev/null | awk -F': *' '
/device:/ {device=$2}
/status:/ {status=$2}
END {
if (device != "" || status != "") {
print device "|" status
}
}'
}
vxprint_volume_device() {
local dg="$1"
local vol="$2"
vxprint -g "$dg" -F '%device' "$vol" 2>/dev/null || true
}
@@ -0,0 +1,42 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
missing=0
for cmd in lsblk vxdisk vxdctl; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
ok "Veritas LUN discovery started"
log "INFO" "log file: $LOG_FILE"
capture_cmd "Current Linux block devices" lsblk
capture_cmd "Current VxVM disks" vxdisk list
run_cmd "Refresh VxVM device discovery" vxdctl enable
run_cmd "Scan disks known to VxVM" vxdisk scandisks
ok "Candidate VxVM disks with status 'online invalid'"
candidate_count=0
# vxdisk list columns: DEVICE TYPE DISK GROUP STATUS; awk passes DEVICE, GROUP and the STATUS words.
while read -r disk group status; do
if [[ "$status" == *"online invalid"* ]]; then
printf '  %s %s %s\n' "$disk" "$group" "$status" | tee -a "$LOG_FILE"
candidate_count=$(( candidate_count + 1 ))
fi
done < <(vxdisk list 2>/dev/null | awk 'NR > 1 {print $1, $4, $5, $6, $7, $8}')
if (( candidate_count == 0 )); then
warning "no candidate disks detected with VxVM status 'online invalid'"
else
ok "detected $candidate_count candidate disk(s); review before initialization"
fi
@@ -0,0 +1,94 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
require_inputs sg dg vol mount
missing=0
for cmd in hastatus hagrp hares vxdisk vxdg vxprint df findmnt; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
status=0
ok "Precheck started for service group '$SERVICE_GROUP', diskgroup '$DISKGROUP', volume '$VOLUME'"
log "INFO" "log file: $LOG_FILE"
if hastatus -sum >/dev/null 2>&1; then
ok "VCS status is available"
else
critical "VCS does not appear to be running or hastatus failed"
status=1
fi
if hagrp -display "$SERVICE_GROUP" >/dev/null 2>&1; then
ok "service group exists: $SERVICE_GROUP"
else
critical "service group not found: $SERVICE_GROUP"
status=1
fi
group_state="$(hagrp -state "$SERVICE_GROUP" 2>/dev/null || true)"
printf '%s\n' "$group_state" | tee -a "$LOG_FILE"
if printf '%s\n' "$group_state" | grep -qi "ONLINE"; then
ok "service group is online"
else
critical "service group is not online"
status=1
fi
# hagrp -state rows look like: <group> State <system> |ONLINE|; the system is column 3.
online_node="$(printf '%s\n' "$group_state" | awk '$NF ~ /\|ONLINE\|/ {print $3; exit}')"
if [[ -n "$online_node" ]]; then
ok "possible online node: $online_node"
else
warning "unable to identify online node from hagrp output"
fi
if vxdg list "$DISKGROUP" >/dev/null 2>&1; then
ok "diskgroup exists: $DISKGROUP"
else
critical "diskgroup not found: $DISKGROUP"
status=1
fi
if vxprint -g "$DISKGROUP" "$VOLUME" >/dev/null 2>&1; then
ok "volume exists: $VOLUME"
else
critical "volume not found in diskgroup: $VOLUME"
status=1
fi
if findmnt --target "$MOUNTPOINT" >/dev/null 2>&1; then
ok "mountpoint is mounted: $MOUNTPOINT"
fs_type="$(findmnt --noheadings --output FSTYPE --target "$MOUNTPOINT" | awk 'NR == 1 {print $1}')"
ok "filesystem type: ${fs_type:-unknown}"
else
critical "mountpoint is not mounted: $MOUNTPOINT"
status=1
fi
capture_cmd "Current filesystem usage" df -h "$MOUNTPOINT" || status=1
capture_cmd "Current VxVM layout" vxprint -g "$DISKGROUP" -ht || status=1
capture_cmd "Current VCS service group display" hagrp -display "$SERVICE_GROUP" || status=1
if hares -display 2>/dev/null | grep -F "$SERVICE_GROUP" | tee -a "$LOG_FILE"; then
ok "displayed VCS resources related to service group: $SERVICE_GROUP"
else
warning "no VCS resource display rows matched service group: $SERVICE_GROUP"
fi
if (( status == 0 )); then
ok "precheck completed successfully"
else
critical "precheck found one or more issues"
fi
exit "$status"
@@ -0,0 +1,31 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
require_inputs sg
missing=0
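for cmd in haconf hagrp grep; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
ok "Current service group state"
capture_cmd "hagrp state for $SERVICE_GROUP" hagrp -state "$SERVICE_GROUP"
warning "Freezing a VCS service group prevents automatic failover actions while the freeze is active"
confirm_execute "This will persistently freeze VCS service group '$SERVICE_GROUP'."
# A persistent freeze is written to the cluster configuration, so open it read-write first
# and save it back to read-only afterwards.
run_cmd "Open VCS configuration read-write" haconf -makerw
run_cmd "Freeze VCS service group persistently" hagrp -freeze "$SERVICE_GROUP" -persistent
run_cmd "Save VCS configuration and return it to read-only" haconf -dump -makero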
ok "Freeze state check"
hagrp -display "$SERVICE_GROUP" 2>&1 | tee -a "$LOG_FILE" | grep -i "Frozen" || true
ok "freeze step completed"
@@ -0,0 +1,56 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
require_inputs disks
missing=0
for cmd in vxdisk vxdisksetup awk; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
status=0
for disk in $DISKS; do
if ! vxdisk list "$disk" >/dev/null 2>&1; then
critical "disk not found in vxdisk list: $disk"
status=1
continue
fi
info="$(disk_status_line "$disk")"
disk_status="${info#*|}"
if [[ "$disk_status" != *"online invalid"* ]]; then
critical "disk '$disk' is not safe to initialize; status is '${disk_status:-unknown}', expected 'online invalid'"
status=1
continue
fi
ok "disk '$disk' validated as online invalid"
done
if (( status != 0 )); then
critical "one or more disks failed validation; no initialization attempted"
exit 1
fi
confirm_execute "This will initialize VxVM metadata on disk(s): $DISKS"
for disk in $DISKS; do
run_cmd "Initialize VxVM disk $disk" vxdisksetup -i "$disk"
done
for disk in $DISKS; do
capture_cmd "Post-initialization VxVM disk state for $disk" vxdisk list "$disk"
done
ok "disk initialization step completed"
@@ -0,0 +1,70 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
require_inputs dg disks
missing=0
for cmd in vxdg vxdisk vxprint tr; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
if ! vxdg list "$DISKGROUP" >/dev/null 2>&1; then
critical "diskgroup not found: $DISKGROUP"
exit 1
fi
ok "diskgroup exists: $DISKGROUP"
status=0
for disk in $DISKS; do
if ! vxdisk list "$disk" >/dev/null 2>&1; then
critical "disk not found in vxdisk list: $disk"
status=1
continue
fi
# vxdisk list columns: DEVICE TYPE DISK GROUP STATUS; GROUP (column 4) is "-" when unassigned.
summary="$(vxdisk list 2>/dev/null | awk -v disk="$disk" '$1 == disk {print $0}')"
if [[ -z "$summary" ]]; then
warning "unable to find summary row for $disk; using detailed status only"
elif printf '%s\n' "$summary" | awk '$4 != "-" {found = 1} END {exit !found}'; then
critical "disk '$disk' appears to belong to a diskgroup: $summary"
status=1
continue
fi
info="$(disk_status_line "$disk")"
disk_status="${info#*|}"
if [[ "$disk_status" == *"online invalid"* ]]; then
critical "disk '$disk' is still online invalid; initialize it before adding to a diskgroup"
status=1
continue
fi
ok "disk '$disk' appears initialized and unassigned"
done
if (( status != 0 )); then
critical "one or more disks failed validation; diskgroup extension not attempted"
exit 1
fi
confirm_execute "This will add disk(s) '$DISKS' to VxVM diskgroup '$DISKGROUP'."
for disk in $DISKS; do
alias_base="$(printf '%s_%s' "$DISKGROUP" "$disk" | tr -c 'A-Za-z0-9_' '_')"
run_cmd "Add disk $disk to diskgroup $DISKGROUP as $alias_base" vxdg -g "$DISKGROUP" adddisk "${alias_base}=${disk}"
done
capture_cmd "Diskgroup details after extension" vxdg list "$DISKGROUP"
capture_cmd "VxVM layout after diskgroup extension" vxprint -g "$DISKGROUP" -ht
ok "diskgroup extension step completed"
@@ -0,0 +1,81 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
require_inputs dg vol mount size
missing=0
for cmd in vxdg vxprint vxassist df findmnt; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
if [[ ! "$SIZE" =~ ^\+[0-9]+[KkMmGgTtPp]?$ ]]; then
critical "invalid --size '$SIZE'; use a grow-by value such as +10G"
exit 2
fi
status=0
vxdg list "$DISKGROUP" >/dev/null 2>&1 || { critical "diskgroup not found: $DISKGROUP"; status=1; }
vxprint -g "$DISKGROUP" "$VOLUME" >/dev/null 2>&1 || { critical "volume not found: $VOLUME"; status=1; }
findmnt --target "$MOUNTPOINT" >/dev/null 2>&1 || { critical "mountpoint is not mounted: $MOUNTPOINT"; status=1; }
if (( status != 0 )); then
exit 1
fi
fs_type="$(findmnt --noheadings --output FSTYPE --target "$MOUNTPOINT" | awk 'NR == 1 {print $1}')"
device="$(findmnt --noheadings --output SOURCE --target "$MOUNTPOINT" | awk 'NR == 1 {print $1}')"
ok "filesystem type: ${fs_type:-unknown}"
ok "mounted device: ${device:-unknown}"
capture_cmd "Filesystem usage before expansion" df -h "$MOUNTPOINT"
capture_cmd "VxVM layout before volume expansion" vxprint -g "$DISKGROUP" -ht
confirm_execute "This will grow VxVM volume '$VOLUME' in diskgroup '$DISKGROUP' by '$SIZE'."
# growby already means "grow by", so pass the length without the leading '+'.
run_cmd "Grow VxVM volume by requested size" vxassist -g "$DISKGROUP" growby "$VOLUME" "${SIZE#+}"
case "$fs_type" in
vxfs)
warning "VxFS fsadm syntax can vary by Veritas release and site standard"
warning "manual filesystem resize required after volume growth; on Linux review a command such as: fsadm -t vxfs -b <new_size> $MOUNTPOINT"
;;
xfs)
if has_cmd xfs_growfs; then
run_cmd "Resize XFS filesystem online" xfs_growfs "$MOUNTPOINT"
else
critical "xfs_growfs not found; cannot resize XFS safely"
exit 1
fi
;;
ext3|ext4)
if has_cmd resize2fs; then
if [[ -n "$device" ]]; then
run_cmd "Resize ext filesystem" resize2fs "$device"
else
critical "unable to detect mounted device for resize2fs"
exit 1
fi
else
critical "resize2fs not found; cannot resize ext filesystem safely"
exit 1
fi
;;
*)
warning "unsupported or unknown filesystem type '${fs_type:-unknown}'; the volume grow step followed the selected dry-run/execute mode, but the filesystem was not resized"
warning "manual filesystem resize required after confirming platform-specific procedure"
;;
esac
capture_cmd "Filesystem usage after expansion attempt" df -h "$MOUNTPOINT"
capture_cmd "VxVM layout after volume expansion attempt" vxprint -g "$DISKGROUP" -ht
ok "volume and filesystem expansion step completed"
@@ -0,0 +1,94 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
require_inputs sg dg vol mount
missing=0
for cmd in hagrp vxdisk vxdg vxprint df findmnt; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
status=0
ok "Post-check started"
log "INFO" "log file: $LOG_FILE"
group_state="$(hagrp -state "$SERVICE_GROUP" 2>/dev/null || true)"
printf '%s\n' "$group_state" | tee -a "$LOG_FILE"
if printf '%s\n' "$group_state" | grep -qi "ONLINE"; then
ok "service group is online"
else
critical "service group is not online"
status=1
fi
freeze_display="$(hagrp -display "$SERVICE_GROUP" 2>/dev/null | grep -i "Frozen" || true)"
printf '%s\n' "$freeze_display" | tee -a "$LOG_FILE"
if printf '%s\n' "$freeze_display" | grep -Eqi "(1|true|yes|persistent)"; then
ok "service group still appears frozen before unfreeze"
else
warning "unable to confirm service group freeze state; review before unfreezing"
fi
if vxdg list "$DISKGROUP" >/dev/null 2>&1; then
ok "diskgroup imported and visible: $DISKGROUP"
else
critical "diskgroup not visible: $DISKGROUP"
status=1
fi
volume_line="$(vxprint -g "$DISKGROUP" -v "$VOLUME" 2>/dev/null || true)"
printf '%s\n' "$volume_line" | tee -a "$LOG_FILE"
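# Default vxprint columns: TY NAME ASSOC KSTATE LENGTH PLOFFS STATE ...; require both ENABLED and ACTIVE.
if printf '%s\n' "$volume_line" | awk -v vol="$VOLUME" '$1 == "v" && $2 == vol && /ENABLED/ && /ACTIVE/ {found = 1} END {exit !found}'; then
ok "volume is ENABLED and ACTIVE"
else
critical "unable to confirm volume is ENABLED and ACTIVE"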
status=1
fi
if findmnt --target "$MOUNTPOINT" >/dev/null 2>&1; then
ok "mountpoint is mounted: $MOUNTPOINT"
else
critical "mountpoint is not mounted: $MOUNTPOINT"
status=1
fi
capture_cmd "Filesystem usage after expansion" df -h "$MOUNTPOINT" || status=1
capture_cmd "VxVM layout after expansion" vxprint -g "$DISKGROUP" -ht || status=1
capture_cmd "VxVM disk list after expansion" vxdisk list || status=1
if has_cmd journalctl; then
capture_cmd "Recent kernel journal messages" journalctl -k -n 50 || warning "journalctl check failed; review permissions or system logging"
else
warning "journalctl not found; skipping kernel journal check"
fi
if has_cmd dmesg; then
log "INFO" "Recent dmesg messages"
log "INFO" "command: dmesg -T | tail -50"
if dmesg -T 2>&1 | tail -50 | tee -a "$LOG_FILE"; then
ok "captured recent dmesg messages"
else
warning "dmesg check failed; review permissions or kernel logging"
fi
else
warning "dmesg not found; skipping dmesg check"
fi
if (( status == 0 )); then
ok "post-check completed successfully; compare df output with precheck baseline for expected size increase"
else
critical "post-check found one or more issues"
fi
exit "$status"
@@ -0,0 +1,31 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
parse_common_args "$@"
require_inputs sg
missing=0
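for cmd in haconf hagrp grep; do
require_cmd "$cmd" || missing=1
done
if (( missing != 0 )); then
exit 2
fi
ok "Current service group freeze state"
hagrp -display "$SERVICE_GROUP" 2>&1 | tee -a "$LOG_FILE" | grep -i "Frozen" || true
confirm_execute "This will persistently unfreeze VCS service group '$SERVICE_GROUP'."
# Clearing a persistent freeze also changes the cluster configuration, so open it read-write first.
run_cmd "Open VCS configuration read-write" haconf -makerw
run_cmd "Unfreeze VCS service group persistently" hagrp -unfreeze "$SERVICE_GROUP" -persistent
run_cmd "Save VCS configuration and return it to read-only" haconf -dump -makero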
ok "Verify service group freeze state"
hagrp -display "$SERVICE_GROUP" 2>&1 | tee -a "$LOG_FILE" | grep -i "Frozen" || true
capture_cmd "Current service group state" hagrp -state "$SERVICE_GROUP"
ok "unfreeze step completed"
@@ -0,0 +1,106 @@
# Veritas VxVM/VCS Storage Expansion Toolkit
Production-style Bash examples for expanding storage in a Veritas environment. These scripts are sanitized operational tooling for a Linux Infrastructure Engineer portfolio: they show the flow, guardrails, logging, and validation patterns used in enterprise change work.
## VxVM vs VCS
Veritas Volume Manager (VxVM) manages disks, disk groups, volumes, plexes, and subdisks. It is the storage virtualization layer used to initialize SAN LUNs, add capacity to disk groups, and grow volumes.
Veritas Cluster Server (VCS) manages application availability through service groups and resources. During storage changes, freezing the relevant service group can prevent unexpected automated failover actions while operators perform controlled work.
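A persistent freeze is stored in the cluster configuration, so VCS expects the configuration to be opened read-write around the change. The sketch below only prints the usual `haconf`/`hagrp` sequence for review; `vcs_persistent_freeze_plan` is a hypothetical helper, not part of the toolkit, and your site procedure may differ.

```bash
#!/usr/bin/env bash
set -o nounset

# Hypothetical helper: print the command sequence for a persistent
# freeze or unfreeze. Nothing is executed; review the plan first.
vcs_persistent_freeze_plan() {
    local action="$1"   # "freeze" or "unfreeze"
    local group="$2"
    case "$action" in
        freeze|unfreeze) ;;
        *) printf 'unknown action: %s\n' "$action" >&2; return 2 ;;
    esac
    printf '%s\n' \
        "haconf -makerw" \
        "hagrp -${action} ${group} -persistent" \
        "haconf -dump -makero"
}

vcs_persistent_freeze_plan freeze app_sg
```

The same plan with `unfreeze` covers the final step of the change.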
## Safety Notes
- Default mode is always dry-run.
- Real execution requires `--execute`.
- Mutating scripts require an interactive `EXECUTE` confirmation after `--execute`.
- Disk names are never assumed. Candidate disks must be supplied explicitly.
- Disks are initialized only when VxVM reports the expected `online invalid` state.
- Filesystem growth depends on the detected filesystem type: XFS and ext3/ext4 are resized online, while VxFS and unknown types are left for a manual, site-approved resize.
- Exact Veritas and filesystem commands can differ by product version, OS, and site standards.
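The dry-run default can be sketched as a small guard around every mutating command. `run_or_print` below is a hypothetical helper modeled on the toolkit's `run_cmd`, not its actual implementation; it assumes a `DRY_RUN` variable that defaults to `true`.

```bash
#!/usr/bin/env bash
set -o nounset

DRY_RUN="${DRY_RUN:-true}"   # dry-run unless explicitly disabled

# Hypothetical helper: print the command in dry-run mode, otherwise
# execute it and return its exit status.
run_or_print() {
    local description="$1"
    shift
    if [[ "$DRY_RUN" == "true" ]]; then
        printf '[DRY-RUN] %s: %s\n' "$description" "$*"
        return 0
    fi
    printf '[EXECUTE] %s: %s\n' "$description" "$*"
    "$@"
}

run_or_print "Grow volume" vxassist -g appdg growby appvol 100G
```

Keeping the decision in one function means a script cannot accidentally bypass dry-run for a single command.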
## Required Tools
Common commands used by the toolkit:
- Linux: `bash`, `lsblk`, `df`, `findmnt`, `awk`, `grep`, `tee`
- VCS: `hastatus`, `hagrp`, `hares`
- VxVM: `vxdctl`, `vxdisk`, `vxdisksetup`, `vxdg`, `vxprint`, `vxassist`
- Filesystem resize tools as applicable: `fsadm`, `xfs_growfs`, `resize2fs`
- Optional log checks: `journalctl`, `dmesg`
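A preflight that reports every missing tool at once, rather than stopping at the first, mirrors the `missing=1 ... exit 2` pattern the scripts use. `check_required_cmds` is a hypothetical name, and the command list in the usage line is illustrative.

```bash
#!/usr/bin/env bash
set -o nounset

# Hypothetical preflight: collect all missing commands, then fail with
# status 2 (the toolkit's "missing required command" code).
check_required_cmds() {
    local missing=()
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
    done
    if (( ${#missing[@]} > 0 )); then
        printf 'missing command(s): %s\n' "${missing[*]}" >&2
        return 2
    fi
    return 0
}

check_required_cmds df findmnt awk vxdisk vxprint || printf 'preflight failed with status %s\n' "$?"
```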
## Scripts
- `00_env.sh` - shared configuration, logging, dry-run handling, argument helpers.
- `01_detect_new_luns.sh` - discovers Linux block devices and VxVM `online invalid` candidates.
- `02_precheck_vcs_vxvm.sh` - validates cluster, diskgroup, volume, and filesystem state.
- `03_freeze_vcs_group.sh` - persistently freezes a VCS service group.
- `04_init_vxvm_disks.sh` - initializes candidate VxVM disks.
- `05_extend_diskgroup.sh` - adds initialized disks to a diskgroup.
- `06_extend_volume_fs.sh` - grows a VxVM volume and resizes XFS or ext3/ext4 filesystems; VxFS requires a manual `fsadm` step.
- `07_postcheck_vcs_vxvm.sh` - validates final state and gathers post-change evidence.
- `08_unfreeze_vcs_group.sh` - removes the persistent freeze from the VCS service group.
- `veritas_extend_runbook.sh` - prints the recommended order and can optionally run the steps.
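Candidate detection in `01_detect_new_luns.sh` relies on the `vxdisk list` summary columns (DEVICE, TYPE, DISK, GROUP, STATUS). A minimal sketch of the same filter, run against illustrative sample output rather than a live host; the device names are made up, and the STATUS match is a substring test, as in the script.

```bash
#!/usr/bin/env bash
set -o nounset

# Illustrative `vxdisk list` summary output (not captured from a real host).
sample_vxdisk_list() {
    cat <<'EOF'
DEVICE       TYPE            DISK         GROUP        STATUS
emc0_1233    auto:cdsdisk    appdg01      appdg        online
emc0_1234    auto:none       -            -            online invalid
emc0_1235    auto:none       -            -            online invalid
EOF
}

# Print devices whose STATUS (column 5 onwards) contains "online invalid".
online_invalid_candidates() {
    awk 'NR > 1 {
        status = ""
        for (i = 5; i <= NF; i++) status = status (i > 5 ? " " : "") $i
        if (index(status, "online invalid") > 0) print $1
    }'
}

sample_vxdisk_list | online_invalid_candidates
```

On a host, `vxdisk list | online_invalid_candidates` gives the list to review before passing names to `--disks`; the toolkit never picks disks automatically.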
## Example Workflow
Print the runbook only:
```bash
./veritas_extend_runbook.sh \
--sg app_sg \
--dg appdg \
--vol appvol \
--mount /app \
--size +100G \
--disks "emc0_1234 emc0_1235"
```
Run all steps in dry-run mode:
```bash
./veritas_extend_runbook.sh \
--run \
--sg app_sg \
--dg appdg \
--vol appvol \
--mount /app \
--size +100G \
--disks "emc0_1234 emc0_1235"
```
Run a controlled execution. Each mutating step still asks for `EXECUTE`:
```bash
./veritas_extend_runbook.sh \
--run \
--execute \
--sg app_sg \
--dg appdg \
--vol appvol \
--mount /app \
--size +100G \
--disks "emc0_1234 emc0_1235"
```
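`veritas_extend_runbook.sh` forwards `--execute` to the change steps through an optional array. Under `set -o nounset`, Bash releases older than 4.4 (such as the 4.2 release shipped with RHEL 7) report an empty `"${arr[@]}"` expansion as an unbound variable. A sketch of the portable `${arr[@]+"${arr[@]}"}` form, using a hypothetical `count_args` stand-in for a step script:

```bash
#!/usr/bin/env bash
set -o nounset

# Hypothetical stand-in for a step script: report how many arguments arrived.
count_args() {
    printf '%s\n' "$#"
}

forward_step() {
    local execute_arg=()
    if [[ "${1:-}" == "--execute" ]]; then
        execute_arg=(--execute)
    fi
    # Expands to nothing (not an empty string) when the array is empty,
    # and avoids the "unbound variable" error on Bash < 4.4.
    count_args --sg app_sg ${execute_arg[@]+"${execute_arg[@]}"}
}

forward_step            # prints 2
forward_step --execute  # prints 3
```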
Run individual steps:
```bash
./01_detect_new_luns.sh
./02_precheck_vcs_vxvm.sh --sg app_sg --dg appdg --vol appvol --mount /app
./03_freeze_vcs_group.sh --sg app_sg
./04_init_vxvm_disks.sh --disks "emc0_1234 emc0_1235"
./05_extend_diskgroup.sh --dg appdg --disks "emc0_1234 emc0_1235"
./06_extend_volume_fs.sh --dg appdg --vol appvol --mount /app --size +100G
./07_postcheck_vcs_vxvm.sh --sg app_sg --dg appdg --vol appvol --mount /app
./08_unfreeze_vcs_group.sh --sg app_sg
```
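`06_extend_volume_fs.sh` accepts only grow-by values such as `+100G`. The sketch below repeats that validation and shows how `${SIZE#+}` yields the bare length (for example `100G`) for tools whose grow-by syntax expects no sign; whether a given `vxassist` release also accepts the signed form is an assumption to confirm locally. `validate_grow_by` is hypothetical.

```bash
#!/usr/bin/env bash
set -o nounset

# Hypothetical validator: accept "+<digits>[unit]" and print the length
# without the leading "+", or fail with status 2 (invalid input).
validate_grow_by() {
    local size="$1"
    if [[ ! "$size" =~ ^\+[0-9]+[KkMmGgTtPp]?$ ]]; then
        printf "invalid size '%s'; use a grow-by value such as +10G\n" "$size" >&2
        return 2
    fi
    printf '%s\n' "${size#+}"
}

validate_grow_by +100G   # prints 100G
validate_grow_by 100G || printf 'rejected with status %s\n' "$?"
```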
## Exit Codes
- `0` - OK.
- `1` - operational validation or execution failure.
- `2` - invalid input or missing required command.
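When steps are chained by hand or from another wrapper, this exit code convention can be turned into a readable summary. A minimal sketch with a hypothetical `describe_step` helper; it runs the step, reports the category, and passes the original exit code through.

```bash
#!/usr/bin/env bash
set -o nounset

# Hypothetical wrapper helper: run one step and describe its exit code
# using the toolkit's convention (0 OK, 1 failure, 2 input/command error).
describe_step() {
    local rc=0
    "$@" || rc=$?
    case "$rc" in
        0) printf 'OK: %s\n' "$1" ;;
        1) printf 'FAILED (validation or execution): %s\n' "$1" ;;
        2) printf 'INVALID INPUT or missing command: %s\n' "$1" ;;
        *) printf 'UNEXPECTED exit %s: %s\n' "$rc" "$1" ;;
    esac
    return "$rc"
}

describe_step true
describe_step false || true
```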
## Operational Reminder
Use these scripts as examples and adapt them to local runbooks, naming standards, multipath stack, Veritas release, filesystem type, and change-control policy before production use.
@@ -0,0 +1,94 @@
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=00_env.sh
source "$SCRIPT_DIR/00_env.sh"
RUN_STEPS=false
usage() {
cat <<'USAGE'
Usage:
./veritas_extend_runbook.sh --sg <service_group> --dg <diskgroup> --vol <volume> --mount <mountpoint> --size <+SIZE> --disks "disk1 disk2" [--execute] [--run]
Options:
--run Run each step in the recommended order. Without --run, only print the runbook.
--execute Pass --execute to the steps that make changes. Dry-run remains the default.
USAGE
usage_common
}
args=()
while (( "$#" > 0 )); do
case "$1" in
--run)
RUN_STEPS=true
shift
;;
--help|-h)
usage
exit 0
;;
*)
args+=("$1")
shift
;;
esac
done
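# Guard the expansion: with nounset, Bash < 4.4 treats an empty "${args[@]}" as unbound.
parse_common_args ${args[@]+"${args[@]}"}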
cat <<FLOW
Veritas VxVM/VCS storage expansion runbook
Mode: $(if [[ "$DRY_RUN" == "true" ]]; then printf 'DRY-RUN'; else printf 'EXECUTE'; fi)
Log file: $LOG_FILE
Step 1: Detect new LUNs
$SCRIPT_DIR/01_detect_new_luns.sh
Step 2: Run VCS/VxVM precheck
$SCRIPT_DIR/02_precheck_vcs_vxvm.sh --sg "$SERVICE_GROUP" --dg "$DISKGROUP" --vol "$VOLUME" --mount "$MOUNTPOINT"
Step 3: Freeze VCS service group
$SCRIPT_DIR/03_freeze_vcs_group.sh --sg "$SERVICE_GROUP"
Step 4: Initialize new VxVM disks
$SCRIPT_DIR/04_init_vxvm_disks.sh --disks "$DISKS"
Step 5: Add disks to diskgroup
$SCRIPT_DIR/05_extend_diskgroup.sh --dg "$DISKGROUP" --disks "$DISKS"
Step 6: Grow volume and filesystem
$SCRIPT_DIR/06_extend_volume_fs.sh --dg "$DISKGROUP" --vol "$VOLUME" --mount "$MOUNTPOINT" --size "$SIZE"
Step 7: Run post-check
$SCRIPT_DIR/07_postcheck_vcs_vxvm.sh --sg "$SERVICE_GROUP" --dg "$DISKGROUP" --vol "$VOLUME" --mount "$MOUNTPOINT"
Step 8: Unfreeze VCS service group
$SCRIPT_DIR/08_unfreeze_vcs_group.sh --sg "$SERVICE_GROUP"
FLOW
if [[ "$RUN_STEPS" != "true" ]]; then
warning "runbook printed only; add --run to invoke steps"
exit 0
fi
require_inputs sg dg vol mount size disks
execute_arg=()
if [[ "$DRY_RUN" == "false" ]]; then
warning "--execute supplied to wrapper; destructive steps will request confirmation individually"
execute_arg=(--execute)
fi
"$SCRIPT_DIR/01_detect_new_luns.sh"
"$SCRIPT_DIR/02_precheck_vcs_vxvm.sh" --sg "$SERVICE_GROUP" --dg "$DISKGROUP" --vol "$VOLUME" --mount "$MOUNTPOINT"
# ${execute_arg[@]+"${execute_arg[@]}"} expands to nothing when the array is empty and
# avoids "unbound variable" errors under nounset on Bash releases older than 4.4.
"$SCRIPT_DIR/03_freeze_vcs_group.sh" --sg "$SERVICE_GROUP" ${execute_arg[@]+"${execute_arg[@]}"}
"$SCRIPT_DIR/04_init_vxvm_disks.sh" --disks "$DISKS" ${execute_arg[@]+"${execute_arg[@]}"}
"$SCRIPT_DIR/05_extend_diskgroup.sh" --dg "$DISKGROUP" --disks "$DISKS" ${execute_arg[@]+"${execute_arg[@]}"}
"$SCRIPT_DIR/06_extend_volume_fs.sh" --dg "$DISKGROUP" --vol "$VOLUME" --mount "$MOUNTPOINT" --size "$SIZE" ${execute_arg[@]+"${execute_arg[@]}"}
"$SCRIPT_DIR/07_postcheck_vcs_vxvm.sh" --sg "$SERVICE_GROUP" --dg "$DISKGROUP" --vol "$VOLUME" --mount "$MOUNTPOINT"
"$SCRIPT_DIR/08_unfreeze_vcs_group.sh" --sg "$SERVICE_GROUP" ${execute_arg[@]+"${execute_arg[@]}"}