2026-05-06 06:46:27 +00:00


GPFS / IBM Spectrum Scale Filesystem Expansion Toolkit

Safe, sanitized Bash examples for planning and executing a GPFS / IBM Spectrum Scale filesystem expansion. The scripts are written as portfolio-grade operational tooling for a Linux Infrastructure Engineer: conservative defaults, clear validation, dry-run behavior, and explicit operator confirmation before changes.

These scripts are examples. Exact GPFS commands, flags, quorum practices, failure-group design, and storage naming standards vary by Spectrum Scale version and site policy.

Diagram

flowchart TD
  A["gpfs"]
  click A href "./" "gpfs"

Concepts

  • Cluster - the Spectrum Scale administrative domain containing the nodes, daemon configuration, quorum policy, filesystems, and NSDs.
  • Node - a server participating in the GPFS cluster. Nodes may be clients, NSD servers, quorum nodes, manager-capable nodes, or a mix of roles.
  • Quorum - the voting mechanism that protects the cluster from split-brain conditions. Expansion work should not proceed during quorum instability.
  • Filesystem - the GPFS namespace and data layout presented to clients, backed by one or more NSDs.
  • NSD - Network Shared Disk, the GPFS abstraction for a disk or LUN that is served to the cluster.
  • Failure group - a placement hint that tells GPFS which disks share a failure domain, such as an enclosure, rack, site, controller pair, or storage array.
  • Storage pool - a named group of NSDs used for placement and lifecycle policy. Every filesystem has a system pool, which holds metadata; additional data pools are optional.
  • Restripe/rebalance - the operation that redistributes data after disks are added. It can be I/O intensive and should run only in an approved change window.
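Because expansion should not proceed during quorum instability, a precheck can start with a conservative node-state gate. The sketch below uses hypothetical function names (not taken from the toolkit) and assumes the default `mmgetstate -a` table, where node rows follow a dashed separator and the GPFS state is the third column. Verify that layout on your release before relying on it. Requiring every node to be `active` is deliberately stricter than quorum itself.

```shell
# Hypothetical precheck helpers; not the toolkit's implementation.
# Assumes default `mmgetstate -a` output: header line(s), a dashed separator,
# then one row per node as "<number> <name> <state> ...".

# Print "<node>: <state>" for every node whose GPFS state is not "active".
gpfs_nodes_not_active() {
    local listing
    listing=$(mmgetstate -a) || return 1
    printf '%s\n' "$listing" | awk '
        /-/ && /^[- ]+$/ { body = 1; next }   # node rows begin after the dashed line
        body && NF >= 3 && $3 != "active" { print $2 ": " $3 }
    '
}

# Succeed only when every node is active; otherwise report CRITICAL and return 1.
gpfs_require_all_active() {
    local bad
    bad=$(gpfs_nodes_not_active) || { printf 'CRITICAL: mmgetstate failed\n'; return 1; }
    if [ -n "$bad" ]; then
        printf 'CRITICAL: nodes not active:\n%s\n' "$bad"
        return 1
    fi
    printf 'OK: all nodes report active\n'
}
```

A precheck would call `gpfs_require_all_active || exit 1` before touching disks.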

Required Tools

Common GPFS / Spectrum Scale tools expected in production include:

  • mmgetstate
  • mmlscluster
  • mmlsfs
  • mmlsdisk
  • mmlsnsd
  • mmcrnsd
  • mmadddisk
  • mmrestripefs

The toolkit also uses common Linux tools such as df, lsblk, findmnt, journalctl, and dmesg where available. Missing optional commands are reported as WARNING and skipped.
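A minimal sketch of that split between required and optional tools: a missing GPFS command maps to exit code 2 (missing requirement), while a missing Linux helper only produces a WARNING so the caller can skip that check. The function and variable names are illustrative. `/usr/lpp/mmfs/bin` is the standard Spectrum Scale command directory, but it is not always on `PATH` for non-interactive shells.

```shell
# Spectrum Scale commands normally live in /usr/lpp/mmfs/bin.
PATH="${PATH}:/usr/lpp/mmfs/bin"

# Hypothetical list mirroring the Required Tools section.
REQUIRED_CMDS="mmgetstate mmlscluster mmlsfs mmlsdisk mmlsnsd mmcrnsd mmadddisk mmrestripefs"

# Report every missing required command; return 2 if any are missing.
check_required_cmds() {
    local cmd missing=0
    for cmd in $REQUIRED_CMDS; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf 'CRITICAL: required command not found: %s\n' "$cmd"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 2
    printf 'OK: all required GPFS commands found\n'
}

# Return 0 if an optional command exists; otherwise print a WARNING and
# return 1 so the caller can skip the check that needs it.
have_optional_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    printf 'WARNING: optional command not found, skipping: %s\n' "$1"
    return 1
}
```

Typical use: `check_required_cmds || exit 2`, then `if have_optional_cmd findmnt; then findmnt -t gpfs; fi`.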

Safety Model

  • Default mode is dry-run.
  • Real GPFS modifications require --execute.
  • Destructive or high-impact steps also require the operator to type EXECUTE at a confirmation prompt.
  • Disk detection is read-only and never partitions, formats, wipes, or modifies devices.
  • Device selection must always be confirmed with the storage team and cluster owners.
  • The scripts do not assume production disk names.
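The dry-run default and the typed confirmation can be expressed as two small helpers. This is a sketch, not the toolkit's `00_env.sh`: `EXECUTE`, `run_cmd`, and `confirm_execute` are hypothetical names, and the echoed command line is for display only because it does not preserve the original quoting.

```shell
# Dry-run is the default; only an explicit --execute should set EXECUTE=1.
EXECUTE="${EXECUTE:-0}"

# Print the command in dry-run mode; run it only when EXECUTE=1.
# "$*" is for display only and does not preserve the original quoting.
run_cmd() {
    if [ "$EXECUTE" = "1" ]; then
        printf 'RUN: %s\n' "$*"
        "$@"
    else
        printf 'DRY-RUN: %s\n' "$*"
    fi
}

# Ask the operator to type EXECUTE (exact, case-sensitive) before a
# high-impact step. The prompt goes to stderr; the answer is read from stdin.
confirm_execute() {
    local answer
    printf 'About to run: %s\nType EXECUTE to continue: ' "$1" >&2
    read -r answer || return 1
    [ "$answer" = "EXECUTE" ]
}
```

A caller would gate a high-impact step with both, for example `if [ "$EXECUTE" = 1 ]; then confirm_execute "mmadddisk $FS" || exit 1; fi` followed by `run_cmd mmadddisk "$FS" -F "$STANZA"`, so a dry run never prompts and an unconfirmed execute never runs.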

Status lines are labeled with one of three levels:

  • OK
  • WARNING
  • CRITICAL

Exit codes:

  • 0 - OK
  • 1 - operational validation failure
  • 2 - invalid input or missing requirement
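One way to keep the status labels and exit codes consistent across scripts is to route every result through a single helper. A sketch under stated assumptions: the helper names are hypothetical, and treating any CRITICAL result as exit code 1 at the end of a run is an assumption about how the levels map onto the documented codes.

```shell
# Exit codes documented in this README.
EXIT_OK=0
EXIT_FAIL=1    # operational validation failure
EXIT_USAGE=2   # invalid input or missing requirement

# Assumption: any CRITICAL result makes the script exit 1 at the end.
CRITICAL_COUNT=0

# status LEVEL MESSAGE... prints "LEVEL: MESSAGE".
# Call it directly, not inside $(...), or the counter update is lost in the subshell.
status() {
    [ "$#" -ge 1 ] || return "$EXIT_USAGE"
    local level="$1"
    shift
    case "$level" in
        OK|WARNING) ;;
        CRITICAL) CRITICAL_COUNT=$((CRITICAL_COUNT + 1)) ;;
        *) printf 'CRITICAL: unknown status level: %s\n' "$level" >&2; return "$EXIT_USAGE" ;;
    esac
    printf '%s: %s\n' "$level" "$*"
}

# End of script: exit 1 if any CRITICAL was recorded, else 0.
finish() {
    if [ "$CRITICAL_COUNT" -gt 0 ]; then
        exit "$EXIT_FAIL"
    fi
    exit "$EXIT_OK"
}

# Invalid input or a missing requirement: report and exit 2 immediately.
usage_error() {
    printf 'CRITICAL: %s\n' "$*" >&2
    exit "$EXIT_USAGE"
}
```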

Scripts

  • 00_env.sh - shared configuration and helper functions.
  • 01_cluster_overview.sh - read-only cluster overview.
  • 02_precheck_gpfs.sh - pre-expansion validation for a target filesystem.
  • 03_detect_new_disks.sh - read-only candidate block-device discovery.
  • 04_create_nsd_stanza.sh - generate an NSD stanza file.
  • 05_add_nsd_to_filesystem.sh - create NSDs and add disks to a filesystem, dry-run by default.
  • 06_rebalance_filesystem.sh - optional restripe/rebalance, dry-run by default.
  • 07_postcheck_gpfs.sh - post-change validation.
  • 08_generate_report.sh - text report for the change record.
  • gpfs_extend_runbook.sh - guided order of operations plus safe read-only checks.
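The scripts share flag names (`--fs`, `--stanza`, `--execute`, `--background`), which suggests a common parser. A hypothetical sketch of one, keeping dry-run as the default and returning exit code 2 for invalid input; the function name and variable names are assumptions, not the contents of `00_env.sh`.

```shell
# Hypothetical shared parser; flag names are those used in this README.
parse_args() {
    FS=""
    STANZA=""
    EXECUTE=0        # dry-run unless --execute is given
    BACKGROUND=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --fs|--stanza)
                if [ "$#" -lt 2 ]; then
                    printf 'CRITICAL: %s needs a value\n' "$1" >&2
                    return 2
                fi
                if [ "$1" = "--fs" ]; then FS="$2"; else STANZA="$2"; fi
                shift 2 ;;
            --execute)    EXECUTE=1; shift ;;
            --background) BACKGROUND=1; shift ;;
            *)
                printf 'CRITICAL: unknown argument: %s\n' "$1" >&2
                return 2 ;;
        esac
    done
    if [ -z "$FS" ]; then
        printf 'CRITICAL: --fs is required\n' >&2
        return 2
    fi
    # Assumed behavior: --background only matters together with --execute.
    if [ "$BACKGROUND" -eq 1 ] && [ "$EXECUTE" -eq 0 ]; then
        printf 'WARNING: --background has no effect in dry-run mode\n' >&2
    fi
}
```

Each script would start with `parse_args "$@" || exit 2`.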

Example Workflow

cd infra-run/scripts/bash/gpfs

./01_cluster_overview.sh
./02_precheck_gpfs.sh --fs gpfs01
./03_detect_new_disks.sh --exclude-mounted --exclude-existing-nsd

./04_create_nsd_stanza.sh \
  --fs gpfs01 \
  --devices "/dev/sdb /dev/sdc" \
  --servers "gpfsnsd01,gpfsnsd02" \
  --failure-group 10 \
  --pool system \
  --usage dataAndMetadata
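For reference, a sketch of the kind of stanza `04_create_nsd_stanza.sh` could produce from the arguments above, using the Spectrum Scale `%nsd` stanza keys (`device`, `nsd`, `servers`, `usage`, `failureGroup`, `pool`). The function name and the NSD naming pattern `<fs>_<device basename>` are hypothetical; site standards should decide real names and failure-group ranges. The function rejects metadata-bearing usage outside the `system` pool, because Spectrum Scale keeps metadata only in the system pool.

```shell
# Hypothetical sketch; not the toolkit's implementation.
# Args: fs, space-separated devices, comma-separated NSD servers,
#       failure group, pool, usage. Writes the stanza to stdout.
write_nsd_stanza() {
    local fs="$1" devices="$2" servers="$3" fg="$4" pool="$5" usage="$6" dev
    [ -n "$devices" ] || { printf 'CRITICAL: no devices given\n' >&2; return 2; }
    case "$usage" in
        dataOnly|metadataOnly|dataAndMetadata|descOnly) ;;
        *) printf 'CRITICAL: unsupported usage: %s\n' "$usage" >&2; return 2 ;;
    esac
    # Metadata can only live in the system pool.
    case "$usage" in
        metadataOnly|dataAndMetadata)
            if [ "$pool" != "system" ]; then
                printf 'CRITICAL: usage=%s requires pool=system\n' "$usage" >&2
                return 2
            fi ;;
    esac
    # Accept -1 or a non-negative integer; site policy sets the real range.
    case "$fg" in
        -1) ;;
        ''|*[!0-9]*) printf 'CRITICAL: invalid failure group: %s\n' "$fg" >&2; return 2 ;;
    esac
    # Validate every device before printing anything.
    for dev in $devices; do
        case "$dev" in
            /dev/?*) ;;
            *) printf 'CRITICAL: not a /dev path: %s\n' "$dev" >&2; return 2 ;;
        esac
    done
    for dev in $devices; do
        printf '%%nsd:\n'
        printf '  device=%s\n' "$dev"
        printf '  nsd=%s_%s\n' "$fs" "${dev##*/}"
        printf '  servers=%s\n' "$servers"
        printf '  usage=%s\n' "$usage"
        printf '  failureGroup=%s\n' "$fg"
        printf '  pool=%s\n\n' "$pool"
    done
}
```

Example: `write_nsd_stanza gpfs01 "/dev/sdb /dev/sdc" gpfsnsd01,gpfsnsd02 10 system dataAndMetadata > "/tmp/gpfs_nsd_gpfs01_$(date +%Y%m%d_%H%M%S).stanza"`.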

Review the generated stanza with the storage and cluster teams. Confirm device identity, LUN masking, multipath naming, failure group placement, and site standards before continuing.
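Parts of that review can be made mechanical. A read-only sketch (function names hypothetical) that lists the `device=` entries in a stanza, flags duplicates, and prints size, serial, and WWN from `lsblk` so the values can be compared against the storage team's LUN list. Device paths in a stanza are interpreted on the NSD server nodes, so the identity check is meant to run there. It does not replace confirming LUN masking or multipath naming with the storage team.

```shell
# Print each device= value from an NSD stanza file, one per line.
stanza_devices() {
    sed -n 's/^[[:space:]]*device=[[:space:]]*//p' "$1"
}

# Print any device that appears more than once in the stanza.
stanza_duplicate_devices() {
    stanza_devices "$1" | sort | uniq -d
}

# Read-only: show identity details for each stanza device.
# Returns 1 if any entry is not a block device on this host.
show_stanza_device_identity() {
    local dev rc=0
    for dev in $(stanza_devices "$1"); do
        if [ -b "$dev" ]; then
            lsblk -dno NAME,SIZE,SERIAL,WWN "$dev"
        else
            printf 'CRITICAL: not a block device on this host: %s\n' "$dev"
            rc=1
        fi
    done
    return "$rc"
}
```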

Dry-run the add step, replacing the YYYYmmdd_HHMMSS placeholder with the timestamp of the stanza file generated above:

./05_add_nsd_to_filesystem.sh \
  --fs gpfs01 \
  --stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza
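Because NSD names must be unique in the cluster, a dry run can also confirm that no `nsd=` value in the stanza is already defined. A hypothetical sketch that searches `mmlsnsd` output for each name as a whole word (a deliberately simple match rather than a column parse), plus the two commands a dry run would display. Feeding the same stanza file to `mmcrnsd -F` and then `mmadddisk <fs> -F` is the usual sequence, but confirm it for your release.

```shell
# Print each nsd= value from a stanza file.
stanza_nsd_names() {
    sed -n 's/^[[:space:]]*nsd=[[:space:]]*//p' "$1"
}

# Report stanza NSD names that already appear in `mmlsnsd` output.
# Returns 1 on any conflict, 2 if mmlsnsd itself fails.
nsd_name_conflicts() {
    local listing name rc=0
    listing=$(mmlsnsd) || return 2
    for name in $(stanza_nsd_names "$1"); do
        if printf '%s\n' "$listing" | grep -qw -- "$name"; then
            printf 'CRITICAL: NSD name already defined: %s\n' "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Show, without running, the commands the add step would issue.
# Args: filesystem, stanza file.
plan_add_disks() {
    printf 'DRY-RUN: mmcrnsd -F %s\n' "$2"
    printf 'DRY-RUN: mmadddisk %s -F %s\n' "$1" "$2"
}
```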

Execute only in an approved change window:

./05_add_nsd_to_filesystem.sh \
  --fs gpfs01 \
  --stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza \
  --execute

Optional rebalance, first as a dry-run and then executed in the background within the approved change window:

./06_rebalance_filesystem.sh --fs gpfs01
./06_rebalance_filesystem.sh --fs gpfs01 --execute --background
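A sketch of how the rebalance step might behave under those two invocations. It assumes `mmrestripefs <fs> -b` (rebalance data across all disks) is the intended operation and implements `--background` with `nohup` and a log file. The function name, the log location, and the use of `nohup` are assumptions, not the toolkit's documented behavior.

```shell
# Hypothetical sketch of 06_rebalance_filesystem.sh behavior.
# EXECUTE and BACKGROUND are assumed to be set by the argument parser.
run_rebalance() {
    local fs="$1" log
    if [ "${EXECUTE:-0}" != "1" ]; then
        printf 'DRY-RUN: mmrestripefs %s -b\n' "$fs"
        return 0
    fi
    if [ "${BACKGROUND:-0}" = "1" ]; then
        # Assumed implementation of --background: detach with nohup and log.
        log="${LOG_DIR:-/var/tmp}/mmrestripefs_${fs}_$(date +%Y%m%d_%H%M%S).log"
        nohup mmrestripefs "$fs" -b > "$log" 2>&1 &
        printf 'OK: rebalance of %s started (pid %s), log: %s\n' "$fs" "$!" "$log"
    else
        # Foreground run: the exit status of mmrestripefs is returned to the caller.
        mmrestripefs "$fs" -b
    fi
}
```

Running detached keeps the restripe alive if the operator's session drops, at the cost of having to watch the log for completion.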

Post-check and report:

./07_postcheck_gpfs.sh --fs gpfs01
./08_generate_report.sh --fs gpfs01
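One check a post-change validation usually includes is that every disk, including the new ones, is `ready` and `up`. A hypothetical sketch that assumes the default `mmlsdisk <fs>` table, with data rows after a dashed separator and status and availability in columns 7 and 8. Check the column order on your release before trusting it.

```shell
# Print "<disk> <status> <availability>" for disks that are not ready/up.
# Assumes the default mmlsdisk column layout (status = $7, availability = $8).
disks_not_ready_up() {
    local listing
    listing=$(mmlsdisk "$1") || return 2
    printf '%s\n' "$listing" | awk '
        /-/ && /^[- ]+$/ { body = 1; next }   # disk rows begin after the dashed line
        body && NF >= 8 && ($7 != "ready" || $8 != "up") { print $1, $7, $8 }
    '
}

# OK when all disks are ready and up; CRITICAL (return 1) otherwise.
postcheck_disks() {
    local bad
    bad=$(disks_not_ready_up "$1") || { printf 'CRITICAL: mmlsdisk %s failed\n' "$1"; return 1; }
    if [ -n "$bad" ]; then
        printf 'CRITICAL: disks not ready/up in %s:\n%s\n' "$1" "$bad"
        return 1
    fi
    printf 'OK: all disks in %s are ready and up\n' "$1"
}
```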

Runbook helper:

./gpfs_extend_runbook.sh \
  --fs gpfs01 \
  --devices "/dev/sdb /dev/sdc" \
  --servers "gpfsnsd01,gpfsnsd02" \
  --failure-group 10 \
  --pool system \
  --usage dataAndMetadata

Operational Notes

  • Do not run these scripts blindly on production clusters.
  • Confirm disk and multipath identity with the storage team before creating NSDs.
  • Validate quorum and manager health before expansion.
  • Confirm application I/O risk and rollback procedures before mmadddisk or mmrestripefs.
  • Confirm the Spectrum Scale version and local standards for stanza fields before executing changes.
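Part of any rollback plan is a record of what the cluster looked like before `mmadddisk` or `mmrestripefs` ran. A hypothetical sketch that captures read-only command output into one file; `mmdf` is added here and is not in the toolkit's tool list. A failing command is noted as a WARNING instead of aborting the capture.

```shell
# Hypothetical pre-change snapshot. Args: filesystem, output file.
capture_gpfs_state() {
    local fs="$1" out="$2" cmd
    {
        printf '# GPFS state for %s captured %s\n' "$fs" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        for cmd in "mmgetstate -a" "mmlscluster" "mmlsfs $fs" "mmlsdisk $fs" "mmlsnsd" "mmdf $fs"; do
            printf '\n## %s\n' "$cmd"
            # $cmd is left unquoted on purpose so it splits into command + args.
            $cmd 2>&1 || printf 'WARNING: %s failed\n' "$cmd"
        done
    } > "$out"
}
```

Running it before and after the change, for example `capture_gpfs_state gpfs01 "/var/tmp/gpfs01_before_$(date +%Y%m%d_%H%M%S).txt"`, lets `diff` show exactly which disks and capacity changed for the change record.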