GPFS / IBM Spectrum Scale Filesystem Expansion Toolkit
Safe, sanitized Bash examples for planning and executing a GPFS / IBM Spectrum Scale filesystem expansion. The scripts are written as portfolio-grade operational tooling for a Linux Infrastructure Engineer: conservative defaults, clear validation, dry-run behavior, and explicit operator confirmation before changes.
These scripts are examples. Exact GPFS commands, flags, quorum practices, failure-group design, and storage naming standards vary by Spectrum Scale version and site policy.
Concepts
- Cluster - the Spectrum Scale administrative domain containing the nodes, daemon configuration, quorum policy, filesystems, and NSDs.
- Node - a server participating in the GPFS cluster. Nodes may be clients, NSD servers, quorum nodes, manager-capable nodes, or a mix of roles.
- Quorum - the voting mechanism that protects the cluster from split-brain conditions. Expansion work should not proceed during quorum instability.
- Filesystem - the GPFS namespace and data layout presented to clients, backed by one or more NSDs.
- NSD - Network Shared Disk, the GPFS abstraction for a disk or LUN that is served to the cluster.
- Failure group - a placement hint that tells GPFS which disks share a failure domain, such as an enclosure, rack, site, controller pair, or storage array.
- Storage pool - a named pool of NSDs used for placement and lifecycle policy, commonly the system pool plus optional data pools.
- Restripe/rebalance - the operation that redistributes data after disks are added. It can be I/O intensive and should run only in an approved change window.
Required Tools
Common GPFS / Spectrum Scale tools expected in production include:
- mmgetstate
- mmlscluster
- mmlsfs
- mmlsdisk
- mmlsnsd
- mmcrnsd
- mmadddisk
- mmrestripefs
The toolkit also uses common Linux tools such as df, lsblk, findmnt, journalctl, and dmesg where available. Missing optional commands are reported as WARNING and skipped.
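The warn-and-skip behavior for optional tools could be sketched as follows. The function name have_optional_tool and the exact message wording are assumptions made for illustration, not the toolkit's actual helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report a missing optional tool as WARNING so the
# caller can skip the dependent check instead of failing the whole run.
have_optional_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi
    printf 'WARNING: optional tool not found, skipping: %s\n' "$tool"
    return 1
}

# Walk the optional Linux tools named above and report which are usable.
for tool in df lsblk findmnt journalctl dmesg; do
    if have_optional_tool "$tool"; then
        printf 'OK: found %s\n' "$tool"
    fi
done
```

Returning non-zero rather than exiting keeps the decision with the caller, which suits checks that are useful but not required.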
Safety Model
- Default mode is dry-run.
- Real GPFS modifications require --execute.
- Destructive or high-impact steps also prompt the operator to type EXECUTE.
- Disk detection is read-only and never partitions, formats, wipes, or modifies devices.
- Device selection must always be confirmed with the storage team and cluster owners.
- The scripts do not assume production disk names.
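A minimal sketch of the dry-run gate described above, assuming a shared wrapper that every modifying command passes through. The names EXECUTE_MODE, parse_mode, confirm_execute, and run_change are hypothetical. For simplicity the sketch prompts on every executed command, whereas the toolkit prompts only for destructive or high-impact steps.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run gate: nothing runs unless --execute was passed
# AND the operator types EXECUTE at the prompt.
EXECUTE_MODE=0   # dry-run by default

# Scan the script arguments for --execute.
parse_mode() {
    local arg
    for arg in "$@"; do
        if [ "$arg" = "--execute" ]; then
            EXECUTE_MODE=1
        fi
    done
}

# Ask the operator to type EXECUTE; any other answer declines.
confirm_execute() {
    local answer
    printf 'Type EXECUTE to continue: ' >&2
    read -r answer || return 1
    [ "$answer" = "EXECUTE" ]
}

# Print the command in dry-run mode; run it only after confirmation.
run_change() {
    if [ "$EXECUTE_MODE" -ne 1 ]; then
        printf 'DRY-RUN: %s\n' "$*"
        return 0
    fi
    if ! confirm_execute; then
        printf 'CRITICAL: confirmation declined, nothing was changed\n'
        return 1
    fi
    "$@"
}

parse_mode "$@"
run_change echo "placeholder for a real change"
```

Routing every change through one wrapper means a forgotten flag results in a printed command rather than a modified cluster.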
Output uses a consistent status format:
- OK
- WARNING
- CRITICAL
Exit codes:
- 0 - OK
- 1 - operational validation failure
- 2 - invalid input or missing requirement
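The status levels and exit codes could be tied together along these lines. All function and variable names here are illustrative assumptions, as is the choice that a WARNING alone does not change the exit code.

```bash
#!/usr/bin/env bash
# Hypothetical status helpers mapping the three levels to the exit codes.
EXIT_OK=0
EXIT_VALIDATION=1
EXIT_USAGE=2
VALIDATION_FAILED=0

ok()       { printf 'OK: %s\n' "$*"; }
warning()  { printf 'WARNING: %s\n' "$*"; }
# A CRITICAL finding is remembered so the script can finish its remaining
# checks and still exit non-zero at the end.
critical() { printf 'CRITICAL: %s\n' "$*"; VALIDATION_FAILED=1; }

# Invalid input or a missing requirement stops immediately with code 2.
die_usage() {
    printf 'CRITICAL: %s\n' "$*" >&2
    exit "$EXIT_USAGE"
}

# Call once at the end of a script to pick the final exit code.
finish() {
    if [ "$VALIDATION_FAILED" -ne 0 ]; then
        exit "$EXIT_VALIDATION"
    fi
    exit "$EXIT_OK"
}
```

A script built on these helpers would call ok, warning, or critical per check and end with finish, so one run reports every finding instead of stopping at the first.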
Scripts
- 00_env.sh - shared configuration and helper functions.
- 01_cluster_overview.sh - read-only cluster overview.
- 02_precheck_gpfs.sh - pre-expansion validation for a target filesystem.
- 03_detect_new_disks.sh - read-only candidate block-device discovery.
- 04_create_nsd_stanza.sh - generate an NSD stanza file.
- 05_add_nsd_to_filesystem.sh - create NSDs and add disks to a filesystem, dry-run by default.
- 06_rebalance_filesystem.sh - optional restripe/rebalance, dry-run by default.
- 07_postcheck_gpfs.sh - post-change validation.
- 08_generate_report.sh - text report for the change record.
- gpfs_extend_runbook.sh - guided order of operations plus safe read-only checks.
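As an illustration of what read-only candidate discovery can look like, the sketch below lists whole disks with no mountpoint on the disk or any of its partitions. It only reads lsblk output. The function name is hypothetical, and the sketch assumes plain /dev/NAME device paths; it does not handle multipath maps or check for existing NSDs, both of which a production script would need.

```bash
#!/usr/bin/env bash
# Hypothetical read-only discovery: print whole disks that have no
# mountpoint on the disk itself or on any child device (partitions, LVs).
list_unmounted_disks() {
    local name type
    lsblk -dnro NAME,TYPE | while read -r name type; do
        [ "$type" = "disk" ] || continue
        # lsblk prints one MOUNTPOINT line per device in the subtree;
        # any non-empty line (including [SWAP]) means the disk is in use.
        if lsblk -nro MOUNTPOINT "/dev/$name" | grep -q .; then
            continue
        fi
        printf '/dev/%s\n' "$name"
    done
}
```

Calling list_unmounted_disks prints one candidate path per line. An unmounted disk is not necessarily a free disk, so the result is a starting point for the conversation with the storage team, not a selection.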
Example Workflow
cd infra-run/scripts/bash/gpfs
./01_cluster_overview.sh
./02_precheck_gpfs.sh --fs gpfs01
./03_detect_new_disks.sh --exclude-mounted --exclude-existing-nsd
./04_create_nsd_stanza.sh \
--fs gpfs01 \
--devices "/dev/sdb /dev/sdc" \
--servers "gpfsnsd01,gpfsnsd02" \
--failure-group 10 \
--pool system \
--usage dataAndMetadata
Review the generated stanza with the storage and cluster teams. Confirm device identity, LUN masking, multipath naming, failure group placement, and site standards before continuing.
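For orientation during that review, the sketch below emits one %nsd stanza per device using the field names from the Spectrum Scale stanza format (device, nsd, servers, usage, failureGroup, pool). The function name and the NSD naming scheme (filesystem name plus device basename) are placeholders; real NSD names should follow site standards, and stanza fields should be checked against the installed version.

```bash
#!/usr/bin/env bash
# Hypothetical stanza generator. NSD names are built as <fs>_<device basename>,
# which is an illustrative convention only.
emit_nsd_stanzas() {
    local fs="$1" devices="$2" servers="$3" fg="$4" pool="$5" usage="$6"
    local dev
    # $devices is intentionally unquoted: it is a space-separated list.
    for dev in $devices; do
        printf '%%nsd: device=%s\n' "$dev"
        printf '  nsd=%s_%s\n' "$fs" "${dev##*/}"
        printf '  servers=%s\n' "$servers"
        printf '  usage=%s\n' "$usage"
        printf '  failureGroup=%s\n' "$fg"
        printf '  pool=%s\n\n' "$pool"
    done
}

emit_nsd_stanzas gpfs01 "/dev/sdb /dev/sdc" "gpfsnsd01,gpfsnsd02" \
    10 system dataAndMetadata
```

Writing the stanza to standard output keeps generation separate from file handling, so the result can be inspected or diffed before it is saved.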
Dry-run the add step:
./05_add_nsd_to_filesystem.sh \
--fs gpfs01 \
--stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza
Execute only in an approved change window:
./05_add_nsd_to_filesystem.sh \
--fs gpfs01 \
--stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza \
--execute
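The add step presumably wraps the two GPFS commands listed under Required Tools: mmcrnsd to create the NSDs from the stanza file, then mmadddisk to add them to the filesystem. The sketch below only prints that plan and runs nothing. The function name is hypothetical, and the exact flags should be confirmed against the installed Spectrum Scale version.

```bash
#!/usr/bin/env bash
# Hypothetical planner: print the commands an add step would run.
# Returns 2 (invalid input) when the stanza file cannot be read.
plan_add_commands() {
    local fs="$1" stanza="$2"
    if [ ! -r "$stanza" ]; then
        printf 'CRITICAL: stanza file not readable: %s\n' "$stanza"
        return 2
    fi
    printf 'mmcrnsd -F %s\n' "$stanza"
    printf 'mmadddisk %s -F %s\n' "$fs" "$stanza"
}

# Demonstration with a throwaway file standing in for a real stanza.
demo_stanza="$(mktemp)"
plan_add_commands gpfs01 "$demo_stanza"
rm -f "$demo_stanza"
```

The order matters: the NSDs must exist before mmadddisk can reference them, so a failure in the first command should stop the sequence.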
Optional rebalance:
./06_rebalance_filesystem.sh --fs gpfs01
./06_rebalance_filesystem.sh --fs gpfs01 --execute --background
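How --background is implemented is not shown here. One possible approach is to detach the rebalance with nohup and capture its output in a log file, as sketched below. The function names and the BG_PID variable are hypothetical. mmrestripefs with -b requests a rebalance across all disks; confirm the option against the installed version before use.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a detached rebalance.
BG_PID=""

# Print the rebalance command for a filesystem (does not run it).
restripe_command() {
    printf 'mmrestripefs %s -b\n' "$1"
}

# run_in_background LOGFILE COMMAND [ARGS...]
# Starts the command detached from the terminal and records its PID.
run_in_background() {
    local log="$1"
    shift
    nohup "$@" >"$log" 2>&1 &
    BG_PID=$!
    printf 'OK: started in background, pid %s, log %s\n' "$BG_PID" "$log"
}
```

A real invocation would look like run_in_background /var/tmp/restripe.log mmrestripefs gpfs01 -b, with the log path chosen by the operator. Recording the PID and log path in the change record makes it easier to monitor or stop the job later.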
Post-check and report:
./07_postcheck_gpfs.sh --fs gpfs01
./08_generate_report.sh --fs gpfs01
Runbook helper:
./gpfs_extend_runbook.sh \
--fs gpfs01 \
--devices "/dev/sdb /dev/sdc" \
--servers "gpfsnsd01,gpfsnsd02" \
--failure-group 10 \
--pool system \
--usage dataAndMetadata
Operational Notes
- Do not run these scripts blindly on production clusters.
- Confirm disk and multipath identity with the storage team before creating NSDs.
- Validate quorum and manager health before expansion.
- Confirm application I/O risk and rollback procedures before mmadddisk or mmrestripefs.
- Confirm the Spectrum Scale version and local standards for stanza fields before executing changes.
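As a partial illustration of the health validation mentioned in these notes, the sketch below reads mmgetstate -a style output on standard input and fails if any node is not active. It relies on an assumption about the output layout: node rows begin with a numeric node number, followed by the node name and the GPFS state in the third column. It checks daemon state only and does not evaluate quorum or manager roles. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check. ASSUMPTION: each node row looks like
#   <node number> <node name> <GPFS state>
# Header and separator lines are ignored because their first field
# is not a number.
all_nodes_active() {
    awk '
        $1 ~ /^[0-9]+$/ {
            rows++
            if ($3 != "active") {
                bad++
                printf "CRITICAL: node %s is %s\n", $2, $3
            }
        }
        END {
            if (rows == 0) { print "CRITICAL: no node rows found"; exit 1 }
            if (bad > 0) exit 1
            print "OK: all " rows " nodes active"
        }
    '
}
```

On a cluster node this would be used as mmgetstate -a | all_nodes_active. Treating empty input as a failure avoids reporting a healthy cluster when the command itself produced nothing.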