# GPFS / IBM Spectrum Scale Filesystem Expansion Toolkit

Safe, sanitized Bash examples for planning and executing a GPFS / IBM Spectrum Scale filesystem expansion. The scripts are written as readable operational examples for a Linux Infrastructure Engineer: conservative defaults, clear validation, dry-run behavior, and explicit operator confirmation before changes.

These scripts are examples. Exact GPFS commands, flags, quorum practices, failure-group design, and storage naming standards vary by Spectrum Scale version and site policy.

## Diagram

```mermaid
flowchart TD
    A["gpfs"] --> B["01_cluster_overview.sh"]
    A --> C["02_precheck_gpfs.sh"]
    A --> D["03_detect_new_disks.sh"]
    A --> E["04_create_nsd_stanza.sh"]
    A --> F["05_add_nsd_to_filesystem.sh"]
    A --> G["06_rebalance_filesystem.sh"]
    A --> H["07_postcheck_gpfs.sh"]
    A --> I["08_generate_report.sh"]
    A --> J["gpfs_extend_runbook.sh"]
```

## Concepts

- **Cluster** - the Spectrum Scale administrative domain containing the nodes, daemon configuration, quorum policy, filesystems, and NSDs.
- **Node** - a server participating in the GPFS cluster. Nodes may be clients, NSD servers, quorum nodes, manager-capable nodes, or a mix of roles.
- **Quorum** - the voting mechanism that protects the cluster from split-brain conditions. Expansion work should not proceed during quorum instability.
- **Filesystem** - the GPFS namespace and data layout presented to clients, backed by one or more NSDs.
- **NSD** - Network Shared Disk, the GPFS abstraction for a disk or LUN that is served to the cluster.
- **Failure group** - a placement hint that tells GPFS which disks share a failure domain, such as an enclosure, rack, site, controller pair, or storage array.
- **Storage pool** - a named pool of NSDs used for placement and lifecycle policy, commonly `system` plus optional data pools.
- **Restripe/rebalance** - the operation that redistributes data after disks are added.
  It can be I/O intensive and should run only in an approved change window.

## Required Tools

Common GPFS / Spectrum Scale tools expected in production include:

- `mmgetstate`
- `mmlscluster`
- `mmlsfs`
- `mmlsdisk`
- `mmlsnsd`
- `mmcrnsd`
- `mmadddisk`
- `mmrestripefs`

The toolkit also uses common Linux tools such as `df`, `lsblk`, `findmnt`, `journalctl`, and `dmesg` where available. Missing optional commands are reported as `WARNING` and skipped.

## Safety Model

- Default mode is dry-run.
- Real GPFS modifications require `--execute`.
- Destructive or high-impact steps also prompt for `EXECUTE`.
- Disk detection is read-only and never partitions, formats, wipes, or modifies devices.
- Device selection must always be confirmed with the storage team and cluster owners.
- The scripts do not assume production disk names.

Output uses a consistent status format:

- `OK`
- `WARNING`
- `CRITICAL`

Exit codes:

- `0` - OK
- `1` - operational validation failure
- `2` - invalid input or missing requirement

## Scripts

- `00_env.sh` - shared configuration and helper functions.
- `01_cluster_overview.sh` - read-only cluster overview.
- `02_precheck_gpfs.sh` - pre-expansion validation for a target filesystem.
- `03_detect_new_disks.sh` - read-only candidate block-device discovery.
- `04_create_nsd_stanza.sh` - generate an NSD stanza file.
- `05_add_nsd_to_filesystem.sh` - create NSDs and add disks to a filesystem, dry-run by default.
- `06_rebalance_filesystem.sh` - optional restripe/rebalance, dry-run by default.
- `07_postcheck_gpfs.sh` - post-change validation.
- `08_generate_report.sh` - text report for the change record.
- `gpfs_extend_runbook.sh` - guided order of operations plus safe read-only checks.
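
The safety model and status conventions described above could be implemented with a few small helpers. The following is a minimal sketch, not the actual contents of `00_env.sh`: the function names (`status`, `run_cmd`, `confirm_execute`, `have_cmd`), the `EXECUTE_MODE` variable, and the exact output layout are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch of dry-run and confirmation helpers.
# All names here are hypothetical; the real 00_env.sh may differ.

EXECUTE_MODE=0   # 0 = dry-run (default), 1 = operator passed --execute

# Exit codes as documented in the Safety Model section.
EXIT_OK=0
EXIT_VALIDATION_FAILED=1
EXIT_INVALID_INPUT=2

# Print one status line: status OK|WARNING|CRITICAL message...
status() {
    local level="$1"
    shift
    printf '[%s] %s\n' "$level" "$*"
}

# Run the given command only in execute mode.
# In dry-run mode, print the command instead of running it.
run_cmd() {
    if [ "$EXECUTE_MODE" -eq 1 ]; then
        "$@"
    else
        printf 'DRY-RUN: %s\n' "$*"
    fi
}

# Require the operator to type EXECUTE (exactly) on stdin.
# Returns 0 only on an exact match; the prompt goes to stderr.
confirm_execute() {
    local reply
    printf 'Type EXECUTE to continue: ' >&2
    read -r reply || return 1
    [ "$reply" = "EXECUTE" ]
}

# Check for an optional command; report WARNING and return 1 if missing
# so the caller can skip the dependent step.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    status WARNING "optional command not found, skipping: $1"
    return 1
}
```

Under these assumptions, a high-impact step would be wrapped as `confirm_execute && run_cmd mmadddisk "$FS" -F "$STANZA"`, so that nothing changes unless `--execute` set `EXECUTE_MODE=1` and the operator also typed `EXECUTE`.
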
## Example Workflow

```bash
cd infra-run/scripts/bash/gpfs

./01_cluster_overview.sh
./02_precheck_gpfs.sh --fs gpfs01
./03_detect_new_disks.sh --exclude-mounted --exclude-existing-nsd
./04_create_nsd_stanza.sh \
  --fs gpfs01 \
  --devices "/dev/sdb /dev/sdc" \
  --servers "gpfsnsd01,gpfsnsd02" \
  --failure-group 10 \
  --pool system \
  --usage dataAndMetadata
```

Review the generated stanza with the storage and cluster teams. Confirm device identity, LUN masking, multipath naming, failure group placement, and site standards before continuing.

Dry-run the add step:

```bash
./05_add_nsd_to_filesystem.sh \
  --fs gpfs01 \
  --stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza
```

Execute only in an approved change window:

```bash
./05_add_nsd_to_filesystem.sh \
  --fs gpfs01 \
  --stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza \
  --execute
```

Optional rebalance:

```bash
./06_rebalance_filesystem.sh --fs gpfs01
./06_rebalance_filesystem.sh --fs gpfs01 --execute --background
```

Post-check and report:

```bash
./07_postcheck_gpfs.sh --fs gpfs01
./08_generate_report.sh --fs gpfs01
```

Runbook helper:

```bash
./gpfs_extend_runbook.sh \
  --fs gpfs01 \
  --devices "/dev/sdb /dev/sdc" \
  --servers "gpfsnsd01,gpfsnsd02" \
  --failure-group 10 \
  --pool system \
  --usage dataAndMetadata
```

## Operational Notes

- Do not run these scripts blindly on production clusters.
- Confirm disk and multipath identity with the storage team before creating NSDs.
- Validate quorum and manager health before expansion.
- Confirm application I/O risk and rollback procedures before `mmadddisk` or `mmrestripefs`.
- Confirm the Spectrum Scale version and local standards for stanza fields before executing changes.
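
To make the stanza review concrete, the sketch below shows roughly what a generator such as `04_create_nsd_stanza.sh` might emit. It is an illustration under assumptions, not the script's actual logic: the function name `make_nsd_stanza` and the NSD naming scheme (`<fs>_<device basename>`) are hypothetical, and the `%nsd` fields shown should be checked against the `mmcrnsd` documentation for the installed Spectrum Scale version.

```bash
#!/usr/bin/env bash
# Sketch only: emits one %nsd stanza per device on stdout.
# It never touches the devices; it only formats text.

# Usage: make_nsd_stanza FS SERVERS FAILURE_GROUP POOL USAGE DEVICE...
make_nsd_stanza() {
    local fs="$1" servers="$2" fg="$3" pool="$4" usage="$5"
    shift 5

    local dev name
    for dev in "$@"; do
        # Hypothetical naming scheme: filesystem name plus device basename,
        # e.g. /dev/sdb on gpfs01 becomes gpfs01_sdb. Site standards may differ.
        name="${fs}_${dev##*/}"

        printf '%%nsd: device=%s\n' "$dev"
        printf '  nsd=%s\n' "$name"
        printf '  servers=%s\n' "$servers"
        printf '  usage=%s\n' "$usage"
        printf '  failureGroup=%s\n' "$fg"
        printf '  pool=%s\n' "$pool"
        printf '\n'
    done
}
```

Calling `make_nsd_stanza gpfs01 "gpfsnsd01,gpfsnsd02" 10 system dataAndMetadata /dev/sdb /dev/sdc` and redirecting the output to a file would produce two stanzas to review with the storage and cluster teams before any dry-run of the add step.
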