Mateusz Suski ca5a876d03
Improve infra-run portfolio credibility
2026-05-08 21:18:22 +00:00

# GPFS / IBM Spectrum Scale Filesystem Expansion Toolkit
Safe, sanitized Bash examples for planning and executing a GPFS / IBM Spectrum Scale filesystem expansion. The scripts are written as readable operational examples for a Linux infrastructure engineer, with conservative defaults, clear validation, dry-run behavior, and explicit operator confirmation before changes.

These scripts are examples only. Exact GPFS commands, flags, quorum practices, failure-group design, and storage naming standards vary by Spectrum Scale version and site policy.
## Diagram
```mermaid
flowchart TD
A["gpfs"] --> B["01_cluster_overview.sh"]
A --> C["02_precheck_gpfs.sh"]
A --> D["03_detect_new_disks.sh"]
A --> E["04_create_nsd_stanza.sh"]
A --> F["05_add_nsd_to_filesystem.sh"]
A --> G["06_rebalance_filesystem.sh"]
A --> H["07_postcheck_gpfs.sh"]
A --> I["08_generate_report.sh"]
A --> J["gpfs_extend_runbook.sh"]
```
## Concepts
- **Cluster** - the Spectrum Scale administrative domain containing the nodes, daemon configuration, quorum policy, filesystems, and NSDs.
- **Node** - a server participating in the GPFS cluster. Nodes may be clients, NSD servers, quorum nodes, manager-capable nodes, or a mix of roles.
- **Quorum** - the voting mechanism that protects the cluster from split-brain conditions. Expansion work should not proceed during quorum instability.
- **Filesystem** - the GPFS namespace and data layout presented to clients, backed by one or more NSDs.
- **NSD** - Network Shared Disk, the GPFS abstraction for a disk or LUN that is served to the cluster.
- **Failure group** - a placement hint that tells GPFS which disks share a failure domain, such as an enclosure, rack, site, controller pair, or storage array.
- **Storage pool** - a named pool of NSDs used for placement and lifecycle policy, commonly `system` plus optional data pools.
- **Restripe/rebalance** - the operation that redistributes data after disks are added. It can be I/O intensive and should run only in an approved change window.
## Required Tools
Common GPFS / Spectrum Scale tools expected in production include:
- `mmgetstate`
- `mmlscluster`
- `mmlsfs`
- `mmlsdisk`
- `mmlsnsd`
- `mmcrnsd`
- `mmadddisk`
- `mmrestripefs`
The toolkit also uses common Linux tools such as `df`, `lsblk`, `findmnt`, `journalctl`, and `dmesg` where available. Missing optional commands are reported as `WARNING` and skipped.
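
The tool check described above could look roughly like the following. This is a minimal sketch, not the toolkit's actual code: the function names `check_tool` and `check_all` and the required/optional split are assumptions made for illustration. On many installations the `mm*` commands live in `/usr/lpp/mmfs/bin`, which may need to be on `PATH` first.

```bash
#!/usr/bin/env bash
# Sketch only: check_tool and check_all are hypothetical helper names.

# check_tool KIND NAME
#   KIND is "required" or "optional" (assumed classification).
#   Prints one status line; returns 2 only when a required tool is missing.
check_tool() {
  local kind="$1" name="$2"
  if command -v "$name" >/dev/null 2>&1; then
    printf 'OK: %s found\n' "$name"
    return 0
  fi
  if [ "$kind" = "required" ]; then
    printf 'CRITICAL: required command %s not found\n' "$name"
    return 2
  fi
  printf 'WARNING: optional command %s not found, skipping\n' "$name"
  return 0
}

# check_all: report every tool, return 2 if any required tool is missing.
check_all() {
  local rc=0 tool
  for tool in mmgetstate mmlscluster mmlsfs mmlsdisk mmlsnsd \
              mmcrnsd mmadddisk mmrestripefs; do
    check_tool required "$tool" || rc=2
  done
  for tool in df lsblk findmnt journalctl dmesg; do
    check_tool optional "$tool"
  done
  return "$rc"
}
```

A caller might run `check_all || exit 2`, which matches the "missing requirement" exit code listed in this README.
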
## Safety Model
- Default mode is dry-run.
- Real GPFS modifications require `--execute`.
- Destructive or high-impact steps also prompt the operator to type `EXECUTE` before proceeding.
- Disk detection is read-only and never partitions, formats, wipes, or modifies devices.
- Device selection must always be confirmed with the storage team and cluster owners.
- The scripts do not assume production disk names.

Output uses a consistent status format:
- `OK`
- `WARNING`
- `CRITICAL`

Exit codes:
- `0` - OK
- `1` - operational validation failure
- `2` - invalid input or missing requirement
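
The safety model above can be sketched as a small guard around any modifying command. The names `EXECUTE_MODE`, `status`, `confirm_execute`, and `run_change` are hypothetical and are not taken from `00_env.sh`. The choice to return `1` when confirmation is refused is also an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: variable and function names are illustrative assumptions.

EXECUTE_MODE=${EXECUTE_MODE:-0}   # 0 = dry-run (default), 1 = set by --execute

# status LEVEL MESSAGE...: print one line in the OK/WARNING/CRITICAL format.
status() {
  local level="$1"
  shift
  printf '%s: %s\n' "$level" "$*"
}

# confirm_execute: succeed only if the operator types EXECUTE exactly.
confirm_execute() {
  local answer
  printf 'Type EXECUTE to continue: ' >&2
  read -r answer || return 1
  [ "$answer" = "EXECUTE" ]
}

# run_change CMD [ARGS...]
#   Dry-run mode: print the command and do nothing.
#   Execute mode: require typed confirmation, then run the command.
run_change() {
  if [ "$EXECUTE_MODE" -ne 1 ]; then
    status OK "DRY-RUN, would run: $*"
    return 0
  fi
  if ! confirm_execute; then
    status CRITICAL "operator confirmation not given, aborting"
    return 1
  fi
  "$@"
}
```

With this shape, a call such as `run_change mmadddisk gpfs01 -F "$stanza"` only prints the command unless `EXECUTE_MODE=1` is set and the operator types `EXECUTE`.
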
## Scripts
- `00_env.sh` - shared configuration and helper functions.
- `01_cluster_overview.sh` - read-only cluster overview.
- `02_precheck_gpfs.sh` - pre-expansion validation for a target filesystem.
- `03_detect_new_disks.sh` - read-only candidate block-device discovery.
- `04_create_nsd_stanza.sh` - generate an NSD stanza file.
- `05_add_nsd_to_filesystem.sh` - create NSDs and add disks to a filesystem, dry-run by default.
- `06_rebalance_filesystem.sh` - optional restripe/rebalance, dry-run by default.
- `07_postcheck_gpfs.sh` - post-change validation.
- `08_generate_report.sh` - text report for the change record.
- `gpfs_extend_runbook.sh` - guided order of operations plus safe read-only checks.
## Example Workflow
```bash
cd infra-run/scripts/bash/gpfs
./01_cluster_overview.sh
./02_precheck_gpfs.sh --fs gpfs01
./03_detect_new_disks.sh --exclude-mounted --exclude-existing-nsd
./04_create_nsd_stanza.sh \
  --fs gpfs01 \
  --devices "/dev/sdb /dev/sdc" \
  --servers "gpfsnsd01,gpfsnsd02" \
  --failure-group 10 \
  --pool system \
  --usage dataAndMetadata
```
Review the generated stanza with the storage and cluster teams. Confirm device identity, LUN masking, multipath naming, failure group placement, and site standards before continuing.
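
To make the review concrete, here is a rough sketch of how a stanza file in the `%nsd:` format could be generated from the arguments shown above. It is not the code of `04_create_nsd_stanza.sh`: the function name `make_stanza` and the NSD naming scheme (`<prefix>_<nn>`) are assumptions, and site naming standards should take precedence.

```bash
#!/usr/bin/env bash
# Sketch only: make_stanza and the NSD naming scheme are hypothetical.

# make_stanza DEVICES SERVERS FAILURE_GROUP POOL USAGE NSD_PREFIX
#   DEVICES is a space-separated list, SERVERS a comma-separated list.
#   Writes one %nsd stanza per device to stdout; returns 2 on invalid usage.
make_stanza() {
  local devices="$1" servers="$2" fg="$3" pool="$4" usage="$5" prefix="$6"
  local dev n=1
  case "$usage" in
    dataAndMetadata|dataOnly|metadataOnly|descOnly|localCache) ;;
    *)
      printf 'CRITICAL: unsupported usage value: %s\n' "$usage" >&2
      return 2
      ;;
  esac
  # Intentionally unquoted: split the device list on whitespace.
  for dev in $devices; do
    printf '%%nsd: device=%s\n' "$dev"
    printf '  nsd=%s_%02d\n' "$prefix" "$n"
    printf '  servers=%s\n' "$servers"
    printf '  usage=%s\n' "$usage"
    printf '  failureGroup=%s\n' "$fg"
    printf '  pool=%s\n' "$pool"
    n=$((n + 1))
  done
}
```

For example, `make_stanza "/dev/sdb /dev/sdc" "gpfsnsd01,gpfsnsd02" 10 system dataAndMetadata gpfs01_nsd` writes two stanzas to stdout, which can be redirected to a file for review.
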
Dry-run the add step:
```bash
./05_add_nsd_to_filesystem.sh \
  --fs gpfs01 \
  --stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza
```
Execute only in an approved change window:
```bash
./05_add_nsd_to_filesystem.sh \
  --fs gpfs01 \
  --stanza /tmp/gpfs_nsd_gpfs01_YYYYmmdd_HHMMSS.stanza \
  --execute
```
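
This README does not show the internals of the add step. Assuming it wraps `mmcrnsd -F` followed by `mmadddisk <filesystem> -F`, a dry-run plan could be printed with a helper like the one below. The function name `plan_add` is hypothetical, and the exact flags should be checked against the installed Spectrum Scale version.

```bash
#!/usr/bin/env bash
# Sketch only: plan_add is a hypothetical helper that prints, never executes.

# plan_add FS STANZA_FILE
#   Validates its inputs, then prints the commands an add step would run.
#   Returns 2 on invalid input, matching the exit codes in this README.
plan_add() {
  local fs="${1:-}" stanza="${2:-}"
  if [ -z "$fs" ] || [ ! -r "$stanza" ]; then
    printf 'CRITICAL: need a filesystem name and a readable stanza file\n' >&2
    return 2
  fi
  printf 'mmcrnsd -F %s\n' "$stanza"              # create the NSDs
  printf 'mmadddisk %s -F %s\n' "$fs" "$stanza"   # add them to the filesystem
}
```

Printing the plan first gives the change record an exact copy of what would be run before `--execute` is used.
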
Optional rebalance:
```bash
# Dry-run (default): show what would be done
./06_rebalance_filesystem.sh --fs gpfs01
# Execute for real, in the background
./06_rebalance_filesystem.sh --fs gpfs01 --execute --background
```
Post-check and report:
```bash
./07_postcheck_gpfs.sh --fs gpfs01
./08_generate_report.sh --fs gpfs01
```
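
One check a post-change validation might include is confirming that every disk reports status `ready` and availability `up`. The sketch below parses `mmlsdisk`-style output from stdin. The column positions (status in column 7, availability in column 8) are an assumption about the default output layout and should be verified on the target version. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: column positions are assumed, function names are hypothetical.

# unhealthy_disks: read mmlsdisk-style output on stdin and print the name of
# every disk whose status is not "ready" or whose availability is not "up".
unhealthy_disks() {
  awk '
    /^[[:space:]]*-+/ { in_table = 1; next }   # rows follow the dashed line
    in_table && NF >= 9 && ($7 != "ready" || $8 != "up") { print $1 }
  '
}

# report_disk_health: summarise stdin in the OK/CRITICAL format.
# Note: empty input also yields OK, so the caller should check that the
# command producing the input succeeded.
report_disk_health() {
  local bad
  bad="$(unhealthy_disks)"
  if [ -n "$bad" ]; then
    printf 'CRITICAL: disks not ready/up: %s\n' \
      "$(printf '%s' "$bad" | tr '\n' ' ')"
    return 1
  fi
  printf 'OK: all disks ready and up\n'
}
```

Typical usage would be `mmlsdisk gpfs01 | report_disk_health`, with the exit status feeding the script's overall result.
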
Runbook helper:
```bash
./gpfs_extend_runbook.sh \
  --fs gpfs01 \
  --devices "/dev/sdb /dev/sdc" \
  --servers "gpfsnsd01,gpfsnsd02" \
  --failure-group 10 \
  --pool system \
  --usage dataAndMetadata
```
## Operational Notes
- Do not run these scripts blindly on production clusters.
- Confirm disk and multipath identity with the storage team before creating NSDs.
- Validate quorum and manager health before expansion.
- Confirm application I/O risk and rollback procedures before `mmadddisk` or `mmrestripefs`.
- Confirm the Spectrum Scale version and local standards for stanza fields before executing changes.
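
As one illustration of validating cluster health before expansion, the sketch below lists nodes whose GPFS state is anything other than `active`. It reads `mmgetstate -a` style output from stdin and assumes a three-column layout (node number, node name, GPFS state) after a dashed separator. The function name `inactive_nodes` is hypothetical, and this check alone does not prove quorum is healthy.

```bash
#!/usr/bin/env bash
# Sketch only: assumes "number name state" rows after a dashed separator.

# inactive_nodes: print "name: state" for every node that is not active.
inactive_nodes() {
  awk '
    /^[[:space:]]*-+[[:space:]]*$/ { in_table = 1; next }
    in_table && NF >= 3 && $3 != "active" { print $2 ": " $3 }
  '
}
```

A precheck could run `mmgetstate -a | inactive_nodes` and refuse to continue if any line is printed.
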