BeeGFS Technical Guide: Architecture, Installation, and Best Practices
BeeGFS parallel filesystem architecture, four-component design (management, metadata, storage, client), installation steps, troubleshooting, and production best practices for HPC clusters.
When an HPC cluster’s compute nodes outpace its storage system, even the most powerful processors sit idle waiting for data. BeeGFS (formerly FhGFS) is a parallel filesystem designed to eliminate this bottleneck by distributing metadata and data across multiple servers. Aggregate I/O bandwidth then grows with the number of storage nodes, close to linearly as long as the network and the clients can keep up.
Architecture: Four Components
BeeGFS follows a clean separation of concerns with four distinct service types:
Management server (mgmtd): The single coordinator that tracks the cluster topology: which metadata and storage services are running and which clients are connected. It does not handle file data or metadata itself; it is a lightweight coordination service. It can be made highly available in an active-passive failover setup, which is typically built with external cluster tooling rather than Buddy Mirroring.
Metadata server (meta): Stores directory trees, file attributes, and the mapping between files and their storage targets. Unlike traditional NAS systems, BeeGFS can run multiple metadata servers simultaneously, each responsible for a portion of the namespace. More metadata servers means higher throughput for operations like ls, find, and stat.
Storage server (stor): Stores the actual file content in chunks. Files are striped across multiple storage targets according to the stripe count and chunk size configured at creation time. Adding storage servers increases both capacity and aggregate bandwidth.
Client (client): A kernel module installed on every compute node that mounts the BeeGFS filesystem. The client is responsible for direct communication with metadata and storage servers — there is no proxy in the data path.
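The striping described for the storage server can be sketched with a little shell arithmetic. This is a simplified RAID-0-style model in which consecutive chunks rotate over the targets in a file's stripe set. The function name stripe_index is hypothetical, and the model ignores how BeeGFS chooses which targets form the stripe set.

```shell
#!/usr/bin/env bash
# Simplified striping model (illustrative assumption, not BeeGFS source code):
# chunk k of a file lands on stripe position (k mod numtargets).

# stripe_index OFFSET_BYTES CHUNK_BYTES NUM_TARGETS
# Prints the zero-based position within the file's stripe set.
stripe_index() {
  local offset=$1 chunk=$2 targets=$3
  echo $(( (offset / chunk) % targets ))
}

# Walk the first six 512 KiB chunks of a file striped over 4 targets.
chunk=$((512 * 1024))
for k in 0 1 2 3 4 5; do
  echo "chunk $k -> stripe position $(stripe_index $((k * chunk)) "$chunk" 4)"
done
```

With four targets the positions repeat after every fourth chunk, which is why a large sequential read or write keeps all four targets busy at once.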
Installation Overview
BeeGFS packages are available for RHEL/CentOS and Debian/Ubuntu from the official repository at www.beegfs.io/release/.
# Add repository (RHEL 8 example)
wget -O /etc/yum.repos.d/beegfs-rhel8.repo \
https://www.beegfs.io/release/beegfs_7.4/dists/beegfs-rhel8.repo
# Install services on dedicated nodes
yum install beegfs-mgmtd # on management node
yum install beegfs-meta # on metadata node(s)
yum install beegfs-storage # on storage node(s)
yum install beegfs-client beegfs-helperd beegfs-utils # on all compute nodes
# Initialize management service
/opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/mgmtd
# Initialize metadata service (must point to mgmtd)
/opt/beegfs/sbin/beegfs-setup-meta \
-p /data/meta \
-s 1 \
-m <mgmtd-hostname>
# Initialize storage service 2 with one target (target ID 201); repeat with another path and -i value for each additional target
/opt/beegfs/sbin/beegfs-setup-storage \
-p /data/storage1 \
-s 2 \
-i 201 \
-m <mgmtd-hostname>
# Start services
systemctl enable --now beegfs-mgmtd
systemctl enable --now beegfs-meta
systemctl enable --now beegfs-storage
systemctl enable --now beegfs-helperd beegfs-client
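After the services are started, a quick per-node check can confirm that the expected units are running. The following is a minimal sketch: the helper names units_for_role and check_role are hypothetical, and the unit names are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# units_for_role ROLE
# Prints the systemd units that the installation steps start for a role.
units_for_role() {
  case "$1" in
    mgmtd)   echo "beegfs-mgmtd" ;;
    meta)    echo "beegfs-meta" ;;
    storage) echo "beegfs-storage" ;;
    client)  echo "beegfs-helperd beegfs-client" ;;
    *)       echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# check_role ROLE
# Reports every unit of the role that systemd does not consider active.
# Returns 0 if all are active, 1 if any is down, 2 for an unknown role.
check_role() {
  local unit units rc=0
  units=$(units_for_role "$1") || return 2
  for unit in $units; do
    if systemctl is-active --quiet "$unit"; then
      echo "ok:   $unit"
    else
      echo "DOWN: $unit"
      rc=1
    fi
  done
  return "$rc"
}

# Example (run on a storage node): check_role storage
```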
Stripe Configuration
BeeGFS allows per-directory stripe settings, which is a powerful feature for HPC workloads with different I/O profiles.
# Set stripe width to 4 storage targets for a large-file directory
beegfs-ctl --setpattern --chunksize=1m --numtargets=4 /mnt/beegfs/scratch/large_files
# Verify stripe settings
beegfs-ctl --getentryinfo /mnt/beegfs/scratch/large_files
# Set stripe count for a directory with many small files (reduce overhead)
beegfs-ctl --setpattern --chunksize=512k --numtargets=2 /mnt/beegfs/scratch/small_files
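For MPI-IO workloads with collective I/O, a higher numtargets spreads concurrent writers across more storage targets. The value cannot usefully exceed the number of storage targets in the cluster, so match it to the number of processes that write simultaneously only up to that limit, and confirm the choice with a benchmark.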
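Since a stripe cannot span more targets than the cluster has, one way to pick numtargets for a job is to cap the number of concurrent writers at the number of available storage targets. The sketch below uses a hypothetical helper, cap_numtargets, and made-up numbers (64 writers, 8 targets); it prints the resulting command for review instead of running it.

```shell
#!/usr/bin/env bash
# cap_numtargets WANTED AVAILABLE
# Prints the smaller of the two values.
cap_numtargets() {
  local wanted=$1 available=$2
  if [ "$wanted" -gt "$available" ]; then
    echo "$available"
  else
    echo "$wanted"
  fi
}

# Hypothetical job: 64 concurrent writers, 8 storage targets in the cluster.
n=$(cap_numtargets 64 8)

# Dry run: show the command rather than changing the directory's pattern.
echo "beegfs-ctl --setpattern --chunksize=1m --numtargets=$n /mnt/beegfs/scratch/large_files"
```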
Buddy Mirroring for High Availability
BeeGFS Buddy Mirroring replicates metadata and storage data across pairs of servers. If one server in a pair fails, the other continues serving data without interruption.
# Create storage buddy mirror groups automatically
beegfs-ctl --addmirrorgroup --automatic --nodetype=storage
# Create metadata buddy mirror groups (metadata mirroring itself is then activated with beegfs-ctl --mirrormd)
beegfs-ctl --addmirrorgroup --automatic --nodetype=meta
# Enable mirroring for a specific directory
beegfs-ctl --setpattern --pattern=buddymirror --numtargets=4 /mnt/beegfs/critical_data
# Check mirror group status
beegfs-ctl --listmirrorgroups --nodetype=storage
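Mirrored data is stored twice, once on each target of a buddy group, so capacity planning has to count it double. A rough sizing sketch follows; the helper names and the terabyte figures are hypothetical.

```shell
#!/usr/bin/env bash
# raw_needed_tb MIRRORED_TB UNMIRRORED_TB
# Raw capacity required: mirrored data counts twice, unmirrored data once.
raw_needed_tb() {
  echo $(( 2 * $1 + $2 ))
}

# max_buddy_groups NUM_TARGETS
# Each buddy group pairs two targets, so an odd target is left unpaired.
max_buddy_groups() {
  echo $(( $1 / 2 ))
}

# Hypothetical plan: 100 TB of mirrored project data, 300 TB of unmirrored scratch.
echo "raw capacity needed: $(raw_needed_tb 100 300) TB"
echo "buddy groups from 7 targets: $(max_buddy_groups 7)"
```

Because mirroring is set per directory, only the directories that use the buddymirror pattern pay the doubled capacity cost.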
Monitoring and Diagnostics
# Overall system health
beegfs-ctl --listnodes --nodetype=storage --reachable
beegfs-ctl --listnodes --nodetype=meta --reachable
# Check storage target capacity
beegfs-ctl --listtargets --nodetype=storage --spaceinfo --longnodes
# Live throughput statistics, refreshed every 5 seconds
beegfs-ctl --serverstats --nodetype=storage --interval=5
# Client connection status
beegfs-net
# Check for unreachable targets
beegfs-check-servers
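The individual checks can be wrapped in a small script for cron or a monitoring agent. This sketch assumes that each tool exits with a non-zero status when it detects a problem; that assumption should be verified for each tool on the installed version. The function name run_checks is hypothetical.

```shell
#!/usr/bin/env bash
# run_checks CMD...
# Runs each command string and prints how many exited non-zero.
# ASSUMPTION: a non-zero exit status signals a problem; verify this for
# every tool before relying on the result.
run_checks() {
  local failed=0 cmd
  for cmd in "$@"; do
    if ! bash -c "$cmd" >/dev/null 2>&1; then
      echo "check failed: $cmd" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed"
}

# Only run the BeeGFS checks on hosts where the utilities are installed.
if command -v beegfs-check-servers >/dev/null 2>&1; then
  failures=$(run_checks "beegfs-check-servers" "beegfs-net")
  echo "failed checks: $failures"
fi
```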
Common Problems and Solutions
Mount hangs after storage node failure: If a storage target becomes unavailable and Buddy Mirroring is not enabled, client I/O to files stored on that target blocks while the client retries. The tuneRemoteFSync setting in beegfs-client.conf only controls whether fsync calls are forwarded to the storage servers; setting it to false weakens durability guarantees and does not make data on a failed target reachable. For production, enable Buddy Mirroring.
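Metadata performance degradation: Metadata operations (directory listings, file creation) slow down when a single metadata server handles too much of the namespace. Add a second metadata server: BeeGFS assigns newly created directories across the available metadata servers, so load spreads as new directories are created, while existing directories stay on their current server. Note that beegfs-ctl --mirrormd enables metadata Buddy Mirroring; it does not redistribute the namespace.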
Uneven storage target fill rates: If some targets fill faster than others, writes to files striped onto a full target can fail even while other targets still have free capacity. Use beegfs-df to compare per-target usage, and beegfs-ctl --migrate to move files off an overfull target; a buddy mirror resync is not a rebalance. The target chooser is configured with tuneTargetChooser in beegfs-meta.conf, where randomized is the default.
Client kernel module compilation errors: BeeGFS ships a DKMS package that rebuilds the kernel module on kernel updates. Ensure kernel-devel (RHEL) or linux-headers (Debian) matches the running kernel version.
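For the uneven-fill problem above, a simple early warning is the spread between the fullest and the emptiest target. The sketch below takes per-target usage percentages as arguments; the helper name fill_spread, the sample values, and the 30-point threshold are all hypothetical, and the percentages would be read from the per-target usage listing.

```shell
#!/usr/bin/env bash
# fill_spread PCT...
# Prints the difference between the highest and lowest usage percentage.
fill_spread() {
  local min=$1 max=$1 p
  for p in "$@"; do
    if [ "$p" -lt "$min" ]; then min=$p; fi
    if [ "$p" -gt "$max" ]; then max=$p; fi
  done
  echo $(( max - min ))
}

# Hypothetical usage percentages for four storage targets.
spread=$(fill_spread 41 43 88 40)
echo "fill spread: ${spread} percentage points"

# Hypothetical threshold: flag the cluster when targets differ by 30 points or more.
if [ "$spread" -ge 30 ]; then
  echo "warning: targets are filling unevenly"
fi
```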
Best Practices for HPC Workloads
- Separate management, metadata, and storage networks onto dedicated NICs. Mixing management traffic with storage I/O increases latency variability.
- Size metadata servers with fast NVMe SSDs — metadata operations are latency-sensitive. A slow metadata server bottlenecks all file operations regardless of storage bandwidth.
- Use a dedicated scratch directory per job with the --chunksize tuned to the application’s I/O size. Sequential large-file workloads benefit from 1–4 MB chunks; random small-file workloads from 256–512 KB.
- Enable Buddy Mirroring in production to survive single-server failures without job interruption.
- Run IOR benchmarks before deploying in production to verify that the configuration achieves the expected aggregate bandwidth.
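To judge an IOR result, it helps to write down the expected aggregate bandwidth first. A rough estimate, sketched below, is the smaller of the summed per-target throughput and the network limit. The helper names and all numbers are hypothetical, and the model ignores metadata overhead and client-side limits.

```shell
#!/usr/bin/env bash
# expected_mbps NUM_TARGETS PER_TARGET_MBPS NETWORK_MBPS
# Rough ceiling: summed target throughput, capped by the network limit.
expected_mbps() {
  local sum=$(( $1 * $2 )) net=$3
  if [ "$sum" -gt "$net" ]; then
    echo "$net"
  else
    echo "$sum"
  fi
}

# efficiency_pct MEASURED_MBPS EXPECTED_MBPS
# Integer percentage of the expected bandwidth that was actually measured.
efficiency_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Hypothetical cluster: 4 targets at 1500 MB/s each behind a 10000 MB/s network.
expected=$(expected_mbps 4 1500 10000)
# Hypothetical IOR result of 4800 MB/s, for example from a run such as:
#   mpirun -np 32 ior -a MPIIO -w -r -t 1m -b 4g -o /mnt/beegfs/scratch/ior_testfile
echo "expected ${expected} MB/s, efficiency $(efficiency_pct 4800 "$expected")%"
```

A measured value far below the estimate usually points at the stripe settings, the network, or the number of client processes rather than at the storage targets themselves.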
BeeGFS is one of the most widely deployed parallel filesystems in mid-range HPC clusters precisely because it delivers high performance without the operational complexity of Lustre.