
Artificial Intelligence and Machine Learning HPC
GPU cluster infrastructure for LLM training and inference — NVIDIA H100, InfiniBand, and Kubernetes integration.
Why Does Training AI Models Require High-Performance Computing?
Developing large language models (LLMs), image classification networks, or recommendation systems quickly exceeds the limits of a single GPU or a standard cloud virtual machine. The pre-training phase of a GPT-scale model processes trillions of tokens across billions of parameters; this means hundreds of GPUs running continuously for weeks, even months. Even during fine-tuning and inference phases, GPU memory and network bandwidth become critical bottlenecks for low latency and high throughput.
For AI companies and research groups operating in Turkey, meeting this computing need domestically brings additional advantages: data sovereignty, KVKK compliance, and a predictable cost structure free of the currency risk of foreign cloud providers.
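To put rough numbers on the "hundreds of GPUs for weeks" claim, the sketch below uses the common rule of thumb that training a dense Transformer costs about 6 FLOPs per parameter per token. The model size, token count, GPU count, the ~989 TFLOP/s BF16 dense peak assumed for an H100 SXM, and the 40% utilization figure are all illustrative assumptions, not Mevasis measurements.

```python
def training_days(params, tokens, num_gpus, peak_flops_per_gpu, utilization):
    """Estimate wall-clock training time in days.

    Uses the ~6 * params * tokens FLOPs approximation for dense Transformers.
    """
    total_flops = 6.0 * params * tokens
    sustained_flops_per_s = num_gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_s / 86_400  # seconds per day


# Hypothetical run: 70B parameters, 2T tokens, 512 GPUs.
# 989e12 is an assumed BF16 dense peak per GPU; 0.40 is an assumed utilization.
days = training_days(70e9, 2e12, 512, 989e12, 0.40)
print(f"Estimated training time: {days:.1f} days")
```

Under these assumptions the estimate comes out at roughly 48 days. Real runs also lose time to restarts, evaluation, and checkpointing, so this is better read as a lower bound.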
Artificial Intelligence and ML Workloads
LLM Pre-Training
Training a model from scratch is one of the most demanding HPC workloads. In Transformer-based models, every iteration synchronizes large gradient tensors for each layer across all participating GPUs. Tools used in this process:
- PyTorch FSDP / DeepSpeed ZeRO — splits model state into pieces across GPUs
- Megatron-LM — NVIDIA framework combining tensor and pipeline parallelism
- NCCL (NVIDIA Collective Communications Library) — all-reduce operations between GPUs
- Hugging Face Accelerate — abstraction layer for multi-GPU and multi-node training
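The sharding idea behind FSDP and ZeRO can be illustrated with simple arithmetic. The sketch below assumes mixed-precision Adam training, where model state takes about 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for FP32 master weights and optimizer moments). It ignores activations and temporary buffers, and the function name is hypothetical.

```python
def model_state_gb_per_gpu(params, num_gpus, zero_stage):
    """Per-GPU model-state memory in GB for mixed-precision Adam.

    zero_stage 0 replicates everything; stages 1, 2, 3 additionally shard
    optimizer state, gradients, and weights across the GPUs.
    """
    weights = 2.0 * params   # half-precision weights, bytes
    grads = 2.0 * params     # half-precision gradients, bytes
    optim = 12.0 * params    # FP32 master weights + Adam moments, bytes
    if zero_stage >= 1:
        optim /= num_gpus
    if zero_stage >= 2:
        grads /= num_gpus
    if zero_stage >= 3:
        weights /= num_gpus
    return (weights + grads + optim) / 1e9


# Illustrative case: a 7B-parameter model on 8 GPUs.
for stage in range(4):
    print(f"ZeRO stage {stage}: {model_state_gb_per_gpu(7e9, 8, stage):.2f} GB per GPU")
```

With full replication the 7B example needs about 112 GB of model state per GPU, which exceeds an 80 GB device. Sharding everything across 8 GPUs brings it down to about 14 GB.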
An InfiniBand HDR/NDR network (or a comparable RDMA fabric) is effectively required for low-latency GPU-to-GPU communication between nodes; standard TCP/IP Ethernet without RDMA typically cannot meet this latency budget.
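A back-of-the-envelope model shows why the interconnect matters. A ring all-reduce over N GPUs takes 2(N-1) steps, each moving 1/N of the message, so its cost can be approximated as steps × (latency + chunk size / bandwidth). The latency values and the 25 Gb/s Ethernet link below are assumptions chosen for illustration. The model gives each GPU a single link and ignores NVLink inside a node and any overlap with computation.

```python
def ring_allreduce_seconds(message_bytes, num_gpus, link_gbit_per_s, latency_s):
    """Approximate ring all-reduce time with a latency + bandwidth model."""
    steps = 2 * (num_gpus - 1)               # reduce-scatter + all-gather
    bytes_per_s = link_gbit_per_s * 1e9 / 8  # convert Gb/s to bytes/s
    chunk_bytes = message_bytes / num_gpus   # data moved per step
    return steps * (latency_s + chunk_bytes / bytes_per_s)


grad_bytes = 2 * 7e9  # half-precision gradients of a 7B-parameter model

# Assumed per-step latencies: ~2 microseconds (RDMA), ~50 microseconds (TCP).
fast = ring_allreduce_seconds(grad_bytes, 16, 400, 2e-6)
slow = ring_allreduce_seconds(grad_bytes, 16, 25, 50e-6)
print(f"400 Gb/s link: {fast:.3f} s, 25 Gb/s link: {slow:.3f} s")
```

For large gradient messages the bandwidth term dominates. For the many small messages produced by tensor or pipeline parallelism, the per-step latency term becomes the limiting factor instead.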
Fine-Tuning and RLHF
Adapting an existing model to a specific domain or task requires less computation than full pre-training, but it still needs dedicated GPU infrastructure. Even parameter-efficient methods like LoRA and QLoRA often call for multiple GPUs with large models (70B+). Reinforcement Learning from Human Feedback (RLHF) creates particularly high memory pressure because the actor, the reward model, and the reference policy must all be held in memory at the same time. Tools used:
- TRL (Transformer Reinforcement Learning) — Hugging Face’s RLHF/PPO toolkit
- Axolotl — open-source framework standardizing fine-tuning workflows
- LLaMA-Factory — multi-model fine-tuning platform
- vLLM / SGLang — for fast inference in the RLHF loop
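The savings from LoRA come from training two small low-rank matrices instead of a full weight matrix. The sketch below counts trainable parameters for a single linear layer. The 8192 × 8192 layer shape and the rank of 16 are hypothetical values picked for illustration.

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA trains A (rank x d_in) and B (d_out x rank) in place of the full matrix."""
    return rank * (d_in + d_out)


d_in = d_out = 8192  # hypothetical projection layer size
rank = 16            # hypothetical LoRA rank

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  LoRA: {lora:,}  share: {lora / full:.2%}")
```

For this layer, LoRA trains 262,144 parameters instead of 67,108,864, which is under half a percent. The frozen base weights still have to fit in GPU memory, which is why large models can remain demanding even with LoRA.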
Large-Scale Inference
Putting trained models into production requires an infrastructure profile that is very different from training but equally critical: low P99 latency, high concurrent user capacity, and cost-effective GPU utilization. Production inference stacks:
- vLLM — high-throughput LLM inference with PagedAttention
- Triton Inference Server — NVIDIA’s multi-model serving framework
- TensorRT-LLM — inference library optimized for H100/A100
- Ray Serve — distributed inference scaling
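GPU memory often limits inference concurrency because every active request holds a key-value (KV) cache, which is the memory that PagedAttention in vLLM manages in blocks. The sketch below estimates KV cache size. The layer count, head count, and head dimension describe a hypothetical 7B-class model with 16-bit cache entries, not a specific product.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_elem=2):
    """KV cache size: one key and one value vector per layer, head, and token."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical 7B-class shape: 32 layers, 32 KV heads, head dimension 128.
per_token = kv_cache_bytes(32, 32, 128, seq_len=1, batch_size=1)
total = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=16)
print(f"per token: {per_token / 2**20:.2f} MiB, "
      f"16 requests x 4096 tokens: {total / 2**30:.0f} GiB")
```

With these assumed dimensions, 16 concurrent requests at 4096 tokens each need about 32 GiB for the KV cache alone, on top of the model weights.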
MLOps and Experiment Management
Managing the model development cycle is as important as computation. Common tools for experiment tracking, model registry, data pipeline orchestration, and continuous training:
- MLflow / Weights & Biases (W&B) — experiment tracking and model management
- Kubeflow Pipelines / Argo Workflows — ML workflow orchestration
- DVC (Data Version Control) — data and model versioning
- Apache Airflow — scheduling and dependency management
Mevasis GPU Cluster Architecture
Mevasis offers two core configurations for AI workloads:
Configuration A — LLM Training Cluster
compute_node:
  gpu: NVIDIA H100 SXM5 80GB
  gpu_per_node: 8
  cpu: AMD EPYC 9454 (48 cores)
  system_memory: 1.5 TB DDR5 ECC
  local_storage: 8x 3.84 TB NVMe (RAID 0)
networks:
  gpu_interconnect: InfiniBand NDR 400 Gb/s (NVLink included)
  management: 25 GbE (out-of-band)
  storage: 100 GbE
shared_storage:
  type: WEKA / Lustre parallel file system
  capacity: 2 PB raw (1.2 PB net)
  bandwidth: 200 GB/s read, 100 GB/s write
software_stack:
  container: Docker + NVIDIA Container Toolkit
  orchestration: Kubernetes + GPU Operator
  mpi: OpenMPI 5.x
  cuda: 12.x
  monitoring: Prometheus + Grafana + DCGM
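One way to read the shared-storage figures is through checkpointing cost. The sketch below estimates how long a full training checkpoint would take to write at the listed 100 GB/s aggregate write bandwidth. The 70B model size and the 14 bytes per parameter (half-precision weights plus FP32 master weights and Adam moments) are assumptions, and the result is a best case that ignores serialization and file system overhead.

```python
def checkpoint_write_seconds(params, bytes_per_param, write_gb_per_s):
    """Best-case time to write a checkpoint at a given aggregate bandwidth."""
    checkpoint_bytes = params * bytes_per_param
    return checkpoint_bytes / (write_gb_per_s * 1e9)


# Hypothetical 70B-parameter model with an assumed 14 bytes per parameter:
# 2 (half-precision weights) + 4 (FP32 master weights) + 8 (Adam moments).
seconds = checkpoint_write_seconds(70e9, 14, 100)
print(f"Checkpoint size: {70e9 * 14 / 1e9:.0f} GB, "
      f"best-case write time: {seconds:.1f} s")
```

Under these assumptions a checkpoint of about 980 GB takes around 10 seconds to write at peak bandwidth, which makes frequent checkpointing practical during long runs.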
Configuration B — Inference and Fine-Tuning Servers
server:
  gpu: NVIDIA A100 80GB or H100 PCIe
  gpu_count: 4 or 8
  cpu: Intel Xeon Scalable 4th Gen
  memory: 512 GB - 1 TB DDR5
  network: 100 GbE
  use: Fine-tuning, small model training, production inference
Typical Workload Comparison
| Model Size | Task | Minimum GPU | Recommended Configuration | Duration |
|---|---|---|---|---|
| 7B parameters | Full fine-tuning | 2x A100 80GB | 4x A100 | 6–12 hours |
| 7B parameters | QLoRA fine-tuning | 1x A100 40GB | 1x A100 80GB | 2–4 hours |
| 70B parameters | Full fine-tuning | 8x A100 80GB | 8x H100 80GB | 2–5 days |
| 70B parameters | QLoRA fine-tuning | 4x A100 80GB | 4x H100 | 12–24 hours |
| 405B parameters | Pre-training | 64x H100 | 128x H100 | Weeks |
| Any | Production inference | 1x A100 | 2–4x H100 | Continuous |
Data Sovereignty and KVKK Compliance
Datasets used for training AI models often contain personal data: customer conversations, health records, legal documents, or financial transaction histories. When this data leaves Turkey’s borders, serious obligations arise under the Personal Data Protection Law (KVKK).
Mevasis infrastructure is deployed in facilities located in Turkey. Your data is not transferred abroad and does not enter the systems of third-party cloud providers. Organizations can run all model training, inference, and data storage processes within Turkey, in compliance with KVKK.
Additionally, TL-based pricing that eliminates currency fluctuation risk, a domestic supply chain, and Turkish-language technical support differentiate Mevasis from global cloud alternatives.
What Our Team Does for You
We don’t just rent servers; we provide architecture design and installation support tailored to your workload:
- Cluster sizing: Calculating GPU count, memory, and network bandwidth based on model architecture, dataset size, and target training duration
- Software installation: Installation and configuration of PyTorch, CUDA, NCCL, MPI, Kubernetes GPU Operator, and monitoring tools
- Benchmarking: Performance tests with your actual workload and optimization recommendations
- MLOps integration: Integration of W&B, MLflow, or your preferred experiment tracking tool with the cluster
- Ongoing support: Resource planning, queue management, and performance monitoring
Related Mevasis Services
- GPU Server Rental: H100 and A100 GPU servers on an hourly or monthly basis
- Managed HPC Cluster: Fully managed multi-node, InfiniBand-connected clusters
- Custom Infrastructure Consulting: Architecture design, capacity planning, and cost optimization
Let’s work together to determine the right GPU infrastructure for your AI project. Share your model size, dataset, and timeline, and we’ll prepare a custom configuration and price quote for you.