Artificial Intelligence and Machine Learning HPC

Why Does Training AI Models Require High-Performance Computing?

Developing large language models (LLMs), image classification networks, or recommendation systems quickly exceeds the limits of a single GPU or a standard cloud virtual machine. Pre-training a GPT-scale model means processing trillions of tokens through a network with billions of parameters, which typically keeps hundreds or even thousands of GPUs busy for weeks or months. Even in the fine-tuning and inference phases, GPU memory capacity and network bandwidth become the critical bottlenecks when the goal is low latency and high throughput.
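This scale can be checked with a widely used rule of thumb: training compute is roughly C ≈ 6·N·D FLOPs for a model with N parameters trained on D tokens. The sketch below turns that estimate into wall-clock time. The H100 peak figure, the 40% model FLOPs utilization (MFU), and the 70B-parameter, 2-trillion-token, 512-GPU run are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope pre-training time using the C ~= 6 * N * D rule of thumb
# (about 6 FLOPs per parameter per training token for the forward + backward pass).
# Peak throughput and utilization (MFU) are illustrative assumptions.

H100_BF16_DENSE_PEAK = 989e12  # FLOP/s, approximate dense BF16 peak of an H100 SXM


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens


def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  peak_flops: float = H100_BF16_DENSE_PEAK,
                  mfu: float = 0.40) -> float:
    """Wall-clock days, assuming every GPU sustains `mfu` of its peak throughput."""
    sustained = n_gpus * peak_flops * mfu  # cluster-wide FLOP/s actually achieved
    seconds = training_flops(n_params, n_tokens) / sustained
    return seconds / 86_400


if __name__ == "__main__":
    # Hypothetical run: 70B parameters, 2 trillion tokens, 512 H100 GPUs.
    print(f"{training_flops(70e9, 2e12):.2e} FLOPs")
    print(f"{training_days(70e9, 2e12, n_gpus=512):.0f} days")
```

Under these assumptions the run needs about 8.4 × 10^23 FLOPs and roughly 48 days, or about seven weeks. Halving the GPU count or doubling the token budget doubles the time.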

For AI companies and research groups operating in Turkey, meeting this computing need domestically carries additional significance: data sovereignty, KVKK compliance, and a predictable cost structure free of the exchange-rate risk that comes with foreign cloud providers.

Artificial Intelligence and ML Workloads

LLM Pre-Training

Training models from scratch is one of the most demanding HPC workloads. In data-parallel training of Transformer-based models, every optimizer step requires synchronizing the gradients of every layer across all participating GPUs, typically with an all-reduce. Tools used in this process:

PyTorch FSDP / DeepSpeed ZeRO — shards parameters, gradients, and optimizer states across GPUs
Megatron-LM — NVIDIA framework combining tensor and pipeline parallelism
NCCL (NVIDIA Collective Communications Library) — optimized collective operations (all-reduce, all-gather, reduce-scatter) between GPUs
Hugging Face Accelerate — abstraction layer for multi-GPU and multi-node training
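The following example makes the sharding idea behind FSDP and ZeRO concrete. It gives a rough per-GPU memory estimate for model states under each ZeRO stage, using the mixed-precision Adam accounting from the ZeRO paper (16 bytes per parameter in total). It is a sketch: activations and framework overhead are excluded, and the 7.5B-parameter, 64-GPU example is hypothetical.

```python
# Per-GPU memory for model states under DeepSpeed ZeRO, following the ZeRO paper's
# accounting for mixed-precision Adam: 2 bytes/param for FP16/BF16 weights, 2 for
# gradients, and 12 for optimizer states (FP32 master weights, momentum, variance).
# Activations, buffers, and fragmentation are ignored, so real usage is higher.

GB = 1e9


def zero_model_state_gb(n_params: float, n_gpus: int, stage: int) -> float:
    """Approximate per-GPU memory (GB) for weights, gradients, and optimizer state."""
    p, g, o = 2 * n_params, 2 * n_params, 12 * n_params  # bytes
    if stage == 0:    # plain data parallelism: everything replicated on every GPU
        total = p + g + o
    elif stage == 1:  # optimizer states sharded
        total = p + g + o / n_gpus
    elif stage == 2:  # optimizer states + gradients sharded
        total = p + (g + o) / n_gpus
    elif stage == 3:  # parameters sharded too (roughly analogous to FSDP FULL_SHARD)
        total = (p + g + o) / n_gpus
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return total / GB


if __name__ == "__main__":
    for stage in range(4):
        print(f"7.5B params on 64 GPUs, ZeRO-{stage}: "
              f"{zero_model_state_gb(7.5e9, 64, stage):.1f} GB/GPU")
```

For that example the estimate drops from 120 GB per GPU with plain data parallelism to about 31.4, 16.6, and 1.9 GB for stages 1, 2, and 3. This is why sharding makes it possible to train models whose full training state is far larger than a single GPU's memory.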

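For multi-node training, a high-bandwidth, low-latency fabric such as InfiniBand HDR (200 Gb/s) or NDR (400 Gb/s) with GPUDirect RDMA is strongly recommended. Standard TCP/IP Ethernet without RDMA generally cannot meet the latency and bandwidth budget of per-step gradient synchronization.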
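The interconnect requirement follows from simple arithmetic. In a ring all-reduce, one of the algorithms NCCL uses, each GPU transfers about 2(N-1)/N times the gradient buffer per synchronization. The sketch below gives a bandwidth-only lower bound. The 7B-parameter model with BF16 gradients, the 64 GPUs, and the assumption that each GPU's traffic crosses one network link of the stated speed are simplifications for illustration.

```python
# Bandwidth-only lower bound for one ring all-reduce: each GPU sends (and receives)
# 2 * (N - 1) / N times the buffer size. Latency, protocol overhead, and
# compute/communication overlap are ignored; link speeds are nominal.

def ring_allreduce_seconds(buffer_bytes: float, n_gpus: int, link_gbit_s: float) -> float:
    """Minimum time to all-reduce `buffer_bytes` across `n_gpus` over links of `link_gbit_s`."""
    bytes_on_wire = 2 * (n_gpus - 1) / n_gpus * buffer_bytes
    link_bytes_s = link_gbit_s * 1e9 / 8  # Gb/s -> bytes/s
    return bytes_on_wire / link_bytes_s


if __name__ == "__main__":
    grads = 7e9 * 2  # hypothetical 7B-parameter model with BF16 gradients (14 GB)
    for name, gbps in [("InfiniBand NDR 400 Gb/s", 400), ("25 GbE", 25)]:
        t = ring_allreduce_seconds(grads, n_gpus=64, link_gbit_s=gbps)
        print(f"{name}: {t:.2f} s per gradient sync")
```

Under these assumptions a single gradient sync takes about 0.55 s at 400 Gb/s versus about 8.8 s over 25 GbE, and it recurs every optimizer step. In practice NCCL overlaps communication with computation and uses NVLink inside each node, so real numbers differ.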

Fine-Tuning and RLHF

Adapting an existing model to a specific domain or task requires far less computation than full pre-training, but it still needs dedicated GPU infrastructure. Parameter-efficient methods such as LoRA and QLoRA reduce memory requirements, yet large models (70B+) usually still call for multiple GPUs to reach practical throughput. Reinforcement Learning from Human Feedback (RLHF) is especially memory-hungry: PPO-based RLHF keeps the actor (policy), critic (value), reward, and reference models in memory at the same time. Tools used:

TRL (Transformer Reinforcement Learning) — Hugging Face’s RLHF/PPO toolkit
Axolotl — open-source framework standardizing fine-tuning workflows
LLaMA-Factory — multi-model fine-tuning platform
vLLM / SGLang — fast rollout generation inside the RLHF loop
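The memory savings behind LoRA and QLoRA come from two simple formulas. LoRA trains a rank-r update for a d x k weight, adding r(d + k) parameters, and QLoRA stores the frozen base weights in 4-bit NF4. The sketch below evaluates both. The 8192 x 8192 projection, rank 16, and 70B model are hypothetical inputs, and the 0.5 bytes/param NF4 figure ignores quantization-constant overhead.

```python
# LoRA freezes a d x k weight matrix W and trains a low-rank update B @ A, with
# B of shape (d, r) and A of shape (r, k), so it adds r * (d + k) trainable parameters.
# QLoRA additionally stores the frozen base weights in 4-bit NF4 (~0.5 bytes/param).

def lora_trainable_params(d: int, k: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d x k projection."""
    return rank * (d + k)


BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "nf4": 0.5}


def frozen_weights_gb(n_params: float, dtype: str) -> float:
    """Memory for the frozen base weights only (no activations, adapter states, or KV cache)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Hypothetical 8192 x 8192 attention projection with LoRA rank 16.
    added = lora_trainable_params(8192, 8192, rank=16)
    print(f"LoRA adds {added:,} params ({added / (8192 * 8192):.2%} of the matrix)")
    for dtype in BYTES_PER_PARAM:
        print(f"70B base weights in {dtype}: {frozen_weights_gb(70e9, dtype):.0f} GB")
```

The rank-16 update adds 262,144 trainable parameters, about 0.39% of the 67M-entry matrix. Meanwhile the frozen 70B base model still occupies roughly 140 GB in BF16 versus about 35 GB in NF4, before activations and the adapters' optimizer state are counted.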

Large-Scale Inference

Serving trained models in production calls for an infrastructure profile that differs sharply from training but is just as critical: low P99 (tail) latency, high concurrent-user capacity, and cost-effective GPU utilization. Production inference stacks:

vLLM — high-throughput LLM inference with PagedAttention
Triton Inference Server — NVIDIA’s multi-model serving framework
TensorRT-LLM — NVIDIA library for optimized LLM inference on GPUs such as H100 and A100
Ray Serve — distributed inference scaling
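Much of the serving challenge is KV-cache memory, which grows with context length and the number of concurrent requests. This is the memory that vLLM's PagedAttention manages in fixed-size blocks. The sizing sketch below assumes a Llama-2-7B-like shape (32 layers, 32 KV heads, head dimension 128) and an FP16 cache.

```python
# KV-cache size per token: 2 tensors (K and V) x layers x KV heads x head dim x bytes.
# This memory grows linearly with context length and with concurrent requests.

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cached for a single token across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """Total KV cache (GiB) for `batch` sequences of `context_len` tokens each."""
    per_token = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return per_token * context_len * batch / 2**30


if __name__ == "__main__":
    # Llama-2-7B-like shape: 32 layers, 32 KV heads, head_dim 128, FP16 cache.
    print(kv_cache_bytes_per_token(32, 32, 128))                   # 524288
    print(kv_cache_gib(32, 32, 128, context_len=4096, batch=32))   # 64.0
```

That is 0.5 MiB per token, so 32 concurrent 4,096-token requests need 64 GiB of KV cache on top of the model weights. Models with grouped-query attention, which use fewer KV heads, shrink this proportionally.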

MLOps and Experiment Management

Managing the model development lifecycle matters as much as raw compute. Common tools for experiment tracking, model registry, data pipeline orchestration, and continuous training:

MLflow / Weights & Biases (W&B) — experiment tracking and model management
Kubeflow Pipelines / Argo Workflows — ML workflow orchestration
DVC (Data Version Control) — data and model versioning
Apache Airflow — scheduling and dependency management
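Data-versioning tools such as DVC identify data by content hash rather than by file name, so a model can be traced back to the exact dataset it was trained on. The sketch below illustrates that idea with Python's standard hashlib. It is not DVC's implementation or file format, and `file_md5` and `dataset_fingerprint` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's contents, read in chunks so large shards don't fill RAM."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dataset_fingerprint(root: Path) -> str:
    """Hash of sorted (relative path, content hash) pairs; changes if any file changes."""
    h = hashlib.md5()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            # Separators keep different (path, hash) pairs from colliding when concatenated.
            h.update(rel.encode() + b"\0" + file_md5(p).encode() + b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.jsonl").write_text('{"text": "hello"}\n')
        before = dataset_fingerprint(root)
        (root / "train.jsonl").write_text('{"text": "hello world"}\n')
        print(before != dataset_fingerprint(root))  # True: content changed
```

Logging the fingerprint as a run parameter in MLflow or W&B ties each experiment to the data snapshot that produced it.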

Mevasis GPU Cluster Architecture

Mevasis offers two core configurations for AI workloads:

Configuration A — LLM Training Cluster

compute_node:
  gpu: NVIDIA H100 SXM5 80GB
  gpu_per_node: 8
  cpu: AMD EPYC 9454 (48 cores)
  system_memory: 1.5 TB DDR5 ECC
  local_storage: 8x 3.84 TB NVMe (RAID 0)

networks:
  gpu_interconnect_intra_node: NVLink / NVSwitch (within each 8-GPU node)
  gpu_interconnect_inter_node: InfiniBand NDR 400 Gb/s
  management: 25 GbE (out-of-band)
  storage: 100 GbE

shared_storage:
  type: WEKA / Lustre parallel file system
  capacity: 2 PB raw (1.2 PB usable)
  bandwidth: 200 GB/s read, 100 GB/s write

software_stack:
  container: Docker + NVIDIA Container Toolkit
  orchestration: Kubernetes + GPU Operator
  mpi: Open MPI 5.x
  cuda: 12.x
  monitoring: Prometheus + Grafana + DCGM
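The shared-storage bandwidth figures matter most during checkpointing. At that point the full training state is written and, with synchronous checkpointing, the GPUs sit idle. The rough estimate below assumes about 14 bytes per parameter for a mixed-precision Adam checkpoint (BF16 weights plus FP32 master weights, momentum, and variance) and that the 100 GB/s aggregate write bandwidth is fully used. The 70B model is hypothetical.

```python
# Time to write one full training checkpoint to the shared file system, bandwidth-bound.
# Assumption (illustrative): ~14 bytes/param for a mixed-precision Adam checkpoint
# (BF16 weights 2 + FP32 master weights 4 + Adam momentum 4 + Adam variance 4).

def checkpoint_tb(n_params: float, bytes_per_param: float = 14.0) -> float:
    """Checkpoint size in TB."""
    return n_params * bytes_per_param / 1e12


def checkpoint_write_seconds(n_params: float, write_gb_s: float,
                             bytes_per_param: float = 14.0) -> float:
    """Ideal write time, assuming the file system's aggregate write bandwidth is fully used."""
    return n_params * bytes_per_param / (write_gb_s * 1e9)


if __name__ == "__main__":
    n = 70e9  # hypothetical 70B-parameter model
    print(f"checkpoint: {checkpoint_tb(n):.2f} TB")
    print(f"write at 100 GB/s: {checkpoint_write_seconds(n, 100):.1f} s")
```

That works out to about 0.98 TB per checkpoint and roughly 10 s per write at full bandwidth. Frequent checkpoints on slower storage quickly eat into GPU time.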

Configuration B — Inference and Fine-Tuning Servers

server:
  gpu: NVIDIA A100 80GB or H100 PCIe
  gpu_count: 4 or 8
  cpu: Intel Xeon Scalable 4th Gen
  memory: 512 GB - 1 TB DDR5
  network: 100 GbE
  use: Fine-tuning, small model training, production inference

Typical Workload Comparison

Model Size	Task	Minimum GPU	Recommended Configuration	Duration
7B parameters	Full fine-tuning	2x A100 80GB	4x A100	6–12 hours
7B parameters	QLoRA fine-tuning	1x A100 40GB	1x A100 80GB	2–4 hours
70B parameters	Full fine-tuning	8x A100 80GB	8x H100 80GB	2–5 days
70B parameters	QLoRA fine-tuning	4x A100 80GB	4x H100	12–24 hours
405B parameters	Pre-training	64x H100	128x H100	Months or longer, scaling with token count
Any	Production inference	1x A100	2–4x H100	Continuous

Data Sovereignty and KVKK Compliance

Datasets used to train AI models often contain personal data: customer conversations, health records, legal documents, or financial transaction histories. Transferring such data outside Turkey triggers specific obligations under the Personal Data Protection Law (KVKK, Law No. 6698).

Mevasis infrastructure is hosted in data centers located in Turkey. Your data is not transferred abroad and does not pass through third-party cloud providers’ systems, so organizations can keep model training, inference, and data storage entirely within Turkey, which simplifies KVKK compliance.

Other factors that set Mevasis apart from global cloud alternatives include TL-denominated pricing that removes exchange-rate risk, a domestic supply chain, and Turkish-language technical support.

What Our Team Does for You

We don’t just rent servers; we provide architecture design and installation support tailored to your workload:

Cluster sizing: Calculating GPU count, memory, and network bandwidth based on model architecture, dataset size, and target training duration
Software installation: Installation and configuration of PyTorch, CUDA, NCCL, MPI, Kubernetes GPU Operator, and monitoring tools
Benchmarking: Performance tests with your actual workload and optimization recommendations
MLOps integration: Integration of W&B, MLflow, or your preferred experiment tracking tool with the cluster
Ongoing support: Resource planning, queue management, and performance monitoring

GPU Server Rental — H100 and A100 GPU servers on an hourly or monthly basis
Managed HPC Cluster — Fully managed multi-node, InfiniBand-connected clusters
Custom Infrastructure Consulting — Architecture design, capacity planning, and cost optimization

Let’s determine the right GPU infrastructure for your AI project together. Share your model size, dataset, and timeline, and we’ll prepare a custom configuration and price quote for you.

Contact Our Technical Team →