
Why AI/ML Researchers Are Stuck with Inefficient GPU Setups (And How to Fix It)

If you're an AI/ML researcher, chances are you have a GPU permanently tethered to your development machine. Our user study confirms this is the norm—but is it the smartest approach?

The Inefficiency of Typical GPU Development Environments

Picture this: You're deep in thought about your machine learning model architecture, grabbing coffee, or writing code. Meanwhile, your expensive GPU sits idle, burning through your budget while contributing nothing to your research.

But the inefficiency goes deeper than idle time. With just one GPU, computational power is limited. You cannot run hyperparameter tuning in parallel or use distributed training to accelerate your jobs.

Why Existing Solutions Like Kubernetes Miss the Mark

You might wonder: "Aren't there plenty of software solutions that promise easier GPU workload scheduling?"

Yes. However, almost all existing GPU scheduling tools impose a critical bottleneck: slow iteration cycles. The feedback loop stretches from seconds to minutes, plus several hours to set up any new workload. Developers need to build containers that recreate their dev environment for every GPU workload, even for a simple test. This might be acceptable for deploying applications to production, but it is a productivity killer during the iterative research phase of model training.

The Version Hell Problem

Speed is not the only blocker. All container orchestration tools require developers to recreate their development environment. Even slight mismatches between local and containerized package versions can lead to subtle bugs that consume days of debugging.

At the other end of the spectrum, solutions like Slurm force teams into standardized environments that break the moment someone needs a different library version or a custom configuration.

The Learning Curve Challenge

Beyond technical limitations, many tools expect researchers to become infrastructure experts. Learning an entirely new way to manage packages, set up pipelines, and adapt workflows to fit specific tooling requirements takes weeks—time that could be spent on actual research. Many researchers simply give up and stick with their single GPU setup.

Your Development Machine Still Matters

Ultimately, none of these solutions eliminate the need for your development machine. You still need somewhere to build container images, connect your IDE, and write code. Unless all remaining development can happen on your laptop, adopting these solutions just adds one more infrastructure component to maintain. At that point, removing the GPU from your development machine hardly adds any value.

Introducing Velda: Local Experience, Unlimited Scale

We built Velda on one principle: requesting compute resources should feel identical to running locally.

Velda provides development environments that scale. Imagine a development machine with effectively unlimited resources, where you pay only for what you actually use. Velda makes this a reality.

vrun Is All You Need

Prefix any command with vrun to run it with additional compute power. For example, to run your training job on one A100 GPU:

vrun -P a100-1 ./training.py

With this single command, you get:

  • Instant scaling: Our optimized scheduler can start your workload within a second when resources are available; when a new instance has to be provisioned from your cloud provider, it typically takes less than a minute.

  • Consistent environment: Your entire development setup (storage, secrets, configurations, environment variables) is mirrored and securely isolated from others. Skip complex, error-prone, and slow container setups!

  • Flexible workload: Training, model inference, data processing, compilation, or experimental scripts—it works with everything that can run in a regular container.

  • True local-like experience: All existing workflows remain unchanged. Continue using the tools and frameworks you're familiar with, just with more compute power.
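
Because vrun wraps ordinary shell commands, it also composes with plain shell scripting. As a minimal sketch (the -P flag, the a100-1 pool name, and training.py come from the example above; the --lr argument is hypothetical), a hyperparameter sweep can fan out one GPU job per learning rate:

# Launch one A100 job per learning rate, then wait for all of them to finish.
for lr in 0.01 0.001 0.0001; do
  vrun -P a100-1 ./training.py --lr "$lr" &
done
wait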

The Result: Better AI Research, Lower GPU Costs

Because GPUs can be accessed without friction, Velda's customers can effectively share them across their organization. This significantly improves GPU utilization while reducing infrastructure costs.

Developers also deliver model improvements faster, because they are never constrained by the fixed capacity of their development machine. The result is accelerated AI research cycles and improved team productivity.

Flexible Deployment Options

Velda can be deployed on an on-premises GPU cluster, an existing Kubernetes cluster, or directly with your cloud hyperscaler. Cluster admins define pools that hide the details of resource management across different providers, while leaving AI/ML researchers enough flexibility to make changes.
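
From a researcher's point of view, a pool is just the name passed to vrun's -P flag; which provider backs it is the admin's concern. A purely illustrative sketch, with hypothetical pool names and provider mappings:

vrun -P onprem-a100 ./training.py   # hypothetical pool backed by an on-premises cluster
vrun -P cloud-a100 ./training.py    # same command, hypothetical pool backed by a cloud provider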

Velda is available as open source, completely free to use, and as an enterprise edition for teams that need support and additional optimizations.

Subscribe to learn more about the developer journey with Velda!