Graphics Processing Unit (GPU) as a Service (GPUaaS) 

24 Jan, 2023
3 mins. read
Vysakh Nair
Cloud Architect

GPUs (graphics processing units) and CPUs (central processing units) are both processors, but they differ in the number of cores and in functionality. GPUs are more powerful than CPUs at executing highly parallel, compute-intensive tasks, and are therefore used in a wide range of modern technologies such as machine learning (ML), artificial intelligence (AI), high performance computing (HPC), gaming, video editing and content creation.

GPUs use parallel processing, meaning that separate portions of each task are handled by different processors. Each GPU also has its own random-access memory (RAM) to store the data it processes. There are two types of GPU: integrated and discrete. Integrated GPUs come embedded alongside the CPU, while discrete GPUs are mounted on a separate circuit board.
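
The idea of handing separate portions of one task to different processors can be sketched in a few lines of Python. This is only an illustration of the decomposition, assuming CPU threads as a stand-in for GPU cores; the function names `scale_chunk` and `parallel_scale` are hypothetical, and the sketch makes no claim about GPU-level performance.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, factor):
    """Apply the same operation to every element of one portion of the data."""
    return [value * factor for value in chunk]


def parallel_scale(values, factor, workers=4):
    """Split `values` into portions and process the portions concurrently."""
    # Ceiling division so that at most `workers` chunks are created.
    size = max(1, -(-len(values) // workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so order is preserved.
        results = list(pool.map(scale_chunk, chunks, [factor] * len(chunks)))
    # Stitch the processed portions back together.
    return [value for chunk in results for value in chunk]
```

Calling `parallel_scale([1, 2, 3, 4, 5], 2)` processes the list in separate chunks and returns the doubled values in their original order. A GPU applies the same pattern across thousands of hardware threads instead of a handful of workers.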

"Graphics card" and "GPU" are often used interchangeably, but they are different things. The graphics card is the hardware board, while the GPU is a chip that is part of the graphics card. The GPU performs the actual image and graphics processing, and the graphics card presents the images to the display unit.

KEY FEATURES

The GPU and its architecture have many features that support high performance computing requirements. Some of them are listed below.

  • Parallel Processing – A type of computation in which many calculations or processes are carried out simultaneously, with separate portions of each task handled by different processors.
  • Tensor Cores – GPUs with tensor cores perform matrix multiplication faster in the cores, which increases throughput and reduces latency.
  • High Memory Bandwidth – Because GPUs process data in parallel, they have higher memory bandwidth and can move large amounts of data to and from memory simultaneously, which in turn increases performance.
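
To make the tensor core bullet more concrete, the sketch below shows matrix multiplication broken into small tile-level multiply-accumulate steps, which is the kind of operation tensor cores execute in hardware. It is a plain-Python illustration under stated assumptions: the function name `matmul_tiled` and the tile size of 2 are chosen for readability and do not reflect any particular GPU.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) one small tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # One tile-level multiply-accumulate: C_tile += A_tile x B_tile.
                # On a GPU, independent tiles can be computed in parallel.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Each output tile depends only on its own row and column tiles of the inputs, which is why the work spreads well across many cores.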

USE CASES

  • Artificial Intelligence & Machine Learning – Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems, and machine learning is a subset of artificial intelligence. Machine learning is the ability of computer systems to learn and make decisions from observation and data. This requires enhanced mathematical computation capability, which GPUs can provide.
  • Analytics and Data Science – GPUs are well suited to analytics and data science programs that process large amounts of data from different sources and need faster execution times.
  • Video Editing, Content Creation and Gaming – A powerful GPU is a prerequisite for smooth video rendering when you work with 1080p and 4K editing, because the GPU speeds up image processing. GPUs are also good at both 2D and 3D graphics rendering, and with better graphics performance, games can be played at higher resolutions and faster frame rates.
  • Blockchain and Cryptocurrency Mining – Blockchain is a distributed ledger based on peer-to-peer networks. Blockchain algorithms rely on compute platforms that combine CPU and GPU technologies to make blockchain transactions faster and more secure. Currently, the best-known use case of blockchain is cryptocurrency. CPUs were once most commonly used in mining, but GPUs can increase mining efficiency by more than 500 times compared to a CPU, depending on the GPU model. Mining requires high efficiency in performing similar, repetitive computations, which makes GPUs well suited to it.
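
The "similar, repetitive computations" in mining can be illustrated with a toy proof-of-work loop. This is a simplified sketch, not the algorithm of any specific cryptocurrency: the function name `mine`, the string-based block data and the leading-zero difficulty rule are assumptions made for illustration.

```python
import hashlib


def mine(block_data, difficulty, max_nonce=1_000_000):
    """Try nonces until the SHA-256 digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    for nonce in range(max_nonce):
        # Every attempt is the same hash computation with a different nonce.
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    return None  # no valid nonce found within max_nonce attempts
```

Because every attempt is independent of the others, the nonce range can be split across many processors, which is the property that makes this workload a good fit for GPUs.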

PRODUCTS & LICENSING

Many customers and cloud providers working in high performance computing want to know whether GPUaaS can be delivered from the VMware hypervisor and cloud automation solutions, so that their own customers can take advantage of advanced GPU use cases. Below are the products in the VMware portfolio that can support GPUaaS requirements.

VMware vSphere – Customers can either deploy new GPU-enabled hardware with the VMware vSphere hypervisor installed, or upgrade existing VMware vSphere hypervisor hosts with supported GPU hardware, and then assign GPUs to virtual machines in NVIDIA Virtual GPU (vGPU) or Passthrough (DirectPath I/O) mode. If an ESXi host server has more than one physical GPU, a subset of those physical GPUs can be used with the NVIDIA vGPU setup while a separate subset is used as Passthrough (DirectPath I/O) GPUs.
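
When planning a host with several physical GPUs, it can help to write down which GPU goes into which mode. The sketch below is a hypothetical planning model only: `HostGpuPlan`, the mode names and the GPU identifiers are invented for illustration and are not part of any VMware or NVIDIA API. It simply enforces that each physical GPU belongs to at most one of the two subsets.

```python
from dataclasses import dataclass, field

MODES = ("vgpu", "passthrough")


@dataclass
class HostGpuPlan:
    """Hypothetical planning model for one ESXi host; not a VMware API."""
    physical_gpus: frozenset
    assignments: dict = field(default_factory=dict)  # gpu_id -> mode

    def assign(self, gpu_id, mode):
        """Place one physical GPU into exactly one mode."""
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        if gpu_id not in self.physical_gpus:
            raise ValueError(f"{gpu_id} is not installed in this host")
        if gpu_id in self.assignments:
            raise ValueError(f"{gpu_id} is already used as {self.assignments[gpu_id]}")
        self.assignments[gpu_id] = mode

    def subset(self, mode):
        """Return the set of GPUs planned for the given mode."""
        return {gpu for gpu, m in self.assignments.items() if m == mode}
```

For example, a host with three GPUs could plan `gpu0` for vGPU and `gpu1` for passthrough, leaving `gpu2` unassigned; trying to add `gpu0` to the passthrough subset as well raises an error.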

Figure 1 – Passthrough and vGPU Comparison

From a licensing perspective, PCI Passthrough can be used with any vSphere license, while NVIDIA vGPU requires a vSphere Enterprise Plus license.

VMware vSphere Bitfusion – vSphere Bitfusion has a client-server architecture. The vSphere Bitfusion server software shares GPUs that are presented in Passthrough (DirectPath I/O) mode by the vSphere ESXi host on which the Bitfusion server is running. Client virtual machines (VMs) or Kubernetes clusters running artificial intelligence (AI), machine learning (ML) and video rendering applications access these remote GPU resources from the Bitfusion server over a high-bandwidth network using the Bitfusion client.

vSphere diagram illustrating how vSphere Bitfusion expands GPU virtualization.

Below are the licensing requirements for VMware vSphere Bitfusion deployment.

  • vSphere Enterprise Plus license plus the Bitfusion add-on
  • One or more enterprise-grade, supported GPUs per server licensed for VMware Bitfusion
  • Each Bitfusion add-on per-CPU license entitles the customer to up to 2 GPUs
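
The per-CPU entitlement above lends itself to a quick sizing calculation. The sketch below rests on an assumption that is not stated in the licensing terms quoted here: that one add-on license is needed per CPU, and that additional licenses are bought when a server has more than 2 GPUs per licensed CPU. The function name `bitfusion_addon_licenses` is hypothetical; confirm the actual rules with VMware before ordering.

```python
import math


def bitfusion_addon_licenses(cpu_sockets, gpus, gpus_per_license=2):
    """Estimate add-on licenses for one server (assumed rule, see lead-in)."""
    if cpu_sockets < 1 or gpus < 0:
        raise ValueError("cpu_sockets must be >= 1 and gpus must be >= 0")
    # Assumption: at least one license per CPU, and enough licenses
    # to cover every GPU at `gpus_per_license` GPUs per license.
    return max(cpu_sockets, math.ceil(gpus / gpus_per_license))
```

Under this assumption, a two-socket server with 4 GPUs needs 2 add-on licenses, while the same server with 5 GPUs needs 3.
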

VMware Aria Automation – In VMware Aria Automation, the blueprint architect can build blueprints that define how a virtual machine or container is deployed and how software is installed, configured, started, updated and uninstalled within it. With GPU-enabled vSphere hosts, you can create a virtual machine template with the GPU profile, driver and all tools installed and the vGPU configured. This template can then be used with VMware Aria Automation TensorFlow actions in a blueprint that end users consume through a self-service provisioning portal.

From a licensing perspective, PCI Passthrough can be used with any vSphere license, and a VMware Aria Suite Advanced or Enterprise license is required for VMware Aria Automation.

VMware Cloud Director – VMware Cloud Providers can leverage vSphere support for NVIDIA vGPU to offer GPUaaS to tenants through resource pooling, resource sizing profiles and placement policies. Cloud Providers can monitor NVIDIA vGPU allocation and usage per VDC and per VM, both to optimize utilization and for metering and billing, through the vCloud API and the UI dashboard. Cloud Providers can offer vApp Templates pre-configured with all the necessary placement policies, with GPU profiles assigned and with the VM and guest OS enabled for GPU. Tenants can use these vGPU profiles in virtual machines or containers. Tenants can also use AI/ML applications from VMware Marketplace offerings, delivered for free via App Launchpad, such as TensorFlow, MXNet, Dkube, Cognitive Assistance, and Dask Parallel Computing.
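
Metering vGPU usage per VDC comes down to aggregating usage records. The sketch below assumes the records have already been exported into plain dictionaries; the field names `vdc`, `vm`, `vgpu_count` and `hours`, and the function `vgpu_hours_per_vdc`, are hypothetical and do not represent the vCloud API response format.

```python
from collections import defaultdict


def vgpu_hours_per_vdc(records):
    """Sum vGPU-hours per VDC from per-VM usage records (hypothetical schema)."""
    totals = defaultdict(float)
    for record in records:
        # vGPU-hours for one VM = number of vGPUs x hours they were allocated.
        totals[record["vdc"]] += record["vgpu_count"] * record["hours"]
    return dict(totals)


usage = [
    {"vdc": "tenant-a", "vm": "train-01", "vgpu_count": 2, "hours": 10.0},
    {"vdc": "tenant-a", "vm": "infer-01", "vgpu_count": 1, "hours": 24.0},
    {"vdc": "tenant-b", "vm": "render-01", "vgpu_count": 4, "hours": 3.5},
]
totals = vgpu_hours_per_vdc(usage)
```

With these sample records, `totals` holds 44.0 vGPU-hours for `tenant-a` and 14.0 for `tenant-b`, figures a provider could feed into a billing rate of its own choosing.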

Below are the licensing requirements for VMware Cloud Director with NVIDIA vGPU deployment.

  • Partners must purchase NVIDIA AI Enterprise and VMware-compliant GPU hardware
  • Cloud Providers must be part of the VMware Cloud Provider Program to obtain the required licenses under the pay-as-you-go (PAYG) model

VMware Horizon – VMware Horizon provides virtual desktops from a supported hypervisor environment that can be accessed from a remotely connected device. GPUaaS can be enabled on these desktops by using the graphics acceleration options below.

From a licensing perspective, PCI Passthrough can be used with any vSphere license, while NVIDIA vGPU requires a vSphere Enterprise Plus license together with VMware Horizon Standard or higher.

HOW HUCO HELPS OUR CUSTOMERS TO ENABLE THIS CAPABILITY 

On this journey, huco helps customers understand their requirements and identify the compute nodes and GPU models that can fulfill them. huco also designs and deploys the services that enable GPUaaS capabilities from the on-premises datacenter.

Identify the supported hardware → huco can help customers identify the VMware-supported server and GPU hardware that fulfills their business and technical requirements.

Design and Deploy → Design and deploy, or upgrade, an environment that supports the GPUaaS use cases for the customer’s workloads.

Knowledge Transfer and Documentation → Document the design and deployment details in High-Level and Low-Level Design documents and provide knowledge transfer to the customer’s technical team.

Support and Managed Services → With huco's iDOC (Remote Intelligent Digital Operation Center) offering, we provide Day 2 operations and adoption support.

huco is a leading cloud native partner in the METNA region and the first partner in EMEA to achieve all 7 VMware Master Services Competencies (MSC). As a leading VMware MSC partner, huco has gained extensive knowledge, skills and experience in implementing VMware products. huco works closely with the VMware product team to help customers meet their requirements.

For more information on how huco has helped customers enable GPUaaS, please reach out to info@huco.co with your inquiry. Our VMware experts are eager to help you on your journey towards accelerating your applications by virtualizing GPUs.
