How Nvidia’s Volta Architecture Is Accelerating AI, Ray Tracing and Computing into a New Generation

“AI is not defined by any one industry. It exists in fields of super-computing, healthcare, financial services, big data analytics, and gaming. It is the future of every industry and market because every enterprise needs intelligence, and the engine of AI is the NVIDIA GPU computing platform.”

“NVIDIA Volta is the new driving force behind artificial intelligence. Volta will fuel breakthroughs in every industry. Humanity’s moonshots like eradicating cancer, intelligent customer experiences, and self-driving vehicles are within reach of this next era of AI.” – NVIDIA

These are Nvidia’s own words about its Volta GPU architecture.


Volta is the codename for a GPU microarchitecture developed by Nvidia, succeeding Pascal (found in the GTX 1080 Ti, GTX 1080 and GTX 1070 Ti). The architecture is named after Alessandro Volta, the inventor of the first electric battery, and is produced on TSMC’s 12 nm FinFET process. Volta’s flagship GPU, the Tesla V100, is equipped with 640 Tensor Cores. This gives it the ability to deliver over 125 teraFLOPS (TFLOPS) of performance dedicated to deep learning and artificial intelligence, more than a 5X increase over the Pascal microarchitecture of previous-generation GPUs. With over 21 billion transistors, Volta is claimed by Nvidia to be the most advanced and powerful GPU architecture in the world. It pairs NVIDIA® CUDA® cores and Tensor Cores to deliver the performance of an AI supercomputer in a single GPU.
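The 125 TFLOPS figure can be roughly reproduced from the Tensor Core count. The sketch below is a back-of-the-envelope check, not an official formula. Two of its inputs are assumptions taken from Nvidia's published Tesla V100 specifications rather than from this article: each Tensor Core performs 64 fused multiply-adds per clock, and the GPU boosts to about 1.53 GHz.

```python
# Back-of-the-envelope check of the 125 TFLOPS Tensor Core figure.
TENSOR_CORES = 640           # from the article
FMA_PER_CORE_PER_CLOCK = 64  # assumption: one 4x4x4 matrix multiply-accumulate per clock
FLOPS_PER_FMA = 2            # a fused multiply-add counts as one multiply plus one add
BOOST_CLOCK_HZ = 1.53e9      # assumption: Tesla V100 boost clock of about 1.53 GHz


def peak_tensor_tflops(cores, fma_per_clock, clock_hz):
    """Peak throughput in teraFLOPS (1e12 floating-point operations per second)."""
    return cores * fma_per_clock * FLOPS_PER_FMA * clock_hz / 1e12


volta_tflops = peak_tensor_tflops(TENSOR_CORES, FMA_PER_CORE_PER_CLOCK, BOOST_CLOCK_HZ)
print(f"Peak Tensor Core throughput: {volta_tflops:.1f} TFLOPS")  # prints 125.3 TFLOPS
```

This is a theoretical peak: it assumes every Tensor Core is busy on every clock cycle, which real workloads rarely achieve.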

Designed specifically for deep learning, the first-generation Tensor Cores in Volta deliver groundbreaking performance with mixed-precision matrix multiply in FP16 and FP32—up to 12X higher peak teraflops (TFLOPS) for training and 6X higher peak TFLOPS for inference over the prior-generation NVIDIA Pascal™. This key capability enables Volta to deliver 3X performance speedups in training and inference over Pascal.
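To see why the mix of FP16 and FP32 matters, here is a minimal sketch that emulates a matrix multiply with FP16 inputs and compares an FP32 accumulator against an FP16 one. It is plain Python running on the CPU, not Tensor Core code. The helper names (`to_fp16`, `to_fp32`, `matmul_fp16_inputs`) and the matrix sizes are made up for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_fp16_inputs(A, B, round_sum):
    """Multiply A (m x k) by B (k x n), rounding every input to FP16.

    `round_sum` rounds the running sum after each addition, which
    emulates the precision of the accumulator.
    """
    columns = list(zip(*B))
    result = []
    for row in A:
        out_row = []
        for col in columns:
            total = 0.0
            for a, b in zip(row, col):
                # The product of two FP16 values is exact in a Python float.
                total = round_sum(total + to_fp16(a) * to_fp16(b))
            out_row.append(total)
        result.append(out_row)
    return result


K = 4096                        # illustrative inner dimension
A = [[1.0] * K]                 # 1 x K matrix of ones
B = [[1.0] for _ in range(K)]   # K x 1 matrix of ones

fp32_acc = matmul_fp16_inputs(A, B, to_fp32)[0][0]
fp16_acc = matmul_fp16_inputs(A, B, to_fp16)[0][0]
print("FP32 accumulation:", fp32_acc)  # the exact answer, 4096.0
print("FP16 accumulation:", fp16_acc)  # falls short of 4096
```

With an FP16 accumulator the sum stops growing once it reaches 2048, because neighbouring FP16 values at that magnitude are 2 apart and adding 1 rounds back down. Keeping the inputs in FP16 while accumulating in FP32 avoids this loss while still halving the storage and bandwidth needed for the inputs.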

The Nvidia GeForce RTX 2080 Ti, the flagship GPU in the Turing architecture lineup. RTX provides built-in support for ray-traced rendering.


Tensor Cores are dedicated processing units within Volta and Turing GPUs, designed to perform large numbers of mixed-precision matrix multiplications. Since artificial intelligence workloads involve gathering large amounts of data and running algorithms over it, dedicated Tensor Cores drastically improve the throughput and efficiency of AI processing. Nvidia’s latest generation of Tesla GPUs is powered by these Tensor Cores, a breakthrough for companies and machines that require fast AI performance. The Turing Tensor Cores provide a full range of precisions for inference on large data sets, with breakthrough performance in FP32, FP16, INT8 and INT4. FP stands for floating point and INT for integer; the number is the width of each value in bits.
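As a rough illustration of what lower-precision inference means, the sketch below computes a dot product after squeezing floating-point values into 8-bit integers. It assumes symmetric quantization with a single scale per vector, which is one common scheme and not something this article or Nvidia's quote specifies. The function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (symmetric scheme)."""
    largest = max(abs(v) for v in values)
    scale = largest / 127 if largest > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(a, b):
    """Approximate the dot product of a and b using INT8 values and an integer sum."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    accumulator = sum(x * y for x, y in zip(qa, qb))  # pure integer arithmetic
    return accumulator * scale_a * scale_b            # convert back to a float


weights = [0.5, -1.0, 0.25, 0.75]      # hypothetical layer weights
activations = [1.0, 2.0, -0.5, 0.0]    # hypothetical layer inputs

exact = sum(w * x for w, x in zip(weights, activations))
approx = int8_dot(weights, activations)
print("FP result:  ", exact)
print("INT8 result:", approx)
```

The integer result lands close to the floating-point one while each value occupies a quarter of the space of an FP32 number, which is the trade-off that makes INT8 and INT4 attractive for inference.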

A comparison of matrix operations on the new Turing Tensor Cores versus the older-generation Pascal architecture.

Nvidia’s Turing-based T4 GPU delivers breakthrough performance across FP32, FP16, INT8, INT4 and binary precisions, covering deep learning training as well as inference. With 130 teraOPS (TOPS) of INT8 and 260 TOPS of INT4, Nvidia claims the T4 has the world’s highest inference efficiency: up to 40X higher performance than CPUs at just 60 percent of the power consumption. The T4 draws only around 70 watts at peak.
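The efficiency claim is easier to judge with the arithmetic written out. The short sketch below uses only the numbers quoted above, and the variable names are illustrative.

```python
INT8_TOPS = 130        # from the article
INT4_TOPS = 260        # from the article
SPEEDUP_VS_CPU = 40    # "up to 40X higher performance compared to CPUs"
RELATIVE_POWER = 0.60  # "just 60 percent of the power consumption"

# Halving the width of each value doubles the quoted throughput.
int4_over_int8 = INT4_TOPS / INT8_TOPS

# Performance per watt improves by the speedup divided by the relative power.
perf_per_watt_gain = SPEEDUP_VS_CPU / RELATIVE_POWER

print(f"INT4 vs INT8 throughput: {int4_over_int8:.1f}x")            # prints 2.0x
print(f"Performance per watt vs CPU: {perf_per_watt_gain:.1f}x")    # prints 66.7x
```

In other words, taking both quoted figures at their best case, the claimed gain in performance per watt is closer to 67X than 40X.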