TSMC Chairman Mark Liu Forecasts a 1,000-Fold Increase in GPU Performance per Watt Within 15 Years, and Trillion-Transistor GPUs Within a Decade.

[New Wisdom editor's note] Over the past 25 years, semiconductor processes have kept pushing toward their physical limits, and that progress is what made ChatGPT possible. Today, Nvidia's most powerful GPU packs more than 208 billion transistors, and TSMC's chairman predicts that 1-trillion-transistor GPUs will arrive within the next ten years.

At the GTC 2024 conference, Jensen Huang unveiled the world's most powerful GPU, the Blackwell B200, which packs more than 208 billion transistors.


Compared with the previous-generation H100 (80 billion transistors), the B200 more than doubles the transistor count, delivers 5 times the AI training performance, and runs inference up to 30 times faster.

What, then, would scaling from hundreds of billions of transistors to one trillion mean for the AI industry?

Today, IEEE Spectrum published a feature written by TSMC's chairman and chief scientist: "How We'll Reach a 1 Trillion Transistor GPU."


The main purpose of the article is to make the AI community aware of how much breakthroughs in semiconductor technology have contributed to AI's progress.

From Deep Blue, which defeated the human world chess champion in 1997, to ChatGPT, which took the world by storm in 2023, AI has gone in 25 years from laboratory research project to an app on everyone's phone.

All of this is thanks to major breakthroughs at three levels: innovation in machine-learning algorithms, massive amounts of data, and advances in semiconductor processes.

TSMC predicts that within the next 10 years, the number of transistors integrated into a GPU will reach 1 trillion, and that GPU performance per watt will improve 1,000-fold over the next 15 years.

Semiconductor technology kept evolving, and ChatGPT was born

From software and algorithms down to architecture, circuit design, and device technology, improvements at every layer of the system have boosted AI performance. But underlying all of it is the steady improvement of basic transistor device technology:

Deep Blue, IBM's chess machine, ran on chips built with 0.6-micron and 0.35-micron processes.

The deep neural network that won the 2012 ImageNet competition, trained by Ilya Sutskever and his colleagues, ran on GPUs built with a 40-nanometer process.

In 2016, DeepMind's AlphaGo defeated Lee Sedol running on chips built with a 28-nanometer process.

The chips used to train ChatGPT were built on a 5-nanometer process, while the chips serving the latest version of ChatGPT for inference use a 4-nanometer process.

The progression of semiconductor process nodes from 1997 to the present has underpinned AI's rapid development today.

If the AI revolution is to continue at its current pace, it will need continued innovation and support from the semiconductor industry.

A careful look at AI's computing requirements shows that the computation and memory access needed for AI training have grown by several orders of magnitude over the past five years.

Take GPT-3 as an example: its training requires the equivalent of more than 5 billion billion operations per second sustained for an entire day (that is, about 5,000 petaflop-days), along with 3 TB (3 trillion bytes) of memory capacity.
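As a quick sanity check on those units (a back-of-envelope sketch, not figures from the article), converting a sustained compute rate into petaflop-days works like this:

```python
# Convert "5 billion billion ops/s for one day" into petaflop-days.
# Assumption: 1 petaflop-day = 1e15 ops/s sustained for 86,400 seconds.
RATE_OPS_PER_S = 5e18          # 5 billion billion operations per second
SECONDS_PER_DAY = 86_400

total_ops = RATE_OPS_PER_S * SECONDS_PER_DAY          # ~4.3e23 operations
petaflop_days = total_ops / (1e15 * SECONDS_PER_DAY)  # ~5,000
print(f"{total_ops:.1e} ops ~= {petaflop_days:,.0f} petaflop-days")
```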

As new generations of generative AI applications emerge, demands for computing power and memory access continue to increase rapidly.

This brings up a looming question: How can semiconductor technology keep up with this pace of development?

From integrated chips to integrated chiplets

Since the birth of the integrated circuit, the semiconductor industry has been shrinking transistors so that more of them can fit on a chip the size of a fingernail.

Today, transistor integration and packaging have moved to a new level: the industry is shifting from scaling in two dimensions to 3D system integration.

The industry is combining many chips into a single, highly interconnected system, a giant leap in semiconductor integration technology.

In the AI era, one bottleneck in chip manufacturing is that lithography tools can only pattern chips with an area of no more than about 800 square millimeters, the so-called lithography limit.

But now, TSMC can push past that limit by connecting multiple chips on a single piece of silicon with embedded interconnects, enabling large-scale integration not possible on a single chip.

For example, TSMC's CoWoS technology can package up to six reticle-limit-sized compute chips, along with a dozen high-bandwidth memory (HBM) stacks.

HBM itself exemplifies another key semiconductor technology the AI field increasingly relies on: integrating systems by stacking chips vertically, a technique TSMC calls system-on-integrated-chips (SoIC).

An HBM stack consists of multiple layers of DRAM chips stacked vertically on top of a control logic IC. Vertical through-silicon vias (TSVs) carry signals through each chip layer, and solder bumps connect the individual memory chips.

Currently, the most advanced GPUs rely heavily on HBM technology.
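To see why GPUs lean so heavily on HBM, here is a rough bandwidth calculation (the interface width, per-pin rate, and stack count below are illustrative HBM3-generation figures, not numbers from the article):

```python
# Aggregate HBM bandwidth: very wide interfaces, multiplied across stacks.
BITS_PER_STACK = 1024   # interface width of one HBM stack, in bits
GBITS_PER_PIN = 6.4     # per-pin data rate, Gb/s (HBM3-class assumption)
NUM_STACKS = 8          # stacks packaged beside the GPU (varies by product)

per_stack_gb_s = BITS_PER_STACK * GBITS_PER_PIN / 8   # ~819 GB/s per stack
total_tb_s = per_stack_gb_s * NUM_STACKS / 1000       # ~6.6 TB/s aggregate
print(f"{per_stack_gb_s:.0f} GB/s per stack, {total_tb_s:.1f} TB/s total")
```

That wide-and-stacked interface is exactly what vertical integration buys: bandwidth that no planar memory bus could match.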

In the future, 3D SoIC technology will provide a new solution that enables denser vertical connections between stacked chips than existing HBM technology allows.

Using the latest hybrid bonding technology, a 12-layer stack of chips has been demonstrated as a new HBM test structure. These copper-to-copper connections are denser than traditional solder-bump connections.

Paper: https://ieeexplore.ieee.org/document/9265044

Bonded at low temperature on top of a larger base logic chip, the memory system has an overall thickness of just 600 micrometers.

As high-performance computing systems composed of many chips run large AI models, high-speed wired communications may become the next bottleneck in computing speed.

Currently, data centers are already using optical interconnect technology to connect server racks.

Article: https://spectrum.ieee.org/optical-interconnects

In the near future, silicon-photonics-based optical interfaces will need to be packaged together with GPUs and CPUs.

Paper: https://ieeexplore.ieee.org/document/10195595

This will allow optical communication between GPUs, improving the energy and area efficiency of each unit of bandwidth and letting hundreds of servers operate like one giant GPU with unified memory.
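The energy argument is easy to see with a toy calculation (the bandwidth and pJ/bit figures below are ballpark assumptions for illustration, not from the article): at multi-terabyte-per-second link rates, energy per bit dominates interconnect power.

```python
# Interconnect power = bandwidth (bits/s) x energy per bit.
# Assumed figures: ~10 pJ/bit for long-reach electrical links,
# a few pJ/bit targeted by co-packaged optics.
BANDWIDTH_TB_S = 10                       # inter-GPU traffic, TB/s
bits_per_s = BANDWIDTH_TB_S * 1e12 * 8    # convert bytes/s to bits/s

for name, pj_per_bit in [("electrical", 10.0), ("optical", 2.0)]:
    watts = bits_per_s * pj_per_bit * 1e-12
    print(f"{name}: {watts:.0f} W just to move the data")
# electrical: 800 W, optical: 160 W -- why pJ/bit matters at this scale.
```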

Driven by AI applications, silicon photonics will therefore become one of the most critical technologies in the semiconductor industry.

Towards a trillion-transistor GPU

Current GPU chips used for AI training already have about 100 billion transistors, which puts them at the reticle limit of lithography machines. Increasing the transistor count further requires multiple chips, integrated through 2.5D or 3D technology, sharing the computing task.
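A rough calculation shows why a single die tops out near that figure and how many dies a trillion transistors implies (the density value is an assumed ballpark for a 5-nanometer-class logic process, not a TSMC number):

```python
# How many reticle-sized dies does 1e12 transistors take?
RETICLE_MM2 = 800            # approximate lithography field limit, mm^2
DENSITY_PER_MM2 = 1.3e8      # assumed ~130M transistors/mm^2 (5 nm-class)

per_die = RETICLE_MM2 * DENSITY_PER_MM2   # ~1.0e11 transistors per die
dies_needed = 1e12 / per_die              # ~10 dies for a trillion
print(f"{per_die:.1e} transistors/die -> ~{dies_needed:.0f} dies for 1e12")
```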

Advanced packaging technologies such as CoWoS and SoIC already make it possible to integrate more transistors into a GPU.

TSMC predicts that within the next ten years, a single GPU using multi-chip packaging technology will have more than 1 trillion transistors.

These chips also need to be connected through 3D stacking. Fortunately, the semiconductor industry has been able to shrink the pitch of vertical interconnects significantly, thereby increasing connection density.

Moreover, there is huge potential for increasing connection density further. TSMC believes an increase of an order of magnitude or more is entirely achievable.
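Connection density grows quadratically as pitch shrinks, which is where that order of magnitude comes from (the pitch values below are illustrative assumptions for microbump-class and hybrid-bonding-class interconnects):

```python
# On a square grid, vertical connections per mm^2 scale as 1 / pitch^2.
def density_per_mm2(pitch_um: float) -> float:
    return (1000 / pitch_um) ** 2

print(f"{density_per_mm2(36):,.0f} / mm^2")  # ~770: microbump-class 36 um pitch
print(f"{density_per_mm2(9):,.0f} / mm^2")   # ~12,300: hybrid-bonding-class 9 um
# Shrinking 36 um -> 9 um pitch yields a 16x (order-of-magnitude) density gain.
```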


▲ The density of vertical connections in 3D chips is growing at about the same rate as the number of transistors in GPUs

GPU Energy Efficiency Performance Trends

So, how do these leading hardware technologies improve the overall performance of the system?

Looking at the development of server GPUs, one trend is clear: so-called energy-efficient performance (EEP), a combined metric of a system's energy efficiency and speed, has been improving steadily.

Over the past 15 years, the semiconductor industry has managed to increase EEP roughly threefold every two years.
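Compounded over 15 years, that rate lines up with the 1,000-fold performance-per-watt forecast in the headline (a back-of-envelope check, assuming simple geometric growth):

```python
# If EEP multiplies by `rate` every 2 years, over `years` it grows rate ** (years/2).
def eep_growth(rate_per_2yr: float, years: float = 15.0) -> float:
    return rate_per_2yr ** (years / 2)

print(f"{eep_growth(2.5):,.0f}x")  # ~970x: ~2.5x per 2 years compounds to ~1,000x
print(f"{eep_growth(3.0):,.0f}x")  # ~3,788x: "3x every two years" rounds upward
```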

From TSMC's perspective, this growth trend will continue, driven by innovation on many fronts: new materials, advances in device and integration technology, breakthroughs in EUV lithography, circuit design optimization, system architecture innovation, and the joint optimization of all of these elements.

In addition, the concept of System Technology Co-Optimization (STCO) will become increasingly important.

In STCO, the GPU's different functional modules are split onto dedicated chiplets, each built with the process technology that best suits its performance and cost requirements.

This optimized selection of each component will play a key role in improving overall performance and reducing costs.
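To make the idea concrete, here is a toy sketch of STCO-style partitioning (the block names, node names, and performance/cost numbers are all hypothetical, purely for illustration): each functional block gets the cheapest process node that still meets its needs.

```python
# Toy STCO sketch: assign each GPU function block to the cheapest process
# node that still meets its performance target (all numbers hypothetical).
BLOCKS = {                 # block -> minimum relative performance needed
    "compute": 3.0,
    "sram_cache": 1.5,
    "io_serdes": 1.0,
}
NODES = {                  # node -> (relative performance, relative cost/mm^2)
    "N3": (3.0, 4.0),
    "N5": (2.0, 2.5),
    "N7": (1.0, 1.0),
}

def cheapest_adequate_node(required_perf: float) -> str:
    """Lowest-cost node whose performance meets the block's requirement."""
    candidates = [(cost, name) for name, (perf, cost) in NODES.items()
                  if perf >= required_perf]
    return min(candidates)[1]

for block, need in BLOCKS.items():
    print(f"{block} -> {cheapest_adequate_node(need)}")
# compute -> N3, sram_cache -> N5, io_serdes -> N7: each on its own chiplet.
```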


▲ Thanks to the advancement of semiconductor technology, the EEP indicator is expected to increase three times every two years

A revolutionary moment for 3D integrated circuits

In 1978, Professor Carver Mead of Caltech and Lynn Conway of Xerox PARC jointly developed a revolutionary computer-aided design method.

They formulated a series of design rules to simplify the chip design process, allowing engineers to easily design complex large-scale integrated circuits even if they are not well versed in process technology.

Paper: https://ai.eecs.umich.edu/people/conway/VLSI/VLSIText/PP-V2/V2.pdf

In the field of 3D chip design, we are also facing similar needs.

  • Designers must not only be proficient in chip and system architecture design, but also need to have knowledge of hardware and software optimization.

  • Manufacturers, on the other hand, need a deep understanding of chip technology, 3D integrated circuit technology and advanced packaging technology.

Just like in 1978, we need a common language so that electronic design tools can understand these technologies.

Today, a new hardware description language called 3Dblox has been adopted by most of today's technology and electronic design automation (EDA) companies.

It gives designers the ability to freely design 3D integrated circuit systems without worrying about the limitations of the underlying technology.

Out of the tunnel, into the future

Amid the AI wave, semiconductor technology has become a key force driving the development of AI models and applications.

The new generation of GPUs has broken down traditional size and shape constraints. The development of semiconductor technology is no longer limited to shrinking transistors on a two-dimensional plane.

A future AI system will integrate as many energy-efficient transistors as possible, use a system architecture optimized for its specific computing workload, and keep software and hardware optimized together.

For the past 50 years, semiconductor technology's progress has been like traveling through a well-lit tunnel: everyone knew what to do next, namely keep shrinking transistors.

Now, we have reached the end of this tunnel. Future semiconductor technology development will face more challenges, but at the same time, there are broader possibilities outside the tunnel.

And we will no longer be bound by the limitations of the past.
