The Newest Developments in Backbone Optical Communication Trends

In today’s article, Xiao Zaojun will talk to you about some of the latest technological trends in backbone network optical communications.

400G, it’s really here

You may have heard that, since last year, China's operators have been rolling out 400G commercially across their backbone networks.


First, 2023 saw a large number of commercial verifications, followed by the full launch of centralized procurement. 2024 is the official start of large-scale commercial use.

Not long ago, in March 2024, China Mobile opened the world's first 400G all-optical interprovincial (Beijing-Inner Mongolia) trunk line, which is regarded as an important landmark event.

The reason for upgrading the backbone network to 400G is obvious.

On the one hand, the growth in consumer Internet traffic brought about by residents’ digital life (high-definition video, remote conferencing, online live broadcasts, online games, etc.) continues.


On the other hand, the entire industry is promoting digital transformation, and the surge in traffic from industry digital systems has intensified the pressure on backbone networks.

There is another key reason for the sharp increase in pressure on the backbone network: the explosion of AI.

The rise of AIGC large models has set off a wave of AI. To meet the needs of AI workloads, a large number of intelligent computing centers must be built. Models are growing from hundreds of billions of parameters to trillions, and GPU computing clusters are moving from thousand-card clusters to 10,000-card or even 100,000-card clusters.

Xiao Zaojun introduced in a previous article that a GPU computing cluster is actually an array of massive GPU cards (GPU servers) connected through high-performance networks (such as InfiniBand, RoCEv2). It has extremely high requirements on network performance and reliability, which directly affects training efficiency and cost.

Just looking at the network port speed of the GPU server, it has started from 400G per port, and even needs to use 800G or higher.

(Figure: network ports on a GPU server)

Previously, GPU computing clusters fell into the category of DCN (Data Center Internal Network). Now, as the cluster scale continues to expand, distributed intelligent computing centers have begun to be considered for model training.

In other words, several intelligent computing centers in different places are used together for training.

This puts forward higher requirements for DCI (data center interconnection network), and the optical communication backbone network must be able to meet this demand in terms of technical performance.

China's computing power strategy adheres to the idea of “national coordination and overall layout.” In February 2022, China launched the “East Data, West Computing” project to create a nationally integrated computing power system.

To put it simply, on the one hand we need to build a large number of data centers (equivalent to power plants), and on the other hand we must build a strong backbone transmission network (equivalent to the power grid) to “circulate” this computing power and meet the needs of all walks of life.

400G, how is it done?

As the foundation of the entire digital society, today's optical communication backbone network must offer ultra-large bandwidth (400G, 800G, or even 1.6T in the future), ultra-low latency (multi-level latency circles), ultra-large-scale networking (serving distributed computing as well as the AI clusters just mentioned), ultra-high stability, reliability, and security, ultra-flexible deployment, and intelligent operation, maintenance, and control.

Today, we will focus on the most important of these: rate and bandwidth.

Optical communication technology being where it is today, raising the rate comes down to working on the following aspects:

First, there's the baud rate.

The transmission rate is the bit rate, which is the number of bits transmitted per unit time, and the unit is bit/s.

Bit rate = baud rate × number of bits corresponding to a single modulation state.

The baud rate is the number of symbols transmitted per unit time. The higher the baud rate, the more symbols are transmitted per second, so more information is carried and the rate goes up.
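The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a model of a real transceiver: the function names are hypothetical, and it assumes that a modulation with M distinct states carries log2(M) bits per symbol (4 states carry 2 bits, 16 states carry 4 bits).

```python
import math


def bits_per_symbol(num_states: int) -> int:
    """Bits carried by one symbol when the modulation has `num_states` states."""
    bits = math.log2(num_states)
    if not bits.is_integer():
        # Assumption of this sketch: the state count is a power of two.
        raise ValueError("num_states is assumed to be a power of two")
    return int(bits)


def bit_rate_gbps(baud_rate_gbaud: float, num_states: int) -> float:
    """Bit rate = baud rate x bits per symbol (Gbaud in, Gbit/s out)."""
    return baud_rate_gbaud * bits_per_symbol(num_states)


if __name__ == "__main__":
    # Same symbol rate, different number of modulation states.
    for states in (4, 16):
        print(states, "states:", bit_rate_gbps(64, states), "Gbit/s")
```

At the same baud rate, going from 4 states to 16 states doubles the bit rate, which is why the baud rate and the modulation method are the two levers discussed in this section.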

The baud rate is determined by the capabilities of the optical device. The more advanced the device chip process, the higher the baud rate and the higher the speed (bit rate).

Currently, the CMOS process has improved from 16nm to 7nm and 5nm, and the baud rate has gradually increased from 30+Gbaud to 64+Gbaud, 90+Gbaud, and 128+Gbaud.

Today's 400G can be commercialized because the baud rate can reach 128Gbaud.

Let’s look at the modulation method.

In the formula just now, the “number of bits corresponding to a single modulation state” is determined by the modulation method.

There are currently three main modulation schemes for 400G technology: 16QAM, 16QAM-PCS (PCS is a probabilistic shaping technology, which will be introduced next time) and QPSK, which are suitable for different application scenarios.

Optical communication is different from wireless communication, and we do not blindly pursue high-order modulation.

The lower the modulation order, the lower the line requirements and the lower the network construction cost. Therefore, in the early design stage of long-distance backbone networks, we basically focused on 16QAM and QPSK. Later, 16QAM-PCS joined the competition.

Before the “East Data, West Computing” project was proposed, operators believed that 400G would not require long-distance transmission. The mainstream opinion in the industry was therefore to pair low-baud-rate devices, which were more mature and cheaper, with the higher-order 16QAM modulation.

Later, transmission distance requirements grew from just over 1,000 kilometers to several thousand kilometers. At the same time, 128Gbaud devices matured quickly (the rapid rise of 800G in DCN scenarios stimulated and promoted the industry chain). Together, these created the conditions for QPSK to stand out.
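The trade-off between the two routes can be checked with rough arithmetic. The sketch below rearranges the bit rate formula to ask what baud rate each modulation needs for one 400G wave. Two inputs are assumptions that the article does not state: that the signal is carried on two polarizations (as is usual for coherent long-haul systems), and that error-correction and framing overhead adds about 25% to the line rate. The names are illustrative only.

```python
POLARIZATIONS = 2       # assumption: dual-polarization coherent signal
OVERHEAD_FACTOR = 1.25  # assumption: roughly 25% FEC/framing overhead

# Bits per symbol on one polarization for each modulation.
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}


def required_baud_gbaud(net_rate_gbps: float, modulation: str) -> float:
    """Symbol rate needed to carry `net_rate_gbps` with the given modulation."""
    line_rate_gbps = net_rate_gbps * OVERHEAD_FACTOR
    return line_rate_gbps / (BITS_PER_SYMBOL[modulation] * POLARIZATIONS)


if __name__ == "__main__":
    for name in BITS_PER_SYMBOL:
        print(f"{name}: about {required_baud_gbaud(400, name):.1f} Gbaud")
```

Under these assumptions, 16QAM fits in the 64Gbaud class while QPSK needs the 128Gbaud class. This matches the story above: QPSK only became practical for 400G once faster devices matured.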

First, QPSK tolerates nonlinearity better than 16QAM-PCS, so the fiber launch power can be raised appropriately. Second, QPSK's back-to-back OSNR threshold is better than that of 16QAM-PCS. Third, the channel spacing of QPSK is set to 150GHz, so there is almost no filtering penalty during transmission.

These advantages have gradually made QPSK the industry's unanimous first choice for backbone networks and DCI.

(Table: a rough comparison of the three options)

For now, the first two schemes (16QAM and 16QAM-PCS) are being considered mainly for metro and provincial trunk scenarios.

The third is the extended band.

Baud rate and modulation mainly affect the single wave rate. An optical fiber can have multiple waves, as long as the spectrum range is large enough.

Single wave bandwidth × single fiber wave number = single fiber bandwidth.

As written in the previous table, the channel spacing of QPSK 400G reaches 150GHz. Neither traditional C-band nor extended C-band is sufficient to meet the spectrum bandwidth requirements.

That is why the C6T+L6T approach is gradually being adopted, giving a total spectrum bandwidth of 12THz. Do the math: 80 waves at 400G each add up to 32T of capacity on a single fiber. If you give up some distance and serve intra-provincial traffic with 16QAM or 16QAM-PCS instead, the capacity can be even larger, reaching 48T.
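The capacity arithmetic can be sketched as follows. The 12THz spectrum, the 150GHz spacing, and the 400G per-wave rate come from the text. The 100GHz spacing used for the denser case is an assumed value, chosen because it reproduces the 48T figure. The function names are hypothetical.

```python
C6T_PLUS_L6T_GHZ = 12_000  # 12 THz total spectrum, from the text


def wave_count(spectrum_ghz: int, spacing_ghz: int) -> int:
    """Number of whole channels that fit in the available spectrum."""
    return spectrum_ghz // spacing_ghz


def fiber_capacity_tbps(spectrum_ghz: int, spacing_ghz: int, per_wave_gbps: int) -> float:
    """Single-fiber capacity = per-wave rate x number of waves."""
    return wave_count(spectrum_ghz, spacing_ghz) * per_wave_gbps / 1000


if __name__ == "__main__":
    # 150 GHz spacing (QPSK, from the text).
    print(wave_count(C6T_PLUS_L6T_GHZ, 150), "waves,",
          fiber_capacity_tbps(C6T_PLUS_L6T_GHZ, 150, 400), "Tbit/s")
    # 100 GHz spacing (assumed value for a denser, shorter-reach scheme).
    print(wave_count(C6T_PLUS_L6T_GHZ, 100), "waves,",
          fiber_capacity_tbps(C6T_PLUS_L6T_GHZ, 100, 400), "Tbit/s")
```

Narrower channel spacing fits more waves into the same spectrum, which is where the extra capacity comes from. The price is reach.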

For a detailed introduction to the bands, see the article “What are the wavelength bands of optical communication?”

The biggest problem with extending the band is whether the device can support it and whether the cost is controllable. The devices mentioned here include ITLA, CDM, ICR, EDFA and WSS, etc., which involve optical transceiver, optical path switching, amplification, etc.

If the band is expanded, there is another issue involved, and that is integration.

Today's band expansion is really more like simply bundling two systems (C and L) together. The two systems operate independently: their waves are combined for transmission, split again at the far end, and then processed separately.

If there are two systems, the volume will be larger, the power consumption will be higher, and the design will be more complex. Therefore, the industry needs to study how to integrate devices to truly enable a system to support different extended bands at the same time. That is to achieve true integration.

Besides optical modules and optical equipment, optical fiber communication also depends on the fiber itself.

The current mainstream optical fiber is G.652D. On G.652D, 400G QPSK can still reach 1,500km with the help of EDFA amplification.

After years of verification, the industry has determined that G.654E optical fiber is the new successor. If G.654E with better performance is used, under the same conditions, the transmission distance of 400G QPSK can be increased by more than 30%.

G.654E optical fiber has the capability of large-scale production and will be deployed on a large scale on long-distance trunk lines. Some low-loss optical fibers of the G.654 series have also become the first choice for ultra-long-distance transmission across oceans in submarine cable systems.

Beyond traditional optical fiber, the industry also believes that multi-core and hollow-core optical fibers have broad application prospects.

Multi-core optical fiber is a form of space division multiplexing. Packing more cores into one fiber, and using few-mode transmission, can greatly increase the capacity of the fiber.

Hollow-core optical fibers are even more awesome. They directly make the optical fiber hollow and replace the glass fiber core with air.

Hollow-core optical fiber has been proven to bring greater capacity, lower latency, smaller transmission loss, and ultra-low nonlinearity. It is unanimously considered by the industry to be one of the most promising technologies in optical communications.
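The latency advantage follows from simple physics: light travels more slowly in glass than in air. The sketch below estimates one-way propagation delay as distance × group index ÷ speed of light. The two group index values are approximate assumptions (real fibers vary), equipment delay is ignored, and the names are hypothetical.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum

# Assumed, approximate group indices; real fibers vary.
GROUP_INDEX_SOLID_CORE = 1.468    # silica glass core
GROUP_INDEX_HOLLOW_CORE = 1.0003  # air-filled core


def one_way_latency_ms(distance_km: float, group_index: float) -> float:
    """Propagation delay over `distance_km` in milliseconds (fiber only)."""
    return distance_km * group_index / SPEED_OF_LIGHT_KM_PER_S * 1000


if __name__ == "__main__":
    for label, index in (("solid core", GROUP_INDEX_SOLID_CORE),
                         ("hollow core", GROUP_INDEX_HOLLOW_CORE)):
        print(f"{label}: {one_way_latency_ms(1000, index):.2f} ms per 1000 km")
```

With these assumed values, a 1,000km link drops from roughly 4.9ms to roughly 3.3ms one way, a saving of about 1.5ms before any equipment delay is counted.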

After 400G, 800G or 1.6T?

After 400G is officially commercialized on a large scale, the entire industry will focus on the technical standard system beyond 400G.

The industry is still stepping up its debate on whether to pursue 800G, 1.2T or 1.6T next.

If you want to achieve a higher rate, you must continue to work on “modulation method + baud rate”. 130GBd, or higher 260GBd, is the inevitable direction. Higher baud rates mean that related devices must keep up and form a mature industrial chain.

Above 400G, you can no longer count on QPSK. 16QAM modulation is a generally accepted option in the industry.

The band also needs to be expanded further. On top of the extended C and L bands, the S-band, U-band, E-band, and others are being considered. With C+L+S, that is 12THz+5THz, for a total bandwidth of 17THz.

With all of these factors combined, a single fiber carrying more than 100Tbps in one direction is just around the corner.
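To get a feel for what 100Tbps per fiber would demand, one can compare spectral efficiency, that is, bits per second carried per hertz of spectrum. The sketch below is a rough back-of-the-envelope comparison. It takes the 400G-per-150GHz case and the 17THz C+L+S figure from the text, and treats “100Tbps across 17THz” as a hypothetical target rather than a specific system design.

```python
def spectral_efficiency(rate_gbps: float, bandwidth_ghz: float) -> float:
    """Bits per second carried per hertz of spectrum (bit/s/Hz)."""
    return rate_gbps / bandwidth_ghz


# Long-haul case from the text: 400G per wave on a 150 GHz grid.
today = spectral_efficiency(400, 150)

# Hypothetical target: 100 Tbit/s spread across 17 THz of C+L+S spectrum.
target = spectral_efficiency(100_000, 17_000)

if __name__ == "__main__":
    print(f"today:  {today:.2f} bit/s/Hz")
    print(f"target: {target:.2f} bit/s/Hz")
    print(f"ratio:  {target / today:.1f}x")
```

On these numbers, band expansion alone is not enough. Spectral efficiency would also have to more than double, which is consistent with the move toward 16QAM and higher baud rates described above.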

Inside the data center, 800G (based on a baud rate of 100GBd or above, single channel 100G) is already commercially available. Single-channel 200G, 400G, and 800G are only a matter of time. Progress abroad is faster in this area.

As capacity continues to increase, so do the technical challenges. The development of optical communications, to put it bluntly, depends on devices, chips, processes, and materials.

To meet the aforementioned power consumption, security, operation and maintenance requirements, it also relies on a series of innovations such as process, architecture, packaging, artificial intelligence, and digital twins. There is still a lot of work that needs to be done upstream and downstream of the industrial chain. The road ahead is still long.

Last words

Optical communication is the digital artery of the entire society. Over the years, people have questioned many technologies (including 5G), but no one questions optical communications, because it is an essential need for social development.

The trend of increasing human data traffic will not change in the next few decades. The rapid rise of artificial intelligence technology will further amplify this trend.

The current development of optical communications cannot meet the demand. This means that companies will have greater motivation to invest resources in research and development in order to obtain profits.

It is hoped that the optical communications industry will further explode and pave the way for the development of a digital society.

References:

  • 1. “Key technologies, application progress and future prospects of high-speed optical transmission in the AI era”, Institute of Technology and Standards, Academy of Information and Communications Technology, Zhang Haiyi;

  • 2. “Computing power network opens 400G all-optical new era”, China Mobile Research Institute, Duan Xiaodong;

  • 3. “400G All-optical Computing Power Internet in the AI Era”, China Unicom Research Institute, Tang Xiongyan.
