As noted in Ford Tamer’s blog post, “Bandwidth in the Age of COVID-19”, network data and traffic are exploding. The growth is driven by concurrent and overlapping trends: the proliferation of consumer applications with media-rich services such as Netflix, Hulu, YouTube, Instagram and TikTok; increased demand for business services, including high-performance computing; and the dramatic ramp in network usage from the seemingly endless video calls that are now a daily part of life for those of us working from home. Not surprisingly, more users and more applications are driving up the volume of data transferred and, in turn, the need for more computing power and more storage. But why do these things require faster I/O speeds?
The short answer is that, at certain design points, there is simply no other way to increase networking speed without increasing the I/O speed.
Let’s first think about the difference between a metal wire I/O and an optical fiber I/O. For simplicity’s sake, I may refer to the metal wire as copper, although sometimes different metals or combinations of metals are used. Today’s compute-processor silicon and switching silicon have metal I/Os. Inside the package, these typically connect to a substrate, which routes the silicon’s metal I/Os to pins or balls on the outside of the package. Those pins or balls are soldered, or connected through a socket, to a printed circuit board (PCB) for routing to the outside world.
Connecting multiple PCBs together requires either copper cable or optical fiber. The choice depends on the distance between the two boards and the speed of the connectivity. Copper cable, although typically cheaper than fiber-based connectivity, is more limited in speed and distance than optical fiber. The speed of a signal on a copper cable can also differ from the speed of a signal on optical fiber within the same system.
So, what does speed really mean? In data transfer for networking, speed is measured in bits per second (bps), meaning how many bits cross the wire every second. In today’s data centers, it is common to see speeds in the range of tens of gigabits per second on a single copper wire or optical fiber, and hundreds of gigabits per second in a group of wires or fibers connected to the same port. We are in the early phases of 100 gigabits per second as a common speed on a single optical fiber, and soon on a single copper wire. But there are many ways to send a bit across a wire, and not every method is the same. Note: Historically, there were many technologies that enabled transmission of data on copper cables. In the networking world, the most common was called “Base-T”, but I will not address this technology, since Base-T ran to the end of its lifetime at 10GBase-T due to a number of factors. There are some proposals to take Base-T to 25GBase-T, although there is little to no industry support for them. None of the references to Gbps below refer to any Base-T technology.
In 2014, the 25G Ethernet Consortium was launched to create an industry standard for 25Gbps over a single wire, using two wires for 50Gbps and eventually four wires in each direction for 100Gbps. At that time, the fastest common single-wire electrical I/O was 10Gbps, that is, 10 gigabits (10 billion bits) per second. The fastest single-chip Ethernet switch at that time was 1.28Tbps, with 128 ports of 10G I/O.
So why did the industry need 25Gbps I/O? As compute got faster, and cloud networks connected a higher number of servers together at faster speeds, the networking industry required a switch that was faster than 1.28Tbps. However, at the time, packaging technology could not enable more than 128 x 10Gbps I/O in any reasonably available form factor. So, the industry was forced to move to a faster I/O. And this is the real answer to the question I originally asked: the industry is continually forced to go to faster I/O speeds because of technology limitations in packaging. 25Gbps I/O eventually enabled a 3.2Tbps switch (128 x 25Gbps), which has been the mainstream choice for top-of-rack switches in hyperscale networks for a long time.
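The arithmetic behind these switch generations is simple enough to sketch in a few lines of Python. This is only an illustration of the figures quoted above; the function name `switch_capacity_gbps` is my own and not from any library.

```python
def switch_capacity_gbps(io_count, io_rate_gbps):
    """Aggregate switch bandwidth in Gbps: number of I/Os times the rate of each I/O."""
    return io_count * io_rate_gbps


# With the package limited to 128 I/Os, the only lever left is the per-I/O rate.
print(switch_capacity_gbps(128, 10))  # 1280 Gbps, i.e. 1.28Tbps
print(switch_capacity_gbps(128, 25))  # 3200 Gbps, i.e. 3.2Tbps
```

Holding the I/O count fixed at 128 and raising only the per-I/O rate is exactly the move from 1.28Tbps to 3.2Tbps described above.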
So now, surely you must be asking yourself, what happened after 3.2Tbps switches with 25G I/O?
To go beyond 3.2Tbps with 25G I/O, the industry needed a lower-power method to send all these bits across a wire. For all the examples I mentioned above, data is sent across the wire as a series of 1s and 0s. These 1s and 0s are encoded with a method called Non-Return to Zero (NRZ). This is a very simple method in which each of two voltage levels on the wire represents a “symbol”, which is encoded and decoded as the bits are initiated or terminated at their endpoints. In NRZ modulation, the symbol is either a single 0 or a single 1, so NRZ can only send 1 bit per symbol. A ground level, or voltage of zero, represents a “0”, and Vdd, or the high rail voltage, represents a “1”. These bits travel down the wire at the clock rate. Every clock cycle, the voltage is measured to determine whether the signal is a zero or a one. As you can see, the bit rate in the NRZ case is simply the clock rate: if the clock runs at 10 billion cycles per second, then the bit rate is 10 billion bits per second, or 10Gbps as described above.
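To make the NRZ description concrete, here is a minimal Python sketch of the idea. It is a toy model, not how a real SerDes is built: the rail voltage `VDD` of 1.0 and the mid-rail decision threshold are assumptions I chose for illustration, and the function names are hypothetical.

```python
VDD = 1.0  # assumed rail voltage, chosen only for illustration


def nrz_encode(bits, vdd=VDD):
    """One symbol per bit: ground (0.0) for a 0, the rail voltage for a 1."""
    return [vdd if b else 0.0 for b in bits]


def nrz_decode(voltages, vdd=VDD):
    """Sample once per clock cycle and compare against a mid-rail threshold (an assumption)."""
    threshold = vdd / 2
    return [1 if v > threshold else 0 for v in voltages]


bits = [1, 0, 1, 1, 0]
symbols = nrz_encode(bits)
print(symbols)              # [1.0, 0.0, 1.0, 1.0, 0.0]
print(nrz_decode(symbols))  # [1, 0, 1, 1, 0]
```

Five bits produce five symbols, which is the point: with NRZ, one clock cycle carries exactly one bit.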
To go deeper into this topic, I need to introduce the definition of “baud” (Bd) rate. Baud rate is named after Emile Baudot, the inventor of the Baudot code for telegraphy in the 1870s. Baudot code was used to transmit characters by telegraph. For those of you who are familiar with ASCII, Baudot code evolved over time into ASCII. However, I digress.
Baud rate today is defined as the rate at which symbols travel down a wire. In the NRZ example, I referred to this as the clock rate. I think this is the easiest way to remember what baud rate is: simply the clock rate. For every clock cycle, one symbol is sent. In the NRZ example, a symbol is a single bit, either a “0” or a “1”. So, if the industry is forced to go to faster I/Os, and wants lower power per bit, what are the options and what are the tradeoffs?
Like its predecessors, 10Gbps and 1Gbps, 25Gbps SerDes is based on NRZ encoding. To go from 1Gbps in the 1990s to 25Gbps in the 2010s, the industry simply focused on increasing the baud rate. In other words, it just cranked up the clock rate, a brute-force method, with no attention to getting smarter about the encoding. However, that approach is not so simple. As it turns out, turning up the baud rate complicates the design of data transmission components. The higher frequency makes it more difficult to design semiconductor packages, PCBs, connectors and cables, and much more difficult to design any analog electronics that must amplify the signal without creating too much signal distortion or noise.
Once again, the industry needed a smarter method to lower the power per bit and move past 25Gbps NRZ. It found one. The smarter approach is achieved with higher-order modulation. This means encoding more data into the signal at every clock cycle. With higher-order modulation, we can keep the baud rate (clock rate) the same and increase the data rate by having each symbol represent more bits at every clock cycle. In NRZ, the symbol was a binary “0” or “1”. But what if a symbol can be two binary bits, for a total of four values every clock cycle: “00”, “01”, “10”, “11”? Those of you who know binary would convert these to decimal numbers in your head and read them as “0”, “1”, “2”, “3”. If we keep the same baud rate (clock rate) as NRZ, but use a new symbol encoding with four different values instead of just two, we effectively double the bit rate. We can now transmit 50Gbps at 25Gbaud with this new encoding, instead of only 25Gbps at 25Gbaud with NRZ. This allows us to transfer data faster without the additional complexities, noted above, of going to a faster baud rate.
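The relationship between baud rate, number of signal levels, and bit rate can be written as a one-line formula: bit rate = baud rate x log2(levels). Here is a small Python sketch of that relationship; the function name `bit_rate_gbps` is my own, and the calculation ignores any coding or error-correction overhead a real link would carry.

```python
import math


def bit_rate_gbps(baud_gbd, levels):
    """Bit rate = symbols per second times bits per symbol, where bits per symbol = log2(levels)."""
    return baud_gbd * math.log2(levels)


print(bit_rate_gbps(25, 2))  # 25.0 -> two levels (NRZ) at 25Gbaud
print(bit_rate_gbps(25, 4))  # 50.0 -> four levels at the same 25Gbaud
```

The clock rate is the same in both calls; only the number of levels per symbol changes.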
Welcome to PAM4 encoding. PAM4, short for four-level Pulse Amplitude Modulation, uses the voltage level of the signal to represent four values. Recall from the discussion above that NRZ had two values, “0” and “1”, represented by a voltage level of ground (0) or Vdd (1). With PAM4 we add more granularity to the voltage levels: (00) is represented by the ground voltage, (01) by 1/3 of Vdd, (10) by 2/3 of Vdd, and (11) by the rail voltage Vdd.
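The four-level mapping just described can be sketched in Python as follows. This is a toy model that follows the bit-pair-to-voltage table in this post; the rail voltage `VDD` of 1.0, the nearest-level decoding rule, and the function names are assumptions made for illustration, not a description of any particular product.

```python
VDD = 1.0  # assumed rail voltage, chosen only for illustration

# Bit pair -> voltage level, following the mapping described in the text.
PAM4_LEVELS = {
    (0, 0): 0.0,
    (0, 1): VDD / 3,
    (1, 0): 2 * VDD / 3,
    (1, 1): VDD,
}


def pam4_encode(bits):
    """Pack two bits into each symbol; the bit count must be even."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(voltages):
    """Map each sampled voltage to the nearest of the four levels (an assumed decision rule)."""
    bits = []
    for v in voltages:
        pair = min(PAM4_LEVELS, key=lambda p: abs(PAM4_LEVELS[p] - v))
        bits.extend(pair)
    return bits


bits = [0, 0, 0, 1, 1, 0, 1, 1]
symbols = pam4_encode(bits)
print(len(bits), "bits in", len(symbols), "symbols")  # 8 bits in 4 symbols
print(pam4_decode(symbols) == bits)                   # True
```

Eight bits fit into four clock cycles, half as many as NRZ would need. The cost is that the four levels sit only Vdd/3 apart, so the receiver has less voltage margin between neighboring symbols.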
Today we are starting to see mass deployment of 12.8Tbps switches utilizing 50G PAM4 I/O in hyperscale data centers. These switches typically have modular cages on the front panel that accept different types of pluggable media. Typically, three types of media may be plugged into these front panel cages for connectivity to other devices in the data center: passive Direct Attach Cable (DAC), Active Optical Cable (AOC), or pluggable optical modules. All of these typically have two, four, or eight wires or fibers in each direction, for a total combined speed of 100Gbps, 200Gbps, or 400Gbps respectively.
However, the benefit here is that the fastest baud rate is still only 25Gbaud. By using PAM4, the industry migrated from 3.2Tbps switches, with 100G cables and fiber assemblies, to 12.8Tbps switches, with 400G cables and fiber assemblies, without increasing the baud rate, while also achieving lower power consumption per bit!
The industry is now sampling 25.6Tbps switches with 50Gbps PAM4 electrical I/O. Yes, that is 512 I/O in each direction. Clearly, packaging technology has evolved since 2012, when the most that could be done was 128 I/O of 10Gbps in these large packet-switch silicon packages.
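The I/O counts quoted for each switch generation can be checked by dividing the switch bandwidth by the per-I/O rate. A quick Python sketch, using only figures from this post (the function name `io_count` is hypothetical, and rates are given in whole Gbps so the division is exact):

```python
def io_count(switch_gbps, io_rate_gbps):
    """Number of I/Os per direction needed to reach a given switch bandwidth."""
    count, remainder = divmod(switch_gbps, io_rate_gbps)
    if remainder:
        raise ValueError("switch bandwidth is not a whole multiple of the I/O rate")
    return count


print(io_count(1280, 10))   # 128 -> 1.28Tbps with 10G NRZ
print(io_count(3200, 25))   # 128 -> 3.2Tbps with 25G NRZ
print(io_count(12800, 50))  # 256 -> 12.8Tbps with 50G PAM4
print(io_count(25600, 50))  # 512 -> 25.6Tbps with 50G PAM4
```

The first two generations sat at the same 128-I/O packaging limit, while the PAM4 generations rely on both the faster I/O and a package that can carry more of them.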
Today we are seeing common implementations of 100Gbps on a single fiber with PAM4. Eight of these fibers can be used in each direction, giving us an 800G pluggable optical module for these switches. We believe that fairly soon we will see 100Gbps PAM4 as a common electrical I/O.
Moving forward, we envision 200Gbps on a single fiber with PAM4. This will enable 1.6T modules with eight fibers in each direction. I wonder what Emile Baudot would say if he could see these modules?