Remaking the data center

A major transformation is sweeping over data center switching. Three factors are driving it: server virtualization, direct connection of Fibre Channel storage to the IP switching fabric, and enterprise cloud computing. Over the next few years the old switching equipment will need to be replaced with faster and more flexible switches. All three drivers need speed and higher throughput to succeed, but unlike in the past it will take more than just a faster interface.

This time speed needs to be coupled with lower latency, abandoning spanning tree and supporting new storage protocols. Networking in the data center must evolve into a unified switching fabric. Without these changes, the dream of a more flexible and lower-cost data center will remain just a dream. Times are hard and money is tight; can a new unified fabric really be justified? The answer is yes: the cost savings from supporting server virtualization and merging the separate IP and storage networks are just too great.

The good news is that the switching transformation will take years, not months, so there is still time to plan for the change. Supporting these changes, however, is impossible without the next evolution in switching.

The Drivers

The story of how server virtualization can save money is well known. Running a single application on a server commonly results in utilization in the 10% to 30% range. Virtualization allows multiple applications to run on the same server, each within its own image, allowing utilization to climb into the 70% to 90% range. This cuts the number of physical servers required, saves on power and cooling, and increases operational flexibility.
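The consolidation math is straightforward. A rough sketch, using the utilization ranges above and an assumed fleet size (illustrative numbers only, not a measurement):

import math

# Back-of-the-envelope consolidation estimate using the utilization ranges
# quoted above; the fleet size is hypothetical.
physical_servers = 100
avg_util_today = 0.15        # one application per server: 10%-30% range
target_util = 0.75           # virtualized: 70%-90% range

# The total work stays the same; only the number of boxes carrying it changes.
total_work = physical_servers * avg_util_today
servers_needed = math.ceil(total_work / target_util)
print(f"{physical_servers} servers could consolidate onto roughly {servers_needed}")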

The storage story is not as well known, but the savings are as compelling as the virtualization story. Storage has been moving to IP for years, with a significant amount of storage already attached via NAS or iSCSI devices, and the cost saving and flexibility gain there are well known. The move now is to directly connect Fibre Channel storage to the IP switches, eliminating the separate Fibre Channel storage-area network. Moving Fibre Channel onto the IP infrastructure is a cost saver, primarily by reducing the number of adapters on each server.

Currently servers need an Ethernet adapter for IP traffic and a separate storage adapter for the Fibre Channel traffic. Guaranteeing high availability means that each adapter needs to be duplicated, resulting in four adapters per server. A unified fabric reduces the number to two, since the IP and Fibre Channel or iSCSI traffic share the same adapter. The savings grow since halving the number of adapters also reduces the number of switch ports and the amount of cabling, and operational costs fall since there is only one network to maintain.

The third reason is internal, or enterprise, cloud computing. Over the years, the way applications are designed and implemented has changed. In the past, when a request reached an application, the work stayed within that server and application. Increasingly, when a request arrives at a server, the application may do only a small part of the work; it distributes the rest to other applications in the data center, making the data center one big internal cloud. It becomes critical that the cloud provide very low latency with no dropped packets.

Attaching storage directly to this IP cloud only increases the number of critical flows that pass over the switching cloud. A simple example shows why low latency is a must. If the action took place within the server, each storage get would take only nanoseconds to a few microseconds to perform. With most of the switches installed in enterprises today, that same get can take 50 to 100 microseconds to cross the cloud, which, depending on the number of calls, adds significant delay to processing. If a switch discards the packet, the response takes even longer.

The only way internal cloud computing works is with a very low latency, non-discarding cloud. So what is the problem for the network? Why change the switches? Compared with the rest of the network, the current data center switches provide very low latency, discard very few packets and support 10 Gigabit Ethernet interconnects. Why can't the current switching infrastructure handle virtualization, storage and cloud computing? The problem is that these new challenges need even lower latency, better reliability, higher throughput and support for the Fibre Channel over Ethernet (FCoE) protocol.

The first challenge is latency. The problem with the current switches is that they are based on a store-and-forward architecture. Store-and-forward is generally associated with applications such as e-mail, where the mail server receives the mail, stores it on a disk and later forwards it to where it needs to go, and it is considered very slow. How can layer 2 switches, which are very fast, be store-and-forward devices? The answer is that switches have large queues.

When a switch receives a packet, it puts it in a queue, and when the packet reaches the front of the queue, it is sent. Putting the packet in a queue is a form of store-and-forward. A large queue has been sold as an advantage since it means the switch can handle large bursts of data without discards. The result of all the queues is that it can take 80 microseconds or more for a large packet to cross a three-tier data center. The math works as follows.

For example, assume two servers are at "far" ends of the data center. It can take 10 microseconds to go from the server to the switch. Each switch-to-switch hop adds 15 microseconds and can add as much as 40 microseconds. A packet leaving the requesting server travels to the top-of-rack switch, then the end-of-row switch and onward to the core switch; the hops are then repeated down to the destination server. That is four switch-to-switch hops for a minimum of 60 microseconds.

Add in the 10 microseconds at each end to reach the servers and the total is 80 microseconds. Latency of 80 microseconds each way was acceptable in the past when response time was measured in seconds, but with the goal to provide sub-second response time, the microseconds add up. The delay can increase to well over 100 microseconds and becomes a disaster if a switch has to discard the packet, requiring the TCP stack on the sending server to time out and retransmit it. An application that requires a large chunk of data can take a long time to get it when each get can only retrieve 1,564 bytes at a time, and a few hundred round trips add up.
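A back-of-the-envelope sketch of how those per-hop figures combine (the constants are the numbers quoted above, not measurements):

# Illustrative latency arithmetic for a three-tier data center path,
# using the per-hop figures from the text.
SERVER_TO_TOR_US = 10      # server <-> top-of-rack switch, each end
PER_HOP_US = 15            # each switch-to-switch hop, best case
SWITCH_HOPS = 4            # ToR -> EoR -> core -> EoR -> ToR

one_way_us = 2 * SERVER_TO_TOR_US + SWITCH_HOPS * PER_HOP_US
print(f"one-way latency: {one_way_us} microseconds")               # 80

# An application fetching a large object in small, MTU-sized pieces pays
# that latency on every round trip.
round_trips = 300
total_ms = round_trips * 2 * one_way_us / 1000
print(f"{round_trips} round trips add roughly {total_ms:.0f} ms")  # ~48 ms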

The impact is not only on response time. The application has to wait for the data, increasing the elapsed time it takes to process the transaction. That means that while a server is doing the same amount of work, there is an increase in the number of concurrent tasks, lowering the server's overall throughput. The new generation of switches overcomes the large latency of the past by eliminating or significantly reducing queues and speeding up their own processing. The words used to describe the new switches are: lossless transport, non-blocking, low latency, guaranteed delivery, multipath and congestion management. Non-blocking means they either don't queue the packet or have a queue length of one or two.

Lossless transport and guaranteed delivery mean they don't discard packets. The first big change in the switches is the way they forward packets. Instead of a store-and-forward design, a cut-through design is generally used, which significantly reduces or eliminates queuing inside the switch. A cut-through design can reduce the time spent in a switch from 15 to 50 microseconds down to 2 to 4 microseconds. Cut-through is not new, but it has always been more complex and expensive to implement; it is only now, with the very low latency requirement, that switch manufacturers can justify spending the money to implement it.
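The difference is easy to see with a simplified model. The sketch below covers only the serialization component of the delay, the time to receive the bits; the queuing that store-and-forward switches add on top, and which cut-through largely avoids, is left out, and the link speeds and header size are assumptions:

# Simplified model of per-hop forwarding delay: a store-and-forward switch
# must receive the entire frame before forwarding it, while a cut-through
# switch starts forwarding once it has read the header. Queuing delay is
# ignored here.
FRAME_BYTES = 1500      # a full-size Ethernet payload
HEADER_BYTES = 64       # assumed: bytes examined before cut-through forwarding

def serialization_us(nbytes: int, link_gbps: float) -> float:
    """Microseconds needed to receive nbytes at the given link rate."""
    return nbytes * 8 / (link_gbps * 1000)

for gbps in (1, 10):
    saf = serialization_us(FRAME_BYTES, gbps)
    cut = serialization_us(HEADER_BYTES, gbps)
    print(f"{gbps:>2} Gbps link: store-and-forward >= {saf:.2f} us/hop, "
          f"cut-through about {cut:.2f} us/hop")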

The second big change is abandoning spanning tree within the data center switching fabric. Currently all layer 2 switches determine the "best" path from one end-point to another using the spanning tree algorithm. Only one path is active; the other paths through the fabric to the destination are used only if the "best" path fails. The new generation of switches instead uses multiple paths through the switching fabric to the destination. They constantly monitor potential congestion points, or queuing points, and pick the fastest and best path at the time the packet is being sent. A current problem with the multi-path approach is that there is no standard on how to do it.
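A rough sketch of the difference (the path names and queue depths are hypothetical stand-ins for live congestion monitoring; this is not any vendor's actual algorithm):

# Single active path (spanning tree) vs. picking the least-congested of
# several equal paths (the multipath approach described above).
paths = {
    "via-core-1": {"queue_depth": 12, "up": True},
    "via-core-2": {"queue_depth": 2,  "up": True},
    "via-core-3": {"queue_depth": 0,  "up": True},
}

def spanning_tree_choice(paths):
    # Spanning tree blocks all but one path; the "best" path is fixed
    # regardless of congestion, and alternates are used only on failure.
    active = "via-core-1"
    return active if paths[active]["up"] else "via-core-2"

def multipath_choice(paths):
    # New-generation fabrics monitor queuing points and send each packet
    # over the least-congested available path.
    live = {name: p for name, p in paths.items() if p["up"]}
    return min(live, key=lambda name: live[name]["queue_depth"])

print("spanning tree sends everything via:", spanning_tree_choice(paths))
print("a multipath fabric would pick:     ", multipath_choice(paths))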

Spanning tree has worked well since the beginning of layer 2 networking, but "only one path" is not good enough in a non-queuing and non-discarding world. Work is underway within standards groups to correct this problem, but for the early versions each vendor has its own solution. A significant amount of the work falls under a standard referred to as Data Center Bridging (DCB). Even when DCB and the other standards are finished there will be many interoperability problems to work out, so a single-vendor solution may be the best strategy. The reality is that for the immediate future, mixing and matching different vendors' switches within the data center is not possible.

Speed is still part of the solution. The new switches are built for very dense deployment of 10 Gigabit Ethernet and are prepared for 40/100 Gigabit. The result of all these changes reduces the trip time mentioned above from 80 microseconds to less than 10 microseconds, providing the latency and throughput needed to make Fibre Channel and cloud computing practical.

Virtualization curve ball

Server virtualization creates additional problems for the current data center switching environment. The first problem is that each physical server has multiple virtual images, each with its own media access control (MAC) address. This causes operational complications and is a real problem if two virtual servers communicate with each other. The easiest answer is to put a soft switch in the virtualized server, which all the VM vendors provide.

This allows the server to present a single MAC address to the network switch and perform the functions of a switch for the VMs in the server. There are several problems with this approach. The soft switch needs to enforce policy and access control lists (ACLs), make sure VLANs are followed and implement security. If the images were on different physical servers, the network would make sure policy and security procedures were followed. For example, if one image is compromised, it should not be able to freely communicate with the other images on the server if policy says they should not be talking to each other. The simple answer is that the group that maintains the server and the soft switch needs to make sure all the network controls are followed and in place.
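A minimal sketch of the kind of check the soft switch has to make before letting two co-resident images talk (the policy table, VLAN names and VM names are hypothetical):

# Frames between VMs on the same physical server never reach the network
# switch, so the soft switch itself must apply the ACL that the network
# would otherwise enforce.
ACL = {
    # (source VLAN, destination VLAN) -> allowed?
    ("web", "app"): True,
    ("app", "db"):  True,
    ("web", "db"):  False,   # web images must not talk to the database tier
}

vm_vlan = {"vm-web-01": "web", "vm-app-01": "app", "vm-db-01": "db"}

def soft_switch_forward(src_vm: str, dst_vm: str) -> bool:
    """Return True if the soft switch should forward a frame between two VMs."""
    rule = (vm_vlan[src_vm], vm_vlan[dst_vm])
    return ACL.get(rule, False)   # default deny, as a network ACL would

print(soft_switch_forward("vm-web-01", "vm-app-01"))  # True
print(soft_switch_forward("vm-web-01", "vm-db-01"))   # False: blocked by policy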

The practical problem with this approach is the coordination required between the two groups and the level of networking knowledge required of the server group. Having the network group maintain the soft switch in the server creates the same set of problems in reverse. Today, the answer is to learn to deal with the confusion, develop procedures to make the best of the situation and hope for the best. A variation on this is to use a soft switch from the same vendor as the switches in the network; Cisco is offering this approach with VMware. The idea is that since the switch vendor built the soft switch, it has hopefully made the coordination easier.

The third solution is to have all the communications from the virtual servers sent to the network switch. The network switch would perform all these functions as if the virtual servers were directly connected to it and this was the first hop into the network. This would simplify the switch in the VM, since it would not have to enforce policy, tag packets or worry about security. This approach has appeal since it keeps all the well-developed processes in place and restores clear accountability for who does what. The problem is that spanning tree does not allow a port to receive a packet and send it back out on the same port.

The answer is to eliminate the spanning tree restriction of not allowing a message to be sent back over the port it came from.

Spanning Tree and virtualization

The second curve ball from virtualization is ensuring that there is enough throughput to and from the server and that the packet takes the best path through the data center. As the number of processors on the physical server keeps increasing, the number of images increases, with the result that increasingly large amounts of data need to be moved in and out of the server. The first answer is to use 10 Gigabit and eventually 40 or 100 Gigabit. This is a good answer but may not be enough, since the data center needs to create a very low latency, non-blocking fabric with multiple paths. Using both adapters, attached to different switches, allows multiple paths along the entire route, helping to ensure low latency.
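If both adapters could be kept active, traffic could be spread across them. A hypothetical sketch of hashing flows onto two active uplinks (the names and the hash choice are illustrative, not taken from any standard):

import zlib

# Spreading flows across two active server adapters: hashing on the flow
# keeps each flow's packets in order on one uplink while letting the server
# use the bandwidth of both adapters.
UPLINKS = ["adapter-A (to switch 1)", "adapter-B (to switch 2)"]

def pick_uplink(src_ip: str, dst_ip: str, dst_port: int) -> str:
    flow = f"{src_ip}-{dst_ip}-{dst_port}".encode()
    return UPLINKS[zlib.crc32(flow) % len(UPLINKS)]

print(pick_uplink("10.0.0.5", "10.0.9.20", 3260))   # an iSCSI flow
print(pick_uplink("10.0.0.5", "10.0.7.31", 443))    # a web flow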

Once again spanning tree is the problem. The solution is to eliminate spanning tree, allowing both adapters to be used. The reality is that the new generation of layer 2 switches in the data center will act more like routers, implementing their own version of OSPF at layer 2.

Storage

The last reason new switches are needed is Fibre Channel storage. Switches need to support the ability to run storage traffic over Ethernet/IP, such as NAS, iSCSI or FCoE. Besides adding support for the FCoE protocol, they will also be required to abandon spanning tree and enable greater cross-sectional bandwidth. For example, Fibre Channel requires that both adapters on the server be active and carrying traffic, something the switch's spanning tree algorithm doesn't support. Currently the FCoE protocol is not finished and vendors are implementing a draft version; the good news is that it is getting close to finalization.

Current state of the market

How should the coming changes in the data center affect your plan? The first step is to determine how much of your traffic needs very low latency right now. If cloud computing, migrating critical storage or a new low-latency application such as algorithmic stock trading is on the drawing board, then it is best to start the move to the new architecture now.

Most enterprises don't fall in that group yet, but they will in 2010 or 2011 and thus have time to plan an orderly transformation. The transformation can also be taken in steps. For example, one first step would be to migrate Fibre Channel storage onto the IP fabric and immediately reduce the number of adapters on each server. This can be accomplished by replacing just the top-of-rack switch; the core and end-of-row switches do not have to be replaced. The storage traffic flows over the server's IP adapters to the top-of-rack switch, which sends the Fibre Channel traffic directly to the SAN.

The top-of-rack switch supports having both IP adapters active for storage traffic only, with spanning tree's requirement of a single active adapter applying just to the data traffic. Brocade and Cisco currently offer this option. If low latency is needed, then all the data center switches need to be replaced.

Most vendors have not yet implemented the full range of features needed to support the switching environment described here. To understand where a vendor is, it is best to break the question into two parts. The first part is whether the switch can provide very low latency; many vendors, such as Arista Networks, Brocade, Cisco, Extreme, Force 10 and Voltaire, have switches that can. The second part is whether the vendor can overcome the spanning tree problem, along with support for dual adapters and multiple pathing with congestion monitoring. As is normally the case, vendors are split on whether to wait until standards are finished before providing a solution or to provide an implementation based on their best guess of what the standards will look like. Cisco and Arista Networks have jumped in early and provide the most complete solutions. Other vendors are waiting for the standards to be completed in the next year before releasing products.
