Understanding PortLand

Posted by Richard Lucas on Apr 28 2016

What is PortLand?

Today, most web applications are hosted in the “cloud”, which is in turn hosted in huge datacenters located all over the world. Modern datacenters suffer from several limitations: lack of scalability, difficult management, inflexible communication, and limited support for virtual machine migration.[1] PortLand is a proposal that attempts to mitigate these issues by delivering scalable layer 2 routing, forwarding, and addressing for large data center networks. PortLand assumes a three-tier fat-tree topology in which edge switches connect to aggregation switches, which in turn connect to core switches (Figure 1).

Figure 1: The three-tier fat-tree topology (fat-tree-topology.png)

Why Layer 2?

Large data centers impose several requirements on the network:

  • Easy VM migration
  • Need for less active administration and configuration
  • Efficient end-host to end-host communication
  • No forwarding loops
  • Rapid and efficient failure detection

Layer 3 (IP) does not allow easy VM migration, since migrating a VM requires changing its IP address. Configuring a layer 3 data center is also onerous: every switch must be configured individually, and DHCP servers must be kept synchronized. Layer 2, while not a silver bullet, satisfies these requirements far better under the PortLand design.

PortLand addresses these issues through Pseudo MAC addresses, coordinated by a centralized mechanism (the fabric manager), enabling efficient, loop-free forwarding with very little switch state.

Fabric Manager

The fabric manager is a centralized user process running on a dedicated machine that maintains soft state about the network configuration. Its responsibilities include tracking the topology and assisting with ARP resolution, fault tolerance, and multicast. The use of soft state is important here: it eliminates the need for active administration, and it simplifies replication because no hard state has to be kept consistent across replicas.
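
To make the soft-state idea concrete, here is a minimal sketch of a table whose entries expire unless refreshed. The class name and the TTL value are assumptions for illustration; the paper does not prescribe an implementation.

```python
import time

ENTRY_TTL = 30.0  # assumed timeout in seconds; the paper does not fix a value

class SoftStateTable:
    """Entries silently expire unless periodically refreshed, so stale
    configuration disappears without any operator intervention."""

    def __init__(self):
        self._entries = {}  # key -> (value, last_refresh_time)

    def refresh(self, key, value):
        # Called on every periodic update (e.g., an LDM or ARP report).
        self._entries[key] = (value, time.time())

    def lookup(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        value, stamp = item
        if time.time() - stamp > ENTRY_TTL:
            del self._entries[key]  # expired: forget it
            return None
        return value
```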

What the Fabric Manager (FM) Does

The fabric manager reduces broadcast overhead in the common case of ARP requests. When an edge switch intercepts an ARP request for an IP-to-MAC mapping, it forwards the request to the fabric manager. The FM consults its PMAC table (see below); if an entry exists, it returns the PMAC to the edge switch, which constructs the ARP reply and sends it to the original host.
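
A minimal sketch of that fast path, assuming invented class and field names (PortLand implements this in switch and fabric manager software, not Python):

```python
from dataclasses import dataclass

@dataclass
class ArpRequest:
    sender_ip: str
    target_ip: str

class FabricManager:
    def __init__(self):
        self.pmac_table = {}  # target IP -> PMAC

    def resolve(self, ip):
        return self.pmac_table.get(ip)  # None on a miss

class EdgeSwitch:
    def __init__(self, fm):
        self.fm = fm

    def handle_arp(self, req):
        pmac = self.fm.resolve(req.target_ip)
        if pmac is not None:
            # Hit: synthesize the reply at the edge; nothing is broadcast.
            return ("arp-reply", req.target_ip, pmac)
        # Miss: the paper falls back to broadcasting the request.
        return ("broadcast", req)

fm = FabricManager()
fm.pmac_table["10.2.1.5"] = "00:02:01:00:00:01"
print(EdgeSwitch(fm).handle_arp(ArpRequest("10.2.1.9", "10.2.1.5")))
```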

VM Migration
Additionally, after a VM migration, the VM sends out a gratuitous ARP with its new IP-to-MAC mapping, which is forwarded to the FM. To invalidate any stale mappings that other hosts may have cached for the newly migrated VM, the FM sends an invalidation message to the VM's previous switch.
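
Sketched below under the same assumptions, the FM updates its mapping on the gratuitous ARP and tells the previous edge switch to flush its stale state. All names are illustrative:

```python
class EdgeSwitchStub:
    def __init__(self, name):
        self.name = name

    def invalidate(self, ip):
        # Flush cached flow-table state for the migrated VM's old PMAC.
        print(f"{self.name}: invalidating state for {ip}")

class MigrationAwareFM:
    def __init__(self):
        self.pmac_table = {}  # ip -> current PMAC
        self.location = {}    # ip -> edge switch currently serving the VM

    def handle_gratuitous_arp(self, ip, new_pmac, new_switch):
        old_switch = self.location.get(ip)
        self.pmac_table[ip] = new_pmac
        self.location[ip] = new_switch
        if old_switch is not None and old_switch is not new_switch:
            old_switch.invalidate(ip)  # notify the VM's previous switch
```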

Location
The fabric manager sits at the logical center of the data center network. It can also be deployed as a redundantly connected host elsewhere in the topology, or on a separate control network.

Pseudo MAC Addresses

In order to forward and route efficiently, and to migrate VMs easily, PortLand uses hierarchical Pseudo MAC addresses (PMACs). Each end host in the data center is assigned a PMAC, and the PMAC encodes the host's location in the data center topology.

Components of PMAC

The 48-bit PMAC is composed of four parts in the form [pod].[position].[port].[vmid] (a bit-packing sketch follows the list):

  • Pod (16 bits): The pod number of the edge switch
  • Position (8 bits): The edge switch's position within its pod
  • Port (8 bits): The switch-local port number the host is connected to
  • VMID (16 bits): Multiplexes multiple VMs on the same physical machine
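
The sketch below packs and unpacks those four fields using the bit widths listed above (16 + 8 + 8 + 16 = 48). Placing the fields in pod/position/port/vmid order from the high bits down is an assumption about the wire encoding:

```python
def encode_pmac(pod, position, port, vmid):
    assert 0 <= pod < 2**16 and 0 <= position < 2**8
    assert 0 <= port < 2**8 and 0 <= vmid < 2**16
    return (pod << 32) | (position << 24) | (port << 16) | vmid

def decode_pmac(pmac):
    return ((pmac >> 32) & 0xFFFF,  # pod
            (pmac >> 24) & 0xFF,    # position
            (pmac >> 16) & 0xFF,    # port
            pmac & 0xFFFF)          # vmid

def pmac_to_str(pmac):
    # Render the 48-bit value in familiar colon-separated MAC notation.
    return ":".join(f"{(pmac >> s) & 0xFF:02x}" for s in range(40, -1, -8))

# VM 1 behind port 2 of the switch at position 1 in pod 3:
print(pmac_to_str(encode_pmac(pod=3, position=1, port=2, vmid=1)))
# -> 00:03:01:02:00:01
```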

The constructed PMAC is entered into the edge switch's local PMAC table, mapped to the host's actual MAC address (AMAC) and IP address. This mapping is then propagated to the fabric manager, which uses it to respond to ARP requests. Additionally, the switch installs a flow table entry that rewrites the destination PMAC to the AMAC for traffic headed to that host.
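
A rough sketch of that bookkeeping on the edge switch, with invented names: the switch learns the AMAC/IP-to-PMAC mapping on one side and keeps the reverse rewrite entry for last-hop delivery on the other:

```python
class EdgePmacTable:
    def __init__(self):
        self.pmac_of = {}   # (amac, ip) -> pmac
        self.rewrite = {}   # pmac -> amac (the egress flow-table entries)

    def learn(self, amac, ip, pmac):
        self.pmac_of[(amac, ip)] = pmac
        self.rewrite[pmac] = amac
        # In PortLand this mapping would also be reported to the fabric
        # manager so it can answer ARP requests with the PMAC.

    def deliver(self, dst_pmac):
        # Last-hop rewrite: swap the destination PMAC back to the AMAC,
        # so the end host only ever sees its actual MAC address.
        return self.rewrite.get(dst_pmac)
```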

Why PMAC is useful

PMAC addresses enable efficient, provably loop-free forwarding with small switch state.

Traditional layer 2 data centers have scalability and efficiency problems because they must support broadcast (usually via ARP). They also require switches to maintain large MAC forwarding tables, with hundreds of thousands to millions of entries, demanding memory and software that simply isn't practical with today's switch hardware.

Location Discovery Protocol

The Location Discovery Protocol (LDP) is used to assign PMACs. LDP relies on Location Discovery Messages (LDMs), which switches periodically send out on all of their ports. The protocol enables loop-free forwarding by ensuring that once a packet travels down the topology, it never travels back up.
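
The sketch below shows what an LDM might carry, based on the fields described in the paper (switch identifier, pod, position, tree level, and port direction); the encoding and the send loop are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class LDM:
    switch_id: int  # globally unique switch identifier
    pod: int        # pod number
    position: int   # position within the pod
    level: int      # 0 = edge, 1 = aggregation, 2 = core
    up: bool        # whether the sending port faces up toward the core

def make_ldms(switch_id, pod, position, level, ports):
    # One message per port, sent periodically; neighbors use what they
    # hear to infer their own level and position in the topology.
    for port, faces_up in ports:
        yield port, LDM(switch_id, pod, position, level, up=faces_up)

for port, msg in make_ldms(switch_id=7, pod=1, position=0, level=0,
                           ports=[(0, True), (1, True)]):
    print(port, msg)
```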

Conclusion

PortLand is a data center network fabric that uses PMACs, a fabric manager, and the Location Discovery Protocol as its core components for improving scalability and manageability in large data centers. In particular, the design shrinks forwarding tables, which in a flat layer 2 network can contain 100,000 or more entries.

[1] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. 2009. PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric. In Proceedings of ACM SIGCOMM 2009.