University Politehnica of Bucharest
University “Politehnica” of Bucharest
Faculty of [anonimizat]
Supervisor: s.l. dr. ing. Șerban OBREJA
Student: Mircea Cristian HUIDEȘ
2016
Copyright © 2016, Mircea Cristian Huideș
All rights reserved.
The author hereby grants to UPB permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.
Abstract
Some network operators build and operate data centers that support many thousands of servers. Such data centers are referred to as "large-scale" to differentiate them from smaller infrastructures. Environments of this scale have a unique set of network requirements. The use of BGP as a Layer 3 routing protocol in such environments is summarized and explained. Design difficulties are carefully analyzed in order to provide better insight, and future development is kept in mind.
1. Large networks in the context of internet and cloud development
1.1. Introduction
When we speak about large networks in the context of cloud networking, we must become familiar with a few aspects that are fundamental to the design and overall structure of such a network.
We should start by defining what “The Cloud” actually is.
In essence, the cloud gives users access to networking, computing, storage and support without having to manage them or deal with the added complexity and constraints that come with such much-needed features.
The cloud can be seen as the Internet itself. Think of the server-farm construct of the Internet as accepting connections and handing out information as it flows. A connection to the “Cloud” can be viewed in Fig. 1.1.
Fig.1.1. – Connection to the Cloud
The benefits, from the point of view of the user, include having no software updates or maintenance to perform, since these tasks are handled by the provider.
Having a server in residence or a dedicated network-attached storage (NAS) device is not what the cloud is about. Likewise, storing data on an office network or on a home PC does not count as using the Cloud.
Cloud networking is a closely related concept in which network resources, connectivity and control are provided by the cloud as a service.
There are two categories within cloud networking: Cloud-Enabled Networking (CEN) and Cloud-Based Networking (CBN).
In CEN, management, policy definition and other aspects of control are handled by the Cloud, while connectivity, security, routing and switching services are kept local.
For CBN networking, there is no need for any local hardware other than that necessary for a connection to the Internet because all of the core features like addressing and packet path are handled by the Cloud. This category is also referred to as Network-as-a-Service (NaaS).
While providers that are categorized as CBN are by definition NaaS, it is not a rule that all NaaS offerings must be constructed the same way. For a better understanding of the difference, let’s consider the following: Amazon Web Services (AWS) is one cloud-based service that offers users a robust and elastic computing infrastructure. On the other hand, users can contract with any hosting provider to manage hosted servers. Both examples are cloud computing delivered as a service, but the capabilities, economics, extensibility and simplicity of the two approaches vary widely.
The new generation of cloud-based NaaS, built as an extension of global cloud data centers, takes advantage of software-defined networking (SDN) and especially of virtualization technologies to provide an elastic and resilient NaaS that can host multiple virtual network services at once. This approach simplifies the work required on the provider’s side by making management, and the whole solution, easier and cheaper.
Cloud computing is what it sounds like: instead of having local servers or independent personal computers process applications using their own hardware resources, it relies on shared computing resources. This is comparable to grid computing, where unused computing power is harnessed to solve problems that are too hard for any stand-alone equipment.
Different services like storage, applications and also server like features are accessible through the Internet by any devices that are permitted by the cloud provider.
In this way, high-performance computational power that is normally not available to ordinary users can facilitate consumer-oriented applications like financial simulations, security analysis and other computer-intensive online applications such as research, emulations and simulations.
By using large groups of equipment, such as servers and other hardware running specialized software, cloud computing can spread data processing across them, obtaining results that would be impossible to obtain individually. Virtualization software and large pools of systems linked together in data centers make up the Cloud we know.
In order to connect all of this infrastructure and provide these features, a Cloud network operator must have the necessary equipment (a big data center) but also a large network interconnecting all of the virtual servers and services, and all of this must be connected to the Internet through high-speed redundant links. This is because the user must be granted full access to the services provided by these servers.
It is very important for this network of routers and switches to be able to sustain high volumes of traffic and to have a complex redundancy scheme in order to provide high availability.
1.2. Cloud Context Particularities
1.2.1. Particularities
CBN provides an advantage by not requiring any additional hardware beyond a workstation and a connection to the Internet and the dedicated servers. At the other end of the connection, Cloud-based networks require an Internet connection and work over any physical infrastructure, wired or wireless; the access can be made over a public or private infrastructure by using IPsec or other encryption services over a Virtual Private LAN.
By enabling users to securely access applications, files, printers, etc. at any time and anywhere in the world, on any device, Cloud networks are often required to offer the services of a virtual private network (VPN) on top of other services. Each virtual cloud network acts like a borderless LAN from the perspective of the user and provides fully switched, any-to-any connectivity between servers, PCs, and mobile devices from anywhere. All of the aspects of building and managing an operational VPN, which can be summarized as topology structure, high availability, traffic engineering, capacity planning and the network operation center, are, by the definition of NaaS delivery, no longer handled by the customer but by the cloud network operator.
Multi-tenant applications and highly resilient enterprise-level network capabilities give cloud networking a new way to serve distributed enterprise networks without requiring a big investment in networking equipment. Cloud networking elegantly and gracefully enables businesses to deploy remote locations within a short amount of time while keeping centralized control of the network.
Cloud storage systems generally rely on hundreds of data servers. Because computers occasionally require maintenance or repair, it's important to store the same information on multiple machines. This is called redundancy. Without redundancy, a cloud storage system couldn't ensure clients that they could access their information at any given time. Most systems store the same data on servers that use different power supplies. That way, clients can access their data even if one power supply fails.
Not all cloud storage clients are worried about running out of storage space. They use cloud storage as a way to create backups of data. If something happens to the client's computer system, the data survives off-site. It's a digital-age variation of "don't put all your eggs in one basket."
1.2.2. Examples
Common Cloud Examples
The Cloud is connected to most applications used on our computers, and sometimes the lines between cloud computing and local computing can get blurry. Some great examples of this are Microsoft Office 365, OneDrive and Dropbox, which are software products that utilize a form of cloud computing for storage or for software processing.
Web-based cloud computing that only requires a workstation and a connection to the Internet is an extension of the features that a cloud network has. For example, Microsoft Word, PowerPoint and Excel can be accessed with nothing more than a connection to the Internet and a web browser.
Most Google applications (Maps, Calendar, Translate, Gmail) can be considered cloud computing services, but Google Drive is an application that at its core is designed with the cloud in mind. Not only can you create Sheets and Slides and access them securely from multiple platforms; everything is done at a remote location, and you have all of the functionality of working on a local computer.
Synchronization is another important feature of a cloud design, meant to enhance the computing experience of users, as it provides an additional back-up location for important files. Other examples besides Drive are Amazon Cloud Drive and hybrid services like SugarSync or Dropbox, which all have synchronization features but also store other application files. Another less known example is in gaming, where Steam provides Cloud features by saving and syncing saved games and other properties, photos and achievements.
Facebook Cloud Example
For small websites, the user database and the server software can be hosted on the same physical device; in other words, the only communication that happens is between the Internet and that server. But big networks are required to support the communication between the different servers that are needed to deliver the complete experience of a website like Facebook.
“Instead, a front-end web server handles the original HTTP request, and then fetches data from a number of different cache, database, and backend servers in order to render the final page. All of this additional communication must traverse the internal data center network.
In one measurement, a particular HTTP request required 88 cache lookups (648 KB), 35 database lookups (25.6 KB), and 392 backend remote procedure calls (257 KB), and took a total of 3.1 seconds for the page to completely load. This data has a number of uses, from building search indices to capacity planning to optimizing product behavior.
Fig. 1.2 shows Facebook’s current “4-post” data center network architecture. Each rack contains a rack switch (RSW) with up to forty-four 10G downlinks and four or eight 10G uplinks (typically 10:1 oversubscription), one to each cluster switch (CSW). A cluster is a group of four CSWs and the corresponding server racks and RSWs. Each CSW has four 40G uplinks (10G×4), one to each of four “FatCat” aggregation switches (typically 4:1 oversubscription). The four CSWs in each cluster are connected in an 80G protection ring (10G×8) and the four FC switches are connected in a 160G protection ring (10G×16). Intra-rack cables are SFP+ direct-attach copper; otherwise MMF is used (10GBASE-SR).
Fig. 1.2. – 4-post Data Center network architecture
The current 4-post architecture solved a number of issues Facebook had encountered in the past. For example, network failures used to be one of the primary causes of service outages. The additional redundancy in 4-post has made such outages rare. In another example, traffic that needed to cross between clusters used to traverse expensive router links.
The addition of the FC tier greatly reduced the traffic through such links. The main disadvantages of 4-post are all a direct result of using very large, modular CSW and FC switches.
First, a CSW failure reduces intra-cluster capacity to 75%; an FC failure reduces inter-cluster capacity to 75%. Second, the cluster size is dictated by the size of the CSW. This architecture results in a small number of very large clusters, making it more difficult to allocate resources among Facebook product groups.”
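To make these figures concrete, the following short Python sketch (illustrative only, not taken from the Facebook paper) reproduces the oversubscription and failure arithmetic quoted above:

# Illustrative arithmetic for the 4-post design described above (a sketch,
# using only the port counts quoted in the text).

def oversubscription(downlink_gbps, uplink_gbps):
    """Ratio of server-facing capacity to upstream capacity."""
    return downlink_gbps / uplink_gbps

# Rack switch (RSW): up to 44 x 10G downlinks, 4 x 10G uplinks.
rsw = oversubscription(44 * 10, 4 * 10)
print(f"RSW oversubscription: {rsw:.0f}:1")  # 11:1, roughly the 10:1 quoted

# Losing one of the four CSWs in a cluster leaves 3 of 4 uplink paths.
print(f"Intra-cluster capacity after one CSW failure: {3 / 4:.0%}")  # 75%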
2. Datacenter networks challenges
2.1. Introduction
A data center is a facility used to accommodate computer systems and components, such as telecommunications equipment and server storage systems. It has to include back-up systems for redundancy of power supplies and data communications services, environmental controls including, but not limited to, air conditioning and fire suppression, and various security devices. Large data centers are industrial-scale operations that can use as much electricity as a small town.
IT operations continuity is the most decisive aspect of the great majority of organizational operations around the world. The most important concept is therefore business continuity, meaning that companies rely on their information systems to run their operations. If an IT system becomes unavailable, company operations will be impaired or stopped completely, just as when the circulatory system of the human body stops functioning completely or partially. For this reason it is necessary to provide a reliable and protected infrastructure for IT operations, in order to minimize the probability of disruption. Information security is a must, so a data center has to offer a secure external and internal environment that minimizes the chances of a security breach.
For the above-mentioned reasons, starting with the physical level, it is necessary to implement, and also to prove the efficiency of, redundant mechanical cooling and power systems, emergency backup power generators serving the data center, and backup fiber optic cable connections.
Some guidelines for data center spaces within telecommunications networks, and environmental requirements for the equipment intended for installation in those spaces, are provided by standards such as Telcordia GR-3160, NEBS Requirements for Telecommunications Data Center Equipment and Spaces. All these criteria were developed jointly by Telcordia and industry representatives, and they may and should be applied to data center spaces housing data processing or Information Technology (IT) equipment. The equipment may be used to:
– Operate and manage a carrier's telecommunication network
– Provide data center based applications directly to the carrier's customers
– Provide hosted applications for a third party to provide services to their customers
– Provide a combination of these and similar data center applications
Effective data center operation requires a balanced investment in both the facility and the housed equipment, taking into consideration the cost of investment and the cost-benefit analysis. The first step is to establish a baseline facility environment suitable for equipment installation, in line with standardization and modularity, which can assure savings and efficiencies in the design or construction of telecommunications data centers.
Standardization is one of the most important aspects: integrated building and equipment engineering and design must follow the modularity principle, which has the benefits of scalability and easier growth, even when planning forecasts are less than optimal. For these reasons, telecommunications data centers should be planned in an iterative process, using repetitive building blocks of equipment and the associated power and support (conditioning) equipment where practical.
The use of dedicated centralized systems requires more accurate forecasts of future needs, to prevent expensive over-construction or, worse, under-construction that fails to meet even the immediate or near-future needs.
Ideally, the "lights-out" data center, also known as a darkened or dark data center, is a data center that has eliminated the need for direct access by personnel, except under extraordinary circumstances, and can therefore be operated without lighting. All of the devices are accessed and managed remotely in an unmanned manner, with automation programs used to perform unattended operations. In addition to the energy savings and the reduction in staffing costs, the ability to locate the site farther from population centers means that implementing a lights-out data center reduces the threat of malicious attacks upon the infrastructure.
As a consequence, nowadays there is a trend to modernize data centers in order to take advantage of the performance and energy-efficiency gains of newer IT equipment and capabilities, such as cloud computing. The process presented above is also known as data center transformation; it takes a step-by-step approach through integrated projects carried out over time, as opposed to the traditional method of data center upgrades, which takes a serial and separate approach. The typical projects within a data center transformation initiative include standardization/consolidation, virtualization, automation and security.
2.2. Network infrastructure
Communications in data centers today are most often based on networks running the IP protocol suite, with routers and switches transporting traffic between the servers and to the outside world. The redundancy of the Internet connection is often provided by using two or more upstream connections from two or more Internet Service Providers (ISPs), a method often called multi-homing.
Some of the servers at the data center are used for running the basic Internet and intranet services needed by internal users in the organization for day-to-day activities, for example e-mail servers, proxy servers, Active Directory domain controllers and DNS servers.
Network security elements are also usually deployed, including, but not limited to, firewalls, VPN gateways, intrusion detection systems, and so on. Monitoring systems for the network and some of the applications are also common, together with additional off-site monitoring systems that remain useful in the case of a failure of communications inside the Data Center.
With the above-mentioned aspects in mind, I will first present an overview of network design requirements and considerations for large-scale data centers, then traditional hierarchical data center network topologies in contrast with horizontally scaled-out Clos networks. These will be followed by arguments for selecting between BGP (Border Gateway Protocol) and OSPF (Open Shortest Path First) over a Clos topology as the most appropriate routing protocol to meet the considered requirements.
Today, many large-scale data centers host applications that generate significant amounts of server-to-server traffic that never leaves the data center, widely known as "east-west" traffic. Examples of such applications are compute clusters such as Hadoop, massive data replication between clusters needed by certain applications, and virtual machine migrations; these bandwidth demands require scaling "up" traditional tree topologies.
2.3. Classical datacenter design
Let us now compare the two most popular data center physical designs, “Top of Rack” (ToR) and “End of Row” (EoR). They are examined in this section and compared, in order to better understand the infrastructure that has to be interconnected to the Internet.
In order for Layer 2 connectivity to function well and still be highly scalable, clustering and virtualization solutions have to be used. VMware ESX Server does quite well on the virtualization side, while for clustering, Microsoft Cluster Service is used. Although these systems allow for a highly scalable Layer 2 network, data centers and organizations are still shifting to a more centralized, easy-to-manage and fast-converging Layer 3 model.
Some DCs and other sites are acting on this because advances in technology are shifting loop management away from the Spanning Tree Protocol, which also changes the technologies used to manage Layer 2 network topologies. The shift from STP can be seen in Fig. 2.1 below.
Fig. 2.1. – STP to L2MP/TRILL evolution
Even though certain improvements have been made to STP, such as RSTP, in order to allow a Layer 2 network to converge faster and reduce the delay and overhead traffic caused by the rediscovery process, the delay introduced in failure cases is still too great, and no available Layer 2 solution that uses STP for loop management fully solves the problem.
Another enhancement for Layer 2 topologies was made by configuring multiple links between the same devices to act as a single logical link. This technology, standardized as IEEE 802.3ad link aggregation and known in Cisco terminology as PortChannel, has many benefits. On top of managing the loop problem by decreasing the number of logical ports that STP has to manage, it equalizes traffic load on the links in the Ethernet bundle by forwarding traffic using a load-balancing algorithm.
If any link in the bundle fails, the failure can be handled quite fast, with close to no traffic loss and little effect on the STP convergence process. These advances make Layer 2 topologies more attractive than in the past, but at Layer 3 the operator has a lot more flexibility in network design and functionality.
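As an illustration of the load-balancing idea mentioned above, here is a minimal Python sketch (not any vendor's actual algorithm; real switches hash in hardware and the fields used vary per vendor) showing how a flow hash keeps each flow on one member link while spreading different flows across the bundle:

import zlib

BUNDLE_LINKS = 4  # physical links aggregated into one logical link

def pick_link(src_mac, dst_mac, src_ip, dst_ip):
    """Map a flow to one member link; the same flow always gets the same link."""
    key = f"{src_mac}{dst_mac}{src_ip}{dst_ip}".encode()
    return zlib.crc32(key) % BUNDLE_LINKS

# Different flows may land on different links; one flow never reorders.
print(pick_link("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", "10.0.0.1", "10.0.0.2"))
print(pick_link("aa:bb:cc:00:00:03", "aa:bb:cc:00:00:04", "10.0.0.3", "10.0.0.4"))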
2.3.1. Top of Rack Design
In the Top of Rack (ToR) design, depicted in Fig. 2.2, servers connect to one or two Ethernet switches installed inside the rack. The term “top of rack” has been adopted for this design; however, the actual physical location of the switches does not necessarily need to be at the top of the rack.
Obviously, the switch can be located not only at the top but also in the middle or at the bottom of the rack, even though the first is the preferred place, owing to its advantages for accessibility and cable management. As all the corresponding cables for one rack stay within it, this design is often referred to as In-Rack cabling. In practice, the “cabling” consists of relatively short RJ45 patch cables from the servers to the rack Ethernet switch.
This switch links the rack to the data center network with high-speed, high-capacity optical fiber cables running directly from the rack to a common aggregation area. The “Distribution” or “Aggregation” area consists of high-density modular Ethernet switches connected in a redundant configuration, whose main role is connecting all data center equipment.
Fig. 2.2. – Top of Rack (ToR) design
With such large quantities of equipment, optical connections spare a lot of expensive copper cabling infrastructure. Another representative advantage of using an optical fiber infrastructure to connect the DC equipment is directly related to the distance and bandwidth limitations of copper twisted-pair cable. Last but not least, fiber limits direct electrical ground connections between different network elements in the DC, which makes design and implementation, but also operations and maintenance, safe from electrical shock. Even though the design has to provide sufficient space for cabling no matter which solution is adopted, optical fiber also saves money by reducing the footprint of cable trays, obstructs the airflow that cools the equipment less, and is easier to lay than bulky copper cable.
The ToR data center design avoids these issues because there is no need for a large copper cabling infrastructure, which is often the key factor in choosing it over End of Row.
From the network management point of view, each rack can be treated and managed as an individual, modular unit within the data center. It is therefore very easy to change or upgrade the server access technology rack by rack, and any network upgrades or issues with the rack switches will generally only affect the servers within that rack, not an entire row of servers. In terms of cable type and connection speed, ToR offers more flexibility and options, given that the servers connect with very short copper cables within the rack. For example, a 10GBASE-CX1 copper cable could be used to provide a low-cost, low-power, 10 Gigabit server connection over distances of up to 7 meters, which works fine for a Top of Rack design.
On top of this, fiber provides much greater bandwidth at longer distances than copper could ever accomplish. Further enhancements can be made easily without a complex rethinking of the whole design because of the flexibility in bandwidth that fiber has over copper.
Given the current power challenges of 10 Gigabit over twisted-pair copper (10GBASE-T), any future support of 40 or 100 Gigabit on twisted pair will likely have very short distance limitations (in-rack distances), whereas future transitions to 40 Gigabit and 100 Gigabit network connectivity will be easily supported on a fiber infrastructure. This is the second important key factor for selecting the ToR design over EoR.
Fig. 2.3 – “Top of Rack” design
Integrated switch modules inside blade servers have made the adoption of fiber connections for racks more appealing by taking the fiber directly inside the blade enclosure, thus moving the “Top of Rack” concept to the switches inside. This can be seen in Fig. 2.3, where each blade server enclosure can contain two or more Ethernet switches and FC switches. This design increases the bandwidth for each server at the cost of having more switches to manage.
With each rack switch being a unique control-plane instance that must be managed, the management domain grows; this is one significant drawback of the Top of Rack design in a large data center with many racks. A Top of Rack design can quickly become a management burden by adding many switches to the data center that are each individually managed. For example, in a data center with 40 racks, where each rack contains two “ToR” switches, the result is 80 switches on the floor just to provide server access connections, not counting distribution and core switches. That means 80 copies of switch software that need to be updated, 80 configuration files that need to be created and archived, and 80 different switches participating in the Layer 2 spanning tree topology; in other words, 80 different places where something can go wrong.
If a ToR switch fails, the technician replacing the switch needs to know how to properly access and restore the archived configuration of the failed equipment, assuming it was correctly and recently archived. The individual may also be required to perform some verification testing and troubleshooting, which requires a higher skill set that may not always be available, especially in a remotely hosted “lights-out” facility, or that will come at a high price.
Talking of scalability, the ToR design leads to higher port densities in the aggregation switches: as in the 80-switch example above, with each switch having a single connection to each redundant aggregation switch, 80 ports are needed on each aggregation switch. The more ports you have in the aggregation switches, the more likely you are to face potential scalability constraints. One of these constraints might be, for example, STP logical ports, which are the product of aggregation ports and VLANs.
For example, if supporting 100 VLANs in a single L2 domain with PVST on all 80 ports of the aggregation switches is needed, that would result in 8,000 STP logical ports per aggregation switch. Only the most robust modular switches can handle this number.
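A quick back-of-the-envelope check of that figure (a Python sketch using only the numbers from this example):

# With per-VLAN spanning tree (PVST), every (port, VLAN) pair is one
# logical port the switch control plane must maintain.
tor_uplink_ports = 80      # one port per ToR switch on the aggregation switch
vlans = 100                # VLANs carried in the single L2 domain

stp_logical_ports = tor_uplink_ports * vlans
print(stp_logical_ports)   # 8000 STP logical ports per aggregation switch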
As the data center grows in numbers of ports and VLANs, another possible scalability constraint is raw physical ports, that is, whether the aggregation switch has enough capacity to support all of the top-of-rack switches. Regarding support for 10 Gigabit connections to each top-of-rack switch, one must know how well (and whether) the aggregation switch scales in terms of 10 Gigabit ports. Nevertheless, this is something that will need attention.
Top of Rack advantages:
We don’t need to use copper outside the rack, so we avoid a complex cabling infrastructure
Easier overall cable management, with only a small part of the infrastructure dedicated to cabling, which in turn reduces costs
Upgrades per rack are easier, offering enhanced modularity and flexibility
The fiber infrastructure permits future upgrades to 40G and 100G
Short copper cables suffer less interference because of the short distances involved and because they are shielded from the outside by the rack casing; this permits a high-speed, low-delay structure
Top of Rack disadvantages:
This design uses more switches and requires more aggregation ports from the upper layer
Server-to-server traffic happens at Layer 2, and because of this we may have issues concerning the scalability of STP logical ports
All racks are connected at Layer 2 and participate in STP, which implies more STP instances that put overhead on the network and are harder to manage
2.3.2. End of Row Design (EoR)
Server racks are typically lined up side by side in a row, with each row containing, for example, 12 server cabinets. The term “End of Row” (EoR) was adopted to describe an extra rack or cabinet placed at each end of the “server row” for the exclusive purpose of providing network connectivity to the servers within that row.
Fig. 2.4 – Redundancy for “EoR” design
Bundles of twisted-pair cables (UTP Cat6, Cat6A) are routed to each server cabinet. Some “End of Row” designs route around 48 individual copper cables to each cabinet. It is not a rule that the “EoR” rack has to be placed at the end of an actual row. As long as some network racks, located at any position in the design, provide copper cabling to more than one row of servers, the “EoR” design is still respected. An example of such a design can be seen in Fig. 2.4.
Redundancy is important at this level in the network as well, and sometimes more than one bundle of Ethernet cables can be run between server racks and different End of Row network racks.
Patch panels at the top of the server cabinets terminate the bundles of copper cables routed to each server cabinet, so that the individual servers only need a relatively short RJ45 copper patch cable to connect from the server to the patch panel in the rack.
The bundle of copper from each rack can be routed through overhead cable troughs or “ladder racks” that carry the dense copper bundles to the “EoR” network racks. An alternative solution is to route the copper bundles underneath a raised floor, at the expense of obstructing cool air flow, if the raised floor is part of the cooling system implementation rather than being there for earthquake protection. This is an important matter because, depending on how much copper is required, a supplementary rack dedicated to patching all of the copper cable may be needed adjacent to the rack that contains the “EoR” network equipment. This means that at the end of the row, one network rack will be used for the network switch and another network rack will be used for the cable patches.
The server connection to the switch is made by cabling a link from the server to the patch panel and another link from the switch to the corresponding patch panel port that establishes the link to the server. When there are many servers in the rack, the large number of RJ45 patch cables can cause management issues, as they take up a lot of space. Without careful planning, this can quickly result in an ugly, unmanageable mess.
A way to reduce extra cable length is a variation of “EoR” referred to as “Middle of Row”, which places the aggregation racks in pairs in the middle of the row, as in Fig. 2.5. This solves the problem of long cables but creates another one: a single point of failure in the middle that can disrupt all the servers in the design at the same time.
Fig. 2.5.– Middle of Row Design
The “EoR” network switch is basically a modular chassis platform that supports hundreds of server connections, with typically redundant supervisor engines, redundant power supplies, and overall better high-availability characteristics than are usually found in a “ToR” switch; a modular EoR switch is expected to have a life span of at least 5 to 7 years (or even longer). It is uncommon for the end-of-row switch to be replaced frequently; once in service, “it’s in”, and any further upgrades are usually component-level upgrades such as new line cards or supervisor engines.
The EoR switch provides connectivity to the hundreds of servers within its row; unlike ToR, where each rack is its own managed unit, EoR makes the entire row of servers behave like one holistic unit or “Pod” within the data center. Any issue or network upgrade at the EoR switch can impact the entire row of servers. In this respect, a data center network implemented under the EoR design is always managed “per row” rather than “per rack”.
From the point of view of link topology, on one hand the ToR design extends the Layer 2 topology from the aggregation switch to each individual rack, resulting in an overall larger Layer 2 footprint and consequently a larger spanning tree topology; on the other hand, the EoR design extends the Layer 1 cabling topology from the “End of Row” switch to each rack, resulting in a smaller, more manageable Layer 2 footprint and fewer STP nodes.
As seen before, EoR is a “per row” management model in terms of the data center cabling and also in terms of the network management model. Given that there are usually two modular switches per row of servers, the result is far fewer switches to manage when compared to a Top of Rack design. In the previous example of 40 racks, let’s assume there are 10 racks per row, which gives 4 rows, each with two “End of Row” switches. The result is 8 switches to manage, rather than 80 in the Top of Rack design; the EoR design typically carries an advantage of an order of magnitude over the ToR design in terms of the number of individual switches requiring management. This is often a key factor why the End of Row design is selected over Top of Rack.
To continue the discussion started in the Top of Rack section, beyond the advantages of optical fiber already mentioned, at least two more are worth highlighting, connected to power consumption and to network virtualization. For example, a 10 Gigabit server connection over twisted-pair copper (10GBASE-T) is challenging due to the power requirements of 6-8 W per end with currently available silicon. Secondly, the adoption of dense compute platforms and virtualization quickly saturates servers limited to 1GE network I/O connections, which is a new challenge for wider-scale consolidation and virtualization.
All of this holds until the upcoming 100G technologies and beyond are in place.
End of Row advantages:
There are fewer switches to manage, which lowers the cost of management and maintenance.
Switches at a higher level in the aggregation design can have a lower port density because fewer ports are needed.
There is only one STP node per row rather than multiple STP instances for the equipment inside each rack, since the connection between the servers and the EoR switch is made at Layer 1
A modular, high-performance, high-availability platform is provided across servers.
End of Row disadvantages:
A copper cabling backbone is required for each rack, which increases cost and management challenges
More space and additional devices are needed for cable management and for patching servers
High-speed servers are bottlenecked because long twisted-pair copper cables are limited by higher power requirements and the need for interference shielding
Any change in a rack affects all the equipment in the row, making this less flexible than the Top of Rack architecture; there is also little room for future improvement, and the challenges are increased.
3. Clos network solution in the context of the challenges presented before
3.1. Overview of Clos
As we explained earlier, the cloud is not actually one big location but rather an interconnection of smaller server farms and equipment all tied together over the Internet. Each of these locations may be connected in some particular way to the Internet, but they all share some fundamental features.
Now, we may speak about what actually connects the Cloud to the Internet, the underlying network topology that acts as a backbone between the outside (Internet) and what is located inside (user data, services, datacenters – the cloud).
This stage of the network is as important as what is stored on the servers, as the cloud itself, and its design must be considered from all perspectives. A failure here means that anything behind the actual network will be inaccessible to the user, and this must be avoided at all costs.
When we analyze such a network, a few key features stand out from the rest. We will describe these features briefly at the beginning of this chapter and, as we come to understand the network itself and advance through the chapter, we will provide a more in-depth analysis.
High availability is one of these features; in other terms, downtime must be close to zero at this stage. This means that the equipment used should not require maintenance beyond the usual software upgrades or configuration changes.
Imagine what would happen if all the servers at Google were to go down at once.
In order to achieve high availability, we can design the network with failover in mind. This is called redundancy, and it can be implemented at link level (links between switches and routers) or at equipment level (a parallel switch or router working in tandem with the active one and taking over when there is a problem).
A good design will take both solutions into account and will take advantage of the easiest implementation, because having back-ups for each and every piece of equipment and every link can become costly. Costs can be reduced by analyzing what is called a Single Point of Failure and adding back-ups only at that specific point.
A single point of failure exists when, in a network, there is a single point-to-point path to a specific destination. Take for example two switches connected by a single link, with no secondary route between them (direct or through other equipment). If that link were to fail, anything behind or connected to the second switch would become unreachable. We can solve this with a redundant back-up link or by adding another switch that is connected to both switches.
Another feature that must be fulfilled and kept in mind when designing the network is that high speeds are necessary in order to satisfy all the network communication that takes place between the aggregated servers and the users on the Internet.
In other words, the network must be non-blocking between its inputs and outputs and must accept new incoming connections quickly without bottlenecking. In modern networks, non-blocking actually means that a speed must be guaranteed between input and output, because all the links and the processing power of the equipment are shared among all the traffic.
For the purpose of this topic we will study a branch of Clos networks that could become widely used again, as control moves from Layer 2 to Layer 3 (in the OSI reference model) and as fiber optic links are used more and more.
Charles Clos started his work at Bell Labs, mainly focusing on finding a way to switch telephone calls in a scalable and cost-effective manner. In 1953, he published a paper named “A Study of Non-Blocking Switching Networks”, in which he described how telephone calls could be switched by equipment using multiple stages of interconnection that allow the calls to be completed.
The switching points in the topology are called crossbar switches. Clos networks were designed as a “three-stage” architecture, consisting of an ingress stage, a middle stage, and an egress stage.
Clos networks, also called multistage switching network topologies, provide alternate paths between inputs and outputs; in this way they make it possible to minimize or eliminate the blocking that can otherwise occur in such networks. Clos made a systematic study of switching system performance, a field that developed into the basis of Clos networks, which remain active and of continuing practical importance.
These generalizations have extended the application of Clos networks well beyond their original technological context and have led to a number of interesting new results, especially in connection with systems that support multicast communication and are of importance in the case of cloud networking as well.
In order to achieve high capacity and speed together with resiliency in case of failure, Clos networks are used nowadays in modern data center networking architectures. As the model has proven its efficiency over many years, it has become a key architectural design for data center networking, re-emerging with new applicability.
Clos networks are required when the physical circuit switching needs to exceed the capacity of the largest feasible single crossbar switch. The key advantage of Clos networks is that the number of cross points (which make up each crossbar switch) required can be far fewer than would be the case if the entire switching system were implemented with one large crossbar switch. In complex data centers with huge interconnect structures, each having many optical fiber links, this becomes a very important issue.
Classic Clos networks have three stages: the ingress stage, middle stage, and the egress stage but it can be folded so we are left with only two stages. Each stage is made up of a number of crossbar switches (Fig. 3.1.), often just called crossbars. Each call entering an ingress crossbar switch can be routed through any of the available middle stage crossbar switches, to the relevant egress crossbar switch. A middle stage crossbar is available for a particular new call if both, the link connecting the ingress switch to the middle stage switch, and the link connecting the middle stage switch to the egress switch, are free.
Fig 3.1. – 3-stage clos network
“The advantage of such network is that connection between a large number of input and output ports can be made by using only small-sized switches. A bipartite matching between the ports can be made by configuring the switches in all stages. In Fig. 3.1., n represents the number of sources which feed into each of the m ingress stage crossbar switches. As can be seen, there is exactly one connection between each ingress stage switch and each middle stage switch. And each middle stage switch is connected exactly once to each egress stage switch.”
It can be shown that with k ≥ n, the Clos network can be made non-blocking like a crossbar switch. That is, for each input-output matching we can find an arrangement of paths connecting the inputs and outputs through the middle-stage switches. The following theorem shows that, for adding a new connection, there is no need to rearrange the existing connections as long as the number of middle-stage switches is large enough.
Clos Theorem:
Fig. 3.2. – Basic input output structure
A Clos network is strictly non-blocking for circuit switching if and only if the number of second-stage switch modules satisfies k ≥ 2n − 1.
Proof: With reference to Fig. 3.2, suppose that an input link of a first-stage switch module I asks for a connection to an output link of a third-stage switch module O. In the worst case, the other (n−1) input links of I are active and they use up (n−1) outgoing links of I, and the other (n−1) output links of O are active and they use up (n−1) incoming links of O. Furthermore, in the worst case, none of the (n−1) outputs of I and the (n−1) inputs of O are attached to a common second-stage switch module. In other words, at most 2(n−1) middle-stage paths cannot be used for the new request. So, to be strictly non-blocking, we must have k ≥ 2(n−1) + 1 = 2n − 1.
This means that at least one of the middle-stage modules is available for setting up the new path.
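The worst case in the proof can be verified with a tiny Python sketch (illustrative, with n chosen arbitrarily):

# Worst case from the proof: the (n-1) other inputs of ingress module I and
# the (n-1) other outputs of egress module O occupy disjoint middle switches.
n = 4                                    # input links per first-stage module
k = 2 * n - 1                            # middle-stage modules (Clos bound)

busy_toward_I = set(range(n - 1))                  # middle switches used by I
busy_toward_O = set(range(n - 1, 2 * (n - 1)))     # disjoint set used by O

free = set(range(k)) - busy_toward_I - busy_toward_O
print(free)                              # exactly one module left: {2n - 2}
assert len(free) == 1                    # the new call can always be set up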
For Clos networks with broadband systems, however, the basic hypotheses related to circuit switching have to be changed. In multi-rate systems, each connection induces a load on the network which depends on its bandwidth characteristics. Many connections may share a common physical link, provided the sum of their weights does not exceed one, since exceeding it would mean that the utilization of that link is higher than one.
External blocking is independent of the switch architecture and can be solved only by properly sizing the trunk capacities between switching centers. While it can be excluded rather easily in the analytical study of the preceding sections, separating it from internal blocking in simulation, without affecting the targeted bandwidth and fan-out distributions, requires more thought.
The need for data services increased as time passed, together with the associated “Quality of Service” implementation requirements, so big data center networks started to use the “fat tree” model of connectivity more and more, by implementing the core - distribution - access architecture. Link speeds had to go higher and higher towards the core level, to prevent oversubscription. As an example, the access links to servers or desktops might historically have been 100 Mbps Fast Ethernet links, the uplinks to the distribution switches are normally 1 Gbps Ethernet links these days, and the uplinks to the core might be designed as 4×1 Gbps port channels.
Fig. 3.3. – Spine/Leaf/Aggregation
Inside the traditional networks depicted in Fig. 3.3, built using the spanning tree protocol or Layer 3 routed core networks, a single "best path" is chosen from a set of multiple paths, so all data traffic takes that "best path" until the path reaches its maximum capacity, becomes congested, and finally packets are dropped. The alternative paths are not utilized because the corresponding topology algorithm classifies them as less desirable, or even removes them to prevent loops from forming. The desirable goal is therefore to migrate away from spanning tree while still maintaining a loop-free topology, with full utilization of all the redundant links. If we could use a method of Equal-Cost Multi-Path (ECMP) routing, performance could increase and the network would have better resiliency in the event of a single switch or link failure.
We have seen Clos networks making their reappearance in modern data center switching topologies, manifested in the way the switches are interconnected, with the notable aspect that the stages comprise top-of-rack switches and core switches instead of a fabric within a single device.
The data network architecture, as seen in the Spine-Leaf model, comprises top-of-rack (ToR) switches that represent the leaf level, or the ground floor, attached to core switches that represent the spine, or the top floor.
The top-floor switches are not connected to each other; all the top-floor switches are connected only to the ground-floor switches, each such connection forming an upstream core device link. By adopting this kind of architecture, the number of uplinks from each ground-floor switch equals the number of top-floor switches. In the same manner, the number of downlinks from each top-floor switch equals the number of ground-floor switches. Therefore, the total number of connections is the number of ground-floor switches multiplied by the number of top-floor switches, which is 8 × 4 = 32 links, as shown in the diagram below.
Fig. 3.4. – Spine and Leaf
In this Clos topology (Fig. 3.4), every ground-level or lower-tier switch is connected to each of the top-floor or top-tier switches in a full-mesh topology. As long as there is no oversubscription between the ground-floor (lower-tier) switches and their corresponding uplinks to the top floor, we have a non-blocking architecture.
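The link count and the non-blocking condition can be expressed in a few lines of Python (a sketch; the per-leaf server port count is an assumption chosen for illustration):

leaves = 8                   # ground-floor (leaf) switches
spines = 4                   # top-floor (spine) switches

print(leaves * spines)       # 8 x 4 = 32 links in the full mesh

# A leaf is non-blocking when its server-facing capacity does not exceed
# its uplink capacity (assuming equal port speeds everywhere).
server_ports_per_leaf = 4    # assumed value, one per spine uplink
print(server_ports_per_leaf / spines)   # 1.0 -> no oversubscription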
The architecture presented above is called a “folded Clos network”, named after its inventor. In terms of data center equipment, traffic flows over the paths exactly as depicted in Fig. 3.4, which is the usual way the network equipment is represented. It should be mentioned that, in terms of notation, the ground floor or leaf level is often called the Tier 2 devices, and the top floor or spine level components the Tier 1 devices.
In the real world, there is practically another layer added to the design, which consists of Tier 3 ToR (Top of Rack) switches for aggregating the actual servers. The set of devices directly connecting Tier 2 and Tier 3, along with their attached servers, is usually referred to as a “cluster”.
A simple cost-benefit analysis marks the advantage of the Clos network as being the use of a set of identical and inexpensive devices to create the network tree, together with the major characteristics of high performance and resilience. In order to avoid a “preferred path” for any uplink being artificially created for any physical reason, the path is randomly chosen so that the traffic load is evenly distributed between the top-floor switches. From the point of view of resilience, if one of the top-floor switches fails, the incident only slightly degrades performance across the whole data center network. Such performance implemented in a data network without using a Clos architecture would otherwise cost much more to construct.
Many examples of Clos networks are provided by the data center fabric architectures proposed by switch manufacturers. Some of them are cited below.
“Transparent Interconnect of Lots of Links (TRILL) is a layer-2 data center protocol that creates flat networks on top of a layer-3 routed network for the purposes of simplified server networking. TRILL allows multiple paths to be used in a redundant Clos network architecture and removes the need for the spanning tree protocol and its blocked alternative links. Many vendors have implemented their own versions of TRILL.
Cisco's implementation of FabricPath is an extension of the TRILL standard. Cisco data center switches like Nexus 7000 switches are connected in a Clos network to Nexus 5000 and/or Nexus 2000 switches and FabricPath can be run within that data center and to connect to other data centers.
Juniper's QFabric System is actually not TRILL-based, but instead utilizes an interior fabric protocol developed by Juniper that is based on RFC 1142, otherwise known as the IS-IS routing protocol. QFabric Nodes are interconnected to form a fabric that can utilize multiple redundant uplinks for greater performance and reliability.
Brocade Virtual Cluster Switching (VCS) Fabric is their implementation of the TRILL standard that allows for a Clos network topology to utilize multiple links.
Arista's Spline architecture, depicted in Fig. 3.5, is where the terms spine and leaf are combined into a new word that represents a collapsed architecture using a single tier. At first, you may think that the term "Spline" is related to a method of connecting mechanical parts or some type of mathematics, but this is what it represents:”
Fig. 3.5. – Arista Spline architecture
Summary
Those who have been in the IT and networking industry for more than a decade have probably observed different concepts being born, evolving, peaking, dying, and then coming alive again in some new technology. There were, among others, Token Ring networks reformed into FDDI, which then reappeared in Ethernet ring topologies. The same pulse could be felt when centralized mainframes evolved into distributed computing, and again when server consolidation and virtualization brought computing back to centralized data centers and, nowadays, into cloud computing.
The cited examples lead to the conclusion that enduring concepts, as Clos networks are, will undoubtedly be seen again and again, perhaps over generations, in the evolution of networking technologies.
3.2. Fat-Tree Design
3.2.1. Introduction
A network topology that uses switches has as its main goal connecting a large number of endpoints (processors or servers) by using switches that only have a limited number of ports. Network engineers must cleverly connect the switching elements to form a topology, often called a switched fabric network, that is able to interconnect an impressive number of endpoints.
The Fat-Tree network topology, proposed by Charles Leiserson in 1985, comes as a natural answer to the above-mentioned challenge. Such a network topology looks like a tree, with the processors connected to the bottom layer. The distinctive feature of a fat-tree is that, for any switch, the number of links going down to its lower-level users is equal to the number of links going up to its corresponding provider in the upper level. Therefore, the links get “fatter” towards the top of the tree, and the switch at the root of the tree has the most links compared to any other switch below it:
Fig. 3.6. – Fat-Tree. Circles at the bottom represent endpoints, and the squares are switches.
In a fat-tree, we use bouncing switches to do the routing. When server i sends a packet to server j, the data packet first goes up to a spine switch r (the up-path) and then travels down to the destination server (the down-path). The spine switch is considered a bouncing switch, since it bounces the packet from server i to server j. The fat-tree has a nice property: given a bouncing switch, there is one and only one path from server i to server j. For example, in the fat-tree shown in Fig. 3.6, given the bouncing switch 3.0, the green links show the uniquely determined path from server 0 to server 15.
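This uniqueness is easy to see in code. The toy Python model below (hypothetical numbering, not the exact labels of Fig. 3.6) represents a two-level fat-tree in which every edge switch links to every spine; once the bouncing spine is fixed, the path is fully determined:

SERVERS_PER_EDGE = 4

def edge_of(server):
    """Edge switch a server is attached to."""
    return server // SERVERS_PER_EDGE

def path(src, dst, spine):
    """The unique up/down path from src to dst via the chosen bouncing spine."""
    if edge_of(src) == edge_of(dst):
        return ["edge%d" % edge_of(src)]          # same edge: no bounce needed
    return ["edge%d" % edge_of(src), "spine%d" % spine, "edge%d" % edge_of(dst)]

print(path(0, 15, 3))   # ['edge0', 'spine3', 'edge3']: one path per spine choice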
This set-up is particularly useful for networks-on-chip, specifically in SoC (System-on-Chip) designs. However, enterprise networks that connect servers use commodity (off-the-shelf) switches, which have a fixed number of ports. Hence the design in Fig. 3.6, where the number of ports varies from switch to switch, is not very usable. Therefore, alternative topologies were proposed that can efficiently utilize existing switches with their fixed number of ports.
There is some controversy over whether such topologies should be called fat-trees or rather “(folded) Clos networks”. However, the term fat-tree is widely used to describe such topologies. An example of this topology is given below:
Fig. 3.7. – Two-level fat-tree network. The network is built with 36-port switches on both levels.
A total of 60 nodes are currently connected
In Fig. 3.7 above, a two-level fat-tree network is represented, with the lower and upper levels referred to as the edge and core layers respectively. At both layers, identical switches with 36 ports are used, with each of the four switches on the edge level having 18 ports dedicated to connecting servers (the small circles at the bottom of the picture) and the other 18 ports of each edge switch connected to the core layer. As shown, two bundles of 9 links (represented by thick lines) connect each edge switch to the two core switches.
Two servers connected to the same edge switch are able to communicate with each other via that edge switch, without involving the core level. In case the two servers are connected to different edge switches, the packets will travel up to any of the core switches and then down to the target edge switch.
As seen, every edge switch maintains the distinctive property of the original fat-tree network: the number of links that go to its users (the servers) is equal to the number of links that go to its providers. The difference from the original fat-tree is that the intermediate switch has several providers (in this case, two), whereas in Fig. 3.6 each switch only has one. That is the source of the terminology controversy described above.
3.2.2. Blocking
As already mentioned, the defining property of fat-tree networks is that each switch on an intermediate level has the same number of links going to the upper level (in two-level networks, from the edge level to the core level) as going to the lower level (in this example, from the edge level down to the nodes). With the uplinks and downlinks of the intermediate switches in a 1:1 proportion, such networks are called non-blocking: the nodes can be divided arbitrarily into pairs, and even if all pairs start communicating simultaneously, every pair can still communicate at full link bandwidth.
It is important to mention that in packet-switching networks the non-blocking property holds in a statistical sense, whereas in circuit-switching networks a separate path can be designated for every communicating pair, with enough paths for all pairs.
Downlink and uplink ports can also be distributed in a different proportion: for example, twice as many ports can go down as go up (2:1), which corresponds to a blocking factor of 2.
For packet-switching networks, a blocking network means that two packets that would otherwise follow separate paths (if they were available) must be queued on the only existing physical path, one of them having to wait. This introduces additional latency, which is never good for the speed of parallel computing.
Blocking is accepted only for lack of resources, as it allows implementing somewhat cheaper networks with fewer switches. The saving is limited, however: as the blocking factor increases in a geometric progression, the cost of the network decreases only in an arithmetic progression. The other use case for blocking networks is to connect more nodes with the same hardware than a non-blocking network could sustain.
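To make this trade-off concrete, the following minimal sketch in Python (our illustration, assuming two-level fat-trees built from fixed-radix switches, every edge switch connected to every core switch by exactly one link, and a radix divisible by the blocking factor plus one) sizes a design for a given blocking factor:
# Sketch: size a two-level fat-tree of fixed-radix switches with a
# downlink:uplink ratio of blocking:1 at every edge switch.
def two_level_design(radix, blocking=1):
    down = radix * blocking // (blocking + 1)  # server-facing ports per edge switch
    up = radix // (blocking + 1)               # core-facing ports per edge switch
    edge_switches = radix                      # each core switch offers one port per edge switch
    core_switches = up                         # one core switch per uplink of an edge switch
    return edge_switches * down, edge_switches, core_switches

for b in (1, 2, 3):
    servers, edges, cores = two_level_design(36, b)
    print(f"blocking {b}:1 -> {servers} servers, {edges + cores} switches")
With 36-port switches this prints 648 servers from 54 switches at 1:1, 864 servers from 48 switches at 2:1 and 972 servers from 45 switches at 3:1, illustrating that tripling the blocking factor shaves only a sixth off the switch count.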
3.2.3. Resistance to failure
The core switching level is redundant: if one core switch stops functioning, the others still carry traffic, so the network runs in a degraded (blocking) mode, but all nodes remain reachable.
If an edge switch (or a line board of a modular switch) fails, however, all nodes connected to its ports become unable to communicate. To really achieve the ultimate level of fault tolerance, we can equip all compute nodes with dual network interfaces and connect each interface to a separate switch (or at least to a separate line board); this is sometimes referred to as a "dual-plane connection". It can also provide more bandwidth to the compute nodes, which is especially helpful when they use modern multi-core CPUs.
3.2.4. Designing the network
The most challenging and interesting part of networking is the design process itself. To start, we discuss two very common cases. The first is the star topology network, where building a fat-tree network explicitly is to be avoided because one already exists implicitly inside the single big switch.
As an example, we could build a 648-port fat-tree network out of 54 36-port switches, but this solution is not really useful: it incurs the cost of 54 switches instead of buying a single 648-port switch (which internally also implements a two-level fat-tree). Furthermore, one big switch is easier to manage than 54 small ones, and the wiring pattern is very simple: place the switch in the geometric center of the room and draw the corresponding cables from every rack to it. All cabling between the edge and core levels is now inside the switch and is therefore very reliable; on top of that, we save on cable costs.
Also, if we initially plan to have, for example, 300 nodes in a cluster, the modular switch is easier to expand, since we only have to add the required line boards. With a "hand-made" fat-tree network, we must carefully design the core level to be big enough to accommodate future expansion; otherwise we will need to rewire a lot of cables, which again wastes resources and invites errors.
In conclusion, when the cluster is small enough to be accommodated by a single switch, the star topology is the preferred one, despite the fact that a single big switch is more expensive than building the same network manually from smaller switches; convenience, expandability and manageability considerations all suggest following this path. The current industrial rule of thumb is to use the smallest "monolithic" switches as edge switches, also referred to as "top-of-rack" switches.
3.2.5. Dealing with link bundles
A non-uniform distribution of links, presented in Fig. 3.8., can look like this:
Fig. 3.8. – Non-uniform distribution of links among core switches.
This wiring is uneven, and the two core switches are also unevenly loaded with packet processing.
Instead, by reconfiguring the first wiring alternative, we can distribute the links more uniformly, using a bundle of 3 links to the first core switch and a bundle of 2 links to the second. The result is presented in Fig. 3.9:
Fig. 3.9. – More uniform distribution of links.
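A trivial round-robin assignment, sketched below under the assumption that each edge switch has five uplinks and the fabric has two core switches, reproduces the 3+2 arrangement of Fig. 3.9:
# Distribute an odd number of uplinks across core switches as evenly
# as possible (round-robin); 5 uplinks over 2 cores yields bundles of 3 and 2.
uplinks, cores = 5, 2
bundles = [0] * cores
for link in range(uplinks):
    bundles[link % cores] += 1
print(bundles)  # [3, 2]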
3.2.6. Designing for expandability and design constraints
Network expansion is another challenge in design and network engineering work: in the first phase it requires more edge switches and, what is worse, sooner or later it requires more core switches as well. As the wiring between the layers is complex, network expansion implies rewiring a lot of connections, again a waste of resources and a source of errors in a costly and error-prone process.
Fortunately, there is a better way to cope with this issue: design the core level of the network from the very beginning so that it can accommodate the largest future expansion.
That means the core level must have enough spare ports, which implies initially procuring and installing more core switches than a non-expandable network would need; the result is low core-level port utilization at first. Edge switches, however, are procured in just the proper amounts and are not reserved for future use. At every stage of expansion more edge switches can simply be added: the existing cabling does not require rewiring, and only new cables need to be laid in place.
There are also other constraints that affect the course of the design process. The most obvious one is the cost of the network. Sometimes rack space is not readily available and cannot exceed a certain number of rack-mount units; specifying the maximum size of the network equipment as a constraint then weeds out unsuitable configurations that cannot fit in the allotted space, even if they are less expensive than the final configuration.
3.3. Load-Balancing
3.3.1. Traffic Engineering
Nowadays, data center networks are characterized by multiple paths between each pair of servers, a feature designed to improve the scalability and cost-effectiveness of existing data center networks. Many interesting solutions have therefore been proposed with regard to the protocols chosen for forwarding packets between servers. Data center network architectures can be classified into three categories: switch-only, server-only and hybrid. In a switch-only architecture, switches are the only network nodes responsible for packet forwarding.
Among the three categories, the switch-only architecture is the most widely deployed. This is likely because data center operators have accumulated extensive experience with it, mainly by using Ethernet switches, and because switches tend to be more reliable than servers. Under this architecture, the most critical function of a future-proof design has proven to be traffic balancing, usually known as load balancing.
In any data center, load balancing is a critical function performed by network devices. Traditionally, load-balancers were deployed as dedicated devices in the traffic forwarding path; because of the problems that arise when scaling such devices under growing traffic demand, the preferred solution is to scale the load-balancing layer horizontally, adding more uniform nodes and distributing incoming traffic across them. In this design, the ideal choice is to use the network infrastructure itself to distribute traffic across the group of load-balancers.
The combination of Anycast prefix advertisement and Equal-Cost Multipath (ECMP) functionality accomplishes the goal mentioned above. Furthermore, to allow more granular load distribution, it is beneficial for the network to support controlled per-hop traffic engineering, for example direct control of the ECMP next-hop set for Anycast prefixes at every level of the network hierarchy.
To provide scalable network infrastructure, many data center network designs have been proposed based on the Fat-tree and VL2 implementations, built on the Clos network architecture, which provides higher network capacity. These Clos-based networks can also be implemented with existing commodity Ethernet switches and routing techniques, for example ECMP and the OSPF protocol. As a result, companies have adopted both Fat-tree and VL2 when building data center networks, these being reconfigurable, high-bandwidth, non-blocking networks.
Existing routing designs rely on ECMP for traffic load balancing. ECMP chooses the next hop of a packet by hashing flow-related data in the packet header, which guarantees that a given flow always takes the same routing path; however, due to hash collisions, ECMP cannot achieve the full bisectional bandwidth.
In practice, ECMP utilizes only 40-80% of the network capacity, and even when the traffic load is moderate, network latency may have a long tail. This long-tail latency causes large RTTs, and the resulting long flow completion times lead to bad user experiences and loss of revenue.
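A toy simulation can illustrate why hash-based ECMP strands capacity; everything in the sketch below (the 5-tuples, the flow sizes, CRC32 standing in for a switch's hash function) is made up for illustration:
# Hash 64 flows, a few of them "elephants", onto 4 equal-cost uplinks and
# observe the load imbalance: the same 5-tuple always hashes to the same
# uplink, so collisions between large flows persist for the flows' lifetime.
import random
import zlib

random.seed(7)
UPLINKS = 4
flows = [(f"10.0.0.{i}", f"10.1.0.{i % 16}", 6, 10000 + i, 80) for i in range(64)]
sizes = [random.choice([1, 1, 1, 100]) for _ in flows]  # sprinkle elephant flows

load = [0] * UPLINKS
for flow, size in zip(flows, sizes):
    key = ",".join(map(str, flow)).encode()
    load[zlib.crc32(key) % UPLINKS] += size

print("per-uplink load:", load)
print(f"throughput limited by hottest link: {sum(load) / (UPLINKS * max(load)):.0%}")
Because the most loaded uplink saturates first, the achievable fraction of fabric capacity is sum(load) / (UPLINKS * max(load)), which lands well below 100% whenever elephant flows collide on one uplink.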
3.3.2. Designing for Load-Balancing
We can design a per-packet, round-robin-based routing algorithm called Digit-Reversal Bouncing (DRB). DRB is based on a sufficiency condition which enables per-packet load-balanced routing to fully utilize network bandwidth without causing bottlenecks, for both Fat-tree and VL2.
In DRB, for each source-destination server pair and for each outgoing packet, the source selects one of the highest-level switches as the bouncing switch and sends the packet to it; the bouncing switch then bounces the packet down to the destination. DRB selects bouncing switches by digit-reversing their IDs and, as a result, achieves perfect packet interleaving. Compared with random per-packet routing, DRB achieves smaller and bounded queue lengths in the network, and therefore high bandwidth utilization and low network latency at the same time.
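The digit-reversal itself is simple to sketch (a minimal illustration, assuming spine switch IDs 0..n-1 written with a fixed number of base-radix digits; the two-digit, base-4 example is ours):
# Digit-Reversal Bouncing: the i-th packet of a sender bounces off the
# spine whose ID is the base-`radix` digit reversal of i mod n, which
# interleaves consecutive packets across the spines as evenly as possible.
def digit_reverse(i, radix, digits):
    rev = 0
    for _ in range(digits):
        rev = rev * radix + i % radix  # append the lowest digit of i
        i //= radix
    return rev

radix, digits = 4, 2
n = radix ** digits                    # 16 spine switches, IDs 0..15
print([digit_reverse(i % n, radix, digits) for i in range(n)])
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
Consecutive packets thus head towards spines in different groups of the topology (0, 4, 8, 12, ...) rather than towards adjacent spines (0, 1, 2, 3, ...), which avoids sending back-to-back bursts up any single intermediate link.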
In the real world, the real-time traffic matrix is difficult to evaluate; we therefore design a traffic-oblivious load-balanced routing framework for Fat-tree, which works as presented below.
When a source server sends a packet to a destination server, it first sends the packet to one of the bouncing switches, which bounces it to the destination. Although this framework has no knowledge of the traffic matrix, it is still able to split traffic evenly; the key problem is how to select the bouncing switches properly. As for oversubscription and congestion: both Fat-tree and VL2 have full bisectional bandwidth and are rearrangeably non-blocking networks, so there is no congestion as long as the traffic matrix is feasible. In reality, congestion may occur when multiple senders send to the same receiver (many-to-one communication) or when the network is oversubscribed.
As we saw, although DRB already load-balances the traffic in the network perfectly, it cannot by itself handle network congestion; the only way to handle congestion is to ask the senders to slow down. Existing TCP plus ECN handles this situation well: when congestion is about to happen, switches mark packets with ECN, and TCP, as the end-host transport protocol, reacts and slows down.
As understood from the above, given the diversity and unpredictability of traffic patterns in a data center, a routing protocol should be designed to balance the data center traffic by fully exploiting the path diversity of the Fat-tree. VLB (Valiant Load Balancing) is another simple yet efficient load-balancing technique, performing destination-independent random traffic spreading across intermediate switches, and it performs best in its packet-based form. In a data center with TCP/IP communications, however, it is generally acknowledged that packet-based VLB is not suitable.
This is because packets of the same TCP flow follow different paths and arrive out of order at the receiver, the destination server. The receiver then generates duplicate ACKs (acknowledgements) for the out-of-order arrivals. When the number of duplicate ACKs arriving at the sender (the source server) reaches three, the default fast retransmit (FR) threshold, the associated packet is deemed lost, which in this case is not true. The sender therefore carries out an entirely unnecessary fast retransmit; as a result, data center network utilization is significantly lowered.
To avoid this packet out-of-order problem, existing Fat-tree-based data centers have adopted flow-based VLB, where the routing objective is to balance the number of TCP flows traversing each switch. Since packets of the same flow always follow the same path, there is no out-of-order problem; but if the traffic contains large, long-lived TCP flows, known as elephant flows, flow-based VLB schemes can cause congestion on hotspot links.
To address this issue, dynamic flow scheduling can be used to identify and reassign elephant flows. More recently, Multipath TCP (MPTCP) has been adopted to improve load-balancing performance in data centers: each TCP flow is split into a number of sub-flows, which are then spread over multiple network paths. MPTCP can be effective, but its delay performance may suffer, especially when a critical sub-flow (for example the first sub-flow of a TCP flow) experiences long end-to-end delay. Besides, MPTCP requires sophisticated modifications to the existing TCP implementations at both sender and receiver.
Despite these disadvantages, to fully exploit the path diversity of a Fat-tree topology it is recommended to adopt packet-based Valiant Load Balancing, whose out-of-order problem is not as severe as it might seem when left unmanaged. One solution for addressing it is to slightly increase the fast retransmit threshold, that is, the duplicate-ACK count that triggers the fast retransmit action, in order to suppress the unnecessary fast retransmits and their corresponding delays at the TCP senders.
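A toy model (synthetic arrival pattern; the reorder depths are illustrative, not measured) shows how the duplicate-ACK count produced by reordering alone compares with the threshold:
# Count the worst run of duplicate ACKs caused purely by reordering: one
# packet in every 8 is delayed by `depth` positions, roughly what spraying
# a flow across paths of unequal delay can do. No packet is actually lost,
# so any fast retransmit triggered here would be spurious.
def dup_acks_worst_run(arrivals):
    expected, dup, worst, buffered = 0, 0, 0, set()
    for seq in arrivals:
        if seq == expected:
            expected += 1
            while expected in buffered:     # holes fill, cumulative ACK advances
                buffered.remove(expected)
                expected += 1
            dup = 0
        else:
            buffered.add(seq)               # out-of-order arrival
            dup += 1                        # receiver emits a duplicate ACK
            worst = max(worst, dup)
    return worst

def reorder(n, depth, every=8):
    order = list(range(n))
    for i in range(0, n - depth, every):
        order.insert(i + depth, order.pop(i))  # delay one packet by `depth` slots
    return order

for depth in (2, 4, 6):
    print(f"reorder depth {depth}: up to {dup_acks_worst_run(reorder(64, depth))} duplicate ACKs")
With the default threshold of three, a reorder depth of two stays safe while depths of four or six trigger spurious fast retransmits; raising the threshold above the expected reorder depth suppresses them, at the cost of reacting somewhat later to genuine losses.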
3.4. Layer 2 Clos network
Originally, most data center designs used the Spanning Tree Protocol (STP) for loop-free topology creation, typically utilizing variants of the traditional DC topology described above. At the time, many DC switches either did not support Layer 3 routed protocols or supported them only with additional licensing fees, which played a part in the design choice.
Before we can speak about a Clos network at different layers, we need a short overview of the equipment used in this topology. In most cases this means switches; some of them operate only at layer 2 but, as we will see, if they support layer 3 functions the advantages obtained are numerous.
Like a hub, a switch is a device that connects individual devices on an Ethernet network so that they can communicate with one another. But a switch has an additional capability: it momentarily connects the sending and receiving devices so that they can use the entire bandwidth of the network without interference. Used properly, switches can improve the performance of a network by reducing unnecessary traffic.
Switches have two benefits: (1) they provide each pair of communicating devices with a fast connection; and (2) they segregate the communication so that it does not enter other portions of the network (hubs, in contrast, broadcast all data on the network to every other device). These benefits are particularly useful if the network is congested and traffic pools in particular areas.
However, if the network is not congested, or if its traffic patterns do not create pools of local traffic, switches may cause performance to deteriorate. This degradation occurs because switches examine the information inside each frame (to determine the addresses of the sender and receiver) and therefore process network traffic more slowly than hubs, which do not examine the contents at all.
Switches are a fundamental part of most networks, as they make it possible for several users to send information over a network at the same time without slowing each other down. Most switches operate by examining incoming or outgoing frames at OSI layer 2, the data link layer, and forwarding them based on this analysis. Because a switch allows different nodes (network connection points, in most cases computers) to communicate directly with one another in a smooth and efficient manner, the network can maintain full-duplex Ethernet communication. Before switching technology was implemented, Ethernet traffic was half-duplex, meaning that data could be transmitted in only one direction at a time, much like time-division duplexing. In a fully switched network each node communicates only with the switch, never directly with another node, so data can be exchanged between node and switch in both directions simultaneously.
As we saw before, a simple switch fundamentally changes the way network nodes communicate with each other. The first question that comes to mind is therefore: what makes the difference between a switch and a router? The answer resides in the OSI model: switches work at Layer 2, the Data Link layer of the OSI Reference Model, using MAC addresses, while routers work at Layer 3, the Network layer, using the addresses of the corresponding network protocols: IP, IPX or AppleTalk. Based on the MAC address, the algorithm implemented in the switch decides where and how to forward frames, and this differs from the algorithms routers use to forward packets.
A hub is more like a switch than a router, insofar as it passes along any broadcast packet it receives to all the other segments in the broadcast domain, while a router does not. Consider a four-way intersection where all traffic passes through, no matter where it is headed. Now imagine this intersection at an international border: to pass through it, you must give the border guard a specific destination address, and without one the guard will not let you through. A router acts like such a border guard: without the specific address of another device, it will not let the data packet through. This is a good thing for keeping networks separate from one another, but not so good when you want to move between different parts of the same network.
Within the same LAN, traffic relies on packet switching: the switch establishes a connection between two segments just long enough to send the current packet towards its destination. Incoming packets, which are part of an Ethernet frame, are first saved to a temporary memory area (buffered); the MAC address contained in the frame's header is then read and compared to the list of addresses maintained in the switch's lookup table. To understand how this procedure works, note that in an Ethernet-based LAN every Ethernet frame carries a normal packet as its payload, together with a special header that includes the MAC address information for both the source and the destination of the packet.
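The learn-and-forward loop just described can be sketched in a few lines (a minimal illustration; the frame representation and port numbering are ours):
# A layer 2 learning switch: remember which port each source MAC was seen
# on, forward to the known port for the destination MAC, flood otherwise.
class LearningSwitch:
    def __init__(self, num_ports):
        self.mac_table = {}                  # the switch's lookup table: MAC -> port
        self.ports = range(num_ports)

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port    # learn/refresh the sender's location
        out = self.mac_table.get(dst_mac)
        if out is None:                      # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]

sw = LearningSwitch(4)
print(sw.receive(0, "aa:aa", "bb:bb"))   # unknown dst -> flooded to ports 1, 2, 3
print(sw.receive(1, "bb:bb", "aa:aa"))   # aa:aa was learned on port 0 -> [0]
print(sw.receive(0, "aa:aa", "bb:bb"))   # now a unicast to port 1
It is exactly the flooding branch that makes layer 2 loops so dangerous, which is where the spanning tree discussed next comes in.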
To prevent broadcast storms and other unwanted side effects of looping, Digital Equipment Corporation created the Spanning Tree Protocol (STP), which has since been standardized as the 802.1d specification by the Institute of Electrical and Electronics Engineers (IEEE).
Essentially, a spanning tree relies on the spanning-tree algorithm (STA), which senses that the switch has more than one way to communicate with a node, determines which way is best and blocks out the other available path(s). Conveniently, the protocol keeps track of the other path(s), in case the primary path becomes unavailable. STP works like this: each switch is assigned a group of IDs, one for the switch itself and one for each port on the switch. The switch's identifier, called the bridge ID (BID), is 8 bytes long and contains a bridge priority (2 bytes) along with one of the switch's MAC addresses (6 bytes). Each port ID is 16 bits long, with two parts: a 6-bit priority setting and a 10-bit port number. A path cost value can then be assigned to each port; the cost is typically based on a guideline established as part of the 802.1d standard, according to which the cost is 1,000 Mbps (1 Gbps) divided by the bandwidth of the segment connected to the port. A 10 Mbps connection therefore has a cost of 100.
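The two numeric details quoted above are easy to reproduce (a small sketch; only the 1,000 Mbps guideline and the 2+6 byte BID layout come from the text, the helper functions and example values are ours):
# Original 802.1d path cost guideline: 1,000 Mbps divided by the segment
# bandwidth, so 10 Mbps -> 100; and the 8-byte bridge ID: a 2-byte bridge
# priority followed by one of the switch's 6-byte MAC addresses.
def stp_path_cost(bandwidth_mbps):
    return 1000 // bandwidth_mbps

def bridge_id(priority, mac):
    return priority.to_bytes(2, "big") + mac

for bw in (10, 100, 1000):
    print(f"{bw:>4} Mbps -> path cost {stp_path_cost(bw)}")
print(bridge_id(32768, bytes.fromhex("02aabbccddee")).hex())  # 800002aabbccddee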
Switches working at layer 2 were in general a good choice for the classical Clos network, but the main constraint diminishing their use in current Clos-like networks becomes apparent once we consider the loops that can form at layer 2 and the STP algorithm used to deal with them.
As presented earlier, a layer 2 switch forwards any incoming broadcast or unknown-destination frame received on one of its interfaces out of all its other interfaces. This means that with an increasing number of paths to a destination, a layer 2 loop will form.
To avoid creating loops, switches running the Spanning Tree Protocol first pick the best path through the network and then block all other links. For a Clos network this is bad: such a network is designed with many redundant paths precisely for that redundancy, and STP ignores them completely, blocking links in order to keep the network loop-free.
The best design uses all links at the same time, load-balancing the traffic and increasing overall performance. STP does not allow this, which is a major downfall of the configuration. Using VLANs, we can implement a kind of load balancing by running a separate STP instance for each VLAN, thereby making use of more links than a single network-wide STP instance would allow; this is not recommended, however, because it comes with its own disadvantages and implies a more difficult management of the network.
Furthermore, operators have had many bad experiences with large failures caused by improper cabling, misconfiguration or software flaws on a single device, as such failures regularly affect the entire spanning-tree domain and are very hard to troubleshoot due to the nature of the protocol. For these reasons, and since almost all data center traffic is now IP, therefore requiring a Layer 3 routing protocol at the network edge for external connectivity, designs utilizing STP usually fail all of the requirements of large-scale DC operators.
As we know, STP does give us a loop-free network, but the algorithm takes some time to converge: there are four main stages that STP has to go through before a switch port starts to forward packets.
Each stage is described below:
Blocking State:
Switch ports go into the blocking state during the election process, when a switch receives a BPDU on a port indicating a better path to the Root Switch (Root Bridge), or when a port is neither a Root Port nor a Designated Port.
A port in the blocking state does not participate in frame forwarding and discards frames received from the attached network segment. During the blocking state, the port only listens to and processes BPDUs on its interface. After 20 seconds, the switch port moves from the blocking state to the listening state.
Listening State:
After the blocking state, a Root Port or a Designated Port moves to the listening state, while all other ports remain blocked. During the listening state the port discards frames received from the attached network segment, as well as frames switched from other ports for forwarding. The port does, however, receive BPDUs from the network segment and directs them to the switch system module for processing. After 15 seconds, the switch port moves from the listening state to the learning state.
Learning State:
A port changes to the learning state after the listening state. During the learning state, the port keeps listening for and processing BPDUs. It also begins to process user frames and starts updating the MAC address table, but user frames are still not forwarded to their destinations. After 15 seconds, the switch port moves from the learning state to the forwarding state.
Forwarding State:
A port in the forwarding state forwards frames across the attached network segment. In this state, the port processes BPDUs, updates its MAC address table with the frames it receives, and forwards user traffic through the port. The forwarding state is the normal state; data and configuration messages pass through the port while it is in this state.
In total, that is 50 seconds during which a port does not forward any traffic frames. When a link fails, some of the stages must be traversed again, an unaffordable downtime in the context of cloud networking.
There are other versions of Spanning Tree, such as RSTP (Rapid Spanning Tree Protocol) and PVSTP (Per-VLAN Spanning Tree Protocol), that try to minimize and optimize the time needed to reach convergence after a failure, but these solutions are still far from what can be obtained by working at a higher layer, Layer 3.
An L2-wide design has some advantages, such as seamless server mobility, but it usually works only with a limited number of devices and requires additional protocols such as TRILL (Transparent Interconnection of Lots of Links). No classical spanning tree can work here, because it does not scale to "many" links: convergence would take practically forever, a broadcast storm could cripple the entire network and even the size of the ARP (Address Resolution Protocol) tables would become a problem.
Various enhancements to link-aggregation protocols, such as Multi-Chassis Link Aggregation (M-LAG), have made it possible to use Layer 2 designs with active-active network paths while relying on STP as the backup loop-prevention mechanism.
The major downside of this approach is the proprietary nature of such extensions. TRILL resolves many of the issues STP has in large-scale DC designs; however, the protocol's current maturity, the limited number of implementations and the requirement for new equipment that supports it have limited its applicability and increased the cost of such designs.
Neither TRILL nor the M-LAG approach eliminates the fundamental problem of the shared broadcast domain, which is so detrimental to the operation of any Layer 2, Ethernet-based solution.
3.5. Layer 3 Clos network
3.5.1. Introduction
OSPF and IS-IS would be natural choices, but they face limitations in terms of scalability (the link-state database size may become a blocking point in large-scale networks), traffic engineering (for example, shifting traffic around a spine for maintenance is quite difficult to do with IS-IS/OSPF), the necessity of a hierarchical design and, last but not least, added complexity, because BGP would still be used for connecting to the edge of the DC.
In building a control plane for the layer 3 switches, a trade-off is thus made between classic routing protocols like IS-IS and OSPF, which are proven and robust, and a relatively new mode of interconnection.
3.5.2. Clos with OSPF
Modern networks usually have diverse topologies, and OSPF was designed as a link-state protocol to provide routing in such big networks offering many types of services. To minimize the difficulty of large network designs and to simplify equipment configuration, a basic network design can be taken and replicated many times until a large network is created.
The result of this replication is a highly regular network topology with a distinctive pattern, combining characteristics of hub-and-spoke, fat-tree and Clos topologies. This mix of designs represents a challenge for routing protocol implementations, mainly because of the large number of routers; nevertheless the replication, being in fact a kind of network modularity, allows the protocol implementation to be simplified.
Link-state protocols in general, and OSPF in particular, make no use of this regularity, which leaves them vulnerable in large-scale, mixed-design real-life networks. Such networks combine "continents" of regular topology with "islands" of free topology where OSPF works best; continuing the example above, these islands are the enterprise headquarters (HQ) network or the interconnections between data centers. For operational simplicity it is desirable to run the same routing protocol in both parts of the network, so we will present extensions to OSPF that improve its scalability in very large networks with regular topologies.
A link-state protocol such as OSPF gives all spoke sites placed in a single common protocol area the full topology information describing every spoke in that area and its connections to the hubs. This information is excessive, indeed redundant: since all links from a site go to hubs, knowledge of the other spoke links in the same area cannot reveal any alternative path to destinations outside the site.
To make things even worse, spoke routers are small devices intended to serve tiny site networks with a few routes and light traffic; consequently they do not have the processing resources, RAM or CPU, to hold and process the same link-state database as hub routers, which are much bigger pieces of equipment. This is why distance-vector protocols in the same topology propagate only prefix reachability information, so as not to overload the spoke routers with a topology view.
Although inter-site visibility is unwanted, whether to decrease the size of the routing table on spoke devices or for security reasons, link-state protocols do not allow routing information to be filtered within an area; the area-wide flooding scope cannot be avoided, unlike with distance-vector protocols, which allow route filtering and summarization on a per-neighbor basis. Furthermore, full topology visibility within an area may also lead hub routers to calculate suboptimal paths. As an example, consider a hub-and-spoke network with two hubs A and B and two spoke sites S1 and S2, where each spoke site has a connection to both hubs, and hubs A and B are Area Border Routers (ABRs) between the hub-and-spoke WAN and the backbone. If the link between hub A and site S1 fails, A will choose, or at least will consider, the intra-area WAN route A -> S2 -> B -> S1, which is obviously wrong, as a spoke site like S2 must never carry transit traffic.
Another computational-overload issue is the very large number of point-to-point links that the Router LSA of a hub router must describe, one per spoke connection in the area. The size of the Router LSA, and the work needed to rebuild it, grow linearly with the number of spokes in the area (N), while the CPU resources consumed by flooding and processing the hub's Router LSA grow as O(N^2).
A solution to the above is to separate each spoke into an area of its own, which solves the problem for the spoke routers. Unfortunately, the computational overload is merely transferred to the hub routers, which must now manage a lot of independent areas, that is, support thousands of NSSA areas, originate as many Router LSAs, and translate multiple LSAs from/into each area. Managing a common route filtering and summarization policy is also difficult.
The contradiction in designing an optimal network, given the problems described above, is that on one hand it is better to have areas as small as possible, while on the other hand it is better to have areas as big as possible. Reconciling these requirements becomes ever more challenging as the network size grows.
When implementing Clos with OSPF, further specific challenges arise. Because Clos networks provide multiple equal-cost paths to every destination, a destination subnet rarely becomes unreachable when a link fails. In such a situation OSPF nevertheless rebuilds the affected Router LSAs and floods them to all neighboring routers, and although a router not connected to the failed link does not change its routing for the affected subnet, the SPF algorithm is still recalculated on all nodes. This wastes CPU cycles unnecessarily, and in very big networks it can become an important issue. Distance-vector routing protocols do not have this problem: they can detect that the reachability of the subnets has not changed, so there is no need to update the neighboring routers.
Full knowledge of the network topology is the most important feature allowing a link-state protocol to work well in arbitrary topologies, but it becomes a limiting factor for routing scalability in networks with a regular topology.
For regular topologies, the low number of advertised prefixes and the hub-and-spoke style of connectivity lead network designers to pick distance-vector routing protocols for their fast convergence.
3.5.3. Solution challenges
For designing a scalable network using the OSPF protocol, the following considerations must be taken care of:
– the OSPF implementation must take on the properties of a distance-vector protocol, meaning areas as small as possible, to protect spoke routers from routing information sent by other spoke routers;
– the number of LSAs, and implicitly the size of the OSPF database, must be limited by making it independent of the number of spoke sites, since it otherwise overloads the computing resources of every network device;
– protection is needed against routing loops caused by accidental misconfiguration or by automatic rerouting after a single equipment failure;
– a common administrative routing policy should be applied through a standardized configuration that provides unified management and a more scalable network.
In conclusion, using OSPF to propagate routing information between areas must comply with the requirements of a distance-vector routing protocol, meaning that Area Border Routers propagate only reachability announcements and routing metrics, and only for external (inter-area) routes.
4. Overview of BGP routing protocol
BGP (Border Gateway Protocol) is a path-vector protocol: a network routing protocol that maintains path information and updates it dynamically. Updates propagate from one device to another; once an update has looped through the network and returned to the same node, it is detected through its unique signature and discarded. This mechanism is sometimes used alongside Bellman-Ford routing algorithms to avoid the "count to infinity" problem.
Path-vector routing differs from both distance-vector routing and link-state routing: each entry in the routing table contains the destination network, the next router and the path to reach the destination.
“Path Vector Messages in Border Gateway Protocol (BGP): The autonomous system boundary routers (ASBR), which participate in path vector routing, advertise the reachability of networks. Each router that receives a path vector message must verify that the advertised path is according to its policy. If the messages comply with the policy, the ASBR modifies its routing table and the message before sending it to the next neighbor. In the modified message it sends its own AS number and replaces the next router entry with its own identification.”
The best example of a path-vector protocol is the Border Gateway Protocol: the BGP routing table acquires and holds all the autonomous systems that must be traversed for a packet to reach the destination system.
First of all, it is important to know the differences between BGP and the other routing protocols commonly used in today's networks. Routing protocols fall into two categories: interior and exterior. Interior routing protocols, known as IGPs (Interior Gateway Protocols), are used inside an autonomous system to exchange routing information. An autonomous system is a set of routers under a common administration that uses an Interior Gateway Protocol and common metrics to determine the routing of packets inside the autonomous system, and an inter-autonomous-system routing protocol to determine the paths for routes outside it. Even though more than one IGP can be used inside an autonomous system, from the point of view of BGP everything inside it represents one homogeneous and independent entity.
An Exterior Gateway Protocol (EGP) is a routing protocol used to exchange information between different autonomous systems. The sole EGP in use today is BGP, more specifically BGPv4.
Although BGP and IGPs are very different, at the core they serve the same purpose: both exchange IPv4 prefixes, both apply rules to choose the best path when a prefix is received from more than one source and, like some IGPs, BGP forms neighbor relationships before any prefixes are exchanged.
One of the most important differences between BGP and IGPs is that the former uses the best path algorithm, which can also be applied with great benefit in Clos networks. This algorithm chooses the best BGP route using rules that are more complex than the classical notion of metric used by IGPs, and with it network engineers can influence and control the path of a route with a great level of flexibility.
When picking the best route, IGPs use an integer number called a metric: RIP uses hop count, OSPF uses a cost calculated from the bandwidth of a link, EIGRP uses a composite metric calculated from the bandwidth and the delay of the links, and IS-IS likewise uses bandwidth. The common aspect is that all these routing protocols calculate the metric with the objective of obtaining the best speed from the calculating router to the destination prefix. In comparison, BGP is not interested in speed when choosing the best path: it uses a complex algorithm with several degrees of freedom that the network administrator can easily influence, making BGP a policy-based routing protocol that allows traffic to be controlled between different autonomous systems. Because of this, BGP does not always pick the optimal path for the routing table.
4.1. Stability
Stability is important for any routing protocol, but given the application BGP is used for, here it is a critical aspect: a route flap can be devastating for an entire set of autonomous systems. BGP deals with this problem by implementing several timers that minimize the effects of interfaces toggling between up and down states. A router also tracks a route's flapping history and, once a certain threshold is reached, stops using the route; this is called route dampening.
As previously stated, BGP is a policy-based routing protocol, which means we can apply configurations that change the default way routes are treated. But when we want to change an inbound policy, for example, the router needs to receive the entire routing information from its neighbor again. This can be done with a "hard reset", that is, by tearing down the BGP session and bringing it back up, which generates a large amount of traffic in a short period of time as well as a gap in packet-forwarding capability. The solution is to use capabilities such as soft reconfiguration and route refresh, which allow the routing information to be updated without resetting the connection; these capabilities will be discussed in more detail in a later chapter.
Non-Stop Forwarding (NSF) is another capability that aids stability. When a BGP session restarts, packets can be lost for a period of time until the peers resynchronize; to avoid this, the routing information from the previous session is used to forward traffic while the session is being re-established.
4.2. Scalability
Scalability, in the case of BGP, can be evaluated against two aspects: the number of peers and the number of routes supported, in our case by layer 3 switches. Even though the maximum numbers of peers and routes depend on the hardware capabilities of the equipment (CPU, memory), generally speaking BGP was designed to meet the needs of the Internet routing table, which is very large. To increase scalability, only the best paths, the ones the advertising router itself uses, are advertised to the neighbors; when the best path changes, an update message is sent to refresh the information the neighbor holds.
Regarding the number of peers, the problem is that within an autonomous system all BGP speakers need to form a logical full mesh in order to exchange routing information. This greatly limits the scalability of a topology, because each new router requires a large number of new BGP sessions. Two methods of great use in dealing with this problem are route reflection and confederations, both of which eliminate the need for a full-mesh topology.
For a Clos structure, neither of these solutions is necessary: there are far fewer devices than the maximum number BGP supports, and the overhead of the sessions established between the devices does not represent a problem.
4.3. Flexibility
The great flexibility of BGP is a consequence of its being a policy-based routing protocol: the large number of path attributes that can be used to influence routing decisions is what makes BGP such a flexible protocol.
BGP policies can be defined for either the inbound or the outbound direction, and they affect the route selection decisions. For example, an inbound policy can be applied to accept routes originated from a certain autonomous system, or to change the default attributes of a route so that the router on which the policy is applied becomes the exit point of the autonomous system.
These policies can be routing policies or administrative policies. Routing policies refer strictly to how traffic is handled by the BGP speakers: preferring one route over another by modifying attributes, setting an exit point for the routers inside the autonomous system, and so on.
Administrative policies refer to what BGP information should be allowed into and out of the autonomous system. Consider, for example, an enterprise connected to two ISPs. If no policies are applied, the enterprise could become a transit autonomous system, meaning that ISP1 would send traffic through the enterprise to reach routes from ISP2 or from the Internet in general; the same can in principle happen between two DCs. To avoid this, outbound routing policies are needed which ensure that only local routes are advertised to the exterior.
4.4. Neighbor relationships
Just as a local network is viewed as a set of arbitrarily connected devices, the Internet is viewed as a set of interconnected autonomous systems that communicate with each other via BGP. The routers that fulfill this purpose are called BGP speakers, and BGP speakers that form a neighbor relationship with other BGP speakers are called BGP neighbors or peers. A BGP speaker has a limited number of neighbors, with each of which it forms a TCP connection; these peers can be located in the same AS or in different ASes, and the speakers communicate network reachability information to each other as the implemented policies dictate. There are two categories of BGP neighbors: if the routers forming the neighbor relationship are in different autonomous systems, they are referred to as external BGP (eBGP) peers, and they are usually connected to the same subnet; if the routers are in the same autonomous system, they are called internal BGP (iBGP) peers. As the two types of neighbor relationship behave differently, the AS number of a peer must be associated with its identifier. Depending on the size of the network and the number of connections to the exterior, the number of peer relationships varies: where there are multiple connections to the outside, multiple external BGP adjacencies are formed. These BGP speakers must offer a consistent image of the autonomous system for which they provide reachability information, and for this to happen, the policy constraints applied to the BGP speakers must be consistent within the autonomous system.
In a redundant network there is more than one path between two routers. To take advantage of this, BGP neighbor relationships are usually created between loopback interfaces: because a loopback interface does not go down unless the router itself goes down, the BGP adjacency is no longer bound to a physical interface. In order to create an adjacency using loopback interfaces, however, a router must know how to reach its neighbor's loopback IP address, and for this purpose an IGP is used inside the domain. In the test topology for my practical part I have chosen OSPF as the IGP; its sole purpose in this scenario is to provide reachability to all loopback IP addresses from any point of the network.
4.5. BGP Path Attributes
When a BGP speaker sends update messages about known networks, it in fact sends a list of prefixes and network masks together with a set of path attributes; these update messages carry the so-called network layer reachability information. BGP uses the path attributes to decide which route to use when it has more than one path to a destination. As BGP is primarily a routing policy tool, these path attributes are very important and are used extensively to control traffic paths. The process by which BGP examines the different path attributes of routes competing for a place in the routing table is called the best path algorithm; the algorithm has many steps and, even though some attributes may be identical for two competing routes, at some point one of them will be chosen.
Not all path attributes are used by the best path algorithm. For example, the Next Hop attribute points to the next hop for a route but has nothing to do with the selection of the best route; the Local Preference attribute, on the other hand, is used inside an autonomous system to influence outbound paths. The attributes are classified as well-known or optional, mandatory or discretionary, and transitive or non-transitive. There are four valid combinations of these characteristics in which path attributes reside:
Well-known mandatory
Well-known discretionary
Optional transitive
Optional non-transitive
All BGP implementations must support the well-known attributes. The difference between the two well-known categories is the following: even though all routers must recognize these attributes when receiving an update message, the mandatory ones must appear in every update message, whereas the discretionary attributes do not have to be present in every update.
An attribute that is not well-known is optional. Transitive attributes must be passed on to other BGP speakers even when received by a router that does not implement that specific attribute; non-transitive attributes, by contrast, are removed by such routers and are not passed on to their neighbors.
With respect to the previous categories, we can have the following attributes:
Well-known mandatory :
AS Path
Next hop
Origin
Well-known discretionary:
Local preference
Atomic aggregate
Optional transitive:
Aggregator
Community
Optional non-transitive:
Multi-Exit Discriminator (MED)
4.6. The AS-Path Attribute
The AS Path is an important well-known mandatory attribute: when no BGP policies are implemented, the AS Path is usually the decisive attribute in choosing the best path. The attribute is also used as a loop prevention mechanism in inter-AS communications: if a BGP speaker receives an update message containing a route whose AS Path includes the autonomous system in which the speaker itself resides, it discards the route. The AS Path is the list of the autonomous systems that a prefix has passed through, with the AS where the route originated at the end of the list and the most recently traversed AS at the beginning. When a route is advertised outside an autonomous system (sent to an external neighbor), the local AS number is added to the front of the list; when an update is propagated inside the autonomous system, the list does not change.
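Both behaviors (loop rejection on receipt, prepending on external advertisement) are compact enough to sketch; the AS numbers below are examples of ours:
# AS_PATH as a loop breaker: reject any update already carrying our AS,
# and prepend our AS only when advertising to an eBGP peer.
LOCAL_AS = 65010

def accept_update(as_path):
    return LOCAL_AS not in as_path          # our AS in the path means a loop

def advertise(as_path, peer_is_external):
    return [LOCAL_AS] + as_path if peer_is_external else list(as_path)

print(accept_update([65020, 65030]))              # True: no loop
print(accept_update([65020, 65010, 65030]))       # False: discarded
print(advertise([65020, 65030], peer_is_external=True))   # [65010, 65020, 65030]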
4.6.1. The Next-Hop Attribute
The Next-Hop attribute is well-known mandatory and holds the IP address of the next hop to be used to reach a destination. The next hop does not need to be directly connected: whereas IGPs act inside an autonomous system and have directly connected routers as next hops, BGP works at the level of autonomous systems, so, if no modifications are made, next hops represent those autonomous systems' entry points. If the next hop is not an immediately adjacent router, a recursive look-up in the IP routing table is necessary.
As a condition, a route must have a reachable next hop before it is considered a valid candidate (the next hop must be covered by some route in the routing table). If a route is propagated inside an autonomous system, its next hop remains unchanged and points to the IP address of the BGP speaker in the neighboring autonomous system that advertised the route, unless explicitly configured to use the IP address of the edge router in the local autonomous system.
This can be a problem, because the IGP usually run inside an autonomous system does not include routes external to it. To solve it, we can either include the networks between our routers and the ISP's routers in the IGP, or configure a static route pointing to that next hop. For external BGP relationships, the next hop is updated to the IP address of the neighbor that advertises the route; there is one exception to this rule, called the "third-party next hop".
4.6.2. The Local Preference Attribute
The Local Preference attribute is a well-known discretionary attribute. It is exchanged between iBGP peers in order to set a preferred exit point out of the autonomous system; the higher local preference is preferred when choosing the best path. As it is used only for internal purposes, this attribute is not included in routing advertisements sent to external neighbors. It is usually set by a route-map applied in the incoming direction on eBGP updates.
Fig. 4.1. – Local Preference
This is illustrated in Fig. 4.1. Autonomous system 64520 has two exit points, router A and router B, and both routers receive an update about the 172.16.0.0 network. Router A and router B are iBGP neighbors. Router A sets the local preference for 172.16.0.0 to 200 and router B sets it to 150; consequently, all the routers inside autonomous system 64520 will prefer router A as the exit point.
4.7. The Community Attribute
The community attribute is an optional transitive attribute: if a router does not understand it, it passes it along to its neighbors. A community is a group of prefixes that share common properties, marked by a tag applied to those prefixes; in this way, policies can be applied to communities rather than to individual routes. The tag is 32 bits long and split into two parts: the upper 16 bits hold the AS number of the autonomous system that created the community, and the lower 16 bits are the community number, which has only local significance.
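The 16/16 split is easy to see in code (a minimal illustration; the AS number and community value are example figures of ours):
# A standard community is a 32-bit tag, conventionally written ASN:value:
# the originating AS in the upper 16 bits, a local number in the lower 16.
def encode_community(asn, value):
    return (asn << 16) | value

def decode_community(community):
    return community >> 16, community & 0xFFFF

c = encode_community(65000, 120)        # written 65000:120
print(hex(c), decode_community(c))      # 0xfde80078 (65000, 120)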
4.8. The Multi Exit Discriminator Attribute (MED)
MED is an optional non-transitive attribute. As opposed to Local Preference, MED is used between external BGP peers to indicate which path should be chosen to enter an AS. It is used in inter-AS communications when an AS tries to influence a neighboring AS as to which path it should use to reach a network, if more than one entry point into that AS exists.
Fig 4.2. – MED
Routers B and C in autonomous system 65500 form eBGP relationships with router A in autonomous system 65000, as seen in Fig. 4.2. Router B sets the MED to 150 for updates sent to router A, whereas router C sets it to 200. For consistency, the router advertising the lower MED is preferred as the entry point (just as routes with lower metrics are preferred when using an IGP).
4.9. Best Path algorithm
When a BGP speaker receives updates about a specific network, it decides which path to use to reach that network based on the attributes described before. For a network with two or more possible paths, the list of path attributes is checked sequentially and, because BGP was not designed to perform load balancing, only one of them is selected and used for routing (provided no other route to the same destination has been learned from a routing protocol with a lower administrative distance). The rest are kept in the BGP table in case the winning route is lost.
In order to be considered by the best path algorithm, a route must satisfy the following conditions:
The next hop for that specific path is reachable
The path is synchronized, if this feature is activated (not used anymore in today’s networks)
The path is allowed in the BGP table by inbound policies
The route is not dampened
Depending on the vendor, the path attributes are checked in a certain order. The following order is the one used by Cisco routers (a condensed sketch in code follows the list); whenever the paths tie at one step, the algorithm compares the next attribute, and so on.
Weight (Cisco proprietary, higher is better)
Local preference (higher is better)
If the previous steps tie, the locally originated route is preferred (one injected with the network command, for example)
Shortest AS path
Lowest origin code (IGP is smaller than EGP, which is smaller than incomplete)
MED (lowest wins). As a note, the MED is considered only if the neighboring autonomous system for the considered paths is the same.
Prefer external paths (eBGP) over internal ones (iBGP)
If all paths are internal, the BGP speaker will prefer the route with the shortest internal metric to the BGP next-hop (in this way, the internal routing protocol can influence the path selection process).
For EBGP, select the oldest path (this is because the probability for a route to be stable is higher if it is older)
The path with the lowest BGP neighbor router ID is preferred
If the router ID is the same, prefer the neighbor with the lowest IP address (the IP address with which the TCP neighbor connection is made)
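A condensed sketch of this ladder follows (the attribute names, example values and the subset of steps modeled are ours; steps 3, 9 and 11 are omitted for brevity):
# Best path selection as a sort key: earlier tuple elements dominate, and
# a minus sign turns "higher wins" attributes into "lower wins" ones.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}

def best_path_key(p):
    return (
        -p["weight"],                # 1. higher weight wins (Cisco-specific)
        -p["local_pref"],            # 2. higher local preference wins
        len(p["as_path"]),           # 4. shorter AS path wins
        ORIGIN_RANK[p["origin"]],    # 5. IGP < EGP < incomplete
        p["med"],                    # 6. lower MED wins (same neighboring AS)
        p["is_ibgp"],                # 7. eBGP (False) beats iBGP (True)
        p["igp_metric"],             # 8. lower IGP metric to the next hop wins
        p["router_id"],              # 10. lowest neighbor router ID wins
    )

paths = [
    dict(weight=0, local_pref=150, as_path=[65020], origin="igp", med=0,
         is_ibgp=True, igp_metric=20, router_id="10.0.0.2"),
    dict(weight=0, local_pref=150, as_path=[65020], origin="igp", med=0,
         is_ibgp=False, igp_metric=20, router_id="10.0.0.3"),
]
print(min(paths, key=best_path_key)["router_id"])  # 10.0.0.3: the eBGP path wins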
4.10. Hierarchical Route Reflection
As networks grow bigger, it is not unusual to have more than one route reflector (RR) in an autonomous system. Since a full mesh is required between these route reflectors, the number of iBGP connections could still be high. To prevent this, RR hierarchies have been introduced (Fig. 4.3.): two or more levels of route reflectors, with the lower level serving as clients to the higher levels.
Fig. 4.3. – Hierarchical route reflectors
Hierarchical route reflection is illustrated in the picture above. Because Level 1 route reflectors are also clients, they don’t need to be fully meshed with each other, only the level two RRs still need to be fully meshed, because they don’t act as clients for any other BGP speaker. This further reduces the need of iBGP connections in an autonomous system.
As previously stated, a route reflector will only reflect the best path. Even though a hierarchical Route Reflection domain inherits the same characteristics as a normal route reflector domain, the consequences of the path selection process are more impactful on a hierarchical topology.
For example, if a route is received by R7, it will send it to both R4 and R2. R4 and R2 will send that route to R1. R1 will perform the best path algorithm and will select only one route to send to R5. R5 will receive the route from R1 but also from R3. It will perform the best path algorithm and will send only one path to R9 for example.
All the decisions made before the route reaches R9 are based on the local policies implemented in each Route Reflector. If these policies are inconsistent throughout the autonomous system, suboptimal routing could occur; if the policies are the same on all routers, there will be no problems.
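On Cisco routers, a route reflector is configured by simply marking its iBGP neighbors as clients; a minimal sketch with hypothetical addresses:

router bgp 65500
 ! iBGP client sessions; routes learned from one client are reflected to the others
 neighbor 10.0.0.5 remote-as 65500
 neighbor 10.0.0.5 route-reflector-client
 neighbor 10.0.0.6 remote-as 65500
 neighbor 10.0.0.6 route-reflector-client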
4.11. eBGP and iBGP
In conclusion, it is important to have a clear understanding of the functions fulfilled by eBGP and iBGP and of the differences between them. Because iBGP speakers do not re-advertise iBGP-learned routes to other iBGP peers, the peering inside the AS must be full mesh; eBGP has no full-mesh requirement. The Local Preference attribute is advertised only over iBGP, not over eBGP. The AS Path and next hop attributes are not modified by iBGP, whereas they are modified when routes are transmitted to external neighbors.
Because an IGP is usually used inside an autonomous system to provide full connectivity, iBGP does not require direct connectivity (the IP address of an iBGP neighbor that is not on the same segment as the current router is reachable through the IGP). As opposed to this, because in general there is no IGP between two different autonomous systems, direct connectivity is required for eBGP peers.
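A minimal sketch of this difference (hypothetical addresses): the iBGP session is typically formed between loopbacks reachable via the IGP, while the eBGP session is formed on the directly connected link:

router bgp 65500
 ! iBGP neighbor, several hops away, reached via its loopback
 neighbor 10.0.0.2 remote-as 65500
 neighbor 10.0.0.2 update-source Loopback0
 ! eBGP neighbor on a directly connected link
 neighbor 203.0.113.2 remote-as 65000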
Prefix synchronization between iBGP and the IGP was historically required in order to prevent routing loops: a prefix learned via iBGP is not considered for the best-path process unless the same prefix also exists in the IGP. In today's networks, synchronization is disabled by default and it is recommended to leave it disabled. eBGP has no synchronization requirements.
Routes learned via iBGP are not redistributed into the IGP by default, even if redistribution between BGP and the IGP is configured. This is done in order to avoid loops. This is not the case for eBGP.
5. Integrating Clos with BGP
Network designs that make use of IP routing down to Tier-2 of the network have gained a lot of popularity in recent years. The benefit of these designs is enhanced network scalability and stability, as a result of smaller Layer 2 broadcast domains. Commonly an IGP such as OSPF is deployed in such a design as the main routing protocol, but BGP is starting to look more appealing because of its traffic engineering properties and other appealing features.
As data centers grow in scale and server counts exceed thousands, such a Layer 3 only routed design can scale to meet the requirements for scalability and stability with a greatly simplified network. This is why the design has seen widespread adoption in large DCs, where Layer 3 stability and scalability are more important than large Layer 2 adjacency domains.
In order to meet the requirements that have previously been the goal of large Layer 2 domains, application providers and network operators continuously develop new solutions.
The same principles of Clos networks can be applied to create an IP network fabric; in fact, many networks, often referred to as "Spine and Leaf" networks as specified before, are already designed this way.
An IP fabric simply means that we use a Layer 3 design and actually create a fabric over the underlying physical network.
5.1. BGP advantages over OSPF and other Routing Protocols
The most used Layer 3 protocols for the control plane of an IP fabric are IS-IS, OSPF and BGP. These protocols vary in terms of features and scale, but each one fundamentally advertises prefixes (subnets). On one hand, IS-IS and OSPF use flooding to send updates and other routing information; creating areas helps limit the amount of flooding, at the price of losing some of the benefits of SPF (Shortest Path First) routing. On the other hand, BGP was developed from the ground up to support a large number of peering points and prefixes.
The ability to move traffic around in an IP fabric is useful, for example to steer traffic away from a specific spine switch while it is in maintenance or out of order. Because IS-IS and OSPF have limited traffic tagging and traffic engineering capabilities, BGP is the preferred solution: it was designed to support extensive traffic engineering and tagging through features like Local Preference and extended communities.
Another interesting aspect of building a large IP fabric is multi-vendor interoperability. Although OSPF and IS-IS also work well across multiple vendors, BGP is the real winner here, because a large IP fabric is built iteratively and over time, across equipment from different vendors.
As an example of the aspects mentioned above, the best use case in the world for BGP in IP fabrics is the Internet itself: it consists of a huge number of devices produced by different vendors with varying characteristics, and they all use BGP as the control plane protocol to advertise prefixes, perform traffic engineering and tag traffic. BGP became the best choice for the control plane of an IP network fabric due to its scalability, traffic tagging capabilities and stability.
It should also be mentioned that BGP has a simpler protocol design, with internal data structures and state machines that are less complex than those of a link-state IGP such as OSPF or IS-IS. BGP simply relies on TCP for its transport, instead of implementing its own adjacency formation, adjacency maintenance and flow control as other protocols do. Moreover, the information-flooding overhead in BGP is lower than in link-state IGPs, since every BGP device calculates and propagates only its selected best path. A network failure is therefore healed as soon as the BGP speaker finds an alternate path, if one is available, which is typically the case in highly symmetric topologies such as Clos networks coupled with an eBGP-only solution.
In case of failure, a link-state IGP propagates the event to the entire network regardless of the type of failure; this large event propagation radius is a major disadvantage of IS-IS and OSPF in comparison with BGP.
Another major advantage of using BGP is that it does not need periodic refreshes of routing information, because BGP routes do not expire, whereas link-state IGPs periodically re-flood their routing information (although on modern router control planes this rarely has an important impact).
BGP also supports application-defined forwarding paths: an application "controller" can establish a peering session with a BGP speaker and relay routing information into the system, using recursively resolved third-party next hops. In this way BGP can install forwarding paths that are not ECMP-based. Although OSPF permits similar functionality through an approach such as the "Forwarding Address", this is more difficult to implement and lacks the simplicity of the BGP solution.
With a well-defined BGP ASN allocation scheme, unwanted and complex paths are automatically ignored thanks to BGP's AS_PATH loop detection; this is a major advantage over link-state IGPs, which can accomplish the same only by using multiple instances, processes or topologies. Such support is not usually available in all data center equipment and is very complex to configure and troubleshoot, even though it would serve to separate the traditional single flooding domain that data centers commonly use. Under certain failure situations, other routing protocols may pick up unwanted lengthy paths that traverse multiple Tier-2 devices.
It is also worth mentioning that troubleshooting BGP in Clos structures is straightforward: the content of the BGP Local RIB can be visualized and compared to the router's RIB, and each BGP neighbor keeps Adjacency RIB In and Adjacency RIB Out structures whose incoming and outgoing information can easily be verified to match on both sides of a BGP session.
BGP's best-path selection process prevents routing loops from forming because it prefers shorter AS PATH lengths, and because devices, including those in the Spine, do not permit their own AS in the path of accepted routes.
5.2. Private BGP ASNs
The range of private BGP ASNs limits operators to 1023 unique ASNs. Because it is quite possible for the number of network elements to exceed this figure, an analysis should be performed regarding the ASNs assigned to devices across different clusters, in order to possibly reuse them. Private BGP ASNs such as 65001… 65032 can be reused within each individual cluster and assigned to its network devices.
Reusing ASNs raises a problem with the loop detection mechanism in BGP, which normally suppresses routes whose AS PATH contains the device's own ASN. To accept such routes, Leaf devices must be configured on their upstream eBGP sessions with the "AllowAS In" feature, which permits a device's own ASN to appear in received route advertisements. Of course, this feature does not solve routing misconfigurations that in some scenarios can lead to routing loops when routes from different tiers are advertised to different ASNs. Tier-1 devices never accept routes with a path including their own ASN, to further avoid routing loops.
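A hedged IOS-style sketch of this feature on a Leaf's upstream session (the addresses and ASNs are hypothetical):

router bgp 65001
 neighbor 192.168.1.0 remote-as 65032
 ! accept routes even if our own ASN appears once in the AS PATH
 neighbor 192.168.1.0 allowas-in 1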
The extensive solution to this problem would be to use more octets for BGP ASNs, which provides additional private ASNs. Four-Octet BGP ASNs come at the cost of increased BGP implementation complexity, so it may make sense to accept the complexity of reusing two-octet ASNs instead. Moreover, Four-Octet BGP ASNs are not at this moment supported by all BGP implementations, which in turn may limit the vendor selection for data center equipment.
5.3. Prefix Advertisement between Tiers
Advertising all the point-to-point links of a Clos topology and their associated prefixes into BGP puts additional path calculation stress on the BGP control plane and can create FIB (Forwarding Information Base) overload conditions in the network devices, for very little advantage. Two possible solutions have been identified to solve this issue.
The first one consists in not advertising any of the point-to-point links in BGP. Distant networks remain automatically reachable, even though a design using eBGP changes the next-hop address at every device, because the next hop is always the directly connected advertising eBGP peer. These prefixes are not necessary for the normal operation of the network; they are only needed by operational troubleshooting or monitoring systems that must reach these addresses. For example, the usual ping or traceroute debugging operations toward these links will not be available if needed.
The second solution is to summarize the prefixes of the point-to-point links on every device. This requires an address allocation scheme, such as provisioning a consecutive block of IP addresses per Spine device for addressing the point-to-point interfaces toward the lower layer (Leaf uplinks being numbered out of the Spine addressing space). A sketch of this option is shown below.
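A minimal sketch on a Spine, assuming (as in the addressing plan of chapter 6) that the Spine owns a contiguous /24 for its downlinks; the ASN is hypothetical:

router bgp 65032
 ! advertise one summary instead of every /31 point-to-point prefix
 aggregate-address 192.168.1.0 255.255.255.0 summary-only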
As long as the server subnets themselves are announced into BGP without summarization, black-holing under a single link failure is avoided.
The black-holing problem can also be avoided by using peer links within the same level, but this costs in terms of complexity and the number of ports used on the equipment. By configuring every device with a different BGP ASN, a simpler "ring" topology can be implemented as an alternative to the full-mesh topology, but it would introduce extra hops toward a given destination.
Also, because reachability of every device is a key property for managing the entire network, the loopback prefix of each device should be advertised. This can be done in a simple way, by aggregating the loopback addressing space at all levels of our Clos network.
5.4. WAN Connectivity
For connections to the WAN, routers or edge devices, also referred to as Border Routers in the context of a Clos network topology, have a dedicated cluster or clusters, depending on how large the topology is.
The devices in this cluster play a special role in connecting the rest of the tiers to the WAN and have some important features.
When advertising the paths it can reach on the local network to the WAN routers, it is important to hide information about the internal network topology (the AS PATH attribute), because this overhead information can cause collisions when interconnecting several DCs that reuse the same private ASNs. To do so, a BGP-specific feature called "Remove Private AS" removes the private ASNs found in the path attribute before the advertised routes are sent to the neighboring WAN devices.
Another function these devices must fulfill is originating a default route for all devices in the DC. Since they provide the connection to the WAN, and route summarization is risky in a Clos network topology, this is the only location in the network where a default route can be originated. Alternatively, to provide resistance against a single link failure that could black-hole traffic, Border Routers may relay the default route learned from the WAN, but this requires all of these routers to be connected to the upstream WAN routers. Because an operator configuration or implementation error can affect the BGP sessions to the WAN simultaneously, and because the first approach requires complicated conditional default-origination schemes, it is recommended to take the second approach.
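A hedged IOS-style sketch of both features on a Border Router (the addresses and ASNs are hypothetical):

router bgp 64512
 ! session toward the WAN: strip private ASNs from advertised paths
 neighbor 203.0.113.1 remote-as 64496
 neighbor 203.0.113.1 remove-private-as
 ! session toward a Spine: originate a default route into the DC
 ! (the first approach; the alternative is simply relaying the WAN-learned default)
 neighbor 80.0.0.2 remote-as 65001
 neighbor 80.0.0.2 default-originate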
5.5. Route Summarization at the Edge
Prior to advertising network reachability information to the WAN in a fully routed network design, it is desirable to summarize the IP prefixes, due to the constraints and performance limits of the WAN network devices. A Clos topology with, for example, 1000 Tier-2 devices will have as many server subnets advertised into BGP, along with infrastructure and other prefixes.
As mentioned before, summarization inside our network is not permitted, due to the lack of peer links inside each tier. However, by interconnecting the Border Routers with a full mesh of physical links we can overcome this restriction. BGP must be configured accordingly, by adding a mesh of iBGP sessions so that the Border Routers can exchange network reachability information; in the case of a device or link failure underneath the Border Routers, the interconnecting peer links must be appropriately sized for the extra traffic they will transport.
If protection from a single link or node failure is desired, Tier-1 devices may have additional physical links provisioned toward the Border Routers. This uses more ports for the connections between Tier-1 devices and Border Routers and can result in a non-uniform, larger-port-count Clos network. At the same time, it reduces the number of ports available to "regular" Tier-2 switches and the number of clusters that can be interconnected via the Tier-1 layer.
If either of the described options is implemented, summarization at the Border Routers toward the WAN network core becomes possible without risking a routing black-hole condition under a single link failure. Both options result in a non-uniform topology, as additional links have to be provisioned on some network devices.
5.6. ECMP (Equal Cost Multipath) load-balancing
The primary load-sharing mechanism used by a Clos topology is ECMP (Equal Cost Multipath). ECMP works by instructing all lower-tier devices to load-share traffic destined to the same IP prefix across all connected upper-tier devices. For a Clos network, the number of top-stage devices equals the number of ECMP paths between any two Leaf devices.
Using ECMP for load sharing requires that the BGP implementation supports multipath toward all directly connected devices, upstream and downstream, at any point in the network topology. If we have, for example, 64-port devices, we would require an ECMP fan-out of 32 when designing the Clos structure. When route summarization is performed on the Spine equipment, Border Routers may have to support an even larger fan-out in order to connect to all Tier-1 devices. Logical link aggregation at Layer 2 can also be used to compensate for fan-out limitations if the hardware does not support wide enough ECMP.
For our Layer 3 Clos built as a BGP IP fabric, policies may be applied across all paths as needed, to equalize BGP attributes that can differ between vendor defaults. Also, because no IGP is used, all IGP costs are assumed to be zero.
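A sketch consistent with the configurations in Annex 1: on a Leaf, ECMP over the four Spines is enabled by relaxing the AS PATH comparison and allowing multiple installed paths.

router bgp 201
 ! treat paths of equal AS PATH length (but through different ASNs) as equal
 bgp bestpath as-path multipath-relax
 ! install up to four parallel paths, one per Spine
 maximum-paths 4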
5.7. The Impact of Failure on the network
The more devices that have to be notified of a failure, the larger the impact scope of the failure and, typically, the slower the convergence of the network. A network is said to have converged with respect to an issue once all devices within the failure impact scope have been notified of the failure, have recalculated their RIBs (Routing Information Base) and have updated their FIBs (Forwarding Information Base) accordingly. One of the main advantages of BGP over link-state protocols is that, for our Clos structure, it reduces the failure impact scope.
If the node experiencing the failure can find a backup path in its RIB, it can quickly overcome the failure and mask it from the rest of the network, since it does not have to send updates to neighboring peers. This is possible because BGP behaves, in a sense, like a distance-vector protocol: from the point of view of the local device, only the best path is sent to its neighbors.
If no alternative route is found when a failure occurs, the worst-case scenario applies: all devices in the Clos topology must either withdraw the IP prefix completely from their FIBs or update the ECMP groups in their FIBs.
When a Tier-1 device experiences a link failure, all devices directly attached downstream must be informed of this failure and must update their ECMP groups for each IP prefix that was learned through that device.
For implementations using hierarchical FIB designs, all such prefixes can share a single ECMP group, so even when widespread failures occur and multiple IP prefixes must be changed in the FIB, only a single change is needed to accommodate them.
Because summarization of IP prefixes is not always possible within a Clos network design, as it may create routing black-holes, the fault domain cannot be reduced through summarization; we have to rely on the minimization of the failure scope provided by BGP alone.
Choosing eBGP does not change the fact that we have such large failure scopes in our design; this is simply an unwanted property of the Clos network. It should be pointed out that a failure in the top layer (Spine) of the design affects more equipment than a failure at the bottom level, the Leaves.
5.8. Routing Micro-Loops
Normally, any Leaf device has a default route pointing to the upstream devices, and when it loses a more specific IP prefix it will, by default, use that route. This may lead to a micro-loop between the two tiers: the upstream Spine still has a path directing traffic to that specific subnet, while the downstream Leaf redirects the traffic back to the Spine over the default route. Although the micro-loop only lasts for the time it takes the Spine to update its routing table, it can cause bursts of traffic high enough for healthy packets to be dropped because of congestion.
To minimize the impact of such micro-loops during network convergence, static routes that are more specific than the default route and that discard this traffic toward a black-hole (Null) interface can be implemented. For easy configuration and troubleshooting, these static discard routes should cover a summary of all the IP subnets used by the DC servers, and should have an administrative distance higher than the routes learned through BGP.
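A minimal sketch, assuming the server space aggregates to 172.16.0.0/16 as in the addressing plan of chapter 6; the administrative distance of 250 keeps the discard route inactive while the more specific BGP routes (administrative distance 20) are present:

ip route 172.16.0.0 255.255.0.0 Null0 250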
6. Practical application
6.1. Scope of the research
The main purpose of the practical research is the implementation, both in a physical environment and by means of a simulation, of the transport network technologies deployed in a Clos network environment. This is done in order to demonstrate the working concept and the advantages of a Clos network design using BGP as the main support of our Layer 3 network, compared with the same network using STP at Layer 2.
Let us detail the layers composing the network design that we will physically implement.
The two approaches to be tested comprise both Layer 2 and Layer 3 protocols. The first one analyzed is the Layer 3 BGP implementation; afterwards we will take a quick look at the Layer 2 STP implementation.
The Leaf, or ground floor, level is the main point of aggregation for the access network (or Server LAN), as depicted in Fig. 6.1. It is made up of 16 Layer 3 switches, each connected with one link to each Spine-tier (top floor) Layer 3 switch. In this way we ensure redundancy for each of the servers in case of link failure, eliminating any single point of failure with respect to connectivity.
By using BGP together with this bulk of links, we can balance traffic across each of the Spine switches, which provides consistently high speed and low latency for all server communication and ensures that all network resources (links and switching capacity) are used evenly.
The second tier consists of 4 main, high-capacity and high-speed Layer 3 switches, also running BGP as the Layer 3 routing protocol, that represent the backbone of the entire design. Any communication between the servers, or directed to the exterior (WAN), must pass through these switches. The Spine level, as it is often called, is also in charge of load-balancing the traffic destined back to the servers, as well as the traffic directed to the Border Routers.
The configuration presented so far is the equivalent of a non-blocking matrix because it respects the Clos theorem: a Clos network is strict-sense non-blocking when the number of middle-stage (Spine) switches m satisfies m ≥ 2n − 1, where n is the number of input ports per ingress (Leaf) switch.
The Border Router level consists of two routers, also running BGP, that provide connectivity to the Internet and to other Data Centers. In some cases this level performs load-balancing of incoming traffic toward the servers and provides security policy enforcement through another level of firewalls or directly through adaptive security features.
The aggregation layer is mostly deployed in communications networks covering wide areas, where the aggregation equipment serves to limit the number of backhaul connections toward the higher-layer network equipment. At the aggregation point, multiple servers connect into Ethernet ports, and the migration toward an IP topology means that we are no longer constrained to use VLANs for traffic separation. Another advantage of not using VLANs is the rapid deployment of new sites.
The all-IP infrastructure is meant to carry the traffic between servers, and the access to the Internet, over a converged network that allows prioritization and traffic policing. It also implements additional security features that must not be forgotten in such cases.
The Spine network level, or backbone, is the part of the data network that ensures the rapid, end-to-end flow of information. It is composed of network equipment dedicated to routing the core traffic, offering high-speed packet forwarding, resiliency and fast convergence. The default gateways are distributed at this level, either through static routing or through the BGP sessions with the Border Routers; this is the only place where a static default route could be distributed, if configured.
A representation of the Clos network upon which the tests described in section 6.5 have been carried out, is depicted in Fig.6.1
Fig.6.1. – Clos Network topology
6.2. Transport Solution
As seen, there are two possible approaches when describing the data network solution design and implementation: the solution can be described from the access layer toward the core, or starting from the Spine and descending through the Leaf level.
Each Layer 3 switch is running the BGP routing protocol and announces into the protocol a single loopback interface that makes it reachable to its neighbors. Each device is configured with its own BGP AS number and with a router ID consisting of the IP address of the loopback interface advertised to all the other BGP neighbors.
It is not necessary for all eBGP peers to be fully meshed, as long as any IP subnet is reachable through each link. Another advantage is that our network can scale to the sizes required of a Clos network without the use of Route Reflectors for distributing updates.
6.3. IP Addressing
The addressing between Leaves and Spines has been chosen for easy management and in accordance with the AS numbering configured in BGP. Between the Spines and all the Leaves we used /31 point-to-point addressing (mask 255.255.255.254) from the ranges 192.168.1.0, 192.168.2.0, 192.168.3.0 and 192.168.4.0, as follows:
The IP addresses on the Spines are 192.168.x.y, where x is the number of the Spine switch and y takes even values from 0 to 30 (0, 2, 4 … 28, 30), one even value per Leaf; the Leaf side of each /31 takes the next (odd) address. Each link between a Spine and a Leaf therefore has an IP address at each end. An example of the addressing used between Spine1, Spine2 and Leaf1, respectively Leaf2, can be seen below.
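A hedged reconstruction of these links, consistent with the interface and traceroute outputs shown later (the interface numbers on the Spine side are assumed from the CDP output):

! Spine1, link to Leaf1
interface Ethernet1/0
 ip address 192.168.1.0 255.255.255.254
! Leaf1, link to Spine1
interface Ethernet1/0
 ip address 192.168.1.1 255.255.255.254
! Spine2, link to Leaf2
interface Ethernet1/1
 ip address 192.168.2.2 255.255.255.254
! Leaf2, link to Spine2
interface Ethernet1/1
 ip address 192.168.2.3 255.255.255.254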
The loopback addresses used by BGP as router IDs toward neighboring devices were also devised so as to allow easy troubleshooting in case of network failure. For the Spine switches these addresses are 10.0.0.x, where x is the number of the Spine switch, and for the Leaves the loopback addresses are 10.0.0.1zz, where zz takes values from 01 to 16 (e.g. 10.0.0.101 for Leaf1, 10.0.0.116 for Leaf16). This can also be seen for Spine1, Spine4, Leaf1 and Leaf16 in the configuration below:
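A reconstruction consistent with the routing tables and "show ip interface brief" outputs presented in section 6.5:

! Spine1
interface Loopback0
 ip address 10.0.0.1 255.255.255.255
! Spine4
interface Loopback0
 ip address 10.0.0.4 255.255.255.255
! Leaf1
interface Loopback0
 ip address 10.0.0.101 255.255.255.255
! Leaf16
interface Loopback0
 ip address 10.0.0.116 255.255.255.255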
The Server LAN IP addresses have been chosen to reflect the Leaf switch that connects them to the rest of the network. Each server LAN uses the subnet 172.16.1xx.0/24, where xx takes values between 01 and 16 (e.g. 172.16.101.0/24 for Leaf1, 172.16.106.0/24 for Leaf6), and each Leaf takes the last usable IP of its subnet. On each Leaf, Dynamic Host Configuration Protocol (DHCP) has also been configured for the chosen subnet. As an example, the configuration for Leaf1 and Leaf6 is shown below.
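A minimal sketch of the DHCP configuration (the pool name is hypothetical; the default router is the Leaf's own LAN address):

! Leaf1
ip dhcp pool SERVERS
 network 172.16.101.0 255.255.255.0
 default-router 172.16.101.254
! Leaf6
ip dhcp pool SERVERS
 network 172.16.106.0 255.255.255.0
 default-router 172.16.106.254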
Between the Spine tier and the Border Routers we used 80.0.0.x/30 subnets for the connections between the Spines and Border_Router1, and 80.0.1.x/30 between the Spines and Border_Router2. The excerpt below displays the relevant configuration on Border_Router1 and Border_Router2, with connections to all Spines. The first IP of each subnet is given to the Border Router and the last one to the Spine switch.
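The corresponding interface configuration, taken from the Annex 1 running configurations:

! Border_Router1
interface FastEthernet0/0
 description – TO SPINE1 Fa0/0 –
 ip address 80.0.0.1 255.255.255.252
! Border_Router2
interface FastEthernet0/0
 description – TO SPINE1 Fa0/1 –
 ip address 80.0.1.1 255.255.255.252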
6.4. Configurations
A configuration from two of the switches in our network for BGP can be seen below.
As we can see, each Spine and Leaf switch has been configured with a unique BGP router-id that identifies it within the network. Each device advertises only the default route and its server subnet to its neighboring switches, in order to keep the routing table on every device as small as possible. The point-to-point subnets between switches are not advertised, because they are of no use for normal network functionality (unless management is not done out of band).
To complete the BGP configuration we have to specify the peers with which each switch will form neighbor relationships; note also that each switch has a different Autonomous System Number. Each Spine switch forms relationships with all the Leaves, but not with the other Spine switches, because this is unnecessary and there are no physical links between them, as seen in Fig. 6.1. Likewise, each Leaf forms relationships with all Spines but not with other Leaves, for the same reasons.
BGP load-balances the traffic from the Leaves using the command maximum-paths 4, and on the Spines maximum-paths 16 (because there are 16 Leaves across which traffic can be balanced). This can be seen in the routing tables of Leaf1 and Leaf16 shown below:
We can see that the subnets of the Server LANs present in the routing table are equally distributed between all of the Spine switches. For example, 172.16.101.0/24 and 172.16.114.0/24 are reachable through Spine2, while subnets 172.16.110.0/24 and 172.16.111.0/24 are available through Spine4. We can observe that the Leaves receive all available routes from the Spine switches and choose to load-balance traffic evenly between them; consequently, not all subnets are reached through the same Spine switch.
While the traffic from the Leaves can load balance between a maximum of 4 paths, for Spine switches the maximum number of load balancing paths is equal to 16 because there are 16 Leaf Switches.
As an observation, a protocol named BFD (Bidirectional Forwarding Detection) can be used to further decrease the convergence time in case of a failure. BFD is a lightweight hello protocol that runs between directly connected neighbors: when it detects a link failure between two BGP neighbors, the affected subnets are removed immediately, instead of waiting for BGP's keepalive and hold-timer mechanism to notice that the neighbor on that link is down.
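A hedged sketch of how BFD could be tied to a BGP session on a Leaf (the timer values are illustrative):

interface Ethernet1/0
 ! 300 ms transmit/receive intervals, neighbor declared down after 3 missed hellos
 bfd interval 300 min_rx 300 multiplier 3
!
router bgp 201
 neighbor 192.168.1.0 fall-over bfd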
For Leaf1 and Spine1 we have the following configuration:
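A plausible reconstruction of these configurations, consistent with Annex 1 and the outputs in section 6.5 (the exact lines on the lab devices may differ slightly):

! Leaf1 (AS 201)
router bgp 201
 bgp router-id 10.0.0.101
 bgp bestpath as-path multipath-relax
 network 172.16.101.0 mask 255.255.255.0
 neighbor 192.168.1.0 remote-as 101
 neighbor 192.168.2.0 remote-as 102
 neighbor 192.168.3.0 remote-as 103
 neighbor 192.168.4.0 remote-as 104
 maximum-paths 4
!
! Spine1 (AS 101)
router bgp 101
 bgp router-id 10.0.0.1
 bgp bestpath as-path multipath-relax
 neighbor 80.0.0.1 remote-as 11
 neighbor 80.0.1.1 remote-as 12
 neighbor 192.168.1.1 remote-as 201
 neighbor 192.168.1.3 remote-as 202
 ! … one neighbor statement per Leaf, up to:
 neighbor 192.168.1.31 remote-as 216
 maximum-paths 16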
The full configuration of Border_Router1, together with all of the other equipment configuration files (Border Routers, Spines and Leaves), can be found in Annex 1.
6.5. Laboratory and Simulation Description
First, we will show the connections between the Spines, Leaves and Border Routers (the commands will not be detailed for each device here, but are presented in Annex 1 for further analysis). The physical topology is shown below (and in more detail in Annex 2):
"show interface description" shows the local interface status together with a manually added description for easy troubleshooting. The Status and Protocol fields tell us whether the interface is up at Layer 2 and Layer 3.
Best practice guides us to configure the description in such a way that it will reflect the destination to which the interface connects but also the remote interface of the neighboring device.
We can see a few examples of the different configuration for Spine, Leafs and Border_Router in the following section.
For testing purposes, only one interface on each Leaf has been configured to actually host servers, since this is enough for the requirements of the testing procedure. It should be noted that at least 12 more interfaces on each Leaf remain available for connecting servers to the topology.
To quickly check connectivity between the devices and get a sense of the design, we can run "show cdp neighbors". CDP is a Cisco proprietary protocol that detects neighbors on point-to-point links and shows useful information about them, such as the device name, the local interface on which the neighbor is connected and the remote interface that connects the neighboring equipment to the current one, among other details.
There are several versions of this command, some giving more detailed output than others. We will use the simple form, "show cdp neighbors", for a fast overview of the network topology and device connectivity.
Border_Router1#sh cdp neigh
Capability Codes: R – Router, T – Trans Bridge, B – Source Route Bridge
S – Switch, H – Host, I – IGMP, r – Repeater, P – Phone,
D – Remote, C – CVTA, M – Two-port Mac Relay
Device ID Local Interface Holdtme Capability Platform Port ID
SPINE1 Fas 0/0 133 R -WS-C3560- Fas 0/0
SPINE3 Fas 1/0 157 R -WS-C3560- Fas 0/0
SPINE2 Fas 0/1 131 R -WS-C3560- Fas 0/0
SPINE4 Fas 1/1 143 R -WS-C3560- Fas 0/0
SPINE1#sh cdp neigh
Device ID Local Interface Holdtme Capability Platform Port ID
LEAF16 Eth 2/7 179 R -WS-C3560- Eth 1/0
LEAF15 Eth 2/6 161 R -WS-C3560- Eth 1/0
LEAF14 Eth 2/5 147 R -WS-C3560- Eth 1/0
LEAF13 Eth 2/4 171 R -WS-C3560- Eth 1/0
LEAF12 Eth 2/3 147 R -WS-C3560- Eth 1/0
LEAF11 Eth 2/2 147 R -WS-C3560- Eth 1/0
LEAF10 Eth 2/1 132 R -WS-C3560- Eth 1/0
LEAF8 Eth 1/7 178 R -WS-C3560- Eth 1/0
LEAF9 Eth 2/0 177 R -WS-C3560- Eth 1/0
LEAF6 Eth 1/5 153 R -WS-C3560- Eth 1/0
LEAF7 Eth 1/6 166 R -WS-C3560- Eth 1/0
LEAF4 Eth 1/3 161 R -WS-C3560- Eth 1/0
LEAF5 Eth 1/4 145 R -WS-C3560- Eth 1/0
LEAF2 Eth 1/1 148 R -WS-C3560- Eth 1/0
LEAF3 Eth 1/2 151 R -WS-C3560- Eth 1/0
LEAF1 Eth 1/0 154 R -WS-C3560- Eth 1/0
Border_Router1 Fas 0/0 155 R -WS-C3560- Eth 1/0
Border_Router2 Fas 0/1 139 R -WS-C3560- Eth 1/0
LEAF1#sh cdp neigh
Device ID Local Interface Holdtme Capability Platform Port ID
SPINE1 Eth 1/0 163 R -WS-C3560- Eth 1/0
SPINE3 Eth 1/2 137 R -WS-C3560- Eth 1/0
SPINE2 Eth 1/1 158 R -WS-C3560- Eth 1/0
SPINE4 Eth 1/3 137 R -WS-C3560- Eth 1/0
If we want to further check the IP addresses on different interfaces for any particular equipment in a fast and organized manner we can issue the command “show ip interface brief”, which details again the interface status but also the assigned IP address on each interface of that equipment.
LEAF1#sh ip interface brief
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 unassigned YES NVRAM administratively down down
FastEthernet0/1 unassigned YES NVRAM administratively down down
Ethernet1/0 192.168.1.1 YES NVRAM up up
Ethernet1/1 192.168.2.1 YES NVRAM up up
Ethernet1/2 192.168.3.1 YES NVRAM up up
Ethernet1/3 192.168.4.1 YES NVRAM up up
Ethernet1/4 unassigned YES NVRAM administratively down down
Ethernet1/5 unassigned YES NVRAM administratively down down
Ethernet1/6 unassigned YES NVRAM administratively down down
Ethernet1/7 unassigned YES NVRAM administratively down down
Ethernet2/0 unassigned YES NVRAM administratively down down
Ethernet2/1 unassigned YES NVRAM administratively down down
Ethernet2/2 unassigned YES NVRAM administratively down down
Ethernet2/3 unassigned YES NVRAM administratively down down
Ethernet2/4 unassigned YES NVRAM administratively down down
Ethernet2/5 unassigned YES NVRAM administratively down down
Ethernet2/6 unassigned YES NVRAM administratively down down
Ethernet2/7 172.16.101.254 YES NVRAM up up
Loopback0 10.0.0.101 YES NVRAM up up
In this show command we can see the IP addresses assigned to the interfaces toward each of the Spine switches, as well as the loopback IP address that, together with the router-id configured under the BGP process, identifies the device to its BGP neighbors. We can also check that the Server LAN (172.16.101.0) is properly configured and up.
Border_Router1#sh ip int brief
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 80.0.0.1 YES NVRAM up up
FastEthernet0/1 80.0.0.5 YES NVRAM up up
FastEthernet1/0 80.0.0.9 YES NVRAM up up
FastEthernet1/1 80.0.0.13 YES NVRAM up up
FastEthernet2/0 unassigned YES NVRAM administratively down down
FastEthernet2/1 unassigned YES NVRAM administratively down down
Loopback0 100.0.0.1 YES manual up up
Loopback1 8.8.8.8 YES manual up up
For the Border Routers, we can see the IP addresses configured between them and the Spines and, more specifically, the addresses 8.8.8.8 (configured on Border_Router1) and 8.8.4.4 (on Border_Router2), which represent the connection to the Internet/other DCs.
Now that we have seen the basic Layer 3 configuration, we can check for mistakes and prove connectivity by issuing ping and traceroute commands. We will also verify the load balancing implemented in the BGP configuration using traceroutes.
LEAF1#traceroute 172.16.111.1 source ethernet 2/7 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.111.1
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.4.0 [AS 103] 32 msec 28 msec 20 msec
2 192.168.4.21 [AS 103] 16 msec 32 msec 20 msec
3 172.16.111.1 [AS 211] 16 msec 32 msec 24 msec
LEAF1#traceroute 172.16.111.1 source eth2/7 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.111.1
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.1.0 [AS 101] 4 msec (we can see that the first hop on the route to the destination changes between
192.168.2.0 [AS 101] 8 msec 192.168.1.0 (Spine1), 192.168.2.0 (Spine2), 192.168.3.0 (Spine3) and 192.168.4.0 (Spine4)
192.168.3.0 [AS 101] 12 msec meaning that all packets get load-balanced between the 4 SPINES)
192.168.4.0 [AS 101] 12 msec
192.168.1.0 [AS 101] 16 msec
192.168.2.0 [AS 101] 12 msec
192.168.3.0 [AS 101] 12 msec
192.168.4.0 [AS 101] 16 msec
192.168.1.0 [AS 101] 12 msec
192.168.2.0 [AS 101] 12 msec
192.168.3.0 [AS 101] 12 msec
192.168.4.0 [AS 101] 16 msec
2 192.168.1.21 [AS 101] 16 msec (the second hop IP neighbor changes because the packets take different routes
192.168.2.21 [AS 101] 24 msec to reach the same subnet)
192.168.3.21 [AS 101] 24 msec
192.168.4.21 [AS 101] 20 msec
192.168.1.21 [AS 101] 28 msec
192.168.2.21 [AS 101] 12 msec
192.168.3.21 [AS 101] 20 msec
192.168.4.21 [AS 101] 16 msec
192.168.1.21 [AS 101] 32 msec
192.168.2.21 [AS 101] 16 msec
192.168.3.21 [AS 101] 24 msec
192.168.4.21 [AS 101] 16 msec
3 172.16.111.1 [AS 211] 24 msec (destination is reached)
LEAF1#traceroute 172.16.116.1 source ethernet 2/7 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.116.1
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.1.0 [AS 101] 8 msec
192.168.2.0 [AS 101] 12 msec
192.168.3.0 [AS 101] 8 msec
192.168.4.0 [AS 101] 4 msec
192.168.1.0 [AS 101] 12 msec (first hops)
192.168.2.0 [AS 101] 12 msec
192.168.3.0 [AS 101] 12 msec
192.168.4.0 [AS 101] 8 msec
192.168.1.0 [AS 101] 12 msec
192.168.2.0 [AS 101] 12 msec
192.168.3.0 [AS 101] 8 msec
192.168.4.0 [AS 101] 12 msec
2 192.168.1.31 [AS 101] 4 msec
192.168.2.31 [AS 101] 16 msec
192.168.3.31 [AS 101] 28 msec
192.168.4.31 [AS 101] 16 msec
192.168.1.31 [AS 101] 12 msec
192.168.2.31 [AS 101] 12 msec
192.168.3.31 [AS 101] 20 msec (second hops)
192.168.4.31 [AS 101] 16 msec
192.168.1.31 [AS 101] 16 msec
192.168.2.31 [AS 101] 16 msec
192.168.3.31 [AS 101] 16 msec
192.168.4.31 [AS 101] 16 msec
3 172.16.116.1 [AS 216] 12 msec
As we can see, traffic gets load balanced between the Spine switches because each Spine has a route to Leafs 11 and 16 and also back to Leaf1. The “probe 12” extension is used in order to send enough packets at each stage to see all of the paths that can be taken to reach our destination (if we use “probe 2” we will see only two Spines load-balancing).
Furthermore, we can observe this by issuing BGP-specific commands like "show ip bgp" or "show ip route", which are shown and explained next.
This output shows that, for almost all learned networks, BGP installs routes through all 4 Spine switches, in order to load-balance the traffic generated by the Leaf server LANs. For instance, the routes to subnets 172.16.105.0/24 and 172.16.109.0/24 are added to the routing table with multiple next-hop IP addresses (Spine1-4).
Forwarding on these switches is done by Cisco Express Forwarding (CEF), an advanced Layer 3 switching technology used mainly in large core networks and the Internet to enhance overall network performance. Although CEF is Cisco proprietary, other vendors of multi-layer switches and high-capacity routers offer similar functionality, where Layer 3 switching or routing is done in hardware (in an ASIC) instead of in software by the (central) CPU.
CEF is mainly used to increase packet switching speed by reducing the overhead and delays introduced by other routing techniques. CEF consists of two key components: the Forwarding Information Base (FIB) and adjacencies. To inspect it, we can issue the command "show ip cef":
172.16.116.0/24 192.168.1.20 Ethernet1/0
192.168.2.20 Ethernet1/1
192.168.3.20 Ethernet1/2
192.168.4.20 Ethernet1/3
192.168.1.20/31 attached Ethernet1/0
192.168.1.20/32 attached Ethernet1/0
192.168.1.21/32 receive Ethernet1/0
192.168.2.20/31 attached Ethernet1/1
192.168.2.20/32 attached Ethernet1/1
192.168.2.21/32 receive Ethernet1/1
192.168.3.20/31 attached Ethernet1/2
192.168.3.20/32 attached Ethernet1/2
192.168.3.21/32 receive Ethernet1/2
192.168.4.20/31 attached Ethernet1/3
192.168.4.20/32 attached Ethernet1/3
192.168.4.21/32 receive Ethernet1/3
224.0.0.0/4 drop
224.0.0.0/24 receive
240.0.0.0/4 drop
255.255.255.255/32 receive
To facilitate fast Layer 3 switching, the CEF table includes, for each entry, the prefix, the next-hop IP address and the local output interface.
A testbed was created beforehand in order to verify feasibility. The switches are emulated with the help of GNS3, running on a powerful home PC with enough RAM to support all of the equipment in the topology.
As expected, performance was degraded by a factor of 10 to 100 when using GNS3: where a real-world environment reached values of 100 MB/s, our GNS3 implementation reached a maximum of 1 MB/s, because of the large number of emulated switches. This still proves the concept, but due to the limitations imposed by the hardware of the PC running the emulation, the real values of the non-blocking Clos design are capped at the maximum traffic supported by the PC emulating the entire topology.
The hosts were created separately using Oracle VM VirtualBox and then connected to the GNS3 topology by tunneling each one to the desired Leaf switch.
To verify that the network is non-blocking and to test its stability, a program called Jperf was used. 16 workstations were connected, of which 8 acted as clients and 8 as servers, all running Jperf. Each client connected to one server, and traffic was generated from all 8 clients to the servers simultaneously.
In Fig. 6.2. and Fig. 6.3. we can see the traffic generated with Jperf from two random users to the Jperf servers hosted on different Leaves. The test was made while all 8 servers and clients were active.
Fig. 6.2. – Traffic generated from the first user (output bandwidth)
Fig. 6.3. – Traffic generated from the second user (output bandwidth)
In Fig. 6.4. and Fig. 6.5., respectively, we can see two of the servers receiving the traffic and gathering data about packet losses and bandwidth.
Fig. 6.4. – First server measuring the input bandwidth
Fig. 6.5. – Second server measuring the input bandwidth
We can clearly see that traffic generated by any of the clients is not affected by the other clients, so we can say we have a non-blocking Clos topology at Layer 3.
In Annex 2 there are a few photos of the physical topology built in the lab. We can see the Leaves stacked on top of each other and the cable links to each respective Spine, as depicted in Fig. 6.1.
7. Conclusions
Each BGP Spine has its own ASN, which means we do not need complex iBGP connections between the Spines. For a very big Clos network this would become a major issue, because many iBGP connections would also require additional equipment such as Route Reflectors (RRs).
In order to use a different AS on every device and still load-balance between the different Spine routers, BGP's Best Path Algorithm must be overridden, so that BGP neighbors in different ASes can still serve as parallel paths for the routers at the Leaf level.
To override the Best Path Algorithm we use the command "bgp bestpath as-path multipath-relax" together with "maximum-paths x", where x specifies the maximum number of parallel paths.
The Best Path Algorithm takes several attributes into consideration when choosing the best route or when configured to load-balance. The attributes used are Weight, Local Preference, AS Path, Origin, MED, neighbor type and IGP metric. In order for a BGP-enabled device to perform load balancing, each of these attributes must match across the candidate paths from the different neighbors, up to and including the IGP metric.
This means that even if the Spine switches are otherwise eligible for load balancing, under normal conditions it will not happen, because of the different BGP ASes used. To make BGP accept paths through neighbors in different ASes when load balancing, we have to tell it to ignore the AS numbers in the path (while still requiring equal AS PATH length), and this is exactly what the command "bgp bestpath as-path multipath-relax" does.
In my opinion, using eBGP instead of iBGP or Layer 2 protocols in large-scale Clos topologies can prove quite challenging, but the numerous benefits it provides make it the best choice. Fast convergence, traffic engineering and scalability are all very important, and BGP can satisfy each one.
8. References
The following pages were accessed during May – June 2016.
[1] http://bradhedlund.com/2009/04/05/top-of-rack-vs-end-of-row-data-center-designs/
[2] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6883948&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel7%2F6878057%2F6883277%2F06883948.pdf%3Farnumber%3D6883948
[3] http://conferences.sigcomm.org/co-next/2013/program/p49.pdf
[4] http://www.networkworld.com/article/2226122/cisco-subnet/clos-networks–what-s-old-is-new-again.html
[5] http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf
[6] http://clusterdesign.org/fat-trees/
[7] https://tools.ietf.org/html/draft-smirnov-ospf-dive-00#page-5
[8] https://en.wikipedia.org/wiki/Clos_network
[9] https://en.wikipedia.org/wiki/Data_center
[10] http://www.pcmag.com/article2/0,2817,2372163,00.asp
[11] http://engineering.mit.edu/ask/what-cloud-computing
[12] https://tools.ietf.org/html/draft-lapukhov-bgp-routing-large-dc-06
[13] http://www.webopedia.com/TERM/C/cloud_computing.html
[15] http://nathanfarrington.com/papers/facebook-oic13.pdf
[16] http://www.juniper.net/us/en/local/pdf/whitepapers/2000565-en.pdf
[17] https://en.wikipedia.org/wiki/Cloud-based_networking
[30] http://docplayer.net/2548989-Per-packet-load-balanced-low-latency-routing-for-clos-based-data-center-networks.html
Annex 1.
Border_Router1
Border_Router1#show run
Building configuration…
Current configuration : 2035 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname Border_Router1
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$qooN$BMGDl8f4KigEhGgvWj.780
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
description – TO Internet –
ip address 100.0.0.1 255.255.255.255
!
interface Loopback1
description – TO Internet –
ip address 8.8.8.8 255.255.255.255
!
interface FastEthernet0/0
description – TO SPINE1 Fa0/0 –
ip address 80.0.0.1 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet0/1
description – TO SPINE2 Fa0/0 –
ip address 80.0.0.5 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet1/0
description – TO SPINE3 Fa0/0 –
ip address 80.0.0.9 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet1/1
description – TO SPINE4 Fa0/0 –
ip address 80.0.0.13 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet2/0
no ip address
shutdown
speed auto
duplex auto
!
interface FastEthernet2/1
no ip address
shutdown
speed auto
duplex auto
!
router bgp 11
bgp router-id 100.0.0.1
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 8.8.8.8 mask 255.255.255.255
network 100.0.0.1 mask 255.255.255.255
neighbor 80.0.0.2 remote-as 101
neighbor 80.0.0.6 remote-as 102
neighbor 80.0.0.10 remote-as 103
neighbor 80.0.0.14 remote-as 104
maximum-paths 4
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
Border_Router1# show ip route
Gateway of last resort is 80.0.0.14 to network 0.0.0.0
B* 0.0.0.0/0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
8.0.0.0/32 is subnetted, 2 subnets
B 8.8.4.4 [20/0] via 80.0.0.14, 00:58:04
[20/0] via 80.0.0.10, 00:58:04
[20/0] via 80.0.0.6, 00:58:04
[20/0] via 80.0.0.2, 00:58:04
C 8.8.8.8 is directly connected, Loopback1
10.0.0.0/32 is subnetted, 20 subnets
B 10.0.0.1 [20/0] via 80.0.0.2, 00:59:35
B 10.0.0.2 [20/0] via 80.0.0.6, 00:59:35
B 10.0.0.3 [20/0] via 80.0.0.10, 00:59:35
B 10.0.0.4 [20/0] via 80.0.0.14, 00:59:35
B 10.0.0.101 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.102 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.103 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.104 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.105 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.106 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.107 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.108 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.109 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.110 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.111 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.112 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.113 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.114 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.115 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 10.0.0.116 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
80.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
C 80.0.0.0/30 is directly connected, FastEthernet0/0
L 80.0.0.1/32 is directly connected, FastEthernet0/0
C 80.0.0.4/30 is directly connected, FastEthernet0/1
L 80.0.0.5/32 is directly connected, FastEthernet0/1
C 80.0.0.8/30 is directly connected, FastEthernet1/0
L 80.0.0.9/32 is directly connected, FastEthernet1/0
C 80.0.0.12/30 is directly connected, FastEthernet1/1
L 80.0.0.13/32 is directly connected, FastEthernet1/1
100.0.0.0/32 is subnetted, 2 subnets
C 100.0.0.1 is directly connected, Loopback0
B 100.0.0.2 [20/0] via 80.0.0.14, 00:58:04
[20/0] via 80.0.0.10, 00:58:04
[20/0] via 80.0.0.6, 00:58:04
[20/0] via 80.0.0.2, 00:58:04
172.16.0.0/24 is subnetted, 16 subnets
B 172.16.101.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.102.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.103.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.104.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.105.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.106.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.107.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.108.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.109.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.110.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.111.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.112.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.113.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.114.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.115.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
B 172.16.116.0 [20/0] via 80.0.0.14, 00:59:35
[20/0] via 80.0.0.10, 00:59:35
[20/0] via 80.0.0.6, 00:59:35
[20/0] via 80.0.0.2, 00:59:35
Border_Router1#show ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 11"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
80.0.0.2
80.0.0.6
80.0.0.10
80.0.0.14
Maximum path: 4
Routing Information Sources:
Gateway Distance Last Update
80.0.0.2 20 00:58:16
80.0.0.6 20 00:59:48
80.0.0.10 20 00:58:16
80.0.0.14 20 00:58:47
Distance: external 20 internal 200 local 200
Border_Router1#ping 172.16.116.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.116.1, timeout is 2 seconds:
Packet sent with a source address of 100.0.0.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/39/44 ms
Border_Router1#traceroute 172.16.101.1 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.101.1
VRF info: (vrf in name/id, vrf out name/id)
1 80.0.0.2 [AS 101] 16 msec
80.0.0.6 [AS 101] 4 msec
80.0.0.10 [AS 101] 16 msec
80.0.0.14 [AS 101] 12 msec
80.0.0.2 [AS 101] 8 msec
80.0.0.6 [AS 101] 4 msec
80.0.0.10 [AS 101] 8 msec
80.0.0.14 [AS 101] 12 msec
80.0.0.2 [AS 101] 4 msec
80.0.0.6 [AS 101] 4 msec
80.0.0.10 [AS 101] 12 msec
80.0.0.14 [AS 101] 12 msec
2 192.168.1.1 [AS 101] 8 msec
192.168.2.1 [AS 101] 16 msec
192.168.3.1 [AS 101] 8 msec
192.168.4.1 [AS 101] 16 msec
192.168.1.1 [AS 101] 20 msec
192.168.2.1 [AS 101] 16 msec
192.168.3.1 [AS 101] 4 msec
192.168.4.1 [AS 101] 20 msec
192.168.1.1 [AS 101] 24 msec
192.168.2.1 [AS 101] 16 msec
192.168.3.1 [AS 101] 4 msec
192.168.4.1 [AS 101] 20 msec
3 172.16.101.1 [AS 201] 44 msec
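The twelve probes per hop rotate round-robin across all four spine next hops, which confirms that the router is not only learning but actively using four equal-cost paths: maximum-paths 4 allows up to four BGP routes into the routing table, while bgp bestpath as-path multipath-relax lets paths received from different spine autonomous systems (AS 101 through 104) be treated as equal as long as their AS-path lengths match. As a quick cross-check, not part of the capture above, the installed multipath set can also be inspected directly with two standard IOS commands:
Border_Router1#show ip bgp 172.16.101.0
Border_Router1#show ip cef 172.16.101.0
The first lists every BGP path for the prefix and marks the ones selected as multipath; the second shows the load-sharing entries actually programmed into CEF.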
Border Router 2
Border_Router2#show running-config
Building configuration…
Current configuration : 2036 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname Border_Router2
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$aFrJ$tFU2mf/8InAykn5ijj4zI.
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
description – TO Internet –
ip address 100.0.0.2 255.255.255.255
!
interface Loopback1
description – TO Internet –
ip address 8.8.4.4 255.255.255.255
!
interface FastEthernet0/0
description – TO SPINE1 Fa0/1 –
ip address 80.0.1.1 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet0/1
description – TO SPINE2 Fa0/1 –
ip address 80.0.1.5 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet1/0
description – TO SPINE3 Fa0/1 –
ip address 80.0.1.9 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet1/1
description – TO SPINE4 Fa0/1 –
ip address 80.0.1.13 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet2/0
no ip address
shutdown
speed auto
duplex auto
!
interface FastEthernet2/1
no ip address
shutdown
speed auto
duplex auto
!
router bgp 12
bgp router-id 100.0.0.2
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 8.8.4.4 mask 255.255.255.255
network 100.0.0.2 mask 255.255.255.255
neighbor 80.0.1.2 remote-as 101
neighbor 80.0.1.6 remote-as 102
neighbor 80.0.1.10 remote-as 103
neighbor 80.0.1.14 remote-as 104
maximum-paths 4
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
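The lines that enable the multipath behaviour are easy to miss inside the full configuration, so they are repeated here with annotations (lines starting with ! are ordinary IOS comments):
router bgp 12
 ! treat eBGP paths from different spine ASes as equal,
 ! as long as their AS-path lengths match
 bgp bestpath as-path multipath-relax
 ! install up to four such equal paths for the same prefix
 maximum-paths 4
Without multipath-relax, classic BGP multipath would require the AS paths to be identical, not merely of identical length, so routes learned from AS 101-104 could never be installed together.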
Border_Router2# sh ip route
Gateway of last resort is 80.0.1.14 to network 0.0.0.0
B* 0.0.0.0/0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
8.0.0.0/32 is subnetted, 2 subnets
C 8.8.4.4 is directly connected, Loopback1
B 8.8.8.8 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
10.0.0.0/32 is subnetted, 20 subnets
B 10.0.0.1 [20/0] via 80.0.1.2, 01:02:21
B 10.0.0.2 [20/0] via 80.0.1.6, 01:02:21
B 10.0.0.3 [20/0] via 80.0.1.10, 01:02:21
B 10.0.0.4 [20/0] via 80.0.1.14, 01:02:21
B 10.0.0.101 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.102 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.103 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.104 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.105 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.106 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.107 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.108 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.109 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.110 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.111 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.112 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.113 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.114 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.115 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 10.0.0.116 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
80.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
C 80.0.1.0/30 is directly connected, FastEthernet0/0
L 80.0.1.1/32 is directly connected, FastEthernet0/0
C 80.0.1.4/30 is directly connected, FastEthernet0/1
L 80.0.1.5/32 is directly connected, FastEthernet0/1
C 80.0.1.8/30 is directly connected, FastEthernet1/0
L 80.0.1.9/32 is directly connected, FastEthernet1/0
C 80.0.1.12/30 is directly connected, FastEthernet1/1
L 80.0.1.13/32 is directly connected, FastEthernet1/1
100.0.0.0/32 is subnetted, 2 subnets
B 100.0.0.1 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
C 100.0.0.2 is directly connected, Loopback0
172.16.0.0/24 is subnetted, 16 subnets
B 172.16.101.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.102.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.103.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.104.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.105.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.106.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.107.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.108.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.109.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.110.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.111.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.112.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.113.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.114.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.115.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
B 172.16.116.0 [20/0] via 80.0.1.14, 01:02:21
[20/0] via 80.0.1.10, 01:02:21
[20/0] via 80.0.1.6, 01:02:21
[20/0] via 80.0.1.2, 01:02:21
Border_Router2# sh ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 12"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
80.0.1.2
80.0.1.6
80.0.1.10
80.0.1.14
Maximum path: 4
Routing Information Sources:
Gateway Distance Last Update
80.0.1.2 20 01:02:41
80.0.1.6 20 01:02:41
80.0.1.10 20 01:02:41
80.0.1.14 20 01:02:41
Distance: external 20 internal 200 local 200
Border_Router2#ping 172.16.116.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.116.1, timeout is 2 seconds:
Packet sent with a source address of 100.0.0.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/24/32 ms
Border_Router2#traceroute 172.16.101.1 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.101.1
VRF info: (vrf in name/id, vrf out name/id)
1 80.0.1.2 [AS 101] 40 msec
80.0.1.6 [AS 101] 20 msec
80.0.1.10 [AS 101] 8 msec
80.0.1.14 [AS 101] 16 msec
80.0.1.2 [AS 101] 8 msec
80.0.1.6 [AS 101] 16 msec
80.0.1.10 [AS 101] 8 msec
80.0.1.14 [AS 101] 16 msec
80.0.1.2 [AS 101] 8 msec
80.0.1.6 [AS 101] 16 msec
80.0.1.10 [AS 101] 8 msec
80.0.1.14 [AS 101] 12 msec
2 192.168.1.1 [AS 101] 8 msec
192.168.2.1 [AS 101] 28 msec
192.168.3.1 [AS 101] 12 msec
192.168.4.1 [AS 101] 28 msec
192.168.1.1 [AS 101] 24 msec
192.168.2.1 [AS 101] 24 msec
192.168.3.1 [AS 101] 28 msec
192.168.4.1 [AS 101] 24 msec
192.168.1.1 [AS 101] 32 msec
192.168.2.1 [AS 101] 24 msec
192.168.3.1 [AS 101] 28 msec
192.168.4.1 [AS 101] 24 msec
3 172.16.101.1 [AS 201] 28 msec
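Border_Router2 thus mirrors Border_Router1 on its own 80.0.1.0 uplinks: either border router can reach any leaf subnet over all four spines, so the loss of one spine or one border router reduces capacity without breaking connectivity.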
Spine 1
SPINE1#sh running-config
Building configuration…
Current configuration : 3899 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname SPINE1
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$wMMZ$Pt7OES5dk5hReDmVvlvIS.
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
ip address 10.0.0.1 255.255.255.255
!
interface FastEthernet0/0
description –TO WAN BR1 Fa0/0 –
ip address 80.0.0.2 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet0/1
description –TO WAN BR2 Fa0/0 –
ip address 80.0.1.2 255.255.255.252
speed auto
duplex auto
!
interface Ethernet1/0
description – TO LEAF1 eth1/0 –
ip address 192.168.1.0 255.255.255.254
duplex full
!
interface Ethernet1/1
description – TO LEAF2 eth1/0 –
ip address 192.168.1.2 255.255.255.254
duplex full
!
interface Ethernet1/2
description – TO LEAF3 eth1/0 –
ip address 192.168.1.4 255.255.255.254
duplex full
!
interface Ethernet1/3
description – TO LEAF4 eth1/0 –
ip address 192.168.1.6 255.255.255.254
duplex full
!
interface Ethernet1/4
description – TO LEAF5 eth1/0 –
ip address 192.168.1.8 255.255.255.254
duplex full
!
interface Ethernet1/5
description – TO LEAF6 eth1/0 –
ip address 192.168.1.10 255.255.255.254
duplex full
!
interface Ethernet1/6
description – TO LEAF7 eth1/0 –
ip address 192.168.1.12 255.255.255.254
duplex full
!
interface Ethernet1/7
description – TO LEAF8 eth1/0 –
ip address 192.168.1.14 255.255.255.254
duplex full
!
interface Ethernet2/0
description – TO LEAF9 eth1/0 –
ip address 192.168.1.16 255.255.255.254
duplex full
!
interface Ethernet2/1
description – TO LEAF10 eth1/0 –
ip address 192.168.1.18 255.255.255.254
duplex full
!
interface Ethernet2/2
description – TO LEAF11 eth1/0 –
ip address 192.168.1.20 255.255.255.254
duplex full
!
interface Ethernet2/3
description – TO LEAF12 eth1/0 –
ip address 192.168.1.22 255.255.255.254
duplex full
!
interface Ethernet2/4
description – TO LEAF13 eth1/0 –
ip address 192.168.1.24 255.255.255.254
duplex full
!
interface Ethernet2/5
description – TO LEAF14 eth1/0 –
ip address 192.168.1.26 255.255.255.254
duplex full
!
interface Ethernet2/6
description – TO LEAF15 eth1/0 –
ip address 192.168.1.28 255.255.255.254
duplex full
!
interface Ethernet2/7
description – TO LEAF16 eth1/0 –
ip address 192.168.1.30 255.255.255.254
duplex full
!
router bgp 101
bgp router-id 10.0.0.1
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 0.0.0.0
network 10.0.0.1 mask 255.255.255.255
network 172.16.11.0 mask 255.255.255.0
neighbor 80.0.0.1 remote-as 11
neighbor 80.0.1.1 remote-as 12
neighbor 192.168.1.1 remote-as 201
neighbor 192.168.1.3 remote-as 202
neighbor 192.168.1.5 remote-as 203
neighbor 192.168.1.7 remote-as 204
neighbor 192.168.1.9 remote-as 205
neighbor 192.168.1.11 remote-as 206
neighbor 192.168.1.13 remote-as 207
neighbor 192.168.1.15 remote-as 208
neighbor 192.168.1.17 remote-as 209
neighbor 192.168.1.19 remote-as 210
neighbor 192.168.1.21 remote-as 211
neighbor 192.168.1.23 remote-as 212
neighbor 192.168.1.25 remote-as 213
neighbor 192.168.1.27 remote-as 214
neighbor 192.168.1.29 remote-as 215
neighbor 192.168.1.31 remote-as 216
maximum-paths 16
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
ip route 0.0.0.0 0.0.0.0 80.0.0.1 name TO_BORDER_ROUTER
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
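Two details of this configuration deserve attention. First, the spine originates the default route into BGP: network 0.0.0.0 only takes effect because the static route ip route 0.0.0.0 0.0.0.0 80.0.0.1 places a default route in the routing table, so every leaf learns 0.0.0.0/0 from all four spines. Second, the fabric links use /31 point-to-point subnets (RFC 3021), halving the address consumption of the usual /30s; by convention the spine holds the even (lower) address of each /31 and the leaf the odd (upper) one, so the neighbor statements can be derived mechanically from the interface addressing:
interface Ethernet1/0
 ! spine side: even address of the /31
 ip address 192.168.1.0 255.255.255.254
!
router bgp 101
 ! leaf side: odd address of the same /31
 neighbor 192.168.1.1 remote-as 201
The statement network 172.16.11.0 mask 255.255.255.0 appears to be a leftover: no matching route exists in the routing table, so it is never advertised.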
SPINE1#sh ip route
Gateway of last resort is 80.0.0.1 to network 0.0.0.0
S* 0.0.0.0/0 [1/0] via 80.0.0.1
8.0.0.0/32 is subnetted, 2 subnets
B 8.8.4.4 [20/0] via 80.0.1.1, 01:10:36
B 8.8.8.8 [20/0] via 80.0.0.1, 01:11:07
10.0.0.0/32 is subnetted, 20 subnets
C 10.0.0.1 is directly connected, Loopback0
B 10.0.0.2 [20/0] via 192.168.1.31, 01:17:25
[20/0] via 192.168.1.29, 01:17:25
[20/0] via 192.168.1.27, 01:17:25
[20/0] via 192.168.1.25, 01:17:25
[20/0] via 192.168.1.23, 01:17:25
[20/0] via 192.168.1.21, 01:17:25
[20/0] via 192.168.1.19, 01:17:25
[20/0] via 192.168.1.17, 01:17:25
[20/0] via 192.168.1.15, 01:17:25
[20/0] via 192.168.1.13, 01:17:25
[20/0] via 192.168.1.11, 01:17:25
[20/0] via 192.168.1.9, 01:17:25
[20/0] via 192.168.1.7, 01:17:25
[20/0] via 192.168.1.5, 01:17:25
[20/0] via 192.168.1.3, 01:17:25
[20/0] via 192.168.1.1, 01:17:25
B 10.0.0.3 [20/0] via 192.168.1.31, 01:17:25
[20/0] via 192.168.1.29, 01:17:25
[20/0] via 192.168.1.27, 01:17:25
[20/0] via 192.168.1.25, 01:17:25
[20/0] via 192.168.1.23, 01:17:25
[20/0] via 192.168.1.21, 01:17:25
[20/0] via 192.168.1.19, 01:17:25
[20/0] via 192.168.1.17, 01:17:25
[20/0] via 192.168.1.15, 01:17:25
[20/0] via 192.168.1.13, 01:17:25
[20/0] via 192.168.1.11, 01:17:25
[20/0] via 192.168.1.9, 01:17:25
[20/0] via 192.168.1.7, 01:17:25
[20/0] via 192.168.1.5, 01:17:25
[20/0] via 192.168.1.3, 01:17:25
[20/0] via 192.168.1.1, 01:17:25
B 10.0.0.4 [20/0] via 192.168.1.31, 01:10:51
[20/0] via 192.168.1.29, 01:10:51
[20/0] via 192.168.1.27, 01:10:51
[20/0] via 192.168.1.25, 01:10:51
[20/0] via 192.168.1.23, 01:10:51
[20/0] via 192.168.1.21, 01:10:51
[20/0] via 192.168.1.19, 01:10:51
[20/0] via 192.168.1.17, 01:10:51
[20/0] via 192.168.1.15, 01:10:51
[20/0] via 192.168.1.13, 01:10:51
[20/0] via 192.168.1.11, 01:10:51
[20/0] via 192.168.1.9, 01:10:51
[20/0] via 192.168.1.7, 01:10:51
[20/0] via 192.168.1.5, 01:10:51
[20/0] via 192.168.1.3, 01:10:51
[20/0] via 192.168.1.1, 01:10:51
B 10.0.0.101 [20/0] via 192.168.1.1, 01:20:39
B 10.0.0.102 [20/0] via 192.168.1.3, 01:20:39
B 10.0.0.103 [20/0] via 192.168.1.5, 01:20:39
B 10.0.0.104 [20/0] via 192.168.1.7, 01:20:39
B 10.0.0.105 [20/0] via 192.168.1.9, 01:20:39
B 10.0.0.106 [20/0] via 192.168.1.11, 01:20:39
B 10.0.0.107 [20/0] via 192.168.1.13, 01:20:39
B 10.0.0.108 [20/0] via 192.168.1.15, 01:20:39
B 10.0.0.109 [20/0] via 192.168.1.17, 01:20:39
B 10.0.0.110 [20/0] via 192.168.1.19, 01:20:39
B 10.0.0.111 [20/0] via 192.168.1.21, 01:20:39
B 10.0.0.112 [20/0] via 192.168.1.23, 01:20:39
B 10.0.0.113 [20/0] via 192.168.1.25, 01:20:39
B 10.0.0.114 [20/0] via 192.168.1.27, 01:20:39
B 10.0.0.115 [20/0] via 192.168.1.29, 01:20:39
B 10.0.0.116 [20/0] via 192.168.1.31, 01:20:39
80.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
C 80.0.0.0/30 is directly connected, FastEthernet0/0
L 80.0.0.2/32 is directly connected, FastEthernet0/0
C 80.0.1.0/30 is directly connected, FastEthernet0/1
L 80.0.1.2/32 is directly connected, FastEthernet0/1
100.0.0.0/32 is subnetted, 2 subnets
B 100.0.0.1 [20/0] via 80.0.0.1, 01:11:07
B 100.0.0.2 [20/0] via 80.0.1.1, 01:10:36
172.16.0.0/24 is subnetted, 16 subnets
B 172.16.101.0 [20/0] via 192.168.1.1, 01:20:39
B 172.16.102.0 [20/0] via 192.168.1.3, 01:20:39
B 172.16.103.0 [20/0] via 192.168.1.5, 01:20:39
B 172.16.104.0 [20/0] via 192.168.1.7, 01:20:39
B 172.16.105.0 [20/0] via 192.168.1.9, 01:20:39
B 172.16.106.0 [20/0] via 192.168.1.11, 01:20:39
B 172.16.107.0 [20/0] via 192.168.1.13, 01:20:39
B 172.16.108.0 [20/0] via 192.168.1.15, 01:20:39
B 172.16.109.0 [20/0] via 192.168.1.17, 01:20:39
B 172.16.110.0 [20/0] via 192.168.1.19, 01:20:39
B 172.16.111.0 [20/0] via 192.168.1.21, 01:20:39
B 172.16.112.0 [20/0] via 192.168.1.23, 01:20:39
B 172.16.113.0 [20/0] via 192.168.1.25, 01:20:39
B 172.16.114.0 [20/0] via 192.168.1.27, 01:20:39
B 172.16.115.0 [20/0] via 192.168.1.29, 01:20:39
B 172.16.116.0 [20/0] via 192.168.1.31, 01:20:39
192.168.1.0/24 is variably subnetted, 32 subnets, 2 masks
C 192.168.1.0/31 is directly connected, Ethernet1/0
L 192.168.1.0/32 is directly connected, Ethernet1/0
C 192.168.1.2/31 is directly connected, Ethernet1/1
L 192.168.1.2/32 is directly connected, Ethernet1/1
C 192.168.1.4/31 is directly connected, Ethernet1/2
L 192.168.1.4/32 is directly connected, Ethernet1/2
C 192.168.1.6/31 is directly connected, Ethernet1/3
L 192.168.1.6/32 is directly connected, Ethernet1/3
C 192.168.1.8/31 is directly connected, Ethernet1/4
L 192.168.1.8/32 is directly connected, Ethernet1/4
C 192.168.1.10/31 is directly connected, Ethernet1/5
L 192.168.1.10/32 is directly connected, Ethernet1/5
C 192.168.1.12/31 is directly connected, Ethernet1/6
L 192.168.1.12/32 is directly connected, Ethernet1/6
C 192.168.1.14/31 is directly connected, Ethernet1/7
L 192.168.1.14/32 is directly connected, Ethernet1/7
C 192.168.1.16/31 is directly connected, Ethernet2/0
L 192.168.1.16/32 is directly connected, Ethernet2/0
C 192.168.1.18/31 is directly connected, Ethernet2/1
L 192.168.1.18/32 is directly connected, Ethernet2/1
C 192.168.1.20/31 is directly connected, Ethernet2/2
L 192.168.1.20/32 is directly connected, Ethernet2/2
C 192.168.1.22/31 is directly connected, Ethernet2/3
L 192.168.1.22/32 is directly connected, Ethernet2/3
C 192.168.1.24/31 is directly connected, Ethernet2/4
L 192.168.1.24/32 is directly connected, Ethernet2/4
C 192.168.1.26/31 is directly connected, Ethernet2/5
L 192.168.1.26/32 is directly connected, Ethernet2/5
C 192.168.1.28/31 is directly connected, Ethernet2/6
L 192.168.1.28/32 is directly connected, Ethernet2/6
C 192.168.1.30/31 is directly connected, Ethernet2/7
L 192.168.1.30/32 is directly connected, Ethernet2/7
SPINE1#sh ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 101"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
80.0.0.1
80.0.1.1
192.168.1.1
192.168.1.3
192.168.1.5
192.168.1.7
192.168.1.9
192.168.1.11
192.168.1.13
192.168.1.15
192.168.1.17
192.168.1.19
192.168.1.21
192.168.1.23
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
192.168.1.25
192.168.1.27
192.168.1.29
192.168.1.31
Maximum path: 16
Routing Information Sources:
Gateway Distance Last Update
80.0.1.1 20 01:10:43
80.0.0.1 20 01:11:14
192.168.1.9 20 01:10:44
192.168.1.11 20 01:16:41
192.168.1.13 20 01:10:44
192.168.1.15 20 01:12:16
192.168.1.1 20 01:16:41
192.168.1.3 20 01:16:41
192.168.1.5 20 01:10:44
192.168.1.7 20 01:16:41
192.168.1.25 20 01:10:59
192.168.1.27 20 01:10:44
192.168.1.29 20 01:10:44
192.168.1.31 20 01:10:44
192.168.1.17 20 01:10:45
Gateway Distance Last Update
192.168.1.19 20 01:11:46
192.168.1.21 20 01:10:45
192.168.1.23 20 01:11:15
Distance: external 20 internal 200 local 200
SPINE1#ping 172.16.116.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.116.1, timeout is 2 seconds:
Packet sent with a source address of 10.0.0.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/28/36 ms
SPINE1#traceroute 172.16.101.1 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.101.1
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.1.1 16 msec
2 172.16.101.1 [AS 201] 4 msec
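From the spine the leaf subnet is a single eBGP hop away, so the traceroute completes in two hops and shows no load-sharing: there is exactly one link between SPINE1 and LEAF1.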
Spine 4
SPINE4#sh run
Building configuration…
Current configuration : 3864 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname SPINE4
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$Dq8L$mxQZkGpouCz7WL7uQhyJX.
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
ip address 10.0.0.4 255.255.255.255
!
interface FastEthernet0/0
description –TO WAN BR1 Fa1/1 –
ip address 80.0.0.14 255.255.255.252
speed auto
duplex auto
!
interface FastEthernet0/1
description –TO WAN BR2 Fa0/1 –
ip address 80.0.1.14 255.255.255.252
speed auto
duplex auto
!
interface Ethernet1/0
description – TO LEAF1 eth1/3 –
ip address 192.168.4.0 255.255.255.254
duplex full
!
interface Ethernet1/1
description – TO LEAF2 eth1/3 –
ip address 192.168.4.2 255.255.255.254
duplex full
!
interface Ethernet1/2
description – TO LEAF3 eth1/3 –
ip address 192.168.4.4 255.255.255.254
duplex full
!
interface Ethernet1/3
description – TO LEAF4 eth1/3 –
ip address 192.168.4.6 255.255.255.254
duplex full
!
interface Ethernet1/4
description – TO LEAF5 eth1/3 –
ip address 192.168.4.8 255.255.255.254
duplex full
!
interface Ethernet1/5
description – TO LEAF6 eth1/3 –
ip address 192.168.4.10 255.255.255.254
duplex full
!
interface Ethernet1/6
description – TO LEAF7 eth1/3 –
ip address 192.168.4.12 255.255.255.254
duplex full
!
interface Ethernet1/7
description – TO LEAF8 eth1/3 –
ip address 192.168.4.14 255.255.255.254
duplex full
!
interface Ethernet2/0
description – TO LEAF9 eth1/3 –
ip address 192.168.4.16 255.255.255.254
duplex full
!
interface Ethernet2/1
description – TO LEAF10 eth1/3 –
ip address 192.168.4.18 255.255.255.254
duplex full
!
interface Ethernet2/2
description – TO LEAF11 eth1/3 –
ip address 192.168.4.20 255.255.255.254
duplex full
!
interface Ethernet2/3
description – TO LEAF12 eth1/3 –
ip address 192.168.4.22 255.255.255.254
duplex full
!
interface Ethernet2/4
description – TO LEAF13 eth1/3 –
ip address 192.168.4.24 255.255.255.254
duplex full
!
interface Ethernet2/5
description – TO LEAF14 eth1/3 –
ip address 192.168.4.26 255.255.255.254
duplex full
!
interface Ethernet2/6
description – TO LEAF15 eth1/3 –
ip address 192.168.4.28 255.255.255.254
duplex full
!
interface Ethernet2/7
description – TO LEAF16 eth1/3 –
ip address 192.168.4.30 255.255.255.254
duplex full
!
router bgp 104
bgp router-id 10.0.0.4
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 0.0.0.0
network 10.0.0.4 mask 255.255.255.255
neighbor 80.0.0.13 remote-as 11
neighbor 80.0.1.13 remote-as 12
neighbor 192.168.4.1 remote-as 201
neighbor 192.168.4.3 remote-as 202
neighbor 192.168.4.5 remote-as 203
neighbor 192.168.4.7 remote-as 204
neighbor 192.168.4.9 remote-as 205
neighbor 192.168.4.11 remote-as 206
neighbor 192.168.4.13 remote-as 207
neighbor 192.168.4.15 remote-as 208
neighbor 192.168.4.17 remote-as 209
neighbor 192.168.4.19 remote-as 210
neighbor 192.168.4.21 remote-as 211
neighbor 192.168.4.23 remote-as 212
neighbor 192.168.4.25 remote-as 213
neighbor 192.168.4.27 remote-as 214
neighbor 192.168.4.29 remote-as 215
neighbor 192.168.4.31 remote-as 216
maximum-paths 16
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
ip route 0.0.0.0 0.0.0.0 80.0.0.13 name TO_BORDER_ROUTER
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
SPINE4#sh ip route
Gateway of last resort is 80.0.0.13 to network 0.0.0.0
S* 0.0.0.0/0 [1/0] via 80.0.0.13
8.0.0.0/32 is subnetted, 2 subnets
B 8.8.4.4 [20/0] via 80.0.1.13, 01:13:25
B 8.8.8.8 [20/0] via 80.0.0.13, 01:13:56
10.0.0.0/32 is subnetted, 20 subnets
B 10.0.0.1 [20/0] via 192.168.4.31, 01:20:14
[20/0] via 192.168.4.29, 01:20:14
[20/0] via 192.168.4.27, 01:20:14
[20/0] via 192.168.4.25, 01:20:14
[20/0] via 192.168.4.23, 01:20:14
[20/0] via 192.168.4.21, 01:20:14
[20/0] via 192.168.4.19, 01:20:14
[20/0] via 192.168.4.17, 01:20:14
[20/0] via 192.168.4.15, 01:20:14
[20/0] via 192.168.4.13, 01:20:14
[20/0] via 192.168.4.11, 01:20:14
[20/0] via 192.168.4.9, 01:20:14
[20/0] via 192.168.4.7, 01:20:14
[20/0] via 192.168.4.5, 01:20:14
[20/0] via 192.168.4.3, 01:20:14
[20/0] via 192.168.4.1, 01:20:14
B 10.0.0.2 [20/0] via 192.168.4.31, 01:20:14
[20/0] via 192.168.4.29, 01:20:14
[20/0] via 192.168.4.27, 01:20:14
[20/0] via 192.168.4.25, 01:20:14
[20/0] via 192.168.4.23, 01:20:14
[20/0] via 192.168.4.21, 01:20:14
[20/0] via 192.168.4.19, 01:20:14
[20/0] via 192.168.4.17, 01:20:14
[20/0] via 192.168.4.15, 01:20:14
[20/0] via 192.168.4.13, 01:20:14
[20/0] via 192.168.4.11, 01:20:14
[20/0] via 192.168.4.9, 01:20:14
[20/0] via 192.168.4.7, 01:20:14
[20/0] via 192.168.4.5, 01:20:14
[20/0] via 192.168.4.3, 01:20:14
[20/0] via 192.168.4.1, 01:20:14
B 10.0.0.3 [20/0] via 192.168.4.31, 01:20:14
[20/0] via 192.168.4.29, 01:20:14
[20/0] via 192.168.4.27, 01:20:14
[20/0] via 192.168.4.25, 01:20:14
[20/0] via 192.168.4.23, 01:20:14
[20/0] via 192.168.4.21, 01:20:14
[20/0] via 192.168.4.19, 01:20:14
[20/0] via 192.168.4.17, 01:20:14
[20/0] via 192.168.4.15, 01:20:14
[20/0] via 192.168.4.13, 01:20:14
[20/0] via 192.168.4.11, 01:20:14
[20/0] via 192.168.4.9, 01:20:14
[20/0] via 192.168.4.7, 01:20:14
[20/0] via 192.168.4.5, 01:20:14
[20/0] via 192.168.4.3, 01:20:14
[20/0] via 192.168.4.1, 01:20:14
C 10.0.0.4 is directly connected, Loopback0
B 10.0.0.101 [20/0] via 192.168.4.1, 01:20:55
B 10.0.0.102 [20/0] via 192.168.4.3, 01:20:55
B 10.0.0.103 [20/0] via 192.168.4.5, 01:20:55
B 10.0.0.104 [20/0] via 192.168.4.7, 01:20:55
B 10.0.0.105 [20/0] via 192.168.4.9, 01:20:55
B 10.0.0.106 [20/0] via 192.168.4.11, 01:20:55
B 10.0.0.107 [20/0] via 192.168.4.13, 01:20:55
B 10.0.0.108 [20/0] via 192.168.4.15, 01:20:55
B 10.0.0.109 [20/0] via 192.168.4.17, 01:20:55
B 10.0.0.110 [20/0] via 192.168.4.19, 01:20:55
B 10.0.0.111 [20/0] via 192.168.4.21, 01:20:55
B 10.0.0.112 [20/0] via 192.168.4.23, 01:20:55
B 10.0.0.113 [20/0] via 192.168.4.25, 01:20:55
B 10.0.0.114 [20/0] via 192.168.4.27, 01:20:55
B 10.0.0.115 [20/0] via 192.168.4.29, 01:20:55
B 10.0.0.116 [20/0] via 192.168.4.31, 01:20:55
80.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
C 80.0.0.12/30 is directly connected, FastEthernet0/0
L 80.0.0.14/32 is directly connected, FastEthernet0/0
C 80.0.1.12/30 is directly connected, FastEthernet0/1
L 80.0.1.14/32 is directly connected, FastEthernet0/1
100.0.0.0/32 is subnetted, 2 subnets
B 100.0.0.1 [20/0] via 80.0.0.13, 01:13:56
B 100.0.0.2 [20/0] via 80.0.1.13, 01:13:25
172.16.0.0/24 is subnetted, 16 subnets
B 172.16.101.0 [20/0] via 192.168.4.1, 01:20:55
B 172.16.102.0 [20/0] via 192.168.4.3, 01:20:55
B 172.16.103.0 [20/0] via 192.168.4.5, 01:20:55
B 172.16.104.0 [20/0] via 192.168.4.7, 01:20:55
B 172.16.105.0 [20/0] via 192.168.4.9, 01:20:55
B 172.16.106.0 [20/0] via 192.168.4.11, 01:20:55
B 172.16.107.0 [20/0] via 192.168.4.13, 01:20:55
B 172.16.108.0 [20/0] via 192.168.4.15, 01:20:55
B 172.16.109.0 [20/0] via 192.168.4.17, 01:20:55
B 172.16.110.0 [20/0] via 192.168.4.19, 01:20:55
B 172.16.111.0 [20/0] via 192.168.4.21, 01:20:55
B 172.16.112.0 [20/0] via 192.168.4.23, 01:20:55
B 172.16.113.0 [20/0] via 192.168.4.25, 01:20:55
B 172.16.114.0 [20/0] via 192.168.4.27, 01:20:55
B 172.16.115.0 [20/0] via 192.168.4.29, 01:20:55
B 172.16.116.0 [20/0] via 192.168.4.31, 01:20:55
192.168.4.0/24 is variably subnetted, 32 subnets, 2 masks
C 192.168.4.0/31 is directly connected, Ethernet1/0
L 192.168.4.0/32 is directly connected, Ethernet1/0
C 192.168.4.2/31 is directly connected, Ethernet1/1
L 192.168.4.2/32 is directly connected, Ethernet1/1
C 192.168.4.4/31 is directly connected, Ethernet1/2
L 192.168.4.4/32 is directly connected, Ethernet1/2
C 192.168.4.6/31 is directly connected, Ethernet1/3
L 192.168.4.6/32 is directly connected, Ethernet1/3
C 192.168.4.8/31 is directly connected, Ethernet1/4
L 192.168.4.8/32 is directly connected, Ethernet1/4
C 192.168.4.10/31 is directly connected, Ethernet1/5
L 192.168.4.10/32 is directly connected, Ethernet1/5
C 192.168.4.12/31 is directly connected, Ethernet1/6
L 192.168.4.12/32 is directly connected, Ethernet1/6
C 192.168.4.14/31 is directly connected, Ethernet1/7
L 192.168.4.14/32 is directly connected, Ethernet1/7
C 192.168.4.16/31 is directly connected, Ethernet2/0
L 192.168.4.16/32 is directly connected, Ethernet2/0
C 192.168.4.18/31 is directly connected, Ethernet2/1
L 192.168.4.18/32 is directly connected, Ethernet2/1
C 192.168.4.20/31 is directly connected, Ethernet2/2
L 192.168.4.20/32 is directly connected, Ethernet2/2
C 192.168.4.22/31 is directly connected, Ethernet2/3
L 192.168.4.22/32 is directly connected, Ethernet2/3
C 192.168.4.24/31 is directly connected, Ethernet2/4
L 192.168.4.24/32 is directly connected, Ethernet2/4
C 192.168.4.26/31 is directly connected, Ethernet2/5
L 192.168.4.26/32 is directly connected, Ethernet2/5
C 192.168.4.28/31 is directly connected, Ethernet2/6
L 192.168.4.28/32 is directly connected, Ethernet2/6
C 192.168.4.30/31 is directly connected, Ethernet2/7
L 192.168.4.30/32 is directly connected, Ethernet2/7
SPINE4# sh ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 104"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
80.0.0.13
80.0.1.13
192.168.4.1
192.168.4.3
192.168.4.5
192.168.4.7
192.168.4.9
192.168.4.11
192.168.4.13
192.168.4.15
192.168.4.17
192.168.4.19
192.168.4.21
192.168.4.23
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
192.168.4.25
192.168.4.27
192.168.4.29
192.168.4.31
Maximum path: 16
Routing Information Sources:
Gateway Distance Last Update
80.0.1.13 20 01:13:33
80.0.0.13 20 01:13:34
192.168.4.13 20 01:20:33
192.168.4.15 20 01:15:06
192.168.4.9 20 01:20:33
192.168.4.11 20 01:20:33
192.168.4.5 20 01:20:33
192.168.4.7 20 01:20:22
192.168.4.1 20 01:14:05
192.168.4.3 20 01:20:33
192.168.4.29 20 01:20:33
192.168.4.31 20 01:20:33
192.168.4.25 20 01:13:34
192.168.4.27 20 01:20:33
192.168.4.21 20 01:20:33
Gateway Distance Last Update
192.168.4.23 20 01:20:34
192.168.4.17 20 01:20:34
192.168.4.19 20 01:14:36
Distance: external 20 internal 200 local 200
SPINE4#ping 172.16.116.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.116.1, timeout is 2 seconds:
Packet sent with a source address of 10.0.0.4
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/16/32 ms
SPINE4#traceroute 172.16.101.1 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.101.1
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.4.1 12 msec
2 172.16.101.1 [AS 201] 12 msec
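SPINE4 mirrors SPINE1 in every respect except its AS number (104), its loopback (10.0.0.4) and its fabric subnet (192.168.4.0/24). This per-device regularity is deliberate: a spine configuration can be generated from a template essentially by substituting three values.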
Leaf 1
LEAF1#sh running-config
Building configuration…
Current configuration : 2800 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname LEAF1
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$I6/S$zjbK4c1F/hoKwm7bIoaB8.
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
ip dhcp excluded-address 172.16.101.254
!
ip dhcp pool SERVERS
network 172.16.101.0 255.255.255.0
default-router 172.16.101.254
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
ip address 10.0.0.101 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
speed auto
duplex auto
!
interface FastEthernet0/1
no ip address
shutdown
speed auto
duplex auto
!
interface Ethernet1/0
description – TO SPINE 1 eth1/0 –
ip address 192.168.1.1 255.255.255.254
duplex full
!
interface Ethernet1/1
description – TO SPINE2 eth1/0 –
ip address 192.168.2.1 255.255.255.254
duplex full
!
interface Ethernet1/2
description – TO SPINE3 eth1/0 –
ip address 192.168.3.1 255.255.255.254
duplex full
!
interface Ethernet1/3
description – TO SPINE4 eth1/0 –
ip address 192.168.4.1 255.255.255.254
duplex full
!
interface Ethernet1/4
no ip address
shutdown
duplex full
!
interface Ethernet1/5
no ip address
shutdown
duplex full
!
interface Ethernet1/6
no ip address
shutdown
duplex full
!
interface Ethernet1/7
no ip address
shutdown
duplex full
!
interface Ethernet2/0
no ip address
shutdown
duplex full
!
interface Ethernet2/1
no ip address
shutdown
duplex full
!
interface Ethernet2/2
no ip address
shutdown
duplex full
!
interface Ethernet2/3
no ip address
shutdown
duplex full
!
interface Ethernet2/4
no ip address
shutdown
duplex full
!
interface Ethernet2/5
no ip address
shutdown
duplex full
!
interface Ethernet2/6
no ip address
shutdown
duplex full
!
interface Ethernet2/7
description – SERVERS_ACCESS –
ip address 172.16.101.254 255.255.255.0
duplex full
!
router bgp 201
bgp router-id 10.0.0.101
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 10.0.0.101 mask 255.255.255.255
network 172.16.101.0 mask 255.255.255.0
neighbor 192.168.1.0 remote-as 101
neighbor 192.168.2.0 remote-as 102
neighbor 192.168.3.0 remote-as 103
neighbor 192.168.4.0 remote-as 104
maximum-paths 4
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
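Each leaf is a self-contained building block: it runs in its own AS (201 for LEAF1), advertises exactly two prefixes, its loopback /32 and its server /24, and acts as both default gateway and DHCP server for the locally attached servers. The DHCP fragment is repeated below with annotations:
! keep the gateway address out of the dynamic pool
ip dhcp excluded-address 172.16.101.254
!
ip dhcp pool SERVERS
 network 172.16.101.0 255.255.255.0
 ! Ethernet2/7, the SERVERS_ACCESS interface, is the servers' gateway
 default-router 172.16.101.254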
LEAF1#sh ip route
Gateway of last resort is 192.168.4.0 to network 0.0.0.0
B* 0.0.0.0/0 [20/0] via 192.168.4.0, 01:21:15
[20/0] via 192.168.3.0, 01:21:15
[20/0] via 192.168.2.0, 01:21:15
[20/0] via 192.168.1.0, 01:21:15
8.0.0.0/32 is subnetted, 2 subnets
B 8.8.4.4 [20/0] via 192.168.4.0, 01:14:17
[20/0] via 192.168.3.0, 01:14:17
[20/0] via 192.168.2.0, 01:14:17
[20/0] via 192.168.1.0, 01:14:17
B 8.8.8.8 [20/0] via 192.168.4.0, 01:14:48
[20/0] via 192.168.3.0, 01:14:48
[20/0] via 192.168.2.0, 01:14:48
[20/0] via 192.168.1.0, 01:14:48
10.0.0.0/32 is subnetted, 20 subnets
B 10.0.0.1 [20/0] via 192.168.1.0, 01:24:50
B 10.0.0.2 [20/0] via 192.168.2.0, 01:23:18
B 10.0.0.3 [20/0] via 192.168.3.0, 01:22:17
B 10.0.0.4 [20/0] via 192.168.4.0, 01:21:15
C 10.0.0.101 is directly connected, Loopback0
B 10.0.0.102 [20/0] via 192.168.4.0, 01:22:16
[20/0] via 192.168.3.0, 01:22:16
[20/0] via 192.168.2.0, 01:22:16
[20/0] via 192.168.1.0, 01:22:16
B 10.0.0.103 [20/0] via 192.168.4.0, 01:22:16
[20/0] via 192.168.3.0, 01:22:16
[20/0] via 192.168.2.0, 01:22:16
[20/0] via 192.168.1.0, 01:22:16
B 10.0.0.104 [20/0] via 192.168.4.0, 01:22:16
[20/0] via 192.168.3.0, 01:22:16
[20/0] via 192.168.2.0, 01:22:16
[20/0] via 192.168.1.0, 01:22:16
B 10.0.0.105 [20/0] via 192.168.4.0, 01:22:16
[20/0] via 192.168.3.0, 01:22:16
[20/0] via 192.168.2.0, 01:22:16
[20/0] via 192.168.1.0, 01:22:16
B 10.0.0.106 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.107 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.108 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.109 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.110 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.111 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.112 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.113 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.114 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.115 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
B 10.0.0.116 [20/0] via 192.168.4.0, 01:22:17
[20/0] via 192.168.3.0, 01:22:17
[20/0] via 192.168.2.0, 01:22:17
[20/0] via 192.168.1.0, 01:22:17
100.0.0.0/32 is subnetted, 2 subnets
B 100.0.0.1 [20/0] via 192.168.4.0, 01:14:49
[20/0] via 192.168.3.0, 01:14:49
[20/0] via 192.168.2.0, 01:14:50
[20/0] via 192.168.1.0, 01:14:50
B 100.0.0.2 [20/0] via 192.168.4.0, 01:14:19
[20/0] via 192.168.3.0, 01:14:19
[20/0] via 192.168.2.0, 01:14:19
[20/0] via 192.168.1.0, 01:14:19
172.16.0.0/16 is variably subnetted, 17 subnets, 2 masks
C 172.16.101.0/24 is directly connected, Ethernet2/7
L 172.16.101.254/32 is directly connected, Ethernet2/7
B 172.16.102.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.103.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.104.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.105.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.106.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.107.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.108.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.109.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.110.0/24 [20/0] via 192.168.4.0, 01:22:18
[20/0] via 192.168.3.0, 01:22:18
[20/0] via 192.168.2.0, 01:22:18
[20/0] via 192.168.1.0, 01:22:18
B 172.16.111.0/24 [20/0] via 192.168.4.0, 01:22:19
[20/0] via 192.168.3.0, 01:22:19
[20/0] via 192.168.2.0, 01:22:19
[20/0] via 192.168.1.0, 01:22:19
B 172.16.112.0/24 [20/0] via 192.168.4.0, 01:22:19
[20/0] via 192.168.3.0, 01:22:19
[20/0] via 192.168.2.0, 01:22:19
[20/0] via 192.168.1.0, 01:22:19
B 172.16.113.0/24 [20/0] via 192.168.4.0, 01:22:19
[20/0] via 192.168.3.0, 01:22:19
[20/0] via 192.168.2.0, 01:22:19
[20/0] via 192.168.1.0, 01:22:19
B 172.16.114.0/24 [20/0] via 192.168.4.0, 01:22:19
[20/0] via 192.168.3.0, 01:22:19
[20/0] via 192.168.2.0, 01:22:19
[20/0] via 192.168.1.0, 01:22:19
B 172.16.115.0/24 [20/0] via 192.168.4.0, 01:22:19
[20/0] via 192.168.3.0, 01:22:19
[20/0] via 192.168.2.0, 01:22:19
[20/0] via 192.168.1.0, 01:22:19
B 172.16.116.0/24 [20/0] via 192.168.4.0, 01:22:19
[20/0] via 192.168.3.0, 01:22:19
[20/0] via 192.168.2.0, 01:22:19
[20/0] via 192.168.1.0, 01:22:19
192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.1.0/31 is directly connected, Ethernet1/0
L 192.168.1.1/32 is directly connected, Ethernet1/0
192.168.2.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.2.0/31 is directly connected, Ethernet1/1
L 192.168.2.1/32 is directly connected, Ethernet1/1
192.168.3.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.3.0/31 is directly connected, Ethernet1/2
L 192.168.3.1/32 is directly connected, Ethernet1/2
192.168.4.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.4.0/31 is directly connected, Ethernet1/3
L 192.168.4.1/32 is directly connected, Ethernet1/3
LEAF1#sh ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 201"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
192.168.1.0
192.168.2.0
192.168.3.0
192.168.4.0
Maximum path: 4
Routing Information Sources:
Gateway Distance Last Update
192.168.1.0 20 01:14:28
192.168.2.0 20 01:14:28
192.168.3.0 20 01:14:28
192.168.4.0 20 01:14:27
Distance: external 20 internal 200 local 200
LEAF1#ping 172.16.116.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.116.1, timeout is 2 seconds:
Packet sent with a source address of 10.0.0.101
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/32/40 ms
LEAF1#traceroute 8.8.8.8 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 8.8.8.8
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.1.0 [AS 101] 8 msec
192.168.2.0 [AS 101] 12 msec
192.168.3.0 [AS 101] 8 msec
192.168.4.0 [AS 101] 12 msec
192.168.1.0 [AS 101] 8 msec
192.168.2.0 [AS 101] 4 msec
192.168.3.0 [AS 101] 8 msec
192.168.4.0 [AS 101] 12 msec
192.168.1.0 [AS 101] 8 msec
192.168.2.0 [AS 101] 4 msec
192.168.3.0 [AS 101] 8 msec
192.168.4.0 [AS 101] 12 msec
2 80.0.0.1 [AS 101] 4 msec
80.0.0.5 [AS 101] 28 msec
80.0.0.9 [AS 101] 24 msec
80.0.0.13 [AS 101] 4 msec
80.0.0.1 [AS 101] 28 msec
80.0.0.5 [AS 101] 20 msec
80.0.0.9 [AS 101] 12 msec
80.0.0.13 [AS 101] 24 msec
80.0.0.1 [AS 101] 12 msec
80.0.0.5 [AS 101] 12 msec
80.0.0.9 [AS 101] 16 msec
80.0.0.13 [AS 101] 20 msec
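At hop 1 the probes rotate across all four spines and at hop 2 across the four spine-to-border links (80.0.0.1, .5, .9 and .13 are Border_Router1's interfaces), so the ECMP fan-out is preserved hop by hop on the way to the simulated Internet prefix 8.8.8.8.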
Leaf 6
LEAF6#sh run
Building configuration…
Current configuration : 2806 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname LEAF6
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$6jLz$6WDXFzaZi2jaaK2K4KVNx0
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
ip dhcp excluded-address 172.16.106.254
!
ip dhcp pool SERVERS
network 172.16.106.0 255.255.255.0
default-router 172.16.106.254
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
ip address 10.0.0.106 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
speed auto
duplex auto
!
interface FastEthernet0/1
no ip address
shutdown
speed auto
duplex auto
!
interface Ethernet1/0
description – TO SPINE1 eth1/5 –
ip address 192.168.1.11 255.255.255.254
duplex full
!
interface Ethernet1/1
description – TO SPINE2 eth1/5 –
ip address 192.168.2.11 255.255.255.254
duplex full
!
interface Ethernet1/2
description – TO SPINE3 eth1/5 –
ip address 192.168.3.11 255.255.255.254
duplex full
!
interface Ethernet1/3
description – TO SPINE4 eth1/5 –
ip address 192.168.4.11 255.255.255.254
duplex full
!
interface Ethernet1/4
no ip address
shutdown
duplex full
!
interface Ethernet1/5
no ip address
shutdown
duplex full
!
interface Ethernet1/6
no ip address
shutdown
duplex full
!
interface Ethernet1/7
no ip address
shutdown
duplex full
!
interface Ethernet2/0
no ip address
shutdown
duplex full
!
interface Ethernet2/1
no ip address
shutdown
duplex full
!
interface Ethernet2/2
no ip address
shutdown
duplex full
!
interface Ethernet2/3
no ip address
shutdown
duplex full
!
interface Ethernet2/4
no ip address
shutdown
duplex full
!
interface Ethernet2/5
no ip address
shutdown
duplex full
!
interface Ethernet2/6
no ip address
shutdown
duplex full
!
interface Ethernet2/7
description – SERVERS_ACCESS –
ip address 172.16.106.254 255.255.255.0
duplex full
!
router bgp 206
bgp router-id 10.0.0.106
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 10.0.0.106 mask 255.255.255.255
network 172.16.106.0 mask 255.255.255.0
neighbor 192.168.1.10 remote-as 101
neighbor 192.168.2.10 remote-as 102
neighbor 192.168.3.10 remote-as 103
neighbor 192.168.4.10 remote-as 104
maximum-paths 4
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
LEAF6#sh ip route
Gateway of last resort is 192.168.4.10 to network 0.0.0.0
B* 0.0.0.0/0 [20/0] via 192.168.4.10, 01:25:14
[20/0] via 192.168.3.10, 01:25:14
[20/0] via 192.168.2.10, 01:25:14
[20/0] via 192.168.1.10, 01:25:14
8.0.0.0/32 is subnetted, 2 subnets
B 8.8.4.4 [20/0] via 192.168.4.10, 01:18:16
[20/0] via 192.168.3.10, 01:18:16
[20/0] via 192.168.2.10, 01:18:16
[20/0] via 192.168.1.10, 01:18:16
B 8.8.8.8 [20/0] via 192.168.4.10, 01:18:47
[20/0] via 192.168.3.10, 01:18:47
[20/0] via 192.168.2.10, 01:18:47
[20/0] via 192.168.1.10, 01:18:47
10.0.0.0/32 is subnetted, 20 subnets
B 10.0.0.1 [20/0] via 192.168.1.10, 01:28:49
B 10.0.0.2 [20/0] via 192.168.2.10, 01:27:17
B 10.0.0.3 [20/0] via 192.168.3.10, 01:26:16
B 10.0.0.4 [20/0] via 192.168.4.10, 01:25:14
B 10.0.0.101 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.102 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.103 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.104 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.105 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
C 10.0.0.106 is directly connected, Loopback0
B 10.0.0.107 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.108 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.109 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.110 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.111 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.112 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.113 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.114 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.115 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 10.0.0.116 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
100.0.0.0/32 is subnetted, 2 subnets
B 100.0.0.1 [20/0] via 192.168.4.10, 01:18:47
[20/0] via 192.168.3.10, 01:18:47
[20/0] via 192.168.2.10, 01:18:47
[20/0] via 192.168.1.10, 01:18:47
B 100.0.0.2 [20/0] via 192.168.4.10, 01:18:16
[20/0] via 192.168.3.10, 01:18:16
[20/0] via 192.168.2.10, 01:18:16
[20/0] via 192.168.1.10, 01:18:16
172.16.0.0/16 is variably subnetted, 17 subnets, 2 masks
B 172.16.101.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.102.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.103.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.104.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.105.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
C 172.16.106.0/24 is directly connected, Ethernet2/7
L 172.16.106.254/32 is directly connected, Ethernet2/7
B 172.16.107.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.108.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.109.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.110.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.111.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.112.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.113.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.114.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.115.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
B 172.16.116.0/24 [20/0] via 192.168.4.10, 01:26:15
[20/0] via 192.168.3.10, 01:26:15
[20/0] via 192.168.2.10, 01:26:15
[20/0] via 192.168.1.10, 01:26:15
192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.1.10/31 is directly connected, Ethernet1/0
L 192.168.1.11/32 is directly connected, Ethernet1/0
192.168.2.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.2.10/31 is directly connected, Ethernet1/1
L 192.168.2.11/32 is directly connected, Ethernet1/1
192.168.3.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.3.10/31 is directly connected, Ethernet1/2
L 192.168.3.11/32 is directly connected, Ethernet1/2
192.168.4.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.4.10/31 is directly connected, Ethernet1/3
L 192.168.4.11/32 is directly connected, Ethernet1/3
LEAF6#sh ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 206"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
192.168.1.10
192.168.2.10
192.168.3.10
192.168.4.10
Maximum path: 4
Routing Information Sources:
Gateway Distance Last Update
192.168.2.10 20 01:18:58
192.168.3.10 20 01:18:28
192.168.1.10 20 01:18:27
192.168.4.10 20 01:18:58
Distance: external 20 internal 200 local 200
LEAF6# ping 172.16.116.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.116.1, timeout is 2 seconds:
Packet sent with a source address of 10.0.0.106
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/25/28 ms
LEAF6#traceroute 172.16.101.1 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 172.16.101.1
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.1.10 [AS 101] 4 msec
192.168.2.10 [AS 101] 12 msec
192.168.3.10 [AS 101] 8 msec
192.168.4.10 [AS 101] 4 msec
192.168.1.10 [AS 101] 4 msec
192.168.2.10 [AS 101] 12 msec
192.168.3.10 [AS 101] 4 msec
192.168.4.10 [AS 101] 16 msec
192.168.1.10 [AS 101] 12 msec
192.168.2.10 [AS 101] 8 msec
192.168.3.10 [AS 101] 12 msec
192.168.4.10 [AS 101] 12 msec
2 192.168.1.1 [AS 101] 0 msec
192.168.2.1 [AS 101] 8 msec
192.168.3.1 [AS 101] 32 msec
192.168.4.1 [AS 101] 28 msec
192.168.1.1 [AS 101] 28 msec
192.168.2.1 [AS 101] 4 msec
192.168.3.1 [AS 101] 32 msec
192.168.4.1 [AS 101] 28 msec
192.168.1.1 [AS 101] 28 msec
192.168.2.1 [AS 101] 4 msec
192.168.3.1 [AS 101] 32 msec
192.168.4.1 [AS 101] 24 msec
3 172.16.101.1 [AS 201] 28 msec
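East-west traffic between two leaves therefore takes three hops, leaf to spine to destination leaf, with the probes again balanced across all four spines; the hop-2 addresses 192.168.1.1 through 192.168.4.1 are LEAF1's four fabric interfaces.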
Leaf 11
LEAF11#sh run
Building configuration…
Current configuration : 2806 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname LEAF11
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$vJs0$cS1fZSPTu3lOfMaX2n0SU/
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
ip dhcp excluded-address 172.16.111.254
!
ip dhcp pool SERVERS
network 172.16.111.0 255.255.255.0
default-router 172.16.111.254
!
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
ip address 10.0.0.111 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
speed auto
duplex auto
!
interface FastEthernet0/1
no ip address
shutdown
speed auto
duplex auto
!
interface Ethernet1/0
description – TO SPINE1 eth2/2 –
ip address 192.168.1.21 255.255.255.254
duplex full
!
interface Ethernet1/1
description – TO SPINE2 eth2/2 –
ip address 192.168.2.21 255.255.255.254
duplex full
!
interface Ethernet1/2
description – TO SPINE3 eth2/2 –
ip address 192.168.3.21 255.255.255.254
duplex full
!
interface Ethernet1/3
description – TO SPINE4 eth2/2 –
ip address 192.168.4.21 255.255.255.254
duplex full
!
interface Ethernet1/4
no ip address
shutdown
duplex full
!
interface Ethernet1/5
no ip address
shutdown
duplex full
!
interface Ethernet1/6
no ip address
shutdown
duplex full
!
interface Ethernet1/7
no ip address
shutdown
duplex full
!
interface Ethernet2/0
no ip address
shutdown
duplex full
!
interface Ethernet2/1
no ip address
shutdown
duplex full
!
interface Ethernet2/2
no ip address
shutdown
duplex full
!
interface Ethernet2/3
no ip address
shutdown
duplex full
!
interface Ethernet2/4
no ip address
shutdown
duplex full
!
interface Ethernet2/5
no ip address
shutdown
duplex full
!
interface Ethernet2/6
no ip address
shutdown
duplex full
!
interface Ethernet2/7
description – SERVERS_ACCESS –
ip address 172.16.111.254 255.255.255.0
duplex full
!
router bgp 211
bgp router-id 10.0.0.111
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 10.0.0.111 mask 255.255.255.255
network 172.16.111.0 mask 255.255.255.0
neighbor 192.168.1.20 remote-as 101
neighbor 192.168.2.20 remote-as 102
neighbor 192.168.3.20 remote-as 103
neighbor 192.168.4.20 remote-as 104
maximum-paths 4
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
LEAF11#sh ip route
Gateway of last resort is 192.168.4.20 to network 0.0.0.0
B* 0.0.0.0/0 [20/0] via 192.168.4.20, 01:28:24
[20/0] via 192.168.3.20, 01:28:24
[20/0] via 192.168.2.20, 01:28:24
[20/0] via 192.168.1.20, 01:28:24
8.0.0.0/32 is subnetted, 2 subnets
B 8.8.4.4 [20/0] via 192.168.4.20, 01:21:26
[20/0] via 192.168.3.20, 01:21:26
[20/0] via 192.168.2.20, 01:21:26
[20/0] via 192.168.1.20, 01:21:26
B 8.8.8.8 [20/0] via 192.168.4.20, 01:21:56
[20/0] via 192.168.3.20, 01:21:56
[20/0] via 192.168.2.20, 01:21:56
[20/0] via 192.168.1.20, 01:21:56
10.0.0.0/32 is subnetted, 20 subnets
B 10.0.0.1 [20/0] via 192.168.1.20, 01:31:59
B 10.0.0.2 [20/0] via 192.168.2.20, 01:30:27
B 10.0.0.3 [20/0] via 192.168.3.20, 01:29:25
B 10.0.0.4 [20/0] via 192.168.4.20, 01:28:24
B 10.0.0.101 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.102 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.103 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.104 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.105 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.106 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.107 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.108 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.109 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.110 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
C 10.0.0.111 is directly connected, Loopback0
B 10.0.0.112 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.113 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.114 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.115 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 10.0.0.116 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
100.0.0.0/32 is subnetted, 2 subnets
B 100.0.0.1 [20/0] via 192.168.4.20, 01:21:56
[20/0] via 192.168.3.20, 01:21:56
[20/0] via 192.168.2.20, 01:21:56
[20/0] via 192.168.1.20, 01:21:56
B 100.0.0.2 [20/0] via 192.168.4.20, 01:21:26
[20/0] via 192.168.3.20, 01:21:26
[20/0] via 192.168.2.20, 01:21:26
[20/0] via 192.168.1.20, 01:21:26
172.16.0.0/16 is variably subnetted, 17 subnets, 2 masks
B 172.16.101.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.102.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.103.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.104.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.105.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.106.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.107.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.108.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.109.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.110.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
C 172.16.111.0/24 is directly connected, Ethernet2/7
L 172.16.111.254/32 is directly connected, Ethernet2/7
B 172.16.112.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.113.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.114.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.115.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
B 172.16.116.0/24 [20/0] via 192.168.4.20, 01:29:25
[20/0] via 192.168.3.20, 01:29:25
[20/0] via 192.168.2.20, 01:29:25
[20/0] via 192.168.1.20, 01:29:25
192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.1.20/31 is directly connected, Ethernet1/0
L 192.168.1.21/32 is directly connected, Ethernet1/0
192.168.2.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.2.20/31 is directly connected, Ethernet1/1
L 192.168.2.21/32 is directly connected, Ethernet1/1
192.168.3.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.3.20/31 is directly connected, Ethernet1/2
L 192.168.3.21/32 is directly connected, Ethernet1/2
192.168.4.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.4.20/31 is directly connected, Ethernet1/3
L 192.168.4.21/32 is directly connected, Ethernet1/3
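Note how every remote prefix above is installed with four equal [20/0] entries, one per spine (the administrative distance 20 matches the "external 20" line in the show ip protocols output below). This is the combined effect of maximum-paths 4 and bgp bestpath as-path multipath-relax: without the relax knob, eBGP multipath requires candidate paths to carry an identical AS_PATH, which can never happen here because each spine is its own AS; with it, paths only need AS_PATHs of equal length. The only prefixes with a single next hop are the spine loopbacks 10.0.0.1-4, each reachable exclusively through its own spine. An illustrative comparison of the two rules follows (a sketch of the condition only, not Cisco's actual best-path code):

def multipath_eligible(best_aspath, candidate_aspath, relax):
    # Standard eBGP multipath: the AS_PATH must match exactly.
    # With as-path multipath-relax: only the AS_PATH length must match.
    if relax:
        return len(candidate_aspath) == len(best_aspath)
    return candidate_aspath == best_aspath

best = [101]                                       # path learned from SPINE1
candidate = [103]                                  # same length, different AS (SPINE3)
print(multipath_eligible(best, candidate, False))  # False -> only one path installed
print(multipath_eligible(best, candidate, True))   # True  -> 4-way ECMP as seen above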
LEAF11#sh ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 211"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
192.168.1.20
192.168.2.20
192.168.3.20
192.168.4.20
Maximum path: 4
Routing Information Sources:
Gateway Distance Last Update
192.168.4.20 20 01:22:09
192.168.1.20 20 01:21:38
192.168.2.20 20 01:22:09
192.168.3.20 20 01:21:38
Distance: external 20 internal 200 local 200
LEAF11#ping 172.16.116.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.116.1, timeout is 2 seconds:
Packet sent with a source address of 10.0.0.111
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/24/32 ms
LEAF11#traceroute 8.8.8.8 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 8.8.8.8
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.1.20 [AS 101] 8 msec
192.168.2.20 [AS 101] 12 msec
192.168.3.20 [AS 101] 12 msec
192.168.4.20 [AS 101] 16 msec
192.168.1.20 [AS 101] 16 msec
192.168.2.20 [AS 101] 4 msec
192.168.3.20 [AS 101] 12 msec
192.168.4.20 [AS 101] 16 msec
192.168.1.20 [AS 101] 12 msec
192.168.2.20 [AS 101] 20 msec
192.168.3.20 [AS 101] 12 msec
192.168.4.20 [AS 101] 16 msec
2 80.0.0.1 [AS 101] 16 msec
80.0.0.5 [AS 101] 24 msec
80.0.0.9 [AS 101] 16 msec
80.0.0.13 [AS 101] 24 msec
80.0.0.1 [AS 101] 24 msec
80.0.0.5 [AS 101] 32 msec
80.0.0.9 [AS 101] 16 msec
80.0.0.13 [AS 101] 20 msec
80.0.0.1 [AS 101] 32 msec
80.0.0.5 [AS 101] 28 msec
80.0.0.9 [AS 101] 8 msec
80.0.0.13 [AS 101] 28 msec
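With 12 probes per TTL, the traceroute answers rotate through all four spines at hop 1 and through the four border-facing /31 addresses (80.0.0.x) at hop 2. Each probe uses a different UDP destination port, so the ECMP flow hash maps consecutive probes onto different next hops while any single flow stays on one path. A toy illustration of that per-flow selection (real platforms use a vendor-specific CEF hash, not MD5):

import hashlib

def pick_nexthop(src, dst, sport, dport, nexthops):
    # Hash the flow identifiers and pick one member; same flow -> same next hop.
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    return nexthops[int(hashlib.md5(key).hexdigest(), 16) % len(nexthops)]

spines = ["192.168.1.20", "192.168.2.20", "192.168.3.20", "192.168.4.20"]
for probe in range(12):                       # traceroute increments the port per probe
    print(pick_nexthop("10.0.0.111", "8.8.8.8", 49152, 33434 + probe, spines))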
LEAF 16
LEAF16#sh running-config
Building configuration...
Current configuration : 2806 bytes
!
!
version 15.2
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname LEAF16
!
boot-start-marker
boot-end-marker
!
!
enable secret 5 $1$.Fts$1fjHKcAatNmQYjtWOUF2a0
!
no aaa new-model
no ip icmp rate-limit unreachable
ip cef
!
!
ip dhcp excluded-address 172.16.116.254
!
ip dhcp pool SERVERS
network 172.16.116.0 255.255.255.0
default-router 172.16.116.254
!
!
no ip domain lookup
no ipv6 cef
!
!
multilink bundle-name authenticated
!
!
ip tcp synwait-time 5
!
!
interface Loopback0
ip address 10.0.0.116 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
speed auto
duplex auto
!
interface FastEthernet0/1
no ip address
shutdown
speed auto
duplex auto
!
interface Ethernet1/0
description - TO SPINE1 eth2/7 -
ip address 192.168.1.31 255.255.255.254
duplex full
!
interface Ethernet1/1
description - TO SPINE2 eth2/7 -
ip address 192.168.2.31 255.255.255.254
duplex full
!
interface Ethernet1/2
description - TO SPINE3 eth2/7 -
ip address 192.168.3.31 255.255.255.254
duplex full
!
interface Ethernet1/3
description - TO SPINE4 eth2/7 -
ip address 192.168.4.31 255.255.255.254
duplex full
!
interface Ethernet1/4
no ip address
shutdown
duplex full
!
interface Ethernet1/5
no ip address
shutdown
duplex full
!
interface Ethernet1/6
no ip address
shutdown
duplex full
!
interface Ethernet1/7
no ip address
shutdown
duplex full
!
interface Ethernet2/0
no ip address
shutdown
duplex full
!
interface Ethernet2/1
no ip address
shutdown
duplex full
!
interface Ethernet2/2
no ip address
shutdown
duplex full
!
interface Ethernet2/3
no ip address
shutdown
duplex full
!
interface Ethernet2/4
no ip address
shutdown
duplex full
!
interface Ethernet2/5
no ip address
shutdown
duplex full
!
interface Ethernet2/6
no ip address
shutdown
duplex full
!
interface Ethernet2/7
description - SERVERS_ACCESS -
ip address 172.16.116.254 255.255.255.0
duplex full
!
router bgp 216
bgp router-id 10.0.0.116
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
network 10.0.0.116 mask 255.255.255.255
network 172.16.116.0 mask 255.255.255.0
neighbor 192.168.1.30 remote-as 101
neighbor 192.168.2.30 remote-as 102
neighbor 192.168.3.30 remote-as 103
neighbor 192.168.4.30 remote-as 104
maximum-paths 4
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
!
!
control-plane
!
!
line con 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line aux 0
exec-timeout 0 0
privilege level 15
logging synchronous
stopbits 1
line vty 0 4
password licenta
logging synchronous
login
line vty 5 15
password licenta
logging synchronous
login
!
!
end
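LEAF16 differs from LEAF11 only in the values derived from its index (AS 216, the .116 addresses, the .31 uplink addresses); the structure of the configuration is otherwise identical. That regularity is what makes this design straightforward to template. A minimal sketch that regenerates the BGP stanza from the leaf index alone (hypothetical tooling, not something used in the lab):

def bgp_stanza(n, spines=4):
    lines = [
        f"router bgp {200 + n}",
        f" bgp router-id 10.0.0.{100 + n}",
        " bgp log-neighbor-changes",
        " bgp bestpath as-path multipath-relax",
        f" network 10.0.0.{100 + n} mask 255.255.255.255",
        f" network 172.16.{100 + n}.0 mask 255.255.255.0",
    ]
    for s in range(1, spines + 1):             # one eBGP session per spine
        lines.append(f" neighbor 192.168.{s}.{2 * (n - 1)} remote-as {100 + s}")
    lines.append(" maximum-paths 4")
    return "\n".join(lines)

print(bgp_stanza(16))   # regenerates the router bgp 216 section above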
LEAF16#sh ip route
Gateway of last resort is 192.168.4.30 to network 0.0.0.0
B* 0.0.0.0/0 [20/0] via 192.168.4.30, 01:30:37
[20/0] via 192.168.3.30, 01:30:37
[20/0] via 192.168.2.30, 01:30:37
[20/0] via 192.168.1.30, 01:30:37
8.0.0.0/32 is subnetted, 2 subnets
B 8.8.4.4 [20/0] via 192.168.4.30, 01:23:40
[20/0] via 192.168.3.30, 01:23:40
[20/0] via 192.168.2.30, 01:23:40
[20/0] via 192.168.1.30, 01:23:40
B 8.8.8.8 [20/0] via 192.168.4.30, 01:24:10
[20/0] via 192.168.3.30, 01:24:10
[20/0] via 192.168.2.30, 01:24:10
[20/0] via 192.168.1.30, 01:24:10
10.0.0.0/32 is subnetted, 20 subnets
B 10.0.0.1 [20/0] via 192.168.1.30, 01:34:13
B 10.0.0.2 [20/0] via 192.168.2.30, 01:32:41
B 10.0.0.3 [20/0] via 192.168.3.30, 01:31:39
B 10.0.0.4 [20/0] via 192.168.4.30, 01:30:37
B 10.0.0.101 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.102 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.103 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.104 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.105 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.106 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.107 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.108 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.109 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.110 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.111 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.112 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.113 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.114 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 10.0.0.115 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
C 10.0.0.116 is directly connected, Loopback0
100.0.0.0/32 is subnetted, 2 subnets
B 100.0.0.1 [20/0] via 192.168.4.30, 01:24:10
[20/0] via 192.168.3.30, 01:24:10
[20/0] via 192.168.2.30, 01:24:10
[20/0] via 192.168.1.30, 01:24:10
B 100.0.0.2 [20/0] via 192.168.4.30, 01:23:40
[20/0] via 192.168.3.30, 01:23:40
[20/0] via 192.168.2.30, 01:23:40
[20/0] via 192.168.1.30, 01:23:40
172.16.0.0/16 is variably subnetted, 17 subnets, 2 masks
B 172.16.101.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.102.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.103.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.104.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.105.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.106.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.107.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.108.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.109.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.110.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.111.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.112.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.113.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.114.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
B 172.16.115.0/24 [20/0] via 192.168.4.30, 01:31:39
[20/0] via 192.168.3.30, 01:31:39
[20/0] via 192.168.2.30, 01:31:39
[20/0] via 192.168.1.30, 01:31:39
C 172.16.116.0/24 is directly connected, Ethernet2/7
L 172.16.116.254/32 is directly connected, Ethernet2/7
192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.1.30/31 is directly connected, Ethernet1/0
L 192.168.1.31/32 is directly connected, Ethernet1/0
192.168.2.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.2.30/31 is directly connected, Ethernet1/1
L 192.168.2.31/32 is directly connected, Ethernet1/1
192.168.3.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.3.30/31 is directly connected, Ethernet1/2
L 192.168.3.31/32 is directly connected, Ethernet1/2
192.168.4.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.4.30/31 is directly connected, Ethernet1/3
L 192.168.4.31/32 is directly connected, Ethernet1/3
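The size of this table is easy to account for: one eBGP default, twenty /32 loopbacks (4 spines + 16 leaves), sixteen server /24s plus the local /32 on Ethernet2/7, and two entries (connected /31 plus local /32) per uplink; the remaining four prefixes (8.8.4.4, 8.8.8.8, 100.0.0.1, 100.0.0.2) are the lab's external test destinations. A small sketch of how the per-leaf table scales with fabric size, under those counting assumptions:

def leaf_route_count(spines, leaves):
    default   = 1                  # eBGP default route from the borders
    loopbacks = spines + leaves    # one /32 per device
    servers   = leaves + 1         # one /24 per leaf + the local /32 interface route
    uplinks   = 2 * spines         # per uplink: connected /31 + local /32
    return default + loopbacks + servers + uplinks

print(leaf_route_count(4, 16))     # 46 -> the table above minus its 4 external test prefixes
print(leaf_route_count(16, 64))    # the same design at 16 spines and 64 leaves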
LEAF16#sh ip protocols
*** IP Routing is NSF aware ***
Routing Protocol is "bgp 216"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
IGP synchronization is disabled
Automatic route summarization is disabled
Neighbor(s):
Address FiltIn FiltOut DistIn DistOut Weight RouteMap
192.168.1.30
192.168.2.30
192.168.3.30
192.168.4.30
Maximum path: 4
Routing Information Sources:
Gateway Distance Last Update
192.168.4.30 20 01:24:19
192.168.2.30 20 01:24:18
192.168.3.30 20 01:23:48
192.168.1.30 20 01:23:48
Distance: external 20 internal 200 local 200
LEAF16#ping 172.16.101.1 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.101.1, timeout is 2 seconds:
Packet sent with a source address of 10.0.0.116
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 24/34/44 ms
LEAF16#traceroute 8.8.4.4 source loopback 0 probe 12
Type escape sequence to abort.
Tracing the route to 8.8.4.4
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.4.30 [AS 101] 12 msec
192.168.1.30 [AS 101] 8 msec
192.168.2.30 [AS 101] 8 msec
192.168.3.30 [AS 101] 4 msec
192.168.4.30 [AS 101] 12 msec
192.168.1.30 [AS 101] 8 msec
192.168.2.30 [AS 101] 8 msec
192.168.3.30 [AS 101] 8 msec
192.168.4.30 [AS 101] 12 msec
192.168.1.30 [AS 101] 8 msec
192.168.2.30 [AS 101] 12 msec
192.168.3.30 [AS 101] 0 msec
2 80.0.1.13 [AS 101] 12 msec
80.0.1.1 [AS 101] 16 msec
80.0.1.5 [AS 101] 24 msec
80.0.1.9 [AS 101] 16 msec
80.0.1.13 [AS 101] 16 msec
80.0.1.1 [AS 101] 16 msec
80.0.1.5 [AS 101] 16 msec
80.0.1.9 [AS 101] 8 msec
80.0.1.13 [AS 101] 16 msec
80.0.1.1 [AS 101] 20 msec
80.0.1.5 [AS 101] 16 msec
80.0.1.9 [AS 101] 28 msec
Annex 2.
Physical Lab Topology