While helping a friend, I recently stumbled upon an interesting issue with Administrative Distances that confused me for a bit. But when I took a step back and went through the route selection process step by step, it started making sense. The issue? With the default Cisco Administrative Distances, OSPF was winning over eBGP for a specific prefix.
First, let’s explain Administrative Distances (AD) a bit:
When a router adds a route to the Routing Information Base (RIB) and several protocols offer a route for the same prefix with the same prefix length, it needs to decide which one to install. It cannot compare the protocols’ metrics, because each protocol’s metric means something different. So routers use an Administrative Distance to break the tie. Cisco uses the following defaults:
- 20 for eBGP
- 110 for OSPF
- 200 for iBGP.
Lower is better, so you would think that having a valid route in both eBGP and OSPF would always result in eBGP winning and installing its route. However, this is not always the case.
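To make the tie-break concrete, here is a minimal Python sketch of the naive expectation: among protocols offering the same prefix, the one with the lowest AD gets installed. The names `DEFAULT_AD` and `pick_by_admin_distance` are made up for this illustration; this is a toy model, not how IOS implements it.

```python
# Default Cisco administrative distances quoted above (lower is preferred).
DEFAULT_AD = {"ebgp": 20, "ospf": 110, "ibgp": 200}


def pick_by_admin_distance(candidates, distances=DEFAULT_AD):
    """Return the protocol with the lowest administrative distance.

    `candidates` is a list of protocol names that all offer a route
    for the same prefix (hypothetical helper for illustration only).
    """
    return min(candidates, key=lambda proto: distances[proto])


print(pick_by_admin_distance(["ospf", "ebgp"]))  # ebgp
print(pick_by_admin_distance(["ospf", "ibgp"]))  # ospf
```

The second call already hints at the catch: if BGP hands the RIB an iBGP route rather than an eBGP one, OSPF has the lower distance.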
Here is the setup:
The Satellite’s loopback address, 3.3.3.3/32, will function as the network we want to reach. In this scenario, CE1 has routing information for the Satellite from three sources: eBGP via the WAN, iBGP directly from the Satellite, and OSPF from the C router. Which way will the traffic go?
    CE1#trace 3.3.3.3
    Type escape sequence to abort.
    Tracing the route to 3.3.3.3
    VRF info: (vrf in name/id, vrf out name/id)
      1 10.0.1.4 4 msec 0 msec 0 msec ! C
      2 10.0.2.2 1 msec 0 msec 0 msec ! CE2
      3 10.0.12.3 0 msec * 2 msec ! Satellite
It seems OSPF has won, even though we have a route with a better Admin Distance from eBGP:
    CE1#show ip bgp neighbors 10.1.0.5 received-routes | i 3.3.3.3
     r 3.3.3.3/32 10.1.0.5 0 1 65000 i
The line begins with an r, which means there is a RIB failure: BGP could not install this prefix into the routing table. Other prefixes received from this eBGP neighbor get installed correctly:
    CE1#show ip bgp neighbors 10.1.0.5 received-routes | i 5.5.5.5
     *> 5.5.5.5/32 10.1.0.5 0 0 1 i
    CE1#show ip route 5.5.5.5
    Routing entry for 5.5.5.5/32
      Known via "bgp 65000", distance 20, metric 0
      Tag 1, type external
      Redistributing via ospf 1
      Advertised by ospf 1 subnets
      Last update from 10.1.0.5 00:23:27 ago
      Routing Descriptor Blocks:
      * 10.1.0.5, from 10.1.0.5, 00:23:27 ago
          Route metric is 0, traffic share count is 1
          AS Hops 1
          Route tag 1
          MPLS label: none
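The leading characters of each received-routes line are status codes from the legend that show ip bgp prints. As a rough sketch, the two lines above can be decoded like this in Python. The `parse_status` helper is hypothetical, covers only a subset of the codes, and assumes the flags are the first whitespace-separated token, which holds for these two lines but not for every IOS output format.

```python
# Subset of the status codes from the legend printed by "show ip bgp".
STATUS_CODES = {
    "*": "valid",
    ">": "best",
    "r": "RIB-failure",
    "s": "suppressed",
    "d": "damped",
    "h": "history",
}


def parse_status(line):
    """Split a table line into (set of status names, prefix).

    Assumes the flags form the first whitespace-separated token and
    the prefix the second, as in the two lines shown above.
    """
    flags, prefix = line.split()[:2]
    return {STATUS_CODES[f] for f in flags if f in STATUS_CODES}, prefix


for line in (
    "r 3.3.3.3/32 10.1.0.5 0 1 65000 i",
    "*> 5.5.5.5/32 10.1.0.5 0 0 1 i",
):
    status, prefix = parse_status(line)
    print(prefix, sorted(status))
```

This prints `['RIB-failure']` for 3.3.3.3/32 and `['best', 'valid']` for 5.5.5.5/32.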
Did you notice what was missing from the 3.3.3.3/32 entry? The > indicating the best path. Let’s see what BGP on CE1 knows about 3.3.3.3:

    CE1#show ip bgp 3.3.3.3/32
    BGP routing table entry for 3.3.3.3/32, version 13
    Paths: (2 available, best #1, table default, RIB-failure(17) - next-hop mismatch)
      Advertised to update-groups:
         1
      Refresh Epoch 4
      Local, (received & used)
        10.0.12.3 from 10.0.12.3 (3.3.3.3)
          Origin IGP, metric 0, localpref 100, valid, internal, best
          rx pathid: 0, tx pathid: 0x0
      Refresh Epoch 3
      1 65000, (received & used)
        10.1.0.5 from 10.1.0.5 (5.5.5.5)
          Origin IGP, localpref 100, valid, external
          rx pathid: 0, tx pathid: 0

What does the next-hop mismatch mean? A debug ip bgp internal on CE2 shows the following, but I don’t know exactly what it means:

    *Dec 24 15:32:19.564: BGP: net global:IPv4 Unicast:base 3.3.3.3/32 RIB-INSTALL Attempting to install.
    *Dec 24 15:32:19.564: BGP: net global:IPv4 Unicast:base 3.3.3.3/32 RIB-INSTALL Built route type: 512, flags: 200000, tag: 0, metric: 0 paths: 1.
    *Dec 24 15:32:19.564: BGP: net global:IPv4 Unicast:base 3.3.3.3/32 RIB-INSTALL Path 1, type: DEF, gw: 10.0.12.3, idb: N/A, topo_id: 0, src: 10.0.12.3, lbl: 1048577, flags: 0.
    *Dec 24 15:32:19.564: BGP: net global:IPv4 Unicast:base 3.3.3.3/32 RIB-INSTALL Installing 1 paths, multipath limit 1 (from 1).
    *Dec 24 15:32:19.564: RT: updating bgp 3.3.3.3/32 (0x0) : via 10.0.12.3 0
    *Dec 24 15:32:19.564: RT: rib update return code: 17
    *Dec 24 15:32:19.564: BGP: net global:IPv4 Unicast:base 3.3.3.3/32 RIB-INSTALL Worse distance than ospf route, next-hop mismatch.
In the show ip bgp output, iBGP wins the path selection within the BGP process: the iBGP path has an empty AS path (shown as Local), while the eBGP path carries 1 65000, and AS-path length is compared before the eBGP-over-iBGP preference. BGP then tries to install that single best path into the routing table, where we get a collision. The ADs are compared and OSPF wins with its AD of 110 against iBGP’s 200. BGP never goes back to offer the eBGP path to the RIB, because that path already lost BGP’s own best-path selection.

Reloading the Satellite turned out to be even more fun: I got into an update loop. Both CE1 and CE2 received the iBGP route at the same time and redistributed it into OSPF. After this, they both preferred the OSPF route from each other and flushed their own type 5 LSA. Then they both uninstalled the OSPF route at the same time and preferred the iBGP route again, ad infinitum.

Let’s see the route selection in action with a debug ip routing on CE1 while we reload the Satellite:
    CE1#debug ip routing
    IP routing debugging is on
    CE1#
    CE1# ! RELOAD Satellite
    CE1#
    *Dec 24 15:45:12.958: %BGP-5-NBR_RESET: Neighbor 10.0.12.3 reset (Peer closed the session)
    *Dec 24 15:45:12.958: %BGP-5-ADJCHANGE: neighbor 10.0.12.3 Down Peer closed the session
    *Dec 24 15:45:12.958: %BGP_SESSION-5-ADJCHANGE: neighbor 10.0.12.3 IPv4 Unicast topology base removed from session Peer closed the session
    CE1#
    *Dec 24 15:45:12.958: RT: updating bgp 3.3.3.3/32 (0x0) : via 10.1.0.5 0
    *Dec 24 15:45:12.958: RT: closer admin distance for 3.3.3.3, flushing 1 routes
    *Dec 24 15:45:12.958: RT: add 3.3.3.3/32 via 10.1.0.5, bgp metric [20/0]
    *Dec 24 15:45:12.961: RT: del 0.0.0.0 via 10.1.0.5, bgp metric [20/0]
    *Dec 24 15:45:12.961: RT: delete network route to 0.0.0.0/0
    *Dec 24 15:45:12.961: RT: default path has been cleared
    *Dec 24 15:45:12.961: RT: del 3.3.3.3 via 10.1.0.5, bgp metric [20/0]
    *Dec 24 15:45:12.961: RT: delete subnet route to 3.3.3.3/32
    CE1#
    CE1# ! SATELLITE COMES BACK UP
    CE1#
    *Dec 24 15:45:27.849: %BGP-5-ADJCHANGE: neighbor 10.0.12.3 Up
    CE1#
    CE1# ! iBGP ROUTE WINS FROM eBGP, THEN OSPF WINS FROM iBGP
    CE1#
    *Dec 24 15:45:43.539: RT: updating bgp 3.3.3.3/32 (0x0) : via 10.0.12.3 0
    *Dec 24 15:45:43.539: RT: add 3.3.3.3/32 via 10.0.12.3, bgp metric [200/0]
    *Dec 24 15:45:43.540: RT: updating ospf 3.3.3.3/32 (0x0) : via 10.0.1.4 Et0/1 0
    *Dec 24 15:45:43.540: RT: closer admin distance for 3.3.3.3, flushing 1 routes
    *Dec 24 15:45:43.540: RT: add 3.3.3.3/32 via 10.0.1.4, ospf metric [110/1]
    *Dec 24 15:45:43.540: RT: updating bgp 3.3.3.3/32 (0x0) : via 10.0.12.3 0
    *Dec 24 15:45:43.540: RT: rib update return code: 17
    CE1#
The solution here? Since iBGP is basically used as an external route processor in this setup, you can lower the AD for iBGP so that it beats OSPF. The distance bgp command takes the external, internal and local distances, in that order:
    CE1(config)#router bgp 65000
    CE1(config-router)#distance bgp 20 21 22
    CE1(config-router)#end
    CE1#show ip bgp 3.3.3.3/32
    BGP routing table entry for 3.3.3.3/32, version 4
    Paths: (2 available, best #1, table default)
      Advertised to update-groups:
         11
      Refresh Epoch 2
      Local, (received & used)
        10.0.12.3 from 10.0.12.3 (3.3.3.3)
          Origin IGP, metric 0, localpref 100, valid, internal, best
          rx pathid: 0, tx pathid: 0x0
      Refresh Epoch 2
      1 65000, (received & used)
        10.1.0.5 from 10.1.0.5 (5.5.5.5)
          Origin IGP, localpref 100, valid, external
          rx pathid: 0, tx pathid: 0
    CE1#show ip route 3.3.3.3
    Routing entry for 3.3.3.3/32
      Known via "bgp 65000", distance 21, metric 0, type internal
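To tie the pieces together, here is a small Python sketch of the two-stage selection described in this post: BGP first picks one best path on its own, and only that path competes in the RIB on administrative distance. It is a toy model under stated assumptions, not the IOS implementation. `BgpPath`, `bgp_best_path` and `rib_winner` are hypothetical names, and the best-path step only looks at AS-path length and eBGP-over-iBGP, assuming weight and local preference are equal as they appear to be in the outputs above.

```python
from dataclasses import dataclass


@dataclass
class BgpPath:
    kind: str        # "ebgp" or "ibgp"
    as_path: tuple   # AS numbers; empty for the "Local" iBGP path
    next_hop: str


def bgp_best_path(paths):
    """Simplified best-path step: shorter AS path first, then eBGP over iBGP.

    Real BGP evaluates weight, local preference and more before this;
    those are assumed equal here and left out.
    """
    return min(paths, key=lambda p: (len(p.as_path), p.kind != "ebgp"))


def rib_winner(paths, ospf_present, ad):
    """Stage 1: BGP picks ONE best path. Stage 2: the RIB compares ADs."""
    best = bgp_best_path(paths)
    offers = {best.kind: ad[best.kind]}
    if ospf_present:
        offers["ospf"] = ad["ospf"]
    return min(offers, key=offers.get)


paths = [
    BgpPath("ibgp", (), "10.0.12.3"),
    BgpPath("ebgp", (1, 65000), "10.1.0.5"),
]
default_ad = {"ebgp": 20, "ibgp": 200, "ospf": 110}
tuned_ad = {"ebgp": 20, "ibgp": 21, "ospf": 110}  # distance bgp 20 21 22

print(rib_winner(paths, True, default_ad))  # ospf
print(rib_winner(paths, True, tuned_ad))    # ibgp
```

The eBGP path with its AD of 20 never reaches the second stage, which is why OSPF wins with the defaults and why lowering the iBGP distance fixes it.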