Quick area type overview

Unless specified otherwise, all OSPF areas start out as standard areas. Of course, area 0 is always the backbone, which means the type of area 0 is always normal. A normal area will always retain all information; the information just gets aggregated at area borders.1

References

1. This assumes you aren’t doing any manual summarisation.

While writing my post about OSPF over DMVPN, I noticed something funny. When you set an OSPF network to non-broadcast, you need to set a manual neighbor. A fun trick here is that only one side needs the neighbor command, and the neighborship will still come up. I thought this would be a nice way to keep configuration simple: just put the same neighbor command on each spoke, and be done with it.
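As a rough sketch of the idea (interface and process numbers are placeholders, and 10.0.0.1 stands in for the hub's tunnel IP), each spoke would carry something like this:

  interface Tunnel0
   ip ospf network non-broadcast
  !
  router ospf 1
   ! every spoke points at the same address: the hub's tunnel IP
   neighbor 10.0.0.1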

I ran into a problem with this configuration, however, that turned out to be little more than a problem with my patience. Please follow me down this rabbit hole and see why it seemed my configuration didn't work.

The problem

What actually happens when you set this up? Surprisingly little. You configure the neighbor, you wait a bit, and then you get the message that the neighborship is down.

What happened? Shouldn’t OSPF start setting up the neighborship when one side attempts this?
The secret is in the OSPF priority: if the side with the neighbor command has priority 0, the neighborship will not come up. Apparently, a non-broadcast interface will first elect a DR before setting up the neighborship. If it cannot elect an interim DR, it will not set up the adjacency:

Fix attempt #1: Interface priority

A fix is to set the priority to 1, so the spoke can become BDR and then set up the neighborship. The problem is that, when the hub reloads, one of the spokes will take over as DR and stay there.

Fix attempt #2: Neighbor priority

I thought there might be another fix: if you set the priority on the neighbor statement, you let OSPF know you want that neighbor to become DR.
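On the spoke, that would look something like this sketch (the hub tunnel IP 10.0.0.1 and process number are placeholders):

  router ospf 1
   ! tell OSPF that this neighbor should be the one eligible to become DR
   neighbor 10.0.0.1 priority 1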

And see: it doesn't seem to work at first:

But exactly two minutes after the dead timer expired, this happens:

It worked, but it took TWO minutes plus the dead timer? What happened? I had shut down both tunnel interfaces to simulate a freshly starting network. I unshut the spoke before the hub, resulting in the ATTEMPT state failing before the wait timer on the hub interface expired.1 You can see here that the hub elects itself as BDR almost 4 seconds after the neighborship attempt failed. The spoke apparently waits 2 minutes before reattempting. This seems like a logical explanation:

Back to the original problem

So, was I just too impatient the first time around? Let me reset everything. I shut both tunnels, and I first unshut the hub. I wait until it is back up and considers itself DR:

Everything is ready, so now I unshut the tunnel on the spoke, and I get the following:

I thought this might be an order-of-operations thing, where the OSPF packet gets sent before NHRP is registered correctly. However, adding the command ip ospf transmit-delay 2 to the interface doesn't change its behaviour. The interface will still start right away:

Conclusion

I cannot find a good explanation for this problem, except that sometimes I am just too impatient.


  1. Remember: The wait timer and dead timer are the same value. 

Setup

The setup for this blog is the same as in the DMVPN Phases in distance vector post. The hostnames will be Phase-number, where router 1 is the hub. Loopbacks will be phase.phase.phase.routernumber and are put in the same OSPF area as the DMVPN link. To make sure neighborships will not be affected by mismatched timers, I set the tunnel interfaces to the fast OSPF timers:
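A sketch of what that could look like on each tunnel interface; Tunnel0 and the 10/40 second hello/dead values are my assumptions here:

  interface Tunnel0
   ! use the same hello/dead timers regardless of the network type in use
   ip ospf hello-interval 10
   ip ospf dead-interval 40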

We also have to make sure none of the spokes become DR, so we set their priority to 0:
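A sketch for each spoke's tunnel interface (Tunnel0 is a placeholder):

  interface Tunnel0
   ! priority 0 = never eligible to become DR or BDR
   ip ospf priority 0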

Since the neighborships have to be in the same area, you can’t summarise on the hub. And since we are mapping multicast to the NHS servers and they will be our DR (if we have a DR), I will be ignoring non-broadcast versions of the network types. However, there is an interesting bit about non-broadcast networks that I will talk about in an upcoming post.

Phase 1

Phase 1 is a true hub-and-spoke topology, so a broadcast network type will not work. We are forced to use point-to-multipoint connections. We have two options for this: either all routers are point-to-multipoint, or only the hub is point-to-multipoint and the spokes are point-to-point. The last option exactly follows the topology and is the least config, since tunnel interfaces are point-to-point by default anyway.
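As a sketch (Tunnel0 is a placeholder), that second option only needs a change on the hub, since the spokes already default to point-to-point:

  ! Hub
  interface Tunnel0
   ip ospf network point-to-multipoint
  !
  ! Spokes: no network-type command needed, GRE tunnel interfaces are point-to-point by default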

The timers here are important, since point-to-point uses different default timers than point-to-multipoint. However, I noticed an interesting quirk on my test routers: when setting the interface to point-to-multipoint, the manually set hello timer was overridden. Reapplying the hello timer to the interface fixed this issue.

We have loopbacks in our routing table:

And traces work:

Phase 2

As with Phase 1, if we use point-to-multipoint, things will still be routed through the hub. Will multipoint-to-multipoint be any different? Actually no, because all spokes only build a neighborship with the hub, and all connections are represented in OSPF as point-to-point links to the hub, with the /32 networks pointing to the hub:

So, all routes will still traverse the hub, negating the spoke-to-spoke capability of phase 2:

Broadcast
So, for phase 2 a broadcast network-type should work better:
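A sketch of the change, applied to the hub and to every spoke (Tunnel0 is a placeholder; the spokes keep their priority of 0 so the hub stays DR):

  interface Tunnel0
   ip ospf network broadcast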

We have direct routes, let’s see what happens:

Success!

Phase 3

The whole point of phase 3 is that you can set the hub as the next hop for all routes and have the RIB overridden when the need arises. However, the routers I tried this on don't actually listen to this.

Point to Multipoint

Setting the hub as point-to-multipoint and the spokes as point-to-point gets us a routing table where the hub is used as the next hop:

All routes point through the hub, and even though NHRP redirect is set, the spoke will keep sending traffic to the hub, even when a spoke-to-spoke tunnel has been set up. Traffic routed over the tunnel will go through the hub; traffic to the tunnel IP itself will get redirected correctly:

Multipoint-to-Multipoint

So, what happens when we make them all point-to-multipoint? We get exactly the same behaviour. The only difference is an extra /32 in the routing table for the other spoke:

A repeat of the traceroute does not change the behaviour (see below).

Broadcast

When setting the network type to broadcast, we see that the next hop is set correctly: the first trace goes via the hub, but tracing right after that will use the direct spoke-to-spoke tunnel:

Conclusion

To make use of the correct routing paradigms, we should use the following network types per phase. Please note that Phase 3 doesn’t actually add anything when using OSPF:

Phase | Network type hub | Network type spoke
1 | Point-to-multipoint | Point-to-point
2 | Broadcast (DR) | Broadcast
3 | Broadcast (DR) | Broadcast

Distance vector routing over the DMVPN Phases

In my previous post, I explained the difference in the phases of DMVPN. The setup is as follows: R1 is the hub, R2 and R3 are spokes. Each has a loopback of 1.1.1.1, 2.2.2.2 and 3.3.3.3 respectively:

DMVPN Lay-out

So, we have the tunnels set up, but how do we route over them? It is not so difficult. Phase 1 and phase 3 both want all routes pointing to the hub; phase 2 wants the routes pointing to the respective routers originating them. For distance vector routing this boils down to:

Protocol | Phase 1 | Phase 2 | Phase 3
EIGRP | split-horizon OFF & next-hop-self ON | split-horizon OFF & next-hop-self OFF | Summarise
RIP | split-horizon OFF | split-horizon OFF & use RIPv2 | Summarise and turn ON split horizon

Below are the explanations and parts of the configurations, as they differ from using the routing protocol on normal Ethernet interfaces. If you want full configurations, you can see examples of each on Fryguy’s Blog

The outputs below will have the DMVPN phase followed by the routernumber used as hostname. So, Phase3-1 is R1 (the hub) in Phase 3.

RIP

At first I thought that in RIP you can't turn off third-party next hop, meaning that RIP shouldn't work in Phase 1 and 3. Turns out I was wrong. First off, you can turn it off by running RIP version 1. But even with third-party next hop, you can send a packet through the tunnel with another spoke's tunnel IP as destination, and the hub will happily forward it. As long as you turn off split horizon on the hub, you are fine:
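On the hub's tunnel interface that is a single line (Tunnel0 is a placeholder):

  interface Tunnel0
   ! let the hub re-advertise spoke routes back out of the same interface
   no ip split-horizon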

Let’s route through the hub:

This setup works for any phase. If you want to make sure to have the hub as the next hop, run RIPv1; other than that, it just works.

RIP and phase 3 optimisations

One of the points of DMVPN Phase 3 is being able to summarise your routes for a smaller RIB. However, if you turn on summarisation on the Hub’s tunnel interface, you will actually increase the number of routes in the RIB1:

The kicker? Now, when you turn ON split-horizon, it does work:

Yes, turn ON both summarisation and split-horizon, and you will get the effect you want! This will of course also work for Phase 1.
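On the hub's tunnel interface that combination could look like this sketch; the summary prefixes are placeholders I chose to cover the spoke loopbacks:

  interface Tunnel0
   ip summary-address rip 2.0.0.0 255.0.0.0
   ip summary-address rip 3.0.0.0 255.0.0.0
   ! counter-intuitively, split horizon goes back ON together with the summary
   ip split-horizon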

EIGRP phases one and three

So, does this hold true for EIGRP as well? Well, kind of. If you want to advertise all the routes, turn off split-horizon and turn on next-hop-self:
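A sketch for the hub's tunnel interface, assuming classic-mode EIGRP in AS 100 and Tunnel0:

  interface Tunnel0
   no ip split-horizon eigrp 100
   ! next-hop-self is on by default; shown here to make the intent explicit
   ip next-hop-self eigrp 100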

If you want to summarise, it turns out that split-horizon and next-hop-self just don't matter at all. Any combination of split-horizon and next-hop-self will result in the same behaviour:

Well, I say any combination, but there is one exception: summarising to the same subnet as you are receiving from the spoke will need split-horizon turned off:

EIGRP in Phase 2

Phase 2 is simple: you forget all the things we did so far and just turn off split horizon on the hub:
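On the hub's tunnel interface, again assuming AS 100 and Tunnel0, and following the table above (split-horizon OFF and next-hop-self OFF):

  interface Tunnel0
   no ip split-horizon eigrp 100
   ! preserve the originating spoke as the next hop
   no ip next-hop-self eigrp 100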

And you will have full connectivity. Next time, OSPF over DMVPN.


  1. The RegEx used here matches anything with " 2." (space + two + dot)

The setup

In my previous post, I explained the building blocks of a DMVPN Phase 2 tunnel. DMVPN has three phases, differentiated in the way spoke-to-spoke traffic is handled:

  • Phase 1: route all traffic through the hub
  • Phase 2: Have the spoke as a next hop in your routing table for every spoke network. When you need to know where the spoke is, ask the hub and set up a spoke to spoke tunnel.
  • Phase 3: Route all traffic through the hub. If the hub notices traffic is spoke-to-spoke, it will send a redirect, triggering a spoke to spoke tunnel.

This means that Phase 1 is inefficient from a forwarding perspective, Phase 2 is inefficient in its use of routing information, and Phase 3 addresses both problems. Phase 1 and 2 are considered obsolete, but we still need to know how to configure all of them. First I will explain the differences in config on the tunnel interface, then I will explain how to handle the routing over the different phases.

Tunnel configuration

The tunnel config difference is surprisingly small:

Phase | Hub | Spoke
1 | mGRE | GRE, so it also needs a destination address
2 | mGRE | mGRE
3 | mGRE plus NHRP redirect & shortcut | mGRE plus NHRP shortcut

So, moving from the phase 2 config to phase 1, on the spokes you simply do:
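Roughly this, on each spoke; 192.0.2.1 is a placeholder for the hub's NBMA address:

  interface Tunnel0
   ! back to plain point-to-point GRE, which needs a fixed destination
   tunnel mode gre ip
   tunnel destination 192.0.2.1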

When you want to move from phase 2 to phase 3, you have to add NHRP shortcut to all routers and NHRP redirects on the hub.

Spoke:
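A sketch, assuming the existing tunnel interface is Tunnel0:

  interface Tunnel0
   ip nhrp shortcut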

Hub:
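And on the hub, with the same assumption, the redirect goes alongside the shortcut:

  interface Tunnel0
   ip nhrp redirect
   ip nhrp shortcut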

And that is all there is to it. Well, all but the hardest part: you need to adjust your routing protocol to make sure the hubs and spokes behave in accordance with the phase they are in. In the next posts, I will explain this for distance-vector protocols and for OSPF respectively.

Building a DMVPN topology involves quite a few bits of configuration on quite a few devices. It can all seem a bit daunting, especially since most examples just give you the different configurations for the hubs and the spokes in one go. In reality, there are only a few lines that differ among the devices. There is some configuration that has to be unique on each device, some that differentiates between a hub and a spoke, and the rest is the same across the board. The differences all have to do with answering these questions:

Scope | Question
Unique per device | How do I source my packets?
Hubs only | What should I do with incoming NHRP registrations?
Spokes only | Where should I send my NHRP registrations?
All devices | How do I identify a tunnel and how should I handle my packets?

I will handle each separately. All configuration here is done under interface tunnel NUM.

Unique per device

To answer the question of how to source the packets, you need two pieces of information: a source address outside of the tunnel and a source address inside of the tunnel. So, there are only two lines unique per device:
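As a sketch for one device (the addresses and interface names are placeholders):

  interface Tunnel0
   ! the address inside the tunnel
   ip address 10.0.0.2 255.255.255.0
   ! the address (or interface) outside the tunnel
   tunnel source GigabitEthernet0/0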

Hub config

The difference between hubs and spokes is that the hub listens for incoming tunnels. You don't actually have to configure anything for that, but since we need multicast for the routing protocols to work properly, we will tell the hub to register multicast traffic in NHRP:
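That is a single line, sketched here on Tunnel0:

  interface Tunnel0
   ! replicate multicast (routing protocol hellos) to every spoke that registers
   ip nhrp map multicast dynamic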

Spoke config

The spokes are the ones actively setting up the tunnels, so they need to know where to send their NHRP registrations. For this, they need two addresses: the NBMA address (the hub's public IP) and the tunnel IP of the hub. You can repeat this configuration to create a multi-hub topology:
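A sketch, with 10.0.0.1 as the hub's tunnel IP and 192.0.2.1 as its NBMA address, both placeholders:

  interface Tunnel0
   ! map the hub's tunnel IP to its NBMA address and register with it as NHS
   ip nhrp map 10.0.0.1 192.0.2.1
   ip nhrp map multicast 192.0.2.1
   ip nhrp nhs 10.0.0.1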

Shared config

The shared config is all about identifying the tunnel, and specifying common parameters, such as ipsec profiles, MTU and timers. All parameters should match between the devices.

First, we set the type of tunnel, which is mGRE:
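On Tunnel0 (a placeholder name), that is:

  interface Tunnel0
   tunnel mode gre multipoint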

The MTU needs to be lowered in order to fit all the tunnel overhead in the packets. mGRE costs 28 bytes, and if you add IPsec in tunnel mode, this can balloon fast. The biggest value I could find in the IPsec overhead calculator was a total of 152 bytes, including GRE.
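A commonly used sketch that leaves comfortable headroom for GRE plus IPsec (the exact values are a choice, not a requirement):

  interface Tunnel0
   ip mtu 1400
   ! optional, keeps TCP sessions from depending on path MTU discovery
   ip tcp adjust-mss 1360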

Next, we have the NHRP ID and password, in combination with the tunnel key, to identify the set of tunnels. These are just identifiers and can be set to anything you like. The password is sent in plaintext if you don't use any IPsec encryption.
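For example (the ID, key and password are arbitrary placeholders; the NHRP password is limited to eight characters):

  interface Tunnel0
   ip nhrp network-id 1
   ip nhrp authentication DMVPNKEY
   tunnel key 1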

Optionally, we can set a holdtime for NHRP routes. Cisco recommends a value between 300 and 600 seconds.
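For instance, somewhere in that recommended range:

  interface Tunnel0
   ip nhrp holdtime 360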

Last, but most certainly not least, we can set the IPsec profile. Without this, the tunnels will come up, but your data will be unencrypted. The configuration of the IPsec profile is out of scope for this post, but it should also be the same on all DMVPN devices.
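With a placeholder profile name, that is:

  interface Tunnel0
   tunnel protection ipsec profile DMVPN-PROFILE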

Conclusion

There you have it. The only difference between the spokes and the hubs is the NHRP mapping, the only difference between all devices is the source information. At least, for a phase 2 DMVPN, which we built here. The differences between the phases will be explained in a different post.

While trying my hand at an INE question, I was triggered to figure out how EIGRP summary addresses work. What happens if we summarise a network to the exact same prefix? How can we play with the results?

To try this out, I built a small topology. R1 represents the provider, who advertises only a default route through BGP. R2 and R3 will advertise the EIGRP networks to R1 and redistribute BGP into EIGRP. Later, R2 and R3 will summarise this default network. R1’s loopback (1.1.1.1) will represent the outside networks.

EIGRP Summary

The setup on R3:
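Sketched below with placeholder addresses and AS numbers (65001 for R3, 65000 for the provider, EIGRP AS 100); the 1 Gbit bandwidth in the redistribute metric matches what I mention later:

  router bgp 65001
   neighbor 192.0.2.1 remote-as 65000
   ! advertise the internal (EIGRP) networks to the provider
   redistribute eigrp 100
  !
  router eigrp 100
   network 10.0.0.0 0.0.0.255
   ! redistribute the BGP default with a 1 Gbit seed bandwidth
   redistribute bgp 65001 metric 1000000 10 255 1 1500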

Everything works, both R4 and R5 will take the shortest path out:

Now, let’s see what happens when we summarise the default network to the same prefix. Only on R3 first:
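A sketch of the summary, placed on R3's interface towards R4 and R5 (the interface name and AS number are placeholders):

  interface GigabitEthernet0/1
   ! summarise the default route to... the default route
   ip summary-address eigrp 100 0.0.0.0 0.0.0.0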

So, both R4 and R5 still have a default route. However, R4 now wants to go through R5 and R3. At first I thought this was because I am redistributing the BGP route with a bandwidth of 1 Gbit while the summary is being advertised with a bandwidth of 10 Gbit. Later I figured out the real reason: the summarised route is advertised as EIGRP internal and the redistributed route is advertised as EIGRP external. EIGRP prefers internal routes by default, so the summarised route wins:

A trace should still work, right? R3 gets the default route from BGP. R4 and R5 know a default route towards R3. Let’s try this out:

Wait! We are getting unreachables. Why? Let’s check R3, because it is the router generating the unreachables:

The EIGRP summary route beats the BGP route, because it has an AD of 5, which is better than eBGP's 20. The router doesn't care whether the route points to Null0 or not; a route is a route. Let's fix this, and let's do the same summarisation at R2 as well (not shown):

That is better. But, to be honest, we are not getting the summary route; we are getting the redistributed BGP route, as you can see from the D*EX. Does this even summarise any networks? Let's have R1 advertise its loopback address and see what happens:

So, the summary does not get advertised, but it does suppress all other advertisements. Now, how can we break this further? Let's kill the BGP connection between R3 and R1, and see what happens:

We have the same problem as before: R3 is installing the Null0 route, so R5 will not be able to reach R1. Even worse, R4 will prefer the route via R3, because it is EIGRP internal instead of EIGRP external:

Can we fix this? Yes, we can! Through the magic of an unreachable AD.
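The trick is the optional administrative-distance argument on the summary, sketched here on the same placeholder interface and AS as before:

  interface GigabitEthernet0/1
   ! AD 255 is "unreachable": the Null0 discard route is not installed locally,
   ! while more specific routes are still suppressed
   ip summary-address eigrp 100 0.0.0.0 0.0.0.0 255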

Now even R3 has a backup route and everyone can ping happily ever after.

So, in short, this is what I learned:

  • EIGRP summary routes will overwrite existing routes of inferior AD,
  • EIGRP will prefer internal routes over external routes, regardless of metric,
  • You can use an infinite AD on a summary to filter out more specific routes,

While helping a friend, I recently stumbled upon an interesting issue with administrative distances that confused me for a bit. But when I took a step back and went through the route selection process step by step, it started making sense. The issue? With the default Cisco administrative distances, OSPF was winning over eBGP for a specific prefix.

First, let's explain Administrative Distance (AD) a bit:
When adding a route to the Routing Information Base and there are multiple routes of the same length from different protocols, the router needs to decide which route to use. It cannot compare based on the protocols' metrics, because each protocol's metric means something different. So, routers use the Administrative Distance to break the tie. Cisco uses the following defaults:

  • 20 for eBGP
  • 110 for OSPF
  • 200 for iBGP.

Lower is better, so you would think that having a valid route in both eBGP and OSPF would always result in eBGP winning and installing its route. However, this is not always the case.
Here is the setup:

CE1 and CE2 redistribute BGP into OSPF and advertise a default to the Satellite

The Satellite's loopback address will function as the network we want to reach: 3.3.3.3/32. In this scenario, CE1 has routing information for the Satellite from three sources: eBGP via the WAN, iBGP directly from the Satellite, and OSPF from the C router. Which way will the traffic go?

It seems OSPF has won, even though we have a route with a better Admin Distance from eBGP:

The line begins with an r, which means there is a RIB failure. Other prefixes received from this eBGP neighbor get installed correctly:

Did you notice what was missing from the 3.3.3.3/32? The > indicating best. Let's see what the BGP process on CE1 knows about 3.3.3.3:1

So, here we see iBGP winning the path selection within the BGP process. It will then install the route into the routing table, where we get a collision. ADs are compared, and OSPF wins with its 110 AD versus iBGP's 200. BGP will never go back and compare the eBGP path to the currently installed route, because it already did its own checks. Let's see this in action with a debug ip routing on CE2 while we reload the Satellite:2

The solution here? Since iBGP is basically being used as an external route processor, you can adjust the AD for iBGP so it will win over OSPF:
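A sketch of that knob on CE1 and CE2; the AS number is a placeholder and 109 is just any value below OSPF's 110:

  router bgp 65000
   ! distance bgp <external> <internal> <local>
   distance bgp 20 109 200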

References

1. What does the next-hop mismatch mean? A debug ip bgp internal on CE2 shows the following, but I don’t know exactly what it means:

2. Now, this was even more fun: when reloading the satellite, I got into an update loop. Both CE1 and CE2 received the iBGP route at the same time and redistributed it into OSPF. After this, they both prefer the OSPF route from each other and flush their own LSA 5, after which they both uninstall the OSPF route at the same time and prefer the iBGP route again, ad infinitum.