While troubleshooting a discrepancy in convergence time between VRFs with identical settings running over the same topology, I got the chance to delve deeper into OSPF timers. Since it took a while to wrap my head around all the moving parts, and I only fully grokked it after drawing everything on a whiteboard for a colleague, I decided to explain this thoroughly here.
This will be a bit of a long read. In short: if you don’t set your LSA accept timers lower than your LSA send timers, you will have to wait for your retransmission time-out before the LSDB converges.
Multiple Layer 3 switches are connected in series. All devices have six VRFs and do not run MPLS. A single transit-VLAN per VRF is configured from one end of the series to the other. The switches on either end are connected with the rest of the network and receive a default route from BGP per VRF. Devices are called R and N to differentiate them more easily later, when we will be looking at a failure scenario.
Over this layer two connection, a full mesh of point-to-multipoint neighborships is made. Since this is a broadcast domain, neighborships will form automatically.
When one link fails, all neighborships transiting that link will time out. Timers are set to OSPF fast hellos with a multiplier of 5. So, minimum detection time is 800ms, maximum detection time is 1s.
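The detection window follows directly from the hello settings. Here is a small Python sketch of the arithmetic, assuming the fast-hello dead interval is 1 second (which is what the 800 ms and 1 s figures imply). The function name and parameters are mine, for illustration only.

```python
def detection_window_ms(dead_interval_ms=1000, hello_multiplier=5):
    """Return (fastest, slowest) failure detection time in milliseconds.

    Assumption: hellos are sent every dead_interval / multiplier ms and the
    dead timer restarts on every hello that is received.
    """
    hello_interval_ms = dead_interval_ms // hello_multiplier
    # Best case: the link fails just before the next hello was due, so the
    # dead timer has already been running for almost one hello interval.
    fastest = dead_interval_ms - hello_interval_ms
    # Worst case: the link fails right after a hello was received.
    slowest = dead_interval_ms
    return fastest, slowest


print(detection_window_ms())  # (800, 1000)
```

The 200 ms spread between the two values is what makes neighborships over the same failed link time out at slightly different moments.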
When failing a link, we noticed that some VRFs would take 5 seconds longer to converge than other VRFs. What was even more puzzling: with every test, different VRFs would be slow, and ones that were previously slow would be fast this time around. We explored many possibilities, from replacing R1 with a proper router, to puzzling over internal software timers that would do a walk over all VRFs. With much needed help from Cisco TAC (thanks Dan!) we found the culprit, which turned out to be OSPF timers.
Before I can explain the behaviour, I need to explain the timers and mechanisms involved. Configuration examples below are the default values.
LSA send interval
To avoid flooding your neighbors, OSPF doesn’t send a new LSA immediately after every event. The first LSA is sent immediately. After it is sent, OSPF will start a backoff timer. During this timer, OSPF will delay generating LSAs. The timer is set in msec with the command timers throttle lsa 0 5000 5000. The first number is the initial delay between sending the first and sending the second self-generated LSA. The delay is increased by the second number each time, until the maximum is reached, which is set by the third number.
LSA Arrival timer
Receiving the same LSA with different information over and over again is indicative of an unstable setup. To avoid calculating the same flap over and over again, OSPF ignores any new instance of an LSA that arrives sooner than the LSA arrival timer after the previous instance was accepted. This timer is set in msec per OSPF process with the command timers lsa arrival 1000.
OSPF needs the LSDB to be the same on all routers in the Area. To make sure this is correct, every LSA is acknowledged upon receipt. When an LSA is not acknowledged, it is resent after the retransmit interval in seconds, which you set per interface with the command ip ospf retransmit-interval 5.
To speed up convergence in our setup, we adjusted the LSA generation timers. The LSA accept timer was left at the default of 80 ms.
router ospf 1 vrf 1
timers throttle lsa 0 50 200
This meant that the first and second LSA would be updated and sent as soon as the first and second neighborship timed out. If the third neighborship went down within 50 milliseconds of the second LSA, sending the third LSA would be delayed until that timer expired. Meanwhile, the receiving router would ignore any LSA with the same LSID for 80 milliseconds after accepting an LSA.
The end result was a lottery based upon the timers. If the final LSA was sent after the accept timer timed out, everything would converge happily. If the final LSA was ignored, it would be resent after the retransmit time-out of 5 seconds. This is why some VRFs were so much slower than others. To illustrate, I drew out what happens when the timers collide. This is the advertisement of one router to another. The same thing will be happening the other way around.
Note: In these scenarios, I only look at the convergence of the OSPF LSDB. This is before the SPF calculation, so calculation timers are out of scope. I also ignore the time it takes to transmit the packets. This is possible, because the sending router will start counting when sending, and the receiving router will start counting when receiving the same LSA. Assuming no jitter, this offset will always be the same.
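The lottery can be reproduced with a few lines of Python. This is a simplified model, not the actual OSPF implementation: it tracks a single LSID on one receiver, ignores transmission delay like the scenarios above, and assumes the sender only keeps retransmitting the newest instance. The send times are made-up examples and the function name is hypothetical.

```python
def lsdb_converged_at(send_times_ms, arrival_ms=80, retransmit_ms=5000):
    """Return the time (ms) at which the receiver accepts the final LSA instance."""
    last_accept = None
    last_index = len(send_times_ms) - 1
    for index, t in enumerate(send_times_ms):
        too_soon = last_accept is not None and t - last_accept < arrival_ms
        if not too_soon:
            # Accepted (and acknowledged): the arrival timer restarts here.
            last_accept = t
        elif index == last_index:
            # The final instance was ignored, so it is never acknowledged.
            # The sender resends it after every retransmit interval.
            while t - last_accept < arrival_ms:
                t += retransmit_ms
            last_accept = t
        # An ignored intermediate instance is superseded by a newer one.
    return last_accept


# Unlucky VRF: the third LSA lands 70 ms after the accepted first one.
print(lsdb_converged_at([0, 20, 70]))  # 5070
# Lucky VRF: the third LSA lands 90 ms after the accepted first one.
print(lsdb_converged_at([0, 40, 90]))  # 90
```

A difference of 20 ms in when the last LSA is sent decides whether the LSDB converges in under 100 ms or after more than 5 seconds.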
Why is this bad? R2 knows that N1, N2 and N3 are unreachable. It also got the first LSA from R1 saying it lost connection with N1. But, remember that all routers had a full neighborship. With the information R2 has, it will come to the following conclusion:
- R1 is still connected to N2 and N3.
- N2 and N3 are still connected to N1.
- This means N1, N2 and N3 are still reachable through R1.
R2 will make the following graph of the network:
R1 will have a similar misinterpretation of the network.
Fixing the timers
How can we fix this behaviour? By making sure that all routers accept the LSAs faster than they are generated. Remember: the router only ignores refreshed LSAs with the same LSID. This does not influence two LSAs from different routers sent at the same time. So, here is the same scenario as before, but with the backoff timer set to 50 ms and the accept timer to 30 ms:
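A quick way to sanity-check a timer pair is to list which LSA instances a receiver would ignore. This is a minimal sketch, assuming the instances leave the sender exactly one backoff interval (50 ms) apart and that transmission delay is constant. The helper name is hypothetical.

```python
def ignored_instances(send_times_ms, arrival_ms):
    """Return the indexes of LSA instances (same LSID) the receiver ignores."""
    ignored = []
    last_accept = None
    for index, t in enumerate(send_times_ms):
        if last_accept is not None and t - last_accept < arrival_ms:
            # Arrived before the accept timer expired: silently dropped.
            ignored.append(index)
        else:
            last_accept = t
    return ignored


sends = [0, 50, 100]  # three instances, one 50 ms backoff apart
print(ignored_instances(sends, arrival_ms=30))  # []
print(ignored_instances(sends, arrival_ms=80))  # [1]
```

With the accept timer below the backoff timer, nothing is ignored, so no instance ever depends on the retransmit interval.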
Can this go wrong with the defaults?
In my opinion, the default values in this scenario are dangerous. If we look at the same scenario from the other side of the network, there are only two neighborships that time out. If those events are within 80 msec of each other, the second LSA will be dropped and we will have to wait for a retransmit time-out to converge the LSDB.
With hello timers 200 ms apart, and an 80 ms window in which you can have a timer collision, there is a 40% chance of this occurring on a single router. The chance of everything going well when there are three routers in the remaining segment is 60% * 60% * 60% = 21.6%, going down with more routers in the domain.
If you run a full mesh point-to-multipoint, make sure your accept timer is lower than the generate timer for the LSA. Cisco advises setting it lower than the second timer in the command, but I would recommend setting it lower than the first timer as well. This is the only scenario I could think of where there is a collision with the default timers. Of course, this is quite a corner case, and chances of running into it are low, but it is a point to be aware of when adjusting OSPF timers.
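The same arithmetic in Python, for anyone who wants to plug in their own numbers. This sketch assumes the second time-out falls uniformly at random within the 200 ms hello spacing and that the routers are independent of each other. The function names are mine.

```python
def collision_probability(window_ms=80, hello_spacing_ms=200):
    """Chance that two time-outs on one router fall within the accept window."""
    return window_ms / hello_spacing_ms


def all_clear_probability(routers, p_collision):
    """Chance that none of the routers hits a collision (assumed independent)."""
    return (1 - p_collision) ** routers


p = collision_probability()
print(f"{p:.0%}")                            # 40%
print(f"{all_clear_probability(3, p):.1%}")  # 21.6%
```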