SPF Delay – CCDE

SPF timers are usually one of those things that engineers don’t bother with. Hello/Dead timers are often adjusted, but not actual SPF timers themselves.

Different vendors, and even different platforms within vendors, can have dramatically different timers. Micro-loops can be even more pronounced when different vendors/platforms are involved.

SPF Timers

In OSPF, SPF is only run when certain conditions are met. One of those conditions is when a router originates a new type-1 LSA. If a router interface goes down, it will originate a new type-1 to let other routers in the area know about it. How soon after the interface goes down does the type-1 get sent? Once another router in the area receives that type-1, does it run SPF straight away? Does it flood the LSA before or after it runs SPF?
Micro-loops form when routers’ FIBs do not agree on where the best path is. Two routers will bounce a packet backwards and forwards to each other until those routers agree on the forwarding path and have that path installed in their FIB.

The best way to understand this is to show the loop forming.

Let’s consider the following topology of five routers. The OSPF cost of each link is also displayed:
[Figure: five-router topology with OSPF link costs]

Most router interfaces have a cost of 50, while R3 has a second slower link with a cost of 200.

Under normal circumstances, any traffic from R1 to R5 will go through R2-R4.
[Figure: normal traffic path from R1 to R5 via R2-R4]

R1#traceroute 10.0.0.5
Type escape sequence to abort.
Tracing the route to 10.0.0.5
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.12.2 12 msec 32 msec 16 msec
  2 192.168.24.4 44 msec 56 msec 16 msec
  3 192.168.45.5 68 msec 48 msec 48 msec

When the link between R2 and R4 fails, traffic should traverse the R2-R3-R4 links:
[Figure: traffic path via R2-R3-R4 after the R2-R4 link failure]
There are a number of milliseconds where this will not be the case.

In order to show how a micro-loop is formed, I’ll first need to artificially increase my SPF timers. This is because it’s very difficult to show an actual micro-loop simply with traceroute.
On R3 I’ll increase the wait time to run SPF after it receives an LSA to 10 seconds:

R3(config)#router ospf 1
R3(config-router)# timers throttle spf 10000 10000 10000

I’ll now break the link between R2 and R4 and run another traceroute from R1 to R5:

R2(config)#int gi2/0
R2(config-if)#shut

R1#traceroute 10.0.0.5
Type escape sequence to abort.
Tracing the route to 10.0.0.5
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.12.2 16 msec 16 msec 12 msec
  2  *  *
    192.168.23.3 36 msec
  3 192.168.23.2 40 msec 36 msec 68 msec
  4 192.168.23.3 44 msec 60 msec 60 msec
  5 192.168.23.2 56 msec 64 msec 60 msec
  6 192.168.23.3 100 msec 80 msec 80 msec
  7 192.168.23.2 80 msec 80 msec 84 msec
  8 192.168.23.3 80 msec 104 msec 104 msec
  9 192.168.23.2 100 msec 104 msec 100 msec
 10 192.168.23.3 128 msec 124 msec 124 msec
 11 192.168.23.2 132 msec 116 msec 124 msec
 12 192.168.23.3 152 msec 148 msec 148 msec
 13 192.168.23.2 144 msec 144 msec 148 msec
 14 192.168.23.3 152 msec
    192.168.45.5 112 msec 84 msec

Because R3 is delaying its SPF run until 10 seconds after it receives a relevant LSA, it still assumes the best path is through R2. R2 has run its SPF and assumes the best path is through R3. This is why the packet bounces between the two routers. The packet gets to its destination only once R3 has run SPF and updated CEF.
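
The loop can also be reproduced on paper with a small Python sketch. The topology here is an assumption inferred from the text and the traceroutes (R1-R2, R2-R3, R2-R4 and R4-R5 at cost 50, with R3-R4 as the cost-200 link), and the function names are purely illustrative. It runs Dijkstra per router, then forwards a packet while R3 still holds its pre-failure next hop:

```python
import heapq

def next_hops(links, source):
    """Run Dijkstra from source and return {destination: first hop}."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    dist = {source: 0}
    first_hop = {}
    queue = [(0, source, None)]
    while queue:
        cost, node, hop = heapq.heappop(queue)
        if cost > dist[node]:
            continue  # stale queue entry, a cheaper path was already found
        if hop is not None:
            first_hop.setdefault(node, hop)
        for neigh, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neigh, float("inf")):
                dist[neigh] = new_cost
                # Neighbours of the source become their own first hop
                heapq.heappush(queue, (new_cost, neigh, hop or neigh))
    return first_hop

def trace(fib, src, dst, max_hops=6):
    """Follow each router's next hop towards dst, giving up after max_hops."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        path.append(fib[path[-1]])
    return path

# Assumed topology: link -> OSPF cost
before = {("R1", "R2"): 50, ("R2", "R3"): 50, ("R2", "R4"): 50,
          ("R3", "R4"): 200, ("R4", "R5"): 50}
after = {link: cost for link, cost in before.items() if link != ("R2", "R4")}

routers = ["R1", "R2", "R3", "R4", "R5"]
old_fib = {r: next_hops(before, r).get("R5") for r in routers}
new_fib = {r: next_hops(after, r).get("R5") for r in routers}

# R2 has already converged, everyone else (notably R3) still uses the old FIB
stale_fib = dict(old_fib, R2=new_fib["R2"])

looping = trace(stale_fib, "R1", "R5")
converged = trace(new_fib, "R1", "R5")
print(looping)
print(converged)
```

While R3 is still waiting to run SPF the trace never leaves the R2/R3 pair. Once every router uses its post-failure next hop, the path settles on R1-R2-R3-R4-R5.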

Of course in the real world we don’t wait 10 seconds. But what are the actual timers? That depends a lot on which vendor and platform you’re running:

Vendor    OS              Initial SPF Delay (ms)
Cisco     IOS & IOS-XE    5000
Cisco     IOS-XR          50
Cisco     NX-OS           200
Juniper   Junos           200

The above list is of course not exhaustive.

The timers between vendors and platforms can be dramatically different. Even in an environment where you don’t care about rapid convergence, it’s still important that your IGP routers all agree on their timers. Connecting an ASR1k to an ASR9k and leaving both at their default timers could cause traffic to loop for almost five seconds. I would suggest you ensure all OSPF routers in an area, or all IS-IS routers in the same level, have identical timers.
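
The “almost five seconds” figure is simply the gap between the two initial delays. A rough sketch of that arithmetic, using only the defaults listed above (the function name is my own, and it deliberately ignores flooding time, SPF run time and FIB update time):

```python
# Default initial SPF delays in milliseconds, copied from the figures listed above
INITIAL_SPF_DELAY_MS = {
    "Cisco IOS/IOS-XE": 5000,
    "Cisco IOS-XR": 50,
    "Cisco NX-OS": 200,
    "Juniper Junos": 200,
}

def loop_window_ms(os_a, os_b, delays=INITIAL_SPF_DELAY_MS):
    """Rough upper bound on how long two neighbours can disagree."""
    return abs(delays[os_a] - delays[os_b])

# An ASR1k runs IOS-XE and an ASR9k runs IOS-XR
window = loop_window_ms("Cisco IOS/IOS-XE", "Cisco IOS-XR")
print(f"ASR1k vs ASR9k: up to {window} ms of possible looping")
```

Treat the result as a ballpark only. It shows why mismatched defaults matter, not how long a given network will actually loop.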

Another option is to set the initial SPF delay high enough that the LSA/LSP reaches all edges of the area/level before anyone runs SPF. That way all routers can run SPF and update their FIBs at the same time. The problem with this approach is that each router receives the LSA at a different time. Even if they did receive it at exactly the same time, we would be relying on all routers having 100% identical SPF and FIB-update run times.

Further Reading

RFC 5715 – A Framework for Loop-Free Convergence
RFC 6976 – Framework for Loop-Free Convergence Using the Ordered Forwarding Information Base (oFIB) Approach

Been away for a while

I’ve not posted for a while as quite a bit has changed. I’ve moved to a new company and there is a lot to do. As part of the move I’ve moved house as well. This has given me zero time to update the blog.

Rest assured new content is coming soon!

Splitting a module from a python app

My OSPF checker is getting a bit big. The majority of the code is the function that parses the OSPF output and returns the required values.

I’d like to continue to refine what it can pull out. I’d also like to check non-IOS devices like Junos and IOS-XR output.

A function can very easily be moved into a new file and then called as a module. The great thing about this is that others can use the same module in different applications of their own. I can also create a separate module per OS that I’m interested in. Each can be edited separately.

The IOS OSPF checker has now been split into its own module like so:

import re
import sys

def ospf_information(i):
    int_list = {}
    ospf = re.split(r'[\n](?=GigabitEthernet|FastEthernet|Serial|Tunnel|Loopback|Dialer|BVI|Vlan|Virtual-Access)',i)
    print(ospf)
    for o in ospf:
        properties = {}
        interface =  re.search(r'(GigabitEthernet|FastEthernet|Serial|Tunnel|Loopback|Dialer|BVI|Vlan|Virtual-Access)[0-9]{1,4}/?[0-9]{0,4}.?[0-9]{0,4}/?[0-9]{0,3}/?[0-9]{0,3}/?[0-9]{0,3}:?[0-9]{0,3}',o)
        if not interface:
            continue
        interface = interface.group()
        ip = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2})',o)
        if not ip:
            # No address of its own: fall back to the unnumbered-interface line
            ip = re.search(r'Interface is unnumbered. Using address of [a-zA-Z]{1,10}[0-9]{1,5}/?[0-9]{0,5}.?[0-9]{0,5}',o)
        # Avoid an AttributeError if neither pattern matched
        properties['IP'] = ip.group() if ip else None
        a = re.search(r'Area ([\s]{0,3}[0-9]{1,5})',o)
        properties['Area'] = a.group(1)
        n = re.search(r'Network Type ([\s]{0,3}[a-zA-Z_]{0,20})',o)
        properties['Net'] = n.group(1)
        c = re.search(r'Cost: ([0-9]{1,5})',o)
        properties['Cost'] = c.group(1)
        s = re.search(r'line protocol is[\s]([a-zA-Z]{1,4})',o)
        properties['Status'] = s.group(1)
        p = re.search(r'Passive',o)
        if p:
            properties['Neigh'] = "Passive Interface"
            properties['Adj'] = None
        else:
            ne = re.search(r'(?:Neighbor Count is )([0-9]{1,3})',o)
            if not ne:
                properties['Neigh'] = None
            else:
                properties['Neigh'] = ne.group(1)
            ad = re.search(r'(?:Adjacent neighbor count is )([0-9]{1,3})',o)
            if not ad:
                properties['Adj'] = None
            else:
                properties['Adj'] = ad.group(1)
        h = re.search(r'Hello ([0-9]{1,3})',o)
        if not h:
            properties['Hello'] = None
        else:
            properties['Hello'] = h.group(1)
        d = re.search(r'Dead ([0-9]{1,3})',o)
        if not d:
            properties['Dead'] = None
        else:
            properties['Dead'] = d.group(1)
        int_list[interface]=properties
    return int_list

if __name__ == "__main__":
    f = open(sys.argv[1])
    info = f.read()
    f.close()
    ospf = ospf_information(info)
    print("This device contains "+str(len(ospf))+" ospf enabled interfaces")
    print(ospf)

A couple of things to note here. The module now returns a dictionary. This allows any app using this module to easily extract whatever values it chooses instead of iterating through a list. The last section of code allows me to run the module directly against some raw router output to pull information out. This part is not run when the file is imported as a module.

In my main application I now simply import the module and change how I call it slightly:

import ospfios
ospf_int = ospfios.ospf_information(output)
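
Because the result is a dictionary keyed by interface name, the caller can pick out fields by name. A minimal sketch of what that might look like; the sample data and the helper name are made up for illustration and only mimic the shape of what ospf_information() returns:

```python
# Hypothetical sample shaped like the dictionary ospf_information() returns
sample = {
    "GigabitEthernet0/1": {"IP": "10.0.12.1/24", "Area": "0", "Cost": "50",
                           "Neigh": "1", "Adj": "1"},
    "GigabitEthernet0/2": {"IP": "10.0.13.1/24", "Area": "0", "Cost": "200",
                           "Neigh": "1", "Adj": "0"},
    "GigabitEthernet0/3": {"IP": "10.0.14.1/24", "Area": "0", "Cost": "50",
                           "Neigh": "Passive Interface", "Adj": None},
}

def interfaces_without_adjacency(ospf_int):
    """Return non-passive interfaces that have no adjacent neighbour."""
    problems = []
    for name, props in ospf_int.items():
        if props["Neigh"] == "Passive Interface":
            continue  # passive interfaces are not expected to form adjacencies
        if props["Adj"] in (None, "0"):
            problems.append(name)
    return problems

print(interfaces_without_adjacency(sample))
```

Note that the values are strings, exactly as the regular expressions captured them, so any numeric comparison needs an int() first.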

I’ve started a preliminary Junos OSPF module which will return similar values:

import re
import sys

def ospf_information(i):
    int_list = {}
    ospf = re.split(r'[\n](?=ge|fe|lo|ae|et|fxp)',i)
    for o in ospf:
        properties = {}
        interface =  re.search(r'(ge|fe|lo|ae|et|fxp)([0-9]?)([-]?){0,1}[0-9]{1,5}/?[0-9]{0,5}/?[0-9]{0,5}/?[0-9]?[.][0-9]{1,5}',o)
        if not interface:
            continue
        interface = interface.group()
        ip = re.search(r'Address: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})',o)
        properties['IP'] = ip.group(1)
        c = re.search(r'Cost: ([0-9]{1,5})',o)
        properties['Cost'] = c.group(1)
        ad = re.search(r'(?:Adj count: )([0-9]{1,3})',o)
        properties['Adj'] = ad.group(1)
        h = re.search(r'Hello: ([0-9]{1,3})',o)
        properties['Hello'] = h.group(1)
        d = re.search(r'Dead: ([0-9]{1,3})',o)
        properties['Dead'] = d.group(1)
        int_list[interface]=properties
    return int_list

if __name__ == "__main__":
    f = open(sys.argv[1])
    info = f.read()
    f.close()
    ospf = ospf_information(info)
    print("This device contains "+str(len(ospf))+" ospf enabled interfaces")
    print(ospf)

A quick run directly on a small Junos box:

darreno@Jumpbox:~/git/ospf_checker$ python3 ospfjunos.py junos.txt
This device contains 4 ospf enabled interfaces
{'ge-1/3/0.641': {'IP': '10.11.31.227', 'Cost': '10', 'Adj': '1', 'Hello': '10', 'Dead': '40'}, 'lo0.0': {'IP': '10.11.225.224', 'Cost': '0', 'Adj': '0', 'Hello': '10', 'Dead': '40'}, 'ge-0/0/0.643': {'IP': '10.11.31.90', 'Cost': '10', 'Adj': '1', 'Hello': '10', 'Dead': '40'}, 'ge-0/2/0.644': {'IP': '10.11.31.94', 'Cost': '10', 'Adj': '1', 'Hello': '10', 'Dead': '40'}}
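
Since both modules expose a function with the same name, the main app could pick the parser based on the device OS. A possible sketch of that idea; the two stand-in functions take the place of ospfios.ospf_information and ospfjunos.ospf_information so the example runs on its own, and PARSERS and parse_ospf are hypothetical names:

```python
def parse_ios(text):
    """Stand-in for ospfios.ospf_information."""
    return {"GigabitEthernet0/1": {"Cost": "50"}}

def parse_junos(text):
    """Stand-in for ospfjunos.ospf_information."""
    return {"ge-0/0/0.643": {"Cost": "10"}}

# OS name -> parser function; one new entry per OS module
PARSERS = {"ios": parse_ios, "junos": parse_junos}

def parse_ospf(os_name, text):
    """Dispatch raw router output to the parser for that OS."""
    try:
        parser = PARSERS[os_name.lower()]
    except KeyError:
        raise ValueError(f"no OSPF parser for {os_name!r}") from None
    return parser(text)

print(parse_ospf("Junos", "raw output would go here"))
```

In the real app the table would point at the imported modules’ functions instead of the stand-ins.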

Splitting up your python app into multiple functions

I’ve been working on splitting my OSPF Checker into a few different functions. This has a few benefits which I’ve gone over before. I’ve split logging into the device and capturing information out into its own function. In future I’ll use this function to try SSH first and then fall back to telnet if that fails. I have a separate function that gets all my OSPF information.

Logging in:

def login(device):
    # Assumes telnetlib is imported and user/password are set at module level
    try:
        tn = telnetlib.Telnet(device,23,5)
        tn.read_until(b"Username: ")
        tn.write(user.encode('ascii') + b"\n")
        tn.read_until(b"Password: ")
        tn.write(password.encode('ascii') + b"\n")
        tn.write(b"\n")
        tn.write(b"terminal length 0\n")
        tn.write(b"show ver | include IOS\n")
        tn.write(b"show ip ospf interface\n")
        tn.write(b"exit\n")
        output=(tn.read_all().decode('ascii'))
        return output
    except Exception:
        # Any connection or login failure is reported to the caller as None
        return None
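
The SSH-then-telnet idea could be structured as a list of connection methods tried in order. This is only a sketch: login_with_fallback is a hypothetical helper, and the two fake connectors stand in for real SSH and telnet logins so it runs without a network:

```python
def login_with_fallback(device, methods):
    """Try each (name, connect) pair in order; return the first output."""
    for name, connect in methods:
        try:
            output = connect(device)
        except OSError:
            output = None  # refused, timed out, unreachable and so on
        if output is not None:
            return name, output
    return None, None

# Stand-in connectors, not real logins
def fake_ssh(device):
    raise ConnectionRefusedError("port 22 closed")

def fake_telnet(device):
    return "show ip ospf interface output for " + device

method, output = login_with_fallback(
    "192.0.2.1", [("ssh", fake_ssh), ("telnet", fake_telnet)])
print(method)
```

Returning the method name alongside the output makes it easy to log which devices still only accept telnet.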

OSPF Information:

def ospf_information(i):
    ospf_int = re.search(r'(GigabitEthernet|FastEthernet|Serial|Tunnel|Loopback|Dialer|BVI|Vlan|Virtual-Access)[0-9]{1,4}/?[0-9]{0,4}.?[0-9]{0,4}/?[0-9]{0,3}/?[0-9]{0,3}/?[0-9]{0,3}:?[0-9]{0,3}',i)
    if not ospf_int:
        return None
    ospf_int = ospf_int.group()
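    ip = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2})',i)
    if not ip:
        # No address found: fall back to the unnumbered-interface line
        ip = re.search(r'Interface is unnumbered. Using address of [a-zA-Z]{1,10}[0-9]{1,5}/?[0-9]{0,5}.?[0-9]{0,5}',i)
    # Only call group() once we know whether anything matched
    ip = ip.group() if ip else None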
    a = re.search(r'Area ([\s]{0,3}[0-9]{1,5})',i)
    area = a.group(1)
    n = re.search(r'Network Type ([\s]{0,3}[a-zA-Z_]{0,20})',i)
    net = n.group(1)
    c = re.search(r'Cost: ([0-9]{1,5})',i)
    cost = c.group(1)
    p = re.search(r'Passive',i)
    if p:
        neighbour = "Passive"
        adjacency = None
    else:
        ne = re.search(r'(?:Neighbor Count is )([0-9]{1,3})',i)
        if not ne:
            neighbour = None
        else:
            neighbour = ne.group(1)
        ad = re.search(r'(?:Adjacent neighbor count is )([0-9]{1,3})',i)
        if not ad:
            adjacency = None
        else:
            adjacency = ad.group(1)
    h = re.search(r'Hello ([0-9]{1,3})',i)
    if not h:
        hello = None
    else:
        hello = h.group(1)
    d = re.search(r'Dead ([0-9]{1,3})',i)
    if not d:
        dead = None
    else:
        dead = d.group(1)
    return (ospf_int,ip,area,net,cost,neighbour,adjacency,hello,dead)

The great thing about the above code is that if I want to get more OSPF information, I simply add it to the ospf_information function. If I wrote another app to get other information, I can use the rest and replace ospf_information with something else.
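
One way to make adding a field even cheaper is to drive the searches from a table of patterns. The following is a sketch rather than the checker’s actual code: FIELDS, first_group and extract are names I’ve made up, the patterns are copied from the function above, and the sample text is a hypothetical fragment in the style of show ip ospf interface:

```python
import re

# Field name -> regex with one capture group
FIELDS = {
    "Area": r'Area ([\s]{0,3}[0-9]{1,5})',
    "Cost": r'Cost: ([0-9]{1,5})',
    "Hello": r'Hello ([0-9]{1,3})',
    "Dead": r'Dead ([0-9]{1,3})',
}

def first_group(pattern, text):
    """Return the first capture group, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

def extract(text, fields=FIELDS):
    """Run every pattern against the text and collect the results."""
    return {name: first_group(pattern, text) for name, pattern in fields.items()}

# Hypothetical fragment of router output
sample = ("Internet Address 10.0.12.1/24, Area 0\n"
          "Network Type POINT_TO_POINT, Cost: 50\n"
          "Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5\n")
print(extract(sample))
```

A new field then becomes a single extra line in FIELDS, and a missing match yields None instead of an exception.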

I want to do a bit more splitting, but I’m liking the way it works thus far!

When a vlan is not a vlan

What is a vlan? What is a vlan-id? Are they the same thing?

Generally yes, but in the ISP world a vlan-id can also be a circuit identifier. While your view of a vlan might be a single broadcast domain, you’ll soon see that multiple vlan IDs can share the same single broadcast domain, or the same vlan-id could be in a completely different broadcast domain.

The Problem

I’ve written about this before. Carriers, at least in the UK, are offering more and more aggregated links to Service Providers. Each circuit to customer sites is aggregated over a single high-bandwidth link to your PE router. This cuts down on ports, cables, and man hours to plug them in.

Old way:

[Figure: carrier circuits, old way]

New way:

[Figure: carrier circuits, new way]

How are the p2p circuits aggregated over the core high-bandwidth link? Each p2p link is separated by a vlan tag on the PoP side. So we could say that any packet coming out of the core PE with vlan 2000 goes to site 1, while packets with vlan 3000 go to site 2. What happens if site 1 and site 2 are going to the same customer? What if you are providing a VPLS service to them? It’s essential to note that the vlan tag imposed by the carrier is used simply to determine what packet goes to which circuit. As we control the MPLS core, it’s ultimately up to us to decide which packet belongs in which broadcast domain, and that is regardless of the vlan id used by the carrier.

Relevant Initial Core Config

I’ll use the following topology:
[Figure: core topology]

R1, R2, and R3 are the core of the network. R1 is a Brocade NetIron running MPLS. R2 is a Cisco me3600x running MPLS. R3 is an me3600x running bridge-groups with no MPLS.

CE1, CE2, and CE3 are all customer routers.

R1 – Brocade XMR

interface ethernet 2/4
 port-name TO-R2
 enable
 route-only
 ip ospf area 0
 ip ospf network point-to-point
 ip address 10.10.10.10/24
!
router mpls
 policy
  traffic-eng ospf area 0

  mpls-interface e2/4

 lsp R1-R2
  to 192.168.224.4
  adaptive
  enable

R2 – Cisco me3600x running MPLS

mpls traffic-eng tunnels
!
router ospf 1
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0
!
interface GigabitEthernet0/1
 description TO-R1
 no switchport
 ip address 10.10.10.11 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
 mpls traffic-eng tunnels
!
interface Tunnel0
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 192.168.224.61
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng path-option 5 dynamic
 tunnel mpls traffic-eng record-route

There is no IP or MPLS configuration on R3 as it’s not running MPLS. I’ll show how the bridge-group is configured when I get to that part.

CPE Config

I’ll be using vlan 3000 to get to CE1, vlan 2000 to get to CE2, and double-tag vlan 3500,2500 to get to CE3. Each CE has its WAN interface in the same subnet as the others, and I’ll run OSPF on their WAN links and loopbacks.

CE1

This is a Juniper EX3200:

root@CE1> show configuration interfaces ge-0/0/0
vlan-tagging;
unit 3000 {
    vlan-id 3000;
    family inet {
        address 1.1.1.1/24;
    }
}

root@CE1> show configuration interfaces lo0.0
family inet {
    address 10.10.10.10/32;
}

root@CE1> show configuration protocols ospf
area 0.0.0.0 {
    interface ge-0/0/0.3000;
    interface lo0.0;
}

CE2

This is a Cisco 3750G:

interface Loopback0
 ip address 20.20.20.20 255.255.255.255
 ip ospf 1 area 0
!
interface Vlan2000
 ip address 1.1.1.2 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet1/0/1
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2000
 switchport mode trunk

CE3

This is a Cisco 1841:

interface Loopback0
 ip address 30.30.30.30 255.255.255.255
 ip ospf 1 area 0
!
interface FastEthernet0/0.32
 encapsulation dot1Q 3500 second-dot1q 2500
 ip address 1.1.1.3 255.255.255.0
 ip ospf 1 area 0

VPLS Config

As you can see, each CPE will be using a different vlan tag. One site is even sending a double-tagged frame. They all need to be in the same broadcast domain. No problem as we are simply going to use the vlan tag to determine the service.

R2

Gi0/2 will create an LDP-signalled VPLS VC to R1 (aka manual set up). Interface gi0/2 vlan 2000 will be part of VPLS id 501:

ethernet evc TEST-EVC
 uni count 20
!
l2vpn vfi context TEST-VPLS
 vpn id 501
 member 192.168.224.61 encapsulation mpls
!
interface GigabitEthernet0/2
 switchport trunk allowed vlan none
 switchport mode trunk
 mtu 9800
 service instance 1 ethernet TEST-EVC
  encapsulation dot1q 2000
  rewrite ingress tag pop 1 symmetric
  bridge-domain 501
 !
interface Vlan501
 no ip address
 member vfi TEST-VPLS

What’s important to note here is that the me3600x still uses bridge-groups for VPLS, but it’s not exactly the same as just using bridge-groups by itself. You’ll see this soon enough when we configure R3.

R1

R1 will create a VPLS to R2. Vlan 3000 on interface 2/5 will be part of the same VPLS:

router mpls
 vpls TEST-VPLS 501
  vpls-peer 192.168.224.4
  vpls-mtu 1500
  vlan 3000
   tagged ethe 2/5

At this point R1 and R2 have the VPLS set up between them. Each CE is using different vlans on their WAN, but they are in fact on the same broadcast domain:

CE2#sh ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         128   FULL/DR         00:00:39    1.1.1.1         Vlan2000

CE2#ping 1.1.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/17 ms

CE2#ping 10.10.10.10 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.10, timeout is 2 seconds:
Packet sent with a source address of 20.20.20.20
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/4/9 ms

The vlan-id used on the CPE was merely used to push the frame into the correct VPLS. The VPLS itself is the broadcast domain; the vlan tag is irrelevant as it’s stripped inbound on the PE router. You CAN, however, ensure that the PE router does NOT strip the vlan tag. This has interesting use cases when you purposely want to separate on vlan id within the VPLS. I wrote more on this over here so please give it a read. Both the Brocade and Cisco default to VC mode 5 when setting up a VPLS.
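
As a mental model, the PE is doing a table lookup on (port, vlan tag) to find the service. A rough Python sketch of that lookup follows. The first two rows mirror the R2 and R1 configs above, the third row is a made-up customer to show the same vlan-id landing in a different broadcast domain, and the names are illustrative only:

```python
# (PE, port, vlan tags on the wire) -> VPLS id
ATTACHMENT_CIRCUITS = {
    ("R2", "Gi0/2", (2000,)): 501,
    ("R1", "eth2/5", (3000,)): 501,
    ("R1", "eth2/6", (2000,)): 777,  # hypothetical second customer
}

def classify(pe, port, tags):
    """Return (vpls_id, remaining_tags) for a frame, or None if unmatched."""
    vpls = ATTACHMENT_CIRCUITS.get((pe, port, tuple(tags)))
    if vpls is None:
        return None
    return vpls, ()  # the matched tag is stripped on ingress

print(classify("R2", "Gi0/2", [2000]))
print(classify("R1", "eth2/5", [3000]))
print(classify("R1", "eth2/6", [2000]))
```

Two different vlan-ids end up in VPLS 501, while vlan 2000 on another port maps somewhere else entirely, which is the circuit-identifier behaviour described above.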

Bridge Group Config

I’m going to set up R3 so that it only uses bridge-groups. No routing or MPLS involved. Bridge-groups work very similarly to VPLS, though on a single box. Traffic can be pushed from a bridge-group into a VPLS if needed. The bridge-group determines the broadcast domain. I can have multiple different vlans in the same bridge group.

For R3, gi0/2 is the interface pointing towards the core, while gi0/1 is pointing towards the customer. I’ll use different vlan ids on each, but they will be in the same bridge-group:

ethernet evc TEST
!
vlan 501
 name TEST-CE
!
interface GigabitEthernet0/1
 switchport trunk allowed vlan none
 switchport mode trunk
 service instance 1 ethernet TEST
  encapsulation dot1q 501
  rewrite ingress tag pop 1 symmetric
  bridge-domain 501
 !
interface GigabitEthernet0/2
 switchport trunk allowed vlan none
 switchport mode trunk
 service instance 1 ethernet TEST
  encapsulation dot1q 3500 second-dot1q 2500
  rewrite ingress tag pop 2 symmetric
  bridge-domain 501

I’m not going into detail, but I will cover the basics. When gi0/2 receives a double-tagged frame that matches 3500,2500 inbound, the me3600x will pop both tags off and the resulting frame will be part of bridge-group 501. Symmetric means that when a frame leaves gi0/2, it will re-add vlans 3500,2500 on top of the frame. As gi0/1 is also in bridge-group 501, the customer frame will be forwarded out that port, and it will have a single vlan tag of 501 pushed on top.
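
The pop/push behaviour can be modelled as operations on a tag stack. This is a simplified sketch of the R3 config above, not how the me3600x implements it, and the function names are my own:

```python
# Per-port service instances on R3, taken from the config above:
# port -> (tags matched on ingress, bridge-domain)
SERVICE_INSTANCES = {
    "Gi0/1": ((501,), 501),
    "Gi0/2": ((3500, 2500), 501),
}

def ingress(port, tags):
    """Pop the matched tags and return (bridge_domain, remaining_tags)."""
    match, bridge_domain = SERVICE_INSTANCES[port]
    if tuple(tags[:len(match)]) != match:
        return None  # frame does not belong to this service instance
    return bridge_domain, tuple(tags[len(match):])

def egress(port, bridge_domain, tags):
    """Symmetric rewrite: push the port's tags back on the way out."""
    match, port_bd = SERVICE_INSTANCES[port]
    if port_bd != bridge_domain:
        return None
    return match + tuple(tags)

# Core side towards customer side
bd, inner = ingress("Gi0/2", (3500, 2500))
print(bd, egress("Gi0/1", bd, inner))

# And back again
bd, inner = ingress("Gi0/1", (501,))
print(bd, egress("Gi0/2", bd, inner))
```

The frame carries no tag at all while it sits in the bridge-domain. Each port only decides what to strip on the way in and what to add on the way out.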

At this point gi0/1 is connected to R1 eth2/3. For this customer I would be expecting a single tag of 501 coming inbound, and so I’ll place that vlan id into the VPLS from above:

 vpls TEST-VPLS 501
  vlan 501
   tagged ethe 2/3

Now all three CE routers should be fully adjacent:

CE3#sh ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         128   FULL/DR         00:00:35    1.1.1.1         FastEthernet0/0.32
1.1.1.2           1   FULL/DROTHER    00:00:37    1.1.1.2         FastEthernet0/0.32

CE3#ping 10.10.10.10 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.10, timeout is 2 seconds:
Packet sent with a source address of 30.30.30.30
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
CE3#ping 20.20.20.20 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 20.20.20.20, timeout is 2 seconds:
Packet sent with a source address of 30.30.30.30
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/4/12 ms

Conclusions:

vlan tags have multiple uses. In most networks they inform the switches which vlan, and therefore broadcast domain, a frame is part of. They can also be circuit identifiers showing which VPLS/Circuit the frame belongs to. They can also be both at the same time, depending on the VPLS VC type you’re using.

The above network is extremely simplified. Care must be taken when forwarding certain layer2 control frames. Most are sent untagged out of tagged interfaces. Cisco’s PVST+ and Rapid PVST+, however, tag each vlan’s BPDU with that vlan’s id. If you’re using vlan 2000 on one side and vlan 3000 on the other, and the BPDU gets through, one side will shut down their WAN link due to receiving a BPDU with a vlan tag that doesn’t match the BPDU data inside the frame.
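
The mismatch is easy to see in a toy model. This sketch only compares the outer tag with the vlan id carried inside the BPDU. It assumes the provider swaps the outer tag and leaves the payload alone, and it does not model what the receiving switch then does with the port:

```python
def bpdu_consistent(frame_tag, bpdu_vlan):
    """A per-vlan BPDU is only consistent when tag and payload agree."""
    return frame_tag == bpdu_vlan

def carry_across(frame_tag, translation):
    """Swap the outer tag between the two sides; the payload is untouched."""
    return translation.get(frame_tag, frame_tag)

translation = {2000: 3000}  # vlan 2000 on one side, vlan 3000 on the other

sent_tag, bpdu_vlan = 2000, 2000
received_tag = carry_across(sent_tag, translation)

print(bpdu_consistent(sent_tag, bpdu_vlan))      # as sent by the first CE
print(bpdu_consistent(received_tag, bpdu_vlan))  # as seen by the far-end CE
```

The BPDU is consistent when it leaves the first CE and inconsistent when it arrives at the other, even though nothing in the customer network changed.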

Various networking ramblings from Dual CCIE #38070 (R&S, SP) and JNCIE-SP #2227

© 2009-2014 Darren O'Connor All Rights Reserved