Tag Archives: junos

Splitting a module from a python app

My OSPF checker is getting a bit big. The majority of the code is the function that parses the OSPF output and returns the required values.

I’d like to continue to refine what it can pull out. I’d also like to check output from non-IOS devices running Junos and IOS-XR.

A function can very easily be moved into a new file and then called as a module. The great thing about this is that others can use the same module in different applications of their own. I can also create a separate module per OS that I’m interested in. Each can be edited separately.

The IOS OSPF checker has now been split into its own module like so:

import re
import sys

def ospf_information(i):
    int_list = {}
    ospf = re.split(r'[\n](?=GigabitEthernet|FastEthernet|Serial|Tunnel|Loopback|Dialer|BVI|Vlan|Virtual-Access)',i)
    print(ospf)
    for o in ospf:
        properties = {}
        interface = re.search(r'(GigabitEthernet|FastEthernet|Serial|Tunnel|Loopback|Dialer|BVI|Vlan|Virtual-Access)[0-9]{1,4}/?[0-9]{0,4}.?[0-9]{0,4}/?[0-9]{0,3}/?[0-9]{0,3}/?[0-9]{0,3}:?[0-9]{0,3}',o)
        if not interface:
            continue
        interface = interface.group()
        ip = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2})',o)
        if not ip:
            ip = re.search(r'Interface is unnumbered. Using address of [a-zA-Z]{1,10}[0-9]{1,5}/?[0-9]{0,5}.?[0-9]{0,5}',o)
        # Neither pattern may match, so guard against ip being None
        properties['IP'] = ip.group() if ip else None
        a = re.search(r'Area ([\s]{0,3}[0-9]{1,5})',o)
        properties['Area'] = a.group(1)
        n = re.search(r'Network Type ([\s]{0,3}[a-zA-Z_]{0,20})',o)
        properties['Net'] = n.group(1)
        c = re.search(r'Cost: ([0-9]{1,5})',o)
        properties['Cost'] = c.group(1)
        s = re.search(r'line protocol is[\s]([a-zA-Z]{1,4})',o)
        properties['Status'] = s.group(1)
        p = re.search(r'Passive',o)
        if p:
            properties['Neigh'] = "Passive Interface"
            properties['Adj'] = None
        else:
            ne = re.search(r'(?:Neighbor Count is )([0-9]{1,3})',o)
            if not ne:
                properties['Neigh'] = None
            else:
                properties['Neigh'] = ne.group(1)
            ad = re.search(r'(?:Adjacent neighbor count is )([0-9]{1,3})',o)
            if not ad:
                properties['Adj'] = None
            else:
                properties['Adj'] = ad.group(1)
        h = re.search(r'Hello ([0-9]{1,3})',o)
        if not h:
            properties['Hello'] = None
        else:
            properties['Hello'] = h.group(1)
        d = re.search(r'Dead ([0-9]{1,3})',o)
        if not d:
            properties['Dead'] = None
        else:
            properties['Dead'] = d.group(1)
        int_list[interface]=properties
    return int_list

if __name__ == "__main__":
    # Read the raw router output from the file given on the command line
    with open(sys.argv[1]) as f:
        info = f.read()
    ospf = ospf_information(info)
    print("This device contains "+str(len(ospf))+" ospf enabled interfaces")
    print(ospf)

A couple of things to note here. The module now returns a dictionary keyed by interface name, so any app using it can look up whatever values it needs instead of iterating through a list. The last section of code lets me run the module directly against a file of raw router output to pull information out. That part does not run when the file is imported as a module.

In my main application I now simply import the module and change how I call it slightly:

import ospfios
ospf_int = ospfios.ospf_information(output)
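
As a rough sketch of why the dictionary is handy, a caller can pick out just the interfaces it cares about. The sample data and the `interfaces_without_adjacency` helper below are made up for illustration; only the key names ('Neigh', 'Adj' and so on) come from the module above.

```python
# Hypothetical sample shaped like the dictionary ospf_information() returns.
sample = {
    'GigabitEthernet0/1': {'IP': '10.0.12.1/24', 'Area': '0', 'Net': 'BROADCAST',
                           'Cost': '1', 'Status': 'up', 'Neigh': '1', 'Adj': '1',
                           'Hello': '10', 'Dead': '40'},
    'GigabitEthernet0/2': {'IP': '10.0.13.1/24', 'Area': '0', 'Net': 'BROADCAST',
                           'Cost': '1', 'Status': 'up', 'Neigh': '1', 'Adj': '0',
                           'Hello': '10', 'Dead': '40'},
    'Vlan10': {'IP': '10.0.10.1/24', 'Area': '0', 'Net': 'BROADCAST',
               'Cost': '1', 'Status': 'up', 'Neigh': 'Passive Interface',
               'Adj': None, 'Hello': None, 'Dead': None},
}

def interfaces_without_adjacency(ospf_int):
    """Return non-passive interfaces whose adjacency count is 0 or missing."""
    problems = []
    for name, props in ospf_int.items():
        if props['Neigh'] == "Passive Interface":
            continue  # passive interfaces are not expected to form adjacencies
        if props['Adj'] in (None, '0'):
            problems.append(name)
    return sorted(problems)

print(interfaces_without_adjacency(sample))
```

The values are strings because the module returns the regex captures as-is, so the comparison is against '0' rather than the integer 0.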

I’ve started a preliminary Junos OSPF module which will return similar values:

import re
import sys

def ospf_information(i):
    int_list = {}
    ospf = re.split(r'[\n](?=ge|fe|lo|ae|et|fxp)',i)
    for o in ospf:
        properties = {}
        interface =  re.search(r'(ge|fe|lo|ae|et|fxp)([0-9]?)([-]?){0,1}[0-9]{1,5}/?[0-9]{0,5}/?[0-9]{0,5}/?[0-9]?[.][0-9]{1,5}',o)
        if not interface:
            continue
        interface = interface.group()
        ip = re.search(r'Address: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})',o)
        properties['IP'] = ip.group(1)
        c = re.search(r'Cost: ([0-9]{1,5})',o)
        properties['Cost'] = c.group(1)
        ad = re.search(r'(?:Adj count: )([0-9]{1,3})',o)
        properties['Adj'] = ad.group(1)
        h = re.search(r'Hello: ([0-9]{1,3})',o)
        properties['Hello'] = h.group(1)
        d = re.search(r'Dead: ([0-9]{1,3})',o)
        properties['Dead'] = d.group(1)
        int_list[interface]=properties
    return int_list

if __name__ == "__main__":
    # Read the raw router output from the file given on the command line
    with open(sys.argv[1]) as f:
        info = f.read()
    ospf = ospf_information(info)
    print("This device contains "+str(len(ospf))+" ospf enabled interfaces")
    print(ospf)

A quick run directly on a small Junos box:

darreno@Jumpbox:~/git/ospf_checker$ python3 ospfjunos.py junos.txt
This device contains 4 ospf enabled interfaces
{'ge-1/3/0.641': {'IP': '10.11.31.227', 'Cost': '10', 'Adj': '1', 'Hello': '10', 'Dead': '40'}, 'lo0.0': {'IP': '10.11.225.224', 'Cost': '0', 'Adj': '0', 'Hello': '10', 'Dead': '40'}, 'ge-0/0/0.643': {'IP': '10.11.31.90', 'Cost': '10', 'Adj': '1', 'Hello': '10', 'Dead': '40'}, 'ge-0/2/0.644': {'IP': '10.11.31.94', 'Cost': '10', 'Adj': '1', 'Hello': '10', 'Dead': '40'}}
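
The preliminary Junos module calls .group(1) on every search, so a block that is missing one of the fields would raise an AttributeError. One possible way around that, sketched here with a hypothetical `first_group` helper and a made-up text fragment (only the field labels are taken from the regexes above), is to return None when a pattern does not match:

```python
import re

def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

# Made-up fragment for illustration; note there is no "Dead:" field in it.
block = "ge-0/0/0.643\n  Address: 10.11.31.90, Cost: 10\n  Adj count: 1\n  Hello: 10"

properties = {
    'IP': first_group(r'Address: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', block),
    'Cost': first_group(r'Cost: ([0-9]{1,5})', block),
    'Adj': first_group(r'Adj count: ([0-9]{1,3})', block),
    'Hello': first_group(r'Hello: ([0-9]{1,3})', block),
    'Dead': first_group(r'Dead: ([0-9]{1,3})', block),
}
print(properties)
```

The missing field simply comes back as None instead of stopping the whole run, which matches how the IOS module already treats optional values such as 'Hello' and 'Dead'.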

My book – MPLS for Enterprise Engineers is now available from multiple channels

I put together a beginner’s MPLS book for Juniper. When interviewing candidates, I’ve noticed that they often have a good knowledge of routing protocols but are weaker on MPLS. This is to be expected unless they’ve worked at an ISP. The book is targeted towards those engineers.

J-net

https://www.juniper.net/us/en/community/junos/training-certification/day-one/networking-technologies-series/mpls-enterprise-engineers/

Amazon

http://www.amazon.com/dp/B00IU1KCJ0
http://www.amazon.co.uk/dp/B00IU1KCJ0

iTunes

https://itunes.apple.com/us/book/day-one-mpls-for-enterprise/id836201741?mt=11

Vervante (Print version)

http://store.vervante.com/c/v/V4081705490.html

Embedded traffic capture on Junos and IOS-XE

IOS-XE and Junos both give you the ability to sniff packets directly on the device itself. This is pretty handy for troubleshooting without having to send an engineer to site with a laptop, potentially with downtime.

Both are very flexible, so I won’t go over every single option possible on both. Rather I’ll just go over a basic capture and view on both platforms. For this post I’ll use a simple topology, a 3850 and an SRX with an LACP bundle between them, to show how to get around a limitation or two:
[Figure: 3850 and SRX topology]

I’ve enabled OSPF over the aggregated interface.

IOS-XE Setup

I want to view the OSPF hello packets over the port-channel. IOS-XE will not allow you to specify a port-channel interface, but you can specify a range. I’ll simply use a range of interfaces currently in the port-channel. Note that this is done in privileged exec mode and not in configuration mode:

C3850#monitor capture NEW_CAP interface range gi1/0/1 , gi2/0/1 both
C3850#monitor capture NEW_CAP match any
C3850#monitor capture NEW_CAP file location flash:CAP1.pcap

You can filter the capture through an ACL to match specific traffic, and there are plenty of other options to change if needed. On a 3850 stack, the output needs to go to the active switch’s flash or USB.

Without configuring any other options, take a look at the defaults used:

C3850#show monitor capture NEW_CAP

Status Information for Capture NEW_CAP
  Target Type:
   Interface: GigabitEthernet1/0/1, Direction: both
   Interface: GigabitEthernet2/0/1, Direction: both
   Status : Inactive
  Filter Details:
    Capture all packets
  Buffer Details:
   Buffer Type: LINEAR (default)
  File Details:
   Associated file name: flash:CAP1.pcap
  Limit Details:
   Number of Packets to capture: 0 (no limit)
   Packet Capture duration: 0 (no limit)
   Packet Size to capture: 0 (no limit)
   Packets per second: 0 (no limit)
   Packet sampling rate: 0 (no sampling)

There are no limits imposed anywhere. If you leave a capture running to flash, it could very easily fill it. I’ll impose a limit of 60 seconds on this capture to make sure that doesn’t happen:

C3850#monitor capture NEW_CAP limit duration 60
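
If you set up captures often, the four commands above can be generated so the duration limit is never forgotten. This is only a sketch: `capture_commands` is a hypothetical helper, it emits exactly the command forms shown in this post, and it does not talk to a device.

```python
def capture_commands(name, interfaces, pcap_path, duration=60):
    """Build the exec-mode commands used above for a time-limited capture."""
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    # Same "a , b" range form as typed on the 3850 above
    interface_range = " , ".join(interfaces)
    return [
        f"monitor capture {name} interface range {interface_range} both",
        f"monitor capture {name} match any",
        f"monitor capture {name} file location {pcap_path}",
        f"monitor capture {name} limit duration {duration}",
    ]

for line in capture_commands("NEW_CAP", ["gi1/0/1", "gi2/0/1"], "flash:CAP1.pcap"):
    print(line)
```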

IOS-XE Capture

Let’s start the capture:

C3850#monitor capture NEW_CAP start
*Feb 26 08:23:48.854 GMT: %BUFCAP-6-ENABLE: Capture Point NEW_CAP enabled.

I can either run this for 60 seconds, or stop it manually:

C3850#monitor capture NEW_CAP stop
*Feb 26 08:24:19.584 GMT: %BUFCAP-6-DISABLE: Capture Point NEW_CAP disabled.

Very simple.

IOS-XE – View Captures

IOS-XE has a terse output:

C3850#show monitor capture file flash:CAP1.pcap
  1   0.000000     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
  2   7.868018     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
  3  15.429030     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
  4  23.035002     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
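
The terse output is regular enough to parse if you want to measure the gaps between packets. A minimal sketch, assuming each line has the number, time offset, source, destination and description columns seen above (`parse_terse` and `gaps` are names invented for this example):

```python
import re

TERSE_LINE = re.compile(r'^\s*(\d+)\s+(\d+\.\d+)\s+(\S+)\s+->\s+(\S+)\s+(.*\S)\s*$')

def parse_terse(output):
    """Turn each terse capture line into a dictionary."""
    packets = []
    for line in output.splitlines():
        m = TERSE_LINE.match(line)
        if m:
            number, offset, src, dst, info = m.groups()
            packets.append({'number': int(number), 'time': float(offset),
                            'src': src, 'dst': dst, 'info': info})
    return packets

def gaps(packets):
    """Seconds between consecutive packets."""
    times = [p['time'] for p in packets]
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]

terse = """\
  1   0.000000     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
  2   7.868018     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
  3  15.429030     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
  4  23.035002     10.0.0.2 -> 224.0.0.5    OSPF Hello Packet
"""
packets = parse_terse(terse)
print(len(packets), gaps(packets))
```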

You can also see the entire detail:

C3850#show monitor capture file flash:CAP1.pcap  detailed
Frame 1: 94 bytes on wire (752 bits), 94 bytes captured (752 bits)
    Arrival Time: Feb 26, 2014 08:23:53.939938000 UTC
    Epoch Time: 1393403033.939938000 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 94 bytes (752 bits)
    Capture Length: 94 bytes (752 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ip:ospf]
Ethernet II, Src: 3c:61:04:d9:73:80 (3c:61:04:d9:73:80), Dst: 01:00:5e:00:00:05 (01:00:5e:00:00:05)
    Destination: 01:00:5e:00:00:05 (01:00:5e:00:00:05)
        Address: 01:00:5e:00:00:05 (01:00:5e:00:00:05)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: 3c:61:04:d9:73:80 (3c:61:04:d9:73:80)
        Address: 3c:61:04:d9:73:80 (3c:61:04:d9:73:80)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 10.0.0.2 (10.0.0.2), Dst: 224.0.0.5 (224.0.0.5)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00)
        1100 00.. = Differentiated Services Codepoint: Class Selector 6 (0x30)
        .... ..0. = ECN-Capable Transport (ECT): 0

etc....
etc....

It’s also possible to capture directly to the screen:

C3850#monitor capture NEW_CAP start display detailed
A file by the same capture file name already exists, overwrite?[confirm]
Frame 1: 94 bytes on wire (752 bits), 94 bytes captured (752 bits)
    Arrival Time: Feb 26, 2014 08:31:20.753958000 UTC
    Epoch Time: 1393403480.753958000 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 94 bytes (752 bits)
    Capture Length: 94 bytes (752 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ip:ospf]
Ethernet II, Src: 3c:61:04:d9:73:80 (3c:61:04:d9:73:80), Dst: 01:00:5e:00:00:05

IOS-XE Notes

  • The capture configuration is not saved in global config, but the config is still there. In order to remove your monitor session you need to explicitly delete it from privileged exec mode:
C3850#no monitor capture NEW_CAP
  • Embedded Wireshark can capture data-plane traffic as well as control-plane traffic

Junos Capture

Start up a shell:

darreno@SRX110> start shell user root
Password:

Junos has tcpdump built in. For this part I’ll write a file to the /tmp folder, which we can then view later:

root@SRX110% tcpdump -i ae0.0 -w /tmp/CAP2.pcap
Address resolution is ON. Use  to avoid any reverse lookup delay.
Address resolution timeout is 4s.
Listening on ae0.0, capture size 96 bytes

Junos – view captures

We can use tcpdump to view the files we created:

root@SRX110% tcpdump -qn -r /tmp/CAP2.pcap
23:51:58.856149 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:51:58.991515  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:08.531670  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:08.744550 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:17.460023 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:17.640020  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:25.978974 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:26.888403  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:33.517479 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:36.858979  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:42.147688 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:46.407409  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:49.663809 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:55.448971  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
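
With a longer capture it can be easier to count packets per direction and source than to read every line. A small sketch, assuming the line format shown above (`count_by_direction` is a made-up name, and the sample is the first four lines of the output):

```python
import re
from collections import Counter

LINE = re.compile(
    r'^(\d{2}:\d{2}:\d{2}\.\d+)\s+(In|Out)\s+IP\s+'
    r'(?:truncated-ip - \d+ bytes missing!\s+)?(\S+) > (\S+):')

def count_by_direction(output):
    """Count packets keyed by (direction, source address)."""
    counts = Counter()
    for line in output.splitlines():
        m = LINE.match(line)
        if m:
            _, direction, src, _ = m.groups()
            counts[(direction, src)] += 1
    return counts

sample = """\
23:51:58.856149 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
23:51:58.991515  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:08.531670  In IP 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60
23:52:08.744550 Out IP truncated-ip - 20 bytes missing! 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60
"""
for key, total in sorted(count_by_direction(sample).items()):
    print(key, total)
```

If hellos were only showing up in one direction, that would stand out straight away in the counts.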

If you want to quickly see traffic going over an interface without saving a file, you can do it directly from the CLI:

darreno@SRX110> monitor traffic interface ae0.0 detail
Address resolution is ON. Use <no-resolve> to avoid any reverse lookup delay.
Address resolution timeout is 4s.
Listening on ae0.0, capture size 1514 bytes

Reverse lookup for 10.0.0.1 failed (check DNS reachability).
Other reverse lookup failures will not be reported.
Use <no-resolve> to avoid reverse lookups on IP addresses.

23:56:24.203372  In IP (tos 0xc0, ttl   1, id 65445, offset 0, flags [none], proto: OSPF (89), length: 80) 10.0.0.1 > 224.0.0.5: OSPFv2, Hello, length 60 [len 48]
        Router-ID 192.168.255.100, Backbone Area, Authentication Type: none (0)
        Options [External, LLS]
          Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.0, Priority 1
          Neighbor List:
            10.0.0.2
          LLS: checksum: 0xfff6, length: 3
            Extended Options (1), length: 4
              Options: 0x00000001 [LSDB resync]
23:56:24.527779 Out IP (tos 0xc0, ttl   1, id 62974, offset 0, flags [none], proto: OSPF (89), length: 80) 10.0.0.2 > 224.0.0.5: OSPFv2, Hello, length 60 [len 48]
        Router-ID 10.0.0.2, Backbone Area, Authentication Type: none (0)
        Options [External, LLS]
          Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.0, Priority 128
          Neighbor List:
            192.168.255.100
          LLS: checksum: 0xfff6, length: 3
            Extended Options (1), length: 4
              Options: 0x00000001 [LSDB resync]

Junos notes

  • Traffic captured on Junos is control-plane traffic only. It cannot capture data-plane traffic

Always check the forwarding table – IOS, Junos, Netiron

Most bigger routers these days use a distributed architecture. One of the bigger differences is the separation of the control and forwarding planes. When troubleshooting or verifying, it’s essential to view both. Too many engineers only look at the control plane output. While the two should match, they don’t always. Note that the forwarding table doesn’t have to be distributed to different hardware.

For the examples below I’ll simply be viewing a default route learned through OSPF. The router in question will always have two equal-cost paths out of the network, so you would expect to see two next hops.

IOS

First we check the routing table:

R1#sh ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "ospf 1", distance 110, metric 1, candidate default path
  Tag 1, type extern 2, forward metric 2
  Last update from 10.0.12.2 on GigabitEthernet2/0, 00:00:33 ago
  Routing Descriptor Blocks:
  * 10.0.13.3, from 10.0.24.4, 00:00:33 ago, via GigabitEthernet1/0
      Route metric is 1, traffic share count is 1
      Route tag 1
    10.0.12.2, from 10.0.24.4, 00:00:33 ago, via GigabitEthernet2/0
      Route metric is 1, traffic share count is 1
      Route tag 1

Two ways to get to 0.0.0.0 – What does the forwarding table show? For this I’ll choose an IP that would follow the default route:

R1#sh ip cef 4.2.2.1
0.0.0.0/0
  nexthop 10.0.12.2 GigabitEthernet2/0
  nexthop 10.0.13.3 GigabitEthernet1/0

Both control plane and data plane agree.
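
The same comparison can be done in code once both outputs have been collected. This is a sketch only: `rib_next_hops` and `cef_next_hops` are invented names, and the regexes assume the output formats shown above.

```python
import re

def rib_next_hops(show_ip_route):
    """(next hop, interface) pairs from the Routing Descriptor Blocks."""
    pattern = r'^[ \t]*\*?[ \t]*(\d+\.\d+\.\d+\.\d+), from \S+, .* via (\S+)[ \t]*$'
    return set(re.findall(pattern, show_ip_route, re.MULTILINE))

def cef_next_hops(show_ip_cef):
    """(next hop, interface) pairs from 'show ip cef' output."""
    pattern = r'^[ \t]*nexthop (\d+\.\d+\.\d+\.\d+) (\S+)[ \t]*$'
    return set(re.findall(pattern, show_ip_cef, re.MULTILINE))

route_output = """\
Routing entry for 0.0.0.0/0, supernet
  Last update from 10.0.12.2 on GigabitEthernet2/0, 00:00:33 ago
  Routing Descriptor Blocks:
  * 10.0.13.3, from 10.0.24.4, 00:00:33 ago, via GigabitEthernet1/0
      Route metric is 1, traffic share count is 1
    10.0.12.2, from 10.0.24.4, 00:00:33 ago, via GigabitEthernet2/0
      Route metric is 1, traffic share count is 1
"""
cef_output = """\
0.0.0.0/0
  nexthop 10.0.12.2 GigabitEthernet2/0
  nexthop 10.0.13.3 GigabitEthernet1/0
"""
rib = rib_next_hops(route_output)
fib = cef_next_hops(cef_output)
print("match" if rib == fib else f"only in RIB: {rib - fib}, only in FIB: {fib - rib}")
```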

Netiron

Routing table:

SSH@XMR16#sh ip route 0.0.0.0
Type Codes - B:BGP D:Connected I:ISIS O:OSPF R:RIP S:Static; Cost - Dist/Metric
BGP  Codes - i:iBGP e:eBGP
ISIS Codes - L1:Level-1 L2:Level-2
OSPF Codes - i:Inter Area 1:External Type 1 2:External Type 2 s:Sham Link
STATIC Codes - d:DHCPv6
        Destination        Gateway         Port          Cost          Type Uptime src-vrf
1       0.0.0.0/0          10.0.0.1        eth 15/1      110/110       O1   1h22m  -
        0.0.0.0/0          10.0.0.2        eth 16/1      110/110       O1   1h22m  -

In order to show the forwarding table you use show ip route x.x.x.x detail. Note that I’m executing this command on an XMR16, so I get the forwarding entry for every single module. I’m only going to show the output for the first module:

SSH@XMR16#sh ip route 4.2.2.1 detail
Type Codes - B:BGP D:Connected I:ISIS O:OSPF R:RIP S:Static; Cost - Dist/Metric
BGP  Codes - i:iBGP e:eBGP
ISIS Codes - L1:Level-1 L2:Level-2
OSPF Codes - i:Inter Area 1:External Type 1 2:External Type 2 s:Sham Link
STATIC Codes - d:DHCPv6
        Destination        Gateway         Port          Cost          Type Uptime src-vrf
1       0.0.0.0/0          10.0.0.1        eth 15/1      110/110       O1   1h24m  -
        0.0.0.0/0          10.0.0.1        eth 16/1      110/110       O1   1h24m  -
        Nexthop Entry ID:65540, Paths: 2, Ref_Count:707/712

D:Dynamic  P:Permanent  F:Forward  U:Us  C:Connected Network E: ESI VLAN
W:Wait ARP  I:ICMP Deny  K:Drop  R:Fragment  S:Snap Encap N:CamInvalid

Module S1:
      IP Address         Next Hop        MAC              Type  Port  Vlan  Pri
      0.0.0.0/0          10.0.0.1       0012.f293.a802   PF    16/1   1     0

      OutgoingIf  ArpIndex PPCR_ID   CamLevel   Parent  DontAge Index Is_trunk
      eth 16/1    5        1:1       31              0               0 0

      U_flags   Entry_flags  Age   Cam:Index               HW_Path_count
      0000e000               0     0x0005ffff (L3, right)  2

        CAM Entry Flag: 00000001H
        PPCR : 1:1 CIDX: 0x0005ffff (L3, right) (IP_NETWORK: 0x56000)

        pram_index_programmed: ppcr[0] 0x0000014c

The output is a little cryptic so I’ll highlight the important bits. First the paths show as two:

Nexthop Entry ID:65540, Paths: 2, Ref_Count:707/712

But the actual next-hop entry only shows a single path:

     0.0.0.0/0          10.0.0.1       0012.f293.a802   PF    16/1   1     0

This is a cosmetic error. The most important bit is here:

      U_flags   Entry_flags  Age   Cam:Index               HW_Path_count
      0000e000               0     0x0005ffff (L3, right)  2

The hardware path count is two, which is what we expect.

Junos

Finally Junos. First up we look at the route table:

lab@Vega_SRX6> show route 0.0.0.0

inet.0: 32 destinations, 32 routes (29 active, 3 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[OSPF/150] 00:00:12, metric 0, tag 0
                      to 172.30.0.17 via ge-0/0/4.126
                    > to 172.30.0.89 via ge-0/0/4.146

Two next hops. Does the forwarding table match?

lab@Vega_SRX6> show route forwarding-table destination 4.2.2.1
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     1 0:c:29:86:21:55    ucst   584    13 ge-0/0/4.146
default            perm     0                    rjct    36     5

Routing table: __master.anon__.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            perm     0                    rjct   534     1

Well no, it doesn’t. While the route table shows two routes, only one is being used by the forwarding table. Junos will not install multiple next-hops into the forwarding-table unless you tell it to:

lab@Vega_SRX6> show configuration policy-options policy-statement BALANCE
then {
    load-balance per-packet;
}
lab@Vega_SRX6> show configuration routing-options forwarding-table
export BALANCE;

Let’s check again:

lab@Vega_SRX6> show route forwarding-table destination 4.2.2.1
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     1                    ulst 262142     7
                              0:c:29:25:21:57    ucst   612    11 ge-0/0/4.126
                              0:c:29:86:21:55    ucst   584     9 ge-0/0/4.146
default            perm     0                    rjct    36     5

Routing table: __master.anon__.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            perm     0                    rjct   534     1

This time we have both in the forwarding table. Note that while the policy states load-balance per-packet, it’s actually doing per-flow load-sharing.
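
Spotting the missing next hop by eye is easy with one route, less so with many. A sketch of the same check in code, using the outputs above (the function names are invented, and the regexes assume exactly these output formats):

```python
import re

def rib_interfaces(show_route):
    """Outgoing interfaces listed under a prefix in 'show route' output."""
    return set(re.findall(r'^[ \t>]*to \S+ via (\S+)[ \t]*$', show_route, re.MULTILINE))

def fib_interfaces(show_forwarding_table, table="default.inet"):
    """Interfaces of unicast (ucst) next hops in one routing table section."""
    interfaces, current = set(), None
    for line in show_forwarding_table.splitlines():
        header = re.match(r'Routing table: (\S+)', line)
        if header:
            current = header.group(1)
            continue
        hop = re.search(r'\bucst\s+\d+\s+\d+\s+(\S+)\s*$', line)
        if hop and current == table:
            interfaces.add(hop.group(1))
    return interfaces

route = """\
0.0.0.0/0          *[OSPF/150] 00:00:12, metric 0, tag 0
                      to 172.30.0.17 via ge-0/0/4.126
                    > to 172.30.0.89 via ge-0/0/4.146
"""
fib_before = """\
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     1 0:c:29:86:21:55    ucst   584    13 ge-0/0/4.146
default            perm     0                    rjct    36     5
"""
fib_after = """\
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     1                    ulst 262142     7
                              0:c:29:25:21:57    ucst   612    11 ge-0/0/4.126
                              0:c:29:86:21:55    ucst   584     9 ge-0/0/4.146
default            perm     0                    rjct    36     5
"""
print("missing before policy:", rib_interfaces(route) - fib_interfaces(fib_before))
print("missing after policy:", rib_interfaces(route) - fib_interfaces(fib_after))
```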

Conclusion

I have seen routers disagree between what they think they are doing and what they are actually doing. You need to check both tables to see what each one says. This can help immensely when a router is dropping packets it’s supposed to be forwarding because the FIB has no entry. I might write a bit on this as I’ve seen it happen more than once.

EDIT – 04/11/13

I’ve since found another way to verify this on the Brocades. If you rconsole onto the line card itself you can see a bit more:

SSH@XMR16#rconsole 1
Remote connection to LP slot 1 established
Press CTRL-X or type 'exit' to disconnect it
LP-1>en
LP-1#sh ip network 0.0.0.0
D:Dynamic  P:Permanent  F:Forward  U:Us  C:Connected Network
W:Wait ARP  I:ICMP Deny  K:Drop  R:Fragment  S:Snap Encap N:CamInvalid
      IP Address         Next Hop        MAC              Type  Port  Vlan  Pri
      0.0.0.0/0          10.0.0.1*    0012.f293.ad02   PF    15/1*  1     0

      OutgoingIf  ArpIndex PPCR_ID   CamLevel   Parent  DontAge Index Is_trunk
      eth 15/1    4        1:1       31              0               0 0

      U_flags   Entry_flags  Age   Cam:Index               HW_Path_count
      0000e000  0x00000001   0     0x0005ffff (L3, right)  2

        CAM Entry Flag: 00000001H
        PPCR : 1:1 CIDX: 0x0005ffff (L3, right) (IP_NETWORK: 0x56000)

        pram_index_programmed: ppcr[0] 0x0000014c
use_index: 0
IP-nh-Pram 0: 0x2ebeec10, ref_count 1
n_paths = 2, type = ECMP_PHY_VE, is_default  = 1, vrf_index = 0
  path[0]: FORWARD, out_intf eth 15/1, nh 10.0.0.1, out_port 15/1, is_trunk 0
  path[1]: FORWARD, out_intf eth 16/1, nh 10.0.0.5, out_port 16/1, is_trunk 0
Pram info: alloc_count 2 use_count 2
  pram[0]: idx 0, pram_idx[0] 0x0000014c
  pram[1]: idx 1, pram_idx[0] 0x0000014d

The top half still shows a single port, but further down it shows this:

n_paths = 2, type = ECMP_PHY_VE, is_default  = 1, vrf_index = 0
  path[0]: FORWARD, out_intf eth 15/1, nh 10.0.0.1, out_port 15/1, is_trunk 0
  path[1]: FORWARD, out_intf eth 16/1, nh 10.0.0.5, out_port 16/1, is_trunk 0

n_paths is the number of paths, and the type shows that the router is doing ECMP. The path lines then show which outbound ports it’ll send traffic over.

On a route with only a single next hop, the same section looks like this:

n_paths = 1, type = NON_ECMP, is_default  = 0, vrf_index = 0
  path[0]: FORWARD, out_intf eth 1/20, nh 10.0.0.8, out_port 1/20, is_trunk 0
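
If you collect this line card output from several routers, the path details can be pulled out and sanity-checked. A minimal sketch, assuming the n_paths and path[] line formats shown above (`ecmp_summary` is a hypothetical helper):

```python
import re

def ecmp_summary(lp_output):
    """Pull n_paths and the per-path details out of line card output."""
    declared = re.search(r'n_paths = (\d+)', lp_output)
    paths = re.findall(
        r'path\[\d+\]: (\w+), out_intf eth (\S+), nh (\S+), out_port \S+,',
        lp_output)
    return {
        'n_paths': int(declared.group(1)) if declared else None,
        'paths': [{'action': a, 'port': p, 'next_hop': nh} for a, p, nh in paths],
    }

lp_output = """\
n_paths = 2, type = ECMP_PHY_VE, is_default  = 1, vrf_index = 0
  path[0]: FORWARD, out_intf eth 15/1, nh 10.0.0.1, out_port 15/1, is_trunk 0
  path[1]: FORWARD, out_intf eth 16/1, nh 10.0.0.5, out_port 16/1, is_trunk 0
"""
summary = ecmp_summary(lp_output)
# The declared count should agree with the number of path lines listed
print(summary['n_paths'] == len(summary['paths']), summary['paths'])
```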

OSPF Fast Re-Route and BFD on Junos

One of the few advantages that EIGRP had over OSPF and IS-IS was that it had feasible successors. That is, the router had already pre-calculated a route to a destination over a backup, non-looping path.

OSPF and IS-IS have had this for some time now on both IOS and Junos. It’s also supported on IOS-XR.

This post will mainly go over OSPF. The process is nearly identical for IS-IS.

To start I’ll be using the following topology:
[Figure: FRR topology]

R3 has two links to R4. Both go through a switch, which allows us to break a link without bringing the interface down. I’m configuring a cost of 100 on the first link and 1000 on the second as I don’t want to bring ECMP into play for this post.

How does a router know its neighbour is down? If the interface goes down the detection will be quick. If the interface stays up, but something along the path is dropping packets, the router will take quite a long time to detect this.

If we leave OSPF to its defaults, it could be 40 seconds before R3 realises it cannot get to R4 over their primary interface (Standard dead timer on broadcast links). Until that happens R3 will be sending packets into the void.

I’ll set up standard OSPF on all interfaces. From R2 I’ll be sending pings to R5’s loopback. The links between R3 and R4 are tagged interfaces in different VLANs. On the switch I can simply remove VLAN 24, which will cause packets to be dropped over that VLAN.

OSPF – No tweaking

Standard OSPF here with no tweaks. I’ll be showing R3’s config here:

darreno@M7i> show configuration protocols ospf
area 0.0.0.0 {
    interface lo0.3;
    interface fe-0/1/4.24 {
        metric 100;
    }
    interface fe-0/1/5.35 {
        metric 1000;
    }
}

I’ll now initiate a ping flood from R2 to R5. Once that starts I’ll remove vlan 24 from the switch.

Let’s see how the ping flood goes:

!!!.....................................................................!!!

Not very good at all!

OSPF – BFD

Let’s add BFD to the OSPF session on both R3 and R4:

darreno@M7i> show configuration protocols ospf
area 0.0.0.0 {
    interface all;
    interface lo0.3;
    interface fe-0/1/4.24 {
        metric 100;
        bfd-liveness-detection {
            minimum-interval 50;
            minimum-receive-interval 30;
            multiplier 3;
        }
    }
    interface fe-0/1/5.35 {
        metric 1000;
        bfd-liveness-detection {
            minimum-interval 50;
            minimum-receive-interval 30;
            multiplier 3;
        }
    }
}

Do the same test as above.

!!!!.!!!

Much, much better. Note that this is a very small topology though, so LSAs are very quick to flood. If you had a larger topology, especially one that spans geographic regions, it could take much longer for the new route to be calculated.
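
Some rough arithmetic shows why the difference is so large. The sketch below uses the asynchronous-mode detection time from RFC 5880: the remote multiplier times the larger of the local receive interval and the remote transmit interval. It assumes both ends carry the timers shown above, and that minimum-interval 50 acts as the transmit interval while minimum-receive-interval 30 sets the receive side; that reading of the config is an assumption, not something verified on the routers.

```python
def bfd_detection_ms(remote_tx_ms, local_rx_ms, remote_multiplier):
    """Asynchronous-mode BFD detection time in milliseconds (RFC 5880)."""
    negotiated_interval = max(remote_tx_ms, local_rx_ms)
    return negotiated_interval * remote_multiplier

bfd_ms = bfd_detection_ms(remote_tx_ms=50, local_rx_ms=30, remote_multiplier=3)
ospf_dead_ms = 40 * 1000  # default dead timer on broadcast links, as noted earlier

print(f"BFD detects the failure in about {bfd_ms} ms")
print(f"OSPF alone could take up to {ospf_dead_ms} ms")
```

On top of the detection time, the routers still need to flood LSAs and run SPF before traffic moves to the other link.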

OSPF – BFD & FRR

Now I’ll add FRR to OSPF on R3. I’ll protect the fe-0/1/4.24 link from R3’s point of view. R3 will run SPF for all its destinations through that interface and work out whether it can reach each destination through another interface without the traffic being looped back. In this simple topology any traffic sent over the higher metric interface to R4 will still get to R5, as R4 will not send it back.

First we enable link-protection:

darreno@M7i> show configuration protocols ospf area 0 interface fe-0/1/4.24
link-protection;
metric 100;
bfd-liveness-detection {
    minimum-interval 50;
    minimum-receive-interval 30;
    multiplier 3;
}

Junos will pre-calculate the backup routes, but it will NOT add them to the FIB by default. You have to enable more than one next-hop in the FIB:

darreno@M7i> show configuration policy-options policy-statement BALANCE
then {
    load-balance per-packet;
}

darreno@M7i> show configuration routing-options forwarding-table
export BALANCE;

Let’s run the same test as above again:

!!!!!!!!!!!!!!!!!!!!!!

I’m simply not losing any at all. The difference between BFD alone and BFD with link-protection is most pronounced on much larger topologies. Remember that FRR is a router making a quick local repair to get packets from A to B while an alternative regular route is calculated.
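
To put numbers on the three runs, the ping output strings can be summarised. This sketch assumes '!' is a reply and '.' is a lost ping; `ping_summary` is a made-up helper and the strings are copied from the tests above.

```python
def ping_summary(marks):
    """Count sent and lost pings in a string of '!' and '.' marks."""
    sent = len(marks)
    lost = marks.count('.')
    loss_pct = round(100 * lost / sent, 1) if sent else 0.0
    return {'sent': sent, 'lost': lost, 'loss_pct': loss_pct}

runs = {
    'OSPF only': "!!!.....................................................................!!!",
    'OSPF + BFD': "!!!!.!!!",
    'OSPF + BFD + FRR': "!!!!!!!!!!!!!!!!!!!!!!",
}
for name, marks in runs.items():
    print(name, ping_summary(marks))
```

The percentages depend on how long each test ran, so the lost count is the more meaningful figure to compare.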

You can see that enabling FRR is a piece of cake. To verify you need to dig a little deeper. First let’s see the FRR coverage on R3:

darreno@M7i> show ospf backup coverage
Topology default coverage:

Node Coverage:

Area             Covered  Total  Percent
                   Nodes  Nodes  Covered
0.0.0.0                2      3   66.67%

Route Coverage:

Path Type  Covered   Total  Percent
            Routes  Routes  Covered
Intra            5      11   45.45%
Inter            0       0  100.00%
Ext1             0       0  100.00%
Ext2             0       0  100.00%
All              5      11   45.45%

Not every single prefix can be covered, as coverage is quite topology dependent. If we look at the detail for 5.5.5.5 specifically:

darreno@M7i> show ospf backup spf detail | find 5.5.5.5
5.5.5.5
  Self to Destination Metric: 101
  Parent Node: 10.0.8.10
  Primary next-hop: fe-0/1/4.24 via 10.0.24.4
  Backup next-hop: fe-0/1/5.35 via 10.0.35.4
  Backup Neighbor: 4.4.4.4
    Neighbor to Destination Metric: 1, Neighbor to Self Metric: 1
    Self to Neighbor Metric: 100, Backup preference: 0x0
    Eligible, Reason: Contributes backup next-hop

Here we see that fe-0/1/4.24 is the primary and fe-0/1/5.35 is the backup. The backup is also eligible. If we take a look at the route itself:

darreno@M7i> show route 5.5.5.5

inet.0: 24 destinations, 25 routes (24 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

5.5.5.5/32         *[OSPF/10] 00:03:15, metric 101
                    > to 10.0.24.4 via fe-0/1/4.24
                      to 10.0.35.4 via fe-0/1/5.35

Both routes are there, but only the first will be used until it fails.
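
The metrics in the backup SPF detail above are enough to check by hand why the backup is loop-free. The sketch below applies the basic loop-free alternate inequality from RFC 5286, which says a neighbour N is a safe backup for destination D when Distance(N, D) < Distance(N, S) + Distance(S, D), where S is the router itself. Whether Junos applies exactly this test internally is an assumption; the numbers are taken from the output.

```python
def is_loop_free(neighbor_to_dest, neighbor_to_self, self_to_dest):
    """RFC 5286 basic loop-free condition for a backup neighbour."""
    return neighbor_to_dest < neighbor_to_self + self_to_dest

# From 'show ospf backup spf detail' for 5.5.5.5:
#   Neighbor to Destination Metric: 1
#   Neighbor to Self Metric: 1
#   Self to Destination Metric: 101
print(is_loop_free(neighbor_to_dest=1, neighbor_to_self=1, self_to_dest=101))

# A neighbour whose own best path runs back through this router fails the test
print(is_loop_free(neighbor_to_dest=102, neighbor_to_self=1, self_to_dest=101))
```

In the second case the neighbour would hand the traffic straight back, which is exactly the loop the check is there to prevent.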

Finally we can take a look at the FIB entry:

darreno@M7i> show route forwarding-table destination 5.5.5.5
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
5.5.5.5/32         user     1                    ulst 262142     5
                              10.0.24.4          ucst  1303     2 fe-0/1/4.24
                              10.0.35.4          ucst  1304     2 fe-0/1/5.35

The backup hop is already programmed ready to take over as soon as the primary fails.