Fundamentals – PMTUD – IPv4 & IPv6 – Part 1 of 2

One of IPv6’s features is that routers are no longer supposed to fragment packets. Instead, it’s up to the hosts on either end to work out the path MTU. This differs from IPv4, in which the routers along the path can fragment the packet. Both IPv4 and IPv6 have a mechanism to work out the path MTU, which is what I’ll go over in this post. Instead of going over each separately, I’ll show the problem being solved and how the two differ when it comes to sending traffic.

I’ll be using the following topology in this post:
[Figure: topology diagram]

The problem

When you visit this blog, your browser requests a particular web page from my server. This request is usually quite small. My server needs to respond to that request with some actual data. This includes the images, words, plugins, style-sheets, etc. This data can be quite large. My server needs to break this stream of data down into IP packets to send back to you. Each packet requires a few headers, so the most efficient way to send data back to you is to carry as much data as possible in as few packets as possible.
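To put rough numbers on that, here is a small sketch of how the packet count and header overhead change with the MTU. It assumes plain TCP over IPv4 with 20-byte headers and no options, and the function names are mine, purely for illustration.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of TCP/IPv4 packets needed to carry payload_bytes at a given MTU."""
    data_per_packet = mtu - IP_HEADER - TCP_HEADER
    return math.ceil(payload_bytes / data_per_packet)


def header_overhead(payload_bytes: int, mtu: int) -> int:
    """Total bytes spent on IP and TCP headers for the whole transfer."""
    return packets_needed(payload_bytes, mtu) * (IP_HEADER + TCP_HEADER)


if __name__ == "__main__":
    for mtu in (1500, 1400, 576):
        print(mtu, packets_needed(1_000_000, mtu), header_overhead(1_000_000, mtu))
```

The smaller the MTU, the more packets and the more header bytes for the same amount of data, which is why the sender wants the largest packet size the path can carry.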

Between you and my server sits a load of different networks and hardware. There is no way for my server to know in advance the smallest MTU along that path. Not only can the path change, but I have many thousands of readers in many different countries. In the topology above, the link between R2 and R4 has an MTU of 1400. None of the hosts are directly connected to that segment, so none of them know the MTU of the entire path.
[Figure: topology diagram]

PMTUD

Path MTU Discovery, RFC1191 for IPv4 and RFC1981 for IPv6, does exactly what the name suggests: it finds out the MTU of the path. There are a number of similarities between the two RFCs, but also a few key differences, which I’ll dig into.

Note – OS implementations of PMTUD can vary widely. I’ll be showing both Debian Linux server 7.6.0 and Windows Server 2012 in this post.

Both RFCs state that hosts should initially assume that the MTU across the entire path matches the first-hop MTU, i.e. the servers should assume that the path MTU matches the MTU of the link they are connected to. In this case both my Windows and Linux servers have a local MTU of 1500.
[Figure: topology diagram]

The link between R2 and R4 has an IP MTU of 1400. My servers need to figure out the path MTU in order to maximise the packet size without fragmentation.

  • IPv4
  • RFC1191 states:

    The basic idea is that a source host initially assumes that the PMTU of a path is the (known) MTU of its first hop, and sends all datagrams on that path with the DF bit set. If any of the datagrams are too large to be forwarded without fragmentation by some router along the path, that router will discard them and return ICMP Destination Unreachable messages with a code meaning “fragmentation needed and DF set” [7]. Upon receipt of such a message (henceforth called a “Datagram Too Big” message), the source host reduces its assumed PMTU for the path.

    In my example, the servers should assume that the path MTU is 1500. They should send packets back to the user using this MTU, with the Don’t Fragment bit set. R2’s link to R4 is not big enough, so R2 should drop the packet and return the correct ICMP message to my servers. Those servers should then send those packets again with a lower MTU.
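The loop described above can be sketched in a few lines. This is only a toy model under my own assumptions: the path is a list of link MTUs, every router reports its next-hop MTU, and `discover_path_mtu` is a hypothetical name, not anything from an OS or RFC.

```python
def discover_path_mtu(link_mtus):
    """Simulate PMTUD over a path given as a list of link MTUs, sender side first.

    Returns (path_mtu, attempts), where attempts lists each packet size tried.
    """
    pmtu = link_mtus[0]          # start by assuming the first-hop MTU
    attempts = []
    while True:
        attempts.append(pmtu)
        # The first link that is too small drops the packet and reports its MTU.
        bottleneck = next((mtu for mtu in link_mtus if mtu < pmtu), None)
        if bottleneck is None:
            return pmtu, attempts  # the packet made it end to end
        pmtu = bottleneck          # act on the "Datagram Too Big" message


if __name__ == "__main__":
    # Assumed link order: server link, a 1500 link, the 1400 link, client link.
    print(discover_path_mtu([1500, 1500, 1400, 1500]))
```

With a single small link the sender needs one retry. With several progressively smaller links it would take one round of drop-and-report per bottleneck.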

    I’m going to show Wireshark captures from the server’s point of view. I’ll start with Windows.

    The first part is the regular TCP 3-way handshake to set up the session. These packets are very small so are generally not fragmented:
    [Screenshot: Wireshark capture of the TCP 3-way handshake]
    The user then requests a file. The server responds with full-size packets with the DF bit set. Those packets are dropped by R2, which sends back the required ICMP message:
    [Screenshot: Wireshark capture of the full-size packets and the returned ICMP messages]

    Dig a bit deeper into those packets. First the full size packet from the server. Note the DF-bit has been set:
    [Screenshot: packet detail showing the DF bit set]

    Second, the ICMP message sent from R2. This is an ICMP Type 3 Code 4 message. It states that the destination is unreachable and that fragmentation is required. Note it also states the MTU of the next hop. The Windows server can use this value to re-originate its packets with a lower MTU.
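To make the message layout concrete, here is a minimal sketch that pulls the next-hop MTU out of the first 8 bytes of such an ICMP message, following the layout in RFC1191 (type, code, checksum, 16 unused bits, 16-bit next-hop MTU). The function name and the sample bytes are made up for illustration, and the checksum is left at zero rather than computed.

```python
import struct


def parse_frag_needed(icmp: bytes):
    """Return the next-hop MTU from an ICMP 'fragmentation needed' message, else None."""
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        return None  # not Destination Unreachable / fragmentation needed and DF set
    return next_hop_mtu


# Hand-built sample header: type 3, code 4, dummy checksum, next-hop MTU 1400.
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1400)

if __name__ == "__main__":
    print(parse_frag_needed(sample))
```

In a real message the header is followed by the IP header and the first 8 bytes of the dropped datagram, which is how the sender knows which flow the message refers to.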

    All the remaining packets in the capture are then sent at the lower MTU. Note that Wireshark shows the full frame length, including the 14-byte Ethernet header, hence the value of 1414:
    [Screenshot: packets sent at the lower MTU]

    RFC1191 states that a server should cache a lower MTU value. It’s also suggested that this value is cached for 10 minutes, and should be tweakable. You can view the cached value on Windows, but it doesn’t show the timer. Perhaps a reader could let me know?
    [Screenshot: cached path MTU on Windows]

    I’ll now do the same on my Debian server. First part is the 3-way handshake again:
    [Screenshot: Wireshark capture of the TCP 3-way handshake]
    The server starts sending packets with an MTU of 1500:
    [Screenshot: full-size packets sent by the Debian server]
    Which are dropped by R2, with ICMP messages sent back:
    [Screenshot: ICMP messages returned by R2]
    The Debian server will cache that entry. Debian does show me the remaining cache time, in this case 584 seconds:
    [Screenshot: cached path MTU entry on Debian]
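The caching behaviour can be pictured with a small sketch. This is not how Windows or Linux implement it; it is a hypothetical `PmtuCache` class that assumes the 10-minute timeout suggested by RFC1191 and falls back to the first-hop MTU once an entry has aged out.

```python
import time


class PmtuCache:
    """Toy per-destination path MTU cache with an expiry timer (illustrative only)."""

    def __init__(self, timeout_s=600.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # 600 s = the 10 minutes suggested by RFC1191
        self.clock = clock              # injectable clock, handy for testing
        self._entries = {}              # destination -> (mtu, expires_at)

    def learn(self, dest, mtu):
        """Record a lower MTU learned from an ICMP message."""
        self._entries[dest] = (mtu, self.clock() + self.timeout_s)

    def lookup(self, dest, first_hop_mtu):
        """Return the cached MTU, or the first-hop MTU if nothing valid is cached."""
        entry = self._entries.get(dest)
        if entry is None:
            return first_hop_mtu
        mtu, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[dest]     # entry aged out, probe with full size again
            return first_hop_mtu
        return mtu
```

Once the entry expires the host goes back to full-size packets, so if the path has changed in the meantime it can discover a larger MTU.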

  • IPv6
  • RFC1981 goes over the finer details of how this works with IPv6. The majority of the document is identical to the RFC1191 version.

    When the Debian server responds, the packets have a size of 1514 on the wire as expected. Note however that there is no DF bit in IPv6 packets. This is a major difference between IPv4 and IPv6. Routers CANNOT fragment IPv6 packets and hence there is no reason to explicitly state this in the packet. All IPv6 packets are non-fragmentable by routers in the path. I’ll go over what this means in depth later.
    [Screenshot: IPv6 packet sent by the Debian server]
    R2 cannot forward this packet and drops it. The message returned by R2 is still an ICMP message, but it’s a bit different to the IPv4 version:
    [Screenshot: ICMPv6 message returned by R2]

    This time the message is ‘Packet too big’, which is very easy to interpret. The ICMP message contains the MTU of the next hop as expected:
    [Screenshot: MTU field in the Packet Too Big message]
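For comparison with the IPv4 message, here is a sketch that reads the MTU from an ICMPv6 Packet Too Big header. The layout is type 2, code 0, a 16-bit checksum, then a 32-bit MTU field. The function name and sample bytes are my own, and the checksum is a dummy value.

```python
import struct


def parse_packet_too_big(icmpv6: bytes):
    """Return the MTU from an ICMPv6 Packet Too Big message, else None."""
    icmp_type, code, _checksum, mtu = struct.unpack("!BBHI", icmpv6[:8])
    if icmp_type != 2 or code != 0:
        return None
    return mtu


# Hand-built sample header: type 2, code 0, dummy checksum, MTU 1400.
sample_v6 = struct.pack("!BBHI", 2, 0, 0, 1400)

if __name__ == "__main__":
    print(parse_packet_too_big(sample_v6))
```

Note the MTU field is a full 32 bits here, where the IPv4 message squeezes a 16-bit value into what used to be an unused field.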

    The server will act on this message, cache the result, then send packets up to the required MTU:
    [Screenshot: packets sent at the lower MTU]
    [Screenshot: cached IPv6 path MTU entry]

    Windows Server 2012 has identical behaviour. To show the cached value, simply view the IPv6 destination cache (netsh interface ipv6 show destinationcache) and you’re good to go.

    Problems

    So what could possibly go wrong? The above all looks good and works in the lab. The biggest issue is that both require those ICMP messages to come back to the sending host. There are a load of badly configured firewalls and ACLs out there dropping more ICMP than they are supposed to. Some people even drop ALL ICMP. There is another issue that I’ll go over in another blog post in the near future.

    In the above examples, if those ICMP messages don’t get back, the sending host will not adjust its MTU. If it continues to send large packets, the router with the smaller MTU will keep dropping them. All that traffic is blackholed. Smaller packets like requests will get through. Ping will even get through if echo-requests and echo-replies have been let through. You might even be able to see the beginnings of a web page, but the big content will not load.

    On R1’s fa0/1 interface I’ll create this bad access list:

    R1#sh ip access-lists
    Extended IP access list BLOCK-ICMP
        10 permit icmp any any echo
        20 permit icmp any any echo-reply
        30 deny icmp any any
        40 permit ip any any
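To see why this list breaks PMTUD, here is a rough first-match evaluation of the four entries above. It is a simplification of how an ACL works (it only looks at protocol and ICMP type), and the data structure and function name are mine.

```python
# Each entry: (action, protocol, ICMP type to match or None for any).
# ICMP echo is type 8 and echo-reply is type 0.
BLOCK_ICMP = [
    ("permit", "icmp", 8),     # 10 permit icmp any any echo
    ("permit", "icmp", 0),     # 20 permit icmp any any echo-reply
    ("deny", "icmp", None),    # 30 deny icmp any any
    ("permit", "ip", None),    # 40 permit ip any any
]


def acl_action(protocol, icmp_type=None):
    """Return the action of the first matching entry, like a router would."""
    for action, entry_proto, entry_type in BLOCK_ICMP:
        if entry_proto == "icmp" and protocol != "icmp":
            continue                      # ICMP entries only match ICMP packets
        if entry_type is not None and entry_type != icmp_type:
            continue                      # specific ICMP type does not match
        return action
    return "deny"                         # implicit deny at the end of every ACL


if __name__ == "__main__":
    print(acl_action("icmp", 8))   # ping request
    print(acl_action("icmp", 3))   # Destination Unreachable, incl. code 4
    print(acl_action("tcp"))       # normal TCP traffic
```

Pings and TCP are permitted, but the Type 3 message that PMTUD depends on hits entry 30 and is dropped.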

    From the client I can ping the host:
    [Screenshot: successful ping from the client]
    I can even open a text-based page from the server:
    [Screenshot: text-based page loading from the server]

    But try to download the file:
    [Screenshot: stalled file download]
    The initial 3-way handshake works fine, but nothing else happens. The Debian server is sending those packets and R2 is dropping them and informing the sender, but R1 drops those ICMP messages. You’ve now got a black hole. The same thing happens with IPv6, though of course the message dropped is the Packet Too Big message.

    Workarounds

    The best thing to do is fix the problem. Unfortunately that’s not always possible. There are a few things that can be done to work through the problem of dropped ICMP packets.
    If you know the MTU value further down the line, you can use TCP MSS clamping. This causes the router to intercept TCP SYN packets and rewrite the TCP MSS option. You need to take the size of the IP and TCP headers (plus any tunnel headers) into account when choosing the value.

    R1#conf t
    Enter configuration commands, one per line.  End with CNTL/Z.
    R1(config)#int fa1/1
    R1(config-if)#ip tcp adjust-mss 1360
    R1(config-if)#end
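The 1360 value comes from simple header arithmetic. Here is a tiny sketch of it, assuming no IP or TCP options and no tunnel overhead; `clamp_mss` is just an illustrative name.

```python
def clamp_mss(path_mtu, ip_header=20, tcp_header=20):
    """MSS that keeps a full TCP segment within path_mtu (no options assumed)."""
    return path_mtu - ip_header - tcp_header


if __name__ == "__main__":
    print(clamp_mss(1400))                 # IPv4: 20-byte IP header
    print(clamp_mss(1400, ip_header=40))   # IPv6: 40-byte fixed header
```

Because the IPv6 fixed header is 40 bytes rather than 20, the same arithmetic gives a smaller MSS for IPv6 traffic over the same 1400-byte link.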

    Note how the MSS value has been changed to 1360:
    [Screenshot: TCP SYN with the MSS rewritten to 1360]

    I’ve tested with IOS 15.2(4)S2 and it also works with IPv6:
    [Screenshot: IPv6 TCP SYN with the MSS rewritten]

    The problem with this is that it’s a burden on the router configured. Your router might not even support this option. This also affects ALL TCP traffic going through that router. TCP clamping can work well for VPN tunnels, but it’s not a very scalable solution.

    Another workaround is to have the router clear the DF bit and let the routers along the path fragment the packets:

    route-map CLEAR-DF permit 10
     set ip df 0
    !
    interface FastEthernet1/1
     ip address 192.168.4.1 255.255.255.0
     ip router isis
     ip policy route-map CLEAR-DF
     ipv6 address 2001:DB8:10:14::1/64
     ipv6 router isis

    The problem with this is that you’re placing burden on the router again. It’s also not at all efficient. Some firewalls also block fragments. Some routers might just drop fragmented packets.
    The biggest problem with this is that there is no DF bit to clear in IPv6. IPv6 packets will not be fragmented by routers; fragmentation has to be done by the sending host.

    End of Part One

    There is simply too much to cover in a single post. I’ll end this post here. Part two will be coming soon!

    Demystifying the IS-IS database

    I’ve gone over the OSPFv2 and OSPFv3 databases in depth before. Now is the time for IS-IS. As always, I’ll start from a basic two router set up and add devices to the topology.

    Basic LSPs

    In OSPF we use the term LSA, Link-State Advertisement. In IS-IS we use the term LSP, Link-State PDU, which expands further to Link-State Protocol Data Unit. Not to be confused with Label Switched Paths.

    This is the topology we’ll start with:
    [Figure: two-router IS-IS topology]
    Like OSPF, IS-IS will treat ethernet links as broadcast by default. In OSPF a DR and BDR will be elected. In IS-IS a single DIS (Designated Intermediate System) is elected with no backup DIS. The DIS election is also pre-emptive, unlike OSPF. The DIS will originate an additional LSP representing the broadcast segment (the pseudonode). This means I should currently have three LSPs in the database:

    RP/0/0/CPU0:XR1#show isis database
    Tue Aug 12 17:34:21.594 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.00-00           * 0x00000003   0x8577        736             0/0/0
    XR1.01-00             0x00000002   0x1fba        931             0/0/0
    XR2.00-00             0x00000005   0x856b        806             0/0/0
    
     Total Level-2 LSP count: 3     Local Level-2 LSP count: 1

    XR2 has a single LSP while XR1 has two. The XR1.01 LSP is the DIS LSP. Dig deeper into the LSPs to see their current content:

    RP/0/0/CPU0:XR1#show isis database XR1.00-00 detail
    Tue Aug 12 17:38:23.307 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.00-00           * 0x00000003   0x8577        494             0/0/0
      Area Address: 49.0001
      NLPID:        0xcc
      Hostname:     XR1
      IP Address:   1.1.1.1
      Metric: 10         IS XR1.01
      Metric: 10         IP 1.1.1.1/32
      Metric: 10         IP 10.0.12.0/24

    XR1 has originated an LSP stating its area and hostname. Notice the NLPID value. NLPID stands for Network Layer Protocol IDentifier, and the value 0xcc translates to IPv4. Further down, the LSP contains an IS adjacency to XR1.01 (the DIS LSP for the segment) plus two IP prefixes, each with its metric. I’ll get onto the ATT/P/OL bits later, so ignore those for now.

    It’s important to note that an LSP is made up of several TLVs. On the wire, multiple TLVs are grouped together in a single LSP. If an LSP becomes too large, IS-IS will split it into multiple LSP fragments.
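If TLVs are new to you, the idea is simple: one byte of type, one byte of length, then that many bytes of value, repeated. Here is a minimal sketch of walking such a sequence. The sample bytes are hand-built for illustration (TLV 129 is Protocols Supported, which carries the NLPIDs, and TLV 137 is the dynamic hostname); a real LSP also has a fixed header in front of the TLVs, which this sketch ignores.

```python
def parse_tlvs(data: bytes):
    """Walk a byte string of IS-IS style TLVs and return [(type, value), ...]."""
    tlvs = []
    i = 0
    while i + 2 <= len(data):
        tlv_type, length = data[i], data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, value))
        i += 2 + length
    return tlvs


# Hand-built sample: Protocols Supported (IPv4 0xcc, IPv6 0x8e) and hostname "XR1".
sample_tlvs = bytes([129, 2, 0xCC, 0x8E]) + bytes([137, 3]) + b"XR1"

if __name__ == "__main__":
    for tlv_type, value in parse_tlvs(sample_tlvs):
        print(tlv_type, value)
```

A parser that meets a type it does not know can skip it using the length byte, which is why new features can be added to IS-IS without breaking older routers.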

    As XR1 is the DIS, there is a separate DIS LSP, let’s take a look at that:

    RP/0/0/CPU0:XR1#show isis database XR1.01-00 detail
    Tue Aug 12 17:43:00.448 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.01-00             0x00000003   0x1dbb        1161            0/0/0
      Metric: 0          IS XR1.00
      Metric: 0          IS XR2.00

    The DIS LSP advertises all the ISs on the segment on which the DIS sits.
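The LSP ID itself tells you which kind of LSP you are looking at. As a small sketch, assuming the hostname.pseudonode-fragment format shown in the output above, this splits an ID into its parts. The function name is mine.

```python
def parse_lsp_id(lsp_id: str):
    """Split an LSP ID such as 'XR1.01-00' into system, pseudonode and fragment."""
    system, rest = lsp_id.rsplit(".", 1)
    pseudonode_hex, fragment_hex = rest.split("-")
    pseudonode = int(pseudonode_hex, 16)
    return {
        "system": system,
        "pseudonode": pseudonode,
        "fragment": int(fragment_hex, 16),
        "is_dis_lsp": pseudonode != 0,   # non-zero pseudonode ID = LSP originated as DIS
    }


if __name__ == "__main__":
    for lsp in ("XR1.00-00", "XR1.01-00", "XR2.00-00"):
        print(parse_lsp_id(lsp))
```

A pseudonode ID of 00 is the router’s own LSP, anything else is a DIS LSP for a broadcast segment, and the last byte is the fragment number.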

    If I change the segment to point-to-point, this removes the need for a DIS and as such there will be no DIS LSP.

    router isis 1
    !
     interface GigabitEthernet0/0/0/1
      point-to-point
    
    RP/0/0/CPU0:XR1#show isis database
    Tue Aug 12 18:46:50.566 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.00-00           * 0x0000000b   0x7480        674             0/0/0
    XR2.00-00             0x0000000d   0x5297        543             0/0/0
    
     Total Level-2 LSP count: 2     Local Level-2 LSP count: 1

    Externals

    I’m going to add another loopback interface on XR1 and redistribute that loopback into IS-IS. This will make the route external.

    interface Loopback100
     ipv4 address 100.100.100.100 255.255.255.255
    !
    prefix-set LOOPBACK100
      100.100.100.100/32
    end-set
    !
    route-policy RP-100
      if destination in LOOPBACK100 then
        done
      else
        drop
      endif
    end-policy
    !
    router isis 1
     address-family ipv4 unicast
      redistribute connected level-2 route-policy RP-100

    As I mentioned above, IS-IS has separate TLVs that make up the LSP. Therefore there is still only a single LSP from XR1:

    RP/0/0/CPU0:XR2#sh isis database
    Tue Aug 12 19:03:31.569 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.00-00             0x0000000d   0x6be5        1043            0/0/0
    XR2.00-00           * 0x00000010   0x9c8f        1094            0/0/0
    
     Total Level-2 LSP count: 2     Local Level-2 LSP count: 1

    The external route can be seen in the detailed output under that LSP:

    RP/0/0/CPU0:XR2#sh isis database XR1.00-00 detail
    Tue Aug 12 19:03:58.637 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.00-00             0x0000000d   0x6be5        1016            0/0/0
      Area Address: 49.0001
      NLPID:        0xcc
      Hostname:     XR1
      IP Address:   1.1.1.1
      Metric: 10         IS XR2.00
      Metric: 10         IP 1.1.1.1/32
      Metric: 10         IP 10.0.12.0/24
      Metric: 0          IP-External 100.100.100.100/32

    Inter-Area

    XR3 has now been added to the topology. I’ve had to move XR2 into the same area as XR3, otherwise they would not be able to form an L1 adjacency:
    [Figure: IS-IS topology with XR3 added]

    The XR2-XR3 link has not been changed to point-to-point, and as such I would expect to see three LSPs in XR3’s database:

    RP/0/0/CPU0:XR3#show isis database
    Tue Aug 12 09:44:40.660 UTC
    
    IS-IS 1 (Level-1) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x00000008   0xd230        1107            1/0/0
    XR3.00-00           * 0x00000008   0xf1be        1105            0/0/0
    XR3.07-00             0x00000003   0xfcd3        1105            0/0/0
    
     Total Level-1 LSP count: 3     Local Level-1 LSP count: 1

    If you look at XR2’s L1 LSP in detail you now see the ATT bit set. Also note it’s advertising only its directly connected interfaces:

    RP/0/0/CPU0:XR3#show isis database XR2.00-00 detail
    Tue Aug 12 19:45:51.025 UTC
    
    IS-IS 1 (Level-1) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x00000008   0xd230        1037            1/0/0
      Area Address: 49.0023
      NLPID:        0xcc
      Hostname:     XR2
      IP Address:   2.2.2.2
      Metric: 10         IS XR3.07
      Metric: 10         IP 2.2.2.2/32
      Metric: 10         IP 10.0.12.0/24
      Metric: 10         IP 10.0.23.0/24

    XR2 has set the ATT bit which is the attached bit. An L1/L2 router will set this bit in the LSP inside the L1 area it’s connected to. This is to inform the L1 routers that it is attached to the L2 domain. No actual default route is advertised, but L1 routers can create their own defaults pointing towards the attached routers:

    RP/0/0/CPU0:XR3#sh ip route 0.0.0.0
    Tue Aug 12 19:47:07.839 UTC
    
    Routing entry for 0.0.0.0/0
      Known via "isis 1", distance 115, metric 10, candidate default path, type level-1
      Installed Aug 12 19:43:09.476 for 00:03:58
      Routing Descriptor Blocks
        10.0.23.2, from 2.2.2.2, via GigabitEthernet0/0/0/0.23
          Route metric is 10
      No advertising protos.

    Notice from XR1’s perspective that any routes coming from an L1 area are simply advertised by the L1/L2 router as normal routes:

    RP/0/0/CPU0:XR1#show isis database XR2.00-00 detail
    Tue Aug 12 19:50:08.676 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x0000001b   0x5b3d        778             0/0/0
      Area Address: 49.0023
      NLPID:        0xcc
      Hostname:     XR2
      IP Address:   2.2.2.2
      Metric: 10         IS XR1.00
      Metric: 10         IP 2.2.2.2/32
      Metric: 20         IP 3.3.3.3/32
      Metric: 10         IP 10.0.12.0/24
      Metric: 10         IP 10.0.23.0/24
      Metric: 10         IP 200.200.200.200/32

    IS-IS gives you the ability to leak L2 prefixes into the L1 domain. This is handy when you have two L1/L2 border routers and want to engineer destinations to go on particular paths. From XR2 I’ll leak XR1’s loopback into the L1 domain. The database now shows:

    RP/0/0/CPU0:XR3#show isis database XR2.00-00 detail
    Tue Aug 12 21:53:13.981 UTC
    
    IS-IS 1 (Level-1) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x0000002f   0x4e13        1193            1/0/0
      Area Address: 49.0023
      NLPID:        0xcc
      Hostname:     XR2
      IP Address:   2.2.2.2
      Router Cap:   2.2.2.2, D:0, S:0
      Metric: 10         IS XR3.07
      Metric: 20         IP-Interarea 1.1.1.1/32
      Metric: 10         IP 2.2.2.2/32
      Metric: 10         IP 10.0.23.0/24

    1.1.1.1/32 shows up in the LSP as an IP-Interarea route. Again a TLV is used for this.

    IPv6

    When running both IPv4 and IPv6 at the same time, IS-IS can be run in single-topology or multi-topology mode. In single-topology mode, all your IS-IS links need to have both v4 and v6 addresses, as the SPF tree is calculated independently of prefix information. If the SPF tree uses a link without a v6 address, IPv6 traffic will be blackholed over that link.

    For now I’ve added an IPv6 loopback and interface address. I’ve got IS-IS running in multi-topology mode. I should still only see two LSPs from XR1’s perspective:

    RP/0/0/CPU0:XR1#show isis database
    Tue Aug 12 23:47:02.152 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.00-00           * 0x0000001e   0x9683        1115            0/0/0
    XR2.00-00             0x0000002b   0x62fa        1117            0/0/0
    
     Total Level-2 LSP count: 2     Local Level-2 LSP count: 1

    IPv6 information is carried inside another TLV. Note also that there is a new NLPID value of 0x8e in the LSP. As you would guess this value represents IPv6:

    RP/0/0/CPU0:XR1#show isis database detail XR2.00-00
    Tue Aug 12 23:47:50.899 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x0000002b   0x62fa        1068            0/0/0
      Area Address: 49.0023
      NLPID:        0xcc
      NLPID:        0x8e
      MT:           Standard (IPv4 Unicast)
      MT:           IPv6 Unicast                                     0/0/0
      Hostname:     XR2
      IP Address:   2.2.2.2
      IPv6 Address: 2001:db8:2:2::2
      Metric: 10         IS XR1.00
      Metric: 10         IP 2.2.2.2/32
      Metric: 20         IP 3.3.3.3/32
      Metric: 10         IP 10.0.12.0/24
      Metric: 10         IP 10.0.23.0/24
      Metric: 10         IP 200.200.200.200/32
      Metric: 10         MT (IPv6 Unicast) IS-Extended XR1.00
      Metric: 10         MT (IPv6 Unicast) IPv6 2001:db8:2:2::2/128
      Metric: 10         MT (IPv6 Unicast) IPv6 2001:db8:12::/64

    When running multi-topology mode, you’ll see MT: plus the address families configured for multi-topology. If I change this to single topology:

    RP/0/0/CPU0:XR1#show isis database XR2.00-00 detail
    Tue Aug 12 23:11:20.989 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x00000023   0xd22a        1196            0/0/0
      Area Address: 49.0023
      NLPID:        0xcc
      NLPID:        0x8e
      Hostname:     XR2
      IP Address:   2.2.2.2
      IPv6 Address: 2001:db8:2:2::2
      Metric: 10         IS XR1.00
      Metric: 10         IP 2.2.2.2/32
      Metric: 10         IP 10.0.12.0/24
      Metric: 10         IP 10.0.23.0/24
      Metric: 10         IP 200.200.200.200/32
      Metric: 10         IPv6 2001:db8:2:2::2/128
      Metric: 10         IPv6 2001:db8:12::/64

    MT no longer shows up, and all TLVs are added as-is to the LSP.

    Traffic Engineering

    To enable TE, wide metrics need to be enabled. Up until this point I’ve been using narrow metrics. Once enabled, you can see the TE information in the LSP by doing a verbose output:

    RP/0/0/CPU0:XR1#show isis database verbose XR2.00-00
    Tue Aug 12 23:42:09.932 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x00000026   0x2dd8        910             0/0/0
      Area Address: 49.0023
      NLPID:        0xcc
      NLPID:        0x8e
      Hostname:     XR2
      IP Address:   2.2.2.2
      IPv6 Address: 2001:db8:2:2::2
      Router ID:    2.2.2.2
      Metric: 10         IS-Extended XR1.00
        Affinity: 0x00000000
        Interface IP Address: 10.0.12.2
        Neighbor IP Address: 10.0.12.1
        Physical BW: 1000000 kbits/sec
        Reservable Global pool BW: 0 kbits/sec
        Global Pool BW Unreserved:
          [0]: 0        kbits/sec          [1]: 0        kbits/sec
          [2]: 0        kbits/sec          [3]: 0        kbits/sec
          [4]: 0        kbits/sec          [5]: 0        kbits/sec
          [6]: 0        kbits/sec          [7]: 0        kbits/sec
        Admin. Weight: 167772160
        Ext Admin Group: Length: 32
          0x00000000   0x00000000
          0x00000000   0x00000000
          0x00000000   0x00000000
          0x00000000   0x00000000
      Metric: 10         IP-Extended 2.2.2.2/32
      Metric: 10         IP-Extended 10.0.12.0/24
      Metric: 10         IP-Extended 10.0.23.0/24
      Metric: 10         IP-Extended 200.200.200.200/32
      Metric: 10         IPv6 2001:db8:2:2::2/128
      Metric: 10         IPv6 2001:db8:12::/64

    Notice that there is no new NLPID value for TE. TE extensions are enabled under address-family ipv4 and hence it uses the 0xcc ID. If/when RSVP-TE can use IPv6 natively, I would expect to see only the IPv6 ID.

    Overload

    IS-IS has the ability to set the overload bit in an LSP. This could be originated by the router itself if it was overwhelmed, but it can also be hard set when doing planned works for example. If the overload bit is set, other routers will route around the router.

    router isis 1
     set-overload-bit

    Note that the OL bit is now set in XR2’s LSP:

    RP/0/0/CPU0:XR1#show isis database
    Tue Aug 12 23:32:58.107 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR1.00-00           * 0x0000001f   0x9484        947             0/0/0
    XR2.00-00             0x0000002e   0x97a4        1151            0/0/1
    
     Total Level-2 LSP count: 2     Local Level-2 LSP count: 1

    I no longer have access to XR3, as XR2 is the only router connecting these two devices:

    RP/0/0/CPU0:XR1#ping 3.3.3.3
    Tue Aug 12 23:08:44.083 UTC
    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
    UUUUU
    Success rate is 0 percent (0/5)

    I am still able to ping XR2 itself though:

    RP/0/0/CPU0:XR1#ping 2.2.2.2
    Tue Aug 12 23:09:32.870 UTC
    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
    !!!!!
    Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
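The behaviour in those two pings can be modelled with a small SPF sketch. This is my own simplified model, not router code: a plain Dijkstra run in which an overloaded router can still be reached but is never used to reach anything behind it. The graph mirrors the XR1-XR2-XR3 chain with a metric of 10 per link.

```python
import heapq


def spf(graph, source, overloaded=frozenset()):
    """Dijkstra shortest paths where overloaded nodes are not used for transit."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        if node != source and node in overloaded:
            continue                      # reachable itself, but do not go through it
        for neighbour, cost in graph.get(node, {}).items():
            new_dist = d + cost
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))
    return dist


# Assumed topology: XR1 - XR2 - XR3 in a chain, metric 10 on each link.
chain = {
    "XR1": {"XR2": 10},
    "XR2": {"XR1": 10, "XR3": 10},
    "XR3": {"XR2": 10},
}

if __name__ == "__main__":
    print(spf(chain, "XR1"))
    print(spf(chain, "XR1", overloaded={"XR2"}))
```

With XR2 overloaded, XR1 still has a path to XR2 itself but XR3 drops out of the result, matching the ping output above.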

    We’ve now seen the purpose of both the ATT and OL bits, so what is the P bit for? It is the Partition Repair bit, which no vendor has implemented, i.e. it should always show 0.
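If you are scripting against this output, the ATT/P/OL column is easy to pick apart. A tiny sketch, with a made-up function name, that turns the column value into named flags:

```python
def parse_att_p_ol(field: str):
    """Turn an 'ATT/P/OL' column value such as '1/0/0' into named flags."""
    att, p, ol = (int(part) for part in field.split("/"))
    return {
        "attached": bool(att),          # L1/L2 router attached to the L2 domain
        "partition_repair": bool(p),    # expected to always be 0
        "overload": bool(ol),           # do not use this router for transit
    }


if __name__ == "__main__":
    print(parse_att_p_ol("1/0/0"))   # XR2's L1 LSP earlier in the post
    print(parse_att_p_ol("0/0/1"))   # XR2's L2 LSP with the overload bit set
```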

    Segment Routing

    IS-IS is easily extended using new TLVs. If I enable segment routing under my IS-IS process, I see it added as a new TLV in the LSP:

    RP/0/0/CPU0:XR1#show isis database verbose XR2.00-00
    Tue Aug 12 23:50:35.855 UTC
    
    IS-IS 1 (Level-2) Link State Database
    LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
    XR2.00-00             0x00000036   0x252b        954             0/0/0
      Area Address: 49.0023
      NLPID:        0xcc
      NLPID:        0x8e
      MT:           Standard (IPv4 Unicast)
      MT:           IPv6 Unicast                                     0/0/0
      Hostname:     XR2
      IP Address:   2.2.2.2
      IPv6 Address: 2001:db8:2:2::2
      Router Cap:   2.2.2.2, D:0, S:0
        Segment Routing: I:1 V:0, SRGB Base: 900000 Range: 65535
      Metric: 10         IS XR1.00
      Metric: 10         IP 2.2.2.2/32
      Metric: 20         IP 3.3.3.3/32
      Metric: 10         IP 10.0.12.0/24
      Metric: 10         IP 10.0.23.0/24
      Metric: 10         IP 200.200.200.200/32
      Metric: 10         MT (IPv6 Unicast) IS-Extended XR1.00
      Metric: 10         MT (IPv6 Unicast) IPv6 2001:db8:2:2::2/128
      Metric: 10         MT (IPv6 Unicast) IPv6 2001:db8:12::/64

    The Accumulated IGP Metric Attribute for BGP

    This is an interesting draft which can ensure better paths are chosen in certain corner cases. Before this draft, BGP could already carry the IGP metric as a MED value when redistributing IGP routes. The issue with MED is that it sits very low in the BGP best-path algorithm. Note that Cisco/Brocade consider weight first, but I’ll ignore that for now.

    1. Highest Local-Preference
    2. Shortest AS-Path
    3. Lowest Origin Code
    4. Lowest MED
    5. ETC

    MED is only number 4 in the pecking order. In a large network it might be difficult to get everything to match up to that point. Accumulated IGP Metric (AIGP) is a new non-transitive BGP path attribute that carries the accumulated IGP metric along with the prefix. Not only that, but the best-path algorithm is changed as follows:

    1. Highest Local-Preference
    2. Lowest AIGP Cost
    3. Shortest AS-Path
    4. Lowest Origin Code
    5. Lowest MED
    6. ETC

    As long as your local-preference values match, the lowest AIGP cost is taken into account.
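The two orderings can be compared with a small sketch. It models only the steps listed above, treats a path without an AIGP value as the worst possible, and uses made-up metric values. The `Path` class and `best_path` function are illustrative and not any vendor’s implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Path:
    name: str
    local_pref: int = 100
    aigp: Optional[int] = None   # accumulated IGP metric, if the attribute is present
    as_path_len: int = 1
    origin: int = 2              # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0


def best_path(paths, use_aigp=True):
    """Pick the best path using the simplified ordering from the lists above."""
    def key(p):
        if use_aigp:
            aigp = p.aigp if p.aigp is not None else float("inf")
        else:
            aigp = 0             # step skipped: every path ties here
        # Highest local-pref first, then lowest of everything else.
        return (-p.local_pref, aigp, p.as_path_len, p.origin, p.med)
    return min(paths, key=key)


# Hypothetical values: the XR1 path has a shorter AS path but a worse IGP cost.
via_xr1 = Path("via XR1", aigp=120, as_path_len=1)
via_ios1 = Path("via IOS1", aigp=30, as_path_len=2)

if __name__ == "__main__":
    print(best_path([via_xr1, via_ios1], use_aigp=False).name)
    print(best_path([via_xr1, via_ios1], use_aigp=True).name)
```

Without AIGP the shorter AS path wins. With AIGP the lower accumulated IGP cost is compared first, so the other exit is chosen even though its AS path is longer.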

    No AIGP

    Take the following topology into consideration:
    [Figure: BGP AIGP topology]
    Assuming all link costs are the same, the shortest path for XR2 to get to IOS2 is via path XR1-XR4-IOS2. I’m going to ignore MED on XR2 for now.

    Quick relevant configs on XR1 and IOS1:

    interface GigabitEthernet0/0/0/1
     description Link to XR2
     ipv4 address 10.0.12.1 255.255.255.0
    !
    interface GigabitEthernet0/0/0/2
     description Link to XR3
     ipv4 address 10.0.13.1 255.255.255.0
    !
    prefix-set 20.20.20.20
      20.20.20.20/32
    end-set
    !
    route-policy PASS
      pass
    end-policy
    !
    route-policy IOS2_LOOPBACK
      if destination in 20.20.20.20 then
        done
      else
        drop
      endif
    end-policy
    !
    router isis 1
     is-type level-2-only
     net 49.0001.0000.0000.0001.00
     address-family ipv4 unicast
      metric-style wide
     !
     interface Loopback0
      address-family ipv4 unicast
      !
     !
     interface GigabitEthernet0/0/0/2
      address-family ipv4 unicast
      !
     !
    !
    router bgp 64512
     address-family ipv4 unicast
      redistribute isis 1 route-policy IOS2_LOOPBACK
     !
     neighbor 10.0.12.2
      remote-as 64513
      address-family ipv4 unicast
       route-policy PASS in
       route-policy PASS out
      !
     !
    !

    And the equivalent on IOS1:

    interface Loopback0
     ip address 10.10.10.10 255.255.255.255
     ip router isis 1
    !
    interface GigabitEthernet0/0
    !
    interface GigabitEthernet0/0.41
     encapsulation dot1Q 41
     ip address 10.0.41.1 255.255.255.0
     ip router isis 1
    !
    router isis 1
     net 49.0001.0000.0000.0010.00
     is-type level-2-only
     metric-style wide
    !
    router bgp 64512
     bgp log-neighbor-changes
     redistribute isis 1 level-2 route-map IOS2_LOOPBACK
     neighbor 10.0.21.2 remote-as 64513
    !
    ip prefix-list 20.20.20.20 seq 5 permit 20.20.20.20/32
    !
    route-map IOS2_LOOPBACK permit 10
     match ip address prefix-list 20.20.20.20
    !
    route-map IOS2_LOOPBACK deny 20

    XR2 should now have the 20.20.20.20/32 prefix twice. Let’s check the route that XR2 chose:

    RP/0/0/CPU0:XR2#show bgp ipv4 un 20.20.20.20
    Mon Aug 11 18:29:47.825 UTC
    BGP routing table entry for 20.20.20.20/32
    Versions:
      Process           bRIB/RIB  SendTblVer
      Speaker                  5           5
    Last Modified: Aug 11 18:24:57.101 for 00:04:50
    Paths: (2 available, best #1)
      Advertised to update-groups (with more than one peer):
        0.1
      Path #1: Received by speaker 0
      Advertised to update-groups (with more than one peer):
        0.1
      64512
        10.0.12.1 from 10.0.12.1 (1.1.1.1)
          Origin incomplete, metric 0, localpref 100, valid, external, best, group-best, import-candidate, import suspect
          Received Path ID 0, Local Path ID 1, version 5
          Origin-AS validity: not-found
      Path #2: Received by speaker 0
      Not advertised to any peer
      64512
        10.0.21.1 from 10.0.21.1 (10.10.10.10)
          Origin incomplete, metric 0, localpref 100, valid, external, import-candidate, import suspect
          Received Path ID 0, Local Path ID 0, version 0
          Origin-AS validity: not-found

    Currently it’s going the correct way. However, what happens if the IGP metric of XR1’s route to 20.20.20.20/32 is increased?

    router isis 1
    !
     interface GigabitEthernet0/0/0/2
      address-family ipv4 unicast
       metric 100

    XR2 still sees the best route via XR1:

    RP/0/0/CPU0:XR2#show route ipv4 20.20.20.20
    Mon Aug 11 18:31:34.958 UTC
    
    Routing entry for 20.20.20.20/32
      Known via "bgp 64513", distance 20, metric 0
      Tag 64512, type external
      Installed Aug 11 18:24:57.065 for 00:06:37
      Routing Descriptor Blocks
        10.0.12.1, from 10.0.12.1, BGP external
          Route metric is 0
      No advertising protos.

    AIGP

    In order to send AIGP, you need to ensure that the AIGP metric is being set in your route-policy, as well as turn on the feature under the neighbour address family. I’ll be doing this on XR1:

    route-policy IOS2_LOOPBACK
      if destination in 20.20.20.20 then
        set aigp-metric igp-cost
      else
        drop
      endif
    end-policy
    !
    router bgp 64512
     !
     neighbor 10.0.12.2
      address-family ipv4 unicast
       aigp

    AIGP has just been added to legacy IOS in version 15.4(3)T, which is a version I don’t have in my lab yet. Let’s take a look at the consequences of one neighbour setting this value and the other not.

    RP/0/0/CPU0:XR2#show route ipv4 20.20.20.20
    Mon Aug 11 18:42:42.952 UTC
    
    Routing entry for 20.20.20.20/32
      Known via "bgp 64513", distance 20, metric 120 (AIGP metric)
      Tag 64512, type external
      Installed Aug 11 18:35:09.393 for 00:07:33
      Routing Descriptor Blocks
        10.0.12.1, from 10.0.12.1, BGP external
          Route metric is 120

    IOS-XR is preferring the route with the AIGP metric set. You can see the metric value of 120 has been learned. It also sets the local route metric to 120. The update from IOS1 is not preferred, so it seems that a route without an AIGP value is seen as worse than a route with any AIGP value set.
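
    That behaviour lines up with how I read the AIGP comparison in RFC 7311: if any candidate carries the attribute, candidates without it are removed, and the lowest value wins among the rest. A minimal sketch of just that one step follows. The `Path` class and `aigp_step` function are hypothetical, the IOS1 next hop is taken from the earlier output, and the sketch ignores the IGP cost to the next hop as well as every other BGP tie-breaker:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    next_hop: str
    aigp: Optional[int] = None  # None means the AIGP attribute is absent


def aigp_step(paths: List[Path]) -> List[Path]:
    """Apply only the AIGP comparison; real BGP has many more tie-breakers."""
    with_aigp = [p for p in paths if p.aigp is not None]
    if not with_aigp:
        return paths  # nobody carries AIGP, so this step decides nothing
    lowest = min(p.aigp for p in with_aigp)
    return [p for p in with_aigp if p.aigp == lowest]


xr1 = Path("10.0.12.1", aigp=120)  # XR1 sets aigp-metric igp-cost
ios1 = Path("10.0.21.1")           # IOS1 sends no AIGP attribute
print([p.next_hop for p in aigp_step([xr1, ios1])])  # ['10.0.12.1']
```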

    I’m going to swap out IOS1 with another IOS-XR box. This new XR box will be advertising the route with the same metric as IOS1 currently is.
    [Figure: BGP AIGP 21]

    XR2 should now be seeing both AIGP values and choosing XR5 as the next-hop:

    RP/0/0/CPU0:XR2#show bgp ipv4 unicast 20.20.20.20/32
    Mon Aug 11 19:33:43.432 UTC
    BGP routing table entry for 20.20.20.20/32
    Versions:
      Process           bRIB/RIB  SendTblVer
      Speaker                  9           9
    Last Modified: Aug 11 19:33:33.101 for 00:00:10
    Paths: (2 available, best #2)
      Advertised to update-groups (with more than one peer):
        0.1
      Path #1: Received by speaker 0
      Not advertised to any peer
      64512
        10.0.12.1 from 10.0.12.1 (1.1.1.1)
          Origin incomplete, metric 0, localpref 100, aigp metric 120, valid, external, import suspect
          Received Path ID 0, Local Path ID 0, version 0
          Origin-AS validity: not-found
          Total AIGP metric 120
      Path #2: Received by speaker 0
      Advertised to update-groups (with more than one peer):
        0.1
      64512
        10.0.52.5 from 10.0.52.5 (5.5.5.5)
          Origin incomplete, metric 0, localpref 100, aigp metric 40, valid, external, best, group-best, import-candidate, import suspect
          Received Path ID 0, Local Path ID 1, version 9
          Origin-AS validity: not-found
          Total AIGP metric 40

    Once again, the local route metric has been set to match the AIGP metric:

    RP/0/0/CPU0:XR2#sh ip route 20.20.20.20
    Mon Aug 11 19:34:54.567 UTC
    
    Routing entry for 20.20.20.20/32
      Known via "bgp 64513", distance 20, metric 40 (AIGP metric)
      Tag 64512, type external
      Installed Aug 11 19:33:33.523 for 00:01:21
      Routing Descriptor Blocks
        10.0.52.5, from 10.0.52.5, BGP external
          Route metric is 40

    Segment Routing on IOS-XR

    Cisco has released some support for segment-routing on IOS-XR 5.2.0 so what better time to lab it up. I’ve got four IOS-XRv boxes running 5.2.0:

    RP/0/0/CPU0:XR1#sh ver | include XR
    Cisco IOS XR Software, Version 5.2.0[Default]

    Currently IS-IS is the only protocol with support in XR. There are drafts to get this working in both OSPFv2 and OSPFv3.

    Segment Routing?

    Segment routing is a huge topic. In the long run it’ll make it very easy for an SDN controller to force packets through the network in any way it wants. The draft says that it can use the existing MPLS data plane (aka labels) or the IPv6 data plane (header extensions). Right now support is for the MPLS data plane only. The nice thing here is that all devices that can currently switch based on labels should really only need a software upgrade to run segment routing in its current form.

    Currently, in order to populate the MPLS data plane with labels you need an MPLS control plane protocol to distribute those labels. With segment routing, those labels are distributed with the IGP. Your core is now simplified as it’s only running the IGP with no LDP or RSVP. Your core no longer needs to keep LDP or RSVP state at all.

    Traffic Engineering

    Take the following simple diagram into consideration:
    [Figure: SR 1]
    I’d like to use both paths to get from PE1 to PE2 for different traffic flows. This is possible with RSVP by creating multiple RSVP-TE tunnels:
    [Figure: SR 2]
    The above works perfectly fine, but those P routers need to keep state for each and every RSVP tunnel going through them. In segment routing, there is a concept of a node segment and an adjacency segment. There are also other segment types but I won’t go into those yet. With the MPLS data plane, each segment has a label. I can therefore force traffic to go over a certain segment by adding the segment label to the stack.
    [Figure: SR 3]
    In the above diagram, if I want PE1 to send to PE2 via the shortest path, it simply imposes the node segment of PE2 onto the packet and sends it on. Every router in the core knows what PE2’s node segment is and as such the packet is pushed through using only that single label. Note that standard MPLS PHP behaviour is still used:
    [Figure: SR 4]

    If I wanted to force traffic to PE2 to go over the P1-P2 link and then the P2-P3 link, I would stack the labels to ensure it went that way. It’s the ingress PE that decides this:
    [Figure: SR 5]
    PE1 has stacked the labels in such a way that it forces the packet over particular segments. The core does not need to hold any LSP state. It simply installs the labels it previously learned from the IGP.
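
    To make the stacking idea concrete, here is a minimal sketch of how a packet could be walked through the core using only per-router segment tables. Everything in it is an assumption for illustration: the router P4, the label values, and the `NODE_SID`, `NEXT_HOP`, `ADJ_SID` and `forward` names do not come from the diagrams, and PHP is not modelled:

```python
# Hypothetical topology and label values, purely for illustration.
NODE_SID = {"P1": 101, "P2": 102, "P3": 103, "P4": 104, "PE2": 200}

# IGP shortest-path next hop towards each node segment, per router.
NEXT_HOP = {
    "PE1": {101: "P1", 200: "P4"},
    "P3": {200: "PE2"},
    "P4": {200: "PE2"},
}

# Adjacency segments are local to the router that owns the link.
ADJ_SID = {"P1": {9012: "P2"}, "P2": {9023: "P3"}}


def forward(router, stack):
    """Walk a packet through the core and return the routers it visits."""
    stack = list(stack)
    path = [router]
    while stack:
        top = stack[0]
        if NODE_SID.get(router) == top:
            stack.pop(0)  # this router owns the node segment, so pop it
            continue
        if top in ADJ_SID.get(router, {}):
            stack.pop(0)  # adjacency segment: pop and cross that link
            router = ADJ_SID[router][top]
        else:
            router = NEXT_HOP[router][top]  # follow the IGP towards the node
        path.append(router)
    return path


print(forward("PE1", [200]))                   # ['PE1', 'P4', 'PE2']
print(forward("PE1", [101, 9012, 9023, 200]))  # ['PE1', 'P1', 'P2', 'P3', 'PE2']
```

    The tables are identical for both flows. Only the stack imposed by the ingress PE differs, which is why the core holds no per-tunnel state.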

    Configuration

    Segment Routing in 5.2.0 has been enabled, but at a preliminary level only. IS-IS is the only IGP supported, and only the MPLS data plane is supported. I can’t seem to find a way to advertise adjacency segments yet, only node segments. All of the above is fine for an MPLS L3VPN lab. I’ll be using the following topology:
    [Figure: SR 6]
    The CEs are running OSPFv2 and advertising their loopbacks into OSPF:

    interface Loopback0
     ip address 100.100.100.100 255.255.255.255
     ip ospf 1 area 0
    !
    interface GigabitEthernet0/0.11
     encapsulation dot1Q 11
     ip address 10.0.11.1 255.255.255.0
     ip ospf 1 area 0

    The PE config is pretty standard:

    vrf CUS1
     address-family ipv4 unicast
      import route-target
       100:1
      !
      export route-target
       100:1
      !
     !
    !
    router ospf CUS1
     vrf CUS1
      redistribute bgp 100
      area 0
       interface GigabitEthernet0/0/0/0.11
       !
      !
     !
    !
    router bgp 100
     address-family vpnv4 unicast
     !
     neighbor 4.4.4.4
      remote-as 100
      update-source Loopback0
      address-family vpnv4 unicast
      !
     !
     vrf CUS1
      rd 100:4
      address-family ipv4 unicast
       redistribute ospf CUS1
      !
     !
    !

    XR1 has a VPNv4 session with XR4 and is advertising the prefixes over it. Segment routing is now enabled under the core IGP, IS-IS:

    router isis 1
     is-type level-2-only
     net 49.0001.0000.0000.0001.00
     address-family ipv4 unicast
      metric-style wide
      segment-routing mpls
     !
     interface Loopback0
      address-family ipv4 unicast
       prefix-sid index 1000
      !
     !
     interface GigabitEthernet0/0/0/1
      address-family ipv4 unicast
      !
     !
     interface GigabitEthernet0/0/0/2
      address-family ipv4 unicast
      !
     !
    !

    For now you can only configure the node ID under the loopback interface. Once this is all done, I should have a labelled route to XR4’s loopback, without LDP or RSVP:

    RP/0/0/CPU0:XR1#show cef  4.4.4.4 | include labels
    Sun Aug 10 19:48:51.587 UTC
         local label 904000      labels imposed {904000}
         local label 904000      labels imposed {904000}
    
    RP/0/0/CPU0:XR1#show mpls int gigabitEthernet 0/0/0/1 detail
    Sun Aug 10 19:49:25.145 UTC
    Interface GigabitEthernet0/0/0/1:
            LDP labelling not enabled
            LSP labelling not enabled
            MPLS ISIS enabled
            MPLS enabled

    There are two labels as XR1 has two equal-cost paths to XR4. A quick traceroute will show the same label:

    RP/0/0/CPU0:XR1#traceroute 4.4.4.4
    Sun Aug 10 19:50:16.191 UTC
    
    Type escape sequence to abort.
    Tracing the route to 4.4.4.4
    
     1  10.0.12.2 [MPLS: Label 904000 Exp 0] 9 msec  0 msec  0 msec
     2  10.0.24.4 0 msec  *  0 msec
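
    The label 904000 looks like a base value plus the configured prefix-sid index. A minimal sketch of that arithmetic follows, assuming an SRGB base of 900000 and assuming XR4’s loopback uses prefix-sid index 4000. Neither value appears in the outputs above (only XR1’s index of 1000 is shown), and `SRGB_BASE` and `sid_label` are hypothetical names:

```python
# Assumption: base inferred from the 904000 label, not shown in any config.
SRGB_BASE = 900000


def sid_label(index: int, base: int = SRGB_BASE) -> int:
    """Return the MPLS label for a prefix-SID index within the label block."""
    return base + index


print(sid_label(1000))  # 901000, XR1's configured index
print(sid_label(4000))  # 904000, assuming XR4 uses index 4000
```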

    Note that L3VPN still uses an inner label, the service/VPN label. The outer transport label has been replaced with the segment routing label. A traceroute from CE1 to CE2 will confirm this:

    CE1#traceroute 200.200.200.200 so lo0 numeric
    Type escape sequence to abort.
    Tracing the route to 200.200.200.200
    VRF info: (vrf in name/id, vrf out name/id)
      1 10.0.11.10 1 msec 1 msec 1 msec
      2 10.0.12.2 [MPLS: Labels 904000/16001 Exp 0] 4 msec 3 msec 3 msec
      3 10.0.24.4 [MPLS: Label 16001 Exp 0] 3 msec 7 msec 3 msec
      4 10.0.42.2 4 msec *  4 msec

    Conclusions

    • Basic segment routing is incredibly easy to enable
    • I don’t see ISPs changing from RSVP-TE to SR anytime soon, but I think it will happen eventually
    • SDN is a great use case for SR, as the controller can inform PEs which segment labels to stack onto a packet as it ingresses the router
    • Perhaps even the host itself could send a packet with an SR stack imposed. Maybe that host has learnt this stack from the SDN controller? Time will tell

    OSPF Enhancements in recent IOS versions

    OSPFv3 Authentication Trailer

    In 2011 I wrote an article showing that in order to provide authenticated OSPFv3 neighbour sessions, you needed the security license on IOS.

    Manav Bhatia commented on that post stating they were working on an IETF standard to fix this. That draft became RFC6506 and then RFC7166.

    Cisco has added support for RFC7166 as of IOS 15.4(2)T and IOS-XE 3.11S

    Configuration is very quick and easy. Note that OSPFv3 authentication trailers do not support md5 according to the RFC. If you configure your key chain with md5, it will not work.
    [Figure: OSPFv3 AUTH]

    R1#sh run int gi0/0.12
    Building configuration...
    
    Current configuration : 125 bytes
    !
    interface GigabitEthernet0/0.12
     encapsulation dot1Q 12
     ipv6 address 2001:DB8:12:0:10:1:2:1/64
     ospfv3 1 ipv6 area 0
    end

    Standard interface config. I’ll now configure the key chain and ensure all area 0 adjacencies are authenticated:

    R1#sh run | sec key chain
    key chain AUTH
     key 1
      key-string RFC
      cryptographic-algorithm hmac-sha-512
    
    R1#sh run | sec router ospfv3
    router ospfv3 1
     router-id 1.1.1.1
     !
     address-family ipv6 unicast
      authentication mode strict
      area 0 authentication key-chain AUTH
     exit-address-family
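
    For a feel of what the hmac-sha-512 setting means, here is a minimal sketch of computing and checking an HMAC-SHA-512 digest over some packet bytes. It is only an illustration of the primitive: RFC7166 defines its own key preparation and padding, which are not reproduced here, and the `packet` bytes and the `verify` function are hypothetical:

```python
import hashlib
import hmac

key = b"RFC"  # the key-string from the key chain above
packet = b"hypothetical OSPFv3 packet bytes"  # placeholder, not a real packet

# The sender appends this digest to the packet as the trailer's auth data.
digest = hmac.new(key, packet, hashlib.sha512).digest()
print(len(digest))  # 64


def verify(key: bytes, packet: bytes, received: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    expected = hmac.new(key, packet, hashlib.sha512).digest()
    return hmac.compare_digest(expected, received)


print(verify(key, packet, digest))       # True
print(verify(b"wrong", packet, digest))  # False
```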

    Verify:

    R1#show ospfv3 interface
    GigabitEthernet0/0.12 is up, line protocol is up
      Link Local Address FE80::A8AA:11FF:FE11:1111, Interface ID 15
      Area 0, Process ID 1, Instance ID 0, Router ID 1.1.1.1
      Network Type BROADCAST, Cost: 1
      Cryptographic authentication enabled with strict key lifetime
        Sending SA: Key 1, Algorithm HMAC-SHA-512 - key chain AUTH
      Transmit Delay is 1 sec, State BDR, Priority 1
      Designated Router (ID) 2.2.2.2, local address FE80::A8AA:22FF:FE22:2222
      Backup Designated router (ID) 1.1.1.1, local address FE80::A8AA:11FF:FE11:1111
      Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
        Hello due in 00:00:06
      Graceful restart helper support enabled
      Index 1/1/1, flood queue length 0
      Next 0x0(0)/0x0(0)/0x0(0)
      Last flood scan length is 2, maximum is 2
      Last flood scan time is 0 msec, maximum is 0 msec
      Neighbor Count is 1, Adjacent neighbor count is 1
        Adjacent with neighbor 2.2.2.2  (Designated Router)
      Suppress hello for 0 neighbor(s)
    
    
    R1#show ospfv3 neighbor
    
              OSPFv3 1 address-family ipv6 (router-id 1.1.1.1)
    
    Neighbor ID     Pri   State           Dead Time   Interface ID    Interface
    2.2.2.2           1   FULL/DR         00:00:33    15              GigabitEthernet0/0.12

    Oddly, IOS-XR 5.2.0 still does not support this RFC. Only the previous IPSec authentication:

    RP/0/0/CPU0:XR6(config-ospfv3-ar)#authentication ?
      disable  Do not authenticate OSPFv3 packets
      ipsec    Use IPSec AH authentication

    OSPFv2 Multiarea Adjacency

    In 2012 I wrote another post explaining the problem of suboptimal routing in OSPFv2. RFC5185 was created to allow a single interface to be in multiple areas. At the time of writing that original post, this feature was only in Junos and IOS-XE. This has now been added to IOS 15.4(1)T. I recommend you read the above post first to understand the issue.

    I’ll use a similar topology to that original post. I’ve substituted R5 and R6 with IOS-XR boxes:
    [Figure: OSPF MA1]

    XR5 goes over the primary link, but XR6 goes over the backup:

    RP/0/0/CPU0:XR5#traceroute 4.4.4.4
    Thu Aug  7 21:39:21.188 UTC
    
    Type escape sequence to abort.
    Tracing the route to 4.4.4.4
    
     1  10.0.15.1 9 msec  0 msec  0 msec
     2  10.0.14.4 0 msec  *  0 msec
    
    
    RP/0/0/CPU0:XR6#traceroute 4.4.4.4
    Thu Aug  7 21:39:31.999 UTC
    
    Type escape sequence to abort.
    Tracing the route to 4.4.4.4
    
     1  10.0.26.2 0 msec  0 msec  0 msec
     2  10.0.24.4 0 msec  *  0 msec
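
    The suboptimal path comes down to OSPF’s route-type preference: an intra-area route always beats an inter-area route, whatever the cost. A minimal sketch of that ordering follows, assuming that is the cause here as described in the earlier post. The costs, the `RANK` table and the `best` function are hypothetical and not taken from the lab:

```python
# OSPF prefers routes by type first, and only then by cost.
RANK = {"intra-area": 0, "inter-area": 1, "external-1": 2, "external-2": 3}


def best(routes):
    """Pick the winning route: lowest type rank, then lowest cost."""
    return min(routes, key=lambda r: (RANK[r["type"]], r["cost"]))


# Hypothetical costs: the backup link is far more expensive.
before = [
    {"via": "backup link", "type": "intra-area", "cost": 100},
    {"via": "primary link", "type": "inter-area", "cost": 2},
]
print(best(before)["via"])  # backup link

# With a multi-area adjacency the primary path is also intra-area.
after = [
    {"via": "backup link", "type": "intra-area", "cost": 100},
    {"via": "primary link", "type": "intra-area", "cost": 2},
]
print(best(after)["via"])  # primary link
```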

    Configuration is pretty simple. Add the original area plus the second area to the interface needed:

    interface GigabitEthernet0/0.13
     encapsulation dot1Q 13
     ip address 10.0.13.1 255.255.255.0
     ip ospf multi-area 4
     ip ospf 1 area 0

    To verify:

    R1#show ip ospf 1 multi-area
    OSPF_MA0 is down, line protocol is down
      Primary Interface GigabitEthernet0/0.13, Area 4
      Interface ID 17
      MTU is 1500 bytes
      Interface DOWN as link is not P2P

    An interesting caveat: the interface needs to be in point-to-point mode for this to work:

    R1(config)#int gi0/0.13
    R1(config-subif)#ip ospf net point-to-point

    Once I’ve made the above changes on R1, R2, and R3:

    R1#show ip ospf 1 multi-area
    OSPF_MA0 is up, line protocol is up
      Primary Interface GigabitEthernet0/0.13, Area 4
      Interface ID 17
      MTU is 1500 bytes
      Neighbor Count is 1

    A traceroute from XR6 should now follow the path over the primary link:

    RP/0/0/CPU0:XR6#traceroute 4.4.4.4
    Thu Aug 7 21:55:58.302 UTC
    
    Type escape sequence to abort.
    Tracing the route to 4.4.4.4
    
     1  10.0.26.2 0 msec  0 msec  0 msec
     2  10.0.23.3 0 msec  0 msec  0 msec
     3  10.0.13.1 0 msec  0 msec  0 msec
     4  10.0.14.4 0 msec  *  0 msec

    OSPF Multi-Area Adjacency is one of those things that can fix some odd corner case topologies. I would not recommend it. The issue is that now R3 has a full area 4 and area 0 database. It’s also messy. Rather redesign your network!

    IOS-XR has had this feature since v3.4.1 – A quick config on XR6:

    RP/0/0/CPU0:XR6#sh run router ospf
    Thu Aug 7 22:05:25.833 UTC
    router ospf 1
     area 0
      interface GigabitEthernet0/0/0/0.26
       network point-to-point
      !
     !
     area 10
      multi-area-interface GigabitEthernet0/0/0/0.26
      !
     !
    !

    To verify on XR you need to look at the last few lines on a show ospf interface:

    RP/0/0/CPU0:XR6#show ospf interface gi0/0/0/0.26
    Thu Aug 7 22:05:53.841 UTC
    
    GigabitEthernet0/0/0/0.26 is up, line protocol is up
      Internet Address 10.0.26.6/24, Area 0
      Process ID 1, Router ID 10.0.26.6, Network Type POINT_TO_POINT, Cost: 1
      Transmit Delay is 1 sec, State POINT_TO_POINT, MTU 1500, MaxPktSz 1500
      Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
        Hello due in 00:00:07:158
      Index 1/1, flood queue length 0
      Next 0(0)/0(0)
      Last flood scan length is 1, maximum is 1
      Last flood scan time is 0 msec, maximum is 0 msec
      LS Ack List: current length 0, high water mark 16
      Neighbor Count is 1, Adjacent neighbor count is 1
        Adjacent with neighbor 2.2.2.2
      Suppress hello for 0 neighbor(s)
      Multi-area interface Count is 1
        Multi-Area interface exist in area 10 Neighbor Count is 1

    OSPFv3 Multiarea Adjacency

    IOS 15.4(2)T and IOS-XE 3.11S now have support for multi-area adjacency for OSPFv3.

    The config is identical to OSPFv2 so I’m not going to go over it here.

    That’s a wrap for today.

    Various networking ramblings from Dual CCIE #38070 (R&S, SP) and JNCIE-SP #2227

    © 2009-2014 Darren O'Connor All Rights Reserved