Category Archives: Fundamentals

Fundamentals – PMTUD – IPv4 & IPv6 – Part 1 of 2

One of IPv6’s features is that routers are no longer supposed to fragment packets. Instead, it’s up to the hosts on either end to work out the path MTU. This differs from IPv4, in which routers along the path can fragment a packet. Both IPv4 and IPv6 have a mechanism to work out the path MTU, which is what I’ll go over in this post. Instead of covering each separately, I’ll show the problem being solved and how the two differ when it comes to sending traffic.

I’ll be using the following topology in this post:
[Image: pmtu 11, topology diagram]

The problem

When you visit this blog, your browser requests a particular web page from my server. This request is usually quite small. My server needs to respond with the actual data: the images, words, plugins, style-sheets, etc. This data can be quite large. My server needs to break this stream of data into IP packets to send back to you. Each packet requires a few headers, so the most efficient way to send the data is to put as much of it as possible into as few packets as possible.

Between you and my server sit a load of different networks and hardware. There is no way for my server to know in advance the smallest MTU supported by all those devices along the path. Not only can this path change, but I have many thousands of readers in many different countries. In the topology above, the link between R2 and R4 has an MTU of 1400. None of the hosts are directly connected to that segment and so none of them know the MTU of the entire path.
[Image: pmtu 2]

PMTUD

Path MTU Discovery, RFC1191 for IPv4 and RFC1981 for IPv6, does exactly what the name suggests: it finds out the MTU of the path. The two RFCs have a number of similarities, but also a few key differences, which I’ll dig into.

Note – OS implementations of PMTUD can vary widely. I’ll be showing both Debian Linux server 7.6.0 and Windows Server 2012 in this post.

Both RFCs state that hosts should initially assume that the MTU across the entire path matches the first-hop MTU, i.e. the servers should assume that the path MTU matches the MTU of the link they are connected to. In this case both my Windows and Linux servers have a local MTU of 1500.
[Image: pmtu 3]

The link between R2 and R4 has an IP MTU of 1400. My servers need to figure out the path MTU in order to maximise the packet size without fragmentation.

  • IPv4
  • RFC1191 states:

    The basic idea is that a source host initially assumes that the PMTU of a path is the (known) MTU of its first hop, and sends all datagrams on that path with the DF bit set. If any of the datagrams are too large to be forwarded without fragmentation by some router along the path, that router will discard them and return ICMP Destination Unreachable messages with a code meaning “fragmentation needed and DF set” [7]. Upon receipt of such a message (henceforth called a “Datagram Too Big” message), the source host reduces its assumed PMTU for the path.

    In my example, the servers should assume that the path MTU is 1500. They should send packets back to the user using this MTU and with the Don’t Fragment (DF) bit set. R2’s link to R4 is not big enough, so R2 should drop the packet and return the correct ICMP message to my servers. Those servers should then send those packets again at a smaller size.
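The discovery loop described above can be sketched in a few lines of Python. This is only a toy model, not how any real stack implements it: the function name `discover_path_mtu` and the list of per-link MTUs are assumptions made for illustration, and the "ICMP message" is modelled simply as the next-hop MTU handed back to the sender.

```python
def discover_path_mtu(link_mtus):
    """Toy model of RFC 1191 style discovery over a list of link MTUs.

    link_mtus[0] is the sender's own first-hop MTU. Returns the discovered
    path MTU and the list of packet sizes the sender tried.
    """
    pmtu = link_mtus[0]            # initial assumption: first-hop MTU
    attempts = []
    while True:
        attempts.append(pmtu)
        # First link along the path that is too small for this packet size.
        bottleneck = next((m for m in link_mtus if m < pmtu), None)
        if bottleneck is None:
            return pmtu, attempts  # the packet fits on every link
        # The router in front of that link drops the packet and reports
        # the next-hop MTU; the sender lowers its assumed path MTU to it.
        pmtu = bottleneck


# Server link 1500, then 1500, then the 1400 link, then the client link.
print(discover_path_mtu([1500, 1500, 1400, 1500]))
```

With a single 1400-byte link in the path the sender needs two attempts: one at 1500 that is dropped, and one at 1400 that gets through.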

    I’m going to show Wireshark captures from the servers’ point of view. I’ll start with Windows.

    The first part is the regular TCP 3-way handshake to set up the session. These packets are very small, so they don’t run into the MTU limit:
    [Screenshot: 2014-08-25 12.37.40]
    The user then requests a file. The server responds with full size packets with the DF bit set. Those packets are dropped by R2, which sends back the required ICMP message:
    [Screenshot: 2014-08-25 12.39.49]

    Let’s dig a bit deeper into those packets. First, the full size packet from the server. Note that the DF bit has been set:
    [Screenshot: 2014-08-25 12.43.49]

    Second, the ICMP message sent from R2. This is an ICMP Type 3 Code 4 message. It states that the destination is unreachable and that fragmentation is required. Note that it also states the MTU of the next-hop. The Windows server can use this value to re-originate its packets with a lower MTU.

    All the rest of the packets in the capture are then sent at the lower MTU. Note that Wireshark shows the frame length including the 14-byte Ethernet header, hence the value of 1414:
    [Screenshot: 2014-08-25 12.49.11]

    RFC1191 states that a host should cache the lower MTU value. It also suggests that the value is cached for 10 minutes and that the timeout should be configurable. You can view the cached value on Windows, but it doesn’t show the timer. Perhaps a reader could let me know?
    [Screenshot: 2014-08-25 12.53.53]

    I’ll now do the same on my Debian server. First part is the 3-way handshake again:
    [Screenshot: 2014-08-26 1.27.02 pm]
    The server starts sending packets with an MTU of 1500:
    [Screenshot: 2014-08-26 1.28.48 pm]
    Which are dropped by R2, with ICMP messages sent back:
    [Screenshot: 2014-08-26 1.29.52 pm]
    The Debian server will cache that entry. Debian does show me the remaining cache time, in this case 584 seconds:
    [Screenshot: 2014-08-26 1.32.23 pm]

  • IPv6
  • RFC1981 goes over the finer details of how this works with IPv6. The majority of the document is identical to the RFC1191 version.

    When the Debian server responds, the packets have a size of 1514 on the wire as expected. Note however that there is no DF bit in IPv6 packets. This is a major difference between IPv4 and IPv6 right here. Routers CANNOT fragment IPv6 packets and hence there is no reason to explicitly state this in the packet. All IPv6 packets are non-fragmentable by routers in the path. I’ll go over what this means in depth later.
    [Screenshot: 2014-08-27 8.06.39 am]
    R2 cannot forward this packet and drops it. The message returned by R2 is still an ICMP message, but it’s a bit different to the IPv4 version:
    [Screenshot: 2014-08-27 8.10.56 am]

    This time the message is ‘Packet too big’ – Very easy to figure out what that means. The ICMP message will contain the MTU of the next-hop as expected:
    [Screenshot: 2014-08-27 8.14.02 am]

    The server will act on this message, cache the result, then send packets up to the required MTU:
    [Screenshot: 2014-08-27 8.17.30 am]
    [Screenshot: 2014-08-27 8.18.29 am]

    Windows Server 2012 has identical behaviour. To show the cache, simply view the IPv6 destination cache (netsh interface ipv6 show destinationcache) and you’re good to go.

    Problems

    So what could possibly go wrong? The above all looks good and works in the lab. The biggest issue is that both require those ICMP messages to come back to the sending host. There are a load of badly configured firewalls and ACLs out there dropping more ICMP than they are supposed to. Some people even drop ALL ICMP. There is another issue that I’ll go over in another blog post in the near future.

    In the above examples, if those ICMP messages don’t get back, the sending host will not adjust its MTU. If it continues to send large packets, the router with the smaller MTU will keep dropping them. All that traffic is blackholed. Smaller packets like requests will get through. Ping will even get through if echo-requests and echo-replies have been let through. You might even be able to see the beginnings of a web page, but the big content will not load.
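To make the black-hole behaviour concrete, here is a small Python sketch. The function `deliver` and its `icmp_reaches_sender` flag are hypothetical names for this illustration only; the model assumes every packet has the DF bit set (or is IPv6) and that the only feedback the sender can get is the ICMP error.

```python
def deliver(packet_size, link_mtus, icmp_reaches_sender):
    """Return (delivered, feedback) for one non-fragmentable packet."""
    for mtu in link_mtus:
        if packet_size > mtu:
            # The router drops the packet. The sender only learns the
            # next-hop MTU if the ICMP error survives the return path.
            return False, (mtu if icmp_reaches_sender else None)
    return True, None


path = [1500, 1400, 1500]
# Small packets (handshake, requests, pings) fit and are delivered.
print(deliver(60, path, icmp_reaches_sender=False))
# Full-size packets are dropped and the sender is never told why.
print(deliver(1500, path, icmp_reaches_sender=False))
# With ICMP allowed back, the sender would learn the 1400 limit.
print(deliver(1500, path, icmp_reaches_sender=True))
```

The middle case is the black hole: the packet is lost and the feedback is `None`, so the sender has no reason to reduce its packet size and simply retransmits at 1500.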

    On R1′s fa0/1 interface I’ll create this bad access list:

    R1#sh ip access-lists
    Extended IP access list BLOCK-ICMP
        10 permit icmp any any echo
        20 permit icmp any any echo-reply
        30 deny icmp any any
        40 permit ip any any
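As a rough illustration of why this list breaks PMTUD, the following Python sketch walks the four entries in order and returns the first match. It is a simplified model and not IOS behaviour in full: addresses are ignored (every entry uses "any any"), and the message labels such as "fragmentation-needed" are just names chosen for this example.

```python
# (sequence, action, protocol, ICMP message or None for "any message")
ACL = [
    (10, "permit", "icmp", "echo"),
    (20, "permit", "icmp", "echo-reply"),
    (30, "deny", "icmp", None),
    (40, "permit", "ip", None),
]


def evaluate(protocol, icmp_message=None):
    """Return (sequence, action) of the first entry that matches."""
    for seq, action, proto, message in ACL:
        proto_ok = proto == "ip" or proto == protocol
        message_ok = message is None or message == icmp_message
        if proto_ok and message_ok:
            return seq, action
    return None, "deny"  # implicit deny at the end of every ACL


print(evaluate("icmp", "echo"))                  # ping request
print(evaluate("icmp", "fragmentation-needed"))  # the PMTUD message
print(evaluate("tcp"))                           # ordinary data traffic
```

Pings and TCP are permitted, but the Type 3 Code 4 message falls through to entry 30 and is denied, which is exactly the case shown in the captures that follow.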

    From the client I can ping the host:
    [Screenshot: 2014-08-27 8.31.41 am]
    I can even open a text-based page from the server:
    [Screenshot: 2014-08-27 8.32.30 am]

    But try to download the file:
    [Screenshot: 2014-08-27 8.33.39 am]
    The initial 3-way handshake works fine, but nothing else happens. The Debian server is sending those packets, R2 is dropping them and informing the sender, but R1 drops those ICMP messages. You’ve now got a black hole. The same thing happens with IPv6, though of course the packet dropped is the Packet Too Big message.

    Workarounds

    The best thing to do is fix the problem. Unfortunately that’s not always possible. There are a few things that can be done to work through the problem of dropped ICMP packets.
    If you know the MTU value further down the line, you can use TCP MSS clamping. This causes the router to intercept TCP SYN packets and rewrite the TCP MSS. You need to take into account the size of the IP and TCP headers that are added on top of the MSS.

    R1#conf t
    Enter configuration commands, one per line.  End with CNTL/Z.
    R1(config)#int fa1/1
    R1(config-if)#ip tcp adjust-mss  1360
    R1(config-if)#end
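The value 1360 follows from subtracting the headers from the 1400-byte path MTU. A small Python sketch of that arithmetic, assuming plain headers with no IP options, no IPv6 extension headers and no TCP options (the helper name `clamp_mss` is made up for this example):

```python
TCP_HEADER = 20   # bytes, without TCP options
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header without extension headers


def clamp_mss(path_mtu, ip_header):
    """Largest TCP payload that fits in one packet of path_mtu bytes."""
    return path_mtu - ip_header - TCP_HEADER


print(clamp_mss(1400, IPV4_HEADER))  # value used in the config above
print(clamp_mss(1400, IPV6_HEADER))  # IPv6 leaves 20 bytes less
```

Under these assumptions the same arithmetic gives a smaller value for IPv6 because of its larger fixed header, so it is worth checking which value your platform applies to IPv6 traffic.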

    Note how the MSS value has been changed to 1360:
    [Screenshot: 2014-08-28 1.46.58 pm]

    I’ve tested with IOS 15.2(4)S2 and it also works with IPv6:
    [Screenshot: 2014-08-28 1.54.57 pm]

    The problem with this is that it’s a burden on the router configured. Your router might not even support this option. This also affects ALL TCP traffic going through that router. TCP clamping can work well for VPN tunnels, but it’s not a very scalable solution.

    Another workaround can be to get the router to disregard the DF bit and just let the routers fragment the packets:

    route-map CLEAR-DF permit 10
     set ip df 0
    !
    interface FastEthernet1/1
     ip address 192.168.4.1 255.255.255.0
     ip router isis
     ip policy route-map CLEAR-DF
     ipv6 address 2001:DB8:10:14::1/64
     ipv6 router isis

    The problem with this is that you’re placing burden on the router again. It’s also not at all efficient. Some firewalls also block fragments. Some routers might just drop fragmented packets.
    The biggest problem with this is that there is no DF bit to clear in IPv6. IPv6 packets will not be fragmented by routers. It has to be done by the host.

    End of Part One

    There is simply too much to cover in a single post. I’ll end this post here. Part two will be coming soon!

    The dangers of ignoring OSPF MTU

    Quite often I see ip ospf mtu-ignore configured when two routers have mismatched MTUs. This is bad. To demonstrate why, I’ll use the following simple topology:
    [Image: ospf MTU]

    Let’s create a simple area 0 point-to-point adjacency between the two routers and make R1’s MTU slightly larger. Then ignore the OSPF MTU check, otherwise the adjacency will not come up:

    R1
    interface GigabitEthernet1/0
     mtu 2000
     ip address 10.0.0.1 255.255.255.0
     ip ospf network point-to-point
     ip ospf mtu-ignore
     ip ospf 1 area 0
    

    The adjacency is fine as far as we can see:

    R2#sh ip ospf neighbor | beg Nei
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    1.2.3.255         0   FULL/  -        00:00:30    10.0.0.1        GigabitEthernet1/0

    Now I’ve added 255 loopback interfaces onto R1 and put them all into OSPF by using network 0.0.0.0 0.0.0.0 area 0. This means all those loopback interfaces will be part of the type 1 LSA originated by R1. What happens though?

    interface Loopback1
     ip address 1.2.3.1 255.255.255.255
    !
    interface Loopback2
     ip address 1.2.3.2 255.255.255.255
    !
    interface Loopback3
    !
    [etc etc etc]
    !
    router ospf 1
     network 0.0.0.0 0.0.0.0 area 0

    At first, nothing seems wrong. But take a look at the database from R1 and R2′s perspective. Remember the database should be identical.

    R1#sh ip ospf database
    
                OSPF Router with ID (1.2.3.255) (Process ID 1)
    
                    Router Link States (Area 0)
    
    Link ID         ADV Router      Age         Seq#       Checksum Link count
    1.2.3.255       1.2.3.255       33          0x80000005 0x00767B 257
    10.0.0.2        10.0.0.2        100         0x80000011 0x00C816 2
    R2#sh ip ospf database
    
                OSPF Router with ID (10.0.0.2) (Process ID 1)
    
                    Router Link States (Area 0)
    
    Link ID         ADV Router      Age         Seq#       Checksum Link count
    1.2.3.255       1.2.3.255       130         0x80000004 0x00856D 2
    10.0.0.2        10.0.0.2        128         0x80000011 0x00C816 2

    R1 sees a link count of 257 for R1’s router LSA, while R2 only sees 2. This can be confirmed by seeing that R2 doesn’t have any OSPF routes to R1’s loopbacks:

    R2#sh ip route ospf | beg Gate
    Gateway of last resort is not set
    
    

    If you wait a while you’ll see LOADING on the adjacency too. And eventually the adjacency resets and tries again:

    R2#sh ip ospf neighbor
    
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    1.2.3.255         0   LOADING/  -     00:00:32    10.0.0.1        GigabitEthernet1/0
    R2#
    *Feb  6 19:11:26.958: %OSPF-5-ADJCHG: Process 1, Nbr 1.2.3.255 on GigabitEthernet1/0 from LOADING to DOWN, Neighbor Down: Too many retransmissions

    So what exactly is happening? If you check Wireshark you’ll see the issue straight away:
    [Image: ospfmtu1]
    [Image: ospfmtu2]
    OSPF does not do any sort of path MTU discovery. R1 is attempting to send a type1 LSA and it’s using an MTU size of 2000. R2 cannot receive that large a frame and so those fragments get dropped. R2 never acknowledges the LSA as it’s not receiving anything, and eventually that causes the adjacency to reset. This then continues over and over.

    This could be hidden though. Let’s stop R1 advertising all those addresses via its type 1 LSA and instead redistribute the links into OSPF:

    R1(config)#router ospf 1
    R1(config-router)#no network 0.0.0.0 0.0.0.0 area 0
    R1(config-router)#redistribute connected subnet
    R1(config-router)#end
    
    R2#sh ip ospf neighbor
    
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    1.2.3.255         0   FULL/  -        00:00:38    10.0.0.1        GigabitEthernet1/0
    R2#sh ip route ospf | beg Gate
    Gateway of last resort is not set
    
          1.0.0.0/32 is subnetted, 255 subnets
    O E2     1.2.3.1 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.2 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.3 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.4 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.5 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.6 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.7 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.8 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    O E2     1.2.3.9 [110/20] via 10.0.0.1, 00:01:14, GigabitEthernet1/0
    [etc]

    This time it works, even with a mismatched MTU. Why? A type 5 LSA only has space for a single prefix. This means that R1 originates 255 type 5 LSAs, each of which is much, much smaller than 2000 bytes. This means that the LSA updates are not bigger than 1500 bytes and so R2 never drops any of those packets.

    A router only originates a single router LSA, and that single LSA has to contain all the interface addresses for that router that are enabled for OSPF in the area. If a router has 1000 interfaces, well that’s a large type 1.
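The size difference can be estimated from the LSA layouts in RFC 2328. The Python sketch below is a back-of-the-envelope calculation, assuming no additional TOS metrics on any link (the constant and function names are made up for this example):

```python
LSA_HEADER = 20        # common LSA header, bytes
ROUTER_LSA_FIXED = 4   # flags plus the number-of-links field
ROUTER_LINK = 12       # link ID, link data, type, #TOS, metric
EXTERNAL_LSA = 36      # header + mask + metric, forwarding address, tag


def router_lsa_length(links):
    """Length in bytes of a router LSA describing `links` links."""
    return LSA_HEADER + ROUTER_LSA_FIXED + ROUTER_LINK * links


print(router_lsa_length(2))    # what R2 still holds for R1
print(router_lsa_length(257))  # R1's real router LSA
print(EXTERNAL_LSA)            # each redistributed prefix
```

By this estimate the 257-link router LSA is 3108 bytes, well over what a 1500-byte interface can receive in one frame, while each external LSA stays at 36 bytes no matter how many prefixes are redistributed.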

    You can see the individual type5s in the database itself:

    R2#sh ip ospf database | beg External
                    Type-5 AS External Link States
    
    Link ID         ADV Router      Age         Seq#       Checksum Tag
    1.2.3.1         1.2.3.255       390         0x80000001 0x00692A 0
    1.2.3.2         1.2.3.255       390         0x80000001 0x005F33 0
    1.2.3.3         1.2.3.255       390         0x80000001 0x00553C 0
    1.2.3.4         1.2.3.255       390         0x80000001 0x004B45 0
    1.2.3.5         1.2.3.255       390         0x80000001 0x00414E 0
    1.2.3.6         1.2.3.255       390         0x80000001 0x003757 0
    1.2.3.7         1.2.3.255       390         0x80000001 0x002D60 0
    1.2.3.8         1.2.3.255       390         0x80000001 0x002369 0
    1.2.3.9         1.2.3.255       390         0x80000001 0x001972 0
    1.2.3.10        1.2.3.255       390         0x80000001 0x000F7B 0
    [etc etc]

    Out of interest, type 3, type 4, type 5, and type 7 LSAs all follow the ‘single address per LSA’ model and as such should never be that big. A type 2 LSA will expand to reflect the number of routers on the layer 2 segment, but I find it hard to believe that there would be over 100 routers on a single segment (though it’s not impossible).

    By the way, I wrote a separate post explaining a few more in-depth spf considerations when it comes to type1s and type5s over here: OSPF – Type 1 LSA vs Type 5 LSA (passive vs redistribute)

    So there you have it. Ignore the MTU at your own peril. It’s better to fix the MTU issue than to just ignore it. It might not be an issue ‘now’, but as your router LSA grows in size you will suddenly run into a problem.

    Next-Hop IP. What does it actually mean?

    I’ve seen so much confusion about the fundamentals of IP routing that I thought it would be good to write something like this.

     

    If packets are getting sent to a default gateway, or next-hop, whatever – is that packet actually addressed to that next-hop? Well, it depends on what layer we are talking about. From a layer 3 perspective it’s never actually addressed to that next-hop, i.e. the source and destination IP addresses NEVER change, unless you have some sort of device doing NAT.

     

    The next-hop address is merely an address that you are hoping this packet goes towards. If the next-hop is on the same subnet as the source address, then an ARP resolution will take place and that packet will get sent to the gateway’s MAC address. The destination IP has not changed at all.

     

    If the next-hop is NOT on the same subnet, that packet will travel to the local gateway and then onwards. That gateway might have another idea of where that packet should go as, again, the packet is not actually addressed to that next-hop at layer 3.

    This also means that a next-hop address could even be an address that doesn’t exist. As long as the packet travels in the right direction you are good to go.

     

    Let’s take the following diagram as an example:

    [Image: next hop1]

    R2 and R3 are running OSPF with each other. R3 has a loopback of 3.3.3.3 advertised into OSPF so R2 knows how to get there. R1 and R2 are not running OSPF. R2 is advertising the R1 and R2 link into OSPF as a stub network.

    The actual subnets used are 10.12.12.0/24 and 10.23.23.0/24.
    R1:

    interface FastEthernet0/0
     ip address 10.12.12.1 255.255.255.0
    

    R2:

    interface FastEthernet0/0
     ip address 10.12.12.2 255.255.255.0
     ip ospf 1 area 0
    !
    interface FastEthernet0/1
     ip address 10.23.23.2 255.255.255.0
     ip ospf network point-to-point
     ip ospf 1 area 0
    !
    router ospf 1
     passive-interface FastEthernet0/0

    R3:

    interface Loopback0
     ip address 3.3.3.3 255.255.255.255
     ip ospf 1 area 0
    !
    interface FastEthernet0/0
     ip address 10.23.23.3 255.255.255.0
     ip ospf network point-to-point
     ip ospf 1 area 0

    On R1 I now set an IP route to 3.3.3.3/32 with a next-hop of 192.168.1.1, which does not exist anywhere. I then create another route to 192.168.1.1/32 with a next-hop of 10.12.12.2:

    ip route 3.3.3.3 255.255.255.255 192.168.1.1
    ip route 192.168.1.1 255.255.255.255 10.12.12.2

    Let’s have a look at the route table on R1:

    R1#sh ip route 3.3.3.3
    Routing entry for 3.3.3.3/32
      Known via "static", distance 1, metric 0
      Routing Descriptor Blocks:
      * 192.168.1.1
          Route metric is 0, traffic share count is 1
    
    R1#sh ip route 192.168.1.1
    Routing entry for 192.168.1.1/32
      Known via "static", distance 1, metric 0
      Routing Descriptor Blocks:
      * 10.12.12.2
          Route metric is 0, traffic share count is 1

    As expected everything works fine:

    R1#ping 3.3.3.3
    
    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
    !!!!!
    Success rate is 100 percent (5/5), round-trip min/avg/max = 13/23/29 ms

    However the above example is probably not the best example as CEF would already have worked out all the recursive routing needed:

    R1#sh ip cef 3.3.3.3
    3.3.3.3/32, version 7, epoch 0, cached adjacency 10.12.12.2
    0 packets, 0 bytes
      via 192.168.1.1, 0 dependencies, recursive
        next hop 10.12.12.2, FastEthernet0/0 via 192.168.1.1/32
        valid cached adjacency

    But it does prove that the packet is able to get to 3.3.3.3 even with a next-hop that does not actually exist anywhere.
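The recursive lookup that CEF has pre-computed can be imitated in a few lines of Python. This is only a sketch of the idea and not how CEF is implemented: the `CONNECTED` and `STATIC` dictionaries mirror R1’s connected subnet and the two static routes above, and the function name `resolve` is an assumption for this example.

```python
import ipaddress

CONNECTED = {"10.12.12.0/24": "FastEthernet0/0"}
STATIC = {"3.3.3.3/32": "192.168.1.1", "192.168.1.1/32": "10.12.12.2"}


def resolve(destination, max_depth=8):
    """Follow static next-hops until one sits on a connected subnet."""
    target = ipaddress.ip_address(destination)
    for _ in range(max_depth):
        for prefix, interface in CONNECTED.items():
            if target in ipaddress.ip_network(prefix):
                return str(target), interface
        # Longest-prefix match among the static routes.
        matches = [p for p in STATIC if target in ipaddress.ip_network(p)]
        if not matches:
            return None
        best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
        target = ipaddress.ip_address(STATIC[best])
    return None  # gave up: next-hops never reached a connected subnet


print(resolve("3.3.3.3"))
```

The lookup walks 3.3.3.3 to 192.168.1.1 to 10.12.12.2 and stops there because that address is on a connected subnet, which matches the "next hop 10.12.12.2, FastEthernet0/0" line in the CEF output. The non-existent 192.168.1.1 is only ever used as a lookup key.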

    Let’s now make a more complicated scenario:
    [Image: next hop2]

    My subnet addressing is similar to before. This time R5 is advertising its loopback interface into OSPF. R1 is NOT running OSPF.

    R1 has a static route that says to get to 5.5.5.5 it needs to send it to R3. It then has a route to R3 via R2.

    R1#sh ip route 5.5.5.5
    Routing entry for 5.5.5.5/32
      Known via "static", distance 1, metric 0
      Routing Descriptor Blocks:
      * 10.23.23.3
          Route metric is 0, traffic share count is 1
    

    But what happens when I traceroute from R1?

    R1#traceroute 5.5.5.5
    Type escape sequence to abort.
    Tracing the route to 5.5.5.5
      1 10.12.12.2 52 msec 76 msec 4 msec
      2 10.24.24.4 80 msec 72 msec 68 msec
      3 10.45.45.5 188 msec *  84 msec

    The traffic gets to my destination, but it did not ever get near R3. Why is that?
    Have a look at R2:

    R2#sh ip route 5.5.5.5
    Routing entry for 5.5.5.5/32
      Known via "static", distance 1, metric 0
      Routing Descriptor Blocks:
      * 10.24.24.4
          Route metric is 0, traffic share count is 1

    I put a static route on R2 to send traffic for 5.5.5.5 via R4, not R3.

    So all in all, really simple. What I’m merely trying to show is that in regular routing, each and every hop along the way will make its own independent decision on how to get to the destination. When that packet gets to R2, R2 has no idea that R1 actually wanted to go via R3, because that next-hop is not encoded anywhere. All R1 is doing is sending traffic ‘towards’ the next-hop. R2 will make its own decision as it only sees the destination address of 5.5.5.5.
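Hop-by-hop forwarding can be modelled as each router consulting only its own table. In the Python sketch below the `TABLES` dictionary is a simplification made up for this example: it records, per router, the neighbour that actually receives packets for 5.5.5.5, and R4’s entry towards R5 is an assumption based on the traceroute output.

```python
# Per-router view: destination -> neighbour the packet is handed to.
TABLES = {
    "R1": {"5.5.5.5": "R2"},  # R1 "wants" R3, but R2 is its only exit
    "R2": {"5.5.5.5": "R4"},  # R2's own static route points at R4
    "R4": {"5.5.5.5": "R5"},  # assumed: R4 reaches R5 directly
}


def trace(source, destination, owner, ttl=16):
    """Return the routers a packet visits on its way to `owner`."""
    path = [source]
    current = source
    for _ in range(ttl):
        if current == owner:
            return path
        # Each hop looks only at the destination address, nothing else.
        current = TABLES[current][destination]
        path.append(current)
    return path  # TTL exhausted, e.g. in a routing loop


print(trace("R1", "5.5.5.5", "R5"))
```

R3 never appears in the result because nothing in the packet carries R1’s intended next-hop; only the destination is looked up at each hop.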

    This behaviour explains routing loops and the problem of traffic getting dropped inside an AS running BGP.

    Restricting users to only view parts of the SNMP tree – Cisco

    It’s well known that you can give your customer read-only access to the SNMP tree, but are you sure you want to give them that much information? Even though they can’t change anything, they are able to extract the full configuration, the full routing table and much much more.

    As a test I set up SNMP read-only access to a Cisco box I have and ran a full snmpwalk on it. I extracted over 8MB worth of text data, including full routing tables, ARP tables, OSPF tables, etc.

    Not only that, but while I was running the walk my device CPU was sitting pretty high:

    Router#sh proc cpu sorted
    CPU utilization for five seconds: 33%/3%; one minute: 76%; five minutes: 54%
    PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
    210      121148      106996       1132 15.11% 44.65% 25.56%   0 SNMP ENGINE
    107       70240      213991        328  7.35% 12.99% 11.51%   0 IP SNMP

    Walking the entire SNMP tree also took almost 5 minutes.

    So do you really want your customer to know that much? And secondly, do you really want your customer’s monitoring system polling your devices for everything while your device sits with high CPU all the time?

    I was testing with a few views this morning and came up with the following:

    snmp-server view RESTRICT iso included
    snmp-server view RESTRICT at.* excluded
    snmp-server view RESTRICT ip.* excluded
    snmp-server view RESTRICT ospf.* excluded
    snmp-server community [community] view RESTRICT RO [acl]

    When I polled using this community it took less than 5 seconds and gave me pretty much all the information I would want to give the customer. Be sure to restrict the protocol you’re actually using. I have restricted OSPF above.
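To see which objects such a view lets through, here is a rough Python model. It is a sketch under assumptions, not IOS code: the view names are replaced by their standard numeric OIDs under mib-2 (at = 3, ip = 4, ospf = 14), and the rule that the longest matching subtree wins is a simplification of how SNMP views are evaluated.

```python
# (subtree OID, included?) mirroring the RESTRICT view above
VIEW = [
    ("1", True),                 # iso included
    ("1.3.6.1.2.1.3", False),    # at excluded
    ("1.3.6.1.2.1.4", False),    # ip excluded
    ("1.3.6.1.2.1.14", False),   # ospf excluded
]


def in_view(oid):
    """True if the OID is visible; the longest matching subtree decides."""
    oid_parts = oid.split(".")
    best_len, allowed = 0, False
    for subtree, included in VIEW:
        parts = subtree.split(".")
        if oid_parts[:len(parts)] == parts and len(parts) > best_len:
            best_len, allowed = len(parts), included
    return allowed


print(in_view("1.3.6.1.2.1.1.5.0"))       # sysName: visible
print(in_view("1.3.6.1.2.1.2.2.1.10.1"))  # ifInOctets: visible
print(in_view("1.3.6.1.2.1.4.21.1.1"))    # ipRouteDest: hidden
print(in_view("1.3.6.1.2.1.14.1.1.0"))    # ospfRouterId: hidden
```

Interface counters and system information stay visible, while the routing, ARP and OSPF tables are cut out, which is where most of the walk time and output came from.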

    Out of interest, an snmpwalk on my edge BGP router gives me a text file of 0.5GB!

    OSPF – Type 1 LSA vs Type 5 LSA (passive vs redistribute)

    In my OSPF database blog entry here: http://mellowd.co.uk/ccie/?p=1999 – I mentioned that Type3, Type5 and Type7 LSAs are not very memory efficient. Each and every prefix needs a separate LSA, while with a Type1, multiple prefixes can be advertised.

    So it stands to reason that perhaps Type1 LSAs are always better? Anything that reduces memory load in large topologies is good, right? Well, not always…

    Consider the following topology:

    [Image: type1 type5]

    Granted, this is not a ‘large topology’, but the fundamentals are still the same. R1 and R2 are both OSPF speakers in Area 0 (yellow), and both are linked to switches. Let’s pretend that each of these routers actually has 5 connections to each switch. Now there are two ways we can get these 5 subnets each into OSPF. If we put them in Area 0, but make them passive, they’ll be part of the Type1 LSA that each router originates. We can also redistribute them into OSPF, which will create a separate Type5 for each subnet. What behaviour can we see for each?

    Let’s start by creating 5 subinterfaces on each router, and then running OSPF passive on all of them.

    This is the config on R1:

    interface FastEthernet0/1
     no ip address
    interface FastEthernet0/1.10
     encapsulation dot1Q 10
     ip address 10.10.10.10 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.20
     encapsulation dot1Q 20
     ip address 20.20.20.20 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.30
     encapsulation dot1Q 30
     ip address 30.30.30.30 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.40
     encapsulation dot1Q 40
     ip address 40.40.40.40 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.50
     encapsulation dot1Q 50
     ip address 50.50.50.50 255.255.255.0
     ip ospf 1 area 0
    !
    router ospf 1
     log-adjacency-changes
     passive-interface default
     no passive-interface FastEthernet0/0

    R2:

    interface FastEthernet0/1
     no ip address
    interface FastEthernet0/1.10
     encapsulation dot1Q 10
     ip address 11.11.11.11 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.20
     encapsulation dot1Q 20
     ip address 21.21.21.21 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.30
     encapsulation dot1Q 30
     ip address 31.31.31.31 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.40
     encapsulation dot1Q 40
     ip address 41.41.41.41 255.255.255.0
     ip ospf 1 area 0
    interface FastEthernet0/1.50
     encapsulation dot1Q 50
     ip address 51.51.51.51 255.255.255.0
     ip ospf 1 area 0
    !
    router ospf 1
     log-adjacency-changes
     passive-interface default
     no passive-interface FastEthernet0/0

    Let’s have a look at the database on R1:

    R1#sh ip ospf database
    
                OSPF Router with ID (10.12.12.1) (Process ID 1)
    
                    Router Link States (Area 0)
    
    Link ID         ADV Router      Age         Seq#       Checksum Link count
    10.12.12.1      10.12.12.1      0           0x8000000B 0x0029E7 7
    10.12.12.2      10.12.12.2      154         0x8000000B 0x00FE01 7
    R1#

    Nice and neat. Only 2 Type1 LSAs as expected. If we dig into R2′s Type1 we can see:

    R1#sh ip ospf database router 10.12.12.2
    
                OSPF Router with ID (10.12.12.1) (Process ID 1)
    
                    Router Link States (Area 0)
    
      LS age: 222
      Options: (No TOS-capability, DC)
      LS Type: Router Links
      Link State ID: 10.12.12.2
      Advertising Router: 10.12.12.2
      LS Seq Number: 8000000B
      Checksum: 0xFE01
      Length: 108
      Number of Links: 7
    
        Link connected to: another Router (point-to-point)
         (Link ID) Neighboring Router ID: 10.12.12.1
         (Link Data) Router Interface address: 10.12.12.2
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
        Link connected to: a Stub Network
         (Link ID) Network/subnet number: 10.12.12.0
         (Link Data) Network Mask: 255.255.255.0
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
        Link connected to: a Stub Network
         (Link ID) Network/subnet number: 51.51.51.0
         (Link Data) Network Mask: 255.255.255.0
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
        Link connected to: a Stub Network
         (Link ID) Network/subnet number: 41.41.41.0
         (Link Data) Network Mask: 255.255.255.0
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
        Link connected to: a Stub Network
         (Link ID) Network/subnet number: 31.31.31.0
         (Link Data) Network Mask: 255.255.255.0
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
        Link connected to: a Stub Network
         (Link ID) Network/subnet number: 21.21.21.0
         (Link Data) Network Mask: 255.255.255.0
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
        Link connected to: a Stub Network
         (Link ID) Network/subnet number: 11.11.11.0
         (Link Data) Network Mask: 255.255.255.0
          Number of TOS metrics: 0
           TOS 0 Metrics: 10

    All of R2′s networks are advertised in this single LSA. Nice and simple.

    Let’s remove the interface OSPF config and instead redistribute the fa0/1 subinterfaces into OSPF and see what we get. Let’s first look at the OSPF database:

    R1#sh ip ospf database
    
                OSPF Router with ID (10.12.12.1) (Process ID 1)
    
                    Router Link States (Area 0)
    
    Link ID         ADV Router      Age         Seq#       Checksum Link count
    10.12.12.1      10.12.12.1      65          0x8000000E 0x00CE83 2
    10.12.12.2      10.12.12.2      38          0x8000000E 0x00C28D 2
    
                    Type-5 AS External Link States
    
    Link ID         ADV Router      Age         Seq#       Checksum Tag
    10.10.10.0      10.12.12.1      64          0x80000001 0x0069F5 0
    11.11.11.0      10.12.12.2      37          0x80000001 0x003F1C 0
    20.20.20.0      10.12.12.1      64          0x80000001 0x00FF41 0
    21.21.21.0      10.12.12.2      37          0x80000001 0x00D567 0
    30.30.30.0      10.12.12.1      64          0x80000001 0x00968C 0
    31.31.31.0      10.12.12.2      37          0x80000001 0x006CB2 0
    40.40.40.0      10.12.12.1      64          0x80000001 0x002DD7 0
    41.41.41.0      10.12.12.2      38          0x80000001 0x0003FD 0
    50.50.50.0      10.12.12.1      66          0x80000001 0x00C323 0
    51.51.51.0      10.12.12.2      38          0x80000001 0x009949 0

    That’s a lot more entries than we had last time. Let’s have a look at the Router LSA and one of the External LSAs:

    R1#sh ip ospf database router 10.12.12.2
    
                OSPF Router with ID (10.12.12.1) (Process ID 1)
    
                    Router Link States (Area 0)
    
      Routing Bit Set on this LSA
      LS age: 84
      Options: (No TOS-capability, DC)
      LS Type: Router Links
      Link State ID: 10.12.12.2
      Advertising Router: 10.12.12.2
      LS Seq Number: 8000000E
      Checksum: 0xC28D
      Length: 48
      AS Boundary Router
      Number of Links: 2
    
        Link connected to: another Router (point-to-point)
         (Link ID) Neighboring Router ID: 10.12.12.1
         (Link Data) Router Interface address: 10.12.12.2
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
        Link connected to: a Stub Network
         (Link ID) Network/subnet number: 10.12.12.0
         (Link Data) Network Mask: 255.255.255.0
          Number of TOS metrics: 0
           TOS 0 Metrics: 10
    
    R1#sh ip ospf database external 51.51.51.0
    
                OSPF Router with ID (10.12.12.1) (Process ID 1)
    
                    Type-5 AS External Link States
    
      Routing Bit Set on this LSA
      LS age: 107
      Options: (No TOS-capability, DC)
      LS Type: AS External Link
      Link State ID: 51.51.51.0 (External Network Number )
      Advertising Router: 10.12.12.2
      LS Seq Number: 80000001
      Checksum: 0x9949
      Length: 36
      Network Mask: /24
            Metric Type: 2 (Larger than any link state path)
            TOS: 0
            Metric: 20
            Forward Address: 0.0.0.0
            External Route Tag: 0

    We see exactly what we expected, but the database is now a bit bloated. So perhaps it’s better to put external interfaces into OSPF and run them as passive? Well, let’s take a closer look at something else. Let’s keep the Type5 LSAs and check the SPF statistics on R2:

    R2#sh ip ospf statistics
    
                OSPF Router with ID (10.12.12.2) (Process ID 1)
    
      Area 0: SPF algorithm executed 23 times

    At this very moment the SPF algorithm has executed 23 times. Let’s shut 2 of R1’s subinterfaces and see what happens on R2.

    R1#conf t
    Enter configuration commands, one per line.  End with CNTL/Z.
    R1(config)#int fa0/1.10
    R1(config-subif)#shut
    R1(config-subif)#int fa0/1.20
    R1(config-subif)#shut
    R2#sh ip ospf statistics
    
                OSPF Router with ID (10.12.12.2) (Process ID 1)
    
      Area 0: SPF algorithm executed 23 times

    No change in the SPF calculation. Let’s now move everything back into Type1s again and do the same.

    Everything has been changed back, so let’s take a baseline reading on R2 before touching anything:

    R2#sh ip ospf statistics
    
                OSPF Router with ID (10.12.12.2) (Process ID 1)
    
      Area 0: SPF algorithm executed 31 times
    R1#conf t
    Enter configuration commands, one per line.  End with CNTL/Z.
    R1(config)#int fa0/1.10
    R1(config-subif)#shut
    R1(config-subif)#int fa0/1.20
    R1(config-subif)#shut

    What do we now see on R2?

    R2#sh ip ospf statistics
    
                OSPF Router with ID (10.12.12.2) (Process ID 1)
    
      Area 0: SPF algorithm executed 33 times

    I shut the same 2 interfaces, and this time SPF ran twice, once for each interface going down. What’s going on here?

    If a router originates a Type1 LSA covering all of its connected interfaces, then each time one of those interfaces flaps it needs to originate the entire Type1 LSA again, showing the removal of the prefix on that interface. Each time a Type1 LSA (and Type2 for that matter) is flooded into an area, all routers in the area need to run their SPF algorithm. A Type5 describes a prefix that is external to the OSPF domain, so OSPF speakers in the area simply believe whatever the ASBR tells them. The next-hop for these external routes points towards the ASBR (or the ABR leading to it), and SPF has already been run to find a route to that router. A change to a Type5 therefore only needs the external routes to be recalculated, not the whole SPF tree. This is part of OSPF’s distance-vector behaviour once it gets outside the local area.

    So now we see advantages and disadvantages to both methods. Type1s keep the database small and clean, while Type5s allow SPF to run far less often when externally facing interfaces go up and down.
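
    As a rough sketch of the two methods compared above, the config on R1 might look something like the following. The interface fa0/1.10, process ID 1 and area 0 come from the output above. The interface-level ip ospf 1 area 0 command and the unfiltered redistribute connected subnets are assumptions for illustration, as the exact config isn’t shown here.

    ```
    ! Option A (assumed): keep the subinterface inside OSPF so its prefix
    ! rides in the Type1 LSA, but don't form adjacencies on it
    router ospf 1
     passive-interface FastEthernet0/1.10
    !
    interface FastEthernet0/1.10
     ip ospf 1 area 0
    !
    ! Option B (assumed): take the subinterface out of OSPF and
    ! redistribute it instead, so its prefix is advertised as a Type5 LSA
    interface FastEthernet0/1.10
     no ip ospf 1 area 0
    !
    router ospf 1
     redistribute connected subnets
    ```

    Option B as written redistributes connected networks generally, so you’d usually want a route-map to limit it to the interfaces you intend.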

    You can tweak OSPF with Incremental SPF (iSPF). This allows the OSPF process to recalculate only the affected portions of the SPF tree when a Type1 or Type2 is flooded into the area. SPF still needs to be run, but at least in a much more optimised way.
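
    On Cisco IOS the incremental calculation described above is switched on under the OSPF process with the ispf command. A minimal sketch, assuming the same process ID 1 used throughout this post, and bearing in mind that whether the command is available depends on your platform and release:

    ```
    ! Enable incremental SPF for OSPF process 1
    router ospf 1
     ispf
    ```

    It’s enabled per router, so you’d apply it on each router where you want the optimisation.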