I’ve seen so much confusion about the fundamentals of IP routing that I thought it would be good to write something like this.
If a packet is sent to a default gateway, or next-hop, whatever you call it, is that packet actually addressed to that next-hop? Well, it depends on what layer we are talking about. From a layer 3 perspective it’s never actually addressed to that next-hop. That is, the source and destination IP addresses NEVER change, unless some device along the path is doing NAT.
The next-hop address is merely an address that you are hoping the packet goes towards. If the next-hop is on the same subnet as the source address, then an ARP resolution will take place and the packet will be sent to the gateway’s MAC address. The destination IP has not changed at all.
If the next-hop is NOT on the same subnet, the packet will travel to the local gateway and then onwards. That gateway might have another idea of where the packet should go because, again, the packet is not actually addressed to that next-hop at layer 3.
This also means that a next-hop address could even be an address that doesn’t exist. As long as the packet travels in the right direction you are good to go.
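To make the layer 2 versus layer 3 distinction concrete, here is a minimal Python sketch of what happens to the addresses as a frame crosses a router. The MAC addresses, the `Frame` class, the `forward` helper and the destination 192.0.2.3 are all made up for illustration; only the 10.12.12.x and 10.23.23.x addresses come from the example that follows.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


# Hypothetical ARP results: next-hop IP -> MAC address of that neighbour.
ARP = {
    "10.12.12.2": "aa:aa:aa:00:00:02",
    "10.23.23.3": "aa:aa:aa:00:00:03",
}


def forward(frame: Frame, out_mac: str, next_hop_ip: str) -> Frame:
    """Re-encapsulate for the next link: only the layer 2 addresses change."""
    return replace(frame, src_mac=out_mac, dst_mac=ARP[next_hop_ip])


# The sender resolves its next-hop 10.12.12.2 with ARP and uses that MAC.
# 192.0.2.3 is a placeholder destination, not an address from the lab.
first_hop = Frame(
    src_mac="aa:aa:aa:00:00:01",
    dst_mac=ARP["10.12.12.2"],
    src_ip="10.12.12.1",
    dst_ip="192.0.2.3",
)

# The router at 10.12.12.2 forwards towards its own next-hop, 10.23.23.3.
second_hop = forward(first_hop, out_mac="aa:aa:aa:00:01:02", next_hop_ip="10.23.23.3")

print(first_hop)
print(second_hop)
```

The next-hop IP is only ever used as a key for the ARP lookup. It never appears in the frame itself, which is why the source and destination IP stay the same on both links.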
Let’s take the following diagram as an example:
R2 and R3 are running OSPF with each other. R3 has a loopback of 126.96.36.199 advertised into OSPF so R2 knows how to get there. R1 and R2 are not running OSPF. R2 is advertising the R1 and R2 link into OSPF as a stub network.
The actual subnets used are 10.12.12.0/24 and 10.23.23.0/24
R1:
interface FastEthernet0/0
 ip address 10.12.12.1 255.255.255.0
R2:
interface FastEthernet0/0
 ip address 10.12.12.2 255.255.255.0
 ip ospf 1 area 0
!
interface FastEthernet0/1
 ip address 10.23.23.2 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
!
router ospf 1
 passive-interface FastEthernet0/0
R3:
interface Loopback0
 ip address 188.8.131.52 255.255.255.255
 ip ospf 1 area 0
!
interface FastEthernet0/0
 ip address 10.23.23.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
On R1 I now set an IP route to 184.108.40.206/32 with a next-hop of 192.168.1.1, which does not exist anywhere. I then create another route to 192.168.1.1/32 with a next-hop of 10.12.12.2:
ip route 220.127.116.11 255.255.255.255 192.168.1.1
ip route 192.168.1.1 255.255.255.255 10.12.12.2
Let’s have a look at the route table on R1:
R1#sh ip route 18.104.22.168
Routing entry for 22.214.171.124/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 192.168.1.1
      Route metric is 0, traffic share count is 1
R1#sh ip route 192.168.1.1
Routing entry for 192.168.1.1/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.12.12.2
      Route metric is 0, traffic share count is 1
As expected everything works fine:
R1#ping 126.96.36.199
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 188.8.131.52, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 13/23/29 ms
However the above example is probably not the best example as CEF would already have worked out all the recursive routing needed:
R1#sh ip cef 184.108.40.206
220.127.116.11/32, version 7, epoch 0, cached adjacency 10.12.12.2
0 packets, 0 bytes
  via 192.168.1.1, 0 dependencies, recursive
    next hop 10.12.12.2, FastEthernet0/0 via 192.168.1.1/32
    valid cached adjacency
But it does prove that the packet is able to get to 18.104.22.168 even with a next-hop that does not actually exist anywhere.
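The recursive lookup that CEF has pre-computed can be sketched in a few lines of Python. This is only a toy model of the idea, not how IOS implements it: the `ROUTES` table and the `resolve` function are hypothetical, and 192.0.2.3 is a placeholder standing in for R3's loopback.

```python
import ipaddress

# Toy model of R1's routing table.
# 192.0.2.3 is a placeholder for R3's loopback address.
ROUTES = {
    ipaddress.ip_network("10.12.12.0/24"): ("connected", "FastEthernet0/0"),
    ipaddress.ip_network("192.0.2.3/32"): ("static", ipaddress.ip_address("192.168.1.1")),
    ipaddress.ip_network("192.168.1.1/32"): ("static", ipaddress.ip_address("10.12.12.2")),
}


def resolve(dest, max_depth=8):
    """Follow next-hops until one lands on a directly connected subnet."""
    addr = ipaddress.ip_address(dest)
    for _ in range(max_depth):
        matches = [net for net in ROUTES if addr in net]
        if not matches:
            raise LookupError(f"no route to {addr}")
        # Longest-prefix match: the most specific route wins.
        best = max(matches, key=lambda net: net.prefixlen)
        kind, value = ROUTES[best]
        if kind == "connected":
            # addr is on a local subnet, so it can be resolved with ARP.
            return addr, value
        addr = value  # recurse on the static route's next-hop
    raise RuntimeError("next-hop recursion too deep")


next_hop, interface = resolve("192.0.2.3")
print(next_hop, interface)
```

The lookup passes through 192.168.1.1 purely as a key into the table. Nothing ever has to own that address, it only has to resolve to something that is directly reachable.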
My subnet addressing is similar to before. This time R5 is advertising its loopback interface into OSPF. R1 is NOT running OSPF.
R1 has a static route that says to get to 22.214.171.124 it needs to send it to R3. It then has a route to R3 via R2.
R1#sh ip route 126.96.36.199
Routing entry for 188.8.131.52/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.23.23.3
      Route metric is 0, traffic share count is 1
But what happens when I traceroute from R1?
R1#traceroute 184.108.40.206
Type escape sequence to abort.
Tracing the route to 220.127.116.11
  1 10.12.12.2 52 msec 76 msec 4 msec
  2 10.24.24.4 80 msec 72 msec 68 msec
  3 10.45.45.5 188 msec * 84 msec
The traffic gets to my destination, but it did not ever get near R3. Why is that?
Have a look at R2:
R2#sh ip route 18.104.22.168
Routing entry for 22.214.171.124/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.24.24.4
      Route metric is 0, traffic share count is 1
I put a static route on R2 to send traffic for 126.96.36.199 via R4, not R3.
So all in all it’s really simple. What I’m trying to show is that in regular routing, each and every hop along the way makes its own independent decision on how to get to the destination. When the packet gets to R2, R2 has no idea that R1 actually wanted to go via R3, because that next-hop is not encoded anywhere in the packet. All R1 is doing is sending traffic ‘towards’ the next-hop. R2 will make its own decision, as it only sees the destination address of 188.8.131.52.
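The same hop-by-hop behaviour can be sketched as a small Python simulation. Everything here is a simplified assumption: each router's table is reduced to "which neighbour do I hand this destination to", 192.0.2.5 is a placeholder for R5's loopback, and the R4 entry assumes R4 reaches R5 directly (as the traceroute suggests).

```python
# Placeholder for R5's loopback address.
DEST = "192.0.2.5"

# Per-router forwarding decisions: destination -> neighbour the packet is handed to.
TABLES = {
    "R1": {DEST: "R2"},  # static route names R3, but R3 is only reachable via R2
    "R2": {DEST: "R4"},  # R2's own static route points at R4
    "R4": {DEST: "R5"},  # assumed: R4 reaches R5's loopback directly
}


def trace(start, dest):
    """Walk the path a packet takes, consulting only each hop's own table."""
    path = [start]
    node = start
    while dest in TABLES.get(node, {}):
        node = TABLES[node][dest]
        if node in path:
            raise RuntimeError(f"routing loop at {node}")
        path.append(node)
    return path


path = trace("R1", DEST)
print(" -> ".join(path))
```

`trace` only ever passes the destination from hop to hop, never R1's intended next-hop, so R2 has nothing to base its decision on except its own table. That is why R3 never shows up in the path.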
This behaviour explains how routing loops form, and why traffic can get dropped inside an AS running BGP.