The subinterface is an invaluable tool in your belt. I pretty much use subinterfaces whenever I connect routers to each other via ethernet, even if initially they are only going to have a single point-to-point connection to each other.
Why? Well a subinterface allows you to stick a dot1q header on your frames. Just like running a trunk between 2 switches, it allows you to run multiple logical links between 2 routers. As noted above, even if I only have a single point-to-point connection between 2 routers I still use them. In the past I’ve had to change or add to a design. If I was running just the physical interfaces and needed to change it, I would require downtime and possibly even a site visit to reconfigure the CPE device. If I had used subinterfaces from the start, I could just add another subinterface to the mix. There are ways to get around this though.
You can also mix regular interfaces and subinterfaces together on the same interface. You can also tag a subinterface as the native vlan. Let’s look at a couple of different examples and see what we see.
Let’s just start with a simple physical interface setup.
R1:
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
R2:
interface FastEthernet0/0
 ip address 192.168.1.2 255.255.255.0
R1#ping 192.168.1.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/25/68 ms
Let’s say now that for whatever reason we need to run another logical link between these 2 routers. Let’s also say that R2 happens to be on the other side of the country and we need to do this now. Well I did note above that there is a workaround for this as you can run both a physical and subinterface on the router at the same time. Let’s leave the above config in and add this:
R1:
interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 10.0.0.1 255.255.255.0
R2:
interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 10.0.0.2 255.255.255.0
R1#ping 10.0.0.2 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 28/28/28 ms
R1#ping 192.168.1.2 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 24/24/24 ms
Both still work. Now you could log onto R2 via the 10.0.0.2 link and remove the 192.168.1.0/24 address if you wanted without worry. Wireshark shows that traffic going over the physical interface has no dot1q tag, while traffic on the subinterface carries a dot1q tag of 10.
Now let’s mix it up a bit. I can mark a subinterface as the native vlan, i.e. its traffic carries no vlan tag:
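To make the difference on the wire concrete, here is a minimal Python sketch of the two frame formats. It is only an illustration, not anything IOS runs: the function names are my own, and the priority and DEI bits of the tag are assumed to be zero. The MAC addresses are the ones that show up in the ARP tables later in this post.

```python
import struct
from typing import Optional

TPID_DOT1Q = 0x8100  # EtherType value that announces an 802.1Q tag


def build_frame(dst_mac: bytes, src_mac: bytes, ethertype: int,
                payload: bytes, vlan: Optional[int] = None) -> bytes:
    """Return an Ethernet header plus payload, optionally with one dot1q tag."""
    header = dst_mac + src_mac
    if vlan is not None:
        # The tag is 4 bytes: the TPID, then a 16-bit TCI whose low 12 bits
        # hold the VLAN ID (priority and DEI bits are assumed to be 0 here).
        header += struct.pack("!HH", TPID_DOT1Q, vlan & 0x0FFF)
    header += struct.pack("!H", ethertype)
    return header + payload


def vlan_of(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a frame, or None if the frame is untagged."""
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_DOT1Q:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF


r1_mac = bytes.fromhex("c2003e780000")
r2_mac = bytes.fromhex("c2013e780000")

physical = build_frame(r2_mac, r1_mac, 0x0800, b"ping")               # Fa0/0
subif_10 = build_frame(r2_mac, r1_mac, 0x0800, b"ping", vlan=10)      # Fa0/0.10

print(vlan_of(physical), vlan_of(subif_10), len(subif_10) - len(physical))
```

The only difference between the two frames is the 4-byte tag sitting between the source MAC and the EtherType, which is exactly what the capture shows.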
R1:
interface FastEthernet0/0.20
 encapsulation dot1Q 20 native
 ip address 188.8.131.52 255.255.255.0
R2:
interface FastEthernet0/0
 ip address 184.108.40.206 255.255.255.0 secondary
 ip address 192.168.1.2 255.255.255.0
R1#ping 220.127.116.11 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 18.104.22.168, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 24/24/24 ms
On R1 I created a subinterface with the native vlan. On R2 I added a secondary IP address. Traffic for the 22.214.171.124/24 range goes over the wire without a dot1q tag. Why not just use a secondary address? Well you can stick a subinterface into a vrf, while you can’t do that with a secondary address. Your IGP behavior also changes somewhat.
I don’t quite like doing the above though. I like to have layer 2 separation between networks that should be separated. However it proves the point that a native subinterface’s outbound traffic is in exactly the same format as a physical interface’s: just a standard ethernet layer 2 header with no dot1q tag.
There is another advantage to using a native subinterface. If I want to shut down the link for 126.96.36.199/24, on R1 I can shut down the subinterface. All other subinterfaces stay up. If I do this on R2 however, all subinterfaces go down:
R1(config)#int fa0/0.20
R1(config-subif)#shut
R1(config-subif)#end
R1#
*Mar 1 00:33:31.023: %SYS-5-CONFIG_I: Configured from console by console
R1#sh ip int brief
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0        192.168.1.1     YES manual up                    up
FastEthernet0/0.10     10.0.0.1        YES manual up                    up
FastEthernet0/0.20     188.8.131.52    YES manual administratively down down
R2(config)#int fa0/0
R2(config-if)#shut
R2(config-if)#end
R2#
*Mar 1 00:34:51.955: %SYS-5-CONFIG_I: Configured from console by console
R2#sh ip int
*Mar 1 00:34:53.347: %LINK-5-CHANGED: Interface FastEthernet0/0, changed state to administratively down
*Mar 1 00:34:54.347: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to down
R2#sh ip int brief
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0        192.168.1.2     YES manual administratively down down
FastEthernet0/0.10     10.0.0.2        YES manual administratively down down
So while the native subinterface and physical interface send traffic the same way, I can shut a native subinterface without affecting any other subinterface, while shutting down a physical interface shuts down all the subinterfaces attached to it.
Now let’s get deeper…
R1:
interface FastEthernet0/0.10
 encapsulation dot1Q 10 native
 ip address 10.10.10.1 255.255.255.0
R2:
interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 10.10.10.2 255.255.255.0
Here I have R1 configured with a native subinterface, while R2 is configured with a dot1q tag of 10. You would expect there would be no communication between them, but you would be incorrect. I’m going to turn on debug ip packet on both routers to see who sees what.
R1#ping 10.10.10.2 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.10.10.2, timeout is 2 seconds:
*Mar 1 00:08:07.763: IP: tableid=0, s=10.10.10.1 (local), d=10.10.10.2 (FastEthernet0/0.10), routed via FIB
*Mar 1 00:08:07.763: IP: s=10.10.10.1 (local), d=10.10.10.2 (FastEthernet0/0.10), len 100, sending.
Success rate is 0 percent (0/1)
R1#sh arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.10.1              -   c200.3e78.0000  ARPA   FastEthernet0/0.10
Internet  10.10.10.2              2   c201.3e78.0000  ARPA   FastEthernet0/0.10
Let’s try the other way around now.
R2#ping 10.10.10.1 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.10.10.1, timeout is 2 seconds:
*Mar 1 00:11:20.807: IP: tableid=0, s=10.10.10.2 (local), d=10.10.10.1 (FastEthernet0/0.10), routed via RIB
*Mar 1 00:11:20.811: IP: s=10.10.10.2 (local), d=10.10.10.1 (FastEthernet0/0.10), len 100, sending
*Mar 1 00:11:20.815: IP: s=10.10.10.2 (local), d=10.10.10.1 (FastEthernet0/0.10), len 100, encapsulation failed.
Success rate is 0 percent (0/1)
R2#sh arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.10.1              0   Incomplete      ARPA
Internet  10.10.10.2              -   c201.3e78.0000  ARPA   FastEthernet0/0.10
Somehow R1 learned R2’s MAC address so communication was there, but it seems it’s only one way. Let’s clear the ARP cache on both and debug ARP to see what happens.
The first thing you notice when you clear the ARP cache is that the local router will send a gratuitous ARP out to let everyone else know about it. Let’s see what we see on R1 when we clear R2’s cache:
R2#clear arp
R2#
*Mar 1 00:18:45.135: ARP: flushing ARP entries for all interfaces
*Mar 1 00:18:45.143: IP ARP: sent rep src 10.10.10.2 c201.3e78.0000, dst 10.10.10.2 ffff.ffff.ffff FastEthernet0/0.10
R1#
*Mar 1 00:18:31.231: IP ARP: rcvd rep src 10.10.10.2 c201.3e78.0000, dst 10.10.10.2 FastEthernet0/0.10
What about the other way?
R1#clear arp
R1#
*Mar 1 00:20:53.847: ARP: flushing ARP entries for all interfaces
*Mar 1 00:20:53.855: IP ARP: sent rep src 10.10.10.1 c200.3e78.0000, dst 10.10.10.1 ffff.ffff.ffff FastEthernet0/0.10
R2#
*Mar 1 00:21:07.831: IP ARP rep filtered src 10.10.10.1 c200.3e78.0000, dst 10.10.10.1 ffff.ffff.ffff wrong cable, interface FastEthernet0/0
R2 complains that this ARP is coming in on ‘the wrong cable’ while wireshark shows the gratuitous ARP being sent without a dot1q header as expected.
What can we get from this? Well if you have a native subinterface, it will accept both untagged AND tagged frames in the vlan, vlan 10 in my example above. However it will only SEND untagged traffic. Good to know. It can also explain why you sometimes see an ARP entry on one side, while not on the other.
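The send and receive rules I just described can be written down as a tiny model. This is a Python sketch of the behaviour observed in the lab above, not of how IOS is implemented, and the class and method names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subinterface:
    name: str
    vlan: int
    native: bool = False

    def send_tag(self) -> Optional[int]:
        """Tag placed on outgoing frames; None means the frame is untagged."""
        return None if self.native else self.vlan

    def accepts(self, tag: Optional[int]) -> bool:
        """Whether an incoming frame carrying this tag lands on the subinterface."""
        if tag == self.vlan:
            return True                      # tagged with our vlan: always accepted
        return self.native and tag is None   # untagged: only the native one takes it


r1 = Subinterface("R1 Fa0/0.10", vlan=10, native=True)
r2 = Subinterface("R2 Fa0/0.10", vlan=10)

r1_to_r2 = r2.accepts(r1.send_tag())  # untagged frame hits a tagged-only subinterface
r2_to_r1 = r1.accepts(r2.send_tag())  # vlan 10 frame hits the native subinterface
print("R1 -> R2 delivered:", r1_to_r2)
print("R2 -> R1 delivered:", r2_to_r1)
```

In the lab, R1's untagged frames that R2's subinterface refuses end up on R2's physical interface instead, which is where the 'wrong cable' message comes from. R2's tagged frames do reach R1, which is why R1 has an ARP entry for R2 but not the other way around.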
I’ve seen so much confusion about the fundamentals of IP routing that I thought it would be good to write something like this.
If packets are getting sent to a default gateway, or next-hop, whatever – is that packet actually addressed to that next-hop? Well, it depends on what layer we are talking about. From a layer 3 perspective it’s never actually addressed to that next-hop, i.e. the source and destination IP addresses NEVER change, unless you have some sort of device doing NAT.
The next-hop address is merely an address that you are hoping that this packet goes towards. If the next-hop is on the same subnet as the source address, then an ARP resolution will take place and that packet will get sent to the gateway’s MAC address. The destination IP has not changed at all.
If the next-hop is NOT on the same subnet, that packet will travel to the local gateway and then onwards. That gateway might have another idea where that packet should go as, again, the packet is not actually addressed to that next-hop at layer 3.
This also means that a next-hop address could even be an address that doesn’t exist. As long as the packet travels in the right direction you are good to go.
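As a rough sketch of the layer 2 versus layer 3 point, here is what a sender does with a packet once it has picked a next-hop. This is illustrative Python, not router code: the ARP cache and its MAC address are hypothetical, and the IP addresses are the link addresses used in the example that follows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    dst_mac: str  # layer 2: rewritten at every hop
    src_ip: str   # layer 3: untouched unless something is doing NAT
    dst_ip: str   # layer 3: untouched unless something is doing NAT


# Hypothetical ARP cache on the sender; the MAC address is made up.
ARP_CACHE = {"10.12.12.2": "c201.3e78.0000"}


def encapsulate(src_ip: str, dst_ip: str, next_hop: str) -> Frame:
    """Address the frame to the next-hop's MAC and leave the IP header alone."""
    return Frame(dst_mac=ARP_CACHE[next_hop], src_ip=src_ip, dst_ip=dst_ip)


frame = encapsulate("10.12.12.1", "10.23.23.3", next_hop="10.12.12.2")
print(frame)
```

The next-hop address only ever influences which MAC address goes into the frame. It never appears in the packet itself.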
Let’s take the following diagram as an example:
R2 and R3 are running OSPF with each other. R3 has a loopback of 184.108.40.206 advertised into OSPF so R2 knows how to get there. R1 and R2 are not running OSPF. R2 is advertising the R1 and R2 link into OSPF as a stub network.
The actual subnets used are 10.12.12.0/24 and 10.23.23.0/24
R1:
interface FastEthernet0/0
 ip address 10.12.12.1 255.255.255.0
R2:
interface FastEthernet0/0
 ip address 10.12.12.2 255.255.255.0
 ip ospf 1 area 0
!
interface FastEthernet0/1
 ip address 10.23.23.2 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
!
router ospf 1
 passive-interface FastEthernet0/0
R3:
interface Loopback0
 ip address 220.127.116.11 255.255.255.255
 ip ospf 1 area 0
!
interface FastEthernet0/0
 ip address 10.23.23.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
On R1 I now set an IP route to 18.104.22.168/32 with a next-hop of 192.168.1.1, which does not exist anywhere. I then create another route to 192.168.1.1/32 with a next-hop of 10.12.12.2:
ip route 22.214.171.124 255.255.255.255 192.168.1.1
ip route 192.168.1.1 255.255.255.255 10.12.12.2
Let’s have a look at the route table on R1:
R1#sh ip route 126.96.36.199
Routing entry for 188.8.131.52/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 192.168.1.1
      Route metric is 0, traffic share count is 1
R1#sh ip route 192.168.1.1
Routing entry for 192.168.1.1/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.12.12.2
      Route metric is 0, traffic share count is 1
As expected everything works fine:
R1#ping 184.108.40.206
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 220.127.116.11, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 13/23/29 ms
However the above example is probably not the best example as CEF would already have worked out all the recursive routing needed:
R1#sh ip cef 18.104.22.168
22.214.171.124/32, version 7, epoch 0, cached adjacency 10.12.12.2
0 packets, 0 bytes
  via 192.168.1.1, 0 dependencies, recursive
    next hop 10.12.12.2, FastEthernet0/0 via 192.168.1.1/32
    valid cached adjacency
But it does prove that the packet is able to get to 126.96.36.199 even with a next-hop that does not actually exist anywhere.
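The recursive lookup that CEF has pre-computed can be sketched in a few lines of Python. This is a simplified model and not how CEF is built internally. The table mirrors R1's two static routes plus its connected link, and 203.0.113.3 is a documentation address I am using as a stand-in for R3's loopback.

```python
import ipaddress

# R1's table: prefix -> ("connected", interface) or ("via", next-hop address).
ROUTES = {
    ipaddress.ip_network("10.12.12.0/24"): ("connected", "FastEthernet0/0"),
    ipaddress.ip_network("192.168.1.1/32"): ("via", "10.12.12.2"),
    ipaddress.ip_network("203.0.113.3/32"): ("via", "192.168.1.1"),  # stand-in for R3's loopback
}


def lookup(ip: str):
    """Longest-prefix match of one address against ROUTES."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        raise LookupError("no route to " + ip)
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


def resolve(ip: str, max_depth: int = 8):
    """Follow next-hops until one sits on a connected network."""
    hop = ip
    for _ in range(max_depth):
        kind, value = lookup(hop)
        if kind == "connected":
            return hop, value  # the address to ARP for, and the exit interface
        hop = value
    raise RuntimeError("recursion too deep resolving " + ip)


print(resolve("203.0.113.3"))
```

The resolver ends on 10.12.12.2 out of FastEthernet0/0, the same adjacency the CEF output reports. Nothing ever has to own 192.168.1.1; it only needs to resolve to something reachable.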
My subnet addressing is similar to before. This time R5 is advertising its loopback interface into OSPF. R1 is NOT running OSPF.
R1 has a static route that says to get to 188.8.131.52 it needs to send it to R3. It then has a route to R3 via R2.
R1#sh ip route 184.108.40.206
Routing entry for 220.127.116.11/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.23.23.3
      Route metric is 0, traffic share count is 1
But what happens when I traceroute from R1?
R1#traceroute 18.104.22.168
Type escape sequence to abort.
Tracing the route to 22.214.171.124
  1 10.12.12.2 52 msec 76 msec 4 msec
  2 10.24.24.4 80 msec 72 msec 68 msec
  3 10.45.45.5 188 msec * 84 msec
The traffic gets to my destination, but it did not ever get near R3. Why is that?
Have a look at R2:
R2#sh ip route 126.96.36.199
Routing entry for 188.8.131.52/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.24.24.4
      Route metric is 0, traffic share count is 1
I put a static route on R2 to send traffic for 184.108.40.206 via R4, not R3.
So all in all really simple. What I’m trying to show is that in regular routing, each and every hop along the way makes its own independent decision on how to get to the destination. When that packet gets to R2, R2 has no idea that R1 actually wanted to go via R3, because that next-hop is not encoded anywhere in the packet. All R1 is doing is sending traffic ‘towards’ the next-hop. R2 will make its own decision as it only sees the destination address of 220.127.116.11.
This behaviour explains how routing loops form, and why traffic can get dropped inside an AS running BGP.
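Hop-by-hop forwarding is easy to model: every router consults only its own table, keyed on the destination. The Python sketch below is a toy, with each router's decision already reduced to the neighbour it hands the packet to. The table contents are my reading of the outputs above, and R4's entry is assumed from the traceroute.

```python
# For the destination (R5's loopback), the neighbour each router forwards to.
TABLES = {
    "R1": "R2",  # R1's static route names R3, but R3 is only reachable via R2
    "R2": "R4",  # R2's own static route points at R4, not R3
    "R4": "R5",  # assumed from the third traceroute hop
}


def trace(tables, start, owner, max_hops=16):
    """Follow each router's own decision until the packet reaches the owner."""
    path = [start]
    while path[-1] != owner:
        if len(path) > max_hops:
            raise RuntimeError("possible routing loop: " + " -> ".join(path))
        path.append(tables[path[-1]])
    return path


print(" -> ".join(trace(TABLES, "R1", "R5")))
```

R3 never appears in the path because nothing in the packet tells R2 what R1 had in mind. Swap R4's entry to point back at R2 and the same function raises on a loop, since each router is still making a perfectly valid local decision.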
Ok, it took a bit longer than I wanted it to but here we go.
You can now reach www.mellowd.co.uk/ccie via both IPv4 and IPv6 natively. Hooray.
And it only makes sense to get an address with ccie in it? Sure!
Name:      mellowd.co.uk
Addresses: 2001:a08:da2::cc1e
           18.104.22.168
A couple of things to note. I had to enable listening on IPv6 in my web server software as well as sort my ip6tables out. ICMP is a lot more important in IPv6 than it is in IPv4, since neighbour discovery and path MTU discovery both depend on it, so watch that.
Not all service providers have blanket coverage of an entire country. When a service provider gives service to a customer, more than likely the customer will be using a line from an external carrier of the ISP’s choosing.
Metro Ethernet is becoming more and more common these days and it does give us as designers a lot of flexibility. However each time a customer purchases a leased line, that requires another port in your core. That’s fine if the circuit is a nice gig, but quite often a lot of offices will have anything from 2Mb to 1Gb. Do you really want to waste your core gig ports on a 2Mb circuit? Not really.
A lot of carriers are now offering aggregated ethernet links. Essentially this means that each customer site has a separate port (of course) but these are all aggregated when the carrier hands off to us. We get a single link carrying a bunch of customer circuits. Now you can sell hundreds of 2Mb circuits and only use a single port.
But how do we separate traffic then? Well you’ll come to an agreement with the carrier and each circuit ordered will have a vlan tag on the core side. This means customer 1 site 1’s traffic will arrive on vlan 1000 and customer 2 site 1’s traffic will arrive on vlan 1001. Now you just need to stick each tag into an MPLS solution and all is good.
But now what happens when you need to run multiple virtual circuits to a single customer site over their leased line? The vlan tag is already used by the carrier. What if I need to run 2 vrf’s for the customer and I need a WAN interface in each vrf?
We could do QinQ, but let’s think about this. In order to do QinQ I need another device at the customer site to pop another vlan tag on. I then need to get that tag into the core, pop off the tag and stick it back into the core. This could get messy.
Another problem with regular Cisco switches is that I can only do dot1q encapsulation on a per-port basis. This means I need to use a port for every customer again, negating the advantage of the aggregated port to begin with. Or I can send traffic to another switch on a per-port basis, then get the second switch to aggregate the vlans back into the core. This will work, but what a nightmare to support.
Instead of a regular switch I could use an ME3400G and do selective QinQ. This allows me to specify that vlans 10 and 20 get vlan 1000 pushed onto them, while vlans 15 and 25, on the same port, get vlan 1001 pushed onto them.
It’s still another device that we could do without.
Let’s tackle the problem at the customer site first. Ideally I would like to do away with a separate device that does my QinQ. I can’t send double-tagged frames directly out my Cisco. Are we sure about this?
interface GigabitEthernet0/1.500
 encapsulation dot1Q 1000 second-dot1q 500
 ip address 172.16.255.1 255.255.255.0
end
The second-dot1q command allows you to both send and receive QinQ traffic directly from a router interface. The first tag is the metro tag, while the second tag is the inner tag. Let’s create another interface in another vrf.
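For reference, this is roughly what a double-tagged header looks like on the wire. It is a Python sketch with hypothetical MAC addresses, and it assumes both tags use the 0x8100 TPID; 802.1ad service tags use 0x88a8 instead, so check what your carrier expects on the outer tag.

```python
import struct

TPID = 0x8100  # assumed for both tags; 802.1ad outer tags use 0x88a8


def tag(vlan: int) -> bytes:
    """One 4-byte dot1q tag with priority and DEI bits left at zero."""
    return struct.pack("!HH", TPID, vlan & 0x0FFF)


def qinq_header(dst_mac: bytes, src_mac: bytes, outer: int, inner: int,
                ethertype: int = 0x0800) -> bytes:
    """Ethernet header with the metro (outer) tag first, then the inner tag."""
    return dst_mac + src_mac + tag(outer) + tag(inner) + struct.pack("!H", ethertype)


def parse_tags(frame: bytes) -> list:
    """Return the VLAN IDs found after the MAC addresses, outermost first."""
    tags, offset = [], 12
    while struct.unpack("!H", frame[offset:offset + 2])[0] == TPID:
        tags.append(struct.unpack("!H", frame[offset + 2:offset + 4])[0] & 0x0FFF)
        offset += 4
    return tags


# Hypothetical MAC addresses, for illustration only.
cpe_mac = bytes.fromhex("020000000001")
core_mac = bytes.fromhex("020000000002")

header = qinq_header(core_mac, cpe_mac, outer=1000, inner=500)
print(parse_tags(header), len(header))
```

The carrier only ever looks at the first tag (1000), so whatever sits behind it, including the inner tag, is carried across untouched.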
interface GigabitEthernet0/1.600
 encapsulation dot1Q 1000 second-dot1q 600
 ip vrf forwarding 1
 ip address 192.168.1.2 255.255.255.0
Perfect. I can create as many sub-interfaces as I need using a single metro tag.
What about my core? In my core I’m using Brocade Netirons. Let’s see if I can terminate a double-tagged frame:
vpls Test_Customer 500
 vpls-peer 192.168.1.1
 vlan 1000 inner-vlan 500
  tagged ethe 15/1
!
vpls Test_Customer 600
 vpls-peer 192.168.1.50
 vlan 1000 inner-vlan 600
  tagged ethe 15/1
No problems there. I can create 2 completely separate VPLS instances using the same single metro tag.
So far so good…
Let’s say for whatever reason I now need to run a point-to-point link directly from my core to the CPE at the customer site. I don’t want to terminate this point-to-point link into a VLL/VPLS. This could be for management, to provide voice or internet access, whatever. Unfortunately my Brocade device cannot terminate a double-tagged frame both in the local table as well as in a VPLS. Now we’re close to going back to our earlier switch examples to pop off that pesky outer tag.
But never fear, there is always a way. I did note above that we can’t terminate a double-tagged frame on both. How about sending some traffic as double-tagged and certain traffic as single? Can we do this?
Let’s start again with the CPE and show the full relevant config:
interface GigabitEthernet0/1.1000
 encapsulation dot1Q 1000
 ip address 10.0.0.1 255.255.255.0
!
interface GigabitEthernet0/1.600
 encapsulation dot1Q 1000 second-dot1q 600
 ip vrf forwarding 1
 ip address 192.168.1.2 255.255.255.0
!
interface GigabitEthernet0/1.500
 encapsulation dot1Q 1000 second-dot1q 500
 ip address 172.16.255.1 255.255.255.0
Works just fine on the Cisco. What about my Brocade?
vlan 1000 name Test
 tagged ethe 15/1
 router-interface ve 1000
!
interface ve 1000
 ip address 10.0.0.2/24
!
router mpls
!
 vpls Test_Customer 500
  vpls-peer 192.168.1.1
  vlan 1000 inner-vlan 500
   tagged ethe 15/1
!
 vpls Test_Customer 600
  vpls-peer 192.168.1.50
  vlan 1000 inner-vlan 600
   tagged ethe 15/1
Everything tested and everything works perfectly. We’ve managed to remove all kinds of kit and also managed to simplify the solution.
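Conceptually, the core port ends up demultiplexing on the pair of tags rather than on the outer tag alone. The Python sketch below is only a mental model of the final config and says nothing about how the NetIron does this internally. The service names are copied from the config above; the function name is mine.

```python
from typing import Optional

# (outer tag, inner tag) -> where the frame is terminated on port 15/1.
SERVICES = {
    (1000, None): "ve 1000 (routed point-to-point, 10.0.0.2/24)",
    (1000, 500): "vpls Test_Customer 500",
    (1000, 600): "vpls Test_Customer 600",
}


def classify(outer: int, inner: Optional[int] = None) -> str:
    """Pick the service for a frame; unknown tag combinations are dropped."""
    return SERVICES.get((outer, inner), "drop")


print(classify(1000))        # single-tagged management link
print(classify(1000, 500))   # double-tagged, first VPLS instance
print(classify(1000, 600))   # double-tagged, second VPLS instance
```

One metro tag from the carrier, three separate services on top of it, and no extra box at either end.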