Native vlan subinterfaces

On July 27, 2012, in CCIE, by Darren

The subinterface is an invaluable tool in your belt. I pretty much use subinterfaces whenever I connect routers to each other via ethernet, even if initially they are only going to have a single point-to-point connection to each other.

Why? Well, a subinterface allows you to stick a dot1q header on your frames. Just like running a trunk between 2 switches, it allows you to run multiple logical links between 2 routers. As noted above, even if I only have a single point-to-point connection between 2 routers I still use them. In the past I’ve had to change or add to a design. If I was running just the physical interfaces and needed to change it, I would require downtime and possibly even a site visit to reconfigure the CPE device. If I used subinterfaces from the start I can just add another subinterface to the mix. There are ways to get around this though.

You can also mix regular interfaces and subinterfaces together on the same physical interface. You can also tag a subinterface as the native vlan. Let’s look at a couple of different examples and see what we see.


Let’s just start with a simple physical interface setup.
R1:

interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0

R2:

interface FastEthernet0/0
 ip address 192.168.1.2 255.255.255.0
R1#ping 192.168.1.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/25/68 ms


Let’s say now that for whatever reason we need to run another logical link between these 2 routers. Let’s also say that R2 happens to be on the other side of the country and we need to do this now. Well I did note above that there is a workaround for this as you can run both a physical and subinterface on the router at the same time. Let’s leave the above config in and add this:
R1:

interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 10.0.0.1 255.255.255.0

R2:

interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 10.0.0.2 255.255.255.0
R1#ping 10.0.0.2 repeat 1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 28/28/28 ms
R1#ping 192.168.1.2 repeat 1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 24/24/24 ms

Both still work. Now you could log onto R2 via the 10.0.0.2 link and remove the 192.168.1.0/24 address if you wanted without worry. Wireshark shows that traffic going over the physical interface has no dot1q tag, while traffic on the subinterface has a dot1q tag of 10.

Now let’s mix it up a bit. I can mark a subinterface as the native vlan, i.e. its frames go out with no vlan tag:
R1:

interface FastEthernet0/0.20
 encapsulation dot1Q 20 native
 ip address 20.20.20.1 255.255.255.0

R2:

interface FastEthernet0/0
 ip address 20.20.20.2 255.255.255.0 secondary
 ip address 192.168.1.2 255.255.255.0
R1#ping 20.20.20.2 repeat 1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 20.20.20.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 24/24/24 ms


On R1 I created a subinterface with the native vlan. On R2 I added a secondary IP address. Traffic for the 20.20.20.0/24 range goes over the wire without a dot1q tag. Why not just use a secondary address? Well you can stick a subinterface into a vrf, while you can’t do that with a secondary address. Your IGP behavior also changes somewhat.

I don’t quite like doing the above though. I prefer to have layer 2 separation between networks that should be separated. However it proves the point that a native subinterface’s outbound traffic is in exactly the same format as a physical interface’s. Just a standard ethernet layer 2 header with no dot1q tag.

There is another advantage to using a native subinterface. If I want to shut down the link for 20.20.20.0/24, on R1 I can shut down just that subinterface. All other subinterfaces stay up. On R2, however, the 20.20.20.0/24 address lives on the physical interface, so shutting that down takes all subinterfaces down with it:
R1:

R1(config)#int fa0/0.20
R1(config-subif)#shut
R1(config-subif)#end
R1#
*Mar  1 00:33:31.023: %SYS-5-CONFIG_I: Configured from console by console
R1#sh ip int brief
Interface                  IP-Address      OK? Method Status                Protocol
FastEthernet0/0            192.168.1.1     YES manual up                    up
FastEthernet0/0.10         10.0.0.1        YES manual up                    up
FastEthernet0/0.20         20.20.20.1      YES manual administratively down down

R2:

R2(config)#int fa0/0
R2(config-if)#shut
R2(config-if)#end
R2#
*Mar  1 00:34:51.955: %SYS-5-CONFIG_I: Configured from console by console
R2#sh ip int
*Mar  1 00:34:53.347: %LINK-5-CHANGED: Interface FastEthernet0/0, changed state to administratively down
*Mar  1 00:34:54.347: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to down
R2#sh ip int brief
Interface                  IP-Address      OK? Method Status                Protocol
FastEthernet0/0            192.168.1.2     YES manual administratively down down
FastEthernet0/0.10         10.0.0.2        YES manual administratively down down

So while the native subinterface and the physical interface send traffic in the same format, I can shut a native subinterface without affecting any other subinterface, while shutting down a physical interface shuts down all the subinterfaces attached to it.

Now let’s get deeper…

R1:

interface FastEthernet0/0.10
 encapsulation dot1Q 10 native
 ip address 10.10.10.1 255.255.255.0

R2:

interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 10.10.10.2 255.255.255.0

Here I have R1 configured with a native subinterface, while R2 is configured with a dot1q tag of 10. You would expect there would be no communication between them, but you would be incorrect. I’m going to turn on debug ip packet on both routers to see who sees what.
R1:

R1#ping 10.10.10.2 repeat 1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.10.10.2, timeout is 2 seconds:

*Mar  1 00:08:07.763: IP: tableid=0, s=10.10.10.1 (local), d=10.10.10.2 (FastEthernet0/0.10), routed via FIB
*Mar  1 00:08:07.763: IP: s=10.10.10.1 (local), d=10.10.10.2 (FastEthernet0/0.10), len 100, sending.
Success rate is 0 percent (0/1)

So the ping failed, but look more closely. R1 was able to encapsulate the packet, which means it has a MAC address for R2. Let’s look at the ARP table:

R1#sh arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.10.1              -   c200.3e78.0000  ARPA   FastEthernet0/0.10
Internet  10.10.10.2              2   c201.3e78.0000  ARPA   FastEthernet0/0.10

Let’s try the other way around now.
R2:

R2#ping 10.10.10.1 repeat 1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 10.10.10.1, timeout is 2 seconds:

*Mar  1 00:11:20.807: IP: tableid=0, s=10.10.10.2 (local), d=10.10.10.1 (FastEthernet0/0.10), routed via RIB
*Mar  1 00:11:20.811: IP: s=10.10.10.2 (local), d=10.10.10.1 (FastEthernet0/0.10), len 100, sending
*Mar  1 00:11:20.815: IP: s=10.10.10.2 (local), d=10.10.10.1 (FastEthernet0/0.10), len 100, encapsulation failed.
Success rate is 0 percent (0/1)

Let’s have a look at the ARP cache:

R2#sh arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.10.1              0   Incomplete      ARPA
Internet  10.10.10.2              -   c201.3e78.0000  ARPA   FastEthernet0/0.10

Somehow R1 learned R2’s MAC address, so there was some communication, but it seems it’s only one way. Let’s clear the ARP cache on both and debug ARP to see what happens.

The first thing you notice when you clear the ARP cache is that the local router will send a gratuitous ARP out to let everyone else know about it. Let’s see what we see on R1 when we clear R2’s cache:
R2:

R2#clear arp
R2#
*Mar  1 00:18:45.135: ARP: flushing ARP entries for all interfaces
*Mar  1 00:18:45.143: IP ARP: sent rep src 10.10.10.2 c201.3e78.0000,
                 dst 10.10.10.2 ffff.ffff.ffff FastEthernet0/0.10

R1:

R1#
*Mar  1 00:18:31.231: IP ARP: rcvd rep src 10.10.10.2 c201.3e78.0000, dst 10.10.10.2 FastEthernet0/0.10

R1 receives this gratuitous ARP and installs the MAC address into its ARP cache. Wireshark shows that R2 is sending this gratuitous ARP with a dot1q tag.

What about the other way?
R1:

R1#clear arp
R1#
*Mar  1 00:20:53.847: ARP: flushing ARP entries for all interfaces
*Mar  1 00:20:53.855: IP ARP: sent rep src 10.10.10.1 c200.3e78.0000,
                 dst 10.10.10.1 ffff.ffff.ffff FastEthernet0/0.10

R2:

R2#
*Mar  1 00:21:07.831: IP ARP rep filtered src 10.10.10.1 c200.3e78.0000, dst 10.10.10.1 ffff.ffff.ffff wrong cable, interface FastEthernet0/0

R2 complains that this ARP is coming in on ‘the wrong cable’, while Wireshark shows the gratuitous ARP being sent without a dot1q header, as expected.

What can we get from this? Well if you have a native subinterface, it will accept both untagged AND tagged frames in the vlan, vlan 10 in my example above. However it will only SEND untagged traffic. Good to know. It can also explain why you sometimes see an ARP entry on one side, while not on the other.
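The rule above can be written down as a tiny model. This is only a sketch of the behaviour seen in the debugs, not how IOS implements it; the function names are made up for illustration, and `None` stands for an untagged frame.

```python
from typing import Optional


def egress_tag(vlan: int, native: bool) -> Optional[int]:
    """Tag placed on frames leaving the subinterface (None = untagged)."""
    return None if native else vlan


def accepts(vlan: int, native: bool, frame_tag: Optional[int]) -> bool:
    """Would this subinterface accept a frame carrying frame_tag?"""
    if frame_tag == vlan:
        # A frame tagged with the subinterface's own vlan is always accepted.
        return True
    # Only a native subinterface also takes untagged frames.
    return native and frame_tag is None


# R1: encapsulation dot1Q 10 native, R2: encapsulation dot1Q 10
r1 = (10, True)
r2 = (10, False)

r1_to_r2 = accepts(*r2, egress_tag(*r1))  # untagged frame hits a tagged subinterface
r2_to_r1 = accepts(*r1, egress_tag(*r2))  # tagged frame hits a native subinterface

print("R1 -> R2 accepted:", r1_to_r2)
print("R2 -> R1 accepted:", r2_to_r1)
```

Frames only make it in one direction, which is why R1 ends up with an ARP entry for R2 while R2’s entry stays incomplete.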


Next-Hop IP. What does it actually mean?

On July 18, 2012, in Fundamentals, by Darren

I’ve seen so much confusion about the fundamentals of IP routing that I thought it would be good to write something like this.

If packets are getting sent to a default gateway, or next-hop, whatever – is that packet actually addressed to that next-hop? Well, it depends on what layer we are talking about. From a layer 3 perspective it’s never actually addressed to that next-hop. i.e. the source and destination IP addresses NEVER change, unless you have some sort of device doing NAT.

The next-hop address is merely an address that you are hoping this packet goes towards. If the next-hop is on the same subnet as the source address, then an ARP resolution will take place and that packet will get sent to the gateway’s MAC address. The destination IP has not changed at all.

If the next-hop is NOT on the same subnet, that packet will travel to the local gateway and then onwards. That gateway might have another idea where that packet should go as, again, the packet is not actually addressed to that next-hop at layer 3.

This also means that a next-hop address could even be an address that doesn’t exist. As long as the packet travels in the right direction you are good to go.
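As a rough sketch of that first decision (is the next-hop on my own subnet, so I can ARP for it directly?), here is the check using Python’s standard `ipaddress` module. The function name is made up for illustration, and the addresses are the ones used in the example in this post.

```python
import ipaddress


def next_hop_is_on_link(interface_cidr: str, next_hop: str) -> bool:
    """True if next_hop falls inside the subnet configured on the interface."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(next_hop) in iface.network


# R1's interface is 10.12.12.1/24
same_subnet = next_hop_is_on_link("10.12.12.1/24", "10.12.12.2")
off_subnet = next_hop_is_on_link("10.12.12.1/24", "192.168.1.1")

print("10.12.12.2 on link:", same_subnet)   # ARP for it directly
print("192.168.1.1 on link:", off_subnet)   # needs another lookup first
```

Either way the destination IP in the packet is untouched. The result only decides which MAC address the frame gets sent to.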

 

Let’s take the following diagram as an example:

[Diagram: R1, R2 and R3 topology]

R2 and R3 are running OSPF with each other. R3 has a loopback of 3.3.3.3 advertised into OSPF so R2 knows how to get there. R1 and R2 are not running OSPF. R2 is advertising the R1 and R2 link into OSPF as a stub network.

The actual subnets used are 10.12.12.0/24 and 10.23.23.0/24.
R1:

interface FastEthernet0/0
 ip address 10.12.12.1 255.255.255.0

R2:

interface FastEthernet0/0
 ip address 10.12.12.2 255.255.255.0
 ip ospf 1 area 0
!
interface FastEthernet0/1
 ip address 10.23.23.2 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
!
router ospf 1
 passive-interface FastEthernet0/0

R3:

interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip ospf 1 area 0
!
interface FastEthernet0/0
 ip address 10.23.23.3 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0

On R1 I now set an IP route to 3.3.3.3/32 with a next-hop of 192.168.1.1, which does not exist anywhere. I then create another route to 192.168.1.1/32 with a next-hop of 10.12.12.2:

ip route 3.3.3.3 255.255.255.255 192.168.1.1
ip route 192.168.1.1 255.255.255.255 10.12.12.2

Let’s have a look at the route table on R1:

R1#sh ip route 3.3.3.3
Routing entry for 3.3.3.3/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 192.168.1.1
      Route metric is 0, traffic share count is 1

R1#sh ip route 192.168.1.1
Routing entry for 192.168.1.1/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.12.12.2
      Route metric is 0, traffic share count is 1

As expected everything works fine:

R1#ping 3.3.3.3

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 13/23/29 ms

However the above example is probably not the best example as CEF would already have worked out all the recursive routing needed:

R1#sh ip cef 3.3.3.3
3.3.3.3/32, version 7, epoch 0, cached adjacency 10.12.12.2
0 packets, 0 bytes
  via 192.168.1.1, 0 dependencies, recursive
    next hop 10.12.12.2, FastEthernet0/0 via 192.168.1.1/32
    valid cached adjacency

But it does prove that the packet is able to get to 3.3.3.3 even with a next-hop that does not actually exist anywhere.
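The recursion CEF has done for us can be sketched in a few lines. This is a simplified model with a longest-prefix lookup over two small tables copied from R1’s config above, not the real CEF algorithm; the function and table names are made up for illustration.

```python
import ipaddress

# R1's static routes: prefix -> configured next-hop
STATIC = {
    "3.3.3.3/32": "192.168.1.1",
    "192.168.1.1/32": "10.12.12.2",
}
# R1's connected networks: prefix -> outgoing interface
CONNECTED = {"10.12.12.0/24": "FastEthernet0/0"}


def lookup(table, addr):
    """Longest-prefix match of addr against table; None if nothing matches."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return best[1] if best else None


def resolve(dest, max_depth=8):
    """Follow next-hops until one sits on a connected network."""
    hop = dest
    for _ in range(max_depth):
        iface = lookup(CONNECTED, hop)
        if iface is not None:
            return hop, iface
        hop = lookup(STATIC, hop)
        if hop is None:
            return None
    return None


result = resolve("3.3.3.3")
print(result)  # ('10.12.12.2', 'FastEthernet0/0')
```

The non-existent 192.168.1.1 is only ever a stepping stone in the lookup. What ends up being used on the wire is 10.12.12.2 out of FastEthernet0/0, matching the CEF output.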

Let’s now make a more complicated scenario:

[Diagram: R1 to R5 topology]

My subnet addressing is similar to before. This time R5 is advertising its loopback interface into OSPF. R1 is NOT running OSPF.

R1 has a static route that says to get to 5.5.5.5 it needs to send it to R3. It then has a route to R3 via R2.

R1#sh ip route 5.5.5.5
Routing entry for 5.5.5.5/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.23.23.3
      Route metric is 0, traffic share count is 1

But what happens when I traceroute from R1?

R1#traceroute 5.5.5.5
Type escape sequence to abort.
Tracing the route to 5.5.5.5
  1 10.12.12.2 52 msec 76 msec 4 msec
  2 10.24.24.4 80 msec 72 msec 68 msec
  3 10.45.45.5 188 msec *  84 msec

The traffic gets to my destination, but it did not ever get near R3. Why is that?
Have a look at R2:

R2#sh ip route 5.5.5.5
Routing entry for 5.5.5.5/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.24.24.4
      Route metric is 0, traffic share count is 1

I put a static route on R2 to send traffic for 5.5.5.5 via R4, not R3.

So all in all really simple. What I’m merely trying to show is that in regular routing, each and every hop along the way will make its own independent decision on how to get to the destination. When that packet gets to R2, it has no idea that R1 wanted to actually go via R3, because that next-hop is not encoded anywhere. All R1 is doing is sending traffic ‘towards’ the next-hop. R2 will make its own decision as it only sees the destination address of 5.5.5.5.

This behaviour explains routing loops and the problem of traffic getting dropped inside an AS running BGP.
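Here is a small simulation of that hop-by-hop behaviour. It is a sketch only: each router is reduced to "which neighbour do I hand packets for 5.5.5.5 to", the R3 entry is an assumption (the post doesn’t show R3’s table), and the second table is a made-up misconfiguration to show how a loop appears.

```python
def trace(start, owner, fib, max_hops=8):
    """Walk from start towards owner, asking each router for its own choice."""
    path = [start]
    node = start
    while node != owner and len(path) <= max_hops:
        node = fib[node]  # each hop decides for itself; nothing is carried over
        path.append(node)
    return path


# Neighbour each router forwards traffic for 5.5.5.5 to.
FIB = {
    "R1": "R2",  # static route names R3 as next-hop, but R3 is reached via R2
    "R2": "R4",  # R2's own static route points at R4, not R3
    "R3": "R5",  # assumption: not shown in the post
    "R4": "R5",
}

path = trace("R1", "R5", FIB)
print(" -> ".join(path))  # R1 -> R2 -> R4 -> R5

# Hypothetical misconfiguration: R1 and R2 point at each other.
LOOP_FIB = {"R1": "R2", "R2": "R1"}
loop_path = trace("R1", "R5", LOOP_FIB, max_hops=4)
looped = loop_path[-1] != "R5"
print("loop detected:", looped)
```

R3 never appears in the first path even though R1 ‘asked’ for it, which is exactly what the traceroute showed.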

 

mellowd.co.uk finally reachable via IPv6

Ok, it took a bit longer than I wanted it to but here we go.

You can now reach www.mellowd.co.uk/ccie via both IPv4 and IPv6 natively. Hooray.

 

 

And it only makes sense to get an address with ccie in it? Sure!

Name:    mellowd.co.uk
Addresses:  2001:a08:da2::cc1e
            89.248.22.45

A couple of things to note. I had to enable IPv6 listening in my web server software as well as sort out my ip6tables rules. ICMP is a lot more important in IPv6 than it is in IPv4 (neighbor discovery and path MTU discovery both depend on ICMPv6), so watch that.




Juniper Networks Certified Internet Specialist Service Provider (JNCIS-SP)

What a mouthful!

I wanted to get this as my plan is to get the JNCIE-SP after I get my CCIE number. As I’m on hold a bit thanks to visa issues I knocked this one out :)

 

Fun with QinQ

On July 4, 2012, in Brocade, CCIE, Design, by Darren

Not all service providers have blanket coverage of an entire country. When a service provider gives service to a customer, more than likely the customer will be using a line from an external carrier of the ISP’s choosing.

Metro Ethernet is becoming more and more common these days and it does give us as designers a lot of flexibility. However each time a customer purchases a leased line, that requires another port in your core. That’s fine if the circuit is a nice gig, but quite often a lot of offices will have anything from 2Mb to 1Gb. Do you really want to waste your core gig ports on a 2Mb circuit? Not really.

A lot of carriers are now offering aggregated ethernet links. Essentially this means that each customer site has a separate port (of course) but these are all aggregated when the carrier hands off to us. We get a single link carrying a bunch of customer circuits. Now you can sell hundreds of 2Mb circuits and only use a single port.

But how do we separate traffic then? Well you’ll come to an agreement with the carrier and each circuit ordered will have a vlan tag on the core side. This means customer 1 site 1’s traffic will arrive on vlan 1000. Customer 2 site 1’s traffic will arrive on vlan 1001. Now you just need to stick each tag into an MPLS solution and all is good.

But now what happens when you need to run multiple virtual circuits to a single customer site over their leased line? The vlan tag is already used by the carrier. What if I need to run 2 vrfs for the customer and I need a WAN interface in each vrf?

We could do QinQ, but let’s think about this. In order to do QinQ I need another device at the customer site to push another vlan tag on. I then need to get that tag into the core, pop off the tag and stick it back into the core. This could get messy.
Another problem with regular Cisco switches is that I can only do dot1q encapsulation on a per-port basis. This means I need to use a port for every customer again, negating the advantage of the aggregated port to begin with. Or I can send traffic to another switch on a per-port basis, then get the second switch to aggregate the vlans back into the core. This will work, but what a nightmare to support.
Option 1:
[Diagram: QinQ VPLS 3]

Option 2:
[Diagram: QinQ VPLS 4]

Instead of a regular switch I could use an ME3400G and do selective QinQ. This allows me to specify that vlans 10 and 20 get vlan 1000 pushed onto them, and vlans 15 and 25, on the same port, get vlan 1001 pushed onto them.

It’s still another device that we could do without.

Let’s tackle the problem at the customer site first. Ideally I would like to do away with a separate device that does my QinQ. I can’t send double-tagged frames directly out my Cisco. Are we sure about this?

interface GigabitEthernet0/1.500
 encapsulation dot1Q 1000 second-dot1q 500
 ip address 172.16.255.1 255.255.255.0
end

The second-dot1q command allows you to both send and receive QinQ traffic directly from a router interface. The first tag is the metro tag, while the second tag is the inner tag. Let’s create another interface in another vrf.

interface GigabitEthernet0/1.600
 encapsulation dot1Q 1000 second-dot1q 600
 ip vrf forwarding 1
 ip address 192.168.1.2 255.255.255.0

Perfect. I can create as many sub-interfaces as I need using a single metro tag.
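To picture what goes on the wire, here is a sketch that builds the two stacked tags for the Gi0/1.500 subinterface above. It assumes both tags use the 0x8100 ethertype; some carriers expect 0x88a8 on the outer tag, so check with yours. The helper names are made up for illustration.

```python
import struct

TPID_DOT1Q = 0x8100  # assumption: same ethertype on outer and inner tag


def vlan_tag(vid: int, pcp: int = 0, tpid: int = TPID_DOT1Q) -> bytes:
    """One 4-byte 802.1Q tag: 16-bit TPID, then 3-bit PCP, 1-bit DEI, 12-bit VID."""
    if not 1 <= vid <= 4094:
        raise ValueError("vlan id must be between 1 and 4094")
    tci = (pcp << 13) | vid  # DEI bit left at 0
    return struct.pack("!HH", tpid, tci)


def qinq_tags(outer: int, inner: int) -> bytes:
    """Outer (metro) tag first, then the inner tag, as they sit after the MACs."""
    return vlan_tag(outer) + vlan_tag(inner)


# encapsulation dot1Q 1000 second-dot1q 500
tags = qinq_tags(1000, 500)
print(tags.hex())  # 810003e8810001f4
```

The carrier only ever looks at the first four bytes (vlan 1000). The second tag rides along untouched until my own kit reads it at the other end.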

What about my core? In my core I’m using Brocade NetIrons. Let’s see if I can terminate a double-tagged frame:

vpls Test_Customer 500
  vpls-peer 192.168.1.1
   vlan 1000 inner-vlan 500
   tagged ethe 15/1
!
 vpls Test_Customer 600
  vpls-peer 192.168.1.50
   vlan 1000 inner-vlan 600
   tagged ethe 15/1

No problems there. I can create 2 completely separate VPLS instances using the same single metro tag.

So far so good…

Let’s say for whatever reason I now need to run a point-to-point link directly from my core to the CPE at the customer site. I don’t want to terminate this point-to-point link into a VLL/VPLS. This could be for management, to provide voice or internet access, whatever. Unfortunately my Brocade device cannot terminate a double-tagged frame both in the local table as well as in a VPLS. Now we’re close to going back to our earlier switch examples to pop off that pesky outer tag.

But never fear, there is always a way. I did note above that we can’t terminate a double-tagged frame on both. How about sending some traffic as double-tagged and certain traffic as single? Can we do this?

Let’s start again with the CPE and show the full relevant config:

interface GigabitEthernet0/1.1000
 encapsulation dot1Q 1000
 ip address 10.0.0.1 255.255.255.0
!
interface GigabitEthernet0/1.600
 encapsulation dot1Q 1000 second-dot1q 600
 ip vrf forwarding 1
 ip address 192.168.1.2 255.255.255.0
!
interface GigabitEthernet0/1.500
 encapsulation dot1Q 1000 second-dot1q 500
 ip address 172.16.255.1 255.255.255.0

Works just fine on the Cisco. What about my Brocade?

vlan 1000 name Test
 tagged ethe 15/1
 router-interface ve 1000
!
interface ve 1000
 ip address 10.0.0.2/24
!
router mpls
!
vpls Test_Customer 500
  vpls-peer 192.168.1.1
   vlan 1000 inner-vlan 500
   tagged ethe 15/1
!
 vpls Test_Customer 600
  vpls-peer 192.168.1.50
   vlan 1000 inner-vlan 600
   tagged ethe 15/1

Everything tested and everything works perfectly. We’ve managed to remove all kinds of kit and also managed to simplify the solution.


© 2009-2014 Darren O'Connor All Rights Reserved