Let’s take a look at some BPDUs

Due to the limitations of my kit, I can’t test EVERYTHING I would like to. For that I’d need a spare ME3400, which I don’t have.

I’ll be using the following topology throughout this discussion:

Each line in the diagram is in fact 2 links between the switches. What is a router doing there, I hear you ask? Well, there is a method to the madness here. All will be revealed!

I have a laptop connected to SW1 and have set up a SPAN session so I can take Wireshark captures of the actual frames. The SPAN source will change throughout the post in order to capture specific frames.

SW1#sh run | begin monitor
monitor session 1 source interface Fa1/0/23
monitor session 1 destination interface Fa1/0/14 encapsulation replicate

I’ve turned off CDP, DTP and VTP as I don’t want any noisy frames. I’m capturing in Linux because my Windows driver strips off the VLAN tag!

I’ve created 4 vlans: 10, 20, 30, 40.

Default spanning tree

Let’s leave the STP mode at the default. I’ve configured a bridge priority of 4096 on SW1.

Let’s take a look at the spanning-tree for vlan 10 on SW1:

SW1#sh span vlan 10 | include protocol
  Spanning tree enabled protocol ieee

The output says "protocol ieee". IEEE what exactly? IEEE 802.1D-1998 is standard STP, while IEEE 802.1D-2004 is the standard that includes RSTP. I guess we can assume this is 802.1D-1998, but I don’t like to assume anything. Let’s take a look at the BPDUs sent from SW1 to SW2:

I can see 5 BPDUs sent from SW1 to SW2. Let’s dig deeper into the first:

This is a PVST+ BPDU. Wireshark tells us this straight away, but you can also work it out from the destination MAC address; i.e. this is NOT an IEEE BPDU. Note that my switch is currently sending 5 of these. Each one is tagged with the VLAN the BPDU belongs to. I have 4 VLANs configured (10, 20, 30, 40), so I get a tagged BPDU for each, plus a tagged BPDU for VLAN 1.
What about the fifth?

Well, this IS an IEEE BPDU. So when you run standard spanning tree on a Cisco 3560/3750, you’re actually sending BPDUs in both IEEE and PVST+ formats. But why? PVST+ is Cisco proprietary, so the IEEE BPDU allows a non-Cisco device to play ball too. We can test this by bridging two interfaces on my router. You didn’t think we were actually going to route on that device, did you?
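The two BPDU flavours can be told apart purely by destination MAC: IEEE BPDUs go to the bridge group address 01:80:C2:00:00:00, while Cisco PVST+ BPDUs go to 01:00:0C:CC:CC:CD. Here is a minimal Python sketch of that check, which you could run against MACs copied out of a capture. The function names are hypothetical, not part of any tool.

```python
# Well-known destination MACs for the two BPDU flavours seen in the captures.
IEEE_STP_MAC = "01:80:c2:00:00:00"   # IEEE 802.1D bridge group address
PVST_PLUS_MAC = "01:00:0c:cc:cc:cd"  # Cisco PVST+ (SSTP) address


def normalise_mac(mac: str) -> str:
    """Accept aa:bb:.., aa-bb-.. or Cisco aabb.ccdd.eeff; return lower-case colon form."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def classify_bpdu(dst_mac: str) -> str:
    """Return 'ieee', 'pvst+' or 'unknown' based on the destination MAC alone."""
    mac = normalise_mac(dst_mac)
    if mac == IEEE_STP_MAC:
        return "ieee"
    if mac == PVST_PLUS_MAC:
        return "pvst+"
    return "unknown"


if __name__ == "__main__":
    print(classify_bpdu("0100.0ccc.cccd"))     # pvst+
    print(classify_bpdu("01-80-C2-00-00-00"))  # ieee
```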

Once we’ve bridged the router’s interfaces facing SW1 and SW2, let’s take a look at the BPDUs SW1 receives from R1. For this test I’ve made R1 the root bridge.

The router (a 7200 running 12.4(15)T17) is only sending IEEE 802.1D-1998 BPDUs, and these are untagged. SW1 shows that its port towards R1 is the root port.

Let’s add a subinterface on R1 tagged with vlan 10, and bridge that to see what happens. This is where things get interesting.
I now see 3 bpdus per cycle:

I see an untagged IEEE BPDU, one tagged PVST+ frame for VLAN 10, and finally one untagged PVST+ frame for VLAN 1. Also note that the router does NOT add the VLAN ID to its priority. Recall that if a switch is the root for VLAN 20 and you set its priority to 4096, it actually adds 20 to 4096 to come up with a value of 4116:

SW1#sh span vlan 20 | include Root
  Root ID    Priority    4116
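The 4116 is simply the configured priority with the VLAN ID (the extended system ID) added. The 16-bit bridge priority field is split into a 4-bit priority (hence multiples of 4096) and a 12-bit VLAN ID, so adding and OR-ing give the same result. A small sketch with a hypothetical function name:

```python
def effective_priority(configured: int, vlan_id: int) -> int:
    """Bridge priority advertised per VLAN when the extended system ID is in use.

    The configured value occupies the upper 4 bits (multiples of 4096);
    the VLAN ID fills the lower 12 bits.
    """
    if configured % 4096 or not 0 <= configured <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    # The two parts never overlap, so OR is the same as addition here.
    return configured | vlan_id


if __name__ == "__main__":
    print(effective_priority(4096, 20))  # 4116, as in "sh span vlan 20"
    print(effective_priority(4096, 10))  # 4106 on a switch; the 7200 sent a flat 4096
```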

If we check the current root for vlan 10, we can see the priority is a flat 4096:

SW1#sh span vlan 10 | include Root
  Root ID    Priority    4096
Fa1/0/1             Root FWD 19        128.3    P2p 

This means that if you are running STP between a few routers and switches and their configured priorities are all the same, the routers will end up advertising the lower priority and hence will become your root bridges. That is probably not what you want.
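The root election compares bridge IDs priority first, with the MAC address only as a tie-breaker, and the lowest wins. The sketch below models that using made-up MAC addresses for SW1 and R1. With the same configured priority of 4096, the switch advertises 4106 for VLAN 10 while the router advertises a flat 4096, so the router wins regardless of MAC.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeId:
    """Compared field by field: priority first, MAC address only as a tie-breaker."""
    priority: int  # effective priority, i.e. already including any VLAN ID offset
    mac: int       # MAC address as an integer so the comparison is numeric


def elect_root(bridges: dict[str, BridgeId]) -> str:
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridges[name])


# Hypothetical MACs; SW1 is configured with 4096 but advertises 4096 + 10 for VLAN 10.
vlan10 = {
    "SW1": BridgeId(priority=4096 + 10, mac=0x0019_0000_0001),
    "R1": BridgeId(priority=4096, mac=0x00FF_0000_0002),
}

if __name__ == "__main__":
    print(elect_root(vlan10))  # R1, despite its higher MAC
```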

Rapid PVST+

Let’s change the mode to rapid-pvst on all switches. I’ve also changed the priority of SW1 to zero. This time I see 6 BPDUs sent from SW1 to SW2 per cycle: 5 tagged Rapid-PVST+ BPDUs, one per VLAN, and this time that includes a tagged Rapid-PVST+ frame for VLAN 1.

I also have an untagged IEEE frame. However, this one is IEEE 802.1D-2004. You can see this from the protocol version identifier:
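The field that gives the game away is the Protocol Version Identifier, the third octet of the BPDU: 0 for 802.1D-1998 STP, 2 for RSTP and 3 for MSTP. It is followed by a BPDU type octet (0x00 configuration, 0x80 TCN, 0x02 RST/MST). A sketch of decoding those first four octets, assuming you already have the BPDU payload starting at the Protocol Identifier (i.e. after the LLC/SNAP headers). The names are hypothetical.

```python
import struct

VERSIONS = {0: "802.1D-1998 STP", 2: "802.1D-2004 RSTP", 3: "802.1s MSTP"}
TYPES = {0x00: "Configuration", 0x80: "TCN", 0x02: "RST/MST"}


def parse_bpdu_header(payload: bytes) -> dict:
    """Decode Protocol Identifier, Protocol Version Identifier and BPDU Type."""
    if len(payload) < 4:
        raise ValueError("BPDU too short")
    proto_id, version, bpdu_type = struct.unpack("!HBB", payload[:4])
    if proto_id != 0:
        raise ValueError(f"unexpected protocol identifier {proto_id:#06x}")
    return {
        "version": VERSIONS.get(version, f"unknown ({version})"),
        "type": TYPES.get(bpdu_type, f"unknown ({bpdu_type:#04x})"),
    }


if __name__ == "__main__":
    print(parse_bpdu_header(bytes([0x00, 0x00, 0x02, 0x02])))  # RSTP BPDU
    print(parse_bpdu_header(bytes([0x00, 0x00, 0x00, 0x00])))  # legacy config BPDU
```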

What about our router? Let’s take a look at the BPDUs sent from SW1 to R1. SW1 is still sending 6 BPDUs per cycle, but take a look at the difference between the BPDUs sent for VLAN 10 and VLAN 20.
vlan 10

vlan 20

The BPDU sent for VLAN 10 uses regular PVST+, while the BPDU sent for VLAN 20 is Rapid-PVST+.
R1 has been sending regular PVST+ BPDUs to SW1. SW1 recognises this and so knows the neighbour only supports that mode. I have not bridged VLAN 20 on the router yet, so SW1 continues to send Rapid-PVST+ BPDUs with that tag. The untagged IEEE BPDU sent from SW1 also uses the non-rapid format.
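One way to picture what SW1 is doing is a per-VLAN flag on the port that switches to legacy mode once a version 0 BPDU is heard on that VLAN. This is a deliberately simplified, hypothetical model of the behaviour seen in the capture, not Cisco's implementation. It ignores the timers and other details of real RSTP protocol migration.

```python
from collections import defaultdict


class PortState:
    """Hypothetical, simplified per-VLAN protocol migration on one trunk port."""

    def __init__(self) -> None:
        # VLAN ID -> True once a legacy (version 0) BPDU has been heard on that VLAN
        self.legacy_peer = defaultdict(bool)

    def receive(self, vlan: int, version: int) -> None:
        """Record the Protocol Version Identifier of a BPDU received on `vlan`."""
        if version == 0:
            self.legacy_peer[vlan] = True

    def send_format(self, vlan: int) -> str:
        """Format this port would use for its own BPDUs on `vlan`."""
        return "pvst+" if self.legacy_peer[vlan] else "rapid-pvst+"


if __name__ == "__main__":
    port = PortState()
    port.receive(vlan=10, version=0)  # R1 bridges VLAN 10 and speaks plain PVST+
    print(port.send_format(10))       # pvst+
    print(port.send_format(20))       # rapid-pvst+ (nothing heard from R1 on VLAN 20)
```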

You can see SW1 knows the device on port fa1/0/1 is a regular STP peer:

SW1#sh span vlan 10 | include Fa1/0/1
Fa1/0/1             Desg FWD 19        128.3    P2p Peer(STP) 

Multiple Spanning Tree

Multiple Spanning Tree is quite different. First, it’s an IEEE standard (802.1s), and it uses IEEE RSTP internally. Only a single BPDU is sent per cycle. This BPDU is untagged, and it carries the MST configuration identifier (region name, revision number, and a digest of the VLAN-to-instance mapping) plus an M-record for each MST instance. The configuration identifier allows the switches to determine whether or not they are part of the same MST region.

Please excuse the change of font; I had to open the capture on my Windows box in order to get the entire frame in a single shot.

Note that all BPDUs sent to R1 are now sent as tagged regular PVST+ and in IEEE 802.1D-1998 format.
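The configuration digest in the MST BPDU is what lets two switches compare their VLAN-to-instance mappings without exchanging the whole table. As I understand IEEE 802.1Q, it is an HMAC-MD5 over a 4096-entry table (one 2-byte MST instance ID per VLAN) keyed with a fixed 16-byte value. Here is a sketch under that assumption; `mst_config_digest` is a hypothetical name.

```python
import hashlib
import hmac
import struct

# Fixed signature key for the MST configuration digest (as defined in IEEE 802.1Q).
MST_DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def mst_config_digest(vlan_to_instance: dict[int, int]) -> str:
    """HMAC-MD5 over the VLAN-to-MSTI table, VLANs 0-4095, 2 octets per entry.

    VLANs not listed map to instance 0 (the CIST).
    """
    table = b"".join(
        struct.pack("!H", vlan_to_instance.get(vlan, 0)) for vlan in range(4096)
    )
    return hmac.new(MST_DIGEST_KEY, table, hashlib.md5).hexdigest()


if __name__ == "__main__":
    sw1 = mst_config_digest({10: 1, 20: 1, 30: 2, 40: 2})
    sw2 = mst_config_digest({10: 1, 20: 2, 30: 2, 40: 2})  # VLAN 20 mapped differently
    print(sw1 == sw2)  # False: the two switches would see different regions
```

If two switches map even one VLAN differently, their digests differ and they treat each other as being in different regions; the region name and revision number must match as well.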

Conclusions

So why exactly does all this matter? It’s important to know which frames are tagged and which are not, especially if you’re going to provide some kind of layer 2 service over your network. This becomes even more important if you are matching different customers’ traffic by VLAN tag rather than by a separate physical port. Another example would be a selective QinQ service, in which a single port can map different VLANs to different S-VLAN tags. Untagged traffic will not play nicely with this.

I’ll leave that discussion for another day, though, otherwise this post will never get finished. I don’t have any kit handy to do selective QinQ, but I do have regular QinQ and EoMPLS (VLL/pseudowire), which will be the basis for revisiting this post sometime in the future.

Juniper Networks Certified Internet Professional (JNCIP-SP) completed

This is the final cert I’m getting before I go head first back into my CCIE studies. Cleared it out this morning :)

From now on until I get my numbers, most of my posts will probably be Cisco orientated again. Once numbers are attained we’ll see what happens next :)

MPLS-TE via RSVP – Part 3 of 3 – Brocade Netiron (XMR/MLX)

Part 1 – Cisco IOS
Part 2 – Juniper JunOS
Part 3 – Brocade Netiron XMR/MLX
Part 4 – Cisco IOS-XR

You thought I forgot about this? No ways. The main issue is that while I run a Brocade core, I don’t have a lot of spare Brocade kit to play with. Hence I’m sat at Brocade’s lab showing the same thing here.

So let’s remind ourselves of the topology:

Netiron MLX/XMR basic config

This is CR1’s config. CR2 and CR3 are nearly identical:

interface loopback 2
 ip ospf area 0
 ip address 2.2.2.2/32
!
interface ethernet 2/1
 enable
 ip ospf area 0
 ip address 192.168.1.2/30
!
interface ethernet 2/3
 enable
 ip ospf area 0
 ip address 192.168.1.9/30
!
router mpls

 mpls-interface e2/1

 mpls-interface e2/3

Vanilla again. In fact, out of all 3 vendors, Brocade’s config is by far the neatest. You simply add interfaces to OSPF and then enable them under router mpls. Job done.

As always, the bulk of the work is done on the PE boxes. Let’s take a look at AR1’s config:

interface loopback 1
 ip ospf area 0
 ip address 1.1.1.1/32
!
interface loopback 10
 ip address 10.10.10.10/32
!
interface ethernet 1/1
 enable
 ip ospf area 0
 ip address 192.168.1.1/30
!
interface ethernet 1/2
 enable
 ip ospf area 0
 ip address 192.168.1.5/30
!
router bgp
 local-as 100
 neighbor 5.5.5.5 remote-as 100
 neighbor 5.5.5.5 update-source 1.1.1.1

 address-family ipv4 unicast
 network 10.10.10.10/32
 exit-address-family
!
router mpls


 mpls-interface e1/1

 mpls-interface e1/2

 path TO-AR3
  loose 5.5.5.5

 lsp TO-AR3
  to 5.5.5.5
  primary TO-AR3
  enable

I’ve enabled OSPF and BGP, added a second loopback advertising a route to AR3, enabled RSVP, and finally created a loose path to AR3, which is used by my LSP config. I’ve done a similar config on AR3 and my BGP session is now up. On AR3 I’ve added a loopback with the IP address 50.50.50.50/32, advertised to AR1 via BGP. Let’s take a look:

telnet@AR1#sh ip route 50.50.50.50
Type Codes - B:BGP D:Connected I:ISIS O:OSPF R:RIP S:Static; Cost - Dist/Metric
BGP  Codes - i:iBGP e:eBGP
ISIS Codes - L1:Level-1 L2:Level-2
OSPF Codes - i:Inter Area 1:External Type 1 2:External Type 2 s:Sham Link
        Destination        Gateway         Port          Cost          Type Uptime
1       50.50.50.50/32     192.168.1.2     eth 1/1       200/0         Bi   0m4s

So we have the route. Can we ping it? I guess you can figure out what’s going to happen if you look at that gateway address:

telnet@AR1#ping 50.50.50.50
Sending 1, 16-byte ICMP Echo to 50.50.50.50, timeout 5000 msec, TTL 64
Type Control-c to abort
Request timed out.
No reply from remote host.

Nope, of course we can’t. BGP is sending this packet into the core, which has no idea how to get to 50.50.50.50, so the packet gets dropped. We can verify this with a traceroute:

telnet@AR1#traceroute 50.50.50.50

Type Control-c to abort
Tracing the route to IP node (50.50.50.50) from 1 to 30 hops

  1    *       *       *     ?
  2    *    ^C
Trace Route aborted!

This is an easy fix, though. Instead of adding a static route, or even making the LSP available to the IGP, we simply tell BGP it can use LSPs as next hops:

router bgp
 next-hop-mpls

telnet@AR1#sh ip route 50.50.50.50
Type Codes - B:BGP D:Connected I:ISIS O:OSPF R:RIP S:Static; Cost - Dist/Metric
BGP  Codes - i:iBGP e:eBGP
ISIS Codes - L1:Level-1 L2:Level-2
OSPF Codes - i:Inter Area 1:External Type 1 2:External Type 2 s:Sham Link
        Destination        Gateway         Port          Cost          Type Uptime
1       50.50.50.50/32     DIRECT          lsp TO-AR3    200/0         Bi   0m5s

Now we do see the LSP as a next hop. And does it work?

telnet@AR1#ping 50.50.50.50
Sending 1, 16-byte ICMP Echo to 50.50.50.50, timeout 5000 msec, TTL 64
Type Control-c to abort
Reply from 50.50.50.50     : bytes=16 time=1ms TTL=63
Success rate is 100 percent (1/1), round-trip min/avg/max=1/1/1 ms.

Of course it does :)

Let’s take a slightly deeper look at what labels will be used.

telnet@AR1#sh mpls forwarding
Total number of MPLS forwarding entries: 1
      Dest-prefix        In-lbl Out-lbl Out-intf      Sig Next-hop
1     5.5.5.5/32         -      1026    e1/1          R   192.168.1.2

We should see AR1 sending the frame out with a label of 1026. Is that what we see?

telnet@AR1#traceroute 50.50.50.50

Type Control-c to abort
Tracing the route to IP node (50.50.50.50) from 1 to 30 hops

  1    <1 ms   <1 ms   <1 ms 192.168.1.2
        MPLS Label=1026 Exp=7 TTL=1 S=1
  2    <1 ms   <1 ms   <1 ms 50.50.50.50
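The "MPLS Label=1026 Exp=7 TTL=1 S=1" line in that traceroute is a decoded 32-bit label stack entry (RFC 3032): 20 bits of label, 3 EXP bits, the bottom-of-stack bit and an 8-bit TTL. As a sketch with hypothetical helper names (not a vendor API), packing and unpacking that entry looks like this:

```python
def encode_lse(label: int, exp: int, bottom: bool, ttl: int) -> int:
    """Pack one MPLS label stack entry: label(20) | EXP(3) | S(1) | TTL(8)."""
    if not (0 <= label < 1 << 20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    return (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl


def decode_lse(entry: int) -> dict:
    """Unpack a 32-bit label stack entry into its four fields."""
    return {
        "label": entry >> 12,
        "exp": (entry >> 9) & 0x7,
        "s": (entry >> 8) & 0x1,
        "ttl": entry & 0xFF,
    }


if __name__ == "__main__":
    entry = encode_lse(1026, 7, True, 1)
    print(f"{entry:#010x}")   # 0x00402f01
    print(decode_lse(entry))  # label 1026, exp 7, s 1, ttl 1
```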

Netiron MLX/XMR explicit paths

Now let's create a strict explicit path so the LSP runs through CR2 and CR3. Again, the config is pretty simple:

router mpls
 
 path STRICT-TO-AR3
  strict 192.168.1.6
  strict 192.168.1.18
  strict 192.168.1.26

 lsp TO-AR3
  to 5.5.5.5
  primary STRICT-TO-AR3
  enable

I can add a secondary path plus plenty of options, but for this blog the config above is good enough. We should now see a slightly different label, plus 2 labelled hops:

telnet@AR1#sh mpls forwarding
Total number of MPLS forwarding entries: 1
      Dest-prefix        In-lbl Out-lbl Out-intf      Sig Next-hop
1     5.5.5.5/32         -      1024    e1/2          R   192.168.1.6

telnet@AR1#traceroute 50.50.50.50

Type Control-c to abort
Tracing the route to IP node (50.50.50.50) from 1 to 30 hops

  1    <1 ms   <1 ms   <1 ms 192.168.1.6
        MPLS Label=1024 Exp=7 TTL=1 S=1
  2    <1 ms   <1 ms   <1 ms 192.168.1.18
        MPLS Label=1024 Exp=7 TTL=1 S=1
  3    <1 ms   <1 ms   <1 ms 50.50.50.50

All good.

Netiron MLX/XMR Type-10 OSPF LSA

Enabling OSPF-TE is done under router mpls, not ospf as you might expect:

router mpls
 policy
  traffic-eng ospf area 0

Let's take a look at the database:

telnet@CR1#sh ip ospf database

Link States

Index Area ID         Type LS ID           Adv Rtr         Seq(Hex) Age  Cksum  SyncState
1     0               Rtr  192.168.1.1     192.168.1.1     80000027 1213 0x1960 Done
2     0               Rtr  192.168.1.6     192.168.1.6     80000028 1108 0xa89d Done
3     0               Rtr  192.168.1.18    192.168.1.18    80000029 1290 0x5999 Done
4     0               Rtr  192.168.1.10    192.168.1.10    80000028 1186 0x27f5 Done
5     0               Rtr  192.168.1.2     192.168.1.2     8000002b 1287 0x0e49 Done
6     0               Net  192.168.1.17    192.168.1.6     80000022 1348 0x98ca Done
7     0               Net  192.168.1.6     192.168.1.6     80000022 1828 0x1867 Done
8     0               Net  192.168.1.21    192.168.1.18    80000022 1050 0xbf97 Done
9     0               Net  192.168.1.13    192.168.1.2     80000022 1287 0x0873 Done
10    0               Net  192.168.1.10    192.168.1.10    80000022 706  0x0e64 Done
11    0               Net  192.168.1.2     192.168.1.2     80000023 87   0x2e5c Done
12    0               Net  192.168.1.25    192.168.1.18    80000022 570  0x0843 Done
13    0               OpAr 1.0.0.3         192.168.1.1     80000003 159  0xebd0 Done
14    0               OpAr 1.0.0.3         192.168.1.18    80000003 51   0xb2d9 Done
15    0               OpAr 1.0.0.3         192.168.1.10    80000003 45   0xa406 Done
16    0               OpAr 1.0.0.3         192.168.1.6     80000003 57   0x2a76 Done
17    0               OpAr 1.0.0.3         192.168.1.2     80000003 60   0x316a Done
18    0               OpAr 1.0.0.5         192.168.1.2     80000003 60   0x5d4d Done

Index Area ID         Type LS ID           Adv Rtr         Seq(Hex) Age  Cksum  SyncState
19    0               OpAr 1.0.0.2         192.168.1.1     80000003 159  0x25a0 Done
20    0               OpAr 1.0.0.2         192.168.1.18    80000003 51   0x8df7 Done
21    0               OpAr 1.0.0.2         192.168.1.10    80000003 45   0xdbb0 Done
22    0               OpAr 1.0.0.2         192.168.1.6     80000003 57   0xf5c1 Done
23    0               OpAr 1.0.0.2         192.168.1.2     80000003 60   0x3d86 Done
24    0               OpAr 1.0.0.4         192.168.1.6     80000003 57   0x6d39 Done
25    0               OpAr 1.0.0.4         192.168.1.18    80000003 51   0xf59c Done
26    0               OpAr 1.0.0.4         192.168.1.2     80000003 60   0xac06 Done
27    0               OpAr 1.0.0.1         192.168.1.1     80000002 164  0x90e6 Done
28    0               OpAr 1.0.0.1         192.168.1.10    80000002 50   0xb4b0 Done
29    0               OpAr 1.0.0.1         192.168.1.18    80000002 56   0xd480 Done
30    0               OpAr 1.0.0.1         192.168.1.6     80000002 61   0xa4c8 Done
31    0               OpAr 1.0.0.1         192.168.1.2     80000002 66   0x94e0 Done
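In the OpAr rows, the LS ID isn't a real address. For opaque LSAs (RFC 5250), the first octet is the opaque type and the remaining three octets are an opaque ID; opaque type 1 is the Traffic Engineering LSA from RFC 3630. The sketch below (hypothetical helper name) splits the IDs from the output above. An LSA is identified by its type, LS ID and advertising router together, which is why several routers can each originate 1.0.0.2.

```python
import ipaddress


def split_opaque_ls_id(ls_id: str) -> tuple[int, int]:
    """Split an opaque LSA's Link State ID into (opaque type, opaque ID).

    The top 8 bits are the opaque type; the remaining 24 bits are the opaque ID.
    """
    value = int(ipaddress.IPv4Address(ls_id))
    return value >> 24, value & 0xFFFFFF


if __name__ == "__main__":
    for ls_id in ("1.0.0.1", "1.0.0.2", "1.0.0.3", "1.0.0.4", "1.0.0.5"):
        opaque_type, opaque_id = split_opaque_ls_id(ls_id)
        print(ls_id, "-> opaque type", opaque_type, "opaque ID", opaque_id)
```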

Let's drill deeper:

telnet@CR1# sh ip ospf database link-state opaque-area 1.0.0.2
Area ID         Type LS ID           Adv Rtr         Seq(Hex) Age  Cksum  SyncState
0               OpAr 1.0.0.2         192.168.1.1     80000003 212  0x25a0 Done
  Area-opaque TE LSA
  Link:
    Link type : multiaccess
    Link ID : 192.168.1.2
    Local IP Address:
                192.168.1.1
    Remote IP Address:
                0.0.0.0
    TE metric: 1
    Link BW: 1000000 kbits/sec
    Reservable BW: 1000000 kbits/sec
    Unreserved BW:
       [0]    1000000 kbits/sec  [1]    1000000 kbits/sec
       [2]    1000000 kbits/sec  [3]    1000000 kbits/sec
       [4]    1000000 kbits/sec  [5]    1000000 kbits/sec
       [6]    1000000 kbits/sec  [7]    1000000 kbits/sec
    Admin Group: 0x00000000

Area ID         Type LS ID           Adv Rtr         Seq(Hex) Age  Cksum  SyncState
0               OpAr 1.0.0.2         192.168.1.18    80000003 104  0x8df7 Done
  Area-opaque TE LSA
  Link:
    Link type : multiaccess
    Link ID : 192.168.1.25
    Local IP Address:
                192.168.1.25
    Remote IP Address:
                0.0.0.0
    TE metric: 1
    Link BW: 1000000 kbits/sec
    Reservable BW: 1000000 kbits/sec
    Unreserved BW:
       [0]    1000000 kbits/sec  [1]    1000000 kbits/sec
       [2]    1000000 kbits/sec  [3]    1000000 kbits/sec
       [4]    1000000 kbits/sec  [5]    1000000 kbits/sec
       [6]    1000000 kbits/sec  [7]    1000000 kbits/sec
    Admin Group: 0x00000000

Area ID         Type LS ID           Adv Rtr         Seq(Hex) Age  Cksum  SyncState
0               OpAr 1.0.0.2         192.168.1.10    80000003 99   0xdbb0 Done
  Area-opaque TE LSA
  Link:
    Link type : multiaccess
    Link ID : 192.168.1.25
    Local IP Address:
                192.168.1.26
    Remote IP Address:
                0.0.0.0
    TE metric: 1
    Link BW: 1000000 kbits/sec
    Reservable BW: 1000000 kbits/sec
    Unreserved BW:
       [0]    1000000 kbits/sec  [1]    1000000 kbits/sec
       [2]    1000000 kbits/sec  [3]    1000000 kbits/sec
       [4]    1000000 kbits/sec  [5]    1000000 kbits/sec
       [6]    1000000 kbits/sec  [7]    1000000 kbits/sec
    Admin Group: 0x00000000

Area ID         Type LS ID           Adv Rtr         Seq(Hex) Age  Cksum  SyncState
0               OpAr 1.0.0.2         192.168.1.6     80000003 112  0xf5c1 Done
  Area-opaque TE LSA
  Link:
    Link type : multiaccess
    Link ID : 192.168.1.6
    Local IP Address:
                192.168.1.6
    Remote IP Address:
                0.0.0.0
    TE metric: 1
    Link BW: 1000000 kbits/sec
    Reservable BW: 1000000 kbits/sec
    Unreserved BW:
       [0]    1000000 kbits/sec  [1]    1000000 kbits/sec
       [2]    1000000 kbits/sec  [3]    1000000 kbits/sec
       [4]    1000000 kbits/sec  [5]    1000000 kbits/sec
       [6]    1000000 kbits/sec  [7]    1000000 kbits/sec
    Admin Group: 0x00000000

Area ID         Type LS ID           Adv Rtr         Seq(Hex) Age  Cksum  SyncState
0               OpAr 1.0.0.2         192.168.1.2     80000003 116  0x3d86 Done
  Area-opaque TE LSA
  Link:
    Link type : multiaccess
    Link ID : 192.168.1.2
    Local IP Address:
                192.168.1.2
    Remote IP Address:
                0.0.0.0
    TE metric: 1
    Link BW: 1000000 kbits/sec
    Reservable BW: 1000000 kbits/sec
    Unreserved BW:
       [0]    1000000 kbits/sec  [1]    1000000 kbits/sec
       [2]    1000000 kbits/sec  [3]    1000000 kbits/sec
       [4]    1000000 kbits/sec  [5]    1000000 kbits/sec
       [6]    1000000 kbits/sec  [7]    1000000 kbits/sec
    Admin Group: 0x00000000
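The bandwidth figures above are shown in kbits/sec, but RFC 3630 carries Maximum Bandwidth, Maximum Reservable Bandwidth and the eight Unreserved Bandwidth values (one per priority, 0 to 7) as 32-bit IEEE floating-point numbers in bytes per second. A minimal sketch of that conversion, with hypothetical helper names:

```python
import struct


def kbps_to_te_float(kbits_per_sec: float) -> bytes:
    """Encode a bandwidth as a 4-byte IEEE 754 float in bytes per second."""
    return struct.pack("!f", kbits_per_sec * 1000 / 8)


def te_float_to_kbps(raw: bytes) -> float:
    """Decode a 4-byte TE bandwidth value back to kbits/sec."""
    (bytes_per_sec,) = struct.unpack("!f", raw)
    return bytes_per_sec * 8 / 1000


link_bw = kbps_to_te_float(1_000_000)  # "Link BW: 1000000 kbits/sec"
unreserved = [link_bw] * 8             # priorities 0-7, nothing reserved yet

if __name__ == "__main__":
    print(link_bw.hex())                    # 4cee6b28
    print(te_float_to_kbps(unreserved[7]))  # 1000000.0
```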

All of the above will also work on the CER and, I believe, the CES with the correct licensing. As with all the vendors, there are many more options that can be added to the configs above to make the network converge and respond to failures a lot faster. The main idea of all three posts was to give you a quick insight into how RSVP is configured on the 3 main vendors.

INE TS Mock Lab 1 – problems, problems, and more problems

INE have recently released a new TS CCIE rack and gave all previous members 1 free attempt. I had my TS lab booked for 11am sharp and there I was, sat at my desk waiting for the time to go by.

11am finally came and I started the TS lab. I had a quick read through all 10 tickets to see what I was up against. Most tickets looked doable.

I start on ticket 1, bleed a good 10 minutes away and still haven’t resolved it. So I move on to the next ticket. I’m halfway through ticket 2 when I lose my connection to INE. Hmm. Let me try to ping 4.2.2.1. Nothing. I pick up my phone: no dial tone.

Great: right at the start of my limited 2-hour TS lab, the damn phone line and ADSL break. Now what do I do? Wait? Rush into work and finish it there? It takes 40 minutes to get to work and I’m not even dressed. ARGH. So I jump into some jeans, a dodgy shirt and a jacket, and I’m off running for the train. I get to my station and hop on. I finally get to the station at work, and then I’m running like a madman trying to get into my office as quickly as possible. I finally get to my desk wheezing like an 80-year-old.

Fire up the session again. I’ve got just over an hour left and I’ve solved nothing. Wonderful.

So I get cracking. Time ticks on and I’m nearing the end of my time. I know I’ve managed to solve 4 tickets and I’m so close to fixing a 5th when my time runs out :(

I think the first lab INE released accurately matches the real lab’s difficulty, and if I’d had enough time I think I would’ve been able to pass it. INE say they will be releasing more labs soon, so we’ll see how that goes. The plan is to come into the office to do them: not only is it more comfortable, but I have a bunch of redundant links here, so it’s far less likely I’ll have this issue again.

MPLS L3VPN – Route Distinguisher vs Route Target vs VPN label

A lot of people confuse these 3 items. I’ll explain exactly what each of them does, how you can see them, and how the routers use them to provide an L3VPN service.

Let’s take the following topology for this post:

Here we have 2 L3VPN customers running over our MPLS core. R5 is advertising 5.5.5.5/32, and R8 is also advertising 5.5.5.5/32.

Route Distinguisher:

The route distinguisher’s sole job is to keep a route unique while the PE routers advertise NLRI (Network Layer Reachability Information) to each other. If R5 and R8 both advertise 5.5.5.5/32 to R3, how will R3 advertise both of those routes to R4 while keeping them unique? The VPNv4 address family itself doesn’t run in a VRF. It runs in the global routing instance, and hence it needs something to distinguish the routes.

Let’s take a quick look at the vrf RD config for both customers and then the vpnv4 route for 6.6.6.6/32 in the BGP table on R3:

R3#sh run | include ip vrf | rd
ip vrf CUS1
 rd 3.3.3.3:100
ip vrf CUS2
 rd 3.3.3.3:200

R3#sh bgp vpnv4 unicast rd 3.3.3.3:200 6.6.6.6
BGP routing table entry for 3.3.3.3:200:6.6.6.6/32, version 106
Paths: (1 available, best #1, table CUS2)
  Not advertised to any peer
  Local, imported path from 4.4.4.4:200:6.6.6.6/32
    4.4.4.4 (metric 4) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:100:200 OSPF DOMAIN ID:0x0005:0x000000030200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.0.47.4:0
      mpls labels in/out nolabel/21

R3#sh bgp vpnv4 unicast rd 3.3.3.3:100 6.6.6.6
BGP routing table entry for 3.3.3.3:100:6.6.6.6/32, version 108
Paths: (1 available, best #1, table CUS1)
  Not advertised to any peer
  Local, imported path from 4.4.4.4:100:6.6.6.6/32
    4.4.4.4 (metric 4) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:100:100 OSPF DOMAIN ID:0x0005:0x000000020200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.0.46.4:0
      mpls labels in/out nolabel/23

You can see that R3 has 2 VPNv4 routes for 6.6.6.6/32: 3.3.3.3:200:6.6.6.6/32 and 3.3.3.3:100:6.6.6.6/32. They are unique because one contains :100: and the other contains :200:. Note that R4 does not have to match this RD in any way; it simply needs to be able to accept 2 unique routes. This is especially important when using route reflectors, as RRs will normally only advertise the best route to their clients. If the routes were not unique, the RR would only advertise one of them. The RD in no way determines which VPN a route actually belongs to.

That’s all the route distinguisher does. No more.
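To see why the RD keeps prefixes unique, it helps to look at how it is encoded. RFC 4364 defines the RD as 8 bytes: a 2-byte type and a 6-byte value. R3's RDs such as 3.3.3.3:100 are type 1 (an IPv4 administrator plus a 2-byte number), while an ASN:nn form such as 100:200 would be type 0. The sketch below uses hypothetical names and leaves out type 2 (4-byte ASN) RDs. It encodes the RDs and keys a table by (RD, prefix), so the two customers' identical 5.5.5.5/32 routes no longer collide.

```python
import ipaddress
import struct


def encode_rd(rd: str) -> bytes:
    """Encode an 'IPv4:nn' (type 1) or 'ASN:nn' (type 0, 16-bit ASN) RD into 8 bytes."""
    admin, assigned = rd.rsplit(":", 1)
    number = int(assigned)
    if "." in admin:
        # Type 1: 4-byte IPv4 administrator field + 2-byte assigned number
        return struct.pack("!H4sH", 1, ipaddress.IPv4Address(admin).packed, number)
    # Type 0: 2-byte ASN administrator field + 4-byte assigned number
    return struct.pack("!HHI", 0, int(admin), number)


# Same IPv4 prefix from two customers: keyed by (RD, prefix) they stay distinct.
vpnv4_table = {
    (encode_rd("3.3.3.3:100"), "5.5.5.5/32"): "CUS1 route via R5",
    (encode_rd("3.3.3.3:200"), "5.5.5.5/32"): "CUS2 route via R8",
}

if __name__ == "__main__":
    print(encode_rd("3.3.3.3:100").hex())  # 0001030303030064
    print(len(vpnv4_table))                # 2
```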

Route Target:

The route target’s job is to tell the PE routers what VPN a route actually belongs to. Let’s take a look at the target config on R3:

R3#sh run | inc ip vrf|target
ip vrf CUS1
 route-target export 100:100
 route-target import 100:100
ip vrf CUS2
 route-target export 100:200
 route-target import 100:200

When R3 receives an advertisement from R5, not only does it turn the route into a VPNv4 route with the RD to make it unique, it also attaches an extended community to that advertisement: the RT value. Once this NLRI gets to R4, R4 will only place a route carrying a given RT into the VRF that imports that RT. As an example, let’s have a look at the advertisement of 5.5.5.5 from R3 to R4:

R3#sh bgp vpnv4 unicast rd 3.3.3.3:100 5.5.5.5
BGP routing table entry for 3.3.3.3:100:5.5.5.5/32, version 37
Paths: (1 available, best #1, table CUS1)
  Advertised to update-groups:
     9
  Local
    10.0.35.5 from 0.0.0.0 (3.3.3.3)
      Origin incomplete, metric 2, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:100:100 OSPF DOMAIN ID:0x0005:0x000000020200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.0.35.3:0
      mpls labels in/out 24/nolabel

We can see the extended community RT:100:100 is attached to this NLRI on R3. This is advertised to R4:

R4#sh bgp vpnv4 unicast rd 3.3.3.3:100 5.5.5.5
BGP routing table entry for 3.3.3.3:100:5.5.5.5/32, version 178
Paths: (1 available, best #1, no table)
  Not advertised to any peer
  Local
    3.3.3.3 (metric 4) from 3.3.3.3 (3.3.3.3)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:100:100 OSPF DOMAIN ID:0x0005:0x000000020200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.0.35.3:0
      mpls labels in/out nolabel/24

R4#sh run | include ip vrf | 100:100
ip vrf CUS1
 route-target export 100:100
 route-target import 100:100

R4 has an import 100:100 configuration under its VRF, so by matching the 100:100 community on the received NLRI, the PE router knows that the advertisement is meant for vrf CUS1. Note that the RD has nothing to do with this.
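The import decision can be sketched as a set intersection: a received VPNv4 route lands in every VRF whose import RTs share at least one value with the route's RT communities, and the RD never enters into it. The VRF names and RTs below are R4's; the function name is hypothetical.

```python
def import_vrfs(route_rts: set[str], vrf_imports: dict[str, set[str]]) -> list[str]:
    """Return the VRFs (sorted) whose import RTs match any RT carried by the route."""
    return sorted(vrf for vrf, rts in vrf_imports.items() if rts & route_rts)


# R4's import policy, taken from the "route-target import" lines.
R4_IMPORTS = {"CUS1": {"100:100"}, "CUS2": {"100:200"}}

if __name__ == "__main__":
    print(import_vrfs({"100:100"}, R4_IMPORTS))  # ['CUS1']
    print(import_vrfs({"999:1"}, R4_IMPORTS))    # [] : no VRF imports this route
```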

VPN Label:

The VPN label determines which VPN a packet belongs to. But hang on, surely that’s what the RT is for? No. The RT is for the control plane, while the VPN label is for the data plane. Let’s expand on that idea a bit. When R3 advertises NLRI to R4, the RT is used to determine where a route actually belongs. When it comes to R5 actually sending a packet to R6, the VPN label is used. Why? Because there is no field in a data packet in which the route target is stored. Only the route advertisement carries the route target, as a community value. When R5 sends a ping to R6 from its loopback, it’s simply a packet with a destination address of 6.6.6.6 and a source address of 5.5.5.5.

So with L3VPNs we have two labels. The top label is the transport label and the bottom label is the VPN label. With PHP, the penultimate router pops the transport label, but the VPN label is only popped by the egress PE itself. When that frame arrives with the VPN label, R4 knows which VRF the packet belongs to.

VPN labels are advertised in the NLRI along with the RT. Let’s take a look at the 2 VPN labels that R4 is advertising to R3:

R3#sh bgp vpnv4 unicast rd 4.4.4.4:100 6.6.6.6
BGP routing table entry for 4.4.4.4:100:6.6.6.6/32, version 6
Paths: (1 available, best #1, no table)
  Not advertised to any peer
  Local
    4.4.4.4 (metric 4) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:100:100 OSPF DOMAIN ID:0x0005:0x000000020200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.0.46.4:0
      mpls labels in/out nolabel/21
R3#sh bgp vpnv4 unicast rd 4.4.4.4:200 6.6.6.6
BGP routing table entry for 4.4.4.4:200:6.6.6.6/32, version 8
Paths: (1 available, best #1, no table)
  Not advertised to any peer
  Local
    4.4.4.4 (metric 4) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:100:200 OSPF DOMAIN ID:0x0005:0x000000030200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.0.47.4:0
      mpls labels in/out nolabel/23

We can see that R3 will use a VPN label of 21 when sending traffic to the CUS1 VRF, while it’ll use VPN label 23 when sending to CUS2’s VRF.

Let’s run a traceroute from R5 and R8 to confirm this.
CUS1:

R5#traceroute 6.6.6.6

Type escape sequence to abort.
Tracing the route to 6.6.6.6

  1 10.0.35.3 36 msec *  52 msec
  2 10.0.13.1 [MPLS: Labels 20/21 Exp 0] 120 msec 120 msec 132 msec
  3 10.0.12.2 [MPLS: Labels 18/21 Exp 0] 92 msec 148 msec 104 msec
  4 10.0.46.4 [MPLS: Label 21 Exp 0] 104 msec 100 msec 68 msec
  5 10.0.46.6 172 msec *  140 msec

CUS2:

R8#traceroute 6.6.6.6

Type escape sequence to abort.
Tracing the route to 6.6.6.6

  1 10.0.38.3 44 msec 64 msec 40 msec
  2 10.0.13.1 [MPLS: Labels 20/23 Exp 0] 132 msec 132 msec 88 msec
  3 10.0.12.2 [MPLS: Labels 18/23 Exp 0] 124 msec 156 msec 104 msec
  4 10.0.47.4 [MPLS: Label 23 Exp 0] 192 msec 96 msec 76 msec
  5 10.0.47.7 156 msec *  116 msec

The first hop for CUS1 shows a label stack of 20/21 – 20 being transport and 21 being the VPN. CUS2 uses 20/23 – Notice that as the egress PE is the same, the same transport label is used.

On the 3rd hop on both, R2 is popping off the transport label. Both frames now get to R4. One has a VPN label of 21 and the second a label of 23. R4 knows which VRF both packets belong to and sends them on their way to the correct routers.
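The label operations in those traceroutes can be walked through with a small model. The label tables below are inferred from the CUS1 and CUS2 traces (R1 swaps 20 for 18, R2 pops the transport label via PHP, and R4 maps VPN label 21 to CUS1 and 23 to CUS2); they are illustrative, not dumped from the routers, and the names are hypothetical.

```python
# Hypothetical per-router label actions, inferred from the traceroute output.
LFIB = {
    "R1": {20: ("swap", 18)},   # P router: swap the transport label
    "R2": {18: ("pop", None)},  # penultimate hop: PHP removes the transport label
}
R4_VPN_LABELS = {21: "CUS1", 23: "CUS2"}  # egress PE: VPN label -> VRF


def forward(stack: list[int], path: list[str]) -> str:
    """Walk a [transport, vpn] label stack through the core and return the egress VRF."""
    stack = list(stack)  # don't mutate the caller's list
    for router in path:
        action, new_label = LFIB[router][stack[0]]
        if action == "swap":
            stack[0] = new_label
        else:
            stack.pop(0)
        print(f"{router}: stack now {stack}")
    (vpn_label,) = stack  # only the VPN label should be left when R4 receives it
    return R4_VPN_LABELS[vpn_label]


if __name__ == "__main__":
    print(forward([20, 21], ["R1", "R2"]))  # CUS1
    print(forward([20, 23], ["R1", "R2"]))  # CUS2
```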

Hopefully this helps clear things up for some of you.