Carrier Supporting Carrier

On May 6, 2013, in CCIE, by Darren

CSC, or Carrier Supporting Carrier, takes inter-AS L3 VPN to the next level. Let’s say that you are an ISP and you are offering L3VPN MPLS services to your customers in England. You take over another ISP located in, say, Australia, and two of your UK customers also have offices in Australia. They would like their offices in both locations connected over the MPLS cloud.

It would be very expensive to run a new line between those locations. You do, however, still want to provide L3VPN services to your customers. This is where CSC comes in. CSC allows another ISP to connect both sides of your ISP network together. It also ensures that the core ISP doesn’t learn about any of your customer prefixes, as it doesn’t need to.

Let’s take the following diagram into consideration.
[Figure: Carrier Supporting Carrier lab topology]
Routers 1, 7, 14, 6, 9, and 8 are all part of the customer carrier, ISP100 (AS100). There are three routers located in each geographical location. Routers 2, 3, 13, and 5 are part of the core carrier, ISP500 (AS500). This network stretches to both locations. The rest are customer routers injecting their loopback addresses into OSPF to test.

The core carrier is running IS-IS and LDP. ISP100 is running OSPF and LDP. I won’t go into the regular IGP+LDP config as it is pretty straightforward.

With CSC, there are a few new terms to deal with. R14 and R8 are going to be regular PE routers for our ISP. R1 and R6 are going to be called CSC-CE (Carrier Supporting Carrier – Customer Edge) routers. R2 and R5 are going to be CSC-PE (Carrier Supporting Carrier – Provider Edge) routers. All other ISP routers are simply core routers.

The terminology above assumes that you are speaking from the point of view of the core carrier (ISP500 in this case). That is, ISP500’s edge routers are the ‘PE’ routers, and R1 and R6 are the customer ISP’s edge routers facing the core (called ‘CE’ in this case).

It’s a little confusing, depending on which view you take, but it’s really not that difficult.

Initial R14/R8 config – regular PE

The first thing we can start off with is to ensure R14’s PE config is correct. I am running OSPF with the CE routers and learning routes from them:

vrf definition CUS1
 rd 14.14.14.14:1
 route-target export 100:1
 route-target import 100:1
 !
 address-family ipv4
 exit-address-family
!
vrf definition CUS2
 rd 14.14.14.14:2
 route-target export 100:2
 route-target import 100:2
 !
 address-family ipv4
 exit-address-family
!
interface FastEthernet1/1
 vrf forwarding CUS1
 ip address 10.14.16.14 255.255.255.0
 ip ospf network point-to-point
 ip ospf 3 area 0
!
interface FastEthernet2/0
 vrf forwarding CUS2
 ip address 10.14.15.14 255.255.255.0
 ip ospf network point-to-point
 ip ospf 2 area 0

R8 has a similar config on the other side so I won’t put it here.

CSC-PE config

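R2 and R5 are going to act as PE routers for the core carrier. They will have their AS100-facing interfaces in a VRF. R2 and R5 will be running regular VPNv4 BGP with each other. The config below only shows the VRF and the VPNv4 session; each CSC-PE also needs a labeled eBGP session to its CSC-CE inside the VRF, mirroring the CSC-CE config shown further down.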
R2:

vrf definition CSC_AS100
 rd 2.2.2.2:500
 route-target export 500:100
 route-target import 500:100
 !
 address-family ipv4
 exit-address-family
!
interface FastEthernet1/0
 vrf forwarding CSC_AS100
 ip address 10.1.2.2 255.255.255.0
!
router bgp 500
 no bgp default ipv4-unicast
 neighbor 5.5.5.5 remote-as 500
 neighbor 5.5.5.5 update-source Loopback0
 !
 address-family vpnv4
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 send-community extended
 exit-address-family

Once again, R5 has a similar config on the other side so I’m not putting it here.

CSC-CE

Eventually we need R14 and R8 to peer with each other via VPNv4.
This is much like option C in the inter-AS config, in which two PE routers peer with each other even though they are not directly connected in the same AS. In order to do so, each of them needs a valid route to the other. R1 and R6 each have routes to their local PE device, but they need to learn routes to the other side through the core carrier. To do this I’ll be running BGP to the core carrier. I also need to ensure that I’m sending and receiving labeled BGP routes, as the LSP has to be end to end. No part of the path can be unlabelled. I need the loopbacks of the PEs (R14 and R8) advertised over to the core carrier:

interface FastEthernet1/0
 ip address 10.1.2.1 255.255.255.0
 mpls bgp forwarding
!
router bgp 100
 bgp log-neighbor-changes
 neighbor 10.1.2.2 remote-as 500
 !
 address-family ipv4
  network 14.14.14.14 mask 255.255.255.255
  neighbor 10.1.2.2 activate
  neighbor 10.1.2.2 allowas-in 1
  neighbor 10.1.2.2 send-label
  no auto-summary
 exit-address-family

I’ll need to allowas-in 1 as both sides are running the same AS number. Without it, the CSC-CE routers would reject the BGP update.
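To make the loop check concrete, here is a minimal Python sketch of the decision. It assumes a simplified model in which the only thing that matters is how many times the local AS already appears in the received AS path. The function name `accepts_update` and its parameters are made up for illustration and are not any router API.

```python
def accepts_update(as_path, local_as, allowas_in=0):
    """Simplified BGP AS-path loop check.

    By default an update is dropped if the receiver's own AS is already
    in the AS path. 'allowas-in N' tolerates up to N occurrences.
    """
    occurrences = as_path.count(local_as)
    return occurrences <= allowas_in


# R6 (AS 100) learns R14's loopback via the core carrier (AS 500).
# The AS path it sees is the core AS followed by the originating AS.
path_seen_by_r6 = [500, 100]

print(accepts_update(path_seen_by_r6, local_as=100))                # False
print(accepts_update(path_seen_by_r6, local_as=100, allowas_in=1))  # True
```

The other common way around this is to have the CSC-PE rewrite the AS path with as-override, which I’m not using here.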

I then need to ensure the CSC-CE routers are redistributing those learned prefixes into the IGP:

router ospf 1
 redistribute bgp 100 subnets

R6 again has a similar config.

The end result of it so far is that R8 and R14 should now be able to ping each other from their respective loopbacks:

R8#ping 14.14.14.14 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 14.14.14.14, timeout is 2 seconds:
Packet sent with a source address of 8.8.8.8
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/45/56 ms
R8#traceroute 14.14.14.14 so lo0

Type escape sequence to abort.
Tracing the route to 14.14.14.14

  1 10.8.9.9 [MPLS: Label 24 Exp 0] 56 msec 44 msec 32 msec
  2 10.6.9.6 [MPLS: Label 24 Exp 0] 60 msec 24 msec 60 msec
  3 10.5.6.5 [MPLS: Label 16 Exp 0] 44 msec 44 msec 48 msec
  4 10.5.13.13 [MPLS: Labels 20/34 Exp 0] 48 msec 44 msec 44 msec
  5 10.3.13.3 [MPLS: Labels 19/34 Exp 0] 44 msec 44 msec 40 msec
  6 10.1.2.2 [MPLS: Label 34 Exp 0] 40 msec 44 msec 44 msec
  7 10.1.2.1 [MPLS: Label 19 Exp 0] 44 msec 44 msec 28 msec
  8 10.1.7.7 [MPLS: Label 18 Exp 0] 56 msec 32 msec 12 msec
  9 10.7.14.14 52 msec *  72 msec

Which they can.

PE config – continued

Now that the PE routers have connectivity to each other, we can set up the VPNv4 BGP session:

router bgp 100
 no bgp default ipv4-unicast
 neighbor 8.8.8.8 remote-as 100
 neighbor 8.8.8.8 update-source Loopback0
 !
 address-family vpnv4
  neighbor 8.8.8.8 activate
  neighbor 8.8.8.8 send-community extended
 exit-address-family

Is the session up?

R14#show bgp vpnv4 unicast all summary
BGP router identifier 14.14.14.14, local AS number 100
BGP table version is 1, main routing table version 1

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
8.8.8.8         4          100       4       4        1    0    0 00:00:50        0

Yes, but no prefixes learnt. We still need to redistribute between our vrf aware OSPF processes and BGP:

router ospf 3 vrf CUS1
 redistribute bgp 100 subnets
!
router ospf 2 vrf CUS2
 redistribute bgp 100 subnets
!
router bgp 100
 !
 address-family ipv4 vrf CUS1
  redistribute ospf 3 vrf CUS1
 exit-address-family
 !
 address-family ipv4 vrf CUS2
  redistribute ospf 2 vrf CUS2
 exit-address-family

Verification

As always with these types of configs, we need to ensure both the control and data planes are working correctly. First let’s follow the control plane update of R16’s loopback over to R11. The PE router, R14, should be learning this as a VRF prefix:

R14#show ip route vrf CUS1 16.16.16.16

Routing Table: CUS1
Routing entry for 16.16.16.16/32
  Known via "ospf 3", distance 110, metric 2, type intra area
  Redistributing via bgp 100
  Advertised by bgp 100
  Last update from 10.14.16.16 on FastEthernet1/1, 00:18:35 ago
  Routing Descriptor Blocks:
  * 10.14.16.16, from 16.16.16.16, 00:18:35 ago, via FastEthernet1/1
      Route metric is 2, traffic share count is 1

This prefix is converted into a VPNv4 prefix and advertised over to R8:

R8#show bgp vpnv4 un rd 14.14.14.14:1 16.16.16.16
BGP routing table entry for 14.14.14.14:1:16.16.16.16/32, version 3
Paths: (1 available, best #1, no table)
  Not advertised to any peer
  Local
    14.14.14.14 (metric 1) from 14.14.14.14 (14.14.14.14)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:100:1 OSPF DOMAIN ID:0x0005:0x000000030200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.14.16.14:0
      mpls labels in/out nolabel/27
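The output above boils down to two things: the RD is prepended to make the prefix unique, and the route target decides which VRF imports it. A rough Python sketch of that logic follows. It assumes R8 has the same CUS1/CUS2 import targets as R14 (R8’s config isn’t shown), and `importing_vrfs` is a name I’ve made up for illustration.

```python
def importing_vrfs(route_targets, vrfs):
    """Return the names of the VRFs whose import list matches any RT on the route."""
    wanted = set(route_targets)
    return sorted(name for name, imports in vrfs.items() if imports & wanted)


# Assumed import route targets on R8, mirroring R14's VRF definitions.
vrfs_on_r8 = {"CUS1": {"100:1"}, "CUS2": {"100:2"}}

route = {
    "rd": "14.14.14.14:1",
    "prefix": "16.16.16.16/32",
    "route_targets": ["100:1"],
}

# The VPNv4 prefix is simply the RD followed by the IPv4 prefix.
vpnv4 = f'{route["rd"]}:{route["prefix"]}'
print(vpnv4)                                                # 14.14.14.14:1:16.16.16.16/32
print(importing_vrfs(route["route_targets"], vrfs_on_r8))   # ['CUS1']
```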

This should end up in the correct vrf table:

R8#show ip route vrf CUS1 16.16.16.16

Routing Table: CUS1
Routing entry for 16.16.16.16/32
  Known via "bgp 100", distance 200, metric 2, type internal
  Redistributing via ospf 2
  Advertised by ospf 2 subnets
  Last update from 14.14.14.14 00:07:03 ago
  Routing Descriptor Blocks:
  * 14.14.14.14 (default), from 14.14.14.14, 00:07:03 ago
      Route metric is 2, traffic share count is 1
      AS Hops 0
      MPLS label: 27
      MPLS Flags: MPLS Required

Finally R11 should receive that as an OSPF route:

R11#show ip route 16.16.16.16
Routing entry for 16.16.16.16/32
  Known via "ospf 1", distance 110, metric 2
  Tag Complete, Path Length == 1, AS 100, , type extern 2, forward metric 1
  Last update from 10.8.11.8 on FastEthernet1/0, 00:06:38 ago
  Routing Descriptor Blocks:
  * 10.8.11.8, from 10.8.11.8, 00:06:38 ago, via FastEthernet1/0
      Route metric is 2, traffic share count is 1
      Route tag 3489661028
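That odd-looking route tag isn’t random. Assuming it follows the automatic-tag layout from RFC 1745 (which is what the “Tag Complete, Path Length == 1, AS 100” line suggests), it can be unpacked with a few shifts and masks. This is only a sketch, and the function name is my own:

```python
def decode_ospf_bgp_tag(tag):
    """Split a 32-bit OSPF external route tag into the automatic-tag fields.

    Layout assumed: 1 bit automatic, 1 bit complete, 2 bits path length,
    12 bits arbitrary tag, 16 bits AS number.
    """
    return {
        "automatic": bool((tag >> 31) & 0x1),
        "complete": bool((tag >> 30) & 0x1),
        "path_length": (tag >> 28) & 0x3,
        "arbitrary_tag": (tag >> 16) & 0xFFF,
        "as_number": tag & 0xFFFF,
    }


fields = decode_ospf_bgp_tag(3489661028)
print(hex(3489661028))  # 0xd0000064
print(fields)
```

The low 16 bits come out as 100, which is the AS the route was redistributed from.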

So our control plane is all good so far. Let’s check our data plane forwarding:

R11#ping 16.16.16.16 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 16.16.16.16, timeout is 2 seconds:
Packet sent with a source address of 11.11.11.11
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 44/57/76 ms


R11#traceroute 16.16.16.16 so lo0

Type escape sequence to abort.
Tracing the route to 16.16.16.16

  1 10.8.11.8 8 msec 12 msec 12 msec
  2 10.8.9.9 [MPLS: Labels 24/27 Exp 0] 52 msec 56 msec 48 msec
  3 10.6.9.6 [MPLS: Labels 24/27 Exp 0] 56 msec 48 msec 48 msec
  4 10.5.6.5 [MPLS: Labels 16/27 Exp 0] 56 msec 40 msec 68 msec
  5 10.5.13.13 [MPLS: Labels 20/34/27 Exp 0] 52 msec 44 msec 52 msec
  6 10.3.13.3 [MPLS: Labels 19/34/27 Exp 0] 56 msec 48 msec 40 msec
  7 10.2.3.2 [MPLS: Labels 34/27 Exp 0] 52 msec 52 msec 40 msec
  8 10.1.2.1 [MPLS: Labels 19/27 Exp 0] 44 msec 44 msec 48 msec
  9 10.1.7.7 [MPLS: Labels 18/27 Exp 0] 52 msec 40 msec 60 msec
 10 10.14.16.14 [MPLS: Label 27 Exp 0] 40 msec 44 msec 36 msec
 11 10.14.16.16 44 msec *  52 msec

No problems there :)

Carrier Supporting Carrier Supporting Carrier

So let’s be silly and go a step further. What if our final customer is actually another ISP offering L3VPN to its customers? Let’s change our topology slightly:
[Figure: updated topology]

I’m not going to show all the config here as it’s simply too much to fit into a blog post. However, the config itself is pretty much as summarised in the following image:
[Figure: summary of the CSC-within-CSC configuration]

R10 and R15 are our final CE routers. Once all is configured does it all actually work?

R10#traceroute 15.15.15.15 so lo0

Type escape sequence to abort.
Tracing the route to 15.15.15.15

  1 10.10.11.11 12 msec 12 msec 8 msec
  2 10.8.11.8 [MPLS: Labels 16/17 Exp 0] 72 msec 48 msec 68 msec
  3 10.8.9.9 [MPLS: Labels 22/24/17 Exp 0] 36 msec 80 msec 44 msec
  4 10.6.9.6 [MPLS: Labels 20/24/17 Exp 0] 52 msec 56 msec 52 msec
  5 10.5.6.5 [MPLS: Labels 23/24/17 Exp 0] 64 msec 52 msec 52 msec
  6 10.5.13.13 [MPLS: Labels 20/23/24/17 Exp 0] 48 msec 60 msec 52 msec
  7 10.3.13.3 [MPLS: Labels 19/23/24/17 Exp 0] 56 msec 56 msec 48 msec
  8 10.2.3.2 [MPLS: Labels 23/24/17 Exp 0] 56 msec 44 msec 56 msec
  9 10.1.2.1 [MPLS: Labels 19/24/17 Exp 0] 48 msec 48 msec 64 msec
 10 10.1.7.7 [MPLS: Labels 16/24/17 Exp 0] 56 msec 52 msec 60 msec
 11 10.7.14.14 [MPLS: Labels 24/17 Exp 0] 44 msec 56 msec 56 msec
 12 10.15.16.16 [MPLS: Label 17 Exp 0] 56 msec 56 msec 44 msec
 13 10.15.16.15 56 msec *  76 msec

It does indeed. At this point we are up to a four label stack in AS500. If we were running RSVP-TE and FRR we would have even more labels sitting on top.
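If you want to check the label depth without counting slashes by eye, a few lines of Python will do it. This is just a sketch that assumes the IOS traceroute line format shown above; the sample lines are copied from that output and `label_stack` is a hypothetical helper name.

```python
import re

# Matches both "[MPLS: Label 17 Exp 0]" and "[MPLS: Labels 20/23/24/17 Exp 0]".
LABEL_RE = re.compile(r"\[MPLS: Labels? ([\d/]+) Exp \d+\]")


def label_stack(line):
    """Return the label stack on one traceroute line as a list of ints, top label first."""
    match = LABEL_RE.search(line)
    if match is None:
        return []
    return [int(label) for label in match.group(1).split("/")]


trace = """\
  5 10.5.6.5 [MPLS: Labels 23/24/17 Exp 0] 64 msec 52 msec 52 msec
  6 10.5.13.13 [MPLS: Labels 20/23/24/17 Exp 0] 48 msec 60 msec 52 msec
  7 10.3.13.3 [MPLS: Labels 19/23/24/17 Exp 0] 56 msec 56 msec 48 msec
  8 10.2.3.2 [MPLS: Labels 23/24/17 Exp 0] 56 msec 44 msec 56 msec
 13 10.15.16.15 56 msec *  76 msec
"""

depths = [len(label_stack(line)) for line in trace.splitlines()]
print(depths)       # [3, 4, 4, 3, 0]
print(max(depths))  # 4
```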

There is a much easier way to do this of course. The original Customer Carrier could just buy some sort of virtual leased line or VPLS from AS500 and they would be directly connected over the same subnet. They could then run MPLS over that link and as far as anyone cares R1 and R6 would be directly connected to each other.

But of course this is a topic on the CCIE SP after all…


I wanted to test inter-vendor MPLS L3VPN compatibility between Brocade, Cisco, and Juniper. The ‘core’ itself will be Junos. In a future post I’ll probably have a random Brocade/Cisco device in the core as well to show how that works. This post will be the basis for a number of future posts on various MPLS applications. I wanted to have the core itself all done, so that’s what I’ll crack on with in this post.

I’ll be running RSVP TE tunnels between my PE routers. The core devices will also be running RSVP of course. For this lab I’m just using OSPF as my core IGP.

Let’s use the following topology:

[Figure: multi-vendor L3VPN topology (MPLS RSVP tunnels between Cisco IOS, Junos, & Brocade Netiron)]

R4 is a Cisco 7200 running advanced IP services version 12.2(33)SRD4
R8 is a Brocade Netiron XMR running 5.4b
All the other routers are M10s running 10.4R12.4

R6, R7, and R5 are my CPE routers – Note that they will not be used for this post, only future posts. R3, R4, and R8 are the PE routers. R1 and R2 are the P routers.

Core

As always with MPLS, the P routers’ config is very minimal. All the core interfaces are configured like so:

interfaces {
    fe-0/0/3 {
        unit 12 {
            vlan-id 12;
            family inet {
                address 10.0.4.6/30;
            }
            family mpls;
        }
    }
}

Family MPLS has to be configured on all core interfaces. My protocols config on R2 is like so:

USER2:R2> show configuration protocols
rsvp {
    interface fe-1/0/2.0;
    interface fe-0/0/3.12;
    interface fe-0/0/3.24;
}
mpls {
    interface fe-1/0/2.0;
    interface fe-0/0/3.24;
    interface fe-0/0/3.12;
}
ospf {
    traffic-engineering;
    area 0.0.0.0 {
        interface all;
    }
}

R1 has got a very similar config so I’m not pasting it here.

Junos PE

Family MPLS needs to be configured on the core facing interfaces. The rest of the relevant config for my set up is as follows:

USER3:R3> show configuration protocols
rsvp {
    interface fe-1/0/3.13;
}
mpls {
    no-cspf;
    label-switched-path TO-R4 {
        to 4.4.4.4;
        primary TO-R4;
    }
    label-switched-path TO-R8 {
        to 8.8.8.8;
        primary TO-R8;
    }
    path TO-R4 {
        4.4.4.4 loose;
    }
    path TO-R8 {
        8.8.8.8 loose;
    }
    interface fe-1/0/3.13;
}
ospf {
    traffic-engineering;
    area 0.0.0.0 {
        interface all;
        interface fe-0/0/3.36 {
            disable;
        }
    }
}

I’ve enabled loose paths to the loopbacks of the other two PE routers. OSPF TE is turned on, and RSVP and MPLS are enabled on the relevant interfaces.

Brocade Netiron PE

There is no need to configure anything specific on the actual MPLS interfaces for Brocade. You simply need to add the core facing interfaces to the MPLS configuration stanza.

router ospf 
 area 0
!
router mpls
 policy
  traffic-eng ospf area 0

 path TO-R3
  loose 3.3.3.3

 path TO-R4
  loose 4.4.4.4

 mpls-interface ve2

 lsp TO-R3
  to 3.3.3.3
  primary TO-R3
  enable

 lsp TO-R4
  to 4.4.4.4
  primary TO-R4
  enable

Cisco IOS PE

With IOS, you need to ensure CEF is enabled. You also need to turn mpls traffic-engineering tunnels on globally, as well as on each core facing interface:

ip cef
!
mpls traffic-eng tunnels
!
interface Tunnel0
 ip unnumbered Loopback0
 tunnel destination 3.3.3.3
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng path-option 10 dynamic
!
interface Tunnel1
 ip unnumbered Loopback0
 tunnel destination 8.8.8.8
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng path-option 10 dynamic
!
router ospf 1
 router-id 4.4.4.4
 log-adjacency-changes
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0

Verification

The output of the various show commands is of course different on each platform. I’ll be showing the actual LSPs as well as doing some MPLS traceroutes to show how each platform presents the information. The ‘detail’ switch on the LSPs throws out tons of information, so I’ll add those in a later post. The main thing we are concerned about now is just to ensure the LSPs are in fact ‘up’.

Brocade:

SSH@XMR_R8#sh mpls lsp
Note: LSPs marked with * are taking a Secondary Path
                               Admin Oper  Tunnel   Up/Dn Retry Active
Name           To              State State Intf     Times No.   Path
TO-R3          3.3.3.3         UP    UP    tnl0     1     0     TO-R3
TO-R4          4.4.4.4         UP    UP    tnl1     2     0     TO-R4

Junos:

USER3:R3> show mpls lsp
Ingress LSP: 2 sessions
To              From            State Rt P     ActivePath       LSPname
4.4.4.4         3.3.3.3         Up     0 *     TO-R4            TO-R4
8.8.8.8         3.3.3.3         Up     0 *     TO-R8            TO-R8
Total 2 displayed, Up 2, Down 0

Egress LSP: 2 sessions
To              From            State   Rt Style Labelin Labelout LSPname
3.3.3.3         4.4.4.4         Up       0  1 SE       3        - C7200_12.2SRD_t0
3.3.3.3         8.8.8.8         Up       0  1 FF       3        - TO-R3
Total 2 displayed, Up 2, Down 0

Transit LSP: 0 sessions
Total 0 displayed, Up 0, Down 0

Cisco:

C7200_12.2SRD#sh mpls traffic-eng tunnels brief
Signalling Summary:
    LSP Tunnels Process:            running
    Passive LSP Listener:           running
    RSVP Process:                   running
    Forwarding:                     enabled
    Periodic reoptimization:        every 3600 seconds, next in 3315 seconds
    Periodic FRR Promotion:         Not Running
    Periodic auto-bw collection:    every 300 seconds, next in 15 seconds
TUNNEL NAME                      DESTINATION      UP IF     DOWN IF   STATE/PROT
C7200_12.2SRD_t0                 3.3.3.3          -         Fa0/0.24  up/up
C7200_12.2SRD_t1                 8.8.8.8          -         Fa0/0.24  up/up
TO-R4                            4.4.4.4          Fa0/0.24  -         up/up
TO-R4                            4.4.4.4          Fa0/0.24  -         up/up

Both Cisco and Juniper show both outbound and inbound tunnels. The Brocade only shows outgoing tunnels in the brief output. The P routers will show transit tunnels like so:

USER2:R2> show mpls lsp
Ingress LSP: 0 sessions
Total 0 displayed, Up 0, Down 0

Egress LSP: 0 sessions
Total 0 displayed, Up 0, Down 0

Transit LSP: 6 sessions
To              From            State   Rt Style Labelin Labelout LSPname
3.3.3.3         4.4.4.4         Up       0  1 SE  300016   299872 C7200_12.2SRD_t0
3.3.3.3         8.8.8.8         Up       0  1 FF  299824   299792 TO-R3
4.4.4.4         8.8.8.8         Up       0  1 FF  300048        0 TO-R4
4.4.4.4         3.3.3.3         Up       0  1 FF  300032        0 TO-R4
8.8.8.8         4.4.4.4         Up       0  1 SE  300000        3 C7200_12.2SRD_t1
8.8.8.8         3.3.3.3         Up       0  1 FF  299856        3 TO-R8
Total 6 displayed, Up 6, Down 0

It’s pretty clear from the output above that LSPs are unidirectional, and hence each PE-PE link is actually two LSPs, one in each direction.
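To make the pairing explicit, here is a small Python sketch that groups the transit LSPs from R2’s output by their two endpoints. It assumes the column layout of the Junos output above (To first, From second, LSP name last); the rows are pasted from that output.

```python
from collections import defaultdict

transit = """\
3.3.3.3         4.4.4.4         Up       0  1 SE  300016   299872 C7200_12.2SRD_t0
3.3.3.3         8.8.8.8         Up       0  1 FF  299824   299792 TO-R3
4.4.4.4         8.8.8.8         Up       0  1 FF  300048        0 TO-R4
4.4.4.4         3.3.3.3         Up       0  1 FF  300032        0 TO-R4
8.8.8.8         4.4.4.4         Up       0  1 SE  300000        3 C7200_12.2SRD_t1
8.8.8.8         3.3.3.3         Up       0  1 FF  299856        3 TO-R8
"""

# Group LSPs by the unordered pair of endpoints they connect.
pairs = defaultdict(list)
for line in transit.splitlines():
    to, frm, *_middle, name = line.split()
    pairs[frozenset((to, frm))].append(f"{frm} -> {to} ({name})")

for endpoints, lsps in pairs.items():
    print(sorted(endpoints), lsps)
```

Six transit sessions collapse into three PE pairs, each with one LSP in each direction.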

You can also use MPLS pings and traceroutes. This is the output of a couple of MPLS RSVP traceroutes:

Brocade:

SSH@XMR_R8#traceroute mpls rsvp lsp TO-R3

Trace RSVP LSP TO-R3, timeout 5000 msec, TTL 1 to 30
Type Control-c to abort
 1  1ms 2.2.2.2 return code 8(Transit)
 2  1ms 1.1.1.1 return code 8(Transit)
 3  1ms 3.3.3.3 return code 3(Egress)

Cisco:

C7200_12.2SRD#traceroute mpls traffic-eng tunnel 0
Tracing MPLS TE Label Switched Path on Tunnel0, timeout is 2 seconds

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
  'L' - labeled output interface, 'B' - unlabeled output interface,
  'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
  'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
  'P' - no rx intf label prot, 'p' - premature termination of LSP,
  'R' - transit router, 'I' - unknown upstream index,
  'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
  0 10.0.4.9 MRU 1500 [Labels: 300016 Exp: 0]
L 1 2.2.2.2 MRU 1518 [Labels: 299872 Exp: 7] 36 ms
L 2 1.1.1.1 MRU 1518 [Labels: implicit-null Exp: 0] 24 ms
! 3 3.3.3.3 1 ms

Juniper:

USER3:R3> traceroute mpls rsvp TO-R8
  Probe options: retries 3, exp 7

  ttl    Label  Protocol    Address          Previous Hop     Probe Status
    1   299824  RSVP-TE     10.0.4.14        (null)           Success
    2   299856  RSVP-TE     10.0.4.6         10.0.4.14        Success
    3        3  RSVP-TE     192.168.1.2      10.0.4.6         Egress

  Path 1 via fe-1/0/3.13 destination 127.0.0.64

All slightly different info, but the same end result. All of the vendors have detail switches which show a lot more information.

So there you have it. I have all the RSVP tunnels up between all my PE routers and all is well. As noted before, this post will form the basis of a series of posts on various MPLS applications.


MPLS-TE via RSVP – Part 1 of 3 – Cisco IOS

On October 14, 2012, in CCIE, by Darren

I’m going to have to split this topic into three separate posts because otherwise it’ll just be too long and I’ll lose you halfway through.
Part 1 – Cisco IOS
Part 2 – Juniper JunOS
Part 3 – Brocade Netiron XMR/MLX
Part 4 – Cisco IOS-XR

Most people I speak to who have MPLS experience are usually experienced with LDP, most probably because it’s easy and they have no need for traffic engineering.

However, in the ISP space, the vast majority of MPLS cores run RSVP-TE. Not only does it give you traffic-engineering capabilities, it also gives you features like fast-reroute and hot standby LSPs. RSVP-TE uses your IGP to carry the TE extensions, but only link-state protocols (OSPF and IS-IS) will do this for you, i.e. you can forget about EIGRP doing anything good for you in an ISP core.

Some people tend to think that RSVP-TE is difficult, but really it’s not that difficult at all. Once you get over the initial hurdles you’ll see how powerful it can be. I have extensive Brocade Netiron RSVP-TE experience, a fair amount of JunOS RSVP-TE experience and hardly any IOS RSVP-TE experience. This is because my current core is all Brocade and Juniper. Unfortunately I can only test RSVP-TE on IOS and not IOS-XR, as I don’t have any IOS-XR boxes available to test on. It’s far more likely that an ISP core would be running IOS-XR than IOS.

Let’s take the following topology into consideration, which I’ll be using for all vendor makes. AR1 and AR3 are my ‘edge’ routers running iBGP with each other. They are each advertising a second loopback address to each other over BGP. CR1, CR2, and CR3 are my core routers, not running any BGP at all.

[Figure: lab topology with AR1, AR3, CR1, CR2, and CR3]

IOS basic config

Let’s start with the core network first. I’m pasting the relevant pieces of config here of CR1. CR2 and CR3 are going to be very similar:

mpls traffic-eng tunnels
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip ospf 1 area 0
!
interface Serial1/0
 ip address 10.2.0.2 255.255.255.0
 ip ospf 1 area 0
 mpls traffic-eng tunnels
!
interface Serial1/2
 ip address 10.3.0.1 255.255.255.0
 ip ospf 1 area 0
 mpls traffic-eng tunnels
!
router ospf 1
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0

AR1:

mpls traffic-eng tunnels
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip ospf 1 area 0
!
interface Loopback20
 ip address 20.20.20.20 255.255.255.255
!
interface Tunnel0
 ip unnumbered Loopback0
 tunnel destination 4.4.4.4
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng path-option 5 dynamic
 no routing dynamic
!
interface Serial1/0
 ip address 10.2.0.1 255.255.255.0
 ip ospf 1 area 0
 mpls traffic-eng tunnels
!
router ospf 1
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0
 router-id 2.2.2.2
!
router bgp 13
 network 20.20.20.20 mask 255.255.255.255
 neighbor 4.4.4.4 remote-as 13
 neighbor 4.4.4.4 update-source Loopback0
 no auto-summary

AR3 has a similar config to AR1, so I’m not going to list it here. Essentially what we’ve done is enable MPLS traffic engineering globally, enable it on the transit interfaces, and finally enable OSPF-TE in OSPF. The AR routers have an iBGP connection to each other. There is no need to configure mpls ip anywhere, as that actually enables LDP.

Now that my tunnels are up, let’s try and ping a BGP learned route and see what happens:

AR3#ping 20.20.20.20

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 20.20.20.20, timeout is 2 seconds:
U.U.U
Success rate is 0 percent (0/5)

This won’t work because IOS won’t actually use this tunnel for any routing unless I specifically allow it. I could do static routing or PBR, but why not just let the routing protocol do the work?

interface Tunnel0
 tunnel mpls traffic-eng autoroute announce

This command allows the IGP to use the tunnel in its tree calculation. Let’s take a look at whether it works now or not:

AR3#ping 20.20.20.20       

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 20.20.20.20, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/37/44 ms

Let’s take a look at the route and CEF table:

AR3#sh ip route 2.2.2.2    
Routing entry for 2.2.2.2/32
  Known via "ospf 1", distance 110, metric 129, type intra area
  Routing Descriptor Blocks:
  * directly connected, via Tunnel0
      Route metric is 129, traffic share count is 1

AR3#sh ip cef 20.20.20.20
20.20.20.20/32, version 12, epoch 0
0 packets, 0 bytes
  tag information from 2.2.2.2/32, shared
    local tag: tunnel-head
    fast tag rewrite with Tu0, point2point, tags imposed: {16}
  via 2.2.2.2, 0 dependencies, recursive
    next hop 2.2.2.2, Tunnel0 via 2.2.2.2/32
    valid adjacency
    tag rewrite with Tu0, point2point, tags imposed: {16}

In order to get to 2.2.2.2 which is the next-hop, it will send the traffic through the LSP tunnel. If we check the CEF table we can see that traffic will be directed towards the tunnel and have the label value of 16 imposed onto it. We can ensure this is correct with a traceroute:

AR3#traceroute 20.20.20.20

Type escape sequence to abort.
Tracing the route to 20.20.20.20

  1 10.3.0.1 [MPLS: Label 16 Exp 0] 36 msec 28 msec 44 msec
  2 10.2.0.1 44 msec 44 msec * 

It’s exactly what we see. Also note that the tunnel is actually following the shortest IGP path at the moment. This is because in the above config we told the ARs to signal the path dynamically. This means it’ll follow the IGP best path. Which will lead us onto our next section.

IOS explicit paths

We can tell IOS that we actually want to use the CR2-CR3 path instead of just learning this information dynamically. We now want to use CR2 and CR3 in the path and not CR1. We can do this in two ways depending on the topology. Either I tell my ingress router that it should follow a very specific path, or I just tell the ingress router to specifically miss a particular node. As LSPs are unidirectional, let’s try both.

AR1:

ip explicit-path name through-CR2-CR3 enable
 next-address 10.5.0.2 
 next-address 10.6.0.2 
 next-address 10.7.0.2 
!
interface Tunnel0
 tunnel mpls traffic-eng path-option 4 explicit name through-CR2-CR3
 tunnel mpls traffic-eng path-option 5 dynamic

AR3:

ip explicit-path name not-through-CR1 enable
 exclude-address 10.3.0.1
!
interface Tunnel0
 tunnel mpls traffic-eng path-option 4 explicit name not-through-CR1
 tunnel mpls traffic-eng path-option 5 dynamic

AR1#traceroute 40.40.40.40

Type escape sequence to abort.
Tracing the route to 40.40.40.40

  1 10.5.0.2 [MPLS: Label 17 Exp 0] 60 msec 64 msec 72 msec
  2 10.6.0.2 [MPLS: Label 17 Exp 0] 64 msec 60 msec 48 msec
  3 10.7.0.2 68 msec 56 msec * 


AR3#traceroute 20.20.20.20

Type escape sequence to abort.
Tracing the route to 20.20.20.20

  1 10.7.0.1 [MPLS: Label 16 Exp 0] 48 msec 64 msec 64 msec
  2 10.6.0.1 [MPLS: Label 16 Exp 0] 76 msec 48 msec 72 msec
  3 10.5.0.1 64 msec *  60 msec

This is a pretty small topology, so once AR3 is told to skip CR1 there is only one other path available. We create the explicit path on each ingress router, and then under the tunnel interface we give it a lower path-option number than the dynamic path, which makes it the preferred option. You can see from the traceroutes above that both approaches work. The dynamic path is still left under the tunnel interface as we would still like to use it if the CR2-CR3 path becomes unavailable.
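The preference logic can be sketched in a few lines of Python. This is a deliberately simplified model (lowest path-option number that can currently be signalled wins) and it ignores things like reoptimization timers. The function and variable names are mine, not anything from IOS.

```python
def pick_path_option(path_options, usable):
    """Return the first usable path option, trying the lowest number first.

    path_options maps the path-option number to a label for the path;
    usable is the set of labels that can currently be signalled.
    """
    for number in sorted(path_options):
        if path_options[number] in usable:
            return number, path_options[number]
    return None


# Tunnel0 on AR1 as configured above.
tunnel0 = {4: "explicit through-CR2-CR3", 5: "dynamic"}

# Normal case: the explicit path is available and wins.
print(pick_path_option(tunnel0, {"explicit through-CR2-CR3", "dynamic"}))
# CR2-CR3 path unavailable: fall back to the dynamic option.
print(pick_path_option(tunnel0, {"dynamic"}))
```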

IOS Type-10 OSPF LSA

MPLS-TE extensions are carried within OSPF type-10 opaque LSAs. These LSAs have area flooding scope, so they do not cross area boundaries, which is another reason why ISP cores tend not to run multi-area OSPF. You can see the LSAs in the database:

AR1#sh ip ospf database | begin Type-10
		Type-10 Opaque Link Area Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Opaque ID
1.0.0.0         2.2.2.2         887         0x80000002 0x005AC6 0       
1.0.0.0         3.3.3.3         173         0x80000003 0x005CBB 0       
1.0.0.0         4.4.4.4         557         0x80000002 0x0062AE 0       
1.0.0.0         22.22.22.22     385         0x80000002 0x00AAD5 0       
1.0.0.0         33.33.33.33     319         0x80000002 0x00D651 0       
1.0.0.2         2.2.2.2         172         0x80000004 0x004EFC 2       
1.0.0.2         3.3.3.3         173         0x80000003 0x00704A 2       
1.0.0.2         4.4.4.4         174         0x80000004 0x004AF6 2       
1.0.0.2         22.22.22.22     128         0x80000002 0x008CDD 2       
1.0.0.2         33.33.33.33     76          0x80000002 0x00EFFB 2       
1.0.0.3         2.2.2.2         111         0x80000002 0x0025D4 3       
1.0.0.3         3.3.3.3         173         0x80000003 0x00535C 3       
1.0.0.3         4.4.4.4         306         0x80000002 0x001918 3       
1.0.0.3         22.22.22.22     128         0x80000002 0x00C228 3       
1.0.0.3         33.33.33.33     319         0x80000002 0x0064CC 3   

If we dig deeper into the LSA originated by CR1 we can see the following:

AR1#sh ip ospf database opaque-area adv-router 3.3.3.3

            OSPF Router with ID (2.2.2.2) (Process ID 1)

		Type-10 Opaque Link Area Link States (Area 0)

  LS age: 310
  Options: (No TOS-capability, DC)
  LS Type: Opaque Area Link
  Link State ID: 1.0.0.0
  Opaque Type: 1
  Opaque ID: 0
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x5CBB
  Length: 28
  Fragment number : 0

    MPLS TE router ID : 3.3.3.3

    Number of Links : 0

  LS age: 310
  Options: (No TOS-capability, DC)
  LS Type: Opaque Area Link
  Link State ID: 1.0.0.2
  Opaque Type: 1
  Opaque ID: 2
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x704A
  Length: 132
  Fragment number : 2
          
    Link connected to Point-to-Point network
      Link ID : 2.2.2.2
      Interface Address : 10.2.0.2
      Neighbor Address : 10.2.0.1
      Admin Metric : 64
      Maximum bandwidth : 193000
      Maximum reservable bandwidth : 0
      Number of Priority : 8
      Priority 0 : 0           Priority 1 : 0         
      Priority 2 : 0           Priority 3 : 0         
      Priority 4 : 0           Priority 5 : 0         
      Priority 6 : 0           Priority 7 : 0         
      Affinity Bit : 0x1
      IGP Metric : 64

    Number of Links : 1

  LS age: 310
  Options: (No TOS-capability, DC)
  LS Type: Opaque Area Link
  Link State ID: 1.0.0.3
  Opaque Type: 1
  Opaque ID: 3
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x535C
  Length: 132
  Fragment number : 3

    Link connected to Point-to-Point network
      Link ID : 4.4.4.4
      Interface Address : 10.3.0.1
      Neighbor Address : 10.3.0.2
      Admin Metric : 64
      Maximum bandwidth : 193000
      Maximum reservable bandwidth : 0
      Number of Priority : 8
      Priority 0 : 0           Priority 1 : 0         
      Priority 2 : 0           Priority 3 : 0         
      Priority 4 : 0           Priority 5 : 0         
      Priority 6 : 0           Priority 7 : 0         
      Affinity Bit : 0x1
      IGP Metric : 64
          
    Number of Links : 1

You can see that it will show all your links, any affinities set, max reserved bandwidths, and any currently used bandwidths for different priorities.

I could go on for many hours showing various MPLS features but then I’ll never finish this article.

In the next part I’ll be doing JunOS showing the same features and config as I showed above. In the final part I’ll be doing the same for Brocade Netiron.


Setting up a FreeRadius test lab (HOWTO)

On October 1, 2012, in CCIE, by Darren

It’s quite handy to have one of these labs to test your radius configs, especially in the ISP world. This is mainly for testing radius attributes as it’s very easy to get a Cisco box to actually be a regular PPPoE server.

I have an old 7200 NPE-300 connected to a virtual machine running in VMware.

[Image: FreeRadius test lab]

I’m running Ubuntu server 12.04 so installing freeradius is pretty painless:

darreno@radius:~$ sudo apt-get install freeradius

Now we need to configure the box. Just a few files need to be edited for our environment. I won’t go over every single part of radiusd.conf, only the things I made changes to:

darreno@radius:/etc/freeradius$ sudo vi radiusd.conf

listen {
        type = auth
        ipaddr = 10.80.1.1
        port = 1645

}

listen {
        ipaddr = 10.80.1.1
        port = 1646
        type = acct
}

log {
        destination = files
        file = ${logdir}/radius.log
        syslog_facility = daemon
        stripped_names = no
        auth = yes
        auth_badpass = yes
        auth_goodpass = yes
}

It’s always good to have a fair amount of logging, especially in a lab.

We also need to tell the FreeRadius server that a radius client will be coming in and making authentication requests. We also choose a password here:

darreno@radius:/etc/freeradius$ sudo vi clients.conf
client 10.80.1.2 {
        secret          = radiuspassword
        shortname       = 10.80.1.2
        nastype         = cisco
}

Short and sweet.

Finally, the actual usernames, passwords, IPs, attributes, etc. are all stored in the users file. For now let’s just create a short single entry:

darreno@radius:/etc/freeradius$ sudo vi users

testuser     Password = "password"
        Framed-IP-Address = 192.168.1.100

Now onto the 7200. The 7200 and FreeRadius server are directly connected in this lab, but in the real world all they need is IP connectivity to each other.

aaa group server radius RADIUS_SERVER
 server 10.80.1.1 auth-port 1645 acct-port 1646
!
aaa authentication ppp CPE_USER group RADIUS_SERVER
aaa authorization network default group RADIUS_SERVER
!
vpdn enable
!
bba-group pppoe LAB
 virtual-template 1
 sessions per-mac limit 20
 sessions per-vlan limit 250
!
interface Loopback0
 ip address 200.200.200.200 255.255.255.255
!
interface FastEthernet0/0
 description Link to FreeRadius server
 ip address 10.80.1.2 255.255.255.0
 duplex full
!
interface FastEthernet1/0
 description PPPOE interface
 no ip address
 duplex full
 pppoe enable group LAB
!
interface Virtual-Template1
 ip unnumbered Loopback0
 no peer default ip address
 ppp authentication chap CPE_USER
!
radius-server host 10.80.1.1 auth-port 1645 acct-port 1646 key radiuspassword

I’ve used a radius group which allows you to add more radius servers and test fail-over scenarios.

For a test device I’ve just configured a 2801 like so:

interface FastEthernet0/0
 no ip address
 duplex auto
 speed auto
 pppoe enable group global
 pppoe-client dial-pool-number 1
!
interface Dialer1
 mtu 1492
 ip address negotiated
 encapsulation ppp
 dialer pool 1
 ppp chap hostname testuser
 ppp chap password 0 password

Let’s give it a quick test. I’ve enabled logging on the radius server to see what’s going on. Let me enable the 2801’s PPPoE interface and see if the radius server sees the authentication request coming in:

darreno@radius:/etc/freeradius$ tail -f /var/log/freeradius/radius.log
Mon Oct  1 21:24:23 2012 : Auth: Login OK: [testuser/] (from client 10.80.1.2 port 0)

So that’s all fine. Did my router pick up the correct IP address?

c2801#sh int dialer 1
Dialer1 is up, line protocol is up (spoofing)
  Hardware is Unknown
  Internet address is 192.168.1.100/32
  MTU 1492 bytes, BW 56 Kbit/sec, DLY 20000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation PPP, LCP Closed, loopback not set
  Keepalive set (10 sec)
  DTR is pulsed for 1 seconds on reset
  Interface is bound to Vi2
  Last input never, output never, output hang never
  Last clearing of "show interface" counters 05:13:33
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: weighted fair
  Output queue: 0/1000/64/0 (size/max total/threshold/drops)
     Conversations  0/0/16 (active/max active/max total)
     Reserved Conversations 0/0 (allocated/max allocated)
     Available Bandwidth 42 kilobits/sec
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     1017 packets input, 103010 bytes
     4703 packets output, 173178 bytes
Bound to:
Virtual-Access2 is up, line protocol is up
  Hardware is Virtual Access interface
  MTU 1492 bytes, BW 56 Kbit/sec, DLY 20000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation PPP, LCP Open
  Stopped: CDPCP
  Open: IPCP
  PPPoE vaccess, cloned from Dialer1
  Vaccess status 0x44, loopback not set
  Keepalive set (10 sec)
  Interface is bound to Di1 (Encapsulation PPP)
  Last input 00:00:01, output never, output hang never
  Last clearing of "show interface" counters 00:01:55
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     27 packets input, 387 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     26 packets output, 378 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 output buffer failures, 0 output buffers swapped out
     0 carrier transitions

c2801#show ip route connected | beg Ga
Gateway of last resort is not set

      192.168.1.0/32 is subnetted, 1 subnets
C        192.168.1.100 is directly connected, Dialer1
      200.200.200.0/32 is subnetted, 1 subnets
C        200.200.200.200 is directly connected, Dialer1


c2801#ping 200.200.200.200
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 200.200.200.200, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

These are PPP links and hence the 7200 and 2801 have swapped host routes. This is why they can get to each other. We can also check from the 7200 side:

c7200#sh ip route 192.168.1.100
Routing entry for 192.168.1.100/32
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via Virtual-Access1.1
      Route metric is 0, traffic share count is 1

c7200#ping 192.168.1.100

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.100, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms

So everything is working just as expected.

The whole point of radius attributes is to be able to do all kinds of fancy things. Let’s say that this 2801 has another network behind it that the rest of our network needs to be able to get to through the BRAS box. An easy way is to get the 7200 to install a static route to the network behind the 2801 that gets installed when the router dials in. Let’s use a loopback on the 2801 for this purpose:

interface Loopback1
 ip address 40.40.40.40 255.255.255.255

Going back to the users file in radius above, we do the following:

testuser     Password = "password"
        Framed-IP-Address = 192.168.1.100,
        Cisco-Avpair += "ip:route=40.40.40.40 255.255.255.255"
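
If you end up with more than a handful of test users, it can be handy to generate these stanzas rather than type them. The following is only a minimal Python sketch of that idea: the users_entry helper is hypothetical (it is not part of FreeRadius), and it simply reproduces the layout of the entry above.

```python
def users_entry(username, password, framed_ip, routes=()):
    """Render one users-file stanza in the same layout as the entry above.

    Hypothetical helper for a lab; routes is a list of (prefix, mask) tuples.
    """
    replies = [f"Framed-IP-Address = {framed_ip}"]
    replies += [
        f'Cisco-Avpair += "ip:route={prefix} {mask}"' for prefix, mask in routes
    ]

    lines = [f'{username}     Password = "{password}"']
    # Reply items are indented and comma-separated; the last one has no comma.
    for index, item in enumerate(replies):
        comma = "," if index < len(replies) - 1 else ""
        lines.append(f"        {item}{comma}")
    return "\n".join(lines)


entry = users_entry(
    "testuser",
    "password",
    "192.168.1.100",
    routes=[("40.40.40.40", "255.255.255.255")],
)
print(entry)
```

The output can be pasted into the users file; the radius server still needs to re-read the file before the change takes effect.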

Let’s clear the pppoe session and take a look at the 7200:

c7200#sh ip route 40.40.40.40
Routing entry for 40.40.40.40/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 192.168.1.100
      Route metric is 0, traffic share count is 1

c7200#ping 40.40.40.40

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 40.40.40.40, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms

As this is a static route to a connected route, the 7200 can redistribute the routes into the IGP so the rest of your network can get to it. Notice that when I reload the 2801 and the session is pulled down, the static route is removed:

c7200#sh ip route 40.40.40.40
% Network not in table

There are a TON of radius attributes. If I have the time I may go over a few handy ones with which you can create some powerful routing policies.


So does OSPF always use the shortest path in order to ensure that packets always get from A to B with the lowest end to end cost? Not always. In fact when you have more than a single area it’s very easy to NOT go the shortest path at all. You could even turn your ‘non-transit’ 10Mb links into transit links.

Let’s take the following network as an example:
[Figure MAOSPF 1: Are you sure its the shortest path? OSPF Multi Area issues]

R3 represents our core. R1 and R2 are both aggregation boxes that all our customers connect to. These boxes are connected into the core with their Gig links. R4 is our first customer, who wants a primary Gig link with a 100Mb backup link. We have decided to put each customer into their own OSPF area. We will also be changing the auto-cost reference bandwidth to 100Gb to ensure our core sees the difference between 100Mb and Gig links:

R3
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip ospf 1 area 0

interface GigabitEthernet1/0
 ip address 10.0.13.3 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet2/0
 ip address 10.0.23.3 255.255.255.0
 ip ospf 1 area 0
!         
router ospf 1
 router-id 3.3.3.3
 auto-cost reference-bandwidth 100000
R1
interface GigabitEthernet2/0
 ip address 10.0.13.1 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet1/0
 ip address 10.0.14.1 255.255.255.0
 ip ospf 1 area 4
!         
router ospf 1
 router-id 1.1.1.1
 auto-cost reference-bandwidth 100000
R2
interface GigabitEthernet2/0
 ip address 10.0.23.2 255.255.255.0
 ip ospf 1 area 0
!
interface FastEthernet1/0
 ip address 10.0.24.2 255.255.255.0
 ip ospf 1 area 4
!
router ospf 1
 router-id 2.2.2.2
 auto-cost reference-bandwidth 100000
R4
interface Loopback0
 ip address 4.4.4.4 255.255.255.255
 ip ospf 1 area 4
!         
interface GigabitEthernet1/0
 ip address 10.0.14.4 255.255.255.0
 ip ospf 1 area 4
!
interface FastEthernet2/0
 ip address 10.0.24.4 255.255.255.0
 ip ospf 1 area 4
!
router ospf 1
 router-id 4.4.4.4
 auto-cost reference-bandwidth 100000
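
As a rough sanity check on what this reference bandwidth does to the interface costs, here is a small Python sketch. It assumes the usual model of cost = reference bandwidth / interface bandwidth, truncated and never below 1, and it assumes a loopback cost of 1, which is what the metrics in the outputs in this post imply. The function and variable names are mine, purely for illustration.

```python
REFERENCE_BW_MBPS = 100_000  # auto-cost reference-bandwidth 100000


def ospf_cost(interface_bw_mbps, reference_bw_mbps=REFERENCE_BW_MBPS):
    """Assumed cost model: reference / interface bandwidth, minimum of 1."""
    return max(1, reference_bw_mbps // interface_bw_mbps)


GIG = ospf_cost(1_000)          # Gig links
FAST_ETHERNET = ospf_cost(100)  # 100Mb links
LOOPBACK = 1                    # assumption: loopbacks are advertised with cost 1

# R3 -> R1 -> R4 -> Loopback0 over the two Gig links
r3_to_r4_loopback = GIG + GIG + LOOPBACK

print(GIG, FAST_ETHERNET, r3_to_r4_loopback)  # 100 1000 201
```

With the default reference bandwidth of 100Mb, both the Gig and the 100Mb links would come out at a cost of 1 under this model, which is why the reference bandwidth is raised here.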

Our core should now see that the best way to get to R4’s loopback is to go through R1:

R3#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 201, type inter area
  Last update from 10.0.13.1 on GigabitEthernet1/0, 00:09:44 ago
  Routing Descriptor Blocks:
  * 10.0.13.1, from 1.1.1.1, 00:09:44 ago, via GigabitEthernet1/0
      Route metric is 201, traffic share count is 1
R3#traceroute 4.4.4.4
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.13.1 8 msec 20 msec 16 msec
  2 10.0.14.4 16 msec *  20 msec

Similarly R4 should see that the best way to get to R3 is back through R1:

R4#sh ip route 3.3.3.3
Routing entry for 3.3.3.3/32
  Known via "ospf 1", distance 110, metric 201, type inter area
  Last update from 10.0.14.1 on GigabitEthernet1/0, 00:03:47 ago
  Routing Descriptor Blocks:
  * 10.0.14.1, from 1.1.1.1, 00:03:47 ago, via GigabitEthernet1/0
      Route metric is 201, traffic share count is 1
R4#traceroute 3.3.3.3
Type escape sequence to abort.
Tracing the route to 3.3.3.3
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.14.1 16 msec 16 msec 20 msec
  2 10.0.13.3 20 msec *  20 msec

So everything is fine. Or so we think. There is already a problem here, but it won’t cause a problem until we bring in another customer. Let’s add 2 customers. The first is connected to R1 and the second is connected to R2. Both of these customers have purchased 100Mb single links.

[Figure MAOSPF 2: Are you sure its the shortest path? OSPF Multi Area issues]

So, traffic sent from R4’s loopback to either of the 2 new customers’ loopbacks should get into the core via R4’s 1Gb primary link. Is that what we see?

R4
R4#traceroute 5.5.5.5 source 4.4.4.4
Type escape sequence to abort.
Tracing the route to 5.5.5.5
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.14.1 16 msec 20 msec 20 msec
  2 10.0.15.5 20 msec *  24 msec
R4#
R4#traceroute 6.6.6.6 source 4.4.4.4
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.14.1 12 msec 20 msec 16 msec
  2 10.0.13.3 20 msec 60 msec 20 msec
  3 10.0.23.2 40 msec 44 msec 40 msec
  4 10.0.26.6 72 msec *  44 msec

That’s exactly what we see, but do we have the full picture here? Let’s trace from these new customers to R4’s loopback. Again both should go over R4’s 1Gb primary link:

R5
R5#traceroute 4.4.4.4 source 5.5.5.5
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.15.1 8 msec 20 msec 16 msec
  2 10.0.14.4 20 msec *  24 msec

R5 is correct. What about R6?

R6
R6#traceroute 4.4.4.4 source 6.6.6.6
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.26.2 20 msec 16 msec 20 msec
  2 10.0.24.4 64 msec *  68 msec

Well this is most certainly NOT correct. Why is this traceroute going through R4’s 100Mb backup link? Let’s go back to the beginning and see what we missed. Let’s have a look at the 3 core routers to see how they all want to get to 4.4.4.4:

R3
R3#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 201, type inter area
  Last update from 10.0.13.1 on GigabitEthernet1/0, 00:29:17 ago
  Routing Descriptor Blocks:
  * 10.0.13.1, from 1.1.1.1, 00:29:17 ago, via GigabitEthernet1/0
      Route metric is 201, traffic share count is 1
R1
R1#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 101, type intra area
  Last update from 10.0.14.4 on GigabitEthernet1/0, 00:37:44 ago
  Routing Descriptor Blocks:
  * 10.0.14.4, from 4.4.4.4, 00:37:44 ago, via GigabitEthernet1/0
      Route metric is 101, traffic share count is 1
R2
R2#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 1001, type intra area
  Last update from 10.0.24.4 on FastEthernet1/0, 00:38:47 ago
  Routing Descriptor Blocks:
  * 10.0.24.4, from 4.4.4.4, 00:38:47 ago, via FastEthernet1/0
      Route metric is 1001, traffic share count is 1

Here is the problem. R2 prefers to get to 4.4.4.4 over its directly connected link, even though the metric through R3 would be 301 (R3’s 201 plus the 100-cost Gig link between R2 and R3), a whole lot less than 1001.

The issue is that OSPF has its own selection process. Regardless of metric, OSPF will ALWAYS prefer intra-area routes over inter-area routes over external routes. R2 has an interface in Area 4, the same area in which it’s learning about R4’s loopback. Hence when traffic addressed to 4.4.4.4 passes through it, it will always send it off over its area 4 interface, no matter how slow it is. It doesn’t make any difference if the second customer is in area 0 or their own area.
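
To make that ordering concrete, here is a minimal Python sketch of the decision R2 is making. It is a simplification rather than a full OSPF implementation (external routes are lumped into one type, for example), and the Candidate class and best_route function are names I made up for illustration. The inter-area metric of 301 is R3’s metric of 201 plus the 100-cost Gig link between R2 and R3.

```python
from dataclasses import dataclass

# Lower rank wins. Metric is only compared between routes of the same type.
TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external": 2}


@dataclass(frozen=True)
class Candidate:
    route_type: str
    metric: int
    via: str


def best_route(candidates):
    """Pick by route type first, then by metric (simplified model)."""
    return min(candidates, key=lambda c: (TYPE_RANK[c.route_type], c.metric))


# R2's two ways of reaching 4.4.4.4
r2_candidates = [
    Candidate("intra-area", 1001, "100Mb area 4 link direct to R4"),
    Candidate("inter-area", 301, "R3 and R1 over the Gig links"),
]

winner = best_route(r2_candidates)
print(winner.route_type, winner.metric, winner.via)
```

The intra-area candidate wins even though its metric is more than three times higher, which matches the route R2 actually installs.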

In fact, if you dive a bit deeper, you can see that as far as R6 is concerned, the traffic will be going over R4’s primary link. If you see the interface cost of R6’s link as well as the cost end to end this is what you get:

R6
R6#sh ip os int brief | include Fa1/0
Fa1/0        1     0               10.0.26.6/24       1000  BDR   1/1
R6#                                  
R6#
R6#sh ip route 4.4.4.4               
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 1301, type inter area
  Last update from 10.0.26.2 on FastEthernet1/0, 00:19:15 ago
  Routing Descriptor Blocks:
  * 10.0.26.2, from 1.1.1.1, 00:19:15 ago, via FastEthernet1/0
      Route metric is 1301, traffic share count is 1

What about R2’s active route cost?

R2
R2#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 1001, type intra area
  Last update from 10.0.24.4 on FastEthernet2/0, 00:43:29 ago
  Routing Descriptor Blocks:
  * 10.0.24.4, from 4.4.4.4, 00:43:29 ago, via FastEthernet2/0
      Route metric is 1001, traffic share count is 1

So R6 thinks that traffic will actually go over its 1000 cost link, then over the 3 X 100 cost Gig links. But R2 effectively ‘hijacks’ this traffic to send it over its direct area 4 link.
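
The mismatch between what R6 believes and what actually happens can be sketched by walking the path hop by hop. This is only an illustrative Python model: the next hops are taken from the routing table outputs in this post, the link costs assume Gig = 100 and FastEthernet = 1000, the loopback is assumed to add 1, and the trace function is my own naming.

```python
# Each router's chosen next hop towards 4.4.4.4, per the outputs in this post.
NEXT_HOP = {"R6": "R2", "R2": "R4", "R3": "R1", "R1": "R4"}

# Link costs with the 100Gb reference bandwidth.
LINK_COST = {
    ("R6", "R2"): 1000,  # R6's 100Mb link
    ("R2", "R4"): 1000,  # R4's 100Mb backup link
    ("R2", "R3"): 100,
    ("R3", "R1"): 100,
    ("R1", "R4"): 100,   # R4's Gig primary link
}
LOOPBACK_COST = 1  # assumption, implied by the metrics shown


def trace(source, destination="R4"):
    """Follow each router's own next hop and add up the link costs."""
    path, cost, node = [source], 0, source
    while node != destination:
        nxt = NEXT_HOP[node]
        cost += LINK_COST[(node, nxt)]
        path.append(nxt)
        node = nxt
    return path, cost + LOOPBACK_COST


actual_path, actual_cost = trace("R6")
believed_cost = 1000 + 3 * 100 + LOOPBACK_COST  # what R6's routing table shows

print(actual_path, actual_cost, believed_cost)  # ['R6', 'R2', 'R4'] 2001 1301
```

R6 installs the route with a metric of 1301, but the path the packets really take adds up to 2001 under this model, because R2 makes its own decision.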

So, how can this be fixed?

The first way is to just put everything in area 0. This way all addresses will be reachable via intra-area routes in area 0. Even if you injected all prefixes in via redistribution or route-policy they’ll all be external, but still reachable through area 0 links.

The second way is to create some sort of tunnel between R1 and R2 and put that tunnel interface into area 4. This way R2 would learn about R4′s loopback over 2 area 4 interfaces. You would need to ensure this tunnel interface has a lower cost than the 100Mb direct connection to R4 in order for traffic to actually be preferred. But who really wants to be creating tunnels over the core of their network? Virtual-links can only be used to connect to area 0, not area 4. Sham links? Can only be used with MPLS.

The third way is thinking outside the box a little. You could use PPPoE over the secondary link and not use OSPF on the link. On R4 you would have a floating static route pointing towards the dialer interface. The actual radius account you use would create a static route to R4’s loopback with a next-hop of the p2p PPPoE link. Ensure the static route is created with an AD higher than OSPF to ensure it’ll use the OSPF link if available.

The fourth way is to just use another protocol connecting the core to the CPE device. BGP perhaps?

The fifth and final way, which ties with option 1 for simplicity, is RFC 5185 – OSPF Multi-Area Adjacency. This RFC describes the ability to put a router’s interface into more than a single OSPF area. This means that I could keep R1 and R2’s core links in area 0, but also put those same links into area 4. The same would be done for R3. R2 would then learn the best path via R1 as an intra-area route, without the need for dodgy tunnels. The main problem is that most vendors simply don’t have support for it. Cisco only has it in IOS XE. JUNOS has had it since JUNOS 9.4 though. Brocade? No mention of it anywhere yet.

Considering I have some post 9.4 JunOS boxes here, let’s test this out:

R2
> show configuration protocols ospf
reference-bandwidth 10g;
area 0.0.0.0 {
    interface fe-0/0/1.66;
    interface fe-0/0/0.51;
}
area 0.0.0.4 {
    interface fe-1/3/0.16 {
        metric 1000;
    }
    interface fe-0/0/1.66 {
        secondary;
    }
}
R3
> show configuration protocols ospf
reference-bandwidth 10g;
area 0.0.0.0 {
    interface fe-0/0/0.66;
    interface fe-0/0/0.63;
    interface lo0.9;
}
area 0.0.0.4 {
    interface fe-0/0/0.66 {
        secondary;
    }
    interface fe-0/0/0.63 {
        secondary;
    }
}
R1
> show configuration protocols ospf
reference-bandwidth 10g;
area 0.0.0.0 {
    interface fe-0/0/1.63;
    interface fe-1/3/3.79;
}
area 0.0.0.4 {
    interface fe-1/3/0.14;
    interface fe-0/0/1.63 {
        secondary;
    }
}

As you can see, the configuration is pretty simple. You simply add an interface to another area and set it as secondary. Let’s have a look at R2’s neighbours:

> show ospf neighbor
Address          Interface              State     ID               Pri  Dead
10.0.26.6        fe-0/0/0.51            Full      6.6.6.6          128    38
  Area 0.0.0.0
10.0.23.3        fe-0/0/1.66            Full      3.3.3.3          128    38
  Area 0.0.0.0
10.0.23.3        fe-0/0/1.66            Full      3.3.3.3          128    38
  Area 0.0.0.4
10.0.24.4        fe-1/3/0.16            Full      4.4.4.4          128    33
  Area 0.0.0.4

R2 has an adjacency over fe-0/0/1.66 twice, one in Area 0 and one in Area 4. This means it should be learning R4’s loopback as 2 intra-area and 1 inter-area route. It should then choose the path through R3 as it has the better metric:

> show route 4.4.4.4

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

4.4.4.4/32         *[OSPF/10] 00:18:07, metric 300
                    > to 10.0.23.3 via fe-0/0/1.66

Which is exactly what we see.
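
The metric of 300 lines up with a quick back-of-the-envelope calculation. The Python sketch below assumes the JunOS cost is reference bandwidth / interface bandwidth, that each fe- interface is 100Mb, and that the loopback adds 0 to the metric (which is what the 300 implies). The names are mine, for illustration only.

```python
REFERENCE_BW_BPS = 10_000_000_000  # reference-bandwidth 10g
FE_BW_BPS = 100_000_000            # assumption: fe- interfaces run at 100Mb

fe_cost = REFERENCE_BW_BPS // FE_BW_BPS  # cost of one fe- hop
direct_metric = 1000                     # metric 1000 set on fe-1/3/0.16
via_r3_metric = 3 * fe_cost              # R2 -> R3 -> R1 -> R4, loopback assumed 0

# Both candidates are now intra-area in area 4, so the lower metric wins.
candidates = {"direct to R4": direct_metric, "via R3": via_r3_metric}
best = min(candidates, key=candidates.get)

print(fe_cost, best, candidates[best])  # 100 via R3 300
```

Because both paths are now intra-area routes in area 4, the comparison finally comes down to metric alone.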

Let’s do another traceroute from R6 to confirm:

> traceroute 4.4.4.4
traceroute to 4.4.4.4 (4.4.4.4), 30 hops max, 40 byte packets
 1  10.0.26.2 (10.0.26.2)  1.098 ms  0.965 ms  0.800 ms
 2  10.0.23.3 (10.0.23.3)  0.846 ms  0.943 ms  0.836 ms
 3  10.0.13.1 (10.0.13.1)  0.884 ms  1.036 ms  0.882 ms
 4  4.4.4.4 (4.4.4.4)  1.166 ms  1.328 ms  1.155 ms

© 2009-2014 Darren O'Connor All Rights Reserved