Crisis averted, my blog is saved

My blog was down for a good 6 hours yesterday, and could have been down for a lot longer. Sometimes I don’t follow my own advice, and this could very easily have been avoided.

To put this story into context you need to know where I moved my blog a few months back. My blog was initially hosted on uk2.net’s servers where I used to work. While it’s always been reliable, they had no IPv6 roll-out plans and I wanted to test both IPv6 on my site as well as in my core network. So I moved my blog into my lab here at my current job and it’s been running there ever since.

While we religiously monitor all our equipment, we don’t monitor lab equipment. I’d guess you can already see where this is going.

My lab server is an HP G5 DL380 with 2 X 3GHz quad cores and 8GB ram. It has a smart array controller with 2 X 72GB Raid 1 and 6 X 72GB Raid 5 disks. This has VMWare ESX 4.0 installed and is controlled by a separate, monitored vcenter server elsewhere in the network. We use a generational backup application that backs up VMs through the vcenter server.

So with all that out of the way, let’s see what happened.

Exhibit A:

It all started a few days ago when my NOC told me that the backup application was reporting an error when attempting to do a backup of the lab server. I checked to see if my blog was still running and it was. However, we could not log directly onto the ESX server via the vsphere client or even SSH. It was decided that we would restart the management tools on the ESX lab server as that seemed to be the issue, i.e. the VMs were running but we could not manage the box.

Now, I really should follow my own advice. At this point I should’ve logged onto my blog, dumped the database, and gzipped /var/www and the dump. But I didn’t. I stupidly thought a simple restart of the management tools would fix the problem.

I connected a monitor and keyboard to my ESX host and was greeted with the familiar screen telling me to manage it remotely via the IP. I opened a new console window with alt+f2 and attempted to login. I typed my username and pressed enter, only for it to sit there forever not asking for my password. Ok. Let’s try a reboot.

Bad move. On the reboot nothing happened. My screen stayed off, but the server’s fans were spinning. Let’s try unplugging the server for 10 minutes and plugging everything back in. This time the fans started up at full speed, there was nothing on the monitor, and the fans did not spin down at all.

Shit.

I’ve seen this type of problem before and it’s usually a dodgy piece of kit or something not plugged in correctly. So one by one I started removing components, switching on, switching off, insert component, remove another, repeat…

Eventually it turned out that it did not want to start with 2 of the RAM modules. Fine. I set those aside and started the server with 6GB.

Now the server is attempting to start, but the array controller on boot is complaining that the raid 5 array is not good. I insert my smartstart CD and open up the array configuration utility. There I can see the raid 5 array is rebuilding. The controller has assumed that one of the disks in that array is new. Note that at this point I’ve not actually touched any disks. So I let the rebuild continue and it steadily gets to 89%, then drops back down to 0% with the alert ‘waiting for rebuild’.

SHIT!

I check the stock room and find another 10K serial ATA 72GB 2.5″ disk and stick that in. My array starts rebuilding again.

And then stops at 89% again…

Ok, so the array won’t rebuild, but in theory my data should be fine with a disk missing for now. So I pull the disk out again.

Start up the server and ignore the complaints and VMWare ESX starts to boot up. VMWare ESX is installed on the unaffected Raid 1 disk. It eventually boots, only for the EXACT same problems I had before to start again. I can’t log in via the console. I can’t log in remotely. I can’t do anything really. Not only that but I’m getting all kinds of SCSI errors and VMWare errors on my console screen. I punch the VMWare error into google only for 0 results to come up. Fantastic.

It’s at this point I start to really worry. I noted above that the NOC said we were backing this server up. What exactly was getting backed up? We had a look and the only thing being backed up was esx.conf. That’s right: 1 single configuration file and nothing else. It makes sense really, as this is a lab server.

So thinking that I’m now going to lose all the work I’ve put into the blog, I need to get a bit creative.

I wanted to see if I could boot into a live disk and ‘see’ the second disk. That way I might be able to get my .vmdk files back at least. I tried Bart’s PE disk (Windows-based) and through diskpart was able to see my 350GB raid 5 disk. However the partition type was unknown.

Note: Really I should’ve got a better live CD that can actually read vmware formatted disks. Any suggestions? Please comment!

Ok so now I’m getting desperate. It’s also getting close to 18:00 and I really don’t want to be spending my entire Friday night trying to recover this.

So what else can read a vmware formatted disk? Well vmware of course! My final goal was to reinstall vmware 4.0 from the original disk on the first disk, and hope and pray that it actually manages to read the second disk with all my VMs.

I couldn’t find the original disk so I downloaded the .iso from vmware. I had no blank DVDs so I used unetbootin to extract the iso onto a USB stick. This doesn’t work. It gets halfway through installing only for it to complain the disk is not in the drive. I finally manage to locate the original DVD, only for the server to refuse to boot from it.

ARGH!!!

This was a stupid mistake. Someone had swapped out the DVD drive for a CD drive, and of course the CD drive could not read the DVD. Pop a DVD drive in and it finally starts installing VMWare.

VMWare installed, reboot. I’m now finally able to log into the ESX box remotely! However there are no VMs in the inventory. Ok, I do see my 350GB disk as an available datastore so let’s right-click and browse.

And I see all my folders :) Right-click my blog vm, add to inventory. Start that beast up.

mellowd.co.uk/ccie is FINALLY back online!!!

I immediately set up a .vmdk level backup of my blog and run it. I also do a mysql dump of my database and do a file-level backup. I’ve set my .vmdk backup to run daily. This really should’ve been done from the start!
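
For what it’s worth, the file-level part of that backup can be scripted in a few lines. This is only a rough sketch of the idea, not what I actually run: it assumes the `mysqldump` CLI is installed with credentials in `~/.my.cnf`, and the function names, database name and output paths are made up for illustration.

```python
import subprocess
import tarfile
from pathlib import Path


def dump_database(db_name, out_file):
    """Write a SQL dump of db_name to out_file using the mysqldump CLI."""
    # Assumption: mysqldump is on the PATH and reads credentials from ~/.my.cnf.
    with Path(out_file).open("wb") as fh:
        subprocess.run(["mysqldump", db_name], stdout=fh, check=True)
    return Path(out_file)


def archive(paths, out_file):
    """Bundle the given files and directories into one gzipped tarball."""
    with tarfile.open(out_file, "w:gz") as tar:
        for path in paths:
            path = Path(path)
            # Store each item under its own name rather than its full path.
            tar.add(path, arcname=path.name)
    return Path(out_file)
```

Something like `archive([dump_database("blog", "blog.sql"), "/var/www"], "blog-backup.tar.gz")` would then give you one file to copy off the box, where `blog` is a hypothetical database name.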

The server still refuses to rebuild the raid 5 array so the server is not in perfect health. I will be moving the VM to a properly monitored box shortly.

All in all I had dodgy ram, a faulty raid 5 array, and a borked VMWare install. My guess is that the dodgy ram was to blame for VMWare getting all messed up.

Oddly enough, none of the alert lights on the front of the HP server were showing red.

MPLS-TE via RSVP – Part 2 of 3 – Juniper JunOS

This is part 2 of my basic MPLS-TE RSVP series.

Part 1 – Cisco IOS
Part 2 – Juniper JunOS
Part 3 – Brocade Netiron XMR/MLX
Part 4 – Cisco IOS-XR

So now let’s move onto JunOS. Same topology used:

JunOS basic config

I’m pasting the relevant pieces of config here of CR1. CR2 and CR3 are going to be very similar:

interfaces {
    fe-0/0/1 {
        unit 13 {
            vlan-id 13;
            family inet {
                address 10.0.2.2/24;
            }
            family mpls;
        }
    }
    fe-0/0/2 {
        unit 13 {
            vlan-id 13;
            family inet {
                address 10.0.3.1/24;
            }
            family mpls;
        }
    }
    lo0 {
        unit 3 {
            family inet {
                address 3.3.3.3/32;
            }
        }
    }
}
protocols {
    rsvp {
        interface all;
    }
    mpls {
        interface all;
    }
    ospf {
        area 0.0.0.0 {
            interface all;
        }
    }
}

Again the actual core routers have a very vanilla configuration. In JunOS we need to enable RSVP on all transit interfaces, MPLS on all interfaces, and family mpls on the transit interfaces themselves. Some of you may be wondering why we need to enable MPLS under both the interfaces and protocols stanzas. Well, any config on the interfaces is generally information for the data-plane, while the protocols config is for the control-plane. Basically, enabling mpls under protocols ensures MPLS actually runs on the interface, while the interface configuration ensures that the interface will actually accept labeled packets to begin with.

One thing to note is that I’m not actually enabling traffic-engineering under OSPF. At least not yet. More on that later…

Now for AR1’s config. This has the same config as above, plus a whole lot more:

interfaces {
    fe-0/0/0 {
        unit 13 {
            vlan-id 13;
            family inet {
                address 10.0.2.1/24;
            }
            family mpls;
        }
        unit 15 {
            vlan-id 15;
            family inet {
                address 10.0.5.1/24;
            }
            family mpls;
        }
    }
    lo0 {
        unit 1 {
            family inet {
                address 1.1.1.1/32;
                address 20.20.20.20/32;
            }
        }
    }
}
protocols {
    rsvp {
        interface all;
    }
    mpls {
        no-cspf;
        label-switched-path AR1-to-AR3 {
            to 2.2.2.2;
        }
        interface all;
    }
    bgp {
        group INTERNAL {
            local-address 1.1.1.1;
            export LO_OUT;
            neighbor 2.2.2.2 {
                peer-as 13;
            }
        }
    }
    ospf {
        export export-ospf;
        area 0.0.0.0 {
            interface all;
            interface lo0.1 {
                disable;
            }
        }
    }
}
policy-options {
    prefix-list LOOP {
        20.20.20.20/32;
    }
    policy-statement LO_OUT {
        from {
            protocol direct;
            prefix-list LOOP;
        }
        then accept;
    }
    policy-statement export-ospf {
        from {
            protocol direct;
            route-filter 1.1.1.1/32 exact;
        }
        then accept;
    }
}
routing-options {
    autonomous-system 13;
}

So it has the same MPLS config. I’ve also configured BGP and OSPF so that it exports the first loopback address in OSPF and the second in BGP. I’ve also created an LSP to 2.2.2.2. Notice that I’m still not running traffic-engineering extensions in OSPF. What this means is that the LSP will simply be signaled along the current OSPF shortest path. No TE extensions required. This is a fundamental difference to IOS, as IOS needs TE in your link state protocol in order for this to function.

Let’s now ping across:

darreno:AR3> ping 20.20.20.20 rapid
PING 20.20.20.20 (20.20.20.20): 56 data bytes
!!!!!
--- 20.20.20.20 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.068/3.046/10.511/3.735 ms

This does work, unlike IOS. With IOS we had to instruct the OS to actually consider the tunnel as a usable route-able interface. In JunOS, all LSP endpoints enter the inet.3 table. BGP uses this table first as a next-hop resolver. Let’s check this:

darreno:AR3> show route table inet.3

inet.3: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[RSVP/7/1] 00:24:18, metric 0
                    > to 10.0.3.1 via fe-0/0/3.13, label-switched-path AR3-to-AR1

darreno:AR3> show route 20.20.20.20

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

20.20.20.20/32     *[BGP/170] 01:03:27, localpref 100, from 1.1.1.1
                      AS path: I
                    > to 10.0.3.1 via fe-0/0/3.13, label-switched-path AR3-to-AR1

The nice thing is that the route table shows the interface to actually be ‘label-switched-path AR3-to-AR1’

Only BGP will use this inet.3 table by default. This means that OSPF won’t use the LSP itself unless we tell it to:

darreno:AR3> show route 1.1.1.1

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[OSPF/150] 00:26:18, metric 0, tag 0
                    > to 10.0.3.1 via fe-0/0/3.13

inet.3: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[RSVP/7/1] 00:26:14, metric 0
                    > to 10.0.3.1 via fe-0/0/3.13, label-switched-path AR3-to-AR1

My IGPs will use regular unlabeled packets, while BGP using 1.1.1.1 as a next-hop will use the LSP. You can see my regular traceroute below will use the OSPF unlabeled route while a trace to the BGP route will be labeled:

darreno:AR3> traceroute 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 40 byte packets
 1  10.0.3.1 (10.0.3.1)  1.133 ms  5.565 ms  0.808 ms
 2  1.1.1.1 (1.1.1.1)  1.078 ms  1.177 ms  1.064 ms

darreno:AR3> traceroute 20.20.20.20
traceroute to 20.20.20.20 (20.20.20.20), 30 hops max, 40 byte packets
 1  10.0.3.1 (10.0.3.1)  1.504 ms  1.302 ms  1.140 ms
     MPLS Label=299776 CoS=0 TTL=1 S=1
 2  20.20.20.20 (20.20.20.20)  1.105 ms  2.046 ms  1.260 ms
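
The split between those two traceroutes can be modelled in a few lines. This is a deliberately simplified sketch: the two dictionaries just mirror the `show route` output from AR3 above, the function name is mine, and real JunOS also weighs route preference when it resolves a next-hop. It only captures the idea that BGP next-hop resolution gets to look in inet.3, while a plain lookup only ever sees inet.0.

```python
# Simplified copies of the two tables on AR3, taken from the output above.
INET0 = {"1.1.1.1/32": ("OSPF", "10.0.3.1 via fe-0/0/3.13")}
INET3 = {"1.1.1.1/32": ("RSVP", "label-switched-path AR3-to-AR1")}


def lookup(address, bgp_next_hop=False):
    """Return (protocol, forwarding next hop) for a /32 address.

    bgp_next_hop=True models BGP resolving its protocol next-hop,
    which is allowed to use inet.3. Everything else uses inet.0 only.
    """
    key = f"{address}/32"
    if bgp_next_hop and key in INET3:
        return INET3[key]
    return INET0[key]


# Traffic to 1.1.1.1 itself follows the OSPF route, unlabeled.
print(lookup("1.1.1.1"))
# A BGP route with next-hop 1.1.1.1 (such as 20.20.20.20/32) rides the LSP.
print(lookup("1.1.1.1", bgp_next_hop=True))
```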

We can also dive a bit deeper into the forwarding table to see what will happen. This is similar to checking the CEF table on IOS.

darreno:AR3> show route forwarding-table destination 20.20.20.20
Logical system: AR3
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
20.20.20.20/32     user     0                    indr 262142     2
                              10.0.3.1          Push 299776   832     2 fe-0/0/3.13

The forwarding table says that it will impose a label value of 299776 onto the frame, which is exactly what we saw in the traceroute earlier.

JunOS explicit paths

So let’s get our hands dirty with explicit paths. I’ll also be enabling OSPF-TE for this part (though it’s not actually required for explicit paths)

AR3
label-switched-path AR3-to-AR1 {
    to 1.1.1.1;
    primary through-CR3-CR2;
}
path through-CR3-CR2 {
    10.0.7.1 strict;
    10.0.6.1 strict;
    10.0.5.1 strict;
}

darreno:AR3> show configuration protocols ospf
traffic-engineering;

Very similar to IOS. We specify the explicit next hops along the path and ensure the LSP is using this named path as a primary. Now we should see a traceroute go over 2 labeled hops:

darreno:AR3> traceroute 20.20.20.20
traceroute to 20.20.20.20 (20.20.20.20), 30 hops max, 40 byte packets
 1  10.0.7.1 (10.0.7.1)  1.448 ms  1.543 ms  1.181 ms
     MPLS Label=299792 CoS=0 TTL=1 S=1
 2  10.0.6.1 (10.0.6.1)  1.195 ms  1.346 ms  1.209 ms
     MPLS Label=299792 CoS=0 TTL=1 S=1
 3  20.20.20.20 (20.20.20.20)  10.419 ms  1.272 ms  1.144 ms

Which is exactly what we see.
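
If you’re wondering what ‘strict’ actually demands, here is a small sketch of the check. The topology is an assumption on my part, inferred from the traceroute above (AR3, CR3, CR2, AR1 in a row), and every address other than the three listed in the path is made up to complete the /24s. A strict hop must sit on a subnet directly connected to the router we’re currently at, otherwise the path is rejected.

```python
import ipaddress

# Assumed interface addresses. Only 10.0.7.1, 10.0.6.1 and 10.0.5.1 come from
# the config above; the far-end .2 addresses are hypothetical.
INTERFACES = {
    "AR3": ["10.0.7.2/24"],
    "CR3": ["10.0.7.1/24", "10.0.6.2/24"],
    "CR2": ["10.0.6.1/24", "10.0.5.2/24"],
    "AR1": ["10.0.5.1/24"],
}


def owner_of(address):
    """Return the router that owns an interface address, or None."""
    for router, ifaces in INTERFACES.items():
        if any(str(ipaddress.ip_interface(i).ip) == address for i in ifaces):
            return router
    return None


def directly_connected(router, address):
    """True if address falls inside one of the router's connected subnets."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_interface(i).network for i in INTERFACES[router])


def walk_strict_path(ingress, hops):
    """Follow strict hops from ingress and return the routers visited."""
    current, visited = ingress, [ingress]
    for hop in hops:
        if not directly_connected(current, hop):
            raise ValueError(f"{hop} is not directly connected to {current}")
        current = owner_of(hop)
        if current is None:
            raise ValueError(f"no router owns {hop}")
        visited.append(current)
    return visited


print(walk_strict_path("AR3", ["10.0.7.1", "10.0.6.1", "10.0.5.1"]))
```

Skipping a hop, for example listing only 10.0.6.1 from AR3, raises an error in this sketch, which is the behaviour a loose hop would relax.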

JunOS Type-10 OSPF LSA

We’ve also enabled OSPF TE so let’s take a look at the OSPF database:

darreno:AR1> show ospf database | match opa
OpaqArea*1.0.0.1          1.1.1.1          0x80000003   326  0x22 0x2cf8  28
OpaqArea 1.0.0.1          2.2.2.2          0x80000002   328  0x22 0x32eb  28
OpaqArea 1.0.0.1          3.3.3.3          0x80000002   327  0x22 0x36df  28
OpaqArea 1.0.0.1          4.4.4.4          0x80000002   327  0x22 0x3ad3  28
OpaqArea 1.0.0.1          5.5.5.5          0x80000002   328  0x22 0x3ec7  28
OpaqArea*1.0.0.3          1.1.1.1          0x80000003   326  0x22 0x4f8d 124
OpaqArea 1.0.0.3          2.2.2.2          0x80000002   328  0x22 0x3699 124
OpaqArea 1.0.0.3          3.3.3.3          0x80000002   327  0x22 0x33a1 124
OpaqArea 1.0.0.3          4.4.4.4          0x80000002   327  0x22 0xab1f 124
OpaqArea 1.0.0.3          5.5.5.5          0x80000002   328  0x22 0xbd07 124
OpaqArea*1.0.0.4          1.1.1.1          0x80000003   326  0x22 0xdbf9 124
OpaqArea 1.0.0.4          2.2.2.2          0x80000002   328  0x22 0x6373 124
OpaqArea 1.0.0.4          3.3.3.3          0x80000002   327  0x22 0x27ac 124
OpaqArea 1.0.0.4          4.4.4.4          0x80000002   327  0x22 0xb513 124
OpaqArea 1.0.0.4          5.5.5.5          0x80000002   328  0x22 0xb50e 124

And again if we dig deeper into the LSA we can see actual bandwidths, priorities, and reservations:

darreno:AR1> show ospf database opaque-area advertising-router 3.3.3.3 extensive

    OSPF database, Area 0.0.0.0
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len
OpaqArea 1.0.0.1          3.3.3.3          0x80000002   401  0x22 0x36df  28
  Area-opaque TE LSA
  RtrAddr (1), length 4: 3.3.3.3
  Aging timer 00:53:18
  Installed 00:06:38 ago, expires in 00:53:19, sent 00:06:36 ago
  Last changed 00:09:54 ago, Change count: 1
OpaqArea 1.0.0.3          3.3.3.3          0x80000002   401  0x22 0x33a1 124
  Area-opaque TE LSA
  Link (2), length 100:
    Linktype (1), length 1:
      2
    LinkID (2), length 4:
      10.0.2.2
    LocIfAdr (3), length 4:
      10.0.2.2
    RemIfAdr (4), length 4:
      0.0.0.0
    TEMetric (5), length 4:
      1
    MaxBW (6), length 4:
      100Mbps
    MaxRsvBW (7), length 4:
      100Mbps
    UnRsvBW (8), length 32:
        Priority 0, 100Mbps
        Priority 1, 100Mbps
        Priority 2, 100Mbps
        Priority 3, 100Mbps
        Priority 4, 100Mbps
        Priority 5, 100Mbps
        Priority 6, 100Mbps
        Priority 7, 100Mbps
    Color (9), length 4:
      0
  Aging timer 00:53:18
  Installed 00:06:38 ago, expires in 00:53:19, sent 00:06:36 ago
  Last changed 00:09:54 ago, Change count: 1
OpaqArea 1.0.0.4          3.3.3.3          0x80000002   401  0x22 0x27ac 124
  Area-opaque TE LSA
  Link (2), length 100:
    Linktype (1), length 1:
      2
    LinkID (2), length 4:
      10.0.3.1
    LocIfAdr (3), length 4:
      10.0.3.1
    RemIfAdr (4), length 4:
      0.0.0.0
    TEMetric (5), length 4:
      1
    MaxBW (6), length 4:
      100Mbps
    MaxRsvBW (7), length 4:
      100Mbps
    UnRsvBW (8), length 32:
        Priority 0, 100Mbps
        Priority 1, 100Mbps
        Priority 2, 100Mbps
        Priority 3, 100Mbps
        Priority 4, 100Mbps
        Priority 5, 100Mbps
        Priority 6, 100Mbps
        Priority 7, 100Mbps
    Color (9), length 4:
      0
  Aging timer 00:53:18
  Installed 00:06:38 ago, expires in 00:53:19, sent 00:06:36 ago
  Last changed 00:09:54 ago, Change count: 1
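
As a sanity check, the lengths in that output add up. In the TE LSA format every TLV and sub-TLV carries a 4-byte header (2 bytes type, 2 bytes length) and its value is padded to a multiple of 4 bytes. The sketch below just redoes that arithmetic with the sub-TLV lengths printed above; the names in the code are mine.

```python
# (name, value length in bytes) for each sub-TLV in the Link TLV shown above.
SUB_TLVS = [
    ("Linktype", 1), ("LinkID", 4), ("LocIfAdr", 4), ("RemIfAdr", 4),
    ("TEMetric", 4), ("MaxBW", 4), ("MaxRsvBW", 4), ("UnRsvBW", 32),
    ("Color", 4),
]
TLV_HEADER = 4   # 2-byte type + 2-byte length
LSA_HEADER = 20  # standard OSPF LSA header


def padded(length):
    """Round a value length up to the next multiple of 4 bytes."""
    return (length + 3) // 4 * 4


def link_tlv_length(sub_tlvs):
    """Length of the Link TLV value: all sub-TLVs with headers and padding."""
    return sum(TLV_HEADER + padded(length) for _, length in sub_tlvs)


def lsa_length(tlv_value_length):
    """Total LSA length for an LSA carrying a single top-level TLV."""
    return LSA_HEADER + TLV_HEADER + tlv_value_length


print(link_tlv_length(SUB_TLVS))               # the 'Link (2), length' value
print(lsa_length(link_tlv_length(SUB_TLVS)))   # the Len column for link LSAs
print(lsa_length(4))                           # the Len column for the RtrAddr LSA
```

The 32 bytes of UnRsvBW are simply 8 priorities at 4 bytes each.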

The biggest difference here is that the JunOS default behavior is far more relevant to what a lot of ISP cores would actually be like. Basically I would generally only want my BGP next-hop to be labeled. The majority of regular MPLS/L3 MPLS/VPLS etc all use BGP and hence all traffic over those technologies will be labeled, while regular IP traffic internal to the network continues to operate as regular IP traffic.

IOS is a lot more label happy and will create loads of labels for all kinds of prefixes. You can of course change the behavior from both vendors.

I think it’s also a lot easier in JunOS to troubleshoot as you have a separate route table specifically used as a BGP next-hop resolver.

MPLS-TE via RSVP – Part 1 of 3 – Cisco IOS

I’m going to have to split this topic into three separate posts because otherwise it’ll just be too long and I’ll lose you halfway through.
Part 1 – Cisco IOS
Part 2 – Juniper JunOS
Part 3 – Brocade Netiron XMR/MLX
Part 4 – Cisco IOS-XR

Most people I speak to who have MPLS experience are usually experienced with LDP, most probably because it’s easy and they have no need for traffic engineering.

However in the ISP space, the vast majority of MPLS cores run RSVP-TE. Not only does it give you traffic-engineering capabilities, it also gives you features like fast-reroute and hot standby LSPs. You can also use your IGP to carry TE extensions, but only link-state protocols will do this for you. i.e. you can forget about EIGRP doing anything good for you in an ISP core.

Some people tend to think that RSVP-TE is difficult, but really it’s not that difficult at all. Once you get over the initial hurdles you’ll see how powerful it can be. I have extensive Brocade Netiron RSVP-TE experience, a fair amount of JunOS RSVP-TE experience and hardly any IOS RSVP-TE experience. This is because my current core is all Brocade and Juniper. Unfortunately I can only test RSVP-TE on IOS and not IOS-XR as I don’t have any IOS-XR boxes available for me to test on. It’s far more likely that an ISP core would be running IOS-XR over IOS.

Let’s take the following topology into consideration, which I’ll be using for all vendors. AR1 and AR3 are my ‘edge’ routers running iBGP with each other. They are each advertising a second loopback address to each other over BGP. CR1, CR2, and CR3 are my core routers, not running any BGP at all.

IOS basic config

Let’s start with the core network first. I’m pasting the relevant pieces of config here of CR1. CR2 and CR3 are going to be very similar:

mpls traffic-eng tunnels
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip ospf 1 area 0
!
interface Serial1/0
 ip address 10.2.0.2 255.255.255.0
 ip ospf 1 area 0
 mpls traffic-eng tunnels
!
interface Serial1/2
 ip address 10.3.0.1 255.255.255.0
 ip ospf 1 area 0
 mpls traffic-eng tunnels
!
router ospf 1
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0

AR1:

mpls traffic-eng tunnels
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip ospf 1 area 0
!
interface Loopback20
 ip address 20.20.20.20 255.255.255.255
!
interface Tunnel0
 ip unnumbered Loopback0
 tunnel destination 4.4.4.4
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng path-option 5 dynamic
 no routing dynamic
!
interface Serial1/0
 ip address 10.2.0.1 255.255.255.0
 ip ospf 1 area 0
 mpls traffic-eng tunnels
!
router ospf 1
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0
 router-id 2.2.2.2
!
router bgp 13
 network 20.20.20.20 mask 255.255.255.255
 neighbor 4.4.4.4 remote-as 13
 neighbor 4.4.4.4 update-source Loopback0
 no auto-summary

AR3 has a similar config to AR1, so I’m not going to list it here. Essentially what we’ve done is enabled mpls traffic-engineering globally, enabled it on the transit interfaces, and finally enabled OSPF-TE in OSPF. The AR routers have an iBGP connection to each other. There is no need to enable MPLS IP anywhere as that actually enables LDP.

Now that my tunnels are up, let’s try and ping a BGP learned route and see what happens:

AR3#ping 20.20.20.20

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 20.20.20.20, timeout is 2 seconds:
U.U.U
Success rate is 0 percent (0/5)

This won’t work because IOS won’t actually use this tunnel for any routing unless I specifically allow it. I could do static routing or PBR, but why not just let the routing protocol do the work?

interface Tunnel0
 tunnel mpls traffic-eng autoroute announce

This command allows the IGP to use the tunnel in its tree calculation. Let’s take a look at whether it works now or not:

AR3#ping 20.20.20.20       

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 20.20.20.20, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/37/44 ms

Let’s take a look at the route and CEF table:

AR3#sh ip route 2.2.2.2    
Routing entry for 2.2.2.2/32
  Known via "ospf 1", distance 110, metric 129, type intra area
  Routing Descriptor Blocks:
  * directly connected, via Tunnel0
      Route metric is 129, traffic share count is 1

AR3#sh ip cef 20.20.20.20
20.20.20.20/32, version 12, epoch 0
0 packets, 0 bytes
  tag information from 2.2.2.2/32, shared
    local tag: tunnel-head
    fast tag rewrite with Tu0, point2point, tags imposed: {16}
  via 2.2.2.2, 0 dependencies, recursive
    next hop 2.2.2.2, Tunnel0 via 2.2.2.2/32
    valid adjacency
    tag rewrite with Tu0, point2point, tags imposed: {16}

In order to get to 2.2.2.2 which is the next-hop, it will send the traffic through the LSP tunnel. If we check the CEF table we can see that traffic will be directed towards the tunnel and have the label value of 16 imposed onto it. We can ensure this is correct with a traceroute:

AR3#traceroute 20.20.20.20

Type escape sequence to abort.
Tracing the route to 20.20.20.20

  1 10.3.0.1 [MPLS: Label 16 Exp 0] 36 msec 28 msec 44 msec
  2 10.2.0.1 44 msec 44 msec * 

It’s exactly what we see. Also note that the tunnel is actually following the shortest IGP path at the moment. This is because in the above config we told the ARs to signal the path dynamically. This means it’ll follow the IGP best path. Which will lead us onto our next section.

IOS explicit paths

We can tell IOS that we actually want to use the CR2-CR3 path instead of just learning this information dynamically. We now want to use CR2 and CR3 in the path and not CR1. We can do this in two ways depending on the topology. Either I tell my ingress router that it should follow a very specific path, or I just tell the ingress router to specifically miss a particular node. As LSPs are unidirectional, let’s try both.

AR1:

ip explicit-path name through-CR2-CR3 enable
 next-address 10.5.0.2 
 next-address 10.6.0.2 
 next-address 10.7.0.2 
!
interface Tunnel0
 tunnel mpls traffic-eng path-option 4 explicit name through-CR2-CR3
 tunnel mpls traffic-eng path-option 5 dynamic

AR3:

ip explicit-path name not-through-CR1 enable
 exclude-address 10.3.0.1
!
interface Tunnel0
 tunnel mpls traffic-eng path-option 4 explicit name not-through-CR1
 tunnel mpls traffic-eng path-option 5 dynamic

AR1#traceroute 40.40.40.40

Type escape sequence to abort.
Tracing the route to 40.40.40.40

  1 10.5.0.2 [MPLS: Label 17 Exp 0] 60 msec 64 msec 72 msec
  2 10.6.0.2 [MPLS: Label 17 Exp 0] 64 msec 60 msec 48 msec
  3 10.7.0.2 68 msec 56 msec * 


AR3#traceroute 20.20.20.20

Type escape sequence to abort.
Tracing the route to 20.20.20.20

  1 10.7.0.1 [MPLS: Label 16 Exp 0] 48 msec 64 msec 64 msec
  2 10.6.0.1 [MPLS: Label 16 Exp 0] 76 msec 48 msec 72 msec
  3 10.5.0.1 64 msec *  60 msec

This is a pretty small topology, so by telling AR3 to skip CR1, there is only 1 other path available. So we create the explicit paths on each ingress router, and then under the tunnel interface we specify that this explicit path is more preferred than the dynamic path. Either way works and you can see from the traceroutes above that both work. The dynamic path is still left under the tunnel interface as we would still like to use it if the CR2-CR3 path becomes unavailable.
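
To show why excluding one address is enough here, this is a toy path calculation. The link list is my reconstruction of the lab from the configs and traceroutes above, with equal cost assumed on every link, and the function is a plain shortest-path search rather than anything IOS actually runs. The sketch treats an excluded address as removing the link that owns it, which is a simplification.

```python
import heapq

# Assumed topology: (router A, A's address, router B, B's address).
LINKS = [
    ("AR1", "10.2.0.1", "CR1", "10.2.0.2"),
    ("CR1", "10.3.0.1", "AR3", "10.3.0.2"),
    ("AR1", "10.5.0.1", "CR2", "10.5.0.2"),
    ("CR2", "10.6.0.1", "CR3", "10.6.0.2"),
    ("CR3", "10.7.0.1", "AR3", "10.7.0.2"),
]


def shortest_path(src, dst, exclude=()):
    """Return the lowest hop-count path, skipping links with excluded addresses."""
    graph = {}
    for a, a_ip, b, b_ip in LINKS:
        if a_ip in exclude or b_ip in exclude:
            continue  # this link is pruned before the path is calculated
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    queue = [(0, [src])]
    seen = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                heapq.heappush(queue, (cost + 1, path + [neighbour]))
    return None  # no path left once the exclusions are applied


print(shortest_path("AR3", "AR1"))                        # dynamic path
print(shortest_path("AR3", "AR1", exclude={"10.3.0.1"}))  # not-through-CR1
```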

IOS Type-10 OSPF LSA

MPLS-TE extensions are carried within OSPF type-10 opaque LSAs. These LSAs have area flooding scope, so they are not flooded across area boundaries. This is another reason why ISP cores don’t run multi-area OSPF. You can see the LSAs in the database:

AR1#sh ip ospf database | begin Type-10
		Type-10 Opaque Link Area Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Opaque ID
1.0.0.0         2.2.2.2         887         0x80000002 0x005AC6 0       
1.0.0.0         3.3.3.3         173         0x80000003 0x005CBB 0       
1.0.0.0         4.4.4.4         557         0x80000002 0x0062AE 0       
1.0.0.0         22.22.22.22     385         0x80000002 0x00AAD5 0       
1.0.0.0         33.33.33.33     319         0x80000002 0x00D651 0       
1.0.0.2         2.2.2.2         172         0x80000004 0x004EFC 2       
1.0.0.2         3.3.3.3         173         0x80000003 0x00704A 2       
1.0.0.2         4.4.4.4         174         0x80000004 0x004AF6 2       
1.0.0.2         22.22.22.22     128         0x80000002 0x008CDD 2       
1.0.0.2         33.33.33.33     76          0x80000002 0x00EFFB 2       
1.0.0.3         2.2.2.2         111         0x80000002 0x0025D4 3       
1.0.0.3         3.3.3.3         173         0x80000003 0x00535C 3       
1.0.0.3         4.4.4.4         306         0x80000002 0x001918 3       
1.0.0.3         22.22.22.22     128         0x80000002 0x00C228 3       
1.0.0.3         33.33.33.33     319         0x80000002 0x0064CC 3   

If we dig deeper into the LSA originated by CR1 we can see the following:

AR1#sh ip ospf database opaque-area adv-router 3.3.3.3

            OSPF Router with ID (2.2.2.2) (Process ID 1)

		Type-10 Opaque Link Area Link States (Area 0)

  LS age: 310
  Options: (No TOS-capability, DC)
  LS Type: Opaque Area Link
  Link State ID: 1.0.0.0
  Opaque Type: 1
  Opaque ID: 0
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x5CBB
  Length: 28
  Fragment number : 0

    MPLS TE router ID : 3.3.3.3

    Number of Links : 0

  LS age: 310
  Options: (No TOS-capability, DC)
  LS Type: Opaque Area Link
  Link State ID: 1.0.0.2
  Opaque Type: 1
  Opaque ID: 2
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x704A
  Length: 132
  Fragment number : 2
          
    Link connected to Point-to-Point network
      Link ID : 2.2.2.2
      Interface Address : 10.2.0.2
      Neighbor Address : 10.2.0.1
      Admin Metric : 64
      Maximum bandwidth : 193000
      Maximum reservable bandwidth : 0
      Number of Priority : 8
      Priority 0 : 0           Priority 1 : 0         
      Priority 2 : 0           Priority 3 : 0         
      Priority 4 : 0           Priority 5 : 0         
      Priority 6 : 0           Priority 7 : 0         
      Affinity Bit : 0x1
      IGP Metric : 64

    Number of Links : 1

  LS age: 310
  Options: (No TOS-capability, DC)
  LS Type: Opaque Area Link
  Link State ID: 1.0.0.3
  Opaque Type: 1
  Opaque ID: 3
  Advertising Router: 3.3.3.3
  LS Seq Number: 80000003
  Checksum: 0x535C
  Length: 132
  Fragment number : 3

    Link connected to Point-to-Point network
      Link ID : 4.4.4.4
      Interface Address : 10.3.0.1
      Neighbor Address : 10.3.0.2
      Admin Metric : 64
      Maximum bandwidth : 193000
      Maximum reservable bandwidth : 0
      Number of Priority : 8
      Priority 0 : 0           Priority 1 : 0         
      Priority 2 : 0           Priority 3 : 0         
      Priority 4 : 0           Priority 5 : 0         
      Priority 6 : 0           Priority 7 : 0         
      Affinity Bit : 0x1
      IGP Metric : 64
          
    Number of Links : 1

You can see that it will show all your links, any affinities set, max reserved bandwidths, and any currently used bandwidths for different priorities.
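
Two of the fields in that output are easier to read with a bit of arithmetic. The Link State ID of an opaque LSA is the opaque type in the first octet followed by a 24-bit opaque ID, which is why 1.0.0.2 shows up as type 1, ID 2. The bandwidth figure looks like bytes per second, since 193000 multiplied by 8 lands exactly on a T1. That second point is my reading of the output rather than something the router states, and the function names are mine.

```python
import ipaddress


def decode_opaque_lsid(link_state_id):
    """Split an opaque LSA Link State ID into (opaque type, opaque ID)."""
    raw = int(ipaddress.IPv4Address(link_state_id))
    return raw >> 24, raw & 0xFFFFFF


def bytes_per_sec_to_kbps(value):
    """Convert a bandwidth in bytes per second to kilobits per second."""
    return value * 8 / 1000


print(decode_opaque_lsid("1.0.0.2"))
print(bytes_per_sec_to_kbps(193000))
```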

I could go on for many hours showing various MPLS features but then I’ll never finish this article.

In the next part I’ll be doing JunOS showing the same features and config as I showed above. In the final part I’ll be doing the same for Brocade Netiron.

Setting up a FreeRadius test lab (HOWTO)

It’s quite handy to have one of these labs to test your radius configs, especially in the ISP world. This is mainly for testing radius attributes as it’s very easy to get a Cisco box to actually be a regular PPPoE server.

I have an old 7200 NPE-300 connected to a virtual machine running in VMware.

I’m running Ubuntu server 12.04 so installing freeradius is pretty painless:

~$ sudo apt-get install freeradius

Now we need to configure the box. Just a few files need to be edited for our environment. I won’t go over every single part of radiusd.conf, only the things I made changes to:

/etc/freeradius$ sudo vi radiusd.conf

listen {
        type = auth
        ipaddr = 10.80.1.1
        port = 1645

}

listen {
        ipaddr = 10.80.1.1
        port = 1646
        type = acct
}

log {
        destination = files
        file = ${logdir}/radius.log
        syslog_facility = daemon
        stripped_names = no
        auth = yes
        auth_badpass = yes
        auth_goodpass = yes
}

It’s always good to have a fair amount of logging, especially in a lab.

We also need to tell the FreeRadius server that a radius client will be coming in and making authentication requests. We also choose a password here:

/etc/freeradius$ sudo vi clients.conf
client 10.80.1.2 {
        secret          = radiuspassword
        shortname       = 10.80.1.2
        nastype         = cisco
}

Short and sweet

Finally the actual username, passwords, IPs, attributes, etc are all stored in the users file. For now let’s just create a short single entry:

/etc/freeradius$ sudo vi users

testuser     Password = "password"
        Framed-IP-Address = 192.168.1.100
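Before touching the router it can be handy to poke the server directly. Below is a minimal Python sketch, not part of the original lab, that builds a PAP Access-Request by hand using the password hiding scheme from RFC 2865. The function names are my own, and it assumes the machine you run it from is listed in clients.conf with the same secret (in this lab only 10.80.1.2 is). Note that it uses PAP for simplicity, whereas the routers below authenticate with CHAP.

```python
import hashlib
import os
import socket
import struct

ACCESS_REQUEST = 1       # RADIUS packet code
ATTR_USER_NAME = 1       # attribute type numbers from RFC 2865
ATTR_USER_PASSWORD = 2


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Obfuscate a PAP password as described in RFC 2865 section 5.2."""
    # Pad with NULs to a multiple of 16 bytes (at least one block).
    padded = password + b"\x00" * (-len(password) % 16)
    if not padded:
        padded = b"\x00" * 16
    hidden = b""
    previous = authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        hidden += block
        previous = block  # each block is chained on the previous ciphertext
    return hidden


def attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as type, length (including header), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_access_request(user, password, secret, identifier=1, authenticator=None):
    """Return the raw bytes of an Access-Request with User-Name and User-Password."""
    authenticator = authenticator or os.urandom(16)
    attrs = attribute(ATTR_USER_NAME, user.encode())
    attrs += attribute(
        ATTR_USER_PASSWORD,
        hide_password(password.encode(), secret.encode(), authenticator),
    )
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs


def send_request(server="10.80.1.1", port=1645, timeout=3.0):
    """Send one request using the lab values and return the reply code."""
    packet = build_access_request("testuser", "password", "radiuspassword")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        reply, _ = sock.recvfrom(4096)
    return reply[0]  # 2 = Access-Accept, 3 = Access-Reject
```

Calling send_request() from an allowed client should return 2 if the users entry matches. A timeout usually means the server silently dropped the packet, which is what it does for unknown clients.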

Now onto the 7200. The 7200 and FreeRadius server are directly connected in this lab, but in the real world all they need is IP connectivity to each other.

aaa new-model
!
aaa group server radius RADIUS_SERVER
 server 10.80.1.1 auth-port 1645 acct-port 1646
!
aaa authentication ppp CPE_USER group RADIUS_SERVER
aaa authorization network default group RADIUS_SERVER
!
vpdn enable
!
bba-group pppoe LAB
 virtual-template 1
 sessions per-mac limit 20
 sessions per-vlan limit 250
!
interface Loopback0
 ip address 200.200.200.200 255.255.255.255
!
interface FastEthernet0/0
 description Link to FreeRadius server
 ip address 10.80.1.2 255.255.255.0
 duplex full
!
interface FastEthernet1/0
 description PPPOE interface
 no ip address
 duplex full
 pppoe enable group LAB
!
interface Virtual-Template1
 ip unnumbered Loopback0
 no peer default ip address
 ppp authentication chap CPE_USER
!
radius-server host 10.80.1.1 auth-port 1645 acct-port 1646 key radiuspassword

I’ve used a radius group which allows you to add more radius servers and test fail-over scenarios.

For a test device I’ve just configured a 2801 like so:

interface FastEthernet0/0
 no ip address
 duplex auto
 speed auto
 pppoe enable group global
 pppoe-client dial-pool-number 1
!
interface Dialer1
 mtu 1492
 ip address negotiated
 encapsulation ppp
 dialer pool 1
 ppp chap hostname testuser
 ppp chap password 0 password

Let’s give it a quick test. I’ve enabled logging on the radius server to see what’s going on. Let me enable the 2801’s PPPoE interface and see if the radius server sees the authentication request coming in:

/etc/freeradius$ tail -f /var/log/freeradius/radius.log
Mon Oct  1 21:24:23 2012 : Auth: Login OK: [testuser/<CHAP-Password>] (from client 10.80.1.2 port 0)
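Once more than one test user is dialling in, reading the log by eye gets tedious. Here is a rough Python sketch that pulls the interesting fields out of Auth lines shaped like the one above and counts results per user. The regex is written against that one sample line only; the "Login incorrect" variant is an assumption about how failures are logged, and the function names are made up for this example.

```python
import re
from collections import Counter

# Matches lines such as:
#   <timestamp> : Auth: Login OK: [user/password] (from client 10.80.1.2 port 0)
AUTH_LINE = re.compile(
    r"^(?P<timestamp>.+?) : Auth: (?P<result>Login OK|Login incorrect): "
    r"\[(?P<user>[^/\]]*)(?:/(?P<password>[^\]]*))?\] "
    r"\(from client (?P<client>\S+) port (?P<port>\d+)"
)


def parse_auth_line(line):
    """Return a dict of fields for an Auth line, or None if it does not match."""
    match = AUTH_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["port"] = int(fields["port"])
    return fields


def summarize(lines):
    """Count (user, result) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        fields = parse_auth_line(line)
        if fields is not None:
            counts[(fields["user"], fields["result"])] += 1
    return counts
```

To run it against the real log, something like summarize(open("/var/log/freeradius/radius.log")) would do, given read access to the file.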

So that’s all fine. Did my router pick up the correct IP address?

c2801#sh int dialer 1
Dialer1 is up, line protocol is up (spoofing)
  Hardware is Unknown
  Internet address is 192.168.1.100/32
  MTU 1492 bytes, BW 56 Kbit/sec, DLY 20000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation PPP, LCP Closed, loopback not set
  Keepalive set (10 sec)
  DTR is pulsed for 1 seconds on reset
  Interface is bound to Vi2
  Last input never, output never, output hang never
  Last clearing of "show interface" counters 05:13:33
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: weighted fair
  Output queue: 0/1000/64/0 (size/max total/threshold/drops)
     Conversations  0/0/16 (active/max active/max total)
     Reserved Conversations 0/0 (allocated/max allocated)
     Available Bandwidth 42 kilobits/sec
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     1017 packets input, 103010 bytes
     4703 packets output, 173178 bytes
Bound to:
Virtual-Access2 is up, line protocol is up
  Hardware is Virtual Access interface
  MTU 1492 bytes, BW 56 Kbit/sec, DLY 20000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation PPP, LCP Open
  Stopped: CDPCP
  Open: IPCP
  PPPoE vaccess, cloned from Dialer1
  Vaccess status 0x44, loopback not set
  Keepalive set (10 sec)
  Interface is bound to Di1 (Encapsulation PPP)
  Last input 00:00:01, output never, output hang never
  Last clearing of "show interface" counters 00:01:55
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     27 packets input, 387 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     26 packets output, 378 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 output buffer failures, 0 output buffers swapped out
     0 carrier transitions

c2801#show ip route connected | beg Ga
Gateway of last resort is not set

      192.168.1.0/32 is subnetted, 1 subnets
C        192.168.1.100 is directly connected, Dialer1
      200.200.200.0/32 is subnetted, 1 subnets
C        200.200.200.200 is directly connected, Dialer1


c2801#ping 200.200.200.200
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 200.200.200.200, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

These are PPP links, so the 7200 and 2801 have swapped host routes. This is why they can reach each other. We can also check from the 7200 side:

c7200#sh ip route 192.168.1.100
Routing entry for 192.168.1.100/32
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via Virtual-Access1.1
      Route metric is 0, traffic share count is 1

c7200#ping 192.168.1.100

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.100, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms

So everything is working just as expected.

The whole point of radius attributes is to be able to do all kinds of fancy things. Let’s say that this 2801 has another network behind it that the rest of our network needs to reach through the BRAS box. An easy way is to have the 7200 install a static route to that network whenever the router dials in. Let’s use a loopback on the 2801 for this purpose:

interface Loopback1
 ip address 40.40.40.40 255.255.255.255

Going back to the users file on the radius server, we change the entry as follows:

testuser     Password = "password"
        Framed-IP-Address = 192.168.1.100,
        Cisco-Avpair += "ip:route=40.40.40.40 255.255.255.255"
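If you end up with a lot of test users, typing these entries by hand gets error-prone, especially the trailing commas. The following is a small Python sketch that renders an entry in the same layout as the one above. The helper name and its arguments are hypothetical, and it only knows about the two reply attributes used in this lab.

```python
import ipaddress


def render_user(name, password, framed_ip, routes=()):
    """Render one users-file entry: check item on the first line, reply items below.

    routes is an iterable of (prefix, mask) string pairs, each emitted as a
    Cisco-Avpair ip:route reply item.
    """
    ipaddress.ip_address(framed_ip)  # raises ValueError on a bad address
    reply = [f"Framed-IP-Address = {framed_ip}"]
    for prefix, mask in routes:
        ipaddress.ip_network(f"{prefix}/{mask}")  # raises if prefix/mask disagree
        reply.append(f'Cisco-Avpair += "ip:route={prefix} {mask}"')

    lines = [f'{name}     Password = "{password}"']
    for index, item in enumerate(reply):
        # Every reply item except the last one ends with a comma.
        comma = "," if index < len(reply) - 1 else ""
        lines.append(f"        {item}{comma}")
    return "\n".join(lines)


print(render_user(
    "testuser",
    "password",
    "192.168.1.100",
    routes=[("40.40.40.40", "255.255.255.255")],
))
```

Validating the prefix against its mask up front catches typos such as a network address with host bits set before they reach the radius server.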

Let’s clear the pppoe session and take a look at the 7200:

c7200#sh ip route 40.40.40.40
Routing entry for 40.40.40.40/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 192.168.1.100
      Route metric is 0, traffic share count is 1

c7200#ping 40.40.40.40

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 40.40.40.40, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms

As this is a static route pointing at a connected next hop, the 7200 can redistribute it into the IGP so the rest of your network can get to it. Notice that when I reload the 2801 and the session is torn down, the static route is removed:

c7200#sh ip route 40.40.40.40
% Network not in table

There are a TON of radius attributes. If I have the time I may go over a few handy ones with which you can create some powerful routing policies.