Are you sure it’s the shortest path? – OSPF Multi Area issues

So does OSPF always use the shortest path in order to ensure that packets get from A to B with the lowest end-to-end cost? Not always. In fact, when you have more than a single area it’s very easy to NOT take the shortest path at all. You could even turn your ‘non-transit’ 100Mb backup links into transit links.

Let’s take the following network as an example:

R3 represents our core. R1 and R2 are both aggregation boxes that all our customers connect to. These boxes connect into the core with Gig links. R4 is our first customer. The customer wants a primary Gig link with a 100Mb backup link. We have decided to put each customer into their own OSPF area. We will also be changing the auto-cost reference bandwidth to 100Gb (100000 Mbps) to ensure our core sees the difference between 100Mb and Gig links:

R3
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip ospf 1 area 0
!
interface GigabitEthernet1/0
 ip address 10.0.13.3 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet2/0
 ip address 10.0.23.3 255.255.255.0
 ip ospf 1 area 0
!         
router ospf 1
 router-id 3.3.3.3
 auto-cost reference-bandwidth 100000

R1
interface GigabitEthernet2/0
 ip address 10.0.13.1 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet1/0
 ip address 10.0.14.1 255.255.255.0
 ip ospf 1 area 4
!         
router ospf 1
 router-id 1.1.1.1
 auto-cost reference-bandwidth 100000

R2
interface GigabitEthernet2/0
 ip address 10.0.23.2 255.255.255.0
 ip ospf 1 area 0
!
interface FastEthernet1/0
 ip address 10.0.24.2 255.255.255.0
 ip ospf 1 area 4
!
router ospf 1
 router-id 2.2.2.2
 auto-cost reference-bandwidth 100000

R4
interface Loopback0
 ip address 4.4.4.4 255.255.255.255
 ip ospf 1 area 4
!         
interface GigabitEthernet1/0
 ip address 10.0.14.4 255.255.255.0
 ip ospf 1 area 4
!
interface FastEthernet2/0
 ip address 10.0.24.4 255.255.255.0
 ip ospf 1 area 4
!
router ospf 1
 router-id 4.4.4.4
 auto-cost reference-bandwidth 100000
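The reason the reference bandwidth matters is the way the cost is derived: reference bandwidth divided by interface bandwidth, rounded down, with a floor of 1. A minimal Python sketch of that arithmetic (the function name is made up for illustration) shows why the default 100Mb reference would make Gig and 100Mb links look identical:

```python
REFERENCE_BW_MBPS = 100_000  # auto-cost reference-bandwidth 100000 (100Gb)


def ospf_cost(interface_bw_mbps: int, reference_bw_mbps: int = REFERENCE_BW_MBPS) -> int:
    """OSPF cost = reference bandwidth / interface bandwidth, rounded down, minimum 1."""
    return max(1, reference_bw_mbps // interface_bw_mbps)


for name, bandwidth in [("GigabitEthernet", 1000), ("FastEthernet", 100)]:
    # Compare the 100Gb reference used here with the 100Mb default.
    print(f"{name}: cost {ospf_cost(bandwidth)} (default 100Mb reference: {ospf_cost(bandwidth, 100)})")
```

In the outputs that follow, R4’s loopback adds 1 on top of the link costs, so R1 sees 4.4.4.4 at 101 and R3, one Gig hop further away, at 201.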

Our core should now see that the best way to get to R4’s loopback is to go through R1:

R3#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 201, type inter area
  Last update from 10.0.13.1 on GigabitEthernet1/0, 00:09:44 ago
  Routing Descriptor Blocks:
  * 10.0.13.1, from 1.1.1.1, 00:09:44 ago, via GigabitEthernet1/0
      Route metric is 201, traffic share count is 1
R3#traceroute 4.4.4.4
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.13.1 8 msec 20 msec 16 msec
  2 10.0.14.4 16 msec *  20 msec

Similarly R4 should see that the best way to get to R3 is back through R1:

R4#sh ip route 3.3.3.3
Routing entry for 3.3.3.3/32
  Known via "ospf 1", distance 110, metric 201, type inter area
  Last update from 10.0.14.1 on GigabitEthernet1/0, 00:03:47 ago
  Routing Descriptor Blocks:
  * 10.0.14.1, from 1.1.1.1, 00:03:47 ago, via GigabitEthernet1/0
      Route metric is 201, traffic share count is 1
R4#traceroute 3.3.3.3
Type escape sequence to abort.
Tracing the route to 3.3.3.3
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.14.1 16 msec 16 msec 20 msec
  2 10.0.13.3 20 msec *  20 msec

So everything is fine. Or so we think. There is already a flaw here, but it won’t show up until we bring in more customers. Let’s add 2 customers: R5 is connected to R1 and R6 is connected to R2. Both of these customers have purchased single 100Mb links.

So traffic sent from R4’s loopback to either of the 2 new customers’ loopbacks should get into the core via R4’s 1Gb primary link. Is that what we see?

R4
R4#traceroute 5.5.5.5 source 4.4.4.4
Type escape sequence to abort.
Tracing the route to 5.5.5.5
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.14.1 16 msec 20 msec 20 msec
  2 10.0.15.5 20 msec *  24 msec
R4#
R4#traceroute 6.6.6.6 source 4.4.4.4
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.14.1 12 msec 20 msec 16 msec
  2 10.0.13.3 20 msec 60 msec 20 msec
  3 10.0.23.2 40 msec 44 msec 40 msec
  4 10.0.26.6 72 msec *  44 msec

That’s exactly what we see, but do we have the full picture here? Let’s trace from these new customers to R4’s loopback. Again both should go over R4’s 1Gb primary link:

R5
R5#traceroute 4.4.4.4 source 5.5.5.5
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.15.1 8 msec 20 msec 16 msec
  2 10.0.14.4 20 msec *  24 msec

R5 is correct. What about R6?

R6
R6#traceroute 4.4.4.4 source 6.6.6.6
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.26.2 20 msec 16 msec 20 msec
  2 10.0.24.4 64 msec *  68 msec

Well, this is most certainly NOT correct. Why is this traceroute going through R4’s 100Mb backup link? Let’s go back to the beginning and see what we missed. Let’s have a look at R1, R2 and R3 to see how each of them wants to get to 4.4.4.4:

R3
R3#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 201, type inter area
  Last update from 10.0.13.1 on GigabitEthernet1/0, 00:29:17 ago
  Routing Descriptor Blocks:
  * 10.0.13.1, from 1.1.1.1, 00:29:17 ago, via GigabitEthernet1/0
      Route metric is 201, traffic share count is 1
R1
R1#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 101, type intra area
  Last update from 10.0.14.4 on GigabitEthernet1/0, 00:37:44 ago
  Routing Descriptor Blocks:
  * 10.0.14.4, from 4.4.4.4, 00:37:44 ago, via GigabitEthernet1/0
      Route metric is 101, traffic share count is 1
R2
R2#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 1001, type intra area
  Last update from 10.0.24.4 on FastEthernet1/0, 00:38:47 ago
  Routing Descriptor Blocks:
  * 10.0.24.4, from 4.4.4.4, 00:38:47 ago, via FastEthernet1/0
      Route metric is 1001, traffic share count is 1

Here is the problem. R2 prefers to get to 4.4.4.4 over its directly connected 100Mb link, even though the metric through R3 would be 301 (100 to R3 plus R3’s 201), a whole lot less than 1001.

The issue is that OSPF has its own selection process. Regardless of metric, OSPF will ALWAYS prefer intra-area routes over inter-area routes, and inter-area routes over external routes. R2 has an interface in area 4, the same area in which it’s learning about R4’s loopback. Hence when traffic addressed to 4.4.4.4 passes through it, it will always send it out its area 4 interface, no matter how slow that link is. It doesn’t make any difference whether the second customer is in area 0 or in their own area.
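That selection order can be sketched in a few lines of Python. This is a simplified model, not IOS code: it ranks path types first and only compares metrics between paths of the same type (real OSPF has extra tie-breaking rules for E2 routes that are ignored here). The candidates are R2’s two paths to 4.4.4.4 from the outputs above:

```python
from dataclasses import dataclass

# OSPF path-type preference, checked before any metric comparison.
# Lower rank wins: intra-area, then inter-area, then external (E1, E2).
PATH_TYPE_RANK = {"intra": 0, "inter": 1, "E1": 2, "E2": 3}


@dataclass
class OspfPath:
    path_type: str
    metric: int
    via: str


def best_path(paths):
    """Return the preferred path: path type first, metric only as a tie-breaker."""
    return min(paths, key=lambda p: (PATH_TYPE_RANK[p.path_type], p.metric))


if __name__ == "__main__":
    r2_paths = [
        # Learned directly in area 4 over the 100Mb link.
        OspfPath("intra", 1001, "FastEthernet1/0 to R4"),
        # R3's metric to 4.4.4.4 (201) plus R2's 100-cost Gig link to R3.
        OspfPath("inter", 100 + 201, "GigabitEthernet2/0 to R3"),
    ]
    chosen = best_path(r2_paths)
    print(f"R2 forwards via {chosen.via} with metric {chosen.metric}")
```

Even though the inter-area candidate is over 700 cheaper, the intra-area path wins on type alone.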

In fact, if you dive a bit deeper, you can see that as far as R6 is concerned, the traffic will be going over R4’s primary link. Look at the interface cost of R6’s link and at its end-to-end cost to 4.4.4.4:

R6
R6#sh ip os int brief | include Fa1/0
Fa1/0        1     0               10.0.26.6/24       1000  BDR   1/1
R6#                                  
R6#
R6#sh ip route 4.4.4.4               
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 1301, type inter area
  Last update from 10.0.26.2 on FastEthernet1/0, 00:19:15 ago
  Routing Descriptor Blocks:
  * 10.0.26.2, from 1.1.1.1, 00:19:15 ago, via FastEthernet1/0
      Route metric is 1301, traffic share count is 1

What about R2’s active route cost?

R2
R2#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 1001, type intra area
  Last update from 10.0.24.4 on FastEthernet2/0, 00:43:29 ago
  Routing Descriptor Blocks:
  * 10.0.24.4, from 4.4.4.4, 00:43:29 ago, via FastEthernet2/0
      Route metric is 1001, traffic share count is 1

So R6 thinks that traffic will go over its own 1000-cost link and then over three 100-cost Gig links. But R2 effectively ‘hijacks’ this traffic and sends it over its direct area 4 link.
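The arithmetic behind that mismatch, using the link costs from the outputs above, can be written out as a quick Python check (the variable names are just labels for the links in this topology):

```python
# Link costs from the outputs above (reference bandwidth 100Gb).
R6_TO_R2 = 1000       # R6's FastEthernet1/0
GIG_HOP = 100         # each Gig link: R2-R3, R3-R1 and R4's primary R1-R4
R2_TO_R4_FAST = 1000  # R4's 100Mb backup link
LOOPBACK = 1          # R4's Loopback0, as seen in the metrics above

# What R6 calculates from R1's summary: its own link plus R2 -> R3 -> R1 -> R4.
r6_expected = R6_TO_R2 + 3 * GIG_HOP + LOOPBACK

# What the packets actually cost once R2 switches them onto its area 4 link.
actual_path_cost = R6_TO_R2 + R2_TO_R4_FAST + LOOPBACK

print("R6 expects:", r6_expected)         # R6 expects: 1301
print("Actual path:", actual_path_cost)   # Actual path: 2001
```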

So, how can this be fixed?

The first way is to just put everything in area 0. This way every prefix is learned as an intra-area route within area 0, so the metric alone decides the path. Even if you injected all prefixes via redistribution or route-policy, they would all be external, but still reachable through area 0 links.

The second way is to create some sort of tunnel between R1 and R2 and put that tunnel interface into area 4. This way R2 would learn about R4’s loopback over 2 area 4 interfaces. You would need to ensure the path over this tunnel has a lower cost than the 100Mb direct connection to R4 in order for it to actually be preferred. But who really wants to be creating tunnels over the core of their network? Virtual-links can only be used to connect to area 0, not area 4. Sham links? They can only be used with MPLS L3VPNs.
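Strictly, it is the whole path over the tunnel that has to undercut the direct link, not just the tunnel interface. A small Python sketch of that comparison, assuming a hypothetical R1-R2 tunnel in area 4 whose cost you set by hand, and using R1’s 101 to R4’s loopback and R2’s 1001 over the 100Mb link from the outputs above:

```python
R2_DIRECT = 1000 + 1         # R2's 100Mb link to R4 plus R4's loopback
R1_TO_R4_LOOPBACK = 100 + 1  # R1's Gig link to R4 plus R4's loopback


def tunnel_path_wins(tunnel_cost: int) -> bool:
    """True if R2's intra-area path over the hypothetical tunnel beats its direct link."""
    return tunnel_cost + R1_TO_R4_LOOPBACK < R2_DIRECT


# OSPF interface costs are 16-bit values (1-65535).
max_cost = max(cost for cost in range(1, 65536) if tunnel_path_wins(cost))
print("Highest tunnel cost that still wins outright:", max_cost)
```

At exactly 900 both intra-area paths would cost 1001, and R2 would load-share across them instead of preferring the tunnel.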

The third way is thinking outside the box a little. You could run PPPoE over the secondary link and not run OSPF on that link at all. On R4 you would have a floating static route pointing towards the dialer interface. The RADIUS account you use would install a static route to R4’s loopback with a next-hop of the point-to-point PPPoE link. Ensure that static route is created with an administrative distance higher than OSPF’s (110), so that the OSPF path is used whenever it is available.
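The floating static works because the router installs whichever route source has the lowest administrative distance. A minimal Python sketch of that decision, assuming a hypothetical distance of 250 on the static route (any value above OSPF’s 110 would do):

```python
OSPF_AD = 110
FLOATING_STATIC_AD = 250  # hypothetical; it only has to be higher than 110


def installed_route(candidates):
    """Given (source, administrative distance) pairs for one prefix, return the
    source that wins the routing table, or None if there are no candidates."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[1])[0]


if __name__ == "__main__":
    both_up = [("ospf", OSPF_AD), ("pppoe-static", FLOATING_STATIC_AD)]
    primary_down = [("pppoe-static", FLOATING_STATIC_AD)]
    print(installed_route(both_up))       # ospf
    print(installed_route(primary_down))  # pppoe-static
```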

The fourth way is to just use another protocol connecting the core to the CPE device. BGP perhaps?

The fifth and final way, which ties with option 1 for simplicity’s sake, is RFC 5185 – OSPF Multi-Area Adjacency. This RFC allows a router’s interface to belong to more than a single OSPF area. This means that I could keep R1 and R2’s core links in area 0, but also put those same links into area 4. The same would be done on R3. R2 would then learn the best path from R1 as an intra-area route, without the need for dodgy tunnels. The main problem is that most vendors simply don’t support it. Cisco only has it in IOS XE. JUNOS has had it since JUNOS 9.4, though. Brocade? No mention of it anywhere yet.

Considering I have some post 9.4 JunOS boxes here, let’s test this out:

R2
> show configuration protocols ospf
reference-bandwidth 10g;
area 0.0.0.0 {
    interface fe-0/0/1.66;
    interface fe-0/0/0.51;
}
area 0.0.0.4 {
    interface fe-1/3/0.16 {
        metric 1000;
    }
    interface fe-0/0/1.66 {
        secondary;
    }
}

R3
> show configuration protocols ospf
reference-bandwidth 10g;
area 0.0.0.0 {
    interface fe-0/0/0.66;
    interface fe-0/0/0.63;
    interface lo0.9;
}
area 0.0.0.4 {
    interface fe-0/0/0.66 {
        secondary;
    }
    interface fe-0/0/0.63 {
        secondary;
    }
}

R1
> show configuration protocols ospf
reference-bandwidth 10g;
area 0.0.0.0 {
    interface fe-0/0/1.63;
    interface fe-1/3/3.79;
}
area 0.0.0.4 {
    interface fe-1/3/0.14;
    interface fe-0/0/1.63 {
        secondary;
    }
}

As you can see, the configuration is pretty simple. You simply add an interface to another area and mark it as secondary. Let’s have a look at R2’s neighbours:

> show ospf neighbor
Address          Interface              State     ID               Pri  Dead
10.0.26.6        fe-0/0/0.51            Full      6.6.6.6          128    38
  Area 0.0.0.0
10.0.23.3        fe-0/0/1.66            Full      3.3.3.3          128    38
  Area 0.0.0.0
10.0.23.3        fe-0/0/1.66            Full      3.3.3.3          128    38
  Area 0.0.0.4
10.0.24.4        fe-1/3/0.16            Full      4.4.4.4          128    33
  Area 0.0.0.4

R2 has two adjacencies with R3 over fe-0/0/1.66: one in area 0 and one in area 4. This means it should be learning R4’s loopback as 2 intra-area paths and 1 inter-area route. It should then choose the intra-area path through R3, as it has the better metric:

> show route 4.4.4.4

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

4.4.4.4/32         *[OSPF/10] 00:18:07, metric 300
                    > to 10.0.23.3 via fe-0/0/1.66

Which is exactly what we see.
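The same path-type-then-metric comparison explains why the secondary adjacency fixes things: R2 now has two intra-area paths, so metric alone decides between them. A simplified Python model (not Junos code), assuming the Junos metrics from this lab: reference-bandwidth 10g gives each Fast Ethernet hop a cost of 100, the direct link has metric 1000 configured, and the loopback adds nothing, as the metric of 300 above shows. The inter-area cost is an assumed value built the same way:

```python
# Path types are ranked before metrics are compared (lower wins).
RANK = {"intra": 0, "inter": 1}

# R2's candidate paths to 4.4.4.4/32 once fe-0/0/1.66 is also in area 4.
candidates = [
    ("intra", 1000, "fe-1/3/0.16, direct to R4"),
    ("intra", 100 + 100 + 100, "fe-0/0/1.66, secondary area 4 adjacency via R3"),
    ("inter", 100 + 100 + 100, "fe-0/0/1.66, area 0 summary from R1"),
]

best = min(candidates, key=lambda c: (RANK[c[0]], c[1]))
print(best)  # ('intra', 300, 'fe-0/0/1.66, secondary area 4 adjacency via R3')
```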

Let’s do another traceroute from R6 to confirm:

> traceroute 4.4.4.4
traceroute to 4.4.4.4 (4.4.4.4), 30 hops max, 40 byte packets
 1  10.0.26.2 (10.0.26.2)  1.098 ms  0.965 ms  0.800 ms
 2  10.0.23.3 (10.0.23.3)  0.846 ms  0.943 ms  0.836 ms
 3  10.0.13.1 (10.0.13.1)  0.884 ms  1.036 ms  0.882 ms
 4  4.4.4.4 (4.4.4.4)  1.166 ms  1.328 ms  1.155 ms

Forcing tunnel traffic to only go through certain interfaces

Let’s say you have the following topology:

R3 has 2 loopbacks that R2 is not aware of. You’ve been told that you need to create 2 tunnels between R1 and R3, with a caveat: traffic to one loopback must only ever go over one of R1’s physical interfaces, while traffic to the other loopback must only ever go over the second physical interface.

On top of this, port fa0/1 on R1 is connected through an intermediate L2 device.

I have configured OSPF on all interfaces except for the tunnel interfaces and R3’s 2 loopbacks. Let’s now create the 2 tunnels and static routes:

interface Tunnel1
 ip address 1.1.1.1 255.255.255.0
 tunnel source FastEthernet0/1
 tunnel destination 3.3.3.3
!
interface Tunnel2
 ip address 2.2.2.1 255.255.255.0
 tunnel source FastEthernet0/0
 tunnel destination 3.3.3.3
!
interface FastEthernet0/0
 ip address 10.1.12.1 255.255.255.0
 ip ospf 1 area 0
!
interface FastEthernet0/1
 ip address 10.0.12.1 255.255.255.0
 ip ospf 1 area 0
!
ip route 50.50.50.50 255.255.255.255 Tunnel2
ip route 60.60.60.60 255.255.255.255 Tunnel1

OK, so the tunnels are created on R1, with static routes sending traffic for R3’s two loopbacks into them. Are we sure that Tunnel1 traffic is actually going over the correct interface? Most of the time, yes, but let’s shut down R2’s fa1/0 interface and see what happens.

I’ll be pinging 60.60.60.60 from R1. Traffic should only be going out its fa0/1 interface, but it’s not:

R1#sh int fa0/1 | include minute
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec


R1#sh int fa0/0 | include minute
  5 minute input rate 9000 bits/sec, 9 packets/sec
  5 minute output rate 9000 bits/sec, 9 packets/sec

All traffic is going out the fa0/0 interface, which actually makes perfect sense. R1’s fa0/1 interface is still up, as the switch it’s connected to is up. That interface is in OSPF, so R3 can still reach it through R1’s fa0/0 interface. The tunnel source says fa0/1, but this doesn’t mean tunnel traffic HAS to go out that interface. It’s simply the source address used; the egress interface is chosen by a normal routing lookup on the tunnel destination, 3.3.3.3, and the only route to it now points out fa0/0.
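The forwarding decision R1 makes for the GRE packets can be modelled as a plain longest-prefix-match lookup on the tunnel destination. A Python sketch, using a cut-down version of R1’s table while R2’s fa1/0 is shut (the table contents are simplified from the topology above, and the function name is hypothetical):

```python
import ipaddress

# Simplified R1 routing table while R2's fa1/0 is shut:
# (prefix, egress interface). Only the fa0/0 path to 3.3.3.3 survives.
ROUTES = [
    (ipaddress.ip_network("3.3.3.3/32"), "FastEthernet0/0"),
    (ipaddress.ip_network("10.0.12.0/24"), "FastEthernet0/1"),  # connected
    (ipaddress.ip_network("10.1.12.0/24"), "FastEthernet0/0"),  # connected
]


def egress_interface(destination: str):
    """Longest-prefix match on the destination address only.
    The source address of the packet plays no part in the decision."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, intf) for net, intf in ROUTES if dst in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Tunnel1's GRE packets are sourced from 10.0.12.1 (fa0/1) but routed on 3.3.3.3.
print(egress_interface("3.3.3.3"))  # FastEthernet0/0
```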

So how do we force it? There are 2 ways to do this. We can tell IOS to force tunnel traffic out an interface, or we could use VRFs. Let’s start with forcing:

interface Tunnel1
 ip address 1.1.1.1 255.255.255.0
 tunnel source FastEthernet0/1
 tunnel destination 3.3.3.3
 tunnel route-via FastEthernet0/1 mandatory
!
interface Tunnel2
 ip address 2.2.2.1 255.255.255.0
 tunnel source FastEthernet0/0
 tunnel destination 3.3.3.3
 tunnel route-via FastEthernet0/0 mandatory

Route-via mandatory ensures IOS only sends tunnel traffic out the interface you specify. IOS must have a route to the tunnel endpoint out of that interface: if the router does not have a valid route to 3.3.3.3 out fa0/1, Tunnel1 goes down. As R2’s fa1/0 interface is still down, Tunnel1 should be down, which it is:

R1#sh ip route 3.3.3.3
Routing entry for 3.3.3.3/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 10.1.12.2 on FastEthernet0/0, 00:20:25 ago
  Routing Descriptor Blocks:
  * 10.1.12.2, from 60.60.60.60, 00:20:25 ago, via FastEthernet0/0
      Route metric is 3, traffic share count is 1

R1#sh int tun 1 | include protocol
Tunnel1 is up, line protocol is down 
  Tunnel protocol/transport GRE/IP
     0 unknown protocol drops

Let’s no shut R2’s interface now and watch the tunnel come back up:

R2(config)#int fa1/0
R2(config-if)#no shut

R1#sh ip route 3.3.3.3
Routing entry for 3.3.3.3/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 10.0.12.2 on FastEthernet0/1, 00:00:10 ago
  Routing Descriptor Blocks:
  * 10.1.12.2, from 60.60.60.60, 00:21:56 ago, via FastEthernet0/0
      Route metric is 3, traffic share count is 1
    10.0.12.2, from 60.60.60.60, 00:00:10 ago, via FastEthernet0/1
      Route metric is 3, traffic share count is 1

R1#sh int tun 1 | include protocol
Tunnel1 is up, line protocol is up 
  Tunnel protocol/transport GRE/IP
     0 unknown protocol drops
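The behaviour of tunnel route-via mandatory boils down to a simple rule. A simplified Python model (not IOS code; the function is hypothetical), fed with the paths to 3.3.3.3 from the two sh ip route outputs above:

```python
def tunnel_line_protocol(paths_to_destination, route_via_interface):
    """Simplified model of 'tunnel route-via <interface> mandatory':
    the tunnel is only up if the route to the tunnel destination has
    at least one path out of the required interface."""
    return "up" if route_via_interface in paths_to_destination else "down"


# Paths to 3.3.3.3 taken from the outputs above.
while_fa1_0_shut = ["FastEthernet0/0"]
after_no_shut = ["FastEthernet0/0", "FastEthernet0/1"]

print("Tunnel1:", tunnel_line_protocol(while_fa1_0_shut, "FastEthernet0/1"))  # down
print("Tunnel1:", tunnel_line_protocol(after_no_shut, "FastEthernet0/1"))     # up
print("Tunnel2:", tunnel_line_protocol(while_fa1_0_shut, "FastEthernet0/0"))  # up
```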

The second way is to use VRFs. When you create a tunnel interface, you specify a source and destination address under the tunnel interface itself. Those addresses need to be reachable in the VRF in which the tunnel interface is configured. But what if you want the tunnel interface itself to be in the global routing table while the endpoints are in a VRF? This is done with the tunnel vrf command. First, let’s move each physical interface into its own VRF:

interface FastEthernet0/0
 ip vrf forwarding VRF2
 ip address 10.1.12.1 255.255.255.0
 ip ospf 2 area 0
!
interface FastEthernet0/1
 ip vrf forwarding VRF1
 ip address 10.0.12.1 255.255.255.0
 ip ospf 1 area 0

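The tunnels are now down, as 3.3.3.3 is no longer reachable through the global routing table. (For the ip ospf commands above to work, the OSPF processes also have to be bound to the VRFs, e.g. router ospf 1 vrf VRF1 and router ospf 2 vrf VRF2.) Now let’s add the tunnel vrf command: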

interface Tunnel1
 ip address 1.1.1.1 255.255.255.0
 tunnel source FastEthernet0/1
 tunnel destination 3.3.3.3
 tunnel vrf VRF1
!
interface Tunnel2
 ip address 2.2.2.1 255.255.255.0
 tunnel source FastEthernet0/0
 tunnel destination 3.3.3.3
 tunnel vrf VRF2

Tunnel vrf allows the tunnel endpoints to be reachable via a VRF table, while the tunnel interface itself, and the traffic routed over it, stays in the global routing table (or any other VRF).

Of course, if you wanted the tunnel interface in the same VRF as its endpoints, you would just configure ip vrf forwarding on the tunnel interface and would not need tunnel vrf.

Note that as fa0/1 is the only interface in VRF1, if 3.3.3.3 becomes unreachable through that interface, Tunnel1 goes down. Hence the same objective is achieved :)
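Conceptually, tunnel vrf just changes which routing table is used to resolve the tunnel destination. A Python sketch with one simplified table per VRF (the table contents are an assumption based on the topology: each VRF only knows 3.3.3.3 via its own physical interface, and the function name is hypothetical):

```python
import ipaddress

# One simplified routing table per VRF: prefix -> egress interface.
TABLES = {
    "global": {},  # 3.3.3.3 left the global table when the physicals moved into VRFs
    "VRF1": {ipaddress.ip_network("3.3.3.3/32"): "FastEthernet0/1"},
    "VRF2": {ipaddress.ip_network("3.3.3.3/32"): "FastEthernet0/0"},
}


def resolve_tunnel_destination(destination, tunnel_vrf="global"):
    """Look up the tunnel destination in the table named by 'tunnel vrf'
    (the global table if it is not configured). Returns the egress
    interface, or None, in which case the tunnel stays down."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, intf) for net, intf in TABLES[tunnel_vrf].items() if dst in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(resolve_tunnel_destination("3.3.3.3"))          # None: no tunnel vrf configured
print(resolve_tunnel_destination("3.3.3.3", "VRF1"))  # FastEthernet0/1 for Tunnel1
print(resolve_tunnel_destination("3.3.3.3", "VRF2"))  # FastEthernet0/0 for Tunnel2
```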

Twitter

So I’ve not really used Twitter as much as I should, mainly as most people speak a load of shit on there.

But I would like to actually follow and be followed by people in the field, as it’s a great way to get some communication going when either of us runs into problems.

Anyways, feel free to add me: https://twitter.com/mellowdrifter

Route-based and policy-based IPSec vpns between Cisco IOS and Juniper ScreenOS

I had to train some NOC’ers a couple of days ago and I came up with a few scenarios we come across often, and how we would ensure the VPNs work.

We currently predominantly use Juniper’s ScreenOS range for IPSec VPNs. Third parties use anything from Juniper, Cisco, Check Point, etc. The most common is Cisco, and in my experience it’s also the one that causes the most issues.

Let’s use the following topology. 10.0.0.0/24 represents the public internet. Each site needs to have connectivity to the other through an IPSec VPN tunnel.

Both IOS and ScreenOS allow you to create both policy-based and route-based VPNs. From my experience, most people seem to do policy-based VPNs. I much prefer route-based VPNs. In fact I have not created a policy-based VPN in a number of years as I always find route-based to be far superior.

There are a number of differences, but in essence it goes like this. With policy-based, interesting traffic leaving via the regular interface is encrypted and sent to the other side of the tunnel; interesting traffic is defined in an ACL. With route-based, you actually create a layer 3 tunnel interface, and any traffic routed through that interface is encrypted.
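The difference in how each model decides what to encrypt can be sketched in Python. This is a simplified model from the Cisco’s point of view, assuming the subnets used later in this post (192.168.0.0/24 behind the Cisco, 172.16.0.0/24 behind the ScreenOS box); the functions are hypothetical:

```python
import ipaddress

# Policy-based: "interesting traffic" is whatever matches the crypto ACL
# (source network, destination network).
CRYPTO_ACL = [
    (ipaddress.ip_network("192.168.0.0/24"), ipaddress.ip_network("172.16.0.0/24")),
]

# Route-based: an ordinary routing table where some prefixes point at the tunnel.
ROUTES = {
    ipaddress.ip_network("172.16.0.0/24"): "Tunnel0",
    ipaddress.ip_network("0.0.0.0/0"): "FastEthernet0/0",
}


def policy_based_encrypts(src: str, dst: str) -> bool:
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return any(s in acl_src and d in acl_dst for acl_src, acl_dst in CRYPTO_ACL)


def route_based_encrypts(src: str, dst: str) -> bool:
    # The source is ignored: only the egress interface of the best route matters.
    d = ipaddress.ip_address(dst)
    best = max((net for net in ROUTES if d in net), key=lambda net: net.prefixlen)
    return ROUTES[best] == "Tunnel0"


# 10.9.9.9 is a hypothetical host outside the ACL's source range.
print(policy_based_encrypts("10.9.9.9", "172.16.0.20"))  # False: not interesting traffic
print(route_based_encrypts("10.9.9.9", "172.16.0.20"))   # True: routed into the tunnel
```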

The great thing about route-based tunnels is that you can run routing protocols over them. You can also send any traffic you like over them and know it’ll be encrypted, and you can easily attach service policies to a tunnel interface. In essence, you get a fully routable layer 3 interface. A policy-based VPN simply doesn’t give you this capability.

Note that I’m going to use some generic phase 1 and phase 2 settings. You can of course add loads of different options, but that’s up to you.

Let’s start with the Juniper ScreenOS device. eth0/5 is my untrust interface.

set interface tunnel.1 zone untrust
set interface tunnel.1 ip unnumbered interface ethernet0/5
set ike gateway "VPN_P1" address 10.0.0.2 Main outgoing-interface "ethernet0/5" preshare vpnblog proposal "pre-g2-3des-sha"
set vpn "VPN_P2" gateway "VPN_P1" no-replay tunnel idletime 0 proposal "g2-esp-3des-sha"
set vpn "VPN_P2" bind interface tunnel.1
set route 192.168.0.0/24 interface tunnel.1

In the above I’ve created tunnel.1. I then created my phase 1 and phase 2 settings and bound the tunnel interface to the phase 2 setup. Finally, I created a static route to send traffic destined for 192.168.0.0/24 over the tunnel.

Let’s now move over to the Cisco. Pretty much the same thing I’m doing, but the config looks a little different:

crypto isakmp policy 1
 encr 3des
 authentication pre-share
 group 2
crypto isakmp key vpnblog address 10.0.0.1
!
crypto ipsec transform-set ESP-3DES-SHA esp-3des esp-sha-hmac
!
crypto ipsec profile VPN_P2
 set transform-set ESP-3DES-SHA
!
interface Tunnel0
 ip unnumbered FastEthernet0/0
 tunnel source 10.0.0.2
 tunnel destination 10.0.0.1
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile VPN_P2
!
ip route 172.16.0.0 255.255.255.0 Tunnel0
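Although the syntax differs, both sides describe the same phase 1 settings. A Python sketch that decodes the ScreenOS predefined proposal name and compares it to the IOS policy; the parser only handles the pre-gN-encryption-hash naming pattern, and the IOS hash is taken as the default (SHA), since it isn’t configured above:

```python
def parse_screenos_p1(name: str) -> dict:
    """Decode a ScreenOS predefined phase 1 proposal name such as
    'pre-g2-3des-sha' (pre-shared key, DH group 2, 3DES, SHA-1).
    Only the pre-gN-<encryption>-<hash> pattern is handled."""
    auth, group, encryption, hash_alg = name.split("-")
    if auth != "pre" or not group.startswith("g"):
        raise ValueError(f"unsupported proposal name: {name}")
    return {"auth": "pre-share", "group": int(group[1:]),
            "encryption": encryption, "hash": hash_alg}


# 'crypto isakmp policy 1' on the Cisco; hash is left at the IOS default (SHA).
cisco_policy_1 = {"auth": "pre-share", "group": 2, "encryption": "3des", "hash": "sha"}


def phase1_match(a: dict, b: dict) -> bool:
    return all(a[key] == b[key] for key in ("auth", "group", "encryption", "hash"))


print(phase1_match(parse_screenos_p1("pre-g2-3des-sha"), cisco_policy_1))  # True
```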

Both my hosts can now ping each other. You can verify the VPN is up as follows:

ScreenOS
SSG5-> get ike cookies

IKEv1 SA -- Active: 1, Dead: 0, Total 1

80182f/0003, 10.0.0.1:500->10.0.0.2:500, PRESHR/grp2/3DES/SHA, xchg(5) (VPN_P1/grp-1/usr-1)
resent-tmr 26233316 lifetime 28800 lt-recv 28800 nxt_rekey 28373 cert-expire 0
initiator, err cnt 0, send dir 0, cond 0x0
nat-traversal map not available
ike heartbeat              : disabled
ike heartbeat last rcv time: 0
ike heartbeat last snd time: 0
XAUTH status: 0
DPD seq local 0, peer 0


IOS
Cisco#sh crypto ipsec sa

interface: Tunnel0
    Crypto map tag: Tunnel0-head-0, local addr 10.0.0.2

   (removed for brevity)

     inbound esp sas:
      spi: 0x1DCD78DB(500005083)
        transform: esp-3des esp-sha-hmac ,
        in use settings ={Tunnel, }
        conn id: 3001, flow_id: FPGA:1, crypto map: Tunnel0-head-0
        sa timing: remaining key lifetime (k/sec): (4469954/3330)
        IV size: 8 bytes
        replay detection support: Y
        Status: ACTIVE

     outbound esp sas:
      spi: 0x3593AF04(898871044)
        transform: esp-3des esp-sha-hmac ,
        in use settings ={Tunnel, }
        conn id: 3002, flow_id: FPGA:2, crypto map: Tunnel0-head-0
        sa timing: remaining key lifetime (k/sec): (4469953/3329)
        IV size: 8 bytes
        replay detection support: Y
        Status: ACTIVE

Of course the best thing about using a tunnel interface is that you can use it like a regular layer 3 interface. If both sides are under the same administrative control why not just run OSPF instead of the static routes?

ScreenOS
set protocol ospf
set enable
set interface trust protocol ospf area 0.0.0.0
set interface trust protocol ospf passive
set interface trust protocol ospf enable
set interface tunnel.1 protocol ospf area 0.0.0.0
set interface tunnel.1 protocol ospf enable
set interface tunnel.1 mtu 1443
unset route 192.168.0.0/24 interface tunnel.1

IOS
interface FastEthernet0/1
 ip ospf 1 area 0
!
interface Tunnel0
 ip ospf 1 area 0
!
no ip route 172.16.0.0 255.255.255.0 Tunnel0

Cisco#sh ip route 172.16.0.0
O       172.16.0.0 [110/11112] via 10.0.0.1, 00:01:19, Tunnel0

SSG5-> get route ip 192.168.0.0
 Dest for 192.168.0.0
--------------------------------------------------------------------------------------
trust-vr       : => 192.168.0.0/24 (id=9) via 0.0.0.0 (vr: trust-vr)
                    Interface tunnel.1 , metric 2

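I had to change the MTU of the ScreenOS tunnel, as the two platforms report their tunnel MTU a bit differently, and OSPF won’t bring the adjacency up to Full while the MTUs in the Database Description packets disagree.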
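The MTU matters because of a check OSPF performs on Database Description packets. A Python sketch of the rule as RFC 2328 (section 10.6) describes it; the MTU values in the usage lines are arbitrary examples, not the actual values on these boxes:

```python
def accepts_dbd(local_mtu: int, neighbour_mtu: int) -> bool:
    """RFC 2328, section 10.6: a Database Description packet is rejected if
    its Interface MTU field is larger than the receiving interface can take
    without fragmentation."""
    return neighbour_mtu <= local_mtu


def exchange_can_complete(mtu_a: int, mtu_b: int) -> bool:
    """Both routers check the MTU the other advertises; both must accept."""
    return accepts_dbd(mtu_a, mtu_b) and accepts_dbd(mtu_b, mtu_a)


print(exchange_can_complete(1400, 1400))  # True
print(exchange_can_complete(1400, 1500))  # False: the 1400 side rejects
```

The side that rejects the packets leaves the neighbour stuck before Full, which is why matching the MTU by hand fixes it.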

So the above is all very simple, as long as both sides are using route-based VPN tunnels. Unfortunately, 90% of the time I see the third party creating a policy-based tunnel. While you could match this with a policy-based tunnel on the ScreenOS side, you can actually still run a route-based VPN. No, you won’t be able to run a routing protocol over it, but it will keep your configuration clean.

Cisco
crypto isakmp policy 1
 encr 3des
 authentication pre-share
 group 2
crypto isakmp key vpnblog address 10.0.0.1
!
!
crypto ipsec transform-set ESP-3DES-SHA esp-3des esp-sha-hmac
!
crypto map VPN_P2 10 ipsec-isakmp
 set peer 10.0.0.1
 set transform-set ESP-3DES-SHA
 set pfs group2
 match address 100
!
interface FastEthernet0/0
 crypto map VPN_P2
!
access-list 100 permit ip 192.168.0.0 0.0.0.255 172.16.0.0 0.0.0.255

The ScreenOS config is the same, except for the fact that I’ve re-added that static route over the tunnel.

Unfortunately the VPN does not come up, but if you check the ScreenOS event log it’s quite clear what’s happening:

Rejected an IKE packet on untrust from 10.0.0.2:500 to 10.0.0.1:500 with cookies 39baf5f7
and 9e7b3f4ab8141de4 because The peer sent a proxy ID that did not match the one in the 
SA config. IKE 10.0.0.2 Phase 2: No policy exists for the proxy ID received: local ID 
(172.16.0.0/255.255.255.0, 0, 0) remote ID (192.168.0.0/255.255.255.0, 0, 0).

The ScreenOS box is complaining that the proxy-id received does not match its own. The Cisco proxy-id is derived from the ACL we defined above. As the ScreenOS box is running a route-based VPN, its proxy-id is essentially local 0.0.0.0/0, remote 0.0.0.0/0.

To fix it, we can very easily ‘fake’ our proxy-id on the ScreenOS box:

set vpn "VPN_P2" proxy-id local-ip 172.16.0.0/24 remote-ip 192.168.0.0/24 "ANY"

As soon as this command is entered, both VPN tunnels come straight up.
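Proxy-id matching boils down to comparing the IDs each side proposes, mirrored. A simplified Python model of the exchange above (exact match on subnets only; protocol and port, left open here by the "ANY" service, are ignored):

```python
import ipaddress


def proxy_id(local: str, remote: str):
    return (ipaddress.ip_network(local), ipaddress.ip_network(remote))


def responder_accepts(received, configured) -> bool:
    """The initiator's local ID must equal the responder's remote ID and vice versa."""
    received_local, received_remote = received
    configured_local, configured_remote = configured
    return received_local == configured_remote and received_remote == configured_local


cisco_offer = proxy_id("192.168.0.0/24", "172.16.0.0/24")     # from access-list 100
screenos_default = proxy_id("0.0.0.0/0", "0.0.0.0/0")         # route-based, no proxy-id set
screenos_faked = proxy_id("172.16.0.0/24", "192.168.0.0/24")  # after the proxy-id command

print(responder_accepts(cisco_offer, screenos_default))  # False
print(responder_accepts(cisco_offer, screenos_faked))    # True
```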

It is quite common for multiple subnets on one side to need access to multiple subnets on the other, and vice versa. A route-based VPN on both sides would handle this extremely easily with a single IPSec tunnel. If the remote party is determined to use policy-based VPNs on their Cisco device, you can still run multiple tunnels to them.

A great feature from ScreenOS 6.3 onwards is that you can create multiple proxy-ids per phase 2 tunnel. Each proxy-id spawns a new phase 2 tunnel, but the only configuration you need is to add another proxy-id to the existing tunnel. Nice and easy!
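With several subnets on each side, one proxy-id is needed per local/remote pair. A Python sketch that generates those lines, assuming hypothetical extra subnets and assuming that on 6.3 the same proxy-id command used earlier is simply repeated for each pair:

```python
from itertools import product


def proxy_id_commands(vpn_name, local_subnets, remote_subnets):
    """One proxy-id (and so one phase 2 tunnel) per local/remote subnet pair."""
    return [
        f'set vpn "{vpn_name}" proxy-id local-ip {local} remote-ip {remote} "ANY"'
        for local, remote in product(local_subnets, remote_subnets)
    ]


# Hypothetical subnets: 2 behind the ScreenOS box, 3 behind the Cisco.
commands = proxy_id_commands(
    "VPN_P2",
    ["172.16.0.0/24", "172.16.1.0/24"],
    ["192.168.0.0/24", "192.168.1.0/24", "192.168.2.0/24"],
)
for line in commands:
    print(line)
print(len(commands), "phase 2 tunnels")  # 6 phase 2 tunnels
```

The Cisco’s crypto ACL would need a matching line for each of those pairs.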

BGP outbound route filtering

BGP ORF is a powerful feature that saves both ISPs and their BGP customers time and headaches. It can also be used for some pretty complicated MPLS L3VPN setups, but let’s keep this post relatively simple.

Let’s use the following simple topology for this post:

Let’s say that R2 is a customer of R1. R1 is a Tier 1 ISP with the full global routing table. R2 has bought IP transit from both R1 and another company, and has a choice of which BGP table to receive from each ISP. Should I take the full routing table? Should I take just a default? Maybe a default and a few other prefixes?
With the full table, I can do all kinds of load-sharing, and I can ensure that my traffic always takes the shortest AS path.
On the other hand, I could simply take defaults from both. That way my routers only need to hold a single default route, but they don’t really know the best path for anything.
I can also ask both ISPs to send me a default plus a few more specifics.

Of course, the problem with the first option is that my edge routers need to hold the entire BGP table. With the second option, I am extremely limited in sending my outbound traffic via the best path. The final option seems the best, but what if I wanted to change which routes the ISP is sending me? I would have to submit a request for them to do so, and who knows how long until they make that change. Most ISPs will not even do this for you anyway.

You could take the entire BGP table and then run it through a filter blocking everything you didn’t want. That works, but it’s a waste. Let’s say you want 10 prefixes. The ISP’s router will have to send all 500k IPv4 prefixes to you, only for you to filter out all but 10 of them. This is inefficient for both the ISP’s router and your own. Sending the entire table also takes a few minutes.

Enter Outbound Route Filtering (ORF). Effectively, this allows you, the customer, to ‘configure’ the ISP’s router to only send you what you want. At any time you can update the filter on your local box, and that change is fed to the ISP’s box. Let’s see this in action.

R1 has 9 loopbacks, 1.1.1.1 to 9.9.9.9, and advertises each as a /24. It’s also sending a default route. The standard config looks like this:

R1
router bgp 1
 bgp log-neighbor-changes
 neighbor 10.0.0.1 remote-as 200
 !
 address-family ipv4
  network 1.1.1.0 mask 255.255.255.0
  network 2.2.2.0 mask 255.255.255.0
  network 3.3.3.0 mask 255.255.255.0
  network 4.4.4.0 mask 255.255.255.0
  network 5.5.5.0 mask 255.255.255.0
  network 6.6.6.0 mask 255.255.255.0
  network 7.7.7.0 mask 255.255.255.0
  network 8.8.8.0 mask 255.255.255.0
  network 9.9.9.0 mask 255.255.255.0
  neighbor 10.0.0.1 activate
  neighbor 10.0.0.1 default-originate
 exit-address-family

R2
router bgp 200
 bgp log-neighbor-changes
 neighbor 10.0.0.0 remote-as 1
 !
 address-family ipv4
  neighbor 10.0.0.0 activate
 exit-address-family


R2#sh ip bgp | begin Network
   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0          10.0.0.0                               0 1 i
*> 1.1.1.0/24       10.0.0.0                 0             0 1 i
*> 2.2.2.0/24       10.0.0.0                 0             0 1 i
*> 3.3.3.0/24       10.0.0.0                 0             0 1 i
*> 4.4.4.0/24       10.0.0.0                 0             0 1 i
*> 5.5.5.0/24       10.0.0.0                 0             0 1 i
*> 6.6.6.0/24       10.0.0.0                 0             0 1 i
*> 7.7.7.0/24       10.0.0.0                 0             0 1 i
*> 8.8.8.0/24       10.0.0.0                 0             0 1 i
*> 9.9.9.0/24       10.0.0.0                 0             0 1 i

So far everything is as expected. However, I now want to receive the default route plus 4.4.4.0/24 and 8.8.8.0/24, and nothing else. The first thing we need to do is enable the ORF capability. Note that this can be set to ‘receive’, ‘send’, or ‘both’, which controls which router is allowed to push a filter to the other. Note also that when you configure this, the BGP session resets.

R1
router bgp 1
 address-family ipv4
  neighbor 10.0.0.1 capability orf prefix-list receive

R2
router bgp 200
 address-family ipv4
  neighbor 10.0.0.0 capability orf prefix-list send

Note that ‘receive’ is configured on the router sending the routes, while ‘send’ is configured on the router receiving the routes. The send and receive keywords refer to sending and receiving the prefix-lists, not the routes themselves.
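Put another way, the prefix-list only flows when one side can send it and the other can receive it. A tiny Python sketch of that rule (the function name is hypothetical):

```python
def prefix_list_can_flow(customer_capability: str, isp_capability: str) -> bool:
    """'send' and 'receive' refer to the ORF prefix-list, not the routes:
    the customer pushes its filter, the ISP accepts it and filters its updates."""
    return customer_capability in ("send", "both") and isp_capability in ("receive", "both")


print(prefix_list_can_flow("send", "receive"))  # True: the R2/R1 config above
print(prefix_list_can_flow("receive", "send"))  # False: the roles are backwards
```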

So now on R2 let’s create a prefix-list specifying what we want and apply that to our neighbour:

ip prefix-list ROUTES_WANTED seq 5 permit 0.0.0.0/0
ip prefix-list ROUTES_WANTED seq 10 permit 4.4.4.0/24
ip prefix-list ROUTES_WANTED seq 15 permit 8.8.8.0/24
!
router bgp 200
 address-family ipv4
  neighbor 10.0.0.0 prefix-list ROUTES_WANTED in

R2#sh ip bgp | begin Network
   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0          10.0.0.0                               0 1 i
*> 4.4.4.0/24       10.0.0.0                 0             0 1 i
*> 8.8.8.0/24       10.0.0.0                 0             0 1 i

If you run a debug you’ll see that R2 is not receiving and then rejecting prefixes; it simply never receives them. You can actually see this from R1’s perspective:

R1
R1#sh ip bgp neighbors 10.0.0.1 received prefix-filter 
Address family: IPv4 Unicast
ip prefix-list 10.0.0.1: 3 entries
   seq 5 permit 0.0.0.0/0
   seq 10 permit 4.4.4.0/24
   seq 15 permit 8.8.8.0/24
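What R1 does with the received filter can be sketched in Python: it evaluates each prefix against the ORF entries before building the update, so filtered prefixes never leave the box. This is a simplified model assuming exact-match entries only (no ge/le), which is all ROUTES_WANTED uses; the function names are hypothetical:

```python
import ipaddress

# R1's table: the default plus the nine /24s from its loopbacks.
R1_TABLE = ["0.0.0.0/0"] + [f"{n}.{n}.{n}.0/24" for n in range(1, 10)]

# ROUTES_WANTED as received by R1 via ORF: (seq, action, prefix).
ORF_ENTRIES = [(5, "permit", "0.0.0.0/0"), (10, "permit", "4.4.4.0/24"), (15, "permit", "8.8.8.0/24")]


def permitted(prefix, entries):
    """First matching entry in sequence order wins; implicit deny at the end."""
    net = ipaddress.ip_network(prefix)
    for _, action, entry in sorted(entries):
        if net == ipaddress.ip_network(entry):
            return action == "permit"
    return False


def advertise(table, entries):
    """Apply the ORF before sending, so rejected prefixes are never advertised."""
    return [prefix for prefix in table if permitted(prefix, entries)]


print(advertise(R1_TABLE, ORF_ENTRIES))  # ['0.0.0.0/0', '4.4.4.0/24', '8.8.8.0/24']
```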

Now want the 5.5.5.0/24 prefix as well? The customer only needs to update their prefix-list:

R2
R2(config)#ip prefix-list ROUTES_WANTED seq 20 permit 5.5.5.0/24

R2#clear ip bgp * soft in prefix-filter 

R2#sh ip bgp | begin Network            
   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0          10.0.0.0                               0 1 i
*> 4.4.4.0/24       10.0.0.0                 0             0 1 i
*> 5.5.5.0/24       10.0.0.0                 0             0 1 i
*> 8.8.8.0/24       10.0.0.0                 0             0 1 i

R1
R1#sh ip bgp neighbors 10.0.0.1 received prefix-filter 
Address family: IPv4 Unicast
ip prefix-list 10.0.0.1: 4 entries
   seq 5 permit 0.0.0.0/0
   seq 10 permit 4.4.4.0/24
   seq 15 permit 8.8.8.0/24
   seq 20 permit 5.5.5.0/24

You specify ORF filters and capability on a per address-family basis, so you can use the same technique to filter IPv6 unicast, IPv6 multicast, IPv4 multicast, etc.

Of course JunOS can do the exact same thing. Let’s replicate the topology above and do the same thing. Note that when you set this up between IOS and JunOS, you need to tell JunOS to use the Cisco format:

JunOS
darren> show configuration protocols bgp
group EXTERNAL {
    neighbor 10.0.10.1 {
        peer-as 100;
        outbound-route-filter {
            bgp-orf-cisco-mode;
            prefix-based {
                accept {
                    inet;
                }
            }
        }
    }
}

darren> show bgp neighbor orf 10.0.10.1 detail
Peer: 10.0.10.1+179   Type: External
  Group: EXTERNAL

  inet-unicast
    Filter updates recv:          4 Immediate:          1
    Filter: prefix-based            receive
            Updates recv:          4
      Received filter entries:
        seq 5 /0 permit minlen 0 maxlen 0
        seq 10 4.4.4.0/24 permit minlen 0 maxlen 0
        seq 15 8.8.8.0/24 permit minlen 0 maxlen 0
        seq 20 5.5.5.0/24 permit minlen 0 maxlen 0

darren> show route advertising-protocol bgp 10.0.10.1

inet.0: 15 destinations, 16 routes (15 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 4.4.4.0/24              Self                                    2 ?
* 5.5.5.0/24              Self                                    2 ?
* 8.8.8.0/24              Self                                    2 ?

This is a very powerful way to let one router tell another which routes to send it, instead of receiving everything and just dropping what it doesn’t want. Great for regular and MPLS-based BGP.