2013 Wrap up

2013 has been a stellar year for me on a number of levels. In January I sat my CCIE R&S for the second time and passed at last. In November I took the JNCIE-SP and passed on my first attempt.

I travelled to more places this year than ever before. I managed to see Greece, Belgium, the Netherlands, Hungary, Germany, and Denmark. Some of these places I’d already been to, but there were a few new ones on the list.

Blog-wise I’m a few hundred short of 180 000 views this year. I’ve almost reached half a million in total since I started. Not bad for what is essentially my own notes.

Post-wise my old simple posts are still pulling major numbers in. My top three posts are :

  1. HSRP Object Tracking
  2. Access-lists vs prefix-lists
  3. MPLS L3VPN – Route Distinguisher vs Route Target vs VPN label

 

In the last 3 months the following three posts have already been increasing rapidly:

  1. L2VPN on Junos using CCC/Martini/Kompella
  2. So how does the JNCIE-SP compare to the CCIE R&S?
  3. Moving routes between a VRF and the global (default) RIB – Part 1 – Cisco IOS

 

In total I have published 267 posts and I still have 85 drafts! In fact I created 5 drafts alone in the last two weeks. If only I had the time to finish them…

Work has been just as busy. I can’t go into too much detail, but rest assured I’m very busy on a day-to-day basis.

I have a number of plans for 2014. I intend to sit my CCIE SP lab and possibly go for the JNCIE-SEC too. I also want to work more on my Python skills, and who knows what else.

Hope your 2013 has been as good as mine and I wish all my readers the best for 2014!

Remote Triggered Black Hole Filtering and Flowspec

RTBH is a mature technology widely used to reduce the effects of a DDoS attack against one of your customers. While it works well, it’s a bit of a sledgehammer. Flowspec is a newer technology that gives you a lot more control over what is blocked, and as such it’s a lot more powerful.

I’ll be using the following diagram for this post:

P1 and P2 are edge routers peering with transit peers. R3 is a route-reflector which is peered to both P1 and P2. C1 is a customer attached to P3 originating their own address space (172.16.0.0/16)

RTBH

RTBH works on the concept of black-holing traffic towards an IP host/subnet. It does this by advertising an injected static route whose next-hop has been pre-defined to resolve to null0/discard.

As an example, let’s assume a host with the address 172.16.200.10 is under attack. R3, the RR is the route-injector, but it can be any of the internal iBGP routers. There is quite a bit of upfront config with RTBH, but most of this config only needs to be done once.

On all BGP routers in the core you need a route that will be discarded:

[email protected]> show configuration routing-options
static {
    route 192.0.2.1/32 discard;
}

On all routers I want routes learned with a certain community to have their next-hop pointing to the discard route:

[email protected]> show configuration policy-options
policy-statement BLACK-HOLE-FILTER {
    term 1 {
        from community BLACK_HOLE;
        then {
            next-hop 192.0.2.1;
        }
    }
}
community BLACK_HOLE members 65401:666;

I’m going to apply this as an inbound filter on my iBGP sessions:

[email protected]> show configuration protocols bgp group ISP1
import BLACK-HOLE-FILTER;

Basically we are saying: for any route learned via BGP with the above community, set the next-hop to the discard route. On the route injector we set up an export policy matching static routes with a tag of 666. Any matching route will have the black hole community added. As this will be a specific route we need to ensure it doesn’t leave the confines of our AS, so we also tag it no-export:

[email protected]_RR> show configuration policy-options
policy-statement RTBH {
    term BLACK-HOLE {
        from {
            protocol static;
            tag 666;
        }
        then {
            local-preference 5000;
            community add no-export;
            community add BLACK_HOLE;
            next-hop 192.0.2.1;
            accept;
        }
    }
}
community BLACK_HOLE members 65401:666;
community no-export members no-export;

The above policy is then applied outbound on the iBGP session on the route-injector:

[email protected]_RR> show configuration protocols bgp group ISP1
local-address 192.168.0.3;
export RTBH;

RTBH testing and verification

From a router out on the internet I can currently ping the affected host:

[email protected]> ping 172.16.200.10 interface lo0.0 rapid
PING 172.16.200.10 (172.16.200.10): 56 data bytes
!!!!!
--- 172.16.200.10 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 7.836/10.747/14.091/2.430 ms

I’ll now implement a black hole static on the route-injector:

set routing-options static route 172.16.200.10/32 next-hop 192.0.2.1 resolve tag 666

[edit]
[email protected]_RR# commit and-quit
commit complete
Exiting configuration mode

If we ping from the internet again:

[email protected]> ping 172.16.200.10 interface lo0.0 rapid
PING 172.16.200.10 (172.16.200.10): 56 data bytes
.....
--- 172.16.200.10 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss

All packets lost. We can ensure only this /32 is affected by pinging another host in the subnet:

[email protected]> ping 172.16.200.50 interface lo0.0 rapid
PING 172.16.200.50 (172.16.200.50): 56 data bytes
!!!!!
--- 172.16.200.50 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 5.582/7.116/10.018/1.658 ms

Looking at the edge routers we see the learned /32, and the next-hop of discard:

[email protected]> show route 172.16.200.10 extensive

inet.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
172.16.200.10/32 (1 entry, 1 announced)
TSI:
KRT in-kernel 172.16.200.10/32 -> {indirect(262143)}
        *BGP    Preference: 170/-5001
                Next hop type: Indirect
                Address: 0x97106d0
                Next-hop reference count: 3
                Source: 192.168.0.3
                Next hop type: Discard
                Protocol next hop: 192.0.2.1
                Indirect next hop: 94781d0 262143
                State: 
                Local AS: 65401 Peer AS: 65401
                Age: 2:15       Metric2: 0
                Task: BGP_65401.192.168.0.3+64669
                Announcement bits (2): 0-KRT 4-Resolve tree 1
                AS path: I
                Communities: 65401:666 no-export
                Accepted
                Localpref: 5000
                Router ID: 192.168.0.3
                Indirect next hops: 1
                        Protocol next hop: 192.0.2.1 Metric: 0
                        Indirect next hop: 94781d0 262143
                        Indirect path forwarding next hops: 0
                                Next hop type: Discard
                        192.0.2.1/32 Originating RIB: inet.0
                          Metric: 0                       Node path count: 1
                          Forwarding nexthops: 0
                                Next hop type: Discard

The /32 route has been learned through BGP from the route-injector. The correct communities are set. The next-hop goes to a route that is discard, and hence any packets going to this host are now discarded.
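The chain described above (community match, next-hop rewrite, recursive resolution to discard) can be sketched in a few lines of Python. This is only a toy model of the lookup, not how Junos implements it: the dictionary layout, the `towards-C1` label and the function names are all made up for illustration, while the prefixes, community and next-hop come from the configs above.

```python
import ipaddress

BLACK_HOLE = "65401:666"         # community from the policy above
DISCARD_NEXT_HOP = "192.0.2.1"   # the /32 that points at discard

def import_policy(route):
    """Toy version of BLACK-HOLE-FILTER: rewrite next hop on tagged routes."""
    if BLACK_HOLE in route["communities"]:
        return dict(route, next_hop=DISCARD_NEXT_HOP)
    return route

def resolve(table, address):
    """Longest-prefix match, following IP next hops until a final action."""
    addr = ipaddress.ip_address(address)
    matches = [r for r in table if addr in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return "no route"
    best = max(matches, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)
    try:
        ipaddress.ip_address(best["next_hop"])
    except ValueError:
        # Not an IP address, so treat it as the final forwarding action.
        return best["next_hop"]
    return resolve(table, best["next_hop"])

# Hypothetical view of an edge router's table after the /32 is injected.
table = [
    {"prefix": "192.0.2.1/32", "next_hop": "discard", "communities": []},
    {"prefix": "172.16.0.0/16", "next_hop": "towards-C1", "communities": []},
    import_policy({
        "prefix": "172.16.200.10/32",
        "next_hop": "192.168.0.3",
        "communities": [BLACK_HOLE, "no-export"],
    }),
]

print(resolve(table, "172.16.200.10"))  # the attacked host
print(resolve(table, "172.16.200.50"))  # a neighbour in the same /16
```

The attacked host resolves to discard through the rewritten next-hop, while its neighbour still follows the customer /16, which matches the ping results above.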

Adding and removing hosts is as simple as adding or removing routes on the route-injector.
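If you do this often, the static route lines are easy to generate. Below is a minimal Python sketch, assuming the same discard next-hop (192.0.2.1) and tag (666) used in this post. The function name is my own invention, and the output is just text to paste into the route-injector, nothing is pushed to a router.

```python
import ipaddress

DISCARD_NEXT_HOP = "192.0.2.1"  # discard route used in this post
RTBH_TAG = 666                  # tag matched by the RTBH export policy

def blackhole_command(host, remove=False):
    """Return the Junos config line that adds or removes an RTBH /32."""
    addr = ipaddress.IPv4Address(host)  # raises ValueError on bad input
    route = f"{addr}/32"
    if remove:
        return f"delete routing-options static route {route}"
    return (
        f"set routing-options static route {route} "
        f"next-hop {DISCARD_NEXT_HOP} resolve tag {RTBH_TAG}"
    )

print(blackhole_command("172.16.200.10"))
print(blackhole_command("172.16.200.10", remove=True))
```

Validating the address first is the main point of the helper: a typo in a black hole route is not something you want to find out about after the commit.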

The above works extremely well, but until the attack is finished and routes removed, that IP address is unroutable over the internet. Any traffic at all going towards it will be black-holed.

Flowspec

There is a more subtle way of doing the above. RFC 5575 defines a newer filtering mechanism called flowspec. Oddly, half the RFC authors are Cisco employees, yet as of today I can only find support for flowspec on Junos.

Essentially flowspec allows routers to advertise firewall filters to your edge BGP devices directly through BGP. Because this is a filter, it allows you to use all the actions of a regular firewall filter. Do you want to police DNS traffic only in a DNS amplification attack? Simple. Flowspec gives you the flexibility to do so.

The first part of enabling flowspec is to configure BGP to carry the NLRI. This will be done on all your internal routers:

[email protected]> configure
Entering configuration mode

[edit]
[email protected]# set protocols bgp group ISP1 family inet flow

[edit]
[email protected]# commit and-quit
commit complete
Exiting configuration mode

Now let’s suppose 172.16.200.10 is under some kind of ICMP attack. I want to block all ICMP traffic to this host from the edge routers, but still allow other traffic through to the host:

[email protected]_RR> show configuration routing-options flow
route BLOCK-ICMP-172.16.200.10 {
    match {
        destination 172.16.200.10/32;
        protocol icmp;
    }
    then discard;
}
term-order standard;

This router will now advertise this filter to all other iBGP peers.
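To make the difference from RTBH concrete, here is a small Python sketch of what the edge routers effectively do with this flow route: match on destination and protocol, and fall through to normal forwarding otherwise. The `FlowRule` class and `evaluate` function are hypothetical names for illustration only; a real flowspec rule can match on many more fields.

```python
import ipaddress
from dataclasses import dataclass

ICMP = 1  # IP protocol numbers
TCP = 6

@dataclass(frozen=True)
class FlowRule:
    name: str
    destination: str  # prefix, e.g. "172.16.200.10/32"
    protocol: int
    action: str

def evaluate(rules, dst, protocol):
    """Return the action of the first matching rule, else 'accept'."""
    for rule in rules:
        in_prefix = ipaddress.ip_address(dst) in ipaddress.ip_network(rule.destination)
        if in_prefix and protocol == rule.protocol:
            return rule.action
    return "accept"

rules = [FlowRule("BLOCK-ICMP-172.16.200.10", "172.16.200.10/32", ICMP, "discard")]

print(evaluate(rules, "172.16.200.10", ICMP))  # ping to the victim
print(evaluate(rules, "172.16.200.10", TCP))   # FTP to the victim
print(evaluate(rules, "172.16.200.50", ICMP))  # ping to a neighbour
```

Only the first case is dropped. With RTBH the second case would have been dropped as well.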

Flowspec testing and verification

We can test this from the internet by trying to ping to this address, and then trying to FTP. Ping should fail, while FTP should be let through:

[email protected]> ping 172.16.200.10 source 192.168.50.1 rapid
PING 172.16.200.10 (172.16.200.10): 56 data bytes
.....
--- 172.16.200.10 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
[email protected]> ftp 172.16.200.10 source 192.168.50.1
Connected to 172.16.200.10.
220 C1 FTP server (Version 6.00LS) ready.
Name (172.16.200.10:root): darreno
331 Password required for darreno.
Password:
230 User darreno logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>

This works exactly as expected.

You can verify the flow NLRI coming in and applied as a filter on the edge routers:

[email protected]> show route table inetflow.0

inetflow.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.200.10,*,proto=1/term:1
                   *[BGP/170] 00:12:57, localpref 100, from 192.168.0.3
                      AS path: I, validation-state: unverified
                      Fictitious

[email protected]> show firewall

Filter: __default_bpdu_filter__

Filter: __flowspec_default_inet__
Counters:
Name                                                Bytes              Packets
172.16.200.10,*,proto=1                              2352                   28

172.16.200.10,* – meaning destination address 172.16.200.10/32 with any source – proto=1 is ICMP

Conclusion

  • Flowspec gives you a lot more options when it comes to filtering out DDoS attacks. Instead of isolating an IP you are able to filter specific traffic only. These firewall filters are then advertised via BGP to all your iBGP speakers.
  • As this is a firewall filter, you don’t have to specify a discard action. You can just as easily set a policing action.
  • Currently Junos supports flowspec on both the inet and inet-vpn families, so there is no IPv6 support yet.
  • Most other vendors still don’t have working implementations.

VC mode 4/5 – Brocade VLL/VPLS

When doing L2VPN, there are two different ways in which Ethernet frames are encapsulated over the MPLS link. I’ll quote RFC 4448 directly to spare myself from explaining it all:

4.1. Ethernet Tagged Mode

The Ethernet frame will be encapsulated according to the procedures
defined later in this document for tagged mode. It should be noted
that if the VLAN identifier is modified by the egress PE, the
Ethernet spanning tree protocol might fail to work properly. If this
issue is of significance, the VLAN identifier MUST be selected in
such a way that it matches on the attachment circuits at both ends of
the PW.

If the PE detects a failure on the Ethernet physical port, or the
port is administratively disabled, it MUST send a PW status
notification message for all PWs associated with the port.

This mode uses service-delimiting tags to map input Ethernet frames
to respective PWs and corresponds to PW type 0x0004 “Ethernet Tagged
Mode” [IANA].

4.2. Ethernet Raw Mode

The Ethernet frame will be encapsulated according to the procedures
defined later in this document for raw mode. If the PE detects a
failure on the Ethernet input port, or the port is administratively
disabled, the PE MUST send an appropriate PW status notification
message to the corresponding remote PE.

In this mode, all Ethernet frames received on the attachment circuit
of PE1 will be transmitted to PE2 on a single PW. This service
corresponds to PW type 0x0005 “Ethernet” [IANA].

4.4.1. Raw Mode vs. Tagged Mode

When the PE receives an Ethernet frame, and the frame has a VLAN tag,
we can distinguish two cases:

1. The tag is service-delimiting. This means that the tag was
placed on the frame by some piece of service provider-operated
equipment, and the tag is used by the service provider to
distinguish the traffic. For example, LANs from different
customers might be attached to the same service provider
switch, which applies VLAN tags to distinguish one customer’s
traffic from another’s, and then forwards the frames to the PE.

2. The tag is not service-delimiting. This means that the tag was
placed in the frame by a piece of customer equipment, and is
not meaningful to the PE.

Whether or not the tag is service-delimiting is determined by local
configuration on the PE.

If an Ethernet PW is operating in raw mode, service-delimiting tags
are NEVER sent over the PW. If a service-delimiting tag is present
when the frame is received from the attachment circuit by the PE, it
MUST be stripped (by the NSP) from the frame before the frame is sent
to the PW.

If an Ethernet PW is operating in tagged mode, every frame sent on
the PW MUST have a service-delimiting VLAN tag. If the frame as
received by the PE from the attachment circuit does not have a
service-delimiting VLAN tag, the PE must prepend the frame with a
dummy VLAN tag before sending the frame on the PW. This is the
default operating mode. This is the only REQUIRED mode.

In both modes, non-service-delimiting tags are passed transparently
across the PW as part of the payload. It should be noted that a
single Ethernet packet may contain more than one tag. At most, one
of these tags may be service-delimiting. In any case, the NSP
function may only inspect the outermost tag for the purpose of
adapting the Ethernet frame to the pseudowire.

In both modes, the service-delimiting tag values have only local
significance, i.e., are meaningful only at a particular PE-CE
interface. When tagged mode is used, the PE that receives a frame
from the PW may rewrite the tag value, or may strip the tag entirely,
or may leave the tag unchanged, depending on its configuration. When
raw mode is used, the PE that receives a frame may or may not need to
add a service-delimiting tag before transmitting the frame on the
attachment circuit; however, it MUST not rewrite or remove any tags
that are already present.

Quite a mouthful. In my testing of both modes on a Brocade Netiron XMR I’m seeing slightly different results. Let’s take a look. For this post I’ll use the following topology:

I have three sites and on those sites the customer has a router. We also have a switch on site which will be taking the customer’s tagged frames and impose another vlan tag on top for QinQ. That QinQ tag goes over the attachment circuit to the PE router.

On the PE routers we’ll be putting it all in the same VPLS. I have mirrored a core port and will show exactly what the packet format is when traffic goes between these three routers. One thing to note is that vc-modes have to match. Your VPLS peers will not come up if they don’t. I am using LDP-signalled VPLS over RSVP for this example.

VPLS VC-MODE 4

I’ll be showing the config of PE3. The others are near-identical. Only the outer vlan tag is set to the local service-delimiting vlan id.

router mpls
 vpls VC_TEST 9000
  vc-mode tagged
  vpls-peer 10.11.224.60 10.11.224.61
  vlan 500
   tagged ethe 1/13

We can verify the session is both up and operating at the right mode:

[email protected]# sh mpls vpls id 9000
VPLS VC_TEST, Id 9000, Max mac entries: 8192
 Total vlans: 1, Tagged ports: 1 (1 Up), Untagged ports 0 (0 Up)
 IFL-ID: n/a
  Vlan 500
   L2 Protocol: NONE
   Tagged: ethe 1/13
 VC-Mode: Tagged
 Total VPLS peers: 2 (2 Operational)
 Peer address: 10.11.224.60, State: Operational, Uptime: 3 hr 28 min
  Tnnl in use: tnl0(1097)[RSVP]    Peer Index:0
  Local VC lbl: 983040, Remote VC lbl: 983060
  Local VC MTU: 8974, Remote VC MTU: 8974
  Local VC-Type: Ethernet Tagged(0x04), Remote VC-Type: Ethernet Tagged(0x04)
 Peer address: 10.11.224.61, State: Operational, Uptime: 3 hr 28 min
  Tnnl in use: tnl1(1104)[RSVP]    Peer Index:1
  Local VC lbl: 983041, Remote VC lbl: 983100
  Local VC MTU: 8974, Remote VC MTU: 8974
  Local VC-Type: Ethernet Tagged(0x04), Remote VC-Type: Ethernet Tagged(0x04)
 CPU-Protection: OFF
 Local Switching: Enabled
 Extended Counter: ON
 Multicast Snooping: Disabled

Here is a quick overview of how the QinQ switch SW3 is configured. Again, the other two switches are the same except for their vlan-id:

interface GigabitEthernet1/0/22
 switchport access vlan 500
 switchport trunk encapsulation dot1q
 switchport mode dot1q-tunnel
!
interface GigabitEthernet1/0/23
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 500
 switchport mode trunk

CPE3 has a subinterface configured sending tagged traffic:

interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 10.10.10.3 255.255.255.0
 ip ospf 1 area 0

Let’s see if R3 can ping R2’s subinterface:

R3#ping 10.10.10.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

Yes it can. Let’s refer back to the RFC and see if we can see what packet format should be going over the MPLS core, before we look at a packet itself.

If an Ethernet PW is operating in tagged mode, every frame sent on
the PW MUST have a service-delimiting VLAN tag. If the frame as
received by the PE from the attachment circuit does not have a
service-delimiting VLAN tag, the PE must prepend the frame with a
dummy VLAN tag before sending the frame on the PW. This is the
default operating mode. This is the only REQUIRED mode.

In both modes, the service-delimiting tag values have only local
significance, i.e., are meaningful only at a particular PE-CE
interface. When tagged mode is used, the PE that receives a frame
from the PW may rewrite the tag value, or may strip the tag entirely,
or may leave the tag unchanged, depending on its configuration.

On PE3 above, vlan 500 is the service-delimiting vlan tag. On PE2 it’s vlan 700. The second quote can very easily be understood differently by different vendors. The RFC itself is saying it could be one of three things!

Either way let’s ping from R3 to R2 and capture that in wireshark and see what a single packet looks like:

Both the service and customer vlan are sent across the link. Communication is still working, so PE2 swaps the service-delimiting tag with its own service tag (vlan 600) outbound. The reverse is true when a packet comes in from R2 back to R3:

There is an important note here. The Brocade Netiron will not allow me to specify ‘any’ vlan coming into a port to be put into a VPLS, i.e. if I want to carry cvlans across the core I need to push another tag on top before the traffic gets into a VPLS instance.

VPLS VC-MODE 5

I’ve changed all my PE routers to use vc mode 5 for this VPLS instance:

[email protected]#sh mpls vpls id 9000
VPLS VC_TEST, Id 9000, Max mac entries: 8192
 Total vlans: 1, Tagged ports: 1 (1 Up), Untagged ports 0 (0 Up)
 IFL-ID: n/a
  Vlan 500
   L2 Protocol: NONE
   Tagged: ethe 1/13
 VC-Mode: Raw
 Total VPLS peers: 2 (2 Operational)
 Peer address: 10.11.224.60, State: Operational, Uptime: 1 min
  Tnnl in use: tnl0(1097)[RSVP]    Peer Index:0
  Local VC lbl: 983040, Remote VC lbl: 983062
  Local VC MTU: 8974, Remote VC MTU: 8974
  Local VC-Type: Ethernet(0x05), Remote VC-Type: Ethernet(0x05)
 Peer address: 10.11.224.61, State: Operational, Uptime: 1 min
  Tnnl in use: tnl1(1104)[RSVP]    Peer Index:1
  Local VC lbl: 983041, Remote VC lbl: 983106
  Local VC MTU: 8974, Remote VC MTU: 8974
  Local VC-Type: Ethernet(0x05), Remote VC-Type: Ethernet(0x05)
 CPU-Protection: OFF
 Local Switching: Enabled
 Extended Counter: ON
 Multicast Snooping: Disabled

I’ll ping R2 from R3 again and see what we see this time:

This time the service-delimiting tag has been stripped. Only the inner vlan is being sent across the MPLS LSP.

VLL VC-MODE 4

VLLs are different in that they are point-to-point. The Netiron will allow me to tag any vlan without having to push another vlan tag on top first.

I’ve changed the switches to not do QinQ anymore. They are merely passing the tagged vlans from the CPE directly to the core interfaces.

I’ve set up vc-mode 4 and pinged across. What do we expect? As there is no service-delimiting vlan tag, the RFC states the device should add a dummy vlan tag. Let’s see if it does:

The Brocade has added vlan 1 on top of vlan 10. This is what we expect.

VLL VC-MODE 5

I’ve now changed it to vc-mode 5. This should mean that no dummy vlan is sent, but we should still see our vlan 10:

Which is exactly what we see.
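The four results above can be summed up in a small Python sketch of the ingress PE’s tag handling. This models only what I observed on the Brocade in these tests, so treat it as a summary of the captures rather than a statement of what every vendor does. The function name is made up, and the dummy vlan value of 1 is simply what the Brocade happened to use.

```python
DUMMY_VLAN = 1  # value the Brocade inserted in the VLL vc-mode 4 test

def tags_on_pseudowire(tags, vc_mode, outer_is_service_delimiting):
    """Return the vlan tags sent over the PW, outermost first.

    tags: vlan ids as received from the attachment circuit, outermost first.
    """
    tags = list(tags)
    if vc_mode == 5:  # raw mode: service-delimiting tag is never sent
        return tags[1:] if outer_is_service_delimiting else tags
    if vc_mode == 4:  # tagged mode: a service-delimiting tag must be present
        return tags if outer_is_service_delimiting else [DUMMY_VLAN] + tags
    raise ValueError("vc_mode must be 4 or 5")

print(tags_on_pseudowire([500, 10], 4, True))   # VPLS vc-mode 4 with QinQ
print(tags_on_pseudowire([500, 10], 5, True))   # VPLS vc-mode 5 with QinQ
print(tags_on_pseudowire([10], 4, False))       # VLL vc-mode 4, no QinQ
print(tags_on_pseudowire([10], 5, False))       # VLL vc-mode 5, no QinQ
```

In all four cases the customer’s vlan 10 makes it across the core. The mode only changes what sits on top of it.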

Why is this important?

Knowing how this works matters mainly for interop. I’ll be doing the same tests above on both Cisco and Juniper kit to see the differences. The main thing to remember is that a pseudowire between a VC mode 4 peer and a VC mode 5 peer will NOT come up.

Annoyingly, on the Brocade the VPLS defaults to vc-type 5 while the VLL defaults to vc-type 4.

Even if they do match, we can see from the RFC above that it’s not at all easy to determine exactly what the router is supposed to do. This can cause you to run into all manner of interop issues.

I’ve read in a few places that vc-type 5 simply means untagged frames. This is not true. Both vc-mode 4 and 5 can send cvlan tags across the MPLS cloud.