Segment Routing on IOS-XR

Cisco has released some support for segment-routing on IOS-XR 5.2.0 so what better time to lab it up. I’ve got four IOS-XRv boxes running 5.2.0:

RP/0/0/CPU0:XR1#sh ver | include XR
Cisco IOS XR Software, Version 5.2.0[Default]

Currently IS-IS is the only protocol with segment routing support in XR. There are drafts to get this working in both OSPFv2 and OSPFv3.

Segment Routing?

Segment routing is a huge topic. In the long run it’ll make it very easy for an SDN controller to force packets through the network in any way it wants. The draft says that it can use the existing MPLS data plane (aka labels) or the IPv6 data plane (extension headers). Right now support is for the MPLS data plane only. The nice thing here is that all devices that can currently switch based on labels should really only need a software upgrade to run segment routing in its current form.

Currently, in order to populate the MPLS data plane with labels you need an MPLS control plane protocol such as LDP or RSVP to distribute those labels. With segment routing, those labels are distributed with the IGP. Your core is now simplified as it’s only running the IGP, so it no longer needs to keep LDP or RSVP state at all.

Traffic Engineering

Take the following simple diagram into consideration:
[Figure: SR 1]
I’d like to use both paths to get from PE1 to PE2 for different traffic flows. This is possible with RSVP by creating multiple RSVP-TE tunnels:
[Figure: SR 2]
The above works perfectly fine, but those P routers need to keep state for each and every RSVP tunnel going through them. In segment routing, there is a concept of a node segment and an adjacency segment. There are also other segment types but I won’t go into those yet. With the MPLS data plane, each segment has a label. I can therefore force traffic to go over a certain segment by adding the segment label to the stack.
[Figure: SR 3]
In the above diagram, if I want PE1 to send to PE2 via the shortest path, it simply imposes the node segment of PE2 onto the packet and sends it on. Every router in the core knows what PE2’s node segment is and as such the packet is pushed through using only that single label. Note that standard MPLS PHP behaviour is still used:
[Figure: SR 4]

If I wanted to force traffic to PE2 to go over the P1-P2 link and then the P2-P3 link, I would stack the labels to ensure it went that way. It’s the ingress PE that decides this:
[Figure: SR 5]
PE1 has stacked the labels in such a way that it forces the packet to go over particular segments. The core does not need to hold any per-LSP state. Each router simply installs the labels that the IGP has already distributed.
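
Here is a rough Python sketch of the stacking idea. The label values and the `NODE_LABEL` / `ADJ_LABEL` tables are hypothetical and not taken from the lab; a real ingress PE would learn them from the IGP.

```python
# Hypothetical label values, chosen only for illustration.
NODE_LABEL = {"P1": 101, "PE2": 104}
ADJ_LABEL = {("P1", "P2"): 2012, ("P2", "P3"): 2023}


def build_stack(segments):
    """Return the MPLS label stack (top of stack first) for a list of segments.

    A node segment is given as a router name, an adjacency segment as a
    (from_router, to_router) tuple.
    """
    stack = []
    for segment in segments:
        if isinstance(segment, tuple):
            stack.append(ADJ_LABEL[segment])
        else:
            stack.append(NODE_LABEL[segment])
    return stack


# Shortest path: a single node segment for PE2.
shortest_path = build_stack(["PE2"])

# Explicit path: reach P1, cross P1-P2, cross P2-P3, then on to PE2.
explicit_path = build_stack(["P1", ("P1", "P2"), ("P2", "P3"), "PE2"])
```

The point is that only the ingress PE builds the list. The core routers just act on whatever label is on top.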

Configuration

Segment Routing in 5.2.0 has been enabled, but at a preliminary level only. IS-IS is the only IGP supported, and only the MPLS data plane is supported. I can’t seem to find a way to advertise adjacency segments yet, only node segments. All of the above is fine for an MPLS L3VPN lab. I’ll be using the following topology:
[Figure: SR 6]
The CEs are running OSPFv2 and advertising their loopbacks into OSPF:

interface Loopback0
 ip address 100.100.100.100 255.255.255.255
 ip ospf 1 area 0
!
interface GigabitEthernet0/0.11
 encapsulation dot1Q 11
 ip address 10.0.11.1 255.255.255.0
 ip ospf 1 area 0

The PE config is pretty standard:

vrf CUS1
 address-family ipv4 unicast
  import route-target
   100:1
  !
  export route-target
   100:1
  !
 !
!
router ospf CUS1
 vrf CUS1
  redistribute bgp 100
  area 0
   interface GigabitEthernet0/0/0/0.11
   !
  !
 !
!
router bgp 100
 address-family vpnv4 unicast
 !
 neighbor 4.4.4.4
  remote-as 100
  update-source Loopback0
  address-family vpnv4 unicast
  !
 !
 vrf CUS1
  rd 100:4
  address-family ipv4 unicast
   redistribute ospf CUS1
  !
 !
!

XR1 has a VPNv4 session with XR4 and is advertising the prefixes over it. Segment routing is now enabled under the core IGP, IS-IS:

router isis 1
 is-type level-2-only
 net 49.0001.0000.0000.0001.00
 address-family ipv4 unicast
  metric-style wide
  segment-routing mpls
 !
 interface Loopback0
  address-family ipv4 unicast
   prefix-sid index 1000
  !
 !
 interface GigabitEthernet0/0/0/1
  address-family ipv4 unicast
  !
 !
 interface GigabitEthernet0/0/0/2
  address-family ipv4 unicast
  !
 !
!

For now you can only configure the node SID under the loopback interface. Once this is all done, I should have a labelled route to XR4’s loopback, without LDP or RSVP:

RP/0/0/CPU0:XR1#show cef  4.4.4.4 | include labels
Sun Aug 10 19:48:51.587 UTC
     local label 904000      labels imposed {904000}
     local label 904000      labels imposed {904000}

RP/0/0/CPU0:XR1#show mpls int gigabitEthernet 0/0/0/1 detail
Sun Aug 10 19:49:25.145 UTC
Interface GigabitEthernet0/0/0/1:
        LDP labelling not enabled
        LSP labelling not enabled
        MPLS ISIS enabled
        MPLS enabled

There are two entries as XR1 has two equal-cost paths to XR4. A quick traceroute will show the same label:

RP/0/0/CPU0:XR1#traceroute 4.4.4.4
Sun Aug 10 19:50:16.191 UTC

Type escape sequence to abort.
Tracing the route to 4.4.4.4

 1  10.0.12.2 [MPLS: Label 904000 Exp 0] 9 msec  0 msec  0 msec
 2  10.0.24.4 0 msec  *  0 msec
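
With the MPLS data plane, a prefix SID configured as an index becomes a label by adding the index to the base of the segment routing global block (SRGB). A minimal sketch of that arithmetic follows. The SRGB base of 900000 and an index of 4000 on XR4 are assumptions inferred from label 904000; neither appears in the configuration shown here.

```python
def sr_label(srgb_base: int, sid_index: int) -> int:
    """Label for a prefix SID index: SRGB base plus the index."""
    if sid_index < 0:
        raise ValueError("SID index must be non-negative")
    return srgb_base + sid_index


ASSUMED_SRGB_BASE = 900000  # assumption, inferred from label 904000
ASSUMED_XR4_INDEX = 4000    # assumption, XR4's config is not shown

xr4_label = sr_label(ASSUMED_SRGB_BASE, ASSUMED_XR4_INDEX)

# XR1 is configured with "prefix-sid index 1000", so under the same
# assumed base the other routers would use this label to reach XR1.
xr1_label = sr_label(ASSUMED_SRGB_BASE, 1000)
```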

Note that L3VPN still uses an inner label, the service/VPN label. The outer transport label has been replaced with the segment routing label. A traceroute from CE1 to CE2 will confirm this:

CE1#traceroute 200.200.200.200 so lo0 numeric
Type escape sequence to abort.
Tracing the route to 200.200.200.200
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.11.10 1 msec 1 msec 1 msec
  2 10.0.12.2 [MPLS: Labels 904000/16001 Exp 0] 4 msec 3 msec 3 msec
  3 10.0.24.4 [MPLS: Label 16001 Exp 0] 3 msec 7 msec 3 msec
  4 10.0.42.2 4 msec *  4 msec
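
The traceroute can be modelled as a two-label stack where the penultimate hop pops the transport label. This is only a sketch: the label values come from the output above, while the `labels_seen` helper is made up for illustration and assumes at least one P router between the PEs.

```python
TRANSPORT_LABEL = 904000  # segment routing label for XR4, from the traceroute
VPN_LABEL = 16001         # service/VPN label, from the traceroute


def labels_seen(core_hops):
    """Return the label stack each hop receives, assuming PHP.

    core_hops lists the routers after the ingress PE, ending at the
    egress PE. The stack is top-of-stack first.
    """
    stack = [TRANSPORT_LABEL, VPN_LABEL]
    seen = {}
    for position, hop in enumerate(core_hops):
        seen[hop] = list(stack)
        is_penultimate = position == len(core_hops) - 2
        if is_penultimate:
            stack.pop(0)  # penultimate hop pops the transport label
    return seen


seen = labels_seen(["10.0.12.2", "10.0.24.4"])
```

The egress PE only ever sees the VPN label, which is all it needs to pick the right VRF.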

Conclusions

  • Basic segment routing is incredibly easy to enable
  • I don’t see ISPs changing from RSVP-TE to SR anytime soon, but I think it will happen eventually
  • SDN is a great use case for SR, as the controller can inform PEs which segment labels to stack onto a packet as it ingresses the router
  • Perhaps even the host itself could send a packet with an SR stack imposed. Maybe that host has learnt this stack from the SDN controller? Time will tell

OSPF as the PE-CE routing protocols deep dive – Part 3 of 3 – Loop Prevention

When customer sites are single-homed, there is no possibility of a loop forming, unless of course your customer decides to set up a bunch of GRE tunnels and run OSPF over them, but I digress. If a site is multi-homed, or two sites have a back-door link between them, it’s essential that routes redistributed from BGP into OSPF do not go back into BGP.

Let’s create a slightly different diagram for this one. R3 is now also a PE router:
[Figure: loop ospf]

The loop prevention used ultimately depends on whether a prefix comes in as internal or external. If a sham-link is configured and all OSPF routes are intra-area, no loop prevention is needed. Standard SPF is run and everything is fine. This is because everything is seen in area 0, and SPF can run with full knowledge of the entire area.

As soon as type3s and type5s are used, OSPF becomes a little more distance-vector-like. ABRs/ASBRs originate new LSAs and other OSPF routers believe what is told to them. This makes it possible for loops to appear when mutual redistribution is occurring.

The down bit

Let’s go back to RFC 4577, specifically section 4.2.5.1:

When a type 3 LSA is sent from a PE router to a CE router, the DN bit [OSPF-DN] in the LSA Options field MUST be set. This is used to ensure that if any CE router sends this type 3 LSA to a PE router, the PE router will not redistribute it further.

When a PE router needs to distribute to a CE router a route that comes from a site outside the latter’s OSPF domain, the PE router presents itself as an ASBR (Autonomous System Border Router), and distributes the route in a type 5 LSA. The DN bit [OSPF-DN] MUST be set in these LSAs to ensure that they will be ignored by any other PE routers that receive them.

There are deployed implementations that do not set the DN bit, but instead use OSPF route tagging to ensure that a type 5 LSA generated by a PE router will be ignored by any other PE router that may receive it. A special OSPF route tag, which we will call the VPN Route Tag (see Section 4.2.5.2), is used for this purpose. To ensure backward compatibility, all implementations adhering to this specification MUST by default support the VPN Route Tag procedures specified in Sections 4.2.5.2, 4.2.8.1, and 4.2.8.2. When it is no longer necessary to use the VPN Route Tag in a particular deployment, its use (both sending and receiving) may be disabled by configuration.

Essentially, if an LSA arrives at a PE with the down bit set, that LSA will never be redistributed into BGP. This prevents a route that one PE injected into OSPF from leaking back into BGP via another PE.
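
As a rough sketch, the check a PE makes boils down to testing one bit of the LSA Options byte. The DN bit is the high-order bit (0x80) of that field. The function names below are made up for illustration, and the example values are not decoded from any capture.

```python
DN_BIT = 0x80  # DN ("down") bit, high-order bit of the LSA Options field
DC_BIT = 0x20  # demand circuits bit, printed as "DC" by IOS


def has_dn_bit(options: int) -> bool:
    """True if the DN bit is set in an LSA Options byte."""
    return bool(options & DN_BIT)


def pe_may_redistribute(options: int) -> bool:
    """A PE must not use or redistribute an LSA that carries the DN bit."""
    return not has_dn_bit(options)


# Example values only: one LSA originated by a PE, one by an ordinary router.
from_pe = DN_BIT | DC_BIT
from_ce = DC_BIT
```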

Down Bit – IOS

R7 is advertising its loopback address. No sham-links are used and so R4 will originate a type3 LSA to R6:

R6#show ip ospf database summary 7.7.7.7  adv-router 4.4.4.4

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Summary Net Link States (Area 0)

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 441
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 7.7.7.7 (summary Network Number)
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000003
  Checksum: 0x5636
  Length: 28
  Network Mask: /32
        MTID: 0         Metric: 2

The Options field states ‘Downward’. This LSA is flooded R6 -> R5 -> R3. R3, another PE, will have the LSA (all databases need to match, remember) but it will not use it. The routing bit will not be set, and R3 will not redistribute it into BGP either:

R3#  show ip ospf database summary 7.7.7.7  adv-router 4.4.4.4

            OSPF Router with ID (10.0.35.3) (Process ID 1)

                Summary Net Link States (Area 0)

  LS age: 597
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 7.7.7.7 (summary Network Number)
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000003
  Checksum: 0x5636
  Length: 28
  Network Mask: /32
        MTID: 0         Metric: 2

The same happens vice-versa. Any LSA originated by R3 to R5, will be received but not used by R4.
[Figure: loop ospf2]

Down Bit – IOS-XR

No change in IOS-XR behaviour. You need to be sure your domain-ids match to get a type3 between IOS and IOS-XR:

R6#sh ip ospf database summary 7.7.7.7 adv-router 4.4.4.4

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Summary Net Link States (Area 0)

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 20
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 7.7.7.7 (summary Network Number)
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0x5A34
  Length: 28
  Network Mask: /32
        MTID: 0         Metric: 2

Down bit set on the type3.

Route tags – IOS

Let’s go back to the RFC to see what this is all about. Section 4.2.5.2

If a particular VRF in a PE is associated with an instance of OSPF, then by default it MUST be configured with a special OSPF route tag value, which we call the VPN Route Tag. By default, this route tag MUST be included in the Type 5 LSAs that the PE originates (as the result of receiving a BGP-distributed VPN-IPv4 route, see Section 4.2.8) and sends to any of the attached CEs.

The configuration and inclusion of the VPN Route Tag is required for backward compatibility with deployed implementations that do not set the DN bit in type 5 LSAs. The inclusion of the VPN Route Tag may be disabled by configuration if it has been determined that it is no longer needed for backward compatibility.

The value of the VPN Route Tag is arbitrary but must be distinct from any OSPF Route Tag being used within the OSPF domain. Its value MUST therefore be configurable. If the Autonomous System number of the VPN backbone is two bytes long, the default value SHOULD be an automatically computed tag based on that Autonomous System number

If the Autonomous System number is four bytes long, then a Route Tag value MUST be configured, and it MUST be distinct from any Route Tag used within the VPN itself.

If a PE router needs to use OSPF to distribute to a CE router a route that comes from a site outside the CE router’s OSPF domain, the PE router SHOULD present itself to the CE router as an Autonomous System Border Router (ASBR) and SHOULD report such routes as AS-external routes. That is, these PE routers originate Type 5 LSAs reporting the extra-domain routes as AS-external routes. Each such Type 5 LSA MUST contain an OSPF route tag whose value is that of the VPN Route Tag. This tag identifies the route as having come from a PE router. The VPN Route Tag MUST be used to ensure that a Type 5 LSA originated by a PE router is not redistributed through the OSPF area to another PE router.

Note that it says the PE should set a route tag when the implementation doesn’t support setting the down bit in type5 LSAs. Also note that the previous RFC quote allows for implementations that set the down bit in type5s. At this point I’ve stopped advertising R7’s loopback directly into OSPF and simply redistributed the loopback. This ensures that the LSA is external.

Usually when an ASBR originates a type5, that type5 remains unchanged in the domain, i.e. the originating router stays the same. However, according to the quote above, the PE needs to originate a new type5 to the attached CE. This is what we see on R6:

R6#show ip ospf database external 7.7.7.7  adv-router 4.4.4.4

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 38
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 7.7.7.7 (External Network Number )
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0x77C7
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        MTID: 0
        Metric: 20
        Forward Address: 0.0.0.0
        External Route Tag: 3489661028

Notice no down bit. Also note the originator of this type5 is R4 itself. Finally, the route has an external route tag of 3489661028.

Much like the down bit, if a PE router receives an external LSA with a domain tag that matches its own, that LSA will not be used or redistributed.
[Figure: loop ospf31]

R3#show ip ospf 1 database external 7.7.7.7 adv-router 4.4.4.4

            OSPF Router with ID (10.0.35.3) (Process ID 1)

                Type-5 AS External Link States

  LS age: 744
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 7.7.7.7 (External Network Number )
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0x77C7
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        MTID: 0
        Metric: 20
        Forward Address: 0.0.0.0
        External Route Tag: 3489661028

No routing bit set, no redistribution happening.
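
The tag value 3489661028 is not random. For a two-byte AS number, RFC 4577 section 4.2.5.2 builds the default tag from the bits 1101, twelve zero bits, and then the 16-bit AS number. A small sketch of that calculation, using AS 100 as in this lab (the function names are my own):

```python
def default_vpn_route_tag(asn: int) -> int:
    """Automatic VPN Route Tag for a two-byte AS number (RFC 4577, 4.2.5.2).

    Layout: bits 1101, then a 12-bit arbitrary tag of zero, then the AS.
    """
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("automatic tag is only defined for two-byte AS numbers")
    return 0xD0000000 | asn


def pe_ignores_external(lsa_tag: int, local_tag: int) -> bool:
    """A PE ignores a type5 LSA whose tag matches its own VPN Route Tag."""
    return lsa_tag == local_tag


tag_as_100 = default_vpn_route_tag(100)
```

So R3, also in AS 100, computes the same tag as R4 and ignores the LSA.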

Route tags – IOS-XR

R6#sh ip ospf database external 7.7.7.7 adv-router 4.4.4.4

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 11
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 7.7.7.7 (External Network Number )
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0xEFCE
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        MTID: 0
        Metric: 20
        Forward Address: 0.0.0.0
        External Route Tag: 3489661028

IOS-XR and IOS have the same behaviour.

IOS – 32bit AS number – Route-tag

The RFC states that when using 16bit AS numbers, the domain tag is automatically derived. When using a 32bit AS number, it should be manually configured. You are able to manually set this even when using a 16bit number with the domain-tag command. You can see above that when using a 16bit number it was automatic. Let’s move to a 32bit number and see what we see.
A quick change of the BGP sessions:

R4#sh run | sec router bgp
router bgp 4294967295
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 4294967295
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 4294967295
 neighbor 3.3.3.3 update-source Loopback0

Take a look at the type5 on R6. The domain-tag matches the 32bit AS number directly. This does not conform 100% to the RFC, which states it should be manually set:

R6#sh ip ospf database external 7.7.7.7 adv-router 4.4.4.4

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 76
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 7.7.7.7 (External Network Number )
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0x2C48
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        MTID: 0
        Metric: 20
        Forward Address: 0.0.0.0
        External Route Tag: 4294967295

Of course, R3 will not use that LSA as its domain-tag matches.

Considering the domain-tag has to match for the LSA to be ignored, it stands to reason that any inter-AS VPN using OSPF would be susceptible to routing loops as each SP will have a different domain-tag. One of them could manually set it to match the other.

32bit AS number – Route-tag – IOS-XR

IOS-XR’s 32bit external behaviour is identical to IOS:

R6#sh ip ospf database external 7.7.7.7 adv-router 4.4.4.4

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 76
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 7.7.7.7 (External Network Number )
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0xA44F
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        MTID: 0
        Metric: 20
        Forward Address: 0.0.0.0
        External Route Tag: 4294967295

Once again, IOS and IOS-XR have the same behaviour.

Notes

  • Unlike parts 1 and 2 of this blog, IOS and IOS-XR finally show identical behaviour when it comes to loop prevention.

OSPF as the PE-CE routing protocols deep dive – Part 1 of 3 – Redistribution

When doing L3VPN, using OSPF is actually one of the more complicated options. Vector-based protocols like RIP, EIGRP, and BGP are comparatively simple.

RFC4577 is a great RFC that goes over how OSPF and BGP should operate when it comes to using OSPF as the PE-CE routing protocol.

I wanted to go into detail on some of what is noted in the RFC to see just how both IOS and IOS-XR interpret it. It’s also a bit of fun to purposely try to break the RFC and see what happens.

First, a quick refresh of how PE-CE protocols work when not using BGP as the PE-CE routing protocol. I’m going to brush very lightly over this.

Consider the following network. R2, R3, and R4 are ISP routers, of which R2 and R4 are PE routers. R7, R5, and R6 belong to the customer. R7 and R5 are both connected to the same PE while R6 is connected to another PE.
[Figure: RFC4577 12]

The CE routers are running OSPF with the PE routers. The PE routers redistribute these OSPF routes into BGP and then convert them to VPNv4 NLRI. These VPNv4 NLRI are advertised to other PE routers via BGP. The receiving PE also converts these VPNv4 routes back into OSPF and sends them off to the CE router:
[Figure: RFC4577 22]

LSA Translation

Take the above image as an example. R7 is running OSPF with R2. R2 is also running OSPF with R5 and so any LSA updates are sent to R5 from R7 as per standard OSPF rules. When R2 needs to advertise the route over to R4, that LSA needs to be converted to a VPNv4 route. R4 will then convert that VPNv4 route back to an OSPF route on the other side. So how does the RFC state this LSA must be translated?

Section 4.2.6 of the RFC states:

For every address prefix that was installed in the VRF by one of its associated OSPF instances, the PE must create a VPN-IPv4 route in BGP. Each such route will have some of the following Extended Communities attributes:

- The OSPF Domain Identifier Extended Communities attribute. If the OSPF instance that installed the route has a non-NULL primary Domain Identifier, this MUST be present; if that OSPF instance has only a NULL Domain Identifier, it MAY be omitted. This attribute is encoded with a two-byte type field, and its type is 0005, 0105, or 0205. For backward compatibility, the type 8005 MAY be used as well and is treated as if it were 0005. If the OSPF instance has a NULL Domain Identifier, and the OSPF Domain Identifier Extended Communities attribute is present, then the attribute’s value field must be all zeroes, and its type field may be any of 0005, 0105, 0205, or 8005.

- OSPF Route Type Extended Communities Attribute. This attribute MUST be present. It is encoded with a two-byte type field, and its type is 0306. To ensure backward compatibility, the type 8000 SHOULD be accepted as well and treated as if it were type 0306. The remaining six bytes of the Attribute are encoded as follows:

Area Number – Route Type – Options

In the test network I have already configured mutual redistribution between OSPF and BGP on both PE routers. Let’s see if the VPNv4 routes match what we expect from the RFC. R7 is advertising its loopback into OSPF. R2 converts this to a VPNv4 route. Let’s dig into the VPNv4 route itself:

R2#show bgp vpnv4 un all  7.7.7.7
BGP routing table entry for 2.2.2.2:1:7.7.7.7/32, version 28
Paths: (1 available, best #1, table A)
  Advertised to update-groups:
     1
  Local
    10.0.27.7 from 0.0.0.0 (2.2.2.2)
      Origin incomplete, metric 2, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:1:1 OSPF DOMAIN ID:0x0005:0x000000010200
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:2.2.2.2:0
      mpls labels in/out 26/nolabel

The route has a number of extended communities. The first one we’ll look at is the domain id value of:

OSPF DOMAIN ID:0x0005:0x000000010200

IOS has encoded a type 0005 domain ID with a value of 000000010200. This is interesting as I have not hard-coded a domain ID. Section 4.2.4 of the RFC states:

Each OSPF instance MUST be associated with one or more Domain Identifiers. This MUST be configurable, and the default value (if none is configured) SHOULD be NULL.

I have not configured one yet there is one. This means IOS is configuring one automatically even though it SHOULD be null.
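
To make the community easier to read, here is a small sketch that splits the string IOS prints into its two-byte type and six-byte value and checks for the NULL (all zeroes) case. The helper names are made up, and the parser only handles the `0xTTTT:0xVVVVVVVVVVVV` form shown above.

```python
NULL_DOMAIN_ID = bytes(6)  # six zero bytes


def parse_domain_id(text: str):
    """Split '0x0005:0x000000010200' into (type, 6-byte value)."""
    type_hex, value_hex = text.split(":")
    domain_type = int(type_hex, 16)
    value = bytes.fromhex(value_hex[2:])  # strip the leading "0x"
    if len(value) != 6:
        raise ValueError("domain ID value must be six bytes")
    return domain_type, value


def is_null_domain_id(value: bytes) -> bool:
    """True if the value field is all zeroes."""
    return value == NULL_DOMAIN_ID


domain_type, domain_value = parse_domain_id("0x0005:0x000000010200")
```

Run against R2's route, the type comes back as 0x0005 and the value is clearly not NULL.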

The second community we’ll look at is the Route Type Extended Communities Attribute:

OSPF RT:0.0.0.0:2:0

The RFC states that the RT is broken up as follows:

  1. 32-bit Area number
  2. Route-type
  3. Options

From our value above we can see that the original OSPF LSA is from area 0. Our RT says that this route comes from a type-2 LSA, which is a bit odd as 7.7.7.7 is coming in via a type-1 LSA (as we shall see in a bit, it doesn’t actually matter whether this value is 1, 2, or 3 at the end of the day). The final byte is the Options byte, which is currently zero.
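
A short sketch of how that community maps onto the eight bytes described in the RFC quote: a two-byte type of 0306, a four-byte area number, a one-byte route type, and a one-byte options field. The parsing of the `area:type:options` string is based purely on how IOS prints it above, and the function names are my own.

```python
import ipaddress
import struct

OSPF_ROUTE_TYPE_COMMUNITY = 0x0306


def parse_ospf_rt(text: str):
    """Parse 'area:route_type:options' as printed by IOS, e.g. '0.0.0.0:2:0'."""
    area, route_type, options = text.rsplit(":", 2)
    return int(ipaddress.IPv4Address(area)), int(route_type), int(options)


def encode_ospf_rt(area: int, route_type: int, options: int) -> bytes:
    """Pack the community: 2-byte type, 4-byte area, 1-byte type, 1-byte options."""
    return struct.pack("!HIBB", OSPF_ROUTE_TYPE_COMMUNITY, area, route_type, options)


area, route_type, options = parse_ospf_rt("0.0.0.0:2:0")
wire = encode_ospf_rt(area, route_type, options)
```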

This VPNv4 update is now sent over to R4, who needs to take that information and create a new OSPF LSA and advertise it to R6. What does the RFC say about how the PE needs to do this?

VPNv4 routes received via BGP

Section 4.2.8.1 of the RFC states:

With respect to a particular OSPF instance associated with a VRF, a VPN-IPv4 route that is installed in the VRF and then selected as the preferred route is treated as an External Route if one of the following conditions holds:

- The route type field of the OSPF Route Type Extended Community has an OSPF route type of “external”

- The route is from a different domain from the domain of the OSPF instance

What this means is that if a route comes into a PE as External or NSSA-External, it will always stay that way. It can never change. If a route comes in with a type of 1, 2, or 3 and the domain-id matches, then the local PE will originate a new type-3 LSA, i.e. the route will appear inter-area on the other customer sites.
If a route comes in with a type of 1, 2, or 3 and the domain-id does not match, then it becomes an external route.
All my routers are currently running IOS and OSPF process ID 100. This means currently all the domain-ids match, so R4 should be originating a new type-3 LSA. We can verify this on R6:

R6#sh ip ospf database summary 7.7.7.7

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Summary Net Link States (Area 0)

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 638
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 7.7.7.7 (summary Network Number)
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0x1EDF
  Length: 28
  Network Mask: /32
        MTID: 0         Metric: 2

We see the 7.7.7.7/32 LSA coming from 4.4.4.4. This means the OSPF route should be inter area:

R6#sh ip route 7.7.7.7
Routing entry for 7.7.7.7/32
  Known via "ospf 1", distance 110, metric 3, type inter area
  Last update from 10.0.46.4 on FastEthernet1/0, 00:11:25 ago
  Routing Descriptor Blocks:
  * 10.0.46.4, from 4.4.4.4, 00:11:25 ago, via FastEthernet1/0
      Route metric is 3, traffic share count is 1

Let’s change the domain-id on R4 to Null to see if this will change the route-type:

R4#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R4(config)#router ospf 1
R4(config-router)#domain-id Null
R4(config-router)#end

Verify:

R6#sh ip route 7.7.7.7
Routing entry for 7.7.7.7/32
  Known via "ospf 1", distance 110, metric 2
  Tag Complete, Path Length == 1, AS 100, , type extern 2, forward metric 1
  Last update from 10.0.46.4 on FastEthernet1/0, 00:00:08 ago
  Routing Descriptor Blocks:
  * 10.0.46.4, from 4.4.4.4, 00:00:08 ago, via FastEthernet1/0
      Route metric is 2, traffic share count is 1
      Route tag 3489661028

R6#sh ip ospf database external 7.7.7.7

            OSPF Router with ID (6.6.6.6) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 69
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 7.7.7.7 (External Network Number )
  Advertising Router: 4.4.4.4
  LS Seq Number: 80000001
  Checksum: 0x863A
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        MTID: 0
        Metric: 2
        Forward Address: 0.0.0.0
        External Route Tag: 3489661028

As expected, the route is now external.
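
The decision the receiving PE makes can be sketched in a few lines of Python. This is a simplification of section 4.2.8.1: it ignores the domain ID type field, treats a missing community as NULL, and uses made-up function and variable names.

```python
NULL_DOMAIN_ID = bytes(6)
EXTERNAL_ROUTE_TYPES = {5, 7}  # external and NSSA-external


def lsa_to_originate(route_type, route_domain_id, local_domain_ids):
    """Return 'external' or 'inter-area' for a VPNv4 route received via BGP."""
    if route_domain_id is None:
        route_domain_id = NULL_DOMAIN_ID  # simplification: absent means NULL
    if route_type in EXTERNAL_ROUTE_TYPES:
        return "external"
    if route_domain_id not in local_domain_ids:
        return "external"
    return "inter-area"


r2_domain_id = bytes.fromhex("000000010200")  # value seen on R2's VPNv4 route

# R4 with a matching domain ID originates a type-3 LSA.
matching = lsa_to_originate(2, r2_domain_id, {r2_domain_id})

# R4 with "domain-id Null" no longer matches, so the route becomes external.
null_local = lsa_to_originate(2, r2_domain_id, {NULL_DOMAIN_ID})
```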

IOS-XR

I’ve swapped out R4 with an IOS-XR box and configured it the same. How has R6’s loopback been converted into a VPNv4 route?

R2#show bgp vpnv4 un all 6.6.6.6
BGP routing table entry for 1:1:6.6.6.6/32, version 4
Paths: (1 available, best #1, table A)
  Not advertised to any peer
  Local
    4.4.4.4 (metric 4) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:1:1 OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:4.4.4.4:0
      mpls labels in/out nolabel/16012

What’s interesting here is that IOS-XR follows the RFC a little bit more closely in that there is no implicit default Domain-ID. This means that in an L3VPN where some of your routers are IOS and some are IOS-XR, the Domain-IDs are not going to match unless you change the defaults. This should also mean that on R6 I should be seeing external routes from R7 and R5:

RP/0/3/CPU0:R6#sho route ipv4 7.7.7.7
Thu Jan  2 17:05:59.526 UTC

Routing entry for 7.7.7.7/32
  Known via "ospf 1", distance 110, metric 2
  Tag 3489661028, type extern 2
  Installed Jan  2 17:01:21.626 for 00:04:38
  Routing Descriptor Blocks
    10.19.20.19, from 4.4.4.4, via POS0/7/0/0
      Route metric is 2
  No advertising protos.
RP/0/3/CPU0:R6#sho route ipv4 5.5.5.5
Thu Jan  2 17:06:05.378 UTC

Routing entry for 5.5.5.5/32
  Known via "ospf 1", distance 110, metric 2
  Tag 3489661028, type extern 2
  Installed Jan  2 17:01:21.625 for 00:04:43
  Routing Descriptor Blocks
    10.19.20.19, from 4.4.4.4, via POS0/7/0/0
      Route metric is 2
  No advertising protos.

Let’s hard-code the Domain-ID on R4 to ensure they now match:

RP/0/0/CPU0:R4#conf
Thu Jan  2 17:06:40.703 UTC
RP/0/0/CPU0:R4(config)#router ospf 100 vrf A domain-id type 0005 value 0000006$
RP/0/0/CPU0:R4(config)#end
Uncommitted changes found, commit them before exiting(yes/no/cancel)? [cancel]:yes
RP/0/3/CPU0:R6#sho route ipv4 5.5.5.5
Thu Jan  2 17:08:01.164 UTC

Routing entry for 5.5.5.5/32
  Known via "ospf 1", distance 110, metric 3, type inter area
  Installed Jan  2 17:07:33.737 for 00:00:27
  Routing Descriptor Blocks
    10.19.20.19, from 4.4.4.4, via POS0/7/0/0
      Route metric is 3
  No advertising protos.

Knowing the implicit defaults on both platforms can certainly save you from headaches.

Multiple Domain-IDs

IOS gives you the option to have secondary domain-IDs. The configuration guide doesn’t give much information on what exactly they do, so it’s time to break out Wireshark. First I’ll configure multiple secondary domain-ids on R2:

R2#sh run | sec router ospf
router ospf 1 vrf A
 domain-id type 0005 value 000000010200
 domain-id type 0005 value 000000020200 secondary
 domain-id type 0005 value 000000030200 secondary
 domain-id type 0005 value 000000040200 secondary
 log-adjacency-changes
 redistribute bgp 100 subnets

Will this make R2 generate VPNv4 updates with multiple OSPF domain-id extended communities? I’m capturing BGP traffic on R4’s core interface and have done a route refresh:
[Figure: RFC4577 3]
No. The VPNv4 update still only has a single domain-id. Secondary domain-ids are for a receiving PE to look at. If the domain-id on a received route matches the local primary ID or any of the local secondary IDs, it is considered a match. As each PE can only originate a single ID outbound, all sides need to list the other sides’ IDs for everything to be considered internal.
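
In other words, the primary ID is what gets sent and the full list is what gets compared on receipt. A minimal sketch of that behaviour, using the values from R2's configuration (the function names are hypothetical):

```python
PRIMARY = "000000010200"
SECONDARIES = ("000000020200", "000000030200", "000000040200")


def advertised_domain_id(primary, secondaries=()):
    """Only the primary domain ID is attached to outbound VPNv4 updates."""
    return primary


def domain_id_matches(received, primary, secondaries=()):
    """A received ID matches if it equals the primary or any secondary."""
    return received == primary or received in secondaries


sent_by_r2 = advertised_domain_id(PRIMARY, SECONDARIES)
accepts_secondary = domain_id_matches("000000030200", PRIMARY, SECONDARIES)
accepts_unknown = domain_id_matches("000000050200", PRIMARY, SECONDARIES)
```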

Reliable static routing without the need for the data license

Sometimes it’s required that you have a number of static routes on a router, maybe for management or some other reason. If a static route points to a next-hop that becomes unreachable while the exit interface stays up, there is no way for the router to know that it’s sending traffic down a black hole. Let’s take the following diagram as an example:
[Figure: reliable static]

R2 is a CPE on site. It has a primary link on fa0/0 connected to both R3 and R4 through a switch/VPLS. R2 is running OSPF with R3 and R4 and also has a floating static default route to R4’s Serial0/0 interface. If the primary link goes down, the floating static route should come into play and take over. While the link is up we have a static route on R2 that sends our management traffic (10.0.0.0/24) to R4’s fa0/0 interface.

R3 is originating a default route via OSPF.

But does this actually work? Let’s configure it up quickly first and then break R2’s primary link.

R3:

interface FastEthernet0/0
 ip address 192.168.234.3 255.255.255.0
 ip ospf 1 area 0
!
router ospf 1
 default-information originate always

R4:

interface FastEthernet0/0
 ip address 192.168.234.4 255.255.255.0
 ip ospf 1 area 0
!
interface Serial0/0
 ip address 24.24.24.4 255.255.255.0

R2:

interface FastEthernet0/0
 ip address 192.168.234.2 255.255.255.0
 ip ospf 1 area 0
!
interface Serial0/0
 ip address 24.24.24.2 255.255.255.0
!
ip route 0.0.0.0 0.0.0.0 24.24.24.4 200
ip route 10.0.0.0 255.255.255.0 192.168.234.4

Let’s have a look at R2’s routing table:

R2#   sh ip route | begin Gate
Gateway of last resort is 192.168.234.3 to network 0.0.0.0

C    192.168.234.0/24 is directly connected, FastEthernet0/0
     24.0.0.0/24 is subnetted, 1 subnets
C       24.24.24.0 is directly connected, Serial0/0
     10.0.0.0/24 is subnetted, 1 subnets
S       10.0.0.0 [1/0] via 192.168.234.4
O*E2 0.0.0.0/0 [110/1] via 192.168.234.3, 00:01:22, FastEthernet0/0

All looks good. I have my OSPF default route and I also have my management range route to R4.

Now for some reason, R2’s primary link fails. The fa0/0 interface stays up, however. Maybe the link is dodgy, or there is a mess-up with the VPLS; it doesn’t really matter. What happens then?

R2 loses its OSPF adjacencies with R3 and R4, but what about our management traffic?

R2#   sh ip route | begin Gate
Gateway of last resort is 24.24.24.4 to network 0.0.0.0

C    192.168.234.0/24 is directly connected, FastEthernet0/0
     24.0.0.0/24 is subnetted, 1 subnets
C       24.24.24.0 is directly connected, Serial0/0
     10.0.0.0/24 is subnetted, 1 subnets
S       10.0.0.0 [1/0] via 192.168.234.4
S*   0.0.0.0/0 [200/0] via 24.24.24.4

The problem is that we are still sending management traffic off to R4. This is the problem with a static route: it’s static! R2 has a next-hop of 192.168.234.4, and its interface in that subnet is still up, so the router keeps trying to ARP for 192.168.234.4. Of course R4 never responds, but the router will continue to try. It’ll never fail over to the backup.

Now with reliable static routing you can create an IP SLA object which continuously pings another interface. If you get no response, the track object goes down and hence the static route is removed. The problem with this is that you need an expensive data license for the privilege of doing this.

But track objects can track a lot more than just IP SLA objects. You can also track routes. So why not track the default route, considering we are learning it through the primary link? If the primary fails and OSPF times out, the OSPF default will be removed. Let’s try and see what happens:

R2:

track 1 ip route 0.0.0.0 0.0.0.0 reachability
!
ip route 10.0.0.0 255.255.255.0 192.168.234.4 track 1

Some of you may see a problem here, but bear with me.

Let’s see if this has fixed the problem:

R2#   sh ip route | begin Gate
Gateway of last resort is 24.24.24.4 to network 0.0.0.0

C    192.168.234.0/24 is directly connected, FastEthernet0/0
     24.0.0.0/24 is subnetted, 1 subnets
C       24.24.24.0 is directly connected, Serial0/0
     10.0.0.0/24 is subnetted, 1 subnets
S       10.0.0.0 [1/0] via 192.168.234.4
S*   0.0.0.0/0 [200/0] via 24.24.24.4

Hmm, we are still sending traffic to R4’s fa0/0 interface. Why is this?

R2#sh track 1
Track 1
  IP route 0.0.0.0 0.0.0.0 reachability
  Reachability is Up (static)
    1 change, last change 00:04:22
  First-hop interface is Serial0/0
  Tracked by:
    STATIC-IP-ROUTING 0

The problem is that our floating static route went live. As soon as it did we had a default route again and hence the track object is now UP.

But you don’t HAVE to track a default route. Why don’t we simply inject a phantom prefix on R3? One that will simply be used for tracking?

R3:

interface Loopback1
 ip address 3.3.3.3 255.255.255.255
 ip ospf 1 area 0

R2:

R2#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R2(config)#no track 1
R2(config)#track 1 ip route 3.3.3.3/32 reachability

R2 is now tracking the loopback route from R3:

R2#sh track 1
Track 1
  IP route 3.3.3.3 255.255.255.255 reachability
  Reachability is Up (OSPF)
    2 changes, last change 00:00:03
  First-hop interface is FastEthernet0/0
  Tracked by:
    STATIC-IP-ROUTING 0
R2#   sh ip route | begin Gate
Gateway of last resort is 192.168.234.3 to network 0.0.0.0

     3.0.0.0/32 is subnetted, 1 subnets
O       3.3.3.3 [110/11] via 192.168.234.3, 00:00:24, FastEthernet0/0
C    192.168.234.0/24 is directly connected, FastEthernet0/0
     24.0.0.0/24 is subnetted, 1 subnets
C       24.24.24.0 is directly connected, Serial0/0
     10.0.0.0/24 is subnetted, 1 subnets
S       10.0.0.0 [1/0] via 192.168.234.4
O*E2 0.0.0.0/0 [110/1] via 192.168.234.3, 00:00:24, FastEthernet0/0

R2 now loses OSPF adjacency:

R2#
*Mar  1 00:38:51.919: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.234.3 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Dead timer expired
R2#
*Mar  1 00:38:55.239: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.234.4 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Dead timer expired
R2#
*Mar  1 00:39:05.307: %TRACKING-5-STATE: 1 ip route 3.3.3.3/32 reachability Up->Down
R2#
R2#sh track 1
Track 1
  IP route 3.3.3.3 255.255.255.255 reachability
  Reachability is Down (no route)
    3 changes, last change 00:00:11
  First-hop interface is unknown
  Tracked by:
    STATIC-IP-ROUTING 0
R2#   sh ip route | begin Gate
Gateway of last resort is 24.24.24.4 to network 0.0.0.0

C    192.168.234.0/24 is directly connected, FastEthernet0/0
     24.0.0.0/24 is subnetted, 1 subnets
C       24.24.24.0 is directly connected, Serial0/0
S*   0.0.0.0/0 [200/0] via 24.24.24.4

R2 loses the tracked route, removes the tracked static and installs the floating static route. All is good :)
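The difference between tracking the default route and tracking the phantom prefix can be sketched with a toy routing table in Python. This is a simplified model, not how IOS is implemented: it assumes the track object only looks for the exact prefix in the table (which is what the `sh track` output above suggests), and all function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    source: str     # "ospf" or "static"
    distance: int   # administrative distance
    next_hop: str


def best_routes(candidates):
    """Keep the candidate with the lowest administrative distance per prefix."""
    rib = {}
    for route in candidates:
        if route.prefix not in rib or route.distance < rib[route.prefix].distance:
            rib[route.prefix] = route
    return rib


def track_up(rib, prefix):
    """Assumed behaviour: the track is up while the exact prefix is in the table."""
    return prefix in rib


FLOATING_DEFAULT = Route("0.0.0.0/0", "static", 200, "24.24.24.4")
OSPF_ROUTES = [
    Route("0.0.0.0/0", "ospf", 110, "192.168.234.3"),
    Route("3.3.3.3/32", "ospf", 110, "192.168.234.3"),
]
MGMT_STATIC = Route("10.0.0.0/24", "static", 1, "192.168.234.4")


def rib_for(ospf_up, tracked_prefix):
    """Build R2's table; the management static is only installed while the track is up."""
    candidates = [FLOATING_DEFAULT] + (OSPF_ROUTES if ospf_up else [])
    rib = best_routes(candidates)
    if track_up(rib, tracked_prefix):
        rib[MGMT_STATIC.prefix] = MGMT_STATIC
    return rib


# Primary link has failed (OSPF down); compare the two tracking choices.
for tracked in ("0.0.0.0/0", "3.3.3.3/32"):
    rib = rib_for(ospf_up=False, tracked_prefix=tracked)
    state = "installed" if MGMT_STATIC.prefix in rib else "withdrawn"
    print(f"tracking {tracked}: management static {state}")
```

With OSPF down, tracking 0.0.0.0/0 keeps the management static installed because the floating static supplies a default route, while tracking 3.3.3.3/32 withdraws it, since nothing else can supply that prefix.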

I know there are better ways of doing the above, such as advertising the management ranges via OSPF or running BFD, but these options are not always available, especially over backup links.

Upgrading the compact flash on a Juniper M10 (RE2.0, RE333)

I’ve got a spare Juniper M10 in my lab, and I wanted to upgrade JUNOS to version 10. With JUNOS you can only upgrade 3 minor revisions at a time, so I needed to go up to 8.5 first. However I got this error:

WARNING: This installation will not succeed.
WARNING: The boot device is less than 256M.
WARNING: A hardware upgrade is required.

8.5 requires a 256MB boot partition. 9.0 requires a flash card of at least 512MB and 10.0+ needs a 1GB card.
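As a quick sanity check before pulling the routing engine, those thresholds can be put into a small Python lookup. This is just a sketch: the sizes are the ones quoted above rather than an official Juniper table, it treats all three as a minimum card size for simplicity, and the function names are made up.

```python
import re
from typing import Optional

# (first release the requirement applies to, minimum size in MB), newest first.
# Figures are the ones quoted in this post, not an official table.
REQUIREMENTS = [((10, 0), 1024), ((9, 0), 512), ((8, 5), 256)]


def parse_release(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '9.3R4.4'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def min_flash_mb(version: str) -> Optional[int]:
    """Return the minimum size quoted for this release, or None if none is known."""
    release = parse_release(version)
    for first_release, size_mb in REQUIREMENTS:
        if release >= first_release:
            return size_mb
    return None


def card_is_big_enough(card_mb: int, version: str) -> bool:
    needed = min_flash_mb(version)
    return needed is None or card_mb >= needed


print(card_is_big_enough(96, "8.5R2.10"))     # False: the original 96MB card
print(card_is_big_enough(2048, "10.0R4.7"))   # True: a 2GB card
```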

These old routing engines are not the easiest to upgrade. You need to take the routing engine out of the back of the M10 and physically separate two boards. You can find a detailed description of how to remove it over here: http://juniper.cluepon.net/Replacing/upgrading_the_CF_on_a_RE

This is what the original internal card looks like:

Once I got the old card (96MB) out, I decided to try a couple I had lying around. The first one I tried was a Kingston 2GB card:

I stuck the routing engine back into the M10 and booted it up. Please note you WILL need console access for these next steps.

When booting up, it’ll try to boot from the compact flash card. It’ll sit here for a good 5 minutes or so; just leave it be. It’ll eventually reboot and boot off the internal hard disk. Once booted, you’ll need to issue ‘request system snapshot partition’:

root> request system snapshot partition
Clearing current label...
Partitioning compact-flash media (ad0) ...
Partitions on snapshot:

  Partition  Mountpoint  Size    Snapshot argument
      a      /           1024MB  root-size
      e      /config     198MB   config-size
      f      /var        760MB   var-size
Running newfs (1024MB) on compact-flash media / partition (ad0s1a)...
Running newfs (198MB) on compact-flash media /config partition (ad0s1e)...
Running newfs (760MB) on compact-flash media /var partition (ad0s1f)...
Copying '/dev/ad1s1a' to '/dev/ad0s1a' .. (this may take a few minutes)
Copying '/dev/ad1s1e' to '/dev/ad0s1e' .. (this may take a few minutes)
The following filesystems were archived: / /config

The above will take quite a while, but it’ll partition up the flash card, and then copy the boot partitions over to the flash disk. You can see that it automatically created a boot partition of 1GB – I’m not sure if you can manually adjust this.

Now all you need to do is reboot:

root> request system reboot
Reboot the system ? [yes,no] (no) yes

You should see this on startup:

Trying to Boot from Compact Flash...
Loading /boot/loader
Console: serial port
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS drive D: is disk2
BIOS 639kB/523200kB available memory

FreeBSD/i386 bootstrap loader, Revision 0.8
([email protected], Thu Apr 12 00:09:54 GMT 2007)
Loading /boot/defaults/loader.conf
/kernel text=0x4bbe0b data=0x3ca74+0x62caa syms=[0x4+0x5a9c0+0x4+0x6a58f]

Hit [Enter] to boot immediately, or space bar for command prompt.

Looks fine to me. You can also verify from the CLI:

root> show system storage
Filesystem              Size       Used      Avail  Capacity   Mounted on
/dev/ad0s1a             1.7G        49M       1.5G        3%  /

ad0 is the flash card. Finally you can verify with the show system boot-messages command:

root> show system boot-messages
Copyright (c) 1996-2001, Juniper Networks, Inc.
All rights reserved.
.
.
.
[removed to make it easier to read]
.
.
ad0: 1983MB  [4029/16/63] at ata0-master PIO4
ad1: 11513MB  [23392/16/63] at ata0-slave UDMA33
Mounting root from ufs:/dev/ad0s1a

Finally, can I now install higher versions of JUNOS?

root> show version
Model: m10
JUNOS Base OS boot [8.5R2.10]
JUNOS Base OS Software Suite [8.5R2.10]
root> show version
Model: m10
JUNOS Base OS boot [9.3R4.4]
JUNOS Base OS Software Suite [9.3R4.4]
root> show version
Model: m10
JUNOS Base OS boot [10.0R4.7]
JUNOS Base OS Software Suite [10.0R4.7]

Works perfectly fine. One thing I noticed is that upgrading the JUNOS image took a while, but it’s not something you do every day. I’ve now gone to JUNOS 10 on this box :)

EDIT (11/01/11): I was trying to upgrade from 10.1 to 10.2 and noticed that I suddenly had no space in /var. Checking the logs, I saw this:

ad1: 11513MB  at ata0-slave UDMA33
root> show system storage
Filesystem              Size       Used      Avail  Capacity   Mounted on
/dev/ad0s1a             1.7G       168M       1.4G       10%  /
devfs                   1.0K       1.0K         0B      100%  /dev
devfs                   1.0K       1.0K         0B      100%  /dev/
/dev/md0                 34M        34M         0B      100%  /packages/mnt/jbase
/dev/md1                232M       232M         0B      100%  /packages/mnt/jkernel-10.0R4.7
/dev/md2                 16M        16M         0B      100%  /packages/mnt/jpfe-M10-10.0R4.7
/dev/md3                5.5M       5.5M         0B      100%  /packages/mnt/jdocs-10.0R4.7
/dev/md4                 60M        60M         0B      100%  /packages/mnt/jroute-10.0R4.7
/dev/md5                 15M        15M         0B      100%  /packages/mnt/jcrypto-10.0R4.7
/dev/md6                 36M        36M         0B      100%  /packages/mnt/jpfe-common-10.0R4.7
/dev/md7                 63M       8.0K        58M        0%  /tmp
/dev/md8                 63M        13M        45M       22%  /mfs
/dev/ad0s1e             195M        12K       179M        0%  /config
procfs                  4.0K       4.0K         0B      100%  /proc
root> request system partition hard-disk
mount: /dev/ad1s1e : No such file or directory

Hmm, the BIOS sees the hard drive, but I can’t use it. This was easily fixed though:

root> request system snapshot partition
Clearing current label...
Partitioning hard-disk media (ad1) ...
Partitions on snapshot:

  Partition  Mountpoint  Size    Snapshot argument
      a      /           1024MB  root-size
      e      /config     1GB     config-size
      f      /var        9GB     var-size
Running newfs (1024MB) on hard-disk media / partition (ad1s1a)...
Running newfs (1GB) on hard-disk media /config partition (ad1s1e)...
Running newfs (9GB) on hard-disk media /var partition (ad1s1f)...
Copying '/dev/ad0s1a' to '/dev/ad1s1a' .. (this may take a few minutes)
Copying '/dev/ad0s1e' to '/dev/ad1s1e' .. (this may take a few minutes)
The following filesystems were archived: / /config
root> request system partition hard-disk

WARNING:   The hard disk is about to be partitioned.  The contents
WARNING:   of /altroot and /altconfig will be saved and restored.
WARNING:   All other data is at risk.  This is the setup stage, the
WARNING:   partition happens during the next reboot.

Setting up to partition the hard disk ...

WARNING:   A REBOOT IS REQUIRED TO PARTITION THE HARD DISK.  Use the
WARNING:   'request system reboot' command when you are ready to proceed
WARNING:   with the partitioning.  To abort the partition of the hard disk
WARNING:   use the 'request system partition abort' command.


root> request system reboot
Reboot the system ? [yes,no] (no) yes

I then rebooted:

root> show system storage
Filesystem              Size       Used      Avail  Capacity   Mounted on
/dev/ad0s1a             1.7G       168M       1.4G       10%  /
devfs                   1.0K       1.0K         0B      100%  /dev
devfs                   1.0K       1.0K         0B      100%  /dev/
/dev/md0                 34M        34M         0B      100%  /packages/mnt/jbase
/dev/md1                232M       232M         0B      100%  /packages/mnt/jkernel-10.0R4.7
/dev/md2                 16M        16M         0B      100%  /packages/mnt/jpfe-M10-10.0R4.7
/dev/md3                5.5M       5.5M         0B      100%  /packages/mnt/jdocs-10.0R4.7
/dev/md4                 60M        60M         0B      100%  /packages/mnt/jroute-10.0R4.7
/dev/md6                 15M        15M         0B      100%  /packages/mnt/jcrypto-10.0R4.7
/dev/md7                 36M        36M         0B      100%  /packages/mnt/jpfe-common-10.0R4.7
/dev/md8               1006M       8.0K       926M        0%  /tmp
/dev/md9               1006M       1.2M       925M        0%  /mfs
/dev/ad0s1e             195M        12K       179M        0%  /config
procfs                  4.0K       4.0K         0B      100%  /proc
/dev/ad1s1f             8.2G        12M       7.5G        0%  /var

All fixed
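The giveaway in the two outputs is whether /var is mounted from the hard disk at all. A minimal Python sketch of that check, run against a few rows copied from the output above (the function name is made up, and it assumes the six-column layout shown here):

```python
def mounted_on(storage_output: str) -> dict[str, str]:
    """Map mount point -> device from 'show system storage' style text."""
    mounts = {}
    for line in storage_output.splitlines():
        fields = line.split()
        # Data rows have exactly six columns; the header splits into seven
        # ("Mounted on" is two words) and blank lines into zero.
        if len(fields) == 6:
            mounts[fields[5]] = fields[0]
    return mounts


HEADER = "Filesystem              Size       Used      Avail  Capacity   Mounted on\n"
BEFORE = HEADER + (
    "/dev/ad0s1a             1.7G       168M       1.4G       10%  /\n"
    "/dev/ad0s1e             195M        12K       179M        0%  /config\n"
)
AFTER = BEFORE + (
    "/dev/ad1s1f             8.2G        12M       7.5G        0%  /var\n"
)

print("/var" in mounted_on(BEFORE))   # False: the hard disk /var is missing
print(mounted_on(AFTER).get("/var"))  # /dev/ad1s1f
```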