Basic OOP Python

I’ve just started with object-oriented programming in Python, so I thought I’d cover some of the basics here. Please don’t treat this as a thorough tutorial on OOP!

The beauty of OOP is that it lets me create a template from which I can create objects. The building blocks of the object sit in the class, while each object is created from that blueprint with its own properties. I can then make as many of these objects as I want from that single class. I can see this being very beneficial for certain types of programming (games especially).

Basics

I’ll start with something simple. I want to create a class called Ball. This class requires certain properties like radius and colour. I’ll create a class like so:

class Ball:
    def __init__(self, radius, colour):
        self.radius = radius
        self.colour = colour

I’ve created a class called ‘Ball’, which requires two arguments: radius and colour. Note that ‘self’ refers to the object being created. Once this is done, I can call the class to create objects. I need to make sure I pass both arguments, otherwise I get an error:

>>> blueball = Ball("Blue")

Traceback (most recent call last):
  File "", line 1, in 
    blueball = Ball("Blue")
TypeError: __init__() takes exactly 3 arguments (2 given)

Let’s do it properly:

>>> blueball = Ball(10, "Blue")

blueball is now an object created from the class:

>>> blueball
<__main__.Ball instance at 0x10f11bdd0>
>>> print type(blueball)
<type 'instance'>

Each self.x in the class is an attribute (an instance variable) that I can set and interrogate. If I want to see the current radius of blueball, I simply access the attribute I defined:

>>> blueball.radius
10

I can also change the attribute later if I so choose:

>>> blueball.radius = 20
>>> blueball.radius
20

If I simply print blueball, I won’t get much:

>>> print blueball
<__main__.Ball instance at 0x10f11bdd0>

To get useful string output from an object, the class needs a __str__ method. I’ll add the following to my original class:

class Ball:
    def __init__(self, radius, colour):
        self.radius = radius
        self.colour = colour

    # Called by print and str() to build a readable description of the ball
    def __str__(self):
        return "I am a " + self.colour + " ball, with a radius of " + str(self.radius)

Note that my existing blueball object was created from the old class definition, so I’ll simply create a new one:

>>> blueball = Ball(10, "Blue")
>>> print str(blueball)
I am a Blue ball, with a radius of 10

I can now happily create as many objects as I want, with different properties. Each object is a separate instance:

>>> redball = Ball(20, "Red")
>>> print str(redball)
I am a Red ball, with a radius of 20


Let’s take this a step further. For this I’m going to use Scott Rixner’s CodeSkulptor, as it has some nice drawing capabilities built in. It also runs directly in your browser.

I’d like to start with an empty space. I then want to click a button to create a new ball with random properties, and that ball should be displayed inside the space. Each click should create a new object from my original class. I’m going to add a few more properties to my class, which will become clear later.

I’ll first get my global variables set:

import random  # used by the click handler below

list_of_balls = []
width = 1000
height = 600
colours = ["Aqua","Blue","Green","Lime","Maroon","Navy","Orange","Red","White","Yellow"]

Now comes the Ball class:

class Ball:
    def __init__(self, radius, mass, colour, x_location):
        self.radius = radius
        self.mass = mass
        self.colour = colour
        self.location = [x_location, height/2]

When I click the button, I want certain random properties set for the new object:

def click():
    radius = random.randint(1,40)
    mass = radius
    colour = random.choice(colours)
    x_location = random.randint(20, width-20)
    new_ball = Ball(radius, mass, colour, x_location)
    list_of_balls.append(new_ball)

The button handler creates a new ball with random properties, then appends that ball to a list of balls. Each time I click, a new ball is added to the list.

I now need to draw the balls. I need to iterate through my list of objects and draw each one:

def draw(canvas):
    for ball in list_of_balls:
        canvas.draw_circle((ball.location[0],ball.location[1]), ball.radius, 1, ball.colour, ball.colour)

Click here to view and run the code in CodeSkulptor. Press play in the top left corner to run the code.

Dynamic

So the above code simply creates a bunch of balls at random places along the x axis, all in the middle of the y axis. Let’s start moving these balls around based on their initial mass. Currently, the bigger the ball, the bigger the mass; let’s keep it that way for now. All balls will simply fall towards the ground. I’d like to make sure that when a ball hits the bottom of the screen, it’s reflected back up. The same goes for the top of the screen.

First I need to add velocity to the object. Note that an object’s properties don’t all have to be passed in as arguments. I could set the velocity of every ball to exactly the same value. For now I’ll derive it from the mass of the ball:

class Ball:
    def __init__(self, radius, mass, colour, x_location):
        self.radius = radius
        self.mass = mass
        self.velocity = self.mass
        self.colour = colour
        self.location = [x_location, height/2]

I then need to update my draw handler so it updates the position of each ball. If the y location of the ball hits the top or bottom of the screen, reverse the direction:

def draw(canvas):
    for ball in list_of_balls:
        ball.location[1] += ball.velocity
        if ball.location[1] >= (height - ball.radius) or ball.location[1] <= ball.radius:
            ball.velocity = -ball.velocity
        canvas.draw_circle((ball.location[0],ball.location[1]), ball.radius, 1, ball.colour, ball.colour)

Click here to open my code in CodeSkulptor. Now every time you add a new ball, it’ll start bouncing between the top and bottom walls. Create as many balls as you like!
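CodeSkulptor supplies the canvas and the button, so the code above won’t run in a standard Python interpreter. As a rough sketch, assuming you only want to check the movement logic, the same Ball class can be stepped without any drawing. The `step` method and the `make_random_ball` helper are hypothetical additions of mine (the original keeps this logic in the draw and click handlers); moving the update into the class keeps each ball responsible for its own motion.

```python
import random

# Same canvas size and colours as the globals above
WIDTH = 1000
HEIGHT = 600
COLOURS = ["Aqua", "Blue", "Green", "Lime", "Maroon",
           "Navy", "Orange", "Red", "White", "Yellow"]


class Ball:
    def __init__(self, radius, mass, colour, x_location):
        self.radius = radius
        self.mass = mass
        self.velocity = self.mass            # velocity derived from mass, as above
        self.colour = colour
        self.location = [x_location, HEIGHT // 2]

    def step(self):
        """Move one frame; reverse direction at the top or bottom edge."""
        self.location[1] += self.velocity
        if (self.location[1] >= HEIGHT - self.radius
                or self.location[1] <= self.radius):
            self.velocity = -self.velocity


def make_random_ball(rng=random):
    """Hypothetical stand-in for the click handler: a ball with random properties."""
    radius = rng.randint(1, 40)
    x_location = rng.randint(20, WIDTH - 20)
    return Ball(radius, radius, rng.choice(COLOURS), x_location)


if __name__ == "__main__":
    rng = random.Random(42)                  # fixed seed so runs are repeatable
    balls = [make_random_ball(rng) for _ in range(3)]
    for _ in range(100):                     # 100 frames without any drawing
        for ball in balls:
            ball.step()
    for ball in balls:
        print(ball.colour, ball.radius, ball.location)
```

Each ball is still an independent instance: stepping one never touches another’s location or velocity.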

Conclusions

It’s still early days in my OOP work. I can see this approach is perfect for applications like games. I’m not 100% sure whether I’ll have a use for it in my type of coding; I’ll have to see.

In the meantime, the basics of OOP aren’t that difficult. I’ve still got a long way to go, but I’m happy so far!

For future work on the code above, it would be trivial to set a random mass. It would also be nice to make the balls bounce like they do in real life, or to get the balls to bounce off each other. Either way, the properties of each ball are independent of the environment in which it sits.

python interface-checker

I’d like the helpdesk to be able to enable and disable switchports without needing to know the underlying OS. My plan is to have a webpage with a list of devices. When you click on a device, it will check the interface status via SNMP and display the administrative and operational status of all interfaces on the device. Then, with a few clicks, it’ll use NETCONF to enable or disable a port.

As a first step I’ve created a quick Python script to poll a device via SNMP and get its output. For the moment this just prints a table to the CLI. Long term, the front end will use this to get the interface status when a device is clicked.

The next step is to get NETCONF working.

I’ve tested this script on Juniper, Cisco, and Brocade and all data is retrieved correctly.

Here is an example from a Juniper MX80:

Darren-MBP:interface-checker darren$ ./ic.py router-name
+-------------------------+-----------------+--------------+--------------------+
|        OID Value        |  Interface Name | Admin Status | Operational Status |
+-------------------------+-----------------+--------------+--------------------+
|  1.3.6.1.2.1.2.2.1.2.1  |       fxp0      |     DOWN     |        DOWN        |
|  1.3.6.1.2.1.2.2.1.2.4  |       lsi       |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.5  |       dsc       |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.6  |       lo0       |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.7  |       tap       |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.8  |       gre       |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.9  |       ipip      |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.10 |       pime      |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.11 |       pimd      |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.12 |       mtun      |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.16 |      lo0.0      |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.17 |       em0       |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.18 |      em0.0      |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.21 |    lo0.16384    |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.22 |    lo0.16385    |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.23 |       em1       |      UP      |        DOWN        |
|  1.3.6.1.2.1.2.2.1.2.33 |       me0       |      UP      |         UP         |
|  1.3.6.1.2.1.2.2.1.2.34 |      me0.0      |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.501 |      demux0     |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.502 |     lc-0/0/0    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.503 |  lc-0/0/0.32769 |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.504 |       cbp0      |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.505 |       pip0      |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.506 |       pp0       |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.507 |       irb       |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.508 |     xe-0/0/0    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.509 |     xe-0/0/1    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.510 |     xe-0/0/2    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.511 |     xe-0/0/3    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.512 |     ge-1/0/0    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.513 |     ge-1/0/1    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.514 |     ge-1/0/2    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.515 |     ge-1/0/3    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.516 |     ge-1/0/4    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.517 |     ge-1/0/5    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.518 |     ge-1/0/6    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.519 |     ge-1/0/7    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.520 |     ge-1/0/8    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.521 |     ge-1/0/9    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.522 |     ge-1/1/0    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.523 |     ge-1/1/1    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.524 |     ge-1/1/2    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.525 |     ge-1/1/3    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.526 |     ge-1/1/4    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.527 |     ge-1/1/5    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.528 |     ge-1/1/6    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.529 |     ge-1/1/7    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.530 |     ge-1/1/8    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.531 |     ge-1/1/9    |     DOWN     |        DOWN        |
| 1.3.6.1.2.1.2.2.1.2.532 |    pfh-0/0/0    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.533 |    pfe-0/0/0    |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.534 | pfh-0/0/0.16383 |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.535 | pfe-0/0/0.16383 |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.547 |    ge-1/1/0.0   |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.552 |    ge-1/1/1.0   |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.553 |    ge-1/1/2.0   |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.554 |    ge-1/1/3.0   |      UP      |         UP         |
| 1.3.6.1.2.1.2.2.1.2.555 |    ge-1/1/4.0   |      UP      |         UP         |
+-------------------------+-----------------+--------------+--------------------+
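The table above combines three columns from the standard IF-MIB, all indexed by ifIndex: ifDescr (1.3.6.1.2.1.2.2.1.2, the OID shown in the first column), ifAdminStatus (1.3.6.1.2.1.2.2.1.7) and ifOperStatus (1.3.6.1.2.1.2.2.1.8). The sketch below is not the code in the repository. It assumes the three walks have already been done with whatever SNMP library you prefer and handed over as plain dicts keyed by ifIndex, and it only shows the joining and formatting step. The status names come from the IF-MIB enumerations.

```python
# IF-MIB column OID for ifDescr, used to label each row
IF_DESCR = "1.3.6.1.2.1.2.2.1.2"

# IF-MIB integer enumerations for the two status columns
ADMIN_STATUS = {1: "UP", 2: "DOWN", 3: "TESTING"}
OPER_STATUS = {1: "UP", 2: "DOWN", 3: "TESTING", 4: "UNKNOWN",
               5: "DORMANT", 6: "NOTPRESENT", 7: "LOWERLAYERDOWN"}


def build_rows(descr, admin, oper):
    """Join three hypothetical walk results ({ifIndex: value}) into table rows."""
    rows = []
    for if_index in sorted(descr):
        rows.append((
            f"{IF_DESCR}.{if_index}",
            descr[if_index],
            ADMIN_STATUS.get(admin.get(if_index), "?"),
            OPER_STATUS.get(oper.get(if_index), "?"),
        ))
    return rows


def render(rows):
    """Draw an ASCII table with centred cells, similar to the output above."""
    headers = ("OID Value", "Interface Name", "Admin Status", "Operational Status")
    widths = [max(len(str(cell)) for cell in col) for col in zip(headers, *rows)]
    sep = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(cells):
        return "| " + " | ".join(str(c).center(w) for c, w in zip(cells, widths)) + " |"

    return "\n".join([sep, line(headers), sep] + [line(r) for r in rows] + [sep])


if __name__ == "__main__":
    # Made-up sample values in the shape an SNMP walk would return
    descr = {1: "fxp0", 23: "em1"}
    admin = {1: 2, 23: 1}
    oper = {1: 2, 23: 2}
    print(render(build_rows(descr, admin, oper)))
```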

You can find it here: https://github.com/mellowdrifter/interface-checker

Debian/Ubuntu PMTUD & uRPF

I originally started my PMTUD posts using Ubuntu 14.04. Halfway through the post I simply could not get Ubuntu to change its MTU on receipt of ICMP Fragmentation Needed messages. I then tried Debian and it worked. Windows also had no issues changing its MTU.

Wanting to finish the post, I switched to Debian and planned to investigate the fault later.

Let’s remind ourselves of the original topology:
[Diagram: original PMTUD lab topology]
Swap out Debian for Ubuntu in the above image.

When I initially started to test, I dropped the MTU between R1 and R2 to 1400 and kept the link between R2 and R4 at 1500. If the user requested a file from the server at this point, Ubuntu would attempt to send at 1500 and have its packet dropped at R1. R1 would send a Fragmentation Needed packet back to the Ubuntu server, which would adjust its MTU and then send at 1400.

When I changed the MTU between R1 and R2 back up to 1500 and dropped R2-R4 down to 1400, it no longer worked, although Debian and Windows still did. I ran tcpdump on Ubuntu and confirmed that it was definitely receiving Fragmentation Needed packets. Ubuntu was only acting on them if they came from its default gateway, R1. ICMP packets from any router further along the path were ignored.

To understand the problem, I need to show more of the topology. While the above diagram shows how things are connected for the most part, it is missing a couple of details. All the devices are running inside VirtualBox linked to GNS3. eth1 on each server is connected to the above topology, while eth0 is connected via NAT to my host PC so I could install software:
[Diagram: topology including the eth0 NAT connections to the host PC]
Each device had a static route for 192.168.0.0/16 out eth1, while its default route was out eth0. Some of you may be sensing what the issue is already…

The point to point links between the virtual routers are using the 10.0.0.0/8 space.

If Ubuntu received an ICMP packet from 192.168.4.1, its local default gateway on R1, there were no issues. If it received a packet from R2’s or R4’s local interfaces, the packet was dropped. Neither Debian nor Windows had this problem, even though they are configured the same way.

sysctl.conf

I’ve touched on sysctl.conf before in the PMTUD posts, but there is an important difference between the Ubuntu and Debian defaults. Take a look at this.
Debian:
[Screenshot: Debian defaults]
Ubuntu:
[Screenshot: Ubuntu defaults]

uRPF

Ubuntu has Unicast Reverse Path Forwarding on by default; Debian has it off by default. In sysctl.conf on both machines, the relevant configuration setting is commented out:
[Screenshot: commented-out uRPF lines in sysctl.conf]
R2 was originating its ICMP packets from its local interface, 10.0.12.2 in my example. Ubuntu did receive that packet, but it failed the RPF check and so was ignored. To confirm, I tested this in two different ways:

  • Add a static route to 10.0.0.0/8 out eth1
  • Disable uRPF check on Ubuntu

Each test on its own allowed the original PMTUD to work. What’s odd is that the sysctl.conf file in Ubuntu says you need to uncomment the lines to turn on uRPF, but it’s on by default: uncommenting the lines and setting the value to 1 is the same as leaving them commented. In Debian the default is to disable uRPF, so in that distro you would need to uncomment the uRPF lines and set the value to 1 to turn the feature on.
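The RPF failure can be reproduced with a toy routing table. This is a minimal sketch, assuming a strict check (accept a packet only if the best route back to its source points out the interface it arrived on) and a hypothetical table mirroring the lab: a default route out eth0 and 192.168.0.0/16 out eth1. On Linux the real knob is the rp_filter sysctl (net.ipv4.conf.all.rp_filter and its per-interface equivalents, where 0 is off, 1 is strict and 2 is loose); the kernel’s own check is more involved than this.

```python
import ipaddress

# Hypothetical routing table mirroring the lab: default route out eth0
# (NAT to the host PC) and 192.168.0.0/16 out eth1.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
    (ipaddress.ip_network("192.168.0.0/16"), "eth1"),
]


def egress_interface(routes, address):
    """Longest-prefix match: the interface we would use to reach address."""
    addr = ipaddress.ip_address(address)
    matches = [(net, ifname) for net, ifname in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def strict_rpf_ok(routes, source, arrival_interface):
    """Strict uRPF: the route back to the source must use the arrival interface."""
    return egress_interface(routes, source) == arrival_interface


def loose_rpf_ok(routes, source):
    """Loose uRPF: the source only needs to be reachable via some interface."""
    return egress_interface(routes, source) is not None


if __name__ == "__main__":
    # ICMP from R1's 192.168.4.1 arrives on eth1 and passes
    print(strict_rpf_ok(ROUTES, "192.168.4.1", "eth1"))   # True
    # ICMP from R2's 10.0.12.2 also arrives on eth1, but the best route is eth0
    print(strict_rpf_ok(ROUTES, "10.0.12.2", "eth1"))     # False
    # The first fix from the list above: a static route for 10.0.0.0/8 out eth1
    fixed = ROUTES + [(ipaddress.ip_network("10.0.0.0/8"), "eth1")]
    print(strict_rpf_ok(fixed, "10.0.12.2", "eth1"))      # True
```

Loose mode would have accepted the packet in the original setup, since the default route makes every source reachable via some interface.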

Conclusions

  • If a server is multi-homed, PMTUD could break if the ICMP message arrives on an interface the server is not expecting.
  • If you do have a multi-homed server, it would probably be best to turn off uRPF.

Fundamentals – PMTUD – IPv4 vs IPv6 – Part 2 of 2

This is a continuation of a post I started back here. Please read that first before continuing below.

RFC 4821

Another workaround is Packetization Layer Path MTU Discovery (PLPMTUD), RFC 4821. It allows a host to act in one of two main ways:

  • Use regular PMTUD. If no acknowledgments are received and no ICMP messages are received, start to probe.
  • Ignore regular PMTUD and always probe.

Probing is where the host sends packets at the configured minimum MTU and then attempts to increase that size. If acknowledgements are received for the larger size, it tries to increase it again. Option 1 waits for a timeout, so on broken PMTUD paths it starts a bit slowly. However, it uses regular PMTUD whenever it can, so it’s a lot more efficient. Option 2 simply probes all the time. It starts a bit quicker on smaller-MTU paths, but the server is also sending smaller packets on ALL paths in the beginning, which is much less efficient.
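RFC 4821 leaves the exact search strategy up to the implementation. One common approach is a binary search between a size known to work and the local MTU, roughly as sketched below. `probe_succeeds` is a hypothetical stand-in for "a probe of this size was acknowledged", sizes are IP packet sizes, and the default floor of 576 is an assumption (the classic IPv4 datagram size every host must accept). This is not necessarily how Linux implements tcp_mtu_probing internally.

```python
def probe_path_mtu(probe_succeeds, low=576, high=1500, tolerance=0):
    """Binary-search the largest packet size that gets acknowledged.

    Invariant: `low` is known to work, and anything above `high` is known
    (or assumed) to fail. Stop once the gap is within `tolerance` bytes.
    """
    while high - low > tolerance:
        size = (low + high + 1) // 2       # round up so the loop always progresses
        if probe_succeeds(size):
            low = size                     # probe got through: raise the floor
        else:
            high = size - 1                # probe lost: lower the ceiling
    return low


if __name__ == "__main__":
    # Simulated path whose real MTU is 1400 bytes
    print(probe_path_mtu(lambda size: size <= 1400))
```

A non-zero `tolerance` trades a slightly smaller result for fewer probes, since every failed probe costs a retransmission.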

I’ll configure this in Debian and then go through Wireshark to show what’s going on. Add the line net.ipv4.tcp_mtu_probing = 1 to /etc/sysctl.conf, then reload sysctl:

root@debian1:~# sysctl -p
net.ipv4.tcp_mtu_probing = 1

Start the transfer and see what Wireshark shows:
[Screenshot: Wireshark capture of the transfer]
After the standard 3-way handshake, the server sends a number of 1514-byte packets. ICMP has been blocked, so there are no ICMP Fragmentation Needed messages coming from R2. After 5.3 seconds the server sends a number of 578-byte packets.
[Screenshot: the 578-byte packets]
These get ACK’d correctly:
[Screenshot: the ACKs]
0.5 seconds later the server sends a single 1090-byte packet and fills the rest of the window with 578-byte packets. As soon as the ACK for that big packet comes back, the server sends all of its packets at 1090 bytes:
[Screenshots: Wireshark capture of the 1090-byte packets]

A couple of things to note about this setting in Ubuntu 14.04 and Debian 7.6.0:

  1. The system does not cache the path MTU found through PLPMTUD. This means that if a host makes multiple TCP connections to your server over a small-MTU path, each of those connections has to wait for the timeout.
  2. There is no net.ipv6.tcp_mtu_probing setting in sysctl.conf. However, if you enable the setting for IPv4, IPv6 behaves the same way as IPv4:

[Screenshot: IPv6 transfer with probing enabled]

Windows can also be configured for PLPMTUD, but I’ll leave it to the reader to figure out how.

PMTUD Cache

I showed in part 1 that the server will cache an entry if the path MTU is lower than the MTU of the local link. By default, Debian caches this entry for 10 minutes. This time is adjustable via sysctl.conf:

root@debian1:~# sysctl -a | grep mtu_expires
net.ipv4.route.mtu_expires = 600
net.ipv6.route.mtu_expires = 600

As soon as a value is cached, the timer starts. This timer counts down even if there is an ongoing file transfer. The reason is that paths can change: while the transfer is going on, it could move to a path that has no MTU issues, and we would then want the server to increase its MTU. Doing this too quickly can cause more traffic to be dropped, so the suggestion is to cache the MTU for a full 10 minutes and then try to increase it. I’ve started a file transfer and checked the cache entry on the server while it was ongoing. You can see the timer going down:
[Screenshot: cache entry with the timer counting down]
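The caching behavior described above can be modelled with a small per-destination table. This toy sketch assumes a fixed 600-second lifetime (matching mtu_expires), a timer that starts when the entry is learned and is not refreshed by traffic, and an injectable clock so it can be exercised without waiting ten minutes. `PmtuCache` is a hypothetical name, not a Linux structure.

```python
import time


class PmtuCache:
    """Toy per-destination path MTU cache with a fixed lifetime."""

    def __init__(self, link_mtu=1500, lifetime=600, clock=time.monotonic):
        self.link_mtu = link_mtu
        self.lifetime = lifetime
        self.clock = clock
        self._entries = {}  # destination -> (pmtu, expiry time)

    def learn(self, destination, pmtu):
        # Only values below the local link MTU are worth caching
        if pmtu < self.link_mtu:
            self._entries[destination] = (pmtu, self.clock() + self.lifetime)

    def lookup(self, destination):
        entry = self._entries.get(destination)
        if entry is None:
            return self.link_mtu
        pmtu, expires = entry
        if self.clock() >= expires:
            # Aged out: fall back to the link MTU and let PMTUD try again
            del self._entries[destination]
            return self.link_mtu
        return pmtu

    def remaining(self, destination):
        """Seconds left on the entry, or None if nothing is cached."""
        entry = self._entries.get(destination)
        return None if entry is None else max(0, entry[1] - self.clock())


if __name__ == "__main__":
    now = [0.0]
    cache = PmtuCache(clock=lambda: now[0])
    cache.learn("192.0.2.10", 1400)
    now[0] = 16
    print(cache.lookup("192.0.2.10"), cache.remaining("192.0.2.10"))
```

Per the expected behavior, traffic during those 600 seconds does not reset the timer; once it runs out, the next lookup returns the full link MTU.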
The client then finished downloading and disconnected from the server. At this point the server still keeps the cache entry. This ensures that if the client connects again shortly, the server will start with an MTU of 1400:
[Screenshot: cache entry still present after the client disconnects]
I started a new download within the cache time above, and we can see the server immediately starts sending packets with the correct MTU:
[Screenshot: new download starting at the cached MTU]

What should happen when the cache times out is that the server tries to send a larger packet, up to the local MTU. I don’t see that with Debian, though. I started the test with the lower MTU cached on my server. When the cache was about to expire, I started the test again, and as expected the session started with the lower cached MTU. I then changed the MTU between R2 and R5 back up to the regular MTU:

R2(config)#int fa0/1
R2(config-if)#no ip mtu 1400
R2(config-if)#end

The odd thing is, when the cache entry timed out, Debian carried on sending packets with an MTU of 1400 and cached the entry again. That’s not supposed to happen.

I then tried the same test again, this time manually clearing the cache on Debian:

root@debian1:~# ip route flush cache

This time the server immediately started to send larger packets:
[Screenshot: larger packets after flushing the cache]

IPv6 has roughly the same broken behavior. At first the cache entry is created and starts to count down. I started a transfer when it was about to expire. Again the MTU stayed at 1400, but this time the timer jumped to a huge number:
[Screenshot: IPv6 cache entry with the large timer]
8590471 seconds is roughly 99 days. I’m not sure whether this is a bug or something else.

Clearing the IPv6 cache, on the other hand, had the desired effect:
[Screenshot: IPv6 cache after flushing]
If the MTU matches the outgoing interface, there is no need for the system to cache the entry and take up more resources on the server. Wireshark shows the jump in MTU:
[Screenshot: Wireshark showing the larger packets]

Conclusions

  • Blocking the required ICMP packets breaks PMTUD completely.
  • There are alternatives to PMTUD, but they are slower initially.
  • Test your OS’s behavior. I mainly tested with Debian and ran into a number of ‘odd’ scenarios, mainly to do with the cache.

Fundamentals – PMTUD – IPv4 & IPv6 – Part 1 of 2

One of IPv6’s features is that routers are no longer supposed to fragment packets. Instead, it’s up to the hosts on either end to work out the path MTU. This differs from IPv4, in which routers along the path can fragment the packet. Both IPv4 and IPv6 have a mechanism to work out the path MTU, which is what I’ll go over in this post. Instead of going over each separately, I’ll show the problem being solved and how the two differ when it comes to sending traffic.

I’ll be using the following topology in this post:
[Diagram: lab topology]

The problem

When you visit this blog, your browser requests a particular web page from my server. This request is usually quite small. My server needs to respond with the actual data: the images, words, plugins, style-sheets, etc. This data can be quite large. My server needs to break this stream of data down into IP packets to send back to you. Each packet requires a few headers, so the most efficient way to send data back to you is to put the most data in the fewest packets.
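To put rough numbers on "the most data in the fewest packets", here is a small calculation, assuming plain 20-byte IPv4 and 20-byte TCP headers with no options. Shrinking the MTU increases both the number of packets and the total bytes spent on headers.

```python
IPV4_HEADER = 20
TCP_HEADER = 20   # assumes no TCP options


def packets_needed(payload_bytes, mtu, headers=IPV4_HEADER + TCP_HEADER):
    """Return (packet count, total header bytes) to carry payload_bytes."""
    per_packet = mtu - headers                              # data per packet (the MSS)
    count = (payload_bytes + per_packet - 1) // per_packet  # ceiling division
    return count, count * headers


if __name__ == "__main__":
    for mtu in (1500, 1400, 576):
        print(mtu, packets_needed(1_000_000, mtu))
```

For example, 1,000,000 bytes of payload needs 685 packets (27,400 bytes of headers) at an MTU of 1500, versus 1866 packets (74,640 bytes of headers) at 576.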

Between you and my server sits a load of different networks and hardware. There is no way for my server to know the maximum MTU supported by all those devices along the path. Not only can the path change, but I have many thousands of readers in many different countries. In the topology above, the link between R2 and R4 has an MTU of 1400. None of the hosts are directly connected to that segment, so none of them know the MTU of the entire path.
[Diagram: lab topology with the 1400 MTU link]

PMTUD

Path MTU Discovery (RFC1191 for IPv4 and RFC1981 for IPv6) does exactly what the name suggests: it finds the MTU of the path. There are a number of similarities between the two RFCs, but also a few key differences, which I’ll dig into.

Note – OS implementations of PMTUD can vary widely. I’ll be showing both Debian Linux server 7.6.0 and Windows Server 2012 in this post.

Both RFCs state that hosts should initially assume that the MTU across the entire path matches the first-hop MTU. That is, the servers should assume that the path MTU matches the MTU of the link they are connected to. In this case both my Windows and Linux servers have a local MTU of 1500.
[Diagram: servers with a local MTU of 1500]

The link between R2 and R4 has an IP MTU of 1400. My servers need to figure out the path MTU in order to maximise the packet size without fragmentation.

IPv4

    RFC1191 states:

    The basic idea is that a source host initially assumes that the PMTU of a path is the (known) MTU of its first hop, and sends all datagrams on that path with the DF bit set. If any of the datagrams are too large to be forwarded without fragmentation by some router along the path, that router will discard them and return ICMP Destination Unreachable messages with a code meaning “fragmentation needed and DF set” [7]. Upon receipt of such a message (henceforth called a “Datagram Too Big” message), the source host reduces its assumed PMTU for the path.

    In my example, the servers should assume that the path MTU is 1500. They should send packets back to the user at this MTU with the Don’t Fragment (DF) bit set. R2’s link to R4 is not big enough, so R2 should drop the packet and return the correct ICMP message to my servers. The servers should then resend those packets with a lower MTU.
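The RFC1191 loop can be simulated in a few lines. This is a simplified model, assuming a single flow, a hypothetical list of link MTUs beyond the first hop, and that the first router whose outgoing link is too small drops the packet and (if allowed) reports that link’s MTU. Setting `icmp_delivered=False` models the ICMP filtering discussed under Problems: the sender never learns anything and its large packets simply vanish.

```python
def discover_pmtu(first_hop_mtu, link_mtus, icmp_delivered=True):
    """Simulate RFC1191-style PMTUD for one flow.

    Returns the PMTU the sender settles on, or None if its DF packets are
    dropped and no ICMP message ever reaches it (a blackhole).
    """
    pmtu = first_hop_mtu                      # start by assuming the first-hop MTU
    while True:
        # First router whose outgoing link can't carry a DF packet of this size
        bottleneck = next((mtu for mtu in link_mtus if mtu < pmtu), None)
        if bottleneck is None:
            return pmtu                       # packets now get through unfragmented
        if not icmp_delivered:
            return None                       # dropped silently: blackholed
        pmtu = bottleneck                     # "Datagram Too Big" reports this MTU


if __name__ == "__main__":
    # Roughly the lab: 1500 everywhere except the 1400-byte R2-R4 link
    print(discover_pmtu(1500, [1500, 1400, 1500]))
    print(discover_pmtu(1500, [1500, 1400, 1500], icmp_delivered=False))
```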

    I’m going to show Wireshark captures from the servers’ point of view. I’ll start with Windows.

    The first part is the regular TCP 3-way handshake to set up the session. These packets are very small, so they are generally not fragmented:
    [Screenshot: TCP 3-way handshake]
    The user then requests a file. The server responds with full-size packets with the DF bit set. Those packets are dropped by R2, which sends back the required ICMP message:
    [Screenshot: full-size packets and the ICMP replies from R2]

    Let’s dig a bit deeper into those packets. First, the full-size packet from the server. Note that the DF bit has been set:
    [Screenshot: full-size packet with the DF bit set]

    Second, the ICMP message sent from R2. This is an ICMP Type 3 Code 4 message. It states that the destination is unreachable and that fragmentation is required. Note that it also states the MTU of the next hop. The Windows server can use this value to re-originate its packets with a lower MTU.
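The next-hop MTU sits at a fixed position in the ICMP header, which is why the server can act on it directly. A minimal parser, assuming you already have the raw ICMP message (for example from a raw socket or a capture) without the outer IP header. Per RFC1191, a zero in that field means the router predates the RFC and did not fill it in.

```python
import struct

ICMP_DEST_UNREACH = 3
ICMP_FRAG_NEEDED = 4


def parse_frag_needed(icmp_bytes):
    """Return the next-hop MTU from an ICMP 'fragmentation needed' message.

    Layout per RFC1191: type (1 byte), code (1), checksum (2), unused (2),
    next-hop MTU (2), then the original IP header plus 8 bytes of its data.
    Returns None for anything that isn't Type 3 Code 4.
    """
    if len(icmp_bytes) < 8:
        return None
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack(
        "!BBHHH", icmp_bytes[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return next_hop_mtu


if __name__ == "__main__":
    # Hand-built example: Type 3 Code 4 advertising a 1400-byte next hop,
    # followed by 28 zero bytes standing in for the quoted original packet
    message = struct.pack("!BBHHH", 3, 4, 0, 0, 1400) + bytes(28)
    print(parse_frag_needed(message))
```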

    All the rest of the packets in the capture then have the lower MTU set. Note that Wireshark includes the 14-byte Ethernet header in the frame length, hence the value of 1414:
    [Screenshot: packets at the lower MTU]

    RFC1191 states that a host should cache a lower MTU value. It also suggests caching this value for 10 minutes and making that time tweakable. You can view the cached value on Windows, but it doesn’t show the timer. Perhaps a reader could let me know how to see it?
    [Screenshot: cached path MTU on Windows]

    I’ll now do the same on my Debian server. The first part is the 3-way handshake again:
    [Screenshot: TCP 3-way handshake]
    The server starts sending packets with an MTU of 1500:
    [Screenshot: 1500-byte packets from the server]
    These are dropped by R2, with ICMP messages sent back:
    [Screenshot: ICMP messages from R2]
    The Debian server caches that entry. Debian does show me the remaining cache time, in this case 584 seconds:
    [Screenshot: Debian cache entry with 584 seconds remaining]

IPv6

    RFC1981 goes over the finer details of how this works with IPv6. The majority of the document is identical to RFC1191.

    When the Debian server responds, the packets have a size of 1514 bytes on the wire, as expected. Note, however, that there is no DF bit in IPv6 packets. This is a major difference between IPv4 and IPv6. Routers CANNOT fragment IPv6 packets, so there is no reason to state this explicitly in the packet: no IPv6 packet can be fragmented by a router in the path. I’ll go over what this means in depth later.
    [Screenshot: 1514-byte IPv6 packet from the server]
    R2 cannot forward this packet and drops it. The message returned by R2 is still an ICMP message, but it’s a bit different from the IPv4 version:
    [Screenshot: ICMPv6 message from R2]

    This time the message is ‘Packet Too Big’, which is very easy to understand. The ICMP message contains the MTU of the next hop, as expected:
    [Screenshot: Packet Too Big message showing the next-hop MTU]
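The ICMPv6 Packet Too Big message (type 2, defined in RFC 4443) carries a 32-bit MTU field rather than IPv4’s 16-bit one. A minimal sketch of reading it and updating the path MTU estimate, assuming raw ICMPv6 bytes without the IPv6 header. RFC1981 also says a node must not reduce its estimate below the IPv6 minimum link MTU of 1280, which the clamp in `new_path_mtu` reflects.

```python
import struct

ICMPV6_PACKET_TOO_BIG = 2
IPV6_MIN_MTU = 1280


def parse_packet_too_big(icmp6_bytes):
    """Return the MTU from an ICMPv6 Packet Too Big message, else None.

    Layout: type (1 byte), code (1, ignored by receivers), checksum (2),
    MTU (4), then as much of the original packet as fits.
    """
    if len(icmp6_bytes) < 8:
        return None
    icmp_type, _code, _checksum, mtu = struct.unpack("!BBHI", icmp6_bytes[:8])
    if icmp_type != ICMPV6_PACKET_TOO_BIG:
        return None
    return mtu


def new_path_mtu(current, reported):
    """Lower the estimate, but never below the IPv6 minimum link MTU."""
    return max(min(current, reported), IPV6_MIN_MTU)


if __name__ == "__main__":
    # Hand-built Packet Too Big advertising 1400, with 48 zero bytes of quoted packet
    message = struct.pack("!BBHI", 2, 0, 0, 1400) + bytes(48)
    print(new_path_mtu(1500, parse_packet_too_big(message)))
```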

    The server acts on this message, caches the result, then sends packets up to the required MTU:
    [Screenshots: the server’s response to the Packet Too Big message]

    Windows Server 2012 behaves identically. To show the cache, simply view the IPv6 destination cache (netsh interface ipv6 show destinationcache) and you’re good to go.

    Problems

    So what could possibly go wrong? The above all looks good and works in the lab. The biggest issue is that both mechanisms require those ICMP messages to get back to the sending host. There are a lot of badly configured firewalls and ACLs out there dropping more ICMP than they should. Some people even drop ALL ICMP. There is another issue that I’ll go over in a future blog post.

    In the above examples, if those ICMP messages don’t get back, the sending host will not adjust its MTU. If it continues to send large packets, the router with the smaller MTU will drop them, and all that traffic is blackholed. Smaller packets like requests will get through. Ping will even get through if echo-requests and echo-replies are permitted. You might even be able to see the beginnings of a web page, but the big content will not load.

    On R1’s fa0/1 interface I’ll apply this bad access list:

    R1#sh ip access-lists
    Extended IP access list BLOCK-ICMP
        10 permit icmp any any echo
        20 permit icmp any any echo-reply
        30 deny icmp any any
        40 permit ip any any

    From the client I can ping the host:
    [Screenshot: successful ping from the client]
    I can even open a text-based page from the server:
    [Screenshot: text-based page loading]

    But try to download the file:
    [Screenshot: stalled file download]
    The initial 3-way handshake works fine, but nothing else happens. The Debian server is sending those packets and R2 is dropping them and informing the sender, but R1 drops those ICMP messages. You’ve now got a black hole. The same thing happens with IPv6, though of course the packet dropped is the Packet Too Big message.

    Workarounds

    The best thing to do is fix the problem. Unfortunately that’s not always possible. There are a few things that can be done to work around the problem of dropped ICMP packets.
    If you know the MTU further down the line, you can use TCP MSS clamping. This causes the router to intercept TCP SYN packets and rewrite the TCP MSS. You need to take into account the size of the added headers.
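The MSS is the TCP payload, so the value to configure is the path MTU minus the IP and TCP headers. A quick sketch, assuming base headers only (20-byte IPv4 or 40-byte IPv6, 20-byte TCP, no options):

```python
IP_HEADER = {4: 20, 6: 40}   # base headers: no IPv4 options or IPv6 extension headers
TCP_HEADER = 20              # assumes no TCP options


def mss_for_mtu(mtu, ip_version=4):
    """Largest TCP payload that fits in a single packet of the given MTU."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER


if __name__ == "__main__":
    print(mss_for_mtu(1400), mss_for_mtu(1400, 6))
```

For the 1400-byte link this gives 1360 for IPv4, the value used below. An IPv6 header is 20 bytes larger, so the same link would need 1340 for IPv6.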

    R1#conf t
    Enter configuration commands, one per line.  End with CNTL/Z.
    R1(config)#int fa1/1
    R1(config-if)#ip tcp adjust-mss 1360
    R1(config-if)#end

    Note how the MSS value has been changed to 1360:
    [Screenshot: SYN with the MSS rewritten to 1360]

    I’ve tested with IOS 15.2(4)S2 and it also works with IPv6:
    [Screenshot: IPv6 SYN with the rewritten MSS]

    The problem with this is that it places a burden on the router it’s configured on. Your router might not even support this option. It also affects ALL TCP traffic passing through that interface. TCP clamping can work well for VPN tunnels, but it’s not a very scalable solution.

    Another workaround is to get the router to disregard the DF bit and just let the routers fragment the packets:

    route-map CLEAR-DF permit 10
     set ip df 0
    !
    interface FastEthernet1/1
     ip address 192.168.4.1 255.255.255.0
     ip router isis
     ip policy route-map CLEAR-DF
     ipv6 address 2001:DB8:10:14::1/64
     ipv6 router isis

    The problem with this is that you’re placing a burden on the router again. It’s also not at all efficient. Some firewalls block fragments, and some routers might just drop fragmented packets.
    The biggest problem is that there is no DF bit to clear in IPv6. IPv6 packets will not be fragmented by routers; fragmentation has to be done by the host.
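To see why letting routers fragment is inefficient, here is the IPv4 arithmetic. A minimal sketch, assuming a 20-byte header with no options: every fragment except the last must carry a multiple of 8 data bytes, because the fragment offset field counts in 8-byte units. A 1500-byte packet crossing the 1400-byte link becomes a 1396-byte fragment plus a 124-byte fragment, each with its own header, and the receiver has to reassemble them.

```python
IPV4_HEADER = 20  # assumes no IP options


def fragment_sizes(packet_size, mtu, header=IPV4_HEADER):
    """Split an IPv4 packet for a smaller MTU.

    Returns a list of (fragment offset in 8-byte units, fragment total size).
    """
    if packet_size <= mtu:
        return [(0, packet_size)]            # fits as-is, no fragmentation
    payload = packet_size - header
    max_data = (mtu - header) // 8 * 8       # round down to a multiple of 8
    fragments = []
    offset = 0
    while payload > 0:
        data = min(max_data, payload)
        fragments.append((offset // 8, data + header))
        offset += data
        payload -= data
    return fragments


if __name__ == "__main__":
    print(fragment_sizes(1500, 1400))
```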

    End of Part One

    There is simply too much to cover in a single post. I’ll end this post here. Part two will be coming soon!

    Various networking ramblings from Dual CCIE #38070 (R&S, SP) and JNCIE-SP #2227

    © 2009-2014 Darren O'Connor All Rights Reserved