When and when not to multithread

At the end of my last post on Python multithreading, I said my example was not the best. Let me expand some more on this.

While testing code for the previous post, I noticed that certain code was actually slower when multiple threads were running. These threads are also not tied to a particular CPU. If you have a bigger application and want to ensure work is spread over multiple CPUs, what you're actually looking for is multiprocessing.

Consider the following code. It simply counts from 0 to 99,999, doubles each number, then prints it to the screen. First I'll do this as a single-threaded app, then as a multithreaded one, and time both.

Single-thread

#!/usr/bin/python

for i in range (100000):
    i *= 2
    print i

Multi-thread

#!/usr/bin/python

import threading
lock = threading.Lock()

def thread_test(i):
    i *= 2
    with lock:
        print i

threads = []
for i in range (100000):
    t = threading.Thread(target = thread_test, args = (i,))
    threads.append(t)
    t.start()

I'll now time each script with the time command, running each one three times and taking the average of the three runs:

time ./single.py

The single thread is able to do this in 0.411 seconds, while the multithreaded app takes a full 16.409 seconds.
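
As noted above, if I genuinely wanted to spread CPU-bound work like this over multiple CPUs, multiprocessing rather than threading would be the tool. A minimal sketch of what that could look like (I haven't timed this, and the pool size of 4 is just an example):

#!/usr/bin/python

from multiprocessing import Pool

def double(i):
    return i * 2

if __name__ == "__main__":
    pool = Pool(4)    # 4 worker processes, roughly one per core
    results = pool.map(double, range(100000))
    pool.close()
    pool.join()
    for result in results:
        print result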

Now I'll do a test in which multithreading makes a big difference. I have a list of 500 random URLs. I want to fetch each one and print the page contents. Not all URLs respond, so I've also set a three second timeout on fetching any page.

The single thread app is coded like so:

#!/usr/bin/python

import urllib2

with open("urls.txt", "r") as f:
    urls = f.readlines()

for url in urls:
    request = urllib2.Request(url)
    try:
        response = urllib2.urlopen(request, timeout = 3)
        page = response.read()
        print page
    except:
        print "Unable to download"

This takes 11 minutes and 40 seconds to run, which works out at roughly 1.4 seconds per URL on average.

Converted to multithread:

#!/usr/bin/python
 
import urllib2
import threading

lock = threading.Lock()

def thread_test(url):
    try:
        response = urllib2.urlopen(url, timeout = 3)
        page = response.read()
        with lock:
            print page
    except:
        with lock:
            print "Unable to download"

with open("urls.txt", "r") as f:
   urls = f.readlines()
threads = []

for url in urls:
    request = urllib2.Request(url)
    t = threading.Thread(target = thread_test, args = (request,))
    threads.append(t)
    t.start()

This time the average over 3 runs is only 1 minute and 40 seconds.

I am, however, still locking around output to the screen. Let's assume I don't really care about visible output at all. Maybe I just want to throw some commands somewhere, or do something simple like a ping. If I dropped the output entirely, how quickly could this actually run?

#!/usr/bin/python
 
import urllib2
import threading

lock = threading.Lock()

def thread_test(url):
    try:
        response = urllib2.urlopen(url, timeout = 3)
        page = response.read()
    except:
        pass

with open("urls.txt", "r") as f:
   urls = f.readlines()
threads = []

for url in urls:
    request = urllib2.Request(url)
    t = threading.Thread(target = thread_test, args = (request,))
    threads.append(t)
    t.start()

This completes in 1 minute and 13 seconds. Not as big an improvement as I had hoped for, but it does tell me one thing: Python is not running ALL the threads at exactly the same time. If it were, the maximum run time would be just over three seconds, as that's the timeout.

I’ll load up Wireshark and run the test again. I should see how many threads are sending HTTP GETs at the same time. When I start the threads, I can see 26 threads all starting within a second of each other. Only a full 7 seconds later do others start:
[Screenshot: Wireshark capture showing the first 26 HTTP GETs starting within a second of each other]

After that, I see more threads being added as others end. The timing seems random later as each page has a different response time.
[Screenshot: Wireshark capture showing new GETs starting as earlier ones finish]

This seems to be an OS-imposed limit.
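
If the number of simultaneously running threads is limited anyway, one option is to control it explicitly with a small pool of worker threads pulling URLs off a queue. This is a rough sketch rather than something I've timed here, and the worker count of 30 is an arbitrary choice:

#!/usr/bin/python

import urllib2
import threading
import Queue

q = Queue.Queue()

def worker():
    # Each worker pulls URLs off the queue until the program exits
    while True:
        url = q.get()
        try:
            urllib2.urlopen(url, timeout = 3).read()
        except:
            pass
        q.task_done()

with open("urls.txt", "r") as f:
    for url in f:
        q.put(url.strip())

for i in range(30):
    t = threading.Thread(target = worker)
    t.daemon = True
    t.start()

q.join()    # block until every URL has been fetched or has failed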

Conclusions

  • Multithreading in Python has certain benefits only in specific cases.
  • Mainly when you are requesting data from many different sources, so there's no need to query them one at a time.
  • Python’s initial single thread is rather efficient all by itself.

I'm certainly not going to rewrite all my current code to use multithreading. Going back to my original OSPF checker, it certainly would be good to pull information off multiple devices at the same time, but the rest of the app I'd still keep as a single thread.

Basic Python Multithreading

The first ‘proper’ Python app I made logged onto a list of devices and pulled out OSPF state. This worked perfectly fine. The app correctly works out whether it can log into a device or not, and waits a few seconds to ensure a device actually responds.

The issue is that if I have a list of, say, 1000 devices and 500 of them don't respond, the total wait time rapidly adds up as the app looks at each device in turn. Would it not be better for the app to log into multiple devices in parallel? This would drastically reduce the runtime.

Basic Threading

Consider the following code:

#!/usr/bin/python

import threading

def thread_test():
    print "I am a thread"
    return

threads = []
for i in range(4):
    t = threading.Thread(target = thread_test)
    threads.append(t)
    t.start()

A function called thread_test is defined. I then spawn four threads, each of which runs that function. I should therefore see four lines printed:

$ ./thread.py
I am a thread
I am a thread
I am a thread
I am a thread

Of course getting them all to do exactly the same thing is a bit boring. I may have a list of items I want to print. Let’s pass each item as an argument and print them out:

#!/usr/bin/python

import threading

list_of_items = ["cat", "banana", "house", "phone"]

def thread_test(item):
    print "I am a " + item
    return

threads = []
for word in list_of_items:
    t = threading.Thread(target = thread_test, args = (word,))
    threads.append(t)
    t.start()

$ ./thread.py
I am a cat
I am a banana
I am a house
I am a phone

If you’ve run this code, you may notice that sometimes your output gets a bit garbled:

$ ./thread.py
I am a catI am a banana

I am a house
I am a phone

$ ./thread.py
I am a cat
 I am a banana
I am a house
I am a phone

All four threads are trying to write to the screen at the same time. Whether outputting to the screen or writing to a file, the result can look rather messy, especially as the device and thread count goes up.

Locks

I can use locks to prevent this. Each thread can go and do its business, but when it needs to write to the screen or to a file, I ensure only a single thread can do so at a time. As an example I'll iterate over a range of 100. All those threads will create their data in memory at pretty much the same time, but only one at a time will be allowed to print and write to the file. I'll also ensure that the application closes the file only after all threads have completed.

#!/usr/bin/python

import threading

lock = threading.Lock()

def thread_test(num):
    phrase = "I am number " + str(num)
    lock.acquire()
    print phrase
    f.write(phrase + "\n")
    lock.release()

threads = []
f = open("text.txt", 'w')
for i in range (100):
    t = threading.Thread(target = thread_test, args = (i,))
    threads.append(t)
    t.start()

while threading.activeCount() > 1:
    pass
else:
    f.close()

Any code between the lock being acquired and released can only be run by one thread at a time. The example above isn't a great showcase, but getting data from a remote device, and waiting on it, can take a few seconds. If all of that can happen in parallel and only the writing of results to a file is serialised, it speeds things up immensely.
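
As an aside, the busy-wait on activeCount() above works, but a more common way to wait for all the threads before closing the file is to join() each one. Something like this (same end result, just a different way of waiting):

for t in threads:
    t.join()    # block until this thread has finished
f.close()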

Update – 15/12/14

Ben Cardy mentioned in the comments below a great shortcut that most readers might miss if they don't read all the comments, so I'll put it up here. My code above explicitly acquires and releases a lock when needed. There is a simpler way to do this: if you use "with lock:", any code indented under it is automatically wrapped in the acquire and release calls. This is nice as you don't have to remember to release the lock. Another benefit is that the with statement will release the lock even if the thread throws an exception.

The last code above could be rewritten like so:

#!/usr/bin/python
 
import threading
 
lock = threading.Lock()
 
def thread_test(num):
    phrase = "I am number " + str(num)
    with lock:
        print phrase
        f.write(phrase + "\n")

 
threads = []
f = open("text.txt", 'w')
for i in range (100):
    t = threading.Thread(target = thread_test, args = (i,))
    threads.append(t)
    t.start()
 
while threading.activeCount() > 1:
    pass
else:
    f.close()

DHCP Snooping – Filter those broadcasts!

I had a specific requirement recently and I wanted to test its behaviour. In particular, the feature is DHCP snooping. Let's quickly go over the DHCP process at a high level to see how it works:

DHCP

Let’s take the following simple diagram to show what’s going on. We have a switch with two hosts connected. We also have a DHCP server. I’m using generic names as I’ll be testing this on different switches. Assume all devices are in the same vlan.
[Diagram: a switch with Host1, Host2 and a DHCP server, all in the same vlan]

Host1 has just booted and needs an IP address. It’ll send a DHCP DISCOVER packet which is a broadcast. This broadcast gets sent to all ports in the vlan:
[Diagram: the DHCP DISCOVER broadcast flooded out of all ports in the vlan]

The DHCP server will then send a DHCP OFFER to Host1. It does this via unicast, using Host1's MAC address as the layer 2 destination:
[Diagram: the DHCP OFFER sent unicast from the server to Host1]

Host1 then sends a DHCP REQUEST via broadcast. Why broadcast? Because it may have received offers from multiple DHCP servers, and this tells all of them which offer it is accepting.
[Diagram: the DHCP REQUEST broadcast from Host1]

Finally, the DHCP server acknowledges that Host1 has accepted its offered IP with a DHCP ACK. This is unicast again:
[Diagram: the DHCP ACK sent unicast from the server to Host1]

Now, depending on BOOTP options, the offer and/or ack might actually be broadcast. The behaviour is also slightly different when using DHCP helpers, but we are mainly concerned with the DHCPDISCOVER and DHCPREQUEST packets, which are always broadcast.

DHCP Snooping

In the above example, there was nothing stopping Host2 from handing out IP addresses via DHCP itself. This might be malicious activity, or merely someone doing something wrong, either misconfiguring a device or plugging in a device which should not be there.

DHCP snooping was created to prevent this from happening. Its main concern is making sure that DHCPOFFERs only come in via trusted ports. In our example, port 1, connected to the DHCP server, should be a trusted port. Port 2 and port 3, connected to Host1 and Host2 respectively, should never see DHCPOFFER packets on ingress. But here is the kicker: a DHCPOFFER is a response to an event. That event is a DHCPDISCOVER, and that DHCPDISCOVER is a broadcast.

It stands to reason that if a DHCPOFFER can never ingress port 2 or port 3, those ports should never have DHCPDISCOVER packets replicated to them in the first place, regardless of the fact that those packets are broadcast. All other broadcasts should go through, but these specific DHCP ones should not.

So is this what we actually see in the real world? I'll test it on the devices I have available to see what behaviour I get.
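
For the tests below I simply let the real hosts request addresses, but if you wanted to generate DISCOVERs on demand, something like Scapy could craft one. A rough, untested sketch, reusing Host1's MAC from below and with eth1 as a placeholder interface:

#!/usr/bin/python

from scapy.all import Ether, IP, UDP, BOOTP, DHCP, sendp

# Craft a DHCPDISCOVER: broadcast at both layer 2 and layer 3
discover = (Ether(src="78:2b:cb:e4:e3:88", dst="ff:ff:ff:ff:ff:ff") /
            IP(src="0.0.0.0", dst="255.255.255.255") /
            UDP(sport=68, dport=67) /
            BOOTP(chaddr="\x78\x2b\xcb\xe4\xe3\x88") /
            DHCP(options=[("message-type", "discover"), "end"]))

sendp(discover, iface="eth1")    # needs root; eth1 is a placeholder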

Cisco Catalyst IOS

My config is as follows:

ip dhcp snooping vlan 1-4094                                                                           
ip dhcp snooping 

interface FastEthernet0/1                                                                              
 switchport access vlan 10                                                                             
 switchport mode access                                                                                
!                                                                                                      
interface FastEthernet0/2                                                                              
 switchport access vlan 10                                                                             
 switchport mode access        
!
interface FastEthernet0/24                                                                             
 switchport access vlan 10                                                                             
 switchport mode access                                                                                
 ip dhcp snooping trust     

DHCP snooping is enabled, with fa0/24 as the trusted port going towards my DHCP server.

I have host1 and host2 connected with the following MAC addresses:

  • 78:2b:cb:e4:e3:88
  • 00:26:5a:ef:85:33

I'll now listen on fa0/24. I should see both DHCPDISCOVER broadcasts coming through:

$ sudo tcpdump -i eth1 -n port 67 and port 68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:45:45.204815 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 78:2b:cb:e4:e3:88, length 300
11:45:48.733826 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:26:5a:ef:85:33, length 300

That’s exactly what I see.

If I now move the capture point over to fa0/2, I hope to see no broadcasts at all. Silence here would mean the switch is not replicating those broadcasts out of untrusted ports:

$ sudo tcpdump -i eth1 -n port 67 and port 68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

Silence. That’s just what I wanted to see.

Juniper EX

Config is as follows:

user@switch> show configuration interfaces ge-0/0/2
unit 0 {
    family ethernet-switching {
        port-mode access;
        vlan {
            members vlan_test;
        }
    }
}

{master:0}
user@switch> show configuration interfaces ge-0/0/3
unit 0 {
    family ethernet-switching {
        port-mode access;
        vlan {
            members vlan_test;
        }
    }
}

{master:0}
user@switch> show configuration interfaces ge-0/0/4
unit 0 {
    family ethernet-switching {
        port-mode access;
        vlan {
            members vlan_test;
        }
    }
}


user@switch> show configuration ethernet-switching-options
secure-access-port {
    interface ge-0/0/4.0 {
        dhcp-trusted;
    }
    vlan all {
        examine-dhcp;
    }
}

ge-0/0/4 is now my trusted DHCP server port. If I listen on that port, I should see both hosts' broadcasts:

$ sudo tcpdump -i eth1 -n port 67 and port 68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

11:58:02.539119 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 78:2b:cb:e4:e3:88, length 300
11:58:05.809947 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:26:5a:ef:85:33, length 300

What about when listening on the untrusted port?

$ sudo tcpdump -i eth1 -n port 67 and port 68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:58:55.342651 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 78:2b:cb:e4:e3:88, length 300

The broadcast comes through. There is only one MAC here because I had to disconnect one of the hosts in order to plug in and capture.

Conclusions

  • IOS switches filter the initial DHCPDISCOVER broadcast packets. Junos switches do not.
  • Both devices DO drop DHCPOFFER packets coming in on untrusted ports.
  • Cisco is a bit more intelligent in its behaviour.

Not filtering the broadcast initially doesn't break DHCP snooping, but it's completely unnecessary. Why send a request out of a port on which you would filter the reply? I've seen switches reload and suddenly all devices on the switch try to get their IPs back. Every device receives all of these broadcasts when only the trusted port should. Filtering means fewer broadcasts on the network, and it also prevents badly configured devices from replying to packets they should never have received.

ESXi whitebox server

I usually have access to an ESX box at work where I can run multiple VMs and virtual routers for labbing and testing. I’ve also wanted one at home. It’s nice to be able to quickly spin up VMs when needed without always running them through my laptop.

While virtual routers don’t need lots of resources, I did want a beefy machine as there are a few servers I’d like to get running that need lots of CPU power.

Requirements

  • Support for 32GB of ECC RAM
  • Fast CPU with at least 4 physical cores
  • Quiet
  • Small
  • Out-of-band management (iLO/IPMI/etc.)
  • Low power

Specifically these are the things I don’t need:

  • Optical drive
  • Hard drive
  • GPU

The point of the box is to sit in the corner with only power and network connected. If anything goes wrong, I don't want to have to connect a monitor to it. I'm also not running any tasks requiring video output on the server itself. All VMs will be accessed via SSH.

I already have a Synology DS411 which will provide an iSCSI connection to the ESX server. Hence no need for internal hard drives.

My initial build was going to be based around an Intel i7-4790. However, the i7 doesn't support ECC RAM and it also has a built-in HD 4600 GPU which I don't need.

Parts list

I ended up going for the Intel Xeon E3-1230 v3: 4 cores, 8 threads, and all the virtualisation support I need. It has no built-in GPU and supports up to 32GB of ECC RAM. Intel have released a newer E3-1231 v3, but I couldn't find a good price for it in the UK and all it gives is an extra 100MHz, which I'm not fussed about.

RAM is quite pricey at the moment. While I wanted 32GB, I’ll start with 16GB for now and add another 16GB when prices drop.

For the motherboard, I went with the SuperMicro X10SLL-F. It supports both the CPU and RAM and also has built-in IPMI. The board has two onboard Intel NICs, an I217LM and an I210AT, plus an onboard VGA adapter. I'm not going to use the VGA, but it will be handy if I can't log into either the server or the IPMI. It also has a Type-A USB port on the board itself, which is quite handy as you'll see later.

For the PSU, I wanted both silent and efficient. I don’t need a huge amount of wattage either as I have no GPU. I ended up with the Seasonic G-360. 80-Plus certified and very quiet.

Part of the problem with an ESXi whitebox is ensuring that VMWare recognises all your components. I did extensive research in order to ensure this was the case. There are a couple of things I hit upon, but they were easily fixed.

Final part list:

  • Intel Xeon E3-1230 V3
  • SuperMicro X10SLL-F
  • 2 X 8GB Crucial DDR3 PC3-12800 ECC Unbuffered
  • Seasonic G-360 PSU
  • Aerocool Dead Silence Gaming Cube Case
  • Old 1GB USB flash drive

Building and installing

The SuperMicro board has a dedicated IPMI port, so I can do the entire install remotely. I'll mount the ISO over the network and do all the config this way. This is the screen you see when first logging onto the IPMI web interface:
[Screenshot: SuperMicro IPMI web interface]
I decided to install VMware ESXi itself on a USB stick. What's nice about this motherboard is the USB port on the board itself, meaning no external USB key is required. This keeps things a bit neater.

The SuperMicro has two external NICs, an Intel I217 and an Intel I210. I've installed VMware ESXi 5.5 update 2 and the Intel I210 is supported out of the box, with no need to hack any drivers into the ISO. I'm more than happy with one NIC for now, so I've no need to try to get the I217 working.

Once VMware was installed, I created a 300GB iSCSI LUN on my Synology and attached VMware to it. The install and setup really was painless.

VMware shows my system as:
[Screenshot: ESXi host summary showing the system hardware]

Virtual devices

I wanted to start a basic lab, so I have the following VMs running:
[Screenshot: list of VMs running in the lab]

With all my VMs running, I see hardly any CPU usage and quite a bit of RAM usage, as I expected:
[Screenshot: ESXi resource usage summary]
For now the RAM amount is fine. As I ramp up the lab and prices drop, I’ll add another 16GB to the system.

Power usage

As I wanted this to be low power, I’ve done full wattage readings on power usage.

  • Server off, IPMI on – 3.7 Watts
  • Server on, no VMs running – 23 Watts
  • Server on, all lab VMs running – 34 Watts

Not at all bad. In another post I’ll show the Synology power draw as well as the power draw if all VMs are using full CPU. I’ll also go over how I automate my VMs starting and shutting down.

Basic OOP Python

I’ve just started with object oriented programming in Python so I thought I’d cover some of the basics here. Please don’t assume this is a thorough tutorial on OOP!

The beauty of OOP is that it allows me to create a template from which I can create objects. The building blocks of the object sit in the class, while the object itself is created from that blueprint with various properties. I can then make as many of these objects as I desire from that single class. I can see this being very beneficial for certain types of programming (games especially).

Basics

I’ll start with something simple. I want to create a class called Ball. This class requires certain properties like radius and colour. I’ll create a class like so:

class Ball:
    def __init__(self, radius, colour):
        self.radius = radius
        self.colour = colour

I've created a class called 'Ball', which requires two arguments: radius and colour. Note that 'self' refers to the object being created. Once this is done, I can call the class to create objects. I need to ensure I pass both arguments, otherwise I get an error:

>>>blueball = Ball("Blue")

Traceback (most recent call last):
  File "", line 1, in 
    blueball = Ball("Blue")
TypeError: __init__() takes exactly 3 arguments (2 given)

Let’s do it properly:

>>> blueball = Ball(10, "Blue")

blueball is now an object created from the class:

>>> blueball
<__main__.Ball instance at 0x10f11bdd0>
>>> print type(blueball)
<type 'instance'>

Each self.x in the class is an attribute which I can set and interrogate. If I want to see the current radius of blueball, I simply access the attribute:

>>> blueball.radius
10

I can also change the attribute later if I so choose:

>>> blueball.radius = 20
>>> blueball.radius
20

If I simply print blueball, I won’t get much:

>>> print blueball
<__main__.Ball instance at 0x10f11bdd0>

In order to get readable string output from an object, I need to ensure the class has a __str__ method. I'll add the following to my original class:

class Ball:
    def __init__(self, radius, colour):
        self.radius = radius
        self.colour = colour

    def __str__(self):
        return "I am a " + self.colour + " ball, with a radius of " + str(self.radius)

Note that my old blueball object was still created from the old class, so I'll simply create a new one:

>>> blueball = Ball(10, "Blue")
>>> print str(blueball)
I am a Blue ball, with a radius of 10

I can now happily create as many objects as I want, with different properties. Each object is a separate instance:

>>> redball = Ball(20, "Red")
>>> print str(redball)
I am a Red ball, with a radius of 20


Let's take this a step further. For this I'm going to use Scott Rixner's CodeSkulptor, as it has some nice drawing capabilities built in. It also runs directly in your browser.

I'd like to create an empty space with nothing in it. I then want to click a button to create a new ball with random properties, and that ball should then be displayed inside the space. Each click should create a new object from my original class. I'm going to add a few more properties to the class, which will become clear later.

I’ll first get my global variables set:

list_of_balls = []
width = 1000
height = 600
colours = ["Aqua","Blue","Green","Lime","Maroon","Navy","Orange","Red","White","Yellow"]

Now comes the Ball class:

class Ball:
    def __init__(self, radius, mass, colour, x_location):
        self.radius = radius
        self.mass = mass
        self.colour = colour
        self.location = [x_location, height/2]

When I click the mouse button, I want certain random properties set for the object:

def click():
    radius = random.randint(1,40)
    mass = radius
    colour = random.choice(colours)
    x_location = random.randint(20, width-20)
    new_ball = Ball(radius, mass, colour, x_location)
    list_of_balls.append(new_ball)

The mouse click handler creates a new ball with random properties, then appends that ball to a list of balls. Each time I click a new ball is added to the list.

I now need to draw the balls. I need to iterate through my list of objects and draw each one:

def draw(canvas):
    for ball in list_of_balls:
        canvas.draw_circle((ball.location[0],ball.location[1]), ball.radius, 1, ball.colour, ball.colour)
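
The snippets above leave out the CodeSkulptor boilerplate that ties everything together. Roughly, the wiring would look something like the following (the frame title and button label are my own choices, and simplegui is CodeSkulptor's built-in module):

import simplegui
import random

frame = simplegui.create_frame("Balls", width, height)
frame.add_button("Add ball", click)    # each press creates a new Ball object
frame.set_draw_handler(draw)           # redraws the canvas roughly 60 times a second
frame.start()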

Click here to view and run the code in CodeSkulptor. Press play in the top left corner to run the code.

Dynamic

So the above code simply creates a bunch of balls at random places on the x axis, all starting in the middle of the y axis. Let's start moving these balls around based on their initial mass. Currently the bigger the ball, the bigger the mass, and I'll keep it that way for now. All balls will simply fall towards the ground. I'd like to make sure that when a ball hits the bottom of the screen it's reflected back up, and the same goes for the top of the screen.

First I need to add a velocity to the object. Note that an object's properties don't all have to come from arguments; I could set the velocity of every ball to exactly the same value. For now I'll base it on the mass of the ball:

class Ball:
    def __init__(self, radius, mass, colour, x_location):
        self.radius = radius
        self.mass = mass
        self.velocity = self.mass
        self.colour = colour
        self.location = [x_location, height/2]

I then need to update my draw handler so it updates the position of each ball. If the y location of a ball hits the top or bottom of the screen, its direction is reversed:

def draw(canvas):
    for ball in list_of_balls:
        ball.location[1] += ball.velocity
        if ball.location[1] >= (height - ball.radius) or ball.location[1] <= ball.radius:
            ball.velocity = -ball.velocity
        canvas.draw_circle((ball.location[0],ball.location[1]), ball.radius, 1, ball.colour, ball.colour)

Click here to open my code in CodeSkulptor. Now every time you add a new ball, it’ll start bouncing against the top and bottom wall. Create as many balls as you like!

Conclusions

It's still early days in my OOP work. I can see this approach is perfect for applications like games. I'm not 100% sure whether I'll have an application for it in my type of coding; I'll have to see.

In the interim, the basics of OOP aren't that difficult. I've still got a long way to go, but I'm happy so far!

For future work on my code above, it would be trivial to give each ball a random mass. It would also be nice to extend the balls to bounce as they would in real life (a rough sketch of that follows below), or to get the balls to bounce off each other. Either way, the properties of each ball are independent of the environment in which it sits.
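
As a taste of the real-life bouncing idea, making the balls accelerate under gravity rather than fall at a constant speed only needs a small change to the draw handler. A rough, untested sketch, where the gravity value of 1 is arbitrary and there's no energy loss, so the balls bounce forever:

GRAVITY = 1    # arbitrary downward acceleration per frame

def draw(canvas):
    for ball in list_of_balls:
        ball.velocity += GRAVITY               # speed up while falling, slow down while rising
        ball.location[1] += ball.velocity
        # reverse direction at the floor and the ceiling
        if ball.location[1] >= (height - ball.radius) or ball.location[1] <= ball.radius:
            ball.velocity = -ball.velocity
        canvas.draw_circle((ball.location[0], ball.location[1]), ball.radius, 1, ball.colour, ball.colour)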
