Life is busy

I’ve had zero time to update the blog recently. As some of you may know, I recently started a new job with Google, and I’ve moved my family and myself over from the UK to Dublin, Ireland.

To say I’m busy right now is an understatement. Not only is there a ton of reading for me to do at work, but I’m also trying to sort out all the issues of moving country again.

There will be updates coming eventually, but when that will be I’m not yet certain. I’m working on regular posts as well as a new part of the site that’ll eventually go live. I’m also working on a few Twitter-based apps. Two already in the wild are my @bgp4_table and @bgp6_table accounts.

In the meantime, feel free to continue commenting. I read everything and try to reply when possible.

Python and MySQL

Let me preface this post by stating that I am not a database expert; I only use databases now and then. This post probably doesn’t show best practices, so if you have any suggestions, feel free to comment.

Over the weekend I’ve been testing various ways to store, update, and retrieve data from a database. At first I was using the sqlite3 package, but I turned to MySQL. Why the move, you ask? I already had MySQL running on the target machine, with scripts backing up all databases every night, so why not use the existing database?

For this post I’m using Python 2.7.3, Debian 7.7.0, and MySQL-python 1.2.5.

Install and set-up

Ensure that the MySQL Python library is installed:

root@python-db:/tmp# pip install MySQL-python
Collecting MySQL-python
  Downloading MySQL-python-1.2.5.zip (108kB)
    100% |################################| 110kB 6.4MB/s
Installing collected packages: MySQL-python

I’ve also installed MySQL itself on the test server. The root password is simply ‘password’, as this is a test box.

I’ll now create a database to work on. I want it to contain BGP AS numbers, AS names, a QUERIED counter, and a date field to record when the AS last changed. I’ll also create a user with full access to this database; its password, for now, will also be ‘password’.

root@python-db:~# mysql -u root -p
Enter password:

mysql> create database AS_NUM_NAME;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE USER 'asnuser'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL ON AS_NUM_NAME.* TO 'asnuser'@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> USE AS_NUM_NAME;
Database changed

mysql> create table ASN(AS_NUM INT, AS_NAME VARCHAR(50), QUERIED INT, CHANGED DATE);
Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye

Add data

The MySQLdb library allows us to open a connection to the database and execute SQL statements. The next script will open the database, insert new data, commit those changes, then close the connection:

#!/usr/bin/python

import MySQLdb
import time

today = time.strftime("%Y-%m-%d")

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")

with db:
    cursor = db.cursor()
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                    VALUES (%s, %s, %s, %s)'''
    cursor.execute(sql, (1, 'Level 3 Communications, Inc.', 0, today))

I’ll give this a run:

root@python-db:~# ./add.py

Log into the database to see the changes:

root@python-db:~# mysql -u asnuser -p
Enter password:

mysql> use AS_NUM_NAME;

mysql> select * from ASN;
+--------+------------------------------+---------+------------+
| AS_NUM | AS_NAME                      | QUERIED | CHANGED    |
+--------+------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
+--------+------------------------------+---------+------------+
1 row in set (0.00 sec)

Of course, doing a single insert is not very useful. I’ll create a text file with the first 20 AS numbers and names in use:

1     Level 3 Communications, Inc.
2     University of Delaware
3     Massachusetts Institute of Technology
4     University of Southern California
5     Symbolics, Inc.
6     Bull HN Information Systems Inc.
7     UK Defence Research Agency
8     Rice University
9     Carnegie Mellon University
10    CSNET Coordination and Information Center (CSNET-CIC)
11    Harvard University
12    New York University
13    Headquarters, USAISC
14    Columbia University
15    DYNAMICS
16    Lawrence Berkeley National Laboratory
17    Purdue University
18    University of Texas at Austin
19    Leidos, Inc.
20    University of Rochester

I want to extract the first number, then treat everything after it as the AS name. I’ll then stick them all in the database:

#!/usr/bin/python

import MySQLdb
import time

today = time.strftime("%Y-%m-%d")
as_list = []

with open ('20_asn') as f:
    new_as = f.readlines()
for AS in new_as:
    as_list.append(AS.strip())
db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")

with db:
    cursor = db.cursor()
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                    VALUES (%s, %s, %s, %s)'''
    for AS in as_list:
        split_as = AS.split(' ', 1)
        cursor.execute(sql, (int(split_as[0].strip()), split_as[1].strip(), 0, today))

After a quick run, let’s log back into the database and check what we have:

mysql> SELECT * FROM ASN WHERE AS_NUM = 14;
+--------+---------------------+---------+------------+
| AS_NUM | AS_NAME             | QUERIED | CHANGED    |
+--------+---------------------+---------+------------+
|     14 | Columbia University |       0 | 2015-01-06 |
+--------+---------------------+---------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM ASN WHERE AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       0 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Retrieve data

I’ll write another script now that takes an AS number as an argument and prints out the AS name:

#!/usr/bin/python

import MySQLdb
import sys

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()
# Parameterised query; safer than concatenating sys.argv[1] into the SQL
sql = "SELECT AS_NAME FROM ASN WHERE AS_NUM = %s"
with db:
    cursor.execute(sql, (sys.argv[1],))
    row = cursor.fetchone()
    print row[0]

Note: results are always returned as tuples, even when a single value comes back. This is why I print only the first item of the returned tuple with [0].
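To make that tuple shape concrete, here’s a tiny sketch (the value is just the sample row from above, not a live query):

```python
# A one-column SELECT still comes back as a 1-tuple per row,
# which is why the script indexes with [0].
row = ("Symbolics, Inc.",)  # what fetchone() returns for one column

print(row[0])  # the value itself
print(row)     # the whole 1-tuple
```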

A quick run gives me what I need:

root@python-db:~# ./check.py 5
Symbolics, Inc.
root@python-db:~# ./check.py 6
Bull HN Information Systems Inc.
root@python-db:~# ./check.py 7
UK Defence Research Agency

Updating

I’d like to update the QUERIED field each time I query a value, so I’ll rewrite the last script to increment that field whenever it gets a result. First it should fetch the row (AS name and QUERIED value), then increment QUERIED by 1 and print the AS name:

#!/usr/bin/python

import MySQLdb
import sys

query_as = sys.argv[1]

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()
sql = "SELECT * FROM ASN WHERE AS_NUM = %s"
with db:
    cursor.execute(sql, (query_as,))
    row = cursor.fetchone()
    queried = row[2] + 1
    sql = """
        UPDATE ASN
        SET QUERIED = %s
        WHERE AS_NUM = %s"""
    cursor.execute(sql, (queried, query_as))
    print row[1]

First, check in MySQL that the QUERIED value is 0:

mysql> select * from ASN where AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       0 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Query that value three times:

root@python-db:~# ./check.py 18
University of Texas at Austin
root@python-db:~# ./check.py 18
University of Texas at Austin
root@python-db:~# ./check.py 18
University of Texas at Austin

Now check the database:

mysql> select * from ASN where AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       3 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Very useful.

There was an issue I created earlier that you may have spotted: I created a record for AS 1, and then when I inserted the first 20 I created another record for AS 1. A query shows the duplicate:

mysql> SELECT * FROM ASN WHERE AS_NUM = 1;
+--------+------------------------------+---------+------------+
| AS_NUM | AS_NAME                      | QUERIED | CHANGED    |
+--------+------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
+--------+------------------------------+---------+------------+
2 rows in set (0.00 sec)

The script I used to populate all 20 was fine for populating the database for the first time, but no good for further updates: it simply created new records with new dates. What we want is to check the database first, then take different actions depending on what we find.

I’ll now get a list of the first 30 AS numbers, then run a script to update the table. The script needs to check the following:

  • Does the AS number already exist?
  • If so, check that the name matches (note that I’m storing only 50 characters of the name, so only match the first 50 characters).
  • If the name doesn’t match, update the name and the CHANGED field.
  • Otherwise, create a new record and insert today’s date into the CHANGED field.
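As an aside, MySQL can also do this insert-or-update logic in one statement. This is a sketch of an alternative the post doesn’t use: it assumes AS_NUM has been made a primary key (the table above was created without one, which is why the duplicate was possible).

```sql
-- Hypothetical alternative: enforce uniqueness, then upsert in one statement.
ALTER TABLE ASN ADD PRIMARY KEY (AS_NUM);

INSERT INTO ASN (AS_NUM, AS_NAME, QUERIED, CHANGED)
VALUES (1, 'Level 3 Communications, Inc.', 0, CURDATE())
ON DUPLICATE KEY UPDATE
    -- Only bump CHANGED when the name actually differs
    CHANGED = IF(AS_NAME <> VALUES(AS_NAME), CURDATE(), CHANGED),
    AS_NAME = VALUES(AS_NAME);
```

The IF() must come before the AS_NAME assignment, since ON DUPLICATE KEY UPDATE clauses are evaluated left to right.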

I’ll first delete all records for AS 1 from the database, then write the new script.

mysql> DELETE FROM ASN WHERE AS_NUM = 1;
Query OK, 2 rows affected (0.00 sec)

As this was getting a bit big, I moved the update and create sections into their own functions.

#!/usr/bin/python

import MySQLdb
import sys
import time

as_list = []
already, new, changed = 0, 0, 0
today = time.strftime("%Y-%m-%d")

with open ('30_asn') as f:
    new_as = f.readlines()
for AS in new_as:
    as_list.append(AS.strip())

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()

def create_sql(AS_NUM, AS_NAME):
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                VALUES (%s, %s, %s, %s)'''
    cursor.execute(sql, (AS_NUM, AS_NAME, 0, today))

def update_sql(AS_NUM, AS_NAME):
    # Update the stored name and record the date it changed
    sql = """
        UPDATE ASN
        SET AS_NAME = %s, CHANGED = %s
        WHERE AS_NUM = %s"""
    cursor.execute(sql, (AS_NAME, today, AS_NUM))

with db:
    for AS in as_list:
        split_as = AS.split(' ', 1)
        split_as[1] = split_as[1].lstrip()
        sql = "SELECT * FROM ASN WHERE AS_NUM = %s"
        cursor.execute(sql, (split_as[0],))
        row = cursor.fetchone()
        if row:
            if row[1] == split_as[1][:50].strip():
                already += 1
            else:
                changed += 1
                update_sql(int(split_as[0].strip()), split_as[1].strip())
        else:
            new += 1
            create_sql(int(split_as[0].strip()), split_as[1].strip())

print "New AS: " + str(new)
print "Changed: " + str(changed)
print "Unchanged: " + str(already)

Let’s do a quick run:

root@python-db:~# ./insert.py
New AS: 11
Changed: 0
Unchanged: 20

Great: AS 1 was re-added, plus the 10 new ones. Nothing that was already present had its values changed:

mysql> SELECT * FROM ASN;
+--------+---------------------------------------------------+---------+------------+
| AS_NUM | AS_NAME                                           | QUERIED | CHANGED    |
+--------+---------------------------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc.                      |       0 | 2015-01-06 |
|      2 | University of Delaware                            |       2 | 2015-01-06 |
|      3 | Massachusetts Institute of Technology             |       1 | 2015-01-06 |
|      4 | University of Southern California                 |       0 | 2015-01-06 |
|      5 | Symbolics, Inc.                                   |       0 | 2015-01-06 |
|      6 | Bull HN Information Systems Inc.                  |       0 | 2015-01-06 |
|      7 | UK Defence Research Agency                        |       0 | 2015-01-06 |
|      8 | Rice University                                   |       0 | 2015-01-06 |
|      9 | Carnegie Mellon University                        |       0 | 2015-01-06 |
|     10 | CSNET Coordination and Information Center (CSNET- |       1 | 2015-01-06 |
|     11 | Harvard University                                |       0 | 2015-01-06 |
|     12 | New York University                               |       0 | 2015-01-06 |
|     13 | Headquarters, USAISC                              |       0 | 2015-01-06 |
|     14 | Columbia University                               |       0 | 2015-01-06 |
|     15 | DYNAMICS                                          |       0 | 2015-01-06 |
|     16 | Lawrence Berkeley National Laboratory             |       0 | 2015-01-06 |
|     17 | Purdue University                                 |       0 | 2015-01-06 |
|     18 | University of Texas at Austin                     |       3 | 2015-01-06 |
|     19 | Leidos, Inc.                                      |       0 | 2015-01-06 |
|     20 | University of Rochester                           |       2 | 2015-01-06 |
|     21 | The RAND Corporation                              |       0 | 2015-01-06 |
|     22 | Navy Network Information Center (NNIC)            |       0 | 2015-01-06 |
|     23 | National Aeronautics and Space Administration     |       0 | 2015-01-06 |
|     24 | National Aeronautics and Space Administration     |       0 | 2015-01-06 |
|     25 | University of California at Berkeley              |       0 | 2015-01-06 |
|     26 | Cornell University                                |       0 | 2015-01-06 |
|     27 | University of Maryland                            |       0 | 2015-01-06 |
|     28 | Deutsches Zentrum fuer Luft- und Raumfahrt        |       0 | 2015-01-06 |
|     29 | Yale University                                   |       0 | 2015-01-06 |
|     30 | SRI International                                 |       0 | 2015-01-06 |
+--------+---------------------------------------------------+---------+------------+
30 rows in set (0.00 sec)

Python paths and Cron logging

I created two new Twitter accounts yesterday, and the number of followers in such a short time is great to see. Feel free to follow them here: @bgp4_table and @bgp6_table

The accounts get updated through Python, and that Python script is run via a cron job once every six hours.

I noticed that when I ran my script manually, it worked fine, but when the cron job ran it, nothing happened. With no console to log to, I wondered what the issue was.

Cron log

The first thing I needed to do was configure cron to log its output. I’ve done it like so in my crontab:

0 0,6,12,18 * * * /home/scripts/tweet.py > /home/scripts/tweet.log 2>&1

This directs both stream 1 and stream 2 output to the log file I created. In case you’re not aware, standard stream 0 is input, stream 1 is output (stdout), and stream 2 is errors (stderr). The command above ensures I’m logging both output, if any, and errors, if any.
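A quick way to see the redirection in action, using throwaway echo commands and a hypothetical log file name:

```shell
# One line to stdout, one to stderr; "> file 2>&1" captures both.
{ echo "normal output"; echo "an error message" 1>&2; } > demo.log 2>&1
cat demo.log
```

Note that `>` truncates the log on every run; use `>>` instead if you want to keep the output of previous runs.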

Python paths

On the next cron run, my log was created and it was plain to see:

$ less tweet.log
Traceback (most recent call last):
  File "/home/scripts/tweet.py", line 10, in <module>
    with open("v4_count") as f:
IOError: [Errno 2] No such file or directory: 'v4_count'

As part of my script, I save the last values in text files located in the same path as the script. When the script runs again, it reads the last value, gets the new value, then works out a delta to display. I then write the latest value into that file for the next run. This is an example of one of those reads:

# Pull last delta
with open("v4_count") as f:
    old_v4 = int(f.readline())

This worked fine when I ran the script manually, but I was always running it from within the same folder, which meant Python was finding and opening the file there. The cron job was invoked from somewhere else, where v4_count did not exist.

The simple fix for this was to change all references to the full path:

# Pull last delta
with open("/home/scripts/v4_count") as f:
    old_v4 = int(f.readline())

This time on the next run, no more problems :)
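Another option, rather than hard-coding the full path, is to resolve data files relative to the script’s own location, so it doesn’t matter what directory cron starts it in. A sketch (the `data_path` helper name is my own):

```python
import os

def data_path(name):
    """Return the path of a data file sitting next to this script,
    independent of the current working directory."""
    base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, name)

# e.g. open(data_path("v4_count")) instead of open("v4_count")
print(data_path("v4_count"))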

Using bird to pull global BGP route counts

For an electronics project I’m working on, I wanted a way to check the current global routing table every five minutes for both IPv4 and IPv6. I did not want to log into anyone else’s router or looking glass, as checking every five minutes might be considered abuse.

So I thought I’d spin up a BIRD routing instance. I just wanted to receive the routes, not actually install them into the kernel on my Linux box. From there I’d be able to check the table size being sent over.

Nat Morris helped me out by sending full tables over a multihop BGP session.

Installing BIRD is trivial. To ensure I’m only running BGP and not installing routes into the kernel, my configuration is pretty simple.

/etc/bird.conf:

log syslog all;

router id x.x.x.x;

protocol device { }

protocol bgp {
        local as xxxxx;
        neighbor x.x.x.x as xxxxx;
        multihop;
        password "xxxxx";
}

/etc/bird6.conf:

log syslog all;

router id x.x.x.x;

protocol device { }

protocol bgp {
        local as xxxxx;
        neighbor x:x:x:x:x::x as xxxxx;
        source address x:x:x:x:x:x::x;
        multihop;
        password "xxxxx";
}

To get the figures I need, I’d usually have to log into the daemon’s console like so:

$ birdc
BIRD 1.3.7 ready.
bird> show protocols all bgp1
name     proto    table    state  since       info
bgp1     BGP      master   up     19:06       Established
  Preference:     100
  Input filter:   ACCEPT
  Output filter:  REJECT
  Routes:         511014 imported, 0 exported, 511014 preferred
  Route change stats:     received   rejected   filtered    ignored   accepted
    Import updates:         523721          0          0        826     522895
    Import withdraws:         1313          0        ---          0       1313
    Export updates:         522895     522895          0        ---          0
    Export withdraws:         1313        ---        ---        ---          0
  BGP state:          Established
    Neighbor address: x.x.x.x
    Neighbor AS:      xxxxx
    Neighbor ID:      x.x.x.x
    Neighbor caps:    refresh AS4
    Session:          external multihop AS4
    Source address:   x.x.x.x
    Hold timer:       142/180
    Keepalive timer:  33/60

I could get a script to log in and extract the required information via regular expressions, but there has to be an easier way. It turns out you can push a command directly to bird without logging in first:

$ birdc 'show protocols all bgp1'
BIRD 1.3.7 ready.
name     proto    table    state  since       info
bgp1     BGP      master   up     19:06       Established
  Preference:     100
  Input filter:   ACCEPT
  Output filter:  REJECT
  Routes:         511025 imported, 0 exported, 511025 preferred
  Route change stats:     received   rejected   filtered    ignored   accepted
    Import updates:         523924          0          0        839     523085
    Import withdraws:         1329          0        ---          0       1329
    Export updates:         523085     523085          0        ---          0
    Export withdraws:         1329        ---        ---        ---          0
  BGP state:          Established
    Neighbor address: x.x.x.x
    Neighbor AS:      xxxxx
    Neighbor ID:      x.x.x.x.x
    Neighbor caps:    refresh AS4
    Session:          external multihop AS4
    Source address:   x.x.x.x
    Hold timer:       161/180
    Keepalive timer:  24/60

Still too much information, but we can use grep!

$ birdc 'show protocols all bgp1' | grep 'Routes'
  Routes:         510977 imported, 0 exported, 510977 preferred

Better, but those fields are nicely whitespace-separated, so awk to the rescue! I’d like just the route count. Easily done:

$ birdc 'show protocols all bgp1' | grep 'Routes' | awk '{print $2}'
510976
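The same extraction can also be done in Python without shelling out to grep and awk. This is a sketch; the parsing function is my own, applied here to a sample line from the birdc output above rather than a live session:

```python
def parse_route_count(birdc_output):
    """Pull the imported route count out of 'show protocols all' output."""
    for line in birdc_output.splitlines():
        fields = line.split()
        # The line we want looks like: "Routes: 511025 imported, ..."
        if fields[:1] == ["Routes:"]:
            return int(fields[1])
    return None

sample = "  Routes:         511025 imported, 0 exported, 511025 preferred"
print(parse_route_count(sample))  # 511025
```

In the real script you’d feed it the output of `subprocess.check_output(["birdc", "show protocols all bgp1"])`.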

I can now cron a script that pulls those values once every 5 minutes and generates an XML file, which you can see right here.

Feel free to query that page and use it for your own projects. Just be aware there is NO SLA on it :)

When and when not to multithread

At the end of my last post on Python multithreading, I said my example was not the best. Let me expand some more on this.

While testing code for the previous post, I noticed that certain code was slower when multiple threads were running. These threads are also not tied to a CPU. If we were talking about a bigger application where we wanted to ensure multiple threads ran on different CPUs, we’d in fact be looking at multiprocessing.

Consider the following code. It simply counts from 0 to 99,999, doubles each number, then prints it to the screen. First I’ll do this as a single-threaded app, then multithreaded, and time them both.

Single-thread

#!/usr/bin/python

for i in range (100000):
    i *= 2
    print i

Multi-thread

#!/usr/bin/python

import threading
lock = threading.Lock()

def thread_test(i):
    i *= 2
    with lock:
        print i

threads = []
for i in range (100000):
    t = threading.Thread(target = thread_test, args = (i,))
    threads.append(t)
    t.start()

I’ll now time each version. I’ll run each command three times and take the average of all three:

time ./single.py

The single-threaded version does this in 0.411 seconds on average, while the multithreaded app takes a full 16.409 seconds.

Now I’ll do a test in which multithreading makes a big difference. I have a list of 500 random URLs; I want to fetch each one and print the page contents. Not all URLs respond, and I’ve given each fetch a three-second timeout.

The single thread app is coded like so:

#!/usr/bin/python

import urllib2

with open("urls.txt", "r") as f:
    urls = f.readlines()

for url in urls:
    request = urllib2.Request(url)
    try:
        response = urllib2.urlopen(request, timeout = 3)
        page = response.read()
        print page
    except:
        print "Unable to download"

This takes a full 11 minutes and 40 seconds to fully run.

Converted to multithread:

#!/usr/bin/python
 
import urllib2
import threading

lock = threading.Lock()

def thread_test(url):
    try:
        response = urllib2.urlopen(url, timeout = 3)
        page = response.read()
        with lock:
            print page
    except:
        with lock:
            print "Unable to download"

with open("urls.txt", "r") as f:
   urls = f.readlines()
threads = []

for url in urls:
    request = urllib2.Request(url)
    t = threading.Thread(target = thread_test, args = (request,))
    threads.append(t)
    t.start()

This time the average over 3 runs is only 1 minute and 40 seconds.

I am, however, still locking output to the screen. This may be bad practice, but let’s assume I don’t care about visible output. Maybe I just want to throw some commands somewhere, or do something simple like a ping. If I didn’t lock before printing, how quickly could this actually run?

#!/usr/bin/python
 
import urllib2
import threading

def thread_test(url):
    # No locking or printing this time; just fetch the page and discard it
    try:
        response = urllib2.urlopen(url, timeout = 3)
        page = response.read()
    except:
        pass

with open("urls.txt", "r") as f:
   urls = f.readlines()
threads = []

for url in urls:
    request = urllib2.Request(url)
    t = threading.Thread(target = thread_test, args = (request,))
    threads.append(t)
    t.start()

This completes in 1 minute and 13 seconds. Not as big a drop as I hoped for, but it does tell us one thing: Python is not running ALL the threads at exactly the same time. If it were, the maximum run time would be just over three seconds, since that’s the timeout.

I’ll load up Wireshark and run the test again to see how many threads are sending HTTP GETs at the same time. When I start the threads, I can see 26 of them starting within a second of each other. Only a full 7 seconds later do others start:
[Screenshot: Wireshark capture of the initial burst of 26 requests]

After that, I see more threads being added as others finish. The timing looks random later on, as each page has a different response time.
[Screenshot: Wireshark capture of the later, staggered requests]

This seems to be an OS-imposed limit.
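An alternative the post doesn’t use, but which fits this workload, is a fixed-size thread pool: rather than starting one thread per URL and letting the OS throttle them, cap the concurrency yourself. A sketch using Python 3’s concurrent.futures (available on 2.7 via the `futures` backport), with a trivial stand-in for the URL fetch:

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # Stand-in for the real job, e.g. fetching a URL with a timeout
    return n * 2

# At most 8 tasks run at once; map() preserves input order in its results.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(work, range(5)))

print(results)
```

With real URLs you’d pass the fetch function instead of `work`, and the pool size becomes an explicit knob to tune instead of an OS surprise.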

Conclusions

  • Multithreading in Python has benefits only in specific cases.
  • Mainly when you’re requesting data from many different sources: there’s no need to query them one at a time.
  • Python’s initial single thread is rather efficient all by itself.

I’m certainly not going to rewrite all my current code to use multithreading. Going back to my original OSPF checker, it would certainly be good to pull information off multiple devices at the same time, but I’d still keep the rest of the app single-threaded.

Various networking ramblings from Dual CCIE #38070 (R&S, SP) and JNCIE-SP #2227

© 2009-2015 Darren O'Connor All Rights Reserved