Python and MySQL

Let me preface this post by stating I am not a database expert. I use them occasionally now and then. The below post probably doesn’t show best practices. If you have any suggestions feel free to comment.

Over the weekend I’ve been testing various ways for me to store, update, and retrieve data from a database. At first I was using the sqlite3 package, but turned to MySQL. Why the move you ask? Well I already had MySQL running on the target machine and already had scripts backing up all databases every night. Why not just use the existing database there?

For this post I’m using python 2.7.3, Debian 7.7.0, and MySQL-python 1.2.5.

Install and set-up

Ensure that the mysql python library is installed:

root@python-db:/tmp# pip install MySQL-python
Collecting MySQL-python
  Downloading MySQL-python-1.2.5.zip (108kB)
    100% |################################| 110kB 6.4MB/s
Installing collected packages: MySQL-python

I’ve also installed mysql on the test server. I’m using a root password of password simply as this is a test box.

I’ll now create a database to work on. I want a database that contains BGP AS numbers, AS Names, a queried field, and a data to know when last the AS changed. A user will be created that has full access to this database. Password for now will be password.

root@python-db:~# mysql -u root -p
Enter password:

mysql> create database AS_NUM_NAME;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE USER 'asnuser'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL ON AS_NUM_NAME.* TO 'asnuser'@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> USE AS_NUM_NAME;
Database changed

mysql> create table ASN(AS_NUM INT, AS_NAME VARCHAR(50), QUERIED INT, CHANGED DATE);
Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye

Add data

The mysql library allows us to open a connection to the database and execute sql commands. The next script will open the database, insert new data, commit those changed, then close the database:

#!/usr/bin/python

import MySQLdb
import time

today = time.strftime("%Y-%m-%d")

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")

with db:
    cursor = db.cursor()
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                    VALUES (%s, %s, %s, %s)'''
    cursor.execute(sql, (1, 'Level 3 Communications, Inc.', 0, today))

I’ll give this a run:

root@python-db:~# ./add.py

Log into the database to see the changes:

root@python-db:~# mysql -u asnuser -p
Enter password:

mysql> use AS_NUM_NAME;

mysql> select * from ASN;
+--------+------------------------------+---------+------------+
| AS_NUM | AS_NAME                      | QUERIED | CHANGED    |
+--------+------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
+--------+------------------------------+---------+------------+
1 row in set (0.00 sec)

Of course, doing an single update is not very useful. I’ll create a text file with the first 20 AS numbers and names in use:

1     Level 3 Communications, Inc.
2     University of Delaware
3     Massachusetts Institute of Technology
4     University of Southern California
5     Symbolics, Inc.
6     Bull HN Information Systems Inc.
7     UK Defence Research Agency
8     Rice University
9     Carnegie Mellon University
10    CSNET Coordination and Information Center (CSNET-CIC)
11    Harvard University
12    New York University
13    Headquarters, USAISC
14    Columbia University
15    DYNAMICS
16    Lawrence Berkeley National Laboratory
17    Purdue University
18    University of Texas at Austin
19    Leidos, Inc.
20    University of Rochester

I want to extract the first number, then everything will be part of the AS name. I’ll then stick them all in the database:

#!/usr/bin/python

import MySQLdb
import time

today = time.strftime("%Y-%m-%d")
as_list = []

with open ('20_asn') as f:
    new_as = f.readlines()
for AS in new_as:
    as_list.append(AS.strip())
db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")

with db:
    cursor = db.cursor()
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                    VALUES (%s, %s, %s, %s)'''
    for AS in as_list:
        split_as = AS.split(' ', 1)
        cursor.execute(sql, (int(split_as[0].strip()), split_as[1].strip(), 0, today))

After a quick run, let’s log back into the database and check what we have

mysql> SELECT * FROM ASN WHERE AS_NUM = 14;
+--------+---------------------+---------+------------+
| AS_NUM | AS_NAME             | QUERIED | CHANGED    |
+--------+---------------------+---------+------------+
|     14 | Columbia University |       0 | 2015-01-06 |
+--------+---------------------+---------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM ASN WHERE AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       0 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Retrieve data

I’ll write another script now that takes an AS number as an argument and prints out the AS name:

#!/usr/bin/python

import MySQLdb
import sys

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()
sql = "SELECT AS_NAME FROM ASN where AS_NUM = " + sys.argv[1]
with db:
    cursor.execute(sql)
    row = cursor.fetchone()
    print row[0]

Note: Replies are always given as tuples, even when a single value is returned. This is why I’m only printing the first item in the returned tuple with [0]

A quick run gives me what I need:

root@python-db:~# ./check.py 5
Symbolics, Inc.
root@python-db:~# ./check.py 6
Bull HN Information Systems Inc.
root@python-db:~# ./check.py 7
UK Defence Research Agency

Updating

I’d like to update the QUERIED field each time I query a value. I’ll rewrite the last script to update that field when it gets a result. First it should get the AS Name and the QUERIED value. Then update the QUERIED value by 1, and print the AS name:

#!/usr/bin/python

import MySQLdb
import sys

query_as = sys.argv[1]

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()
sql = "SELECT * FROM ASN where AS_NUM = " + sys.argv[1]
with db:
    cursor.execute(sql)
    row = cursor.fetchone()
    queried = row[2]
    queried += 1
    sql = """
        UPDATE ASN
        SET QUERIED = %s
        WHERE AS_NUM = %s"""
    cursor.execute(sql, (queried, query_as))
    print row[1]

First, my mysql check the queried value is 0:

mysql> select * from ASN where AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       0 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Query that value three times:

root@python-db:~# ./check.py 18
University of Texas at Austin
root@python-db:~# ./check.py 18
University of Texas at Austin
root@python-db:~# ./check.py 18
University of Texas at Austin

Now check the database:

mysql> select * from ASN where AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       3 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Very useful.

There was an issue I created earlier that you may have spotted. I created a record for ASN1, then when I created the first 20 I created another ASN1. This can be shown via a query:

mysql> SELECT * FROM ASN WHERE AS_NUM = 1;
+--------+------------------------------+---------+------------+
| AS_NUM | AS_NAME                      | QUERIED | CHANGED    |
+--------+------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
+--------+------------------------------+---------+------------+
2 rows in set (0.00 sec)

The script I used to populate all 20 was fine to populate a database for the first time, but no good for further updates. The script simply created new records, with new dates. What we want is to check the database first, then do different actions depending on what we see.

I’ll now get a list of the first 30 AS numbers, then run a script to update. I’ll need to check the following:

  • Does the AS Number already exist?
  • If so, check that the name matches (note, I’m storing only 50 characters of the name, so only match the first 50 characters)
  • If the name doesn’t match, update the name and updated the CHANGED field.
  • Otherwise just create a new record and insert todays date into the CHANGED field.

I’ll first delete all records with ASN 1 from the database then write the new script.

mysql> DELETE FROM ASN WHERE AS_NUM = 1;
Query OK, 2 rows affected (0.00 sec)

As this was getting a bit big, I moved the update and create sections into their own methods.

#!/usr/bin/python

import MySQLdb
import sys
import time

as_list = []
already, new, changed = 0, 0, 0
today = time.strftime("%Y-%m-%d")

with open ('30_asn') as f:
    new_as = f.readlines()
for AS in new_as:
    as_list.append(AS.strip())

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()

def create_sql(AS_NUM, AS_NAME):
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                VALUES (%s, %s, %s, %s)'''
    cursor.execute(sql, (AS_NUM, AS_NAME, 0, today))

def update_sql(AS_NUM, AS_NAME):
    sql = """
        UPDATE ASN
        SET QUERIED = %s
        WHERE AS_NUM = %s"""
    cursor.execute(sql, (AS_NUM, AS_NAME))

with db:
    for AS in as_list:
        split_as = AS.split(' ', 1)
        split_as[1] = split_as[1].lstrip()
        sql = "SELECT * FROM ASN where AS_NUM = " +str(split_as[0])
        cursor.execute(sql)
        row = cursor.fetchone()
        if row:
            if row[1] == split_as[1][:50].strip():
                already += 1
                pass
            else:
                changed += 1
                update_sql(int(split_as[0].strip()), split_as[1].strip())
        else:
            new += 1
            create_sql(int(split_as[0].strip()), split_as[1].strip())

print "New AS: " + str(new)
print "Changed: " + str(changed)
print "Unchanged: " + str(already)

Let’s do a quick run:

root@python-db:~# ./insert.py
New AS: 11
Changed: 0
Unchanged: 20

Great, 1 was re-added, plus the 10 new ones. This should add them all, plus not change the values of anything that was already there:

mysql> SELECT * FROM ASN;
+--------+---------------------------------------------------+---------+------------+
| AS_NUM | AS_NAME                                           | QUERIED | CHANGED    |
+--------+---------------------------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc.                      |       0 | 2015-01-06 |
|      2 | University of Delaware                            |       2 | 2015-01-06 |
|      3 | Massachusetts Institute of Technology             |       1 | 2015-01-06 |
|      4 | University of Southern California                 |       0 | 2015-01-06 |
|      5 | Symbolics, Inc.                                   |       0 | 2015-01-06 |
|      6 | Bull HN Information Systems Inc.                  |       0 | 2015-01-06 |
|      7 | UK Defence Research Agency                        |       0 | 2015-01-06 |
|      8 | Rice University                                   |       0 | 2015-01-06 |
|      9 | Carnegie Mellon University                        |       0 | 2015-01-06 |
|     10 | CSNET Coordination and Information Center (CSNET- |       1 | 2015-01-06 |
|     11 | Harvard University                                |       0 | 2015-01-06 |
|     12 | New York University                               |       0 | 2015-01-06 |
|     13 | Headquarters, USAISC                              |       0 | 2015-01-06 |
|     14 | Columbia University                               |       0 | 2015-01-06 |
|     15 | DYNAMICS                                          |       0 | 2015-01-06 |
|     16 | Lawrence Berkeley National Laboratory             |       0 | 2015-01-06 |
|     17 | Purdue University                                 |       0 | 2015-01-06 |
|     18 | University of Texas at Austin                     |       3 | 2015-01-06 |
|     19 | Leidos, Inc.                                      |       0 | 2015-01-06 |
|     20 | University of Rochester                           |       2 | 2015-01-06 |
|     21 | The RAND Corporation                              |       0 | 2015-01-06 |
|     22 | Navy Network Information Center (NNIC)            |       0 | 2015-01-06 |
|     23 | National Aeronautics and Space Administration     |       0 | 2015-01-06 |
|     24 | National Aeronautics and Space Administration     |       0 | 2015-01-06 |
|     25 | University of California at Berkeley              |       0 | 2015-01-06 |
|     26 | Cornell University                                |       0 | 2015-01-06 |
|     27 | University of Maryland                            |       0 | 2015-01-06 |
|     28 | Deutsches Zentrum fuer Luft- und Raumfahrt        |       0 | 2015-01-06 |
|     29 | Yale University                                   |       0 | 2015-01-06 |
|     30 | SRI International                                 |       0 | 2015-01-06 |
+--------+---------------------------------------------------+---------+------------+
30 rows in set (0.00 sec)