Linked lists

Python has a rather handy list method. It allows you to add and remove items at will. How it actually does this is rather elaborate and you can read all about it over here.

C doesn’t give you the same flexibility. When you create an array, it is of X size. That size cannot change as there is no guarantee that memory on either side of the existing array space is even free for you. For this reason there is a handy data structure called the linked list.

Linked List

A linked list is essentially a struct node with a list item, and then a pointer to the next node. If you need to add an item to a list, you simply create another node in memory. As each item is created, it doesn’t matter where in memory it’s created, as the previous item has a pointer to the new node. A picture really helps here:
linked1
As we shall see, adding, inserting, and removing items are all very easy to do with linked lists. It’s not all roses though. Linked lists use more memory than an array of similar items. Each node has a pointer to the next item which takes up memory. The bigger issue is random access.

Let’s say we initialise an array with 1000 items. If I wanted to get to item 500, the computer would simply look at the address of item 0, work out the start address plus 500 of item size, and read that location.
linked2

You can’t do this with a linked list. Each node is in a separate location. They could be lined up next to each other, but they can just as easily be all over the place. The only way to get to item 500 in a linked list, is to start at node 0, read the pointer to the next node, then keep doing that until you reach node 500.

How do you know when you reach the end of the linked list? You could probably use any kind of escape character, but NULL tends to get used.

Basic implementation

So how do we create a linked list? Here I’ll create a struct called node that will hold an integer. The struct will also hold a pointer to the next node.

#include <stdio.h>

// Struct node with pointer to struct node
typedef struct node {
    int number;
    struct node* next;
}node;

int main(void) {

    // Create four nodes
    node node1;
    node node2;
    node node3;
    node node4;
    
    // Place numbers inside the four nodes we made
    node1.number = 10;
    node2.number = 20;
    node3.number = 30;
    node4.number = 40;
    
    // Each node will point to the next node
    node1.next = &node2;
    node2.next = &node3;
    node3.next = &node4;
    node4.next = NULL;
}

Here I have created four nodes. Each points to the next node, except for node 4. It has it’s next pointer assigned to NULL. If you fail to assign this NULL value, the pointer will have whatever value what in that memory location when you create it. That is going to point to some random value which is bad news. In order to traverse the list, we need a pointer to the root, or currently node1’s address. This is a pointer that should never change unless node1 is deleted or a node is inserted before it. We also need a cursor that will be updated with the current node’s address. I’ll create a function that prints out items in the list. The following is added to main()

    // Never lose the root position, otherwise you
    // lose access to your list!
    node* root = &node1;
    
    // Use cur as a cursor to move around
    node* cur;
    cur = root;

    // Print current list
    printList(cur); 

And the printList function is as so:

void printList(struct node* cur) {
    // Traverse through our linked list
    int i = 0;
    while (cur != NULL) {
        printf("Item %i is %i\n", i, cur->number);
        i++;
        
        // Move cursor to next node via pointer
        cur = cur->next;
    }
}

Running the program gives the following:

$ ./list 
Item 0 is 10
Item 1 is 20
Item 2 is 30
Item 3 is 40

Let’s change the print statement now to print out the memory locations of each of these items. Printing the value of node->next should give is the address of the next node. Let’s change the printList function like so:

void printList(struct node* cur) {
    // Traverse through our linked list
    while (cur != NULL) {
        printf("%i is is at position %p\n", cur->number, &cur->number);
        printf("Next node is at position %p\n\n", cur->next);
        
        // Move cursor to next node via pointer
        cur = cur->next;
    }
}

Printing this out gives us the following:

$ ./list 
10 is is at position 0xbff38e68
Next node is at position 0xbff38e60

20 is is at position 0xbff38e60
Next node is at position 0xbff38e58

30 is is at position 0xbff38e58
Next node is at position 0xbff38e50

40 is is at position 0xbff38e50
Next node is at position (nil)

Notice the memory address of each next value is the same as the address of the next value printed. Exactly what we would expect. Also notice that node4’s next value is NULL. I’ve used this very fact in the while loop to know when the list has ended and exit the loop.

Adding to the list

The previous program had me hard code four node values into the program. Let’s change this now so that the program will prompt to add values, and then print them out when needed:

#include <stdio.h>
#include <stdlib.h>

// Struct node with pointer to struct node
typedef struct node {
    int number;
    struct node* next;
}node;

void printList(node* cur) {

    // Traverse through our linked list
    while (cur != NULL) {
        printf("%i is is at position %p\n", cur->number, &cur->number);
        printf("Next node is at position %p\n\n", cur->next);
        
        // Move cursor to next node via pointer
        cur = cur->next;
    }
}

void addNode(node* cur) {

    // Traverse through our linked list
    // Need to get to end of list in order to append
    if (cur != NULL) {
        while (cur->next != NULL) {
            cur = cur->next;
        }
    }
    
    // Now at the the end of the list, ensure the current last node
    // points to a new node we create dynamically
    cur->next = (node*) malloc(sizeof(node));
    
    // Move to the new node
    cur = cur->next;
    
    // Get a value inside that node
    printf("Please enter node value: ");
    scanf("%i", &cur->number);  
}
        
int main(void) {

    // Create initial node
    node node1;
    
    // Get initial value from user
    printf("Please enter inital node value: ");
    scanf("%i", &node1.number);
    
    // Initial node's next should be end of list
    node1.next = NULL;
    
    // Root position of list
    node* root = &node1;
    
    // Use cur as a cursor to move around
    node* cur;
    
    // User controls what to do next
    int choice;
    do
    {
        printf("1 - Add new item to list\n");
        printf("2 - Print list\n");
        printf("3 - Exit\n# ");
        
        scanf("%i", &choice);
        
        switch (choice) {
            case 3:
                return 0;

            case 2:
                cur = root;
                printList(cur);
                break;
                
            case 1:
                cur = root;
                addNode(cur);
         }
                
    } while (choice != 3);
}

Let’s add a couple of values and then print them out:

: ./list
Please enter inital node value: 100
1 - Add new item to list
2 - Print list
3 - Exit
# 1
Please enter node value: 200
1 - Add new item to list
2 - Print list
3 - Exit
# 2
100 is is at position 0xbf97dc18
Next node is at position 0x875a008

200 is is at position 0x875a008
Next node is at position (nil)

1 - Add new item to list
2 - Print list
3 - Exit
# 1
Please enter node value: 300
1 - Add new item to list
2 - Print list
3 - Exit
# 1
Please enter node value: 400
1 - Add new item to list
2 - Print list
3 - Exit
# 2
100 is is at position 0xbf97dc18
Next node is at position 0x875a008

200 is is at position 0x875a008
Next node is at position 0x875a018

300 is is at position 0x875a018
Next node is at position 0x875a028

400 is is at position 0x875a028
Next node is at position (nil)

1 - Add new item to list
2 - Print list
3 - Exit
# 3

Inserting into the list

Inserting into a list is very simple too, but it comes with one risky part. Order of operations is quite important. If you get to the position where you want to insert, then create a new node, you need to be sure that you don’t lose the address to the original next node. If you do so, you could lose the rest of your list!

Once again, imagery helps to explain this. Here we have three nodes. We now want to add a new node in the position:
linked3
Here we malloc a new node, and then point the previous node to this new node:
linked4
Oops, how do we now reattach the rest of the list? It’s there in memory, but good luck finding it.
linked5
It’s essential to ensure you either link the new node to the old rest of list first, then change the old node:
linked6
linked7
Or simply save the pointer value that the original node was pointing to, and copy that back into our new node’s pointer value.

I’ve chose the latter in the following code. I’ll only post the changed parts. I’ll create a new function that will do the insertion and call it through main():

    do
    {
        printf("1 - Add new item to list\n");
        printf("2 - Print list\n");
        printf("3 - Insert node\n");
        printf("4 - Exit\n# ");
        
        scanf("%i", &choice);
        
        switch (choice) {
            case 4:
                return 0;
                
            case 3:
                cur = root;
                printf("After which value should I insert? ");
                int match;
                scanf("%i", &match);
                insertNode(cur, match);
                break;

            case 2:
                cur = root;
                printList(cur);
                break;
                
            case 1:
                cur = root;
                addNode(cur);
         }
                
    } while (choice != 4);


void insertNode(node* cur, int match) {

    // Traverse through linked list to find value
    while (cur->next != NULL) {
        
        // We've found the node we're looking for
        if (cur->number == match) {
        
            // save pointer to rest of list
            node* temp = cur->next;
            
            // Create a new node and get current node to point to it
            cur->next = (node*) malloc(sizeof(node));
        
            // Move to the new node
            cur = cur->next;
            
            // Get a value to insert
            printf("What value do you want to insert? ");
            int number;
            scanf("%i", &number);
            cur->number = number;
            
            // Finally ensure new node connects to the rest of the list
            cur->next = temp;
            
            return;
        }
        
        // Otherwise keep going through the list
        cur = cur->next;
    }
}

Let’s add some values and then insert one:

$ ./list 
Please enter inital node value: 10
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Exit
# 1
Please enter node value: 20
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Exit
# 1
Please enter node value: 30
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Exit
# 3
After which value should I insert? 20
What value do you want to insert? 25
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Exit
# 2
10 is is at position 0xbfd42938
Next node is at position 0x831e008

20 is is at position 0x831e008
Next node is at position 0x831e028

25 is is at position 0x831e028
Next node is at position 0x831e018

30 is is at position 0x831e018
Next node is at position (nil)

1 - Add new item to list
2 - Print list
3 - Insert node
4 - Exit
# 4

Deleting a node

Deleting is similar to inserting. Instead of getting a new value, we simply look for the node in question and unlink it from the chain. As per the inserting function, we need to take care that we don’t lose the address of the next node in the chain. It’s a little more tricky as well, as when we remove a node, we need to get back to the previous node so we can set it’s pointer to the next node we need:
linked8
Here I’ve deleted the second node. The first node’s pointer now points to a memory location with garbage data. We’ve also lost the address to the rest of the list!

In my function here, I’ll ensure I always keep the pointer address of the previous node. When we find the node we want to remove, we still have the previous pointer value that we can copy where we need it:

void deleteNode(node* cur, int match) {

    // We need to keep note of previous node
    // otherwise we cannot reconnect it!
    node* previous = cur;

    // Traverse through linked list to find value
    while (cur->next != NULL) {
        
        // We've found the node we're looking for
        if (cur->number == match) {
        
            // Set pointer of last node around this node to new node
            node* temp = cur->next;
            cur = previous;
            cur->next = temp;
            
            return;
            
            // Where to release the RAM?
        }
        
        // Otherwise keep going through the list
        previous = cur;
        cur = cur->next;
    }
}
: ./list 
Please enter inital node value: 10
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Delete node
5 - Exit
# 1
Please enter node value: 20
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Delete node
5 - Exit
# 3
After which value should I insert? 10
What value do you want to insert? 15
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Delete node
5 - Exit
# 2
10 is is at position 0xbfa0b0c8
Next node is at position 0x8bda018

15 is is at position 0x8bda018
Next node is at position 0x8bda008

20 is is at position 0x8bda008
Next node is at position (nil)

1 - Add new item to list
2 - Print list
3 - Insert node
4 - Delete node
5 - Exit
# 4
Which value should I delete? 15
1 - Add new item to list
2 - Print list
3 - Insert node
4 - Delete node
5 - Exit
# 2
10 is is at position 0xbfa0b0c8
Next node is at position 0x8bda008

20 is is at position 0x8bda008
Next node is at position (nil)

1 - Add new item to list
2 - Print list
3 - Insert node
4 - Delete node
5 - Exit
#5

Freeing your nodes

It’s good practice to clean up once you’re done. Just before we exit, I want to ensure that all nodes are removed from memory. As noted in the delete section, we need to ensure that we save the location of the next node. There are a couple of ways to do this. One way is to keep a note of the next node’s address, then delete the current node. then off to the next node and repeat. I’ve done that here:

void deleteList(node* cur) {
    // Traverse through linked list
    while (cur != NULL) {
        node* temp = cur;
        cur = cur->next;
        free (temp);
    }
}

Extras

I’ve not covered all the different ways of manipulating a linked list. What if we want to delete the first or last node for example?
This is a good read if you want to take it further: https://www.cs.bu.edu/teaching/c/linked-list/delete/

You can find my working code for the above right here: https://github.com/mellowdrifter/C-Sandbox/blob/master/linked.c

Structs in C

Back in March 2014 I wrote an app in Python that would log in and check various OSPF properties.

When putting the data into a structure, I was limited both by knowledge and Python at the time. I ended up using a dictionary which worked rather well, but I was never 100% happy with it.

Recently I’ve stumbled across structs in C, in which I can make pretty much any data structure I would like.

The Python app above pulled some very specific results. Looks at each OSPF-enabled interface on a router and get it’s properties. Most of those properties are integers, but the interface name itself is a string. Each interface has n amount of properties, and each router itself has n amount of OSPF interfaces.

An interface therefore looks like so:

Interface name - String
IP address - String
Cost - Int
Adj - Int
Hello timer - Int
Dead timer - Int

And above that, a router looks like so:

Router name - String
Interface - Struct
Interface - Struct

A struct can itself contain structs. I’ll explain more on this later.

Struct

A struct is a data structure. Some amount of RAM is given back by the OS to contain the data you require. If we look back at the interface above, it has a certain amount of fields, and each of those fields contain different data types. Let’s create one in C:

struct ospfInt {
	char intName[15];
	char ipAddress[15];
	int cost;
	int adj;
	int hello;
	int dead;
}; 

Here we have created a data structure, that contains 2 strings with 16 characters, and 4 integers. Let’s use this in a program and manually insert properties into the struct for now. I’ll manually define two interfaces and then pass those structs to a function to print out the various properties:

#include <stdio.h>
#include <string.h>

struct ospfInt {
	char intName[15];
	char ipAddress[15];
	int cost;
	int adj;
	int hello;
	int dead;
};

void printInterface(struct ospfInt interface) {

	// Print out structs passed here
	printf("Interface name: %s\n", interface.intName);
	printf("IP Address: %s\n", interface.ipAddress);
	printf("Cost: %i\n", interface.cost);
	printf("Adj: %i\n", interface.adj);
	printf("Hello: %i\n", interface.hello);
	printf("Dead: %i\n\n", interface.dead);
}

int main(void) {

	// Create two interfaces with type struct ospfInt
	struct ospfInt interface1;
	struct ospfInt interface2;

	//Insert properties into those structs
	strcpy(interface1.intName, "ge-1/3/0.641");
	strcpy(interface1.ipAddress, "10.11.31.227");
	interface1.cost = 10;
	interface1.adj = 1;
	interface1.hello = 10;
	interface1.dead = 40;

	strcpy(interface2.intName, "lo0.0");
	strcpy(interface2.ipAddress, "10.11.225.224");
	interface2.cost = 0;
	interface2.adj = 0;
	interface2.hello = 3;
	interface2.dead = 12;

	// Print out properties via function
	printInterface(interface1);
	printInterface(interface2);
}

The end result of the above being like so:

$ ./structs
Interface name: ge-1/3/0.641
IP Address: 10.11.31.227
Cost: 10
Adj: 1
Hello: 10
Dead: 40

Interface name: lo0.0
IP Address: 10.11.225.224
Cost: 0
Adj: 0
Hello: 3
Dead: 12

typedef struct

Instead of initialising interface1 and interface2 above as type struct, I can just typedef the struct itself. So change this:

struct ospfInt {
	char intName[15];
	char ipAddress[15];
	int cost;
	int adj;
	int hello;
	int dead;
};

to this:

typedef struct {
	char intName[15];
	char ipAddress[15];
	int cost;
	int adj;
	int hello;
	int dead;
}ospfInt;

Now in order to create a structure of type ospfInt above, simply initialise it like so:

	// Create two interfaces with type struct ospfInt
	ospfInt interface1;
	ospfInt interface2;

Remembering to change all references to it to simply state ospfInt:

void printInterface(ospfInt interface) {

	// Print out structs passed here
}

attribute packed

C compilers will align data inside the structure to fit inside 4 byte structures. This has got to do with the way that data is read by CPUs these days. What this means though is that the amount of memory a structure of yours needs might not be the sum total of the data types inside. I’ll create a new structure called interface1 and check each of it’s components size. I’ll then print out the actual size on the system.

typedef struct {
	char intName[15];
	char ipAddress[15];
	int cost;
	int adj;
	int hello;
	int dead;
}ospfInt;

int main(void) {

	// Create an inteface with type struct ospfInt
	ospfInt interface1;

	// Print out size of each datatype inside struct
	printf("Bytes used in intName[15] = %lu\n", sizeof(interface1.intName));
	printf("Bytes used in ipAddress[15] = %lu\n", sizeof(interface1.ipAddress));
	printf("Bytes used in cost = %lu\n", sizeof(interface1.cost));
	printf("Bytes used in adj = %lu\n", sizeof(interface1.adj));
	printf("Bytes used in hello = %lu\n", sizeof(interface1.hello));
	printf("Bytes used in dead = %lu\n", sizeof(interface1.dead));
	printf("Bytes total = %lu", sizeof(interface1.intName) + sizeof(interface1.ipAddress)
									+ sizeof(interface1.cost) + sizeof(interface1.adj)
									+ sizeof(interface1.hello) + sizeof(interface1.dead));

	printf("\n\nActual byes used = %lu\n", sizeof(interface1));

This code gives this output on my Mac:

$ ./structs
Bytes used in intName[15] = 15
Bytes used in ipAddress[15] = 15
Bytes used in cost = 4
Bytes used in adj = 4
Bytes used in hello = 4
Bytes used in dead = 4
Bytes total = 46

Actual byes used = 48

The reason here is because the two strings are taking up 15 bytes each. As this is not on a four byte boundary, each one has been padded with an extra byte to take it to a multiple of four. If I were to change the struct to 16, these values should match up:

typedef struct {
	char intName[16];
	char ipAddress[16];
	int cost;
	int adj;
	int hello;
	int dead;
}ospfInt;
$ ./structs
Bytes used in intName[16] = 16
Bytes used in ipAddress[16] = 16
Bytes used in cost = 4
Bytes used in adj = 4
Bytes used in hello = 4
Bytes used in dead = 4
Bytes total = 48

Actual byes used = 48

You can inform the compiler not to align data this way. If you’re reading structures from somewhere else, it’s important that data values fall on boundaries that you would expect. To remove alignment you create the structure like so:

typedef struct {
	char intName[15];
	char ipAddress[15];
	int cost;
	int adj;
	int hello;
	int dead;
}__attribute__((__packed__))
ospfInt;

As alignment is removed, no padding will be done and the struct should be exactly 46 bytes:

$ ./structs
Bytes used in intName[15] = 15
Bytes used in ipAddress[15] = 15
Bytes used in cost = 4
Bytes used in adj = 4
Bytes used in hello = 4
Bytes used in dead = 4
Bytes total = 46

Actual byes used = 46

Struct of structs

The data types inside structs can be any data type, including other structs. I now want to create the router struct, which contains it’s name, it’s loopback IP address, and all the OSPF enabled interface structures. There are a couple of ways I could do this and I’ll go over both. Let’s first assume that a router will not have more than 500 OSPF enabled interfaces to begin with. I could create a struct of structs like so:

typedef struct {
	char routerName[20];
	char loopAddress[15];
	ospfInt interfaces[500];
}ospfRouter;

Create a function that will then print out the router details, plus its two interfaces like so:

void printRouter(ospfRouter router) {

	// Print out router information and pass
	// interfaces for printing
	printf("Router name: %s\n", router.routerName);
	printf("Loopback address: %s\n\n", router.loopAddress);

	for (int i = 0; i < 2; i++) {
		printInterface(router.interfaces[i]);
	}
}

int main(void) {
	printRouter(router1);
}

The result being:

$ ./structs
Router name: r1.com
Loopback address: 10.10.10.10

Interface name: ge-1/3/0.641
IP Address: 10.11.31.227
Cost: 10
Adj: 1
Hello: 10
Dead: 40

Interface name: lo0.0
IP Address: 10.11.225.224
Cost: 0
Adj: 0
Hello: 3
Dead: 12

Another way would be to define pointers in the router struct. This way you could create a load of interface structs, and have the router struct itself simply point to those variables. I can’t think of any benefit to this though. You would have to keep a load of interface variables and to me it seems it would be better to stick that inside the router struct itself.

Visualising recursive functions

I’ve been doing a lot of coding practise, and recursive functions are one of those things that’s easier to visualise than think about.

At it’s heart, a recursive function is one that calls itself in over and over again until a result is found. At first that doesn’t make sense so let’s look at an example.

Factorial

The factorial of a number is the product off all numbers from that number down to 1. Two examples follow:

3! = 3 X 2 X 1 = 6
4! = 4 X 3 X 2 X 1 = 24
etc...

Looking closely, the factorial of any number is the product of that number and the factorial of the number below. i.e. The factorial of 4 is the same as 4 X the factorial of 3. Carrying on with this example, the factorial of 3 is the same as 3 X the factorial of 2.

This is a perfect use case of a recursive function, as a function can work out the factorial of any number by multiplying it by the factorial of a number 1 smaller. It is important to ensure that this recursiveness does not happen indefinitely. If a function calls itself forever, you’ll have a loop that never ends. In the above factorial, as soon as we get to 1, there is no need to call the function again, rather that should be used as an exit to the function.
a764b1ed93f5fae80373f990de499c79ef0e2b0b3f950cb6b42ed9294de3b947

I’ll write up a quick factorial recursive function in C:

#include <stdio.h>
#include <stdlib.h>

int factorial(int number) {

    //provide a way out of the function
    if (number == 1) {
        return 1;
    }
    
    // otherwide multiply value with factorial of smaller number
    else {
        return number * factorial(number - 1);
    }
}

int main(int argc, char* argv[]) {

    int number = atoi(argv[1]);
    printf("The factorial of %i is %i\n", number, factorial(number));
}

So the program is simple enough. Take a number via an argument, convert it to an integer, then print out the factorial of it. Let’s pass a few values to see the result:

$ ./factorial 3
The factorial of 3 is 6
$ ./factorial 4
The factorial of 4 is 24
$ ./factorial 5
The factorial of 5 is 120
$ ./factorial 6
The factorial of 6 is 720

If I pass a factorial of 4, what I’m actually doing is as follows:

4 X (3 X (2 X (1)))

Each call to factorial will call the function again with a number 1 smaller. This continues until the number is one. This is then returned to the previous function, returned to the previous function, and so on. i.e. The function will be continue to call itself until it gets to 1. That value will then be returned to the calling function and so on. In essence it gets to 1, then works it way out from the inside.

If I left out the case where number == 1, this function would loop forever. Let’s give this a test:

int factorial(int number) {
    return number * factorial(number - 1);
}
$ ./factorial 3
Bus error (core dumped)

Nope. I do wonder how far it got before it gets to this message though.

int factorial(int number) {
    printf("number is now %i\n", number);
    return number * factorial(number - 1);

}
$ ./factorial 3 > error.txt
Bus error (core dumped)

$ tail error.txt 
number is now -261597
number is now -261598
number is now -261599
number is now -261600
number is now -261601
number is now -261602
number is now -261603
number is now -261604
number is now -261605
number is now -261606

So we got to -261606 before this program crashed. But why did it crash? Should this function not just continue on forever, subtracting 1 each time? The reason has to do with how recursive functions are viewed in memory. Looking back at the factorial above, I noted that this is how the function saw the factorial of 4:

4 X (3 X (2 X (1)))

The computer needs to work this out from the inside out. i.e. It needs to get to number 1, to return that value to the previous call, which then passes that number back to the previous call to the function. Each time a function calls itself, that function gets added to the stack.

e7633bedf897bb24ce668ac9c5df6bf88a58ff7e114d27606a756f4c4888a3f1
Let’s visualise this. The program has just started and we have a stack up on which main has been called:
Untitled drawing
main() calls factorial(), which gets added to the stack:
Untitled drawing (1)

There is a finite limit to how big the stack is, and so a function can only call itself so many times. The larger the original number in our factorial calculation, the larger the amount of times our function calls itself. At some point we are going to run out of space on our stack. Remember one we get to the end, the result of the final calculation will be passed to the previous function, where it’s result will then be passed back and so on. Storing those functions and variables take space.
Untitled drawing (2)
At which point the program will crash. I’ll now change the program again to show the memory location of the variable passed into it each time. A new memory location will be used each time, until we run out:

int factorial(int number) {

    int* pointer = &number;
    printf("number is now %i and memory location is %p\n", number, pointer);
    
    return number * factorial(number - 1);
}
$ ./factorial 3 > error.txt
Bus error (core dumped)

$ tail error.txt 
number is now -174516 and memory location is 0xbf68ea14
number is now -174517 and memory location is 0xbf68e9e4
number is now -174518 and memory location is 0xbf68e9b4
number is now -174519 and memory location is 0xbf68e984
number is now -174520 and memory location is 0xbf68e954
number is now -174521 and memory location is 0xbf68e924
number is now -174522 and memory location is 0xbf68e8f4
number is now -174523 and memory location is 0xbf68e8c4
number is now -174524 and memory location is 0xbf68e894

Overflowing the stack is called a stack overflow. So at least you now know where the name stackoverflow.com comes from.

There is a program called valgrind that will actually show us what error was encountered:

$ valgrind ./factorial 3
number is now -174555 and memory location is 0xbe233654
==3145== Stack overflow in thread 1: can't grow stack to 0xbe232ffc
==3145== 
==3145== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==3145==  Access not within mapped region at address 0xBE232FFC
==3145==    at 0x40FAB9C: _IO_file_write@@GLIBC_2.1 (fileops.c:1261)
==3145==  If you believe this happened as a result of a stack
==3145==  overflow in your program's main thread (unlikely but
==3145==  possible), you can try to increase the size of the
==3145==  main thread stack using the --main-stacksize= flag.
==3145==  The main thread stack size used in this run was 8388608.
==3145== Stack overflow in thread 1: can't grow stack to 0xbe232ff8
==3145== 
==3145== Process terminating with default action of signal 11 (SIGSEGV)
==3145==  Access not within mapped region at address 0xBE232FF8
==3145==    at 0x4024510: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-x86-linux.so)
==3145==  If you believe this happened as a result of a stack
==3145==  overflow in your program's main thread (unlikely but
==3145==  possible), you can try to increase the size of the
==3145==  main thread stack using the --main-stacksize= flag.
==3145==  The main thread stack size used in this run was 8388608.
==3145== 
==3145== HEAP SUMMARY:
==3145==     in use at exit: 0 bytes in 0 blocks
==3145==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==3145== 
==3145== All heap blocks were freed -- no leaks are possible
==3145== 
==3145== For counts of detected and suppressed errors, rerun with: -v
==3145== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

So there you have it :)

Life is busy

I’ve had zero time to update the blog recently. As some of you may know, I recently started a new job with Google. I’ve moved my family and I over from the UK to Dublin, Ireland.

To say I’m busy right now is an understatement. Not only is there a ton of reading for me to do at work, I’m also trying to sort out all the issues of moving country again.

There will be updates coming eventually, but when that will be I’m not yet certain. I’m working regular posts as well as a new part of the site that’ll eventually come live. I’m also working on a few twitter based apps. Two in the wild already are my @bgp4_table and @bgp6_table accounts.

In the meantime feel free to continue commenting. I do read all and try and reply when it’s possible.

Python and MySQL

Let me preface this post by stating I am not a database expert. I use them occasionally now and then. The below post probably doesn’t show best practices. If you have any suggestions feel free to comment.

Over the weekend I’ve been testing various ways for me to store, update, and retrieve data from a database. At first I was using the sqlite3 package, but turned to MySQL. Why the move you ask? Well I already had MySQL running on the target machine and already had scripts backing up all databases every night. Why not just use the existing database there?

For this post I’m using python 2.7.3, Debian 7.7.0, and MySQL-python 1.2.5.

Install and set-up

Ensure that the mysql python library is installed:

root@python-db:/tmp# pip install MySQL-python
Collecting MySQL-python
  Downloading MySQL-python-1.2.5.zip (108kB)
    100% |################################| 110kB 6.4MB/s
Installing collected packages: MySQL-python

I’ve also installed mysql on the test server. I’m using a root password of password simply as this is a test box.

I’ll now create a database to work on. I want a database that contains BGP AS numbers, AS Names, a queried field, and a data to know when last the AS changed. A user will be created that has full access to this database. Password for now will be password.

root@python-db:~# mysql -u root -p
Enter password:

mysql> create database AS_NUM_NAME;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE USER 'asnuser'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL ON AS_NUM_NAME.* TO 'asnuser'@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> USE AS_NUM_NAME;
Database changed

mysql> create table ASN(AS_NUM INT, AS_NAME VARCHAR(50), QUERIED INT, CHANGED DATE);
Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye

Add data

The mysql library allows us to open a connection to the database and execute sql commands. The next script will open the database, insert new data, commit those changed, then close the database:

#!/usr/bin/python

import MySQLdb
import time

today = time.strftime("%Y-%m-%d")

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")

with db:
    cursor = db.cursor()
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                    VALUES (%s, %s, %s, %s)'''
    cursor.execute(sql, (1, 'Level 3 Communications, Inc.', 0, today))

I’ll give this a run:

root@python-db:~# ./add.py

Log into the database to see the changes:

root@python-db:~# mysql -u asnuser -p
Enter password:

mysql> use AS_NUM_NAME;

mysql> select * from ASN;
+--------+------------------------------+---------+------------+
| AS_NUM | AS_NAME                      | QUERIED | CHANGED    |
+--------+------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
+--------+------------------------------+---------+------------+
1 row in set (0.00 sec)

Of course, doing an single update is not very useful. I’ll create a text file with the first 20 AS numbers and names in use:

1     Level 3 Communications, Inc.
2     University of Delaware
3     Massachusetts Institute of Technology
4     University of Southern California
5     Symbolics, Inc.
6     Bull HN Information Systems Inc.
7     UK Defence Research Agency
8     Rice University
9     Carnegie Mellon University
10    CSNET Coordination and Information Center (CSNET-CIC)
11    Harvard University
12    New York University
13    Headquarters, USAISC
14    Columbia University
15    DYNAMICS
16    Lawrence Berkeley National Laboratory
17    Purdue University
18    University of Texas at Austin
19    Leidos, Inc.
20    University of Rochester

I want to extract the first number, then everything will be part of the AS name. I’ll then stick them all in the database:

#!/usr/bin/python

import MySQLdb
import time

today = time.strftime("%Y-%m-%d")
as_list = []

with open ('20_asn') as f:
    new_as = f.readlines()
for AS in new_as:
    as_list.append(AS.strip())
db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")

with db:
    cursor = db.cursor()
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                    VALUES (%s, %s, %s, %s)'''
    for AS in as_list:
        split_as = AS.split(' ', 1)
        cursor.execute(sql, (int(split_as[0].strip()), split_as[1].strip(), 0, today))

After a quick run, let’s log back into the database and check what we have

mysql> SELECT * FROM ASN WHERE AS_NUM = 14;
+--------+---------------------+---------+------------+
| AS_NUM | AS_NAME             | QUERIED | CHANGED    |
+--------+---------------------+---------+------------+
|     14 | Columbia University |       0 | 2015-01-06 |
+--------+---------------------+---------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM ASN WHERE AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       0 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Retrieve data

I’ll write another script now that takes an AS number as an argument and prints out the AS name:

#!/usr/bin/python

import MySQLdb
import sys

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()
sql = "SELECT AS_NAME FROM ASN where AS_NUM = " + sys.argv[1]
with db:
    cursor.execute(sql)
    row = cursor.fetchone()
    print row[0]

Note: Replies are always given as tuples, even when a single value is returned. This is why I’m only printing the first item in the returned tuple with [0]

A quick run gives me what I need:

root@python-db:~# ./check.py 5
Symbolics, Inc.
root@python-db:~# ./check.py 6
Bull HN Information Systems Inc.
root@python-db:~# ./check.py 7
UK Defence Research Agency

Updating

I’d like to update the QUERIED field each time I query a value. I’ll rewrite the last script to update that field when it gets a result. First it should get the AS Name and the QUERIED value. Then update the QUERIED value by 1, and print the AS name:

#!/usr/bin/python

import MySQLdb
import sys

query_as = sys.argv[1]

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()
sql = "SELECT * FROM ASN where AS_NUM = " + sys.argv[1]
with db:
    cursor.execute(sql)
    row = cursor.fetchone()
    queried = row[2]
    queried += 1
    sql = """
        UPDATE ASN
        SET QUERIED = %s
        WHERE AS_NUM = %s"""
    cursor.execute(sql, (queried, query_as))
    print row[1]

First, my mysql check the queried value is 0:

mysql> select * from ASN where AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       0 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Query that value three times:

root@python-db:~# ./check.py 18
University of Texas at Austin
root@python-db:~# ./check.py 18
University of Texas at Austin
root@python-db:~# ./check.py 18
University of Texas at Austin

Now check the database:

mysql> select * from ASN where AS_NUM = 18;
+--------+-------------------------------+---------+------------+
| AS_NUM | AS_NAME                       | QUERIED | CHANGED    |
+--------+-------------------------------+---------+------------+
|     18 | University of Texas at Austin |       3 | 2015-01-06 |
+--------+-------------------------------+---------+------------+
1 row in set (0.00 sec)

Very useful.

There was an issue I created earlier that you may have spotted. I created a record for ASN1, then when I created the first 20 I created another ASN1. This can be shown via a query:

mysql> SELECT * FROM ASN WHERE AS_NUM = 1;
+--------+------------------------------+---------+------------+
| AS_NUM | AS_NAME                      | QUERIED | CHANGED    |
+--------+------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
|      1 | Level 3 Communications, Inc. |       0 | 2015-01-06 |
+--------+------------------------------+---------+------------+
2 rows in set (0.00 sec)

The script I used to populate all 20 was fine to populate a database for the first time, but no good for further updates. The script simply created new records, with new dates. What we want is to check the database first, then do different actions depending on what we see.

I’ll now get a list of the first 30 AS numbers, then run a script to update. I’ll need to check the following:

  • Does the AS Number already exist?
  • If so, check that the name matches (note, I’m storing only 50 characters of the name, so only match the first 50 characters)
  • If the name doesn’t match, update the name and updated the CHANGED field.
  • Otherwise just create a new record and insert todays date into the CHANGED field.

I’ll first delete all records with ASN 1 from the database then write the new script.

mysql> DELETE FROM ASN WHERE AS_NUM = 1;
Query OK, 2 rows affected (0.00 sec)

As this was getting a bit big, I moved the update and create sections into their own methods.

#!/usr/bin/python

import MySQLdb
import sys
import time

as_list = []
already, new, changed = 0, 0, 0
today = time.strftime("%Y-%m-%d")

with open ('30_asn') as f:
    new_as = f.readlines()
for AS in new_as:
    as_list.append(AS.strip())

db = MySQLdb.connect("localhost", "asnuser", "password", "AS_NUM_NAME")
cursor = db.cursor()

def create_sql(AS_NUM, AS_NAME):
    sql = '''INSERT INTO ASN(AS_NUM, AS_NAME, QUERIED, CHANGED) \
                VALUES (%s, %s, %s, %s)'''
    cursor.execute(sql, (AS_NUM, AS_NAME, 0, today))

def update_sql(AS_NUM, AS_NAME):
    sql = """
        UPDATE ASN
        SET QUERIED = %s
        WHERE AS_NUM = %s"""
    cursor.execute(sql, (AS_NUM, AS_NAME))

with db:
    for AS in as_list:
        split_as = AS.split(' ', 1)
        split_as[1] = split_as[1].lstrip()
        sql = "SELECT * FROM ASN where AS_NUM = " +str(split_as[0])
        cursor.execute(sql)
        row = cursor.fetchone()
        if row:
            if row[1] == split_as[1][:50].strip():
                already += 1
                pass
            else:
                changed += 1
                update_sql(int(split_as[0].strip()), split_as[1].strip())
        else:
            new += 1
            create_sql(int(split_as[0].strip()), split_as[1].strip())

print "New AS: " + str(new)
print "Changed: " + str(changed)
print "Unchanged: " + str(already)

Let’s do a quick run:

root@python-db:~# ./insert.py
New AS: 11
Changed: 0
Unchanged: 20

Great, 1 was re-added, plus the 10 new ones. This should add them all, plus not change the values of anything that was already there:

mysql> SELECT * FROM ASN;
+--------+---------------------------------------------------+---------+------------+
| AS_NUM | AS_NAME                                           | QUERIED | CHANGED    |
+--------+---------------------------------------------------+---------+------------+
|      1 | Level 3 Communications, Inc.                      |       0 | 2015-01-06 |
|      2 | University of Delaware                            |       2 | 2015-01-06 |
|      3 | Massachusetts Institute of Technology             |       1 | 2015-01-06 |
|      4 | University of Southern California                 |       0 | 2015-01-06 |
|      5 | Symbolics, Inc.                                   |       0 | 2015-01-06 |
|      6 | Bull HN Information Systems Inc.                  |       0 | 2015-01-06 |
|      7 | UK Defence Research Agency                        |       0 | 2015-01-06 |
|      8 | Rice University                                   |       0 | 2015-01-06 |
|      9 | Carnegie Mellon University                        |       0 | 2015-01-06 |
|     10 | CSNET Coordination and Information Center (CSNET- |       1 | 2015-01-06 |
|     11 | Harvard University                                |       0 | 2015-01-06 |
|     12 | New York University                               |       0 | 2015-01-06 |
|     13 | Headquarters, USAISC                              |       0 | 2015-01-06 |
|     14 | Columbia University                               |       0 | 2015-01-06 |
|     15 | DYNAMICS                                          |       0 | 2015-01-06 |
|     16 | Lawrence Berkeley National Laboratory             |       0 | 2015-01-06 |
|     17 | Purdue University                                 |       0 | 2015-01-06 |
|     18 | University of Texas at Austin                     |       3 | 2015-01-06 |
|     19 | Leidos, Inc.                                      |       0 | 2015-01-06 |
|     20 | University of Rochester                           |       2 | 2015-01-06 |
|     21 | The RAND Corporation                              |       0 | 2015-01-06 |
|     22 | Navy Network Information Center (NNIC)            |       0 | 2015-01-06 |
|     23 | National Aeronautics and Space Administration     |       0 | 2015-01-06 |
|     24 | National Aeronautics and Space Administration     |       0 | 2015-01-06 |
|     25 | University of California at Berkeley              |       0 | 2015-01-06 |
|     26 | Cornell University                                |       0 | 2015-01-06 |
|     27 | University of Maryland                            |       0 | 2015-01-06 |
|     28 | Deutsches Zentrum fuer Luft- und Raumfahrt        |       0 | 2015-01-06 |
|     29 | Yale University                                   |       0 | 2015-01-06 |
|     30 | SRI International                                 |       0 | 2015-01-06 |
+--------+---------------------------------------------------+---------+------------+
30 rows in set (0.00 sec)