Archive for the ‘Generalities’ Category

Data structures in functional programming …big issues (?!)

November 1, 2008

Welcome to this short presentation where I show something simple turning out to have dramatic consequences. Data and relationships between them plays a vital role in any paradigm …let’s have a closer look of how these can be manipulated in a functional paradigm where data is immutable.

Let us start with something simple. Imagine we want to have a tree which could be navigable. That is, having a tree node, you could access both its children and its parent.

Ok, no problem, let us just define some data type like this for instance:

data Tree a = Tree
{
parent :: Maybe (Tree a),
content :: a,
children :: Forest a
}

This way, one would build data trees like this:

root = Tree
{
parent = Nothing,
content = “root”,
children = [child]
}

child = Tree
{
parent = Just root,
content = “child”,
children = []
}

So far so good you might think …Well, not really, here we already have two issues:

1. Changing a node == reconstruct the whole tree

Assume some operation takes a tree as input and outputs the same tree among with a single node was modified, added or deleted.
In any of these 3 cases, the consequences are the same:
– if node A has been modified/added/deleted, its children and parent has to be replaced with ones having updated references to A
– the same happens for their own children and parents
– …
– the whole tree becomes a replica of the original one

2. Coherence is not enforced

Nothing prevents you from creating bullshit where children do not reference their proper parents and vice-versa. Moreover, creating helper functions cannot really help since they cannot act locally. Indeed, because of issue 1, the whole tree must be built at once, it cannot be assembled of small pieces.

…And the tree was just an example. This applies to ANY navigable data structure!!!
When anything is modified, everything linked to it has to be replicated!
I guess there is no need to draw a picture when there is lots of data and (bidirectional) relationships between them.

One way around would be to use dictionaries to store the relationships like:
parent_child = [(root, chA), (root, chB), …]
And use the latter one for all operations. But this is both very inconvinient and inefficient.

Another way would be to use monads and mutable state …which boils down to simulating the imperative object-oriented paradigm (in a much uglier and complex manner).

…I’m curious of what you think about this, because the fact that something as simple as a tree involves such dramatic consequences looks pretty catastrophic to me.