Data structures in functional programming …big issues (?!)

Welcome to this short presentation where I show something simple turning out to have dramatic consequences. Data and relationships between them plays a vital role in any paradigm …let’s have a closer look of how these can be manipulated in a functional paradigm where data is immutable.

Let us start with something simple. Imagine we want to have a tree which could be navigable. That is, having a tree node, you could access both its children and its parent.

Ok, no problem, let us just define some data type like this for instance:

data Tree a = Tree
{
parent :: Maybe (Tree a),
content :: a,
children :: Forest a
}

This way, one would build data trees like this:

root = Tree
{
parent = Nothing,
content = “root”,
children = [child]
}

child = Tree
{
parent = Just root,
content = “child”,
children = []
}

So far so good you might think …Well, not really, here we already have two issues:

1. Changing a node == reconstruct the whole tree

Assume some operation takes a tree as input and outputs the same tree among with a single node was modified, added or deleted.
In any of these 3 cases, the consequences are the same:
– if node A has been modified/added/deleted, its children and parent has to be replaced with ones having updated references to A
– the same happens for their own children and parents
– …
– the whole tree becomes a replica of the original one

2. Coherence is not enforced

Nothing prevents you from creating bullshit where children do not reference their proper parents and vice-versa. Moreover, creating helper functions cannot really help since they cannot act locally. Indeed, because of issue 1, the whole tree must be built at once, it cannot be assembled of small pieces.

…And the tree was just an example. This applies to ANY navigable data structure!!!
When anything is modified, everything linked to it has to be replicated!
I guess there is no need to draw a picture when there is lots of data and (bidirectional) relationships between them.

One way around would be to use dictionaries to store the relationships like:
parent_child = [(root, chA), (root, chB), …]
And use the latter one for all operations. But this is both very inconvinient and inefficient.

Another way would be to use monads and mutable state …which boils down to simulating the imperative object-oriented paradigm (in a much uglier and complex manner).

…I’m curious of what you think about this, because the fact that something as simple as a tree involves such dramatic consequences looks pretty catastrophic to me.

Advertisements

Tags: , , , ,

6 Responses to “Data structures in functional programming …big issues (?!)”

  1. Winheim Raulsh Says:

    Unfortunately, there’s nothing you can do about it directly. There is a base required amount of logical definition needed to have a well-formed tree-modification algorithm that works on well-formed trees.

    You could flatten your tree into a sorted list, remove the item from the list, and then re-create a tree from the new list. Of course that’s inefficient!

    I’ll think of other ideas. 🙂

  2. Julian Beaumont Says:

    {- This is what Zippers are for. A Zipper represents the location of a hole within a data structure, for example a zipper for lists would represent a list with a missing element. If you then couple that with the value of the missing element you get a cursor which can be used to navigate the structure performing appropriate modifications, for the tree example above you’d use something similar to the following.

    http://en.wikipedia.org/wiki/Zipper_data_structure -}

    data Tree a
    = Leaf a
    | Node [Tree a]

    data ListZipper a
    = (List a, List a)

    data ListCursor a
    = (ListZipper a, a)

    data TreeZipper a
    = List (ListZipper (Tree a))

    data TreeCursor a
    = (TreeZipper a, Tree a)

    toList :: ListCursor a -> List a
    toList ((xs, ys), x) = rev xs ++ [ x ] ++ ys

    ofList :: List a -> ListCursor a
    ofList (x : xs) = (([], xs), x)

    toTree :: TreeCursor a -> Tree a
    toTree ([], node) = node
    toTree (siblings : zipper, node) = toTree (zipper, Node (toList (siblings, node)))

    ofTree :: Tree a -> TreeCursor a
    ofTree t = ([], t)

    getCurrent :: TreeCursor a -> TreeCursor a
    getCurrent (zipper, node) = node

    moveChild :: TreeCursor a -> TreeCursor a
    moveChild (zipper, Node(child : children)) = (([], children) : zipper, child)

    moveSibling :: TreeCursor a -> TreeCursor a
    moveSibling ((leftSiblings, newNode : rightSiblings) : zipper, currentNode) = ((currentNode : leftSiblings), rightSiblings) : zipper, newNode)

    moveParent :: TreeCursor a -> TreeCursor a
    moveParent (siblings : zipper, node) = (zipper, Node(toList (siblings, node)))

    {- It should be fairly obvious how to define additional operations, for example to delete a node. -}

  3. sidewords Says:

    Hi,

    Thanks for your post, it was very informative …yet a bit “tricky” if I may say. 😉

    Concerning zippers, I advise the article on haskell.org’s wiki:
    http://www.haskell.org/haskellwiki/Zipper
    It’s very intuitive, easy to follow and thus suited for everyone. …and more complete than wikipedia’s article which I found personnally a bit dry & dull.

    …anyway, that’s not the point.

    The point is that your post, while being fully correct, does not address the core issue. It is orthogonal to the initial problem.

    Indeed, the initial problem was concerned about “navigable” data structures. That is, data structures that have bidirectional references among each other. Like doubly linked lists, trees with parent reference, custom interlinked data, etc… In all such cases, whatever you do, the point is that you cannot modify something locally. You have to copy the whole data structure for the slightest change whatsoever. Zipper or not.

    For non-navigable structures like normal lists (unidirectional) or simple trees (unidirectional), it is of course trivial to modify things and move things around. You basically just have to “cut them” at that point and insert the modified structure in it. And basically, zippers are just helper functions for doing exactly this. Navigating, storing context, and rebuilding the whole using the context …for unidirectional data structures!

    Once you have cycles, this does not work anymore. And a bidirectional link is a cycle by itself. And here, no matter what you do, you will have to copy everything!

    Let’s take for example a look at a doubly linked list like the one below:
    a-b-c-X-d-e-f
    if you want to change X in Y then you have to change the node itself, but also both neighboring nodes to point to that new node. And hence also the neighbors of the neighbors and so on. Moreover, one cannot do that incrementally but it must be done all at once, to finally become
    a’-b’-c’-Y-d’-e’-f’
    where the ‘ stand for the nodes with modified links. If a single node would not be modified, it would mean it would point to the old list.

    So, zipper or not, the problem remains untouched. Zippers are just helper functions to take apart and recombine unidirectional data structures without cycles. By the way, using zippers or not, the cost of modifying a node at depth h is still O(h) which is not very cheap.

    The last but important thing we did not take into account is lazyness which will delay this cost and make performances pretty hard to forsee.

  4. Julian Beaumont Says:

    True, I don’t think I explained myself very well. Presumably there is some motivation behind wanting a data structure with cyclic references, and I assume that motivation is the ability to follow those references when navigating the data structure. For example with the double linked list you want to be able to move forward or backwards through the list by following the references.

    If you create the cyclic data structure naively then, as you said, you will need to copy the entire data structure whenever you want to make a change. Because of this, to make working with the cyclic data structures practical you need to find some alternative representation for the data. For example, in the original blog entry you suggested using a dictionary
    parent_child = [(root, chA), (root, chB), …]
    to represent tree’s with nodes that can access both the parent and children.

    What I’m suggesting is that in the majority of useful cases you can use a zipper as the alternative representation of the cyclic data structure. Again, going back to the double linked list example, I believe, if you represent the doubly linked list using a zipper over a singly linked list then you can implement all operations on the doubly linked list with essentially the same time complexity as in a language with mutability.

    There are, however, some caveats to this claim. Because all old references to nodes will still refer to the old version of the data structure after an update to a node it may be necessary to navigate to the locations those references pointed to in the data structure to get fresh references, adding some additional cost. There are, however, alternative approaches to implementing zippers based on delimited continuations which allow multiple zippers to update the same structure simultaneously which could be used to avoid this problem.

    http://okmij.org/ftp/Computation/Continuations.html#zipper

    Obviously this use of zippers isn’t possible for all data structures with cyclic references, for example, you couldn’t use a zipper in order to implement potentially cyclic graphs. This is because the references which cause cycles in these cases aren’t known in advance and aren’t local, for some appropriate definition of local. In these cases you have to resort to a different representation, most likely something similar to your use of a dictionary to implement trees.

    As to lazyness, most of my work is with Ocaml, hence, its not something I’m used to reasoning about and as such I’ve ignored it in everything I’ve said.

  5. Winheim Raulsh Says:

    What do you mean, you have to modify the whole data structure!? Assuming mutable structures that work via pointers, that’s not true. I’ve coded doubly-linked lists myself, and they DON’T require global changes. That’s not true for any data strucutre.

    Inductively, for all data structures, changing a node requires that you can write to (and consequently, you have access to) all pointers that point to the changing item. Maybe if everything is connected to everything else, THEN you need to modify all other items. When and item changes, the other items don’t change: their structure is preserved.

    Structures whose items have local reference sharing require local changes. Structures whose items have global reference sharing require global changes.

  6. sidewords Says:

    that’s the point… in functional programming, everything is immutable! you have no mutable pointers!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: