Why .NET LinkedList does not support Concat and Split operations?

2011/8/13 19:18:50英文系列评论(0)浏览(2735)

Download SimpleLinkedList source code

Concat O(1) or O(n) ?

The .NET LinkedList is a circular doubly linked list, where each node holds a reference to its previous and next nodes. The last node’s next is the head node and the head node’s previous is the last one.

Linked lists are attractive because of O(1) insertion and removal operations. Instead of shifting elements in array, you just chain nodes in appropriate order and that’s it.

Following this logic, concatenation of two lists should be similar O(1) operation. You just bind the end of the first list with the head of the second and let both the ends of the resulting list point to each other.

Nevertheless, if you look at .NET LinkedList implementation, you will find out that the only way to join two linked lists is to add elements of the second list one by one to the first one. This is O(n) operation.

Implementing Concat

So if you look at the implementation, you will find out that the LinkedListNode has one more field except prev and next. This is a reference to the owner list. So with this implementation, even if you implement concat in O(1) as described above, you will still need at least reparent all nodes which is again O(n) operation.

Can we avoid referencing list in every node? Yes we can. We can rewrite Previous and Next properties like this.

Before

public LinkedListNode Next
{
    get
    {
        if ((this.next != null) && (this.next != this.list.head))
        {
            return this.next;
        }
        return null;
    }
}

After

public SimpleLinkedListNode GetNext(SimpleLinkedList list)
{
    if ((this.Next != null) && (this.Next != list.First))
    {
        return this.Next;
    }
    return null;
}

After removing parent checking exception handling code from LinkeList implementation, you can implement concatenation like this:

public void Concat(SimpleLinkedList secondList)
{
    if (secondList.m_Head == null)
    {
        return;
    }

    if (this.m_Head == null)
    {
        this.m_Head = secondList.m_Head;
    }
    else
    {
        var seamLeft = this.Last;
        var seamRight = secondList.m_Head;
        var newEnd = secondList.Last;

        seamLeft.Next = seamRight;
        seamRight.Prev = seamLeft;
        newEnd.Next = this.m_Head;
        this.m_Head.Prev = newEnd;
    }
}

You are even able to maintain Count property correctly by adding counts of both lists.

this.m_Count += secondList.m_Count;

Drawbacks

The major drawback of this solution is that the second list will be left in an inconsistent state after this operation. Its last.prev does not point to its head anymore. Another point is that any active operation on one of the lists will modify both, thus they have shared nodes. These side effects violate fundamental principles of OO design. So I can imagine that Microsoft won’t compromise at that point and decided not to implement concatenation.

What about Split?

Implementing Split is very similar, with the difference that you are not able to maintain Count property in O(1) time. You are splitting at certain node without knowing its index inside the list. So if you want to have a Split operation, you should compromise Count property.

Concat Extension Method

There is a Concat extension method on IEnumarable interface. You might think it’s as stupid as Count extension method on IEnumarable and spends O(n) on bringing all together.

In fact, it executes in O(1) returning a new enumerator which enumerates first list and jumps to the second when the first list reaches its end.
This might help if you are not going to continue working with concatenation result like with a LinkedList. Another point is that several subsequent concatenation operations cause nested enumerators. So you get a tree of enumerators over enumerators, etc.

My Implementation

The actual goal of my implementation of double linked list supporting concat and split was to try out and demonstrate test driven refactoring approach.

The first implementation was just derived from common .NET LinkedList.

public class SimpleLinkedList : LinkedList
{
    public SimpleLinkedList()
    {
    }

    public SimpleLinkedList(IEnumerable enumerable)
            : base(enumerable)
    {
    }

    internal SimpleLinkedList Split(LinkedListNode splitNode)
    {
        throw new NotImplementedException();
    }

    public LinkedListNode Find(T search, IEqualityComparer comparer)
    {
        return Find(search);
    }

    public void FindLast(T search, IEqualityComparer comparer)
    {
        return Find(search);
    }
}

The second step was creating appropriate test set ensuring common LinkedList behaviour.

Example

[TestCase(new string[] { }, "A", new[] { "A" })]
[TestCase(new string[] { }, null, new[] { (string)null })]
[TestCase(new[] { "A" }, "B", new[] { "A", "B" })]
[TestCase(new[] { "A", "B", "C" }, "D", new[] { "A", "B", "C", "D" })]
public void AddLast_one_element_occururs_at_the_end_of_the_list
		(string[] original, string element, string[] expected )
{
    var target = new SimpleLinkedList(original);
    target.AddLast(element);
    CollectionAssert.AreEquivalent(expected, target);
}

After I had a reasonable set, I started implementing functionality successively.
Here is the result – SimpleLinkedList.

Advantages

Less memory consumption, thus every node SimpleLinkedListNode has three pointers (prev, next, value) instead of four (prev, next, list, value) unlike original .NET implementation.
Supports Concat and Split operations in O(1)
Supports IEnumarable Reverse() enumerator in O(1) – by the way, I do not see any reason why it’s not provided natively on the .NET LinkedList. Appropriate extension method requires O(n).

Disadvantages

Does not support Count.
Concat operation leaves second list in an inconsistent state.
Split operation leaves original list in an inconsistent state.
You are able to share nodes between lists.

Other

I have chosen an alternative strategy for implementing enumeration and find operations, rather than more verbose and purely readable original implementation. I hope the negative performance impact remains insignificant.

So be careful using this list and use it only if you are aware of the consequences.