Wednesday, March 7, 2012

Using QuickSort to Explore Scala Collection Methods

In my continuing effort to play with my new favorite language Scala I decided to look at one of the Standard Algorithms in the field: quick sort. Quick Sort is a divide and conquer algorithm that picks a pivot point in a list and then calls itself to sort the two halves. demonstrated a very simple Scala quick sort shown below. Let’s examine it as a learning vehicle. The complex looking first line defines a function that takes a List of things T where T is any class that supports "Ordered". It also returns such a list.

As with almost all recursive algorithms quick sort has two cases: the empty list and the non-empty list. Scala's "list match" sets up those two cases. In the case of the Nil list we return Nil which is how the recursive algorithm completes. Otherwise we match against "cdr::cons". That expression means "first item in the list followed by the rest of the list". Cons and cdr are the standard names that the Lisp language uses for those terms. This may seem like an odd way to view a list but in the functional world it is a very common idiom.

So given a first element and a rest-of-the-list what do we do? We use another of Scala powerful collections function: partition. "cons partition (_ < cdr)" says "create two lists: a list of the items in the list that do match the condition and another list of the items that do not match the condition. These two lists are returned into the anonymous tuple “val (before after)”. This shows how Tuples are first class citizens in Scala, allowing us to return multiple values from a function call.

Searching the web for Java implementations of quick sort shows the following code as quite typical for the partition portion of the algorithm:

I don’t know about you but Listing 2 looks prime for off-by-one errors and is nothing I’d expect to get right the first time. Compare that to: val (before,after) = cons partition (_ < cdr). The beauty of that line is that its close to the English description: create lists of the items before and after the specified item by partitioning the list based on a test. I’m beginning to feel that Scala feels hard not because it is hard but because it’s so different from Java in that its easy!

Lastly we have the recursive part of the function. We call ourselves on the before list and the after list, and build a new list of the results of those two calls plus the cdr value (because it’s not in either list). This implementation works and has the advantage of being a tiny bit of code. The drawback is that it always picks the first item in each list as the pivot point. It is known to be sub-optimal especially in the case where the function is called on an already sorted list.

This next version of the program picks a better pivot point, in this case the middle entry of the list.  We accomplish this by changing the "case" to "theList : List" which just  means any list.  We manually find the pivot point and then perform two for-comprehensions to find the list items larger and smaller than the pivot entry.  This gains performance via the better pivot point but trades that off against having to scan the main list twice for the two for-comprehensions.  Further testing revealed that this implementation also had a defect in that if an item was present in the list more than once it would only appear once in the  sorted output.  Another reason for robust testing!

Our last version (for now) tackles the duplicate issue and is also a bit more efficient. We go back to using partition to generate our two lists, but now we post-process the second list.  We call partition again on the second list (which contains items not-less-than the pivot) into a list of matching items and not matching items.

Summary: This article is not meant to create the best possible Scala implementation of  Quick Sort, but to give you a vehicle for playing with Scala list manipulation functions.