Sunday, April 20, 2014

The Application of Algorithms

Since January I've been taking an Algorithms class from Princeton. Through the marvel that is modern education and the internet, it is possible to learn such a complex subject from such a prestigious university for free. The pusher of this most elucidating drug is Coursera, and the course work is delightfully complicated; especially since it required dusting off my Java foo which had been withering away for the better part of two years now.

Unfortunately, there is one terrible catch;

"I will not make solutions to homework, quizzes or exams available to anyone else. This includes both solutions written by me, as well as any official solutions provided by the course staff. " - Coursera's Honor Code (EULA) [Emphasis Mine]

As you might imagine, this complicates the issue of sharing one's recent work. It is also why this post isn't full of GitHub links to complicated implementations I'm proud of.

Part One of this Algorithms class covers a fairly basic set of algorithms;

  • The foundations of algorithms
    • Speed, memory, order of growth, practical observations and jargon.
  • Union Find
    • A basic connection finding algorithm, not terribly powerful on its own, but a fundamental building block in algorithm theory and a sub routine in many more powerful algorithms.
  • Basic Sort Algorithms
    • A quick summation of the basic sorting algorithms: selection sort, insertion sort and shell sort.
  • Stacks & Queues
    • A more in depth look at the algorithms behind these basic data structures, this section looks at some fundamental OOP concepts with a Java flair.
  • Advanced Sorts
    • Quicksort, Mergesort and their various implementations. This section is really where the class gets good and starkly illustrates the strengths of good algorithms and the trade offs necessary to make them.
  • P-Queues and Symbol Tables
    • The opener to BSTs and Trees as a whole. As with all things in this class, the fundamental data structures beneath a common concept are explained in detail before the main data structure. In a lot of ways it feels like an opening act before the people you paid to see show up.
  • Advanced Trees
    • Balanced search trees and geometric applications of BSTs, specifically 1d,  Kd-trees and line interception, detection, and sort algorithms. This lesson is a real mind bender as it hints at the fundamental basis for machine vision technology.
  • Hash Tables 
    • The last week is spent covering hash tables, since they're really the ultimate form of Symbol Tables. Time is devoted to the core elements of hash table creation, the actual hashing and subsequent sorting and searching. This lesson actually prompted me to go looking through MRI Ruby's implementation of hashes.

While the lectures and assignments have prompted me to think in new ways, this part of the course left me feeling like the knowledge was somewhat difficult to apply. The abstractions discussed above, save maybe the sorting algorithms, are quite far removed from real world problems. The class' assignments were quite good at relating these algorithms back to real world problems, but after turning each one in, I found myself pondering how I could sneak the algorithm into a rails project in some nifty manner.

While Rails is used for many things, the ease with which one can build a simple web application is without a doubt its main strength. Unfortunately, there just aren't many problems in the web-app sphere that lend themselves to these basic algorithms. Sure sorting of data is important, but your database is far better at it then you. There just aren't many nifty ways to use the algorithms discussed above in most Rails apps, the algorithms are too low level. They deal with memory optimization, connectivity and data management, things that are all abstracted away from you by either Ruby, Rails, your database or the inherent nature of the MVC framework. In this situation, the strength of Rails is its greatest weakness.

The first part of the algorithms class ended in the middle of March, but that isn't where this story ends. There are more delightful algorithms to explore, and part two began a week later covering some of the more complicated, and applicable, algorithms used today.


  • Undirected and Directed Graphs
    • Graphs are one of my favorite abstractions. A graph is nothing more then a point of vertices and lines connecting them. The beauty of this structure is that it is flexible enough to represent anything from roads, to social networks, to points of failure in a circuit board. This first lesson focuses on building, searching, sorting and generally using graphs.
  • Minimum Spanning Trees & Shortest Path
    • This lesson guides the student into edge weighted graphs, and the big name algorithms, Kruskal and Prim, that dominate them. Another problem type is introduced, the 'Shortest Path Problem'. The core of problems is that they're a transferable abstraction that can be used to solve numerous different real world problems. The lesson wraps up with distinguishing cyclic vs a-cyclic graphs, Dijkstra's algorithm and the double edged sword of negative weighted edges. All in all this lesson series was my favorite.
  • Max Flow & Radix Sorts
    • Maximum Flow is another problem type, specifically it is the question of how much of something can transit a network from point A to B. As with all problem types, it transfers seamlessly from rail networks to phone lines to the internet itself. The answers to maxflow problems lie in the Ford-Fulkerson algorithm, which as you might've guessed, sits upon a graph. Radix Sorts are a series of different sorts used to manage Strings. Which despite their fundamental nature in all things programming, are quite a complicated construct to work with.
What comes next, I won't know for a little while, but this second part of the class has thus far proven itself to be much more exciting. Unlike the first course, the algorithms discussed above do not deal primarily with memory management or other low level concerns. Instead they leverage a mastery of these low level concepts to solve real world problems. This makes them prime candidates for being integrated into a Rails application. Yet, where does one go to find a problem that needs solving? Perhaps a mock travel agency with route planning using Dijkstra's algorithm? Maybe a game of some kind that uses Ford-Fulkerson to determine the maximum possible score?

Beyond the question of how to display my new found knowledge, there is a subtler subject to discuss; speed and clarity of reasoning. Many times a programmer will find themselves pondering whether they've determined all possible edge cases. Or whether an edge case exists in the first place. This process can consume hours, sometimes days, depending on how it is gone about. An uncaught edge case can lead to a subtle bug that only ever materializes once or twice in a half million queries. Working with these algorithms has improved my ability to see the possibilities of a given situation. That is the greatest boon of these classes. Sure the knowledge is nice, but being able to look at a problem and quickly make heads or tails of it, determine a solution and then work through the finer points of that solution is the real reward.

I highly recommend anyone who hasn't taken such a class to seek one out. The skills you'll develop are well worth the time.