Thursday, September 25, 2008

Sling Blade Runner

ITA Software could be an interesting place to work at. They sure have some interesting puzzles on their career site serving as tools for resume selection / recruitment. One of such puzzles has just gone to their archives which is a big green flag for an open discussion.

The puzzle is called Sling Blade Runner, and it is specified as follows:

"How long a chain of overlapping movie titles, like Sling Blade Runner, can you find?"

Use the following listing of movie titles: MOVIES.LST. Multi-word overlaps, as in "License to Kill a Mockingbird," are allowed. The same title may not be used more than once in a solution. Heuristic solutions that may not always produce the greatest number of titles will be accepted: seek a reasonable tradeoff of efficiency and optimality.

Let's go and try to solve it!

The General Algorithm

The task suggests a reduction to the graph-traversal domain. By translating the specification to a bit more technical level, we can lay ground for an actual implementation. The highest level description of the proposed solution is as follows:

  1. Take the input of a set S of sentences

  2. Build the overlap-graph D of S which is a directed graph D(V, A); where

    • V=S,
    • A = {arc(s1, s2) if s2 overlaps s1 and both elements of S}
  3. Find the longest (vertex disjoint) path P in D

  4. Output P

Breaking down above steps takes us more closer to an actual realization. Following are the main steps in greater detail:

1. Take the input

The input is given as a stream of sentences (titles) separated by line breaks, where

  • a sentence is a non-empty sequence of words separated by non-breaking whitespace, and
  • a word is a non-empty sequence of non-whitespace characters.

For simplicity we won't care too much about different character sets and encodings and casing, and let the implementation environment decide about the proper meaning of whitespaces and characters.

What we do care about is that the input can possibly containt empty sentences that we ignore and multiple instances of the same sentence which we treat as a single value. Note that a sentence does not contain any of the whitespaces which appeared in the original line of input.

2. Build the overlap-graph

The meaning of "overlapping" is outlined in the original description of the task. A more formal definition would be that the sentence s2 overlaps sentence s1 iff any of the (non-empty) suffixes of s1 (including s1) is the prefix of s2 (including s2), where s1 is not s2.

The overlap-graph D then is made up by enumerating each possible p=(s1, s2) pair of S and putting an arc (s1, s2) into D for each p if s2 overlaps s1.

3. Find longest path

An L-long vertex disjoint path in the directed graph G is a sequence of vertices v[1], v[2], ..., v[L-1], v[L] with the following properties:

  • for each 1 <= i <= L, i != j it holds thats v[i] != v[j]
  • for each 1 <= i < L it holds that v[i] is the predecessor of v[i+1] in G (ie. there is an arc (v[i], v[i+1]) in G)

The longest vertex disjoint path P from a source vertex v in G is found by the following BFS-like recursive algorithm:

let n = the source of the search, initially n = v
let visited = vertices already visited by the algorithm, initially empty
define function longest_path(G, n, visited) as
if n is in visited then
return empty_list
for each successor m of n do
path_candidates[m] = longest_path(G, m, visited ++ n)
let mpc = one of the longest paths in path_candidates
return v ++ mpc // the list of nodes on one of the longest paths from n in G

To get the global longest path GLP in D, one of the best results of longest_path(D, n, empty_list) over each n in D must be chosen.

The "one of..." clauses are present because two distinct paths can have the same length.

4. Write output

Print vertices of GLP each vertex on a line.

5. Test

Distinct parts of the algorithm can be tested as separate units: overlap-detection, termination-of-traversal in a cyclic graph, proper graph-building, etc. -- but I'm gonna skip describing those. As far as testing the whole algorithm goes I'm only interested in two things:

  • That any results fed back to the input are given back unchanged.
  • That I would pick the most naive benchmark implementation and every optimization is subject give the same results as this benchmark.

These are the main concerns regarding the evolution of my implementations. I will present actual implementations in future posts, but I won't talk about testing: publishing of any code will imply a that it has passed above tests.

Friday, September 12, 2008

Google Chrome -- The Evil User Interface

I really can't believe things went so far that I'm going to put this kind of rant in ink. I really have to state that I'm still gonna take the Google pill over Microsoft on any day, but heck, maybe they don't learn from the master that well after all. I'm talking about Google Chrome. It may well be that Google did revolutionize things on the server side and took on very well on the web-client side, but God are they unprofessional on the desktop. Unprofessional may not be the most suitable word here, but if they are doing those amateurish things on the desktop on purpose, well, that just makes it sound downright evil.

First of all, I didn't even keep Google Desktop on my machine for more than an hour because frankly, having a web server to let me get easier access to my desktop kinda sounds freaky to me. Turns out that I wasn't overly paranoid at all.

Now they came out with a browser, the user interface of which isn't a very likely candidate to be served by, well a web server -- and I was truly frantic about it. Even got sold on the Chrome comic in a snap. Soon as it was available to download I gave it a run, even took a look at the source code (though not as deep as Scott Hanselman here -- more on this in a minute). I was truly excited only to find out in five minutes that this stuff is not for me, and I believe it's not even for the casual user. Of course there is at least one solid conspiracy theory stating that this browser is not meant for the users. Anyhow, I think that messing up the user experience won't really help a product to be embraced by anybody regardless of the creators' original intent.

All my enthusiasm just vaporized in 30 seconds after the download. Why, the installer shoves the damn thing under "%USERPROFILE%\Local Settings\Application Data". I mean they put the program into a data folder, which not only makes it friggin counterintuitive but raises a bunch of management issues in any sanely managed environment, be it a shared family computer or a corporate environment. (Yeah I know Google support's this von Neumann mindset of everything being just data, but hey...)

Next thing is the implementation of the user interface. Chrome gives you a bigger client area which is good, but this obviously means the removal of the nonclient stuff like the title bar, which is nice for an instant messenger's buddy window or a media player but is kinda frivolous for a web browser which you constantly have in the foreground. It kills interoperability with a whole slew of applications like WindowBlinds or GridMove, and not playing nice with other applications won't make good to your reputation in the hood.

The last thing might sound like nitpicking -- that is even more nitpicking than the previous ones :) but as I've said I went to see the Chrome source through the eyes of Scott Hanselman and found out that Google went down the too-clever path and released a client software which relies on undocumented features of an operating system. Call me a old-fashion but I think this wouldn't even be appropriate for a tiny 2-man shop, let alone for GOOG. It's just bad karma.

Tuesday, September 9, 2008

Microsoft Cloud

The Microsoft Cloud is to be seen drifting by. There has been a lot of buzz about clouds in general and about Microsoft's upcoming cloud strategies (see this very neat overview) Of course lot of these conversations have been vague, which is understandable in light of MS being very quiet about technical details regarding the Cloud. I mean, Their Cloud, because it very well seems like they're coming late to the game just to try and pull an IBM on the other players -- which I think would come together for them very nicely.
Why do I think so?
Well. Coupla months into the PDC and David Chappell comes up with a Clouding Platforms Paper and it finally makes sense. MS is going to give a flying damn about cloud services. They're building a platform, baby. You see? What's better than locking you into a development environment than ultimately locking you into a deployment environment? Hell, they may even give away free cookies, developer-developer-developer t-shirts and "Microsoft Patterns and Practices" handouts, oh just come and run your application straight on their infrastructure.
And oh why, they're pushing DLR like crazy which would make enterprise development even more opaque: expect hardcore rubysts go crazy when their code is ultimately put into the same appdomain with some cheesy vb.net code -- and communicating.

Nevertheless. I guess I've never been looking forward this much to the PDC (if ever) -- mainly because Raymond Chen and secondly because of the unveiling of the MS cloud stuff.