Thursday, September 25, 2008

Sling Blade Runner

ITA Software could be an interesting place to work at. They sure have some interesting puzzles on their career site serving as tools for resume selection / recruitment. One of such puzzles has just gone to their archives which is a big green flag for an open discussion.

The puzzle is called Sling Blade Runner, and it is specified as follows:

"How long a chain of overlapping movie titles, like Sling Blade Runner, can you find?"

Use the following listing of movie titles: MOVIES.LST. Multi-word overlaps, as in "License to Kill a Mockingbird," are allowed. The same title may not be used more than once in a solution. Heuristic solutions that may not always produce the greatest number of titles will be accepted: seek a reasonable tradeoff of efficiency and optimality.

Let's go and try to solve it!

The General Algorithm

The task suggests a reduction to the graph-traversal domain. By translating the specification to a bit more technical level, we can lay ground for an actual implementation. The highest level description of the proposed solution is as follows:

  1. Take the input of a set S of sentences

  2. Build the overlap-graph D of S which is a directed graph D(V, A); where

    • V=S,
    • A = {arc(s1, s2) if s2 overlaps s1 and both elements of S}
  3. Find the longest (vertex disjoint) path P in D

  4. Output P

Breaking down above steps takes us more closer to an actual realization. Following are the main steps in greater detail:

1. Take the input

The input is given as a stream of sentences (titles) separated by line breaks, where

  • a sentence is a non-empty sequence of words separated by non-breaking whitespace, and
  • a word is a non-empty sequence of non-whitespace characters.

For simplicity we won't care too much about different character sets and encodings and casing, and let the implementation environment decide about the proper meaning of whitespaces and characters.

What we do care about is that the input can possibly containt empty sentences that we ignore and multiple instances of the same sentence which we treat as a single value. Note that a sentence does not contain any of the whitespaces which appeared in the original line of input.

2. Build the overlap-graph

The meaning of "overlapping" is outlined in the original description of the task. A more formal definition would be that the sentence s2 overlaps sentence s1 iff any of the (non-empty) suffixes of s1 (including s1) is the prefix of s2 (including s2), where s1 is not s2.

The overlap-graph D then is made up by enumerating each possible p=(s1, s2) pair of S and putting an arc (s1, s2) into D for each p if s2 overlaps s1.

3. Find longest path

An L-long vertex disjoint path in the directed graph G is a sequence of vertices v[1], v[2], ..., v[L-1], v[L] with the following properties:

  • for each 1 <= i <= L, i != j it holds thats v[i] != v[j]
  • for each 1 <= i < L it holds that v[i] is the predecessor of v[i+1] in G (ie. there is an arc (v[i], v[i+1]) in G)

The longest vertex disjoint path P from a source vertex v in G is found by the following BFS-like recursive algorithm:

let n = the source of the search, initially n = v
let visited = vertices already visited by the algorithm, initially empty
define function longest_path(G, n, visited) as
if n is in visited then
return empty_list
for each successor m of n do
path_candidates[m] = longest_path(G, m, visited ++ n)
let mpc = one of the longest paths in path_candidates
return v ++ mpc // the list of nodes on one of the longest paths from n in G

To get the global longest path GLP in D, one of the best results of longest_path(D, n, empty_list) over each n in D must be chosen.

The "one of..." clauses are present because two distinct paths can have the same length.

4. Write output

Print vertices of GLP each vertex on a line.

5. Test

Distinct parts of the algorithm can be tested as separate units: overlap-detection, termination-of-traversal in a cyclic graph, proper graph-building, etc. -- but I'm gonna skip describing those. As far as testing the whole algorithm goes I'm only interested in two things:

  • That any results fed back to the input are given back unchanged.
  • That I would pick the most naive benchmark implementation and every optimization is subject give the same results as this benchmark.

These are the main concerns regarding the evolution of my implementations. I will present actual implementations in future posts, but I won't talk about testing: publishing of any code will imply a that it has passed above tests.

Friday, September 12, 2008

Google Chrome -- The Evil User Interface

I really can't believe things went so far that I'm going to put this kind of rant in ink. I really have to state that I'm still gonna take the Google pill over Microsoft on any day, but heck, maybe they don't learn from the master that well after all. I'm talking about Google Chrome. It may well be that Google did revolutionize things on the server side and took on very well on the web-client side, but God are they unprofessional on the desktop. Unprofessional may not be the most suitable word here, but if they are doing those amateurish things on the desktop on purpose, well, that just makes it sound downright evil.

First of all, I didn't even keep Google Desktop on my machine for more than an hour because frankly, having a web server to let me get easier access to my desktop kinda sounds freaky to me. Turns out that I wasn't overly paranoid at all.

Now they came out with a browser, the user interface of which isn't a very likely candidate to be served by, well a web server -- and I was truly frantic about it. Even got sold on the Chrome comic in a snap. Soon as it was available to download I gave it a run, even took a look at the source code (though not as deep as Scott Hanselman here -- more on this in a minute). I was truly excited only to find out in five minutes that this stuff is not for me, and I believe it's not even for the casual user. Of course there is at least one solid conspiracy theory stating that this browser is not meant for the users. Anyhow, I think that messing up the user experience won't really help a product to be embraced by anybody regardless of the creators' original intent.

All my enthusiasm just vaporized in 30 seconds after the download. Why, the installer shoves the damn thing under "%USERPROFILE%\Local Settings\Application Data". I mean they put the program into a data folder, which not only makes it friggin counterintuitive but raises a bunch of management issues in any sanely managed environment, be it a shared family computer or a corporate environment. (Yeah I know Google support's this von Neumann mindset of everything being just data, but hey...)

Next thing is the implementation of the user interface. Chrome gives you a bigger client area which is good, but this obviously means the removal of the nonclient stuff like the title bar, which is nice for an instant messenger's buddy window or a media player but is kinda frivolous for a web browser which you constantly have in the foreground. It kills interoperability with a whole slew of applications like WindowBlinds or GridMove, and not playing nice with other applications won't make good to your reputation in the hood.

The last thing might sound like nitpicking -- that is even more nitpicking than the previous ones :) but as I've said I went to see the Chrome source through the eyes of Scott Hanselman and found out that Google went down the too-clever path and released a client software which relies on undocumented features of an operating system. Call me a old-fashion but I think this wouldn't even be appropriate for a tiny 2-man shop, let alone for GOOG. It's just bad karma.

Tuesday, September 9, 2008

Microsoft Cloud

The Microsoft Cloud is to be seen drifting by. There has been a lot of buzz about clouds in general and about Microsoft's upcoming cloud strategies (see this very neat overview) Of course lot of these conversations have been vague, which is understandable in light of MS being very quiet about technical details regarding the Cloud. I mean, Their Cloud, because it very well seems like they're coming late to the game just to try and pull an IBM on the other players -- which I think would come together for them very nicely.
Why do I think so?
Well. Coupla months into the PDC and David Chappell comes up with a Clouding Platforms Paper and it finally makes sense. MS is going to give a flying damn about cloud services. They're building a platform, baby. You see? What's better than locking you into a development environment than ultimately locking you into a deployment environment? Hell, they may even give away free cookies, developer-developer-developer t-shirts and "Microsoft Patterns and Practices" handouts, oh just come and run your application straight on their infrastructure.
And oh why, they're pushing DLR like crazy which would make enterprise development even more opaque: expect hardcore rubysts go crazy when their code is ultimately put into the same appdomain with some cheesy vb.net code -- and communicating.

Nevertheless. I guess I've never been looking forward this much to the PDC (if ever) -- mainly because Raymond Chen and secondly because of the unveiling of the MS cloud stuff.

Monday, June 9, 2008

You still have to sell me on WPF

There is an interesting tendency visible on ITJobsWatch regarding Windows Presentation Foundation. It's that the demand for WPF workforce has grown significantly in the past 6 months (as of 2008Q2). I don't really got around peeking into the actual job descriptions there but three quarters of a year has grown the demand for WPF-developer up to 1,2%. Compare this do WinForms' (plumetting) 3,9% you may wonder if WPF is really ready to replace WinForms or maybe it has so many pioneering features that it generates a whole lot of new expectations about user interfaces. I must say that I Don't Thnik So. Of course I'm biased towards Line Of Business applications, I'm really hesitant about WPF when it comes to LOB because I can't really believe that people won't miss WinForm's DataGridView, the tabbed interfaces, the Masked Textbox or the MDI. No sir. It's true that there has been a lot of debate on WPF LOB usage and WPF vs. Winforms and WPF credibility and a lot of promoting of WPF (albeit some spectators say not enough of it) from Microsoft, and I believe that WPF is still sorrounded by a lot of FUD, and I don't really want to name the guilty parties. Let me just highlight one of the highlight featues of WPF-based development: the fact that designers can work more closely with programmers on a WPF project. I can't see why and how this could be a very big achievement in a time where people have learned that designers don't give a heck about functionality and programmer's don't give a heck about user interface design. Not because they are evil, simply because there is a whole impendance mismatch between being the designer-world and the developer-world (and the software usability world before we forget what the game is about) so unless you can spend a lot of money on designing and usability testing your user interface, you're effectively busted. And if you're really busted, it doesn't make too much of a difference which technology you actually use.

Of course there's a lot of value in WPF which has soon to be explored, but I think that we're not there yet. It's really cool tho to have a framework which encourages separation of concerns in an area really really prone to produce ugly and smelly code.

UI design for programmers

I've long been wanting to read User Interface Desing For Programmers by Joel Spolsky. Which is absolutely positively a must-read for every developer who is concerned about developing great user interfaces. Start by learning that "great" == "highly usable" not "bells and whistles all over the place".

The only issue I had with the book was that somehow I thought it indicates that copying an already estabilished design could do you more good than trying to invent something really cool. The notable example here is the Microsoft Office user interface which is the product of hundreds thousand dollars of careful design and usability screening at Microsoft, which magically justifies of Office UI as great. While this might be true, I really don't want people to believe that The One Microsoft Way is always the great way, because Microsoft employees are people too and they can produce shitty user interaction design too.

Anyways, UIDfP is a great book, and if you're interested even more in UI design, then you should definitely also read About Face too.

Friday, May 23, 2008

I Love Emacs

I bought a new laptop last week, and when migrating my data over from the old one, I did realize that information vital to manage my stuff (like time and money and projects and stuff) is scattered around in so many places (Outlook, plain text files, ToDoList, AceMoney, various timesheets in Excel, etc) that it's starting to look unprofessional. :-)
I must admit that I was content with AceMoney, but my project/time-tracking system was what bothered me:
- Use a ToDoList to decompose my projects to a tree of tasks.
- Track my project-time in a Word table (I know it's cheesy, but Word let's you enter current time very quick)
- Cross-reference ToDoList's autogenerated ID's with Word's table
- Report with a little WinForms program, using AntiWord to extract the Word table.
Boy, there's a whole lot of dependencies there, and a whole lot of clumsyness, but well, that's what I got used to.
And that's all past now, because I've found out about Emacs's Org Mode and I'm the happiest person alive to do organize *ALL* my stuff and clock my time with it.

Plus, becoming the Emacs-enthusiast I always only hoped I would become, I'm even migrating my personal accounting from AceMoney to Ledger which has an Emacs mode.

I must say again, that I was content with AceMoney itself, and I think it's a very handy and clean application. For personal finance, it's far better I my book than jGnash or GnuCash, BUT Ledger is the ultimate lean-and-mean solution for you if you live by "less dependencies and more openness the better"-code. And I think you should.

Thursday, May 15, 2008

Jython StringTemplate un(shallow)copyable

I've started porting to Jython a Python application, which uses the Python version of StringTemplate. Right after the start I've bumped into an error when using StringTemplateGroup.
The Python code looks like this:

def go():
g = stringtemplate.StringTemplateGroup("A","B")
g.getInstanceOf("A_TEMPLATE")

When calling go() from Java/Jython, the following exception is raised:

Exception in thread "main" Traceback (innermost last):
...
...
File "C:\DEV\stringtemplate3\groups.py", line 363, in getInstanceOf
File "C:\DEV\stringtemplate3\templates.py", line 391, in getInstanceOf
File "C:\DEV\stringtemplate3\templates.py", line 369, in dup
File "C:\DEV\jython2.2.1\Lib\copy.py", line 78, in copy
Error: un(shallow)copyable object of type

Others have bumped into this error too. It's because Jython doesn't fully implement object copying (in the same sense that CPython doesn't implement it fully -- you cannot just copy arbitraty C objects). It turns out, that this isn't a problem for me, because as soon as I've changed the StringTemplateGroup error listener to my own (copyable) listener, the problem went away.

But there's one caveat. I actually choose to NOT use any custom error listener, because console errors are okay for me, so I've tried and put errors=None in the StringTemplateGroup constructor, which is a no-no, because None means "use the default one", which is exactly what I didn't want right now.

So the final solution was to redefine go like so:

def go():
g = stringtemplate.StringTemplateGroup("A","B")
g.errorListener = None
g.getInstanceOf("A_TEMPLATE")

Works like a charm, so far.