Last time my LFS build failed it wasn't even binutils but glibc! After 15 minutes of RMTFing I realized that it's called configparms not configparams, dummy.
Developing view of a late-adopter. Voice of your average developer from your run-of-the-mill outsourcing company. Too young to know multiple inheritance, too old to catch up with Ruby On Rails. Too ironic to be not unreal.
Friday, December 12, 2008
Linux From Scratch -- Revisited
Last time my LFS build failed it wasn't even binutils but glibc! After 15 minutes of RMTFing I realized that it's called configparms not configparams, dummy.
Tuesday, December 2, 2008
Linux From Scratch -- Failure
I guess I will have to come back to this one and contact the good guys on the mailing list. Of course I will do my RTFM before flooding public channels.
Saturday, November 29, 2008
Linux From Scratch -- Host System
As it is advertised it is definitely a valid host environment for the build process, as checked by version-check.sh
After setting up my main file systems, we need to get the source packages. There is a wget input file in the stable book download directory which comes very handy. My slow internet connection makes me have to wait for all of them to arrive.
Linux From Scratch -- Introduction
But it always bothered me that I'm not really into Linux. Not into like "I use Ubuntu for my daily browsing and document editing chores" or "I'm the sysadmin of a 50-server heterogeneous *nix farm" but somewhere in between.
I'm pretty much tied to Windows as my main working environment, and can't really afford the hassle to migrate all my stuff to Linux (yeah, going from XP 32-bit to Vista 64-bit was a big enough deal already, thank you), so what do I do?
This is where Linux From Scratch comes into play. I mean. If I just really could build a whole Linux system from scratch, well it would really mean at least something, right?
Well. Jump right into it.
Because I'm a newbie, I start with the latest stable version of the LFS book.
First of all I need a host systems which is a simple 2.6 generic Linux VMware Image created by EasyVMX! and booted up with the LFS LiveCD. Since the latest book version is 6.4 and the latest LiveCD version is 6.3 and I don't need no X and stuff, I'm gonna pick the 32-bit "-min" image.
Since I have a slow internet connection, I guess I will have to hook up on things later.
Thursday, September 25, 2008
Sling Blade Runner
ITA Software could be an interesting place to work at. They sure have some interesting puzzles on their career site serving as tools for resume selection / recruitment. One of such puzzles has just gone to their archives which is a big green flag for an open discussion.
The puzzle is called Sling Blade Runner, and it is specified as follows:
"How long a chain of overlapping movie titles, like Sling Blade Runner, can you find?"
Use the following listing of movie titles: MOVIES.LST. Multi-word overlaps, as in "License to Kill a Mockingbird," are allowed. The same title may not be used more than once in a solution. Heuristic solutions that may not always produce the greatest number of titles will be accepted: seek a reasonable tradeoff of efficiency and optimality.
Let's go and try to solve it!
The General Algorithm
The task suggests a reduction to the graph-traversal domain. By translating the specification to a bit more technical level, we can lay ground for an actual implementation. The highest level description of the proposed solution is as follows:
- Take the input of a set S of sentences 
- Build the overlap-graph D of S which is a directed graph D(V, A); where - V=S,
- A = {arc(s1, s2) if s2 overlaps s1 and both elements of S}
 
- Find the longest (vertex disjoint) path P in D 
- Output P 
Breaking down above steps takes us more closer to an actual realization. Following are the main steps in greater detail:
1. Take the input
The input is given as a stream of sentences (titles) separated by line breaks, where
- a sentence is a non-empty sequence of words separated by non-breaking whitespace, and
- a word is a non-empty sequence of non-whitespace characters.
For simplicity we won't care too much about different character sets and encodings and casing, and let the implementation environment decide about the proper meaning of whitespaces and characters.
What we do care about is that the input can possibly containt empty sentences that we ignore and multiple instances of the same sentence which we treat as a single value. Note that a sentence does not contain any of the whitespaces which appeared in the original line of input.
2. Build the overlap-graph
The meaning of "overlapping" is outlined in the original description of the task. A more formal definition would be that the sentence s2 overlaps sentence s1 iff any of the (non-empty) suffixes of s1 (including s1) is the prefix of s2 (including s2), where s1 is not s2.
The overlap-graph D then is made up by enumerating each possible p=(s1, s2) pair of S and putting an arc (s1, s2) into D for each p if s2 overlaps s1.
3. Find longest path
An L-long vertex disjoint path in the directed graph G is a sequence of vertices v[1], v[2], ..., v[L-1], v[L] with the following properties:
- for each 1 <= i <= L, i != j it holds thats v[i] != v[j]
- for each 1 <= i < L it holds that v[i] is the predecessor of v[i+1] in G (ie. there is an arc (v[i], v[i+1]) in G)
The longest vertex disjoint path P from a source vertex v in G is found by the following BFS-like recursive algorithm:
let n = the source of the search, initially n = v
let visited = vertices already visited by the algorithm, initially empty
define function longest_path(G, n, visited) as
   if n is in visited then 
       return empty_list
   for each successor m of n do
       path_candidates[m] = longest_path(G, m, visited ++ n)
   let mpc = one of the longest paths in path_candidates
   return v ++ mpc // the list of nodes on one of the longest paths from n in G
 To get the global longest path GLP in D, one of the best results of longest_path(D, n, empty_list) over each n in D must be chosen.
The "one of..." clauses are present because two distinct paths can have the same length.
4. Write output
Print vertices of GLP each vertex on a line.
5. Test
Distinct parts of the algorithm can be tested as separate units: overlap-detection, termination-of-traversal in a cyclic graph, proper graph-building, etc. -- but I'm gonna skip describing those. As far as testing the whole algorithm goes I'm only interested in two things:
- That any results fed back to the input are given back unchanged.
- That I would pick the most naive benchmark implementation and every optimization is subject give the same results as this benchmark.
These are the main concerns regarding the evolution of my implementations. I will present actual implementations in future posts, but I won't talk about testing: publishing of any code will imply a that it has passed above tests.
Friday, September 12, 2008
Google Chrome -- The Evil User Interface
First of all, I didn't even keep Google Desktop on my machine for more than an hour because frankly, having a web server to let me get easier access to my desktop kinda sounds freaky to me. Turns out that I wasn't overly paranoid at all.
Now they came out with a browser, the user interface of which isn't a very likely candidate to be served by, well a web server -- and I was truly frantic about it. Even got sold on the Chrome comic in a snap. Soon as it was available to download I gave it a run, even took a look at the source code (though not as deep as Scott Hanselman here -- more on this in a minute). I was truly excited only to find out in five minutes that this stuff is not for me, and I believe it's not even for the casual user. Of course there is at least one solid conspiracy theory stating that this browser is not meant for the users. Anyhow, I think that messing up the user experience won't really help a product to be embraced by anybody regardless of the creators' original intent.
All my enthusiasm just vaporized in 30 seconds after the download. Why, the installer shoves the damn thing under "%USERPROFILE%\Local Settings\Application Data". I mean they put the program into a data folder, which not only makes it friggin counterintuitive but raises a bunch of management issues in any sanely managed environment, be it a shared family computer or a corporate environment. (Yeah I know Google support's this von Neumann mindset of everything being just data, but hey...)
Next thing is the implementation of the user interface. Chrome gives you a bigger client area which is good, but this obviously means the removal of the nonclient stuff like the title bar, which is nice for an instant messenger's buddy window or a media player but is kinda frivolous for a web browser which you constantly have in the foreground. It kills interoperability with a whole slew of applications like WindowBlinds or GridMove, and not playing nice with other applications won't make good to your reputation in the hood.
The last thing might sound like nitpicking -- that is even more nitpicking than the previous ones :) but as I've said I went to see the Chrome source through the eyes of Scott Hanselman and found out that Google went down the too-clever path and released a client software which relies on undocumented features of an operating system. Call me a old-fashion but I think this wouldn't even be appropriate for a tiny 2-man shop, let alone for GOOG. It's just bad karma.
Tuesday, September 9, 2008
Microsoft Cloud
Why do I think so?
Well. Coupla months into the PDC and David Chappell comes up with a Clouding Platforms Paper and it finally makes sense. MS is going to give a flying damn about cloud services. They're building a platform, baby. You see? What's better than locking you into a development environment than ultimately locking you into a deployment environment? Hell, they may even give away free cookies, developer-developer-developer t-shirts and "Microsoft Patterns and Practices" handouts, oh just come and run your application straight on their infrastructure.
And oh why, they're pushing DLR like crazy which would make enterprise development even more opaque: expect hardcore rubysts go crazy when their code is ultimately put into the same appdomain with some cheesy vb.net code -- and communicating.
Nevertheless. I guess I've never been looking forward this much to the PDC (if ever) -- mainly because Raymond Chen and secondly because of the unveiling of the MS cloud stuff.
Monday, June 9, 2008
You still have to sell me on WPF
Of course there's a lot of value in WPF which has soon to be explored, but I think that we're not there yet. It's really cool tho to have a framework which encourages separation of concerns in an area really really prone to produce ugly and smelly code.
UI design for programmers
The only issue I had with the book was that somehow I thought it indicates that copying an already estabilished design could do you more good than trying to invent something really cool. The notable example here is the Microsoft Office user interface which is the product of hundreds thousand dollars of careful design and usability screening at Microsoft, which magically justifies of Office UI as great. While this might be true, I really don't want people to believe that The One Microsoft Way is always the great way, because Microsoft employees are people too and they can produce shitty user interaction design too.
Anyways, UIDfP is a great book, and if you're interested even more in UI design, then you should definitely also read About Face too.
Friday, May 23, 2008
I Love Emacs
I must admit that I was content with AceMoney, but my project/time-tracking system was what bothered me:
- Use a ToDoList to decompose my projects to a tree of tasks.
- Track my project-time in a Word table (I know it's cheesy, but Word let's you enter current time very quick)
- Cross-reference ToDoList's autogenerated ID's with Word's table
- Report with a little WinForms program, using AntiWord to extract the Word table.
Boy, there's a whole lot of dependencies there, and a whole lot of clumsyness, but well, that's what I got used to.
And that's all past now, because I've found out about Emacs's Org Mode and I'm the happiest person alive to do organize *ALL* my stuff and clock my time with it.
Plus, becoming the Emacs-enthusiast I always only hoped I would become, I'm even migrating my personal accounting from AceMoney to Ledger which has an Emacs mode.
I must say again, that I was content with AceMoney itself, and I think it's a very handy and clean application. For personal finance, it's far better I my book than jGnash or GnuCash, BUT Ledger is the ultimate lean-and-mean solution for you if you live by "less dependencies and more openness the better"-code. And I think you should.
Thursday, May 15, 2008
Jython StringTemplate un(shallow)copyable
The Python code looks like this:
def go():
g = stringtemplate.StringTemplateGroup("A","B")
g.getInstanceOf("A_TEMPLATE")
When calling go() from Java/Jython, the following exception is raised:
Exception in thread "main" Traceback (innermost last):
...
...
File "C:\DEV\stringtemplate3\groups.py", line 363, in getInstanceOf
File "C:\DEV\stringtemplate3\templates.py", line 391, in getInstanceOf
File "C:\DEV\stringtemplate3\templates.py", line 369, in dup
File "C:\DEV\jython2.2.1\Lib\copy.py", line 78, in copy
Error: un(shallow)copyable object of type
Others have bumped into this error too. It's because Jython doesn't fully implement object copying (in the same sense that CPython doesn't implement it fully -- you cannot just copy arbitraty C objects). It turns out, that this isn't a problem for me, because as soon as I've changed the StringTemplateGroup error listener to my own (copyable) listener, the problem went away.
But there's one caveat. I actually choose to NOT use any custom error listener, because console errors are okay for me, so I've tried and put errors=None in the StringTemplateGroup constructor, which is a no-no, because None means "use the default one", which is exactly what I didn't want right now.
So the final solution was to redefine go like so:
def go():
g = stringtemplate.StringTemplateGroup("A","B")
g.errorListener = None
g.getInstanceOf("A_TEMPLATE")
Works like a charm, so far.
Sunday, May 4, 2008
Know (not) One Editor
Of course I'm just totally mean here, because back in '99 Eclipse and VS weren't even conceived. What I really wanted to point out was, that the more advanced technology becomes, the more specialized knowledge you have to embrace. Back in the days, you were wicked cool with Emacs as an IDE. Nowadays you should switch if you want to avoid unneeded complexity when working with enterprise(y) stuff. And you better know some nano/joe/vi cause it's damn sure not all of your clients will have emacs installed on their machine you have to quickfix *now*
Friday, April 25, 2008
awk vs. Python performance
I thought I would put my two cents into this game by describing a microbenchmark inspired by an actual task of mine: convert a file of delimited values to LIBSVM's data format. That is transform each line of
num1 num2 num3 num4 ...
to
num1 1:num2 2:num3 3:num4 ...
I'm omitting details about (LIB)SVM here, but this isn't relevant now. The point is that the first field is kept in the output, but the rest of them are prefixed with 'number of field in line minus one' and a colon. Note that columns may be omitted in the output, but field indexes must be ascending within a line (this is the so-called sparse matrix format).
Of course my first solution is in Python:
for line in file:
cols = line.strip().split()
print " ".join(["%i:%s" %(i, x.strip())
for (i, x) in enumerate(cols)])[2:]
For a file with 100k lines and 128 columns, this runs in 38 seconds on my WINDOWS box.
Not too bad, but there is an itching disturbance, which does not let the urge to rewrite it in awk escape me. I never wrote a single line in awk, so this should be a good start.
And lo, my first awk program:
{
printf("%s", $1)
for (i = 2; i <= NF; i++)
{
printf(" %i:%s", i-1, $i)
}
printf("\n")
}
Let's run it on my WINDOWS box, under CYGWIN, with the same 100k/128cols file as above.
...
Ouch, this took more than 5 and a half minutes!! Shame for awk, go Python. Still it sounds like nonsense. Let's see. Both Python and awk are interpreted, so iterations are supposed to be slow(?) The Python code iterates over the columns with a library string function (join), which is implemented in C so it's blazing. awk uses 'for' extensively, so let's optimize it by unwinding the loop:
{
print $1, "1:"$2 , "2:"$3 , "3:"$4 , [...]
[...] , "125:"$126 , "126:"$127 , "127:"$128
}
This is a common technique used in compiler optimization, see how it works in this case. It WORKS, it runs MUCH faster. Same test file, same WINDOWS box and CYGWIN, 45 seconds. Wow, a 7-fold speedup, but still slower than Python. Is this for real? I don't think so.
Enter GNU/Linux, just for kicks, another box. The Python program runs in 31 seconds, and hey, wait a minute, we didn't have to wait a minute: the awk program is done in 17 seconds. I say, wow...
In a problem this little the use of Python was not too justified. In awk you even get mixable standard input / file handling for free, whereas in Python you must implement it by yourself. But when script portability is a concern, you still should consider Python even for a trivial task like this, because you could get more balanced performance. Of course you could give a try to Psyco, the Python optimizer, but you will have another dependency, and it may as well turn out that you're performance will even be much worse! More on this later.
Right tool for the right task, but know you environment too.
Monday, February 25, 2008
Shake Ya Tailfeathers (A Haskell Puzzle)
After reading and processing the first some chapters of YAHT and still not getting CPS, I nevertheless wanted to produce something, so came up with this:
It's a puzzle I've learned a long time ago called Tailfeather.
Tailfeather is a number-sequence, where the very first element is lress than the last one, the second element is lress than the second to last element and so on.
The task is to write a function with the specification:
tailfeather :: Int -> Int
{- tailfeather n = b
where 'b' > 1 is the smallest radix in which
the digits written of 'n'
form a Tailfeather sequence of numbers
-}
A very nice, very entry-level stuff. And the very beauty of it all is that implementing this solution was absolutely straighforward: I was able to focus on the actual problem without worrying about the underlying details (like how to represent a number-sequence -- is that an ArrayList or and array?)
Actually there are two sub-problems: A) rewriting a decimal number to another radix and B) determining whether a sequence is a Tailfeather.
Starting with the latter subtask, all you have to do is break a list into three parts, and solve the task recursively, like so:
tailfeather0 (x:xs) =
(x < last xs) &&
(tailfeather0 (init xs))
First we check if the Tailfeather property holds for the first and the last elements, then cut those elements from the list (that is take the init of the tail) and check the remaining part.
Of course the recursion must be terminated (and this next code MUST appear before the previous fragment):
tailfeather0 [] = True
tailfeather0 [x] = True
DONE.
Let's see problem A. If you don't know the proper algorithm for converting from decimal to another radix, it is given here (if the site is down use Wikipedia).
My first naive rewrite of the example provided on the linked page yielded the following:
convert2base2 0 = []
convert2base2 n = (mod n 2) : convert2base2 (div n 2)
Looks tight, the only problem is that it returns the digits in reverse order (starting with the least significant bit). Reversing the list after the whole computation would be an option, instead I do it in place:
convert2base2 n = convert2base2 (div n 2) ++ [mod n 2]
Generalizing the converter to accept any base (don't use this one with b <= 1):
convert2base _ 0 = []
convert2base b n = convert2base b (div n b) ++ [mod n b]
Now that we have our building blocks, let's combine them:
tailfeather n = tailfeather1 2 n
tailfeather1 b n =
if (b == n)
then b+1
else if tailfeather0 (convert2base b n)
then b
else tailfeather1 (b+1) n
That's it.
