Twiddled with mod_rewrite until my URLs look pretty. This is MUCH harder than writing a handler for a Java servlet, and even with a lot of debugging information in the logs it was hard to tell what strings were being filtered through the regular expressions.
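For the record, the sort of thing involved, a sketch only: the paths, script name, and log location here are invented, not the actual rules from this site. The logging directives are what made debugging bearable at all.

```apache
# Turn /articles/2003/12/some-slug into a query-string call
# to a backend script. Names are illustrative.
RewriteEngine On

# Apache 1.3 / 2.0 style rewrite logging; crank the level up
# to see which strings the patterns are actually matched against.
RewriteLog /var/log/apache/rewrite.log
RewriteLogLevel 3

RewriteRule ^articles/([0-9]{4})/([0-9]{2})/([A-Za-z0-9-]+)$ \
    /cgi-bin/show.pl?y=$1&m=$2&slug=$3 [L]
```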
Alas, aside from convincing Google to index them, pretty URLs don’t make the underlying content any more tractable to the human mind. A pointer is just a pointer, after all, and I become increasingly convinced that animal memory simply does not deal in addresses. Hence the difficulty of converting memetic “pointers” into something that means anything to a machine.
This is the central problem facing software of the future. Information capacity increases; more and more is generated, and kept. You can no longer solve the problem with brute force: meticulous filing systems, ancient employees. The pan is no shallower, but the water is flowing so much faster. Even if you foresee no economic value in mining all that data, you have to avoid losing the data you need in a mass of irrelevant and duplicated documents.
To put it another way: nothing is isolated. At any given time you occupy one node in an information graph. The graph has both directed and undirected properties. Directed, in that you arrived at the node from somewhere else, and in that to understand the current node you need access to content in other nodes. Undirected, in that the concepts themselves do not affect each other in one direction only, and more often than not what you need to know is how concepts are linked to each other, and which cliques a single concept belongs to.
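The two halves of that graph can be sketched in a few lines. Everything here is invented for illustration, including the example concepts; the point is just that navigation is directed while conceptual association, and the cliques it forms, is symmetric.

```python
# Toy model of the "information graph": a directed half for how you
# arrived at a node, and an undirected half for conceptual association.
from collections import defaultdict

directed = defaultdict(set)    # navigation / prerequisite edges
undirected = defaultdict(set)  # symmetric conceptual links

def link(a, b):
    """Directed edge: you reach (or need) b from a."""
    directed[a].add(b)

def associate(a, b):
    """Undirected edge: a and b bear on each other."""
    undirected[a].add(b)
    undirected[b].add(a)

link("regular expressions", "mod_rewrite")
associate("mod_rewrite", "pretty URLs")
associate("pretty URLs", "search indexing")
associate("mod_rewrite", "search indexing")

def is_clique(nodes):
    """True when every pair of concepts is mutually linked."""
    return all(b in undirected[a] for a in nodes for b in nodes if a != b)

print(is_clique({"mod_rewrite", "pretty URLs", "search indexing"}))  # True
```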
What one needs for maximal efficiency, sitting at any node, is access to content that bears directly on the content of the current node; more specifically, that bears directly on the operating question. Conventional hyperlinks fail in this respect because they are chosen by the author, who often has very different needs from the reader (in graphical terms, cannot anticipate the parents of the node). Hockemeyer et al. describe an adaptive hypertext scheme by the name of RATH. (An acronym they “hardly deserve”, according to Carl.) Unfortunately, though “knowledge space” is a nice concept, I haven’t seen any convincing formalism for it. Consequently RATH requires a great deal of manual intervention, and can never be truly general.
Let me continue to speculate. You want to know what mink oil is made out of. So you type “mink oil” into Google, and you get 12,200 responses (at last count). At least the first three pages are links to people selling mink oil, most of whom don’t tell you what mink oil is made of. (It’s mink, in case you wind up at this page). Fine. So you scan through 3 pages of links, clicking on a few promising ones. During this process you have in mind your original question (the missing node in knowledge space, if you will), which you use to evaluate the relevance of each page. The reason the search took you so long (assuming you weren’t stoned) is that you couldn’t communicate to Google what you actually wanted. This is not a problem of picking the right keywords: adding “manufacture” has only a marginal effect on the probability of getting a relevant response.
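A toy scorer makes the keyword problem concrete. Assume, purely for illustration, a bag-of-words ranker and two made-up pages, one selling mink oil and one explaining it: the seller page outscores the answer page on both queries, with or without “manufacture”, because keyword overlap never sees the question behind the words.

```python
# Crude bag-of-words relevance: fraction of a page's words that
# appear in the query. Page texts are invented for this sketch.
def score(query, page):
    q = set(query.lower().split())
    words = page.lower().split()
    return sum(1 for w in words if w in q) / len(words)

seller = "buy mink oil now finest mink oil for boots order mink oil today"
answer = "mink oil is rendered from the fat of minks a byproduct of fur manufacture"

for query in ("mink oil", "mink oil manufacture"):
    print(query, round(score(query, seller), 2), round(score(query, answer), 2))
```

The seller page repeats the query terms and wins both times; the page that actually answers the question barely registers.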
And the first response is incorrect. But that’s another problem altogether…
last modified: 2003-12-02 18:15:18 -0500