IT

Information Lifecycle Management and the cost of forgetfulness

Maxwell’s demon is a classic thought experiment that illustrates the second law of thermodynamics. The conundrum drove Ludwig Boltzmann to suicide. Leo Szilard, a contemporary and friend of Einstein, and one of the first proponents of the atomic bomb, provided the first refutation in 1929 – Maxwell’s demon appears to create energy from scratch, but what it is really doing is transferring entropy to the outside world.

In his analysis, Szilard considered alternative demons that would overcome his objection, and for one of them, now known as the Szilard Engine, his interesting conclusion is that it cannot work because forgetting information from memory in itself incurs thermodynamic costs. To make a real-world analogy – you may pay to get information in the form of your daily newspaper, but disposing of all that paper also incurs real costs in the form of garbage hauling taxes, even if you are not aware of them. In the cosmic order, getting rid of data is as important as acquiring it in the first place.

One of the buzzwords of the day in IT is Information Lifecycle Management. This basically means using a fancy database to track information assets and how they are stored, backed up and disposed of in accordance with retention policies and various legislative mandates like the Sarbanes-Oxley Act. Companies like Microsoft discovered to their dismay the consequences of having incriminating information dragged into court under subpoena.

It seems the price of forgetfulness is eternal vigilance…

A side note – one of the things that seems consistently forgotten when designing a database is archiving and deleting old historical data – the data just keeps accumulating, usually until the database becomes obsolete and is decommissioned or the original designers have moved on to other jobs. In large-scale databases, the efficient archiving of data requires partitioning, and is several orders of magnitude harder if the partitioning was poorly designed in the original data model. For instance, if some classes of historical data have to be held for a longer retention period than others, make sure they are stored in different partitions as well, otherwise separating them will require lengthy batch jobs. If you are specifying a database today, for your successors’ sake, plan for the orderly disposal of data once it is no longer relevant.
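As a rough illustration of the partitioning advice, here is a minimal Python sketch. The retention classes, their periods and the partition naming scheme are all hypothetical, chosen only for the example. The point is that when each retention class gets its own monthly partition, disposal reduces to listing whole partitions that can be dropped, with no row-by-row batch job.

```python
from datetime import date

# Hypothetical retention classes and their retention periods, in months.
RETENTION_MONTHS = {"audit": 84, "clickstream": 6}


def partition_name(retention_class: str, day: date) -> str:
    """Name of the partition holding one retention class for one calendar month."""
    return f"{retention_class}_{day.year:04d}_{day.month:02d}"


def expired_partitions(partitions: list[str], today: date) -> list[str]:
    """Return the partitions whose month is older than their class's retention period."""
    this_month = today.year * 12 + (today.month - 1)
    expired = []
    for name in partitions:
        # Names look like "<class>_<year>_<month>"; split from the right so a
        # class name containing underscores would still parse.
        retention_class, year, month = name.rsplit("_", 2)
        partition_month = int(year) * 12 + (int(month) - 1)
        if this_month - partition_month > RETENTION_MONTHS[retention_class]:
            expired.append(name)
    return expired
```

Each name returned by expired_partitions could then be dropped as a unit with whatever partition-drop statement the database provides.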

Tracing telephone number prefixes

I recently had a project where I needed to find out which telco served users based on their phone number (US only). Area code tables are a dime a dozen, but only give you the state, and I needed finer granularity than that, including the ability to drill down to the first three digits of the local phone number, for a total of 6 digits known in industry parlance as the NPA-NXX.

The solution I found is to go straight to the source: the website of the company that administers the North American numbering plan on behalf of the FCC (the NANP actually covers more than the US, including Canada and some Caribbean countries, but the registrar I am referring to only covers the US).

They have a very convenient page with downloadable tables of NPA-NXX to carrier assignments. As an example, here is the entry for my home phone number:

Entry for 415-359-0918
State | NPA-NXX | OCN | Company name | Rate center | Switch | Use
CA | 415-359 | 9740 | Pacific Bell | SFNC CNTRL | SNFCCA12DS0 | Assigned

OCN is the operating company number, a numeric code they use for carriers. The “rate center” (usually the city or town name) is unfortunately encoded using the proprietary Telcordia Common Language standards rather than in plain English, and you need to pay a license fee to get that database. The carrier name also varies wildly. Pac Bell has been fully subsumed under the SBC brand name, but the old identity still lingers in these tables (Verizon, in contrast, has been much more diligent at having these tables updated, even when the pre-merger name is still mentioned).
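Once the tables are imported, the lookup itself is simple. The following Python sketch is only an illustration: the function names and the dictionary layout are my own assumptions, the single row is the sample entry shown above, and the format of the downloadable files is deliberately not modeled.

```python
import re

# One sample row, copied from the entry above. A real table would be loaded
# from the downloaded assignment files (their exact format is not assumed here).
ASSIGNMENTS = {
    "415-359": {
        "state": "CA",
        "ocn": "9740",
        "company": "Pacific Bell",
        "rate_center": "SFNC CNTRL",
        "switch": "SNFCCA12DS0",
        "use": "Assigned",
    },
}


def npa_nxx(phone: str) -> str:
    """Extract the 6-digit NPA-NXX prefix from a US phone number."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit US number: {phone!r}")
    return f"{digits[:3]}-{digits[3:6]}"


def carrier_for(phone: str):
    """Return the company name for a number, or None if the prefix is unknown."""
    entry = ASSIGNMENTS.get(npa_nxx(phone))
    return entry["company"] if entry else None
```

Keying the table on the NPA-NXX string keeps the lookup a single dictionary access, however many rows are loaded.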

With number portability, especially the forthcoming wired-to-wireless number portability required by the FCC, this information will be less useful, as you will be able to have, say, a New York phone number but actually be in Tokyo using a Vonage VoIP box (possibly even one with a ported number). It is still a useful resource that is not widely known.

Update (2004-10-25):

This page is the most popular Google search on my website. If all you want is to look up a phone number and don’t want to go through the hassle of importing all the NANPA tables, there are a number of NPA-NXX search pages available on the web.

Sessions must die

Many e-commerce sites have session timeouts. Dawdle too long between the moment you enter the site and the moment you actually want to buy something, and you will be presented with an unpleasant message. The words “session timeout” will be there, drowned in a sea of technobabble, and you will have to restart from scratch. Using a bookmark will often have the same effect.

At this point, you may well be tempted to go shop elsewhere; indeed, it is the only principled response to such blatant contempt for customers. You will notice that successful sites like Amazon.com do not make their customers suffer such hassles – once you’re in, you are in, whether you have to take a lunch break or not. I don’t buy the security argument either – there is nothing sensitive about the contents of a cart, security belongs at checkout time, not browse time.

The reason why such crimes against usability are perpetrated is that business requirements too often take a back seat to technical expediency, paradoxically most often due to lack of technical competence. Many web development environments keep track of what you do on a website, the contents of your cart, and so on, in “sessions”, portions of memory that are set aside for this book-keeping purpose. They cannot be set aside forever, and must be purged to make room for new customers.

The tyro programmer will leave the default policy in place, which is to dump the session altogether and place the burden of recovering state on the customer. More experienced programmers will implement the session mechanism in a database so it can be kept almost indefinitely. In an era where disk space costs a dollar or two per gigabyte, and a desktop computer has enough processing power to crunch tens of thousands of transactions per minute, there is no justification for not doing so.
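To show how little is involved, here is a minimal Python sketch of a database-backed session store, using SQLite from the standard library. The table layout and the function names are assumptions made for illustration, not the API of any particular web framework.

```python
import json
import sqlite3
import uuid


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) the session table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        " id TEXT PRIMARY KEY,"
        " cart TEXT NOT NULL,"
        " updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_cart(conn: sqlite3.Connection, session_id, cart: list) -> str:
    """Persist the cart; create a new session id if the visitor has none."""
    session_id = session_id or uuid.uuid4().hex
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, cart, updated_at)"
        " VALUES (?, ?, CURRENT_TIMESTAMP)",
        (session_id, json.dumps(cart)),
    )
    conn.commit()
    return session_id


def load_cart(conn: sqlite3.Connection, session_id: str) -> list:
    """Return the stored cart, or an empty cart for an unknown session."""
    row = conn.execute(
        "SELECT cart FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Because the cart lives on disk rather than in the web server's memory, nothing forces it to expire after a lunch break; old rows can be purged on whatever schedule the business chooses, using the updated_at column.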

Dissecting the ENUM hype

I have seen some hype building up about ENUM (RFC 2916) recently, suggesting that ENUM will finally make Voice-over-IP (VoIP) leapfrog traditional telephony.

First of all, what is ENUM? It is simply a means for a VoIP program like Microsoft Windows Messenger or Apple’s iChat AV to find how to reach a correspondent from a phone number. First-generation VoIP programs required users to enter the IP address of their correspondent to call them, or to use non-standard or semi-standard directory services like Microsoft’s short-lived ILS. Since most people wouldn’t recognize an IP address if it bit them on the nose, this limited the market to geeks or incredibly determined penny-pinchers. With ENUM, if you know my phone number, say (415) 359-0918, or if you use the ITU-T E.164 international numbering standard, +1 415 359 0918, you would find the IP address of my VoIP phone/program by looking up 8.1.9.0.9.5.3.5.1.4.1.e164.arpa in the DNS. This is the reversed E.164 phone number, with e164.arpa tacked onto it, the way you can do a reverse domain name lookup of the IP address 198.144.198.154 by looking up the PTR record corresponding to 154.198.144.198.in-addr.arpa. The .arpa is a legacy of the days when the Internet was bankrolled by the US ARPA, now the Defense Advanced Research Projects Agency (DARPA), and causes no little grumbling from people who would rather have a more “neutral” root e164.int.
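The name mangling described above is easy to sketch in Python. The function names are mine, and the sketch only builds the domain names; it does not perform the actual DNS query.

```python
def enum_domain(phone: str, root: str = "e164.arpa") -> str:
    """Build the ENUM domain name for an E.164 number such as '+1 415 359 0918'."""
    digits = [c for c in phone if c in "0123456789"]
    if not digits:
        raise ValueError(f"no digits in {phone!r}")
    # Reverse the digits, separate them with dots, and append the root.
    return ".".join(reversed(digits)) + "." + root


def reverse_ip_domain(ip: str) -> str:
    """Build the in-addr.arpa name used for a reverse (PTR) lookup of an IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(enum_domain("+1 415 359 0918"))        # 8.1.9.0.9.5.3.5.1.4.1.e164.arpa
print(reverse_ip_domain("198.144.198.154"))  # 154.198.144.198.in-addr.arpa
```

The root parameter is there because the choice of e164.arpa is a policy decision rather than a technical one; passing a different root would produce names under another domain.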

This is a clever hack, allowing VoIP clients to leverage the DNS infrastructure, rather than building new protocols or even leveraging existing ones like LDAP. Every modern computer, from Palm PDAs to Windows PCs to IBM mainframes, has a built-in DNS client (although few support the ridiculously over-engineered new NAPTR resource record introduced for ENUM in RFC 2915). LDAP client libraries are nowhere near as widespread. Implementing ENUM is thus relatively easy, on the order of a few days of work at most, assuming the underlying name client software is not hardwired for IP address resolution only. In a similar way, a few years ago a group of young turks unsuccessfully proposed that the ITU set aside an international country code for VoIP calls (the decimal IP address would be encoded as a 12-digit phone number). The old ITU proposal addressed the issue of getting conventional phones to reach an IP address; ENUM addresses how to reach a conventional phone number from an IP phone.
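For the curious, the 12-digit encoding could have looked like the Python sketch below. The details are an assumption on my part: I take it that each of the four octets is written as three zero-padded decimal digits, which is the obvious way to arrive at 12 digits, but I have not checked this against the actual proposal.

```python
def ip_to_digits(ip: str) -> str:
    """Encode an IPv4 address as 12 decimal digits, 3 per octet (assumed scheme)."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return "".join(f"{o:03d}" for o in octets)


def digits_to_ip(digits: str) -> str:
    """Decode 12 dialed digits back into a dotted-quad IPv4 address."""
    if len(digits) != 12 or any(c not in "0123456789" for c in digits):
        raise ValueError(f"expected 12 digits, got {digits!r}")
    octets = [int(digits[i:i + 3]) for i in range(0, 12, 3)]
    if any(o > 255 for o in octets):
        raise ValueError(f"octet out of range in {digits!r}")
    return ".".join(str(o) for o in octets)
```

Under this scheme 198.144.198.154 would be dialed as 198144198154, after the hypothetical country code.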

That said, while ENUM simplifies some of the logistical issues of managing a federated namespace or numbering plan, the bulk of the effort in transitioning to VoIP lies elsewhere:

  • Making IP phones as convenient as traditional phones (waiting 5 minutes for Windows to boot does not quite qualify).
  • Providing the same level of resiliency as the old network, for instance in case of power failure (when I was in Telecom school, one of our teachers always insisted that when phones fail, people can die because emergency services are no longer dispatched).
  • Providing gateways between traditional telephony and IP telephony, and finding a sound business model to pay for it.
  • Figuring out how billing and compensation will be handled.
  • Increasing competition by allowing direct calls when both users are VoIP capable, without having to know this explicitly beforehand.
  • Improving the ubiquity of IP connectivity, especially residential broadband and wireless.
  • Improving IP network quality of service, since Quality of Service approaches do not scale and end up being even more inefficient than the old network.

ENUM does not address the chicken-and-egg syndrome that is slowing VoIP adoption. A significant proportion of telephone calls is now carried over IP; for instance, AT&T and MCI have both declared their intent to migrate over half their voice traffic to IP by 2005. But that is invisible to end-users, and not accessible – you won’t be able to dial an AT&T user by the IP address AT&T uses internally for them. AT&T will not give it to you because then they would not be able to bill you for it the way they can bill other telcos with interconnect agreements.

As the global E.164 numbering plan is controlled by the ITU and delegated to Telcos, ENUM will not by itself make it possible to create new phone numbers that are not controlled by Telcos. A competitive service provider could petition to obtain a number range, the way Vonage obtained US phone numbers for its VoIP service, but telephone billing has assumptions on the structure of phone numbers deeply embedded within it, probably inextricably. After all, it has taken carriers over ten years to learn to bill calls towards mobile phones in Europe differently by detecting they have a leading 6 digit.

Conceivably, the ENUM NAPTR resource records could be served from any domain name, not just one under e164.arpa. One could imagine that in the future, instead of giving a nineteenth-century style phone number (the first Strowger automatic phone exchange was installed in New Haven, Connecticut in 1891), you would give out a domain name (“call me at fazal.majid.info”) or something that looks like an email address. But that is not what ENUM offers (today), in part by design.

An annoying Sony installer bug

I bought a Sony Clié UX50 PDA a few weeks ago. I started noticing that every time I restart my PC (admittedly, not something I do all that often), an incredibly annoying Windows Explorer window set to C:\Program Files\Sony would pop up at the end of the boot sequence.

After a little analysis, it turns out the Sony installer created a registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\Palm MulitUser Config with the value C:\Program Files\Sony Handheld\Configtool.exe. Enclosing that in double quotes using Regedit (insert obligatory admonishments about the dangers of Regedit here) fixes the problem. Click here for a screenshot of what it should look like.

Without the double quotes, Windows is thrown off by the space between “Sony” and “Handheld”, and assumes you want to run the command C:\Program Files\Sony with the command-line parameter Handheld\Configtool.exe. As there is no such command C:\Program Files\Sony, Windows just runs the default action for that object, which is to open it in a Windows Explorer window.
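The behavior can be mimicked with a small Python sketch. To be clear, this is a simplified model I am assuming for illustration, not the actual Windows parsing code: for an unquoted value it tries each space in turn as the end of the program path and stops at the first prefix that exists on disk. The function name and the exists callback are hypothetical.

```python
def resolve_command(value: str, exists) -> tuple[str, str]:
    """Split a Run-key value into (program, arguments) under a simplified model.

    `exists` is a callable telling whether a path is present on disk
    (as a file or a folder); it stands in for the real filesystem.
    """
    value = value.strip()
    if value.startswith('"'):
        # Quoted: the program path is everything up to the closing quote.
        end = value.find('"', 1)
        if end == -1:
            raise ValueError("unterminated quote")
        return value[1:end], value[end + 1:].strip()
    # Unquoted: try each space as the end of the path, shortest prefix first.
    pos = value.find(" ")
    while pos != -1:
        candidate = value[:pos]
        if exists(candidate):
            return candidate, value[pos + 1:].strip()
        pos = value.find(" ", pos + 1)
    return value, ""


# Paths assumed to exist on the affected PC.
paths = {
    r"C:\Program Files\Sony",
    r"C:\Program Files\Sony Handheld\Configtool.exe",
}
unquoted = r"C:\Program Files\Sony Handheld\Configtool.exe"
print(resolve_command(unquoted, paths.__contains__))
print(resolve_command('"' + unquoted + '"', paths.__contains__))
```

In this model the unquoted value resolves to the C:\Program Files\Sony folder with Handheld\Configtool.exe as its argument, while the quoted value resolves to Configtool.exe with no arguments.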

I wouldn’t have gone into such excruciating detail about this issue, except:

  • Somebody else annoyed with this bug might find this resolution via Google.
  • It shows how sloppy Sony is about testing its software installers (in passing, note the typo “MulitUser” instead of “MultiUser”).
  • That registry folder is where all the startup items that aren’t in the Start menu reside. You might want to have a look at it to see what odd junk resides there, such as obnoxious system tray items like RealPlayer.

Update (2004-01-12):

Sony has an official fix for this problem: PDTREGISTRYUPDATE.EXE.