IT

IM developments

Telcos look at instant messaging providers with deep suspicion. Transporting voice is just a special case of transporting bits, and even the global Internet is now good enough for telephony (indeed, many telcos are already using IP to transport voice for their phone networks, albeit on private IP backbones). The main remaining barriers to VoIP adoption are interoperability with the legacy network during the transition, and signaling (i.e. finding the destination’s IP address). IM providers offer a solution for the latter, and could thus become VoIP providers. AOL, in fact, already is one, indirectly, through Apple’s iChat AV. This competitive threat explains why, for instance, France Télécom made a defensive investment in open-source IM provider Jabber.

Two recent developments promise to change dramatically the economic underpinnings of the IM industry:

  1. Yahoo announced a few weeks ago it would drop its enterprise IM product. Within a week, AOL followed suit.
  2. AOL and Yahoo agreed to interoperate with LCS, Microsoft’s forthcoming Enterprise IM server. Microsoft will pay AOL and Yahoo a royalty for access to their respective IM networks.

These announcements make it clear that neither Yahoo nor AOL feels it can sell successfully into enterprise accounts, and certainly not match Microsoft’s marketing muscle in that segment.

The second part, in effect Microsoft agreeing to pay termination fees to AOL and Yahoo, means that Microsoft’s business IM users will subsidize consumers. This is very similar to the situation in telephony, where businesses cross-subsidize local telephony for residential customers by paying higher fees. For most telcos, interconnect billing is either the first or second largest source of revenue, and this development may finally make IM profitable for Yahoo and AOL, rather than the loss-leader it is today.

Apparently Microsoft has concluded it cannot bury its IM competitors, and would rather make money now serving its business customers’ demand for an interoperable IM solution than wait to have the entire market to itself using its familiar Windows bundling tactics. Left out in the cold is IBM’s Lotus Sametime IM software.

Businesses will now be able to reach customers on all three major networks, but this does not change the situation for consumers. The big three IM providers have long played cat-and-mouse games with companies like Trillian that tried to provide reverse-engineered clients that work with all three networks. Ostensibly, this is for security reasons, but the real explanation is obviously to protect their respective walled gardens, just as in the early days the Bell Telephone Company would refuse to interconnect with its competitors, and many businesses had to maintain multiple telephones, one for each network. It is not impossible, however, that interoperability will be offered to consumers as a paid, value-added option. Whether consumers are ready to pay for it is an entirely different question.

Effective anti-spam enforcement

The European Union E-Privacy directive of 2002, the US CAN-SPAM act of 2003 and other anti-spam laws allow legal action against spammers. Only official authorities can initiate action (although there are proposals to set up a bounty system in the US), but enforceability of these statutes is a problem, as investigations and prosecutions are prohibitively expensive, and both law enforcement and prosecutors have other pressing priorities contending for finite resources. Financial investigative techniques (following the money trail) that can be deployed against terrorists, drug dealers and money launderers are overkill for spammers, and would probably raise civil liberties issues.

There is an option that could dramatically streamline anti-spam enforcement, however. Spammers have to find a way to get paid, and payment is usually tendered using a credit card. Visa and Mastercard both have systems by which a temporary, single-use credit card number can be generated. This service is used mostly to assuage the fears of online shoppers, but it could also provide an enforcement tool.

Visa and Mastercard could offer an interface that would allow FTC investigators and their European counterparts to generate “poisoned” credit card numbers. Any merchant account that attempts a transaction using such a number would be immediately frozen and its balance forfeited. Visa and Mastercard’s costs could be defrayed by giving them a portion of the confiscated proceeds.

Of course, proper judicial oversight would have to be provided, but this is a relatively simple way to nip the spam problem in the bud, by hitting spammers where it hurts most – in the pocketbook.

Why IPv6 will not loosen IP address allocation

The current version of the Internet Protocol (IP), the communications protocol underlying the Internet, is version 4. In IPv4, the address of any machine on the Internet, whether a client or a server, is encoded in 4 bytes. Due to various overheads, the total number of addresses available for use is much less than the theoretical 4 billion possible. This is leading to a worldwide crunch in the availability of addresses, and rationing is in effect, especially in Asia, which came late to the Internet party and received short allocations (Stanford University has more IPv4 addresses allocated to it than the whole of China).

Internet Protocol version 6, IPv6, quadrupled the size of the address field to 16 bytes, i.e. unlimited for all practical purposes, and made various other improvements. Unfortunately, its authors severely underestimated the complexity of migrating from IPv4 to IPv6, which is why it hasn’t caught on as quickly as it should have, even though the new protocol is almost a decade old now. Asian countries are leading in IPv6 adoption, simply because they don’t have a choice. Many people make do today with Network Address Translation (NAT), where a box (like a DSL router) allows several machines to share a single global IP address, but this is not an ideal solution, and one that only postpones the inevitable (but not imminent) reckoning.
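
For a sense of the scale involved, here is a quick sketch using the ipaddress module from modern Python (purely illustrative; the prefixes shown are arbitrary documentation and RFC 1918 ranges):

    import ipaddress

    # Total theoretical address space: 4 bytes vs. 16 bytes.
    print(2 ** 32)   # IPv4: 4,294,967,296 addresses
    print(2 ** 128)  # IPv6: roughly 3.4e38 addresses

    # A chunk of IPv4 space is reserved for private use; NAT boxes hide whole
    # networks of such addresses behind a single global address.
    private = ipaddress.ip_network("10.0.0.0/8")
    print(private.num_addresses, private.is_private)   # 16777216 True

    # A single typical IPv6 site allocation (a /48) dwarfs the entire IPv4 Internet.
    site = ipaddress.ip_network("2001:db8::/48")
    print(site.num_addresses > 2 ** 32)                # True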

One misconception, however, is that the slow pace of the migration is somehow related to the fact that you get your IP addresses from your ISP, and don’t “own” them or have the option to port them the way you now can with your fixed or mobile phone numbers. While IPv6 greatly increases the number of addresses available for assignment, this will not change the way addresses are allocated, for reasons unrelated to the address space crunch.

First of all, nothing precludes anyone from requesting an IPv4 address directly from the registry in charge of their continent:

  • ARIN in North America and Africa south of the Equator
  • LACNIC for Latin America and the Caribbean
  • RIPE (my former neighbors in Amsterdam) for Europe, Africa north of the Equator, and Central Asia
  • APNIC for the rest of Asia and the Pacific.

That said, these registries take the IP address shortage seriously and will require justification before granting the request. Apart from ISPs, the main recipients of direct allocations are large organizations that require significant numbers of IP addresses (e.g. for a corporate Intranet) and that use multiple ISPs for their Internet connectivity.

The reason why IP addresses are allocated mostly through ISPs is the stability of the routing protocols used by ISPs to provide global IP connectivity. The Internet is a federation of independent networks that agree to exchange traffic, sometimes for free (peering), sometimes for a fee (transit). Each of these networks is called an “Autonomous System” (AS) and has an AS number (ASN) assigned to it. ASNs are coded in 16 bits, so there are only 65,536 available to begin with.

When your IP packets travel from your machine to their destination, they first go through your ISP’s interior routers to one of its border gateways, which connects to the transit or destination ISPs that lead to the other end. There are usually an order of magnitude or two fewer border routers than interior routers. The interior routers do not need much intelligence; all they need to know is how to get packets to the border. The border routers, on the other hand, need to have a map of the entire Internet: for each block of possible destination IP addresses, they need to know which next-hop ISP to forward the packet to. Border routers exchange this routing information using the Border Gateway Protocol, version 4 (BGP4).
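
To make the border router’s job concrete, here is a toy longest-prefix-match lookup in Python (the prefixes, next hops and AS numbers are entirely hypothetical; real routers hold hundreds of thousands of prefixes and do this lookup in specialized hardware):

    import ipaddress

    # Toy forwarding table: destination prefix -> next hop (hypothetical values).
    routes = {
        ipaddress.ip_network("203.0.113.0/24"):   "peer AS64500",
        ipaddress.ip_network("203.0.113.128/25"): "peer AS64501",   # more specific route
        ipaddress.ip_network("0.0.0.0/0"):        "upstream transit",
    }

    def next_hop(destination: str) -> str:
        """Longest-prefix match: the most specific matching prefix wins."""
        addr = ipaddress.ip_address(destination)
        matching = [net for net in routes if addr in net]
        best = max(matching, key=lambda net: net.prefixlen)
        return routes[best]

    print(next_hop("203.0.113.200"))  # peer AS64501 (falls inside the /25)
    print(next_hop("203.0.113.5"))    # peer AS64500
    print(next_hop("192.0.2.1"))      # upstream transit (default route)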

BGP4 is in many ways black magic. Any mistake in BGP configuration can break connectivity or otherwise impair the stability of vast swathes of the Internet. Very few vendors know how to make reliable and stable implementations of BGP4 (Cisco and Juniper are the only two really trusted to get it right), and very few network engineers have real-world experience with BGP4, learned mostly through apprenticeship. BGP4 in the real scary world of the Internet is very different from the safe and stable confines of a Cisco certification lab. The BGP administrators worldwide are a very tightly knit cadre of professionals, who gather in organizations like NANOG and shepherd the Net.

The state of the art in exterior routing protocols like BGP4 has not markedly improved in recent years, and the current state of the art in core router technology just barely keeps up with the fluctuations in BGP. One of the limiting factors is the total size of the BGP routing tables, which has been steadily increasing as the Internet expands (though no longer exponentially, as was the case in the early days). The bigger the routing tables, the more memory has to be added to each and every border router on the planet, and the slower route lookups become. For this reason, network engineers are rightly paranoid about keeping routing tables small. Their main weapon is aggregation: blocks of IP addresses that should be forwarded the same way are combined so they take up only one slot.
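
The effect is easy to demonstrate with Python’s ipaddress module (the blocks below are an arbitrary, hypothetical allocation): four contiguous /24s that share the same next hop collapse into a single /22 announcement, one routing-table slot instead of four.

    import ipaddress

    # Four contiguous /24 blocks that all route the same way (hypothetical allocation).
    blocks = [ipaddress.ip_network("198.51.%d.0/24" % i) for i in range(100, 104)]

    # Aggregated, they can be announced as a single prefix.
    print(list(ipaddress.collapse_addresses(blocks)))
    # [IPv4Network('198.51.100.0/22')]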

Now assume every Internet user on the planet has his own IP address that is completely portable. The size of the routing tables would explode from 200,000 or so today to hundreds of millions. Every time someone logged on to a dialup connection, every core router on the planet would have to be informed, and they would simply collapse under the sheer volume of routing information overhead, and not have the time to forward actual data packets.

This is the reason why IP addresses will continue to be assigned by your ISP: doing it this way allows your ISP to aggregate all its IP addresses in a single block, and send a single route to all its partners. Upstream transit ISPs do even more aggregation, and keep the routing tables to a manageable size. The discipline introduced by the regional registries and ISPs is precisely what changed the exponential trend in routing table growth (one which even Moore’s law would not be able to keep up with) to a linear one.

It’s not as if this requirement were anti-competitive, unlike telcos dragging their feet on number portability: the DNS was created precisely so that users would not have to deal with IP addresses, and its records can easily be updated to point to new addresses whenever they change.
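
The indirection is visible from any program: applications connect to names and let the resolver map them to whatever addresses the domain’s owner currently publishes, so renumbering only requires a DNS update. A minimal sketch in Python, with www.example.com standing in for any hostname:

    import socket

    # Applications deal in names; the resolver returns whatever addresses
    # are currently published in the DNS for that name.
    for family, _, _, _, sockaddr in socket.getaddrinfo("www.example.com", 80,
                                                        proto=socket.IPPROTO_TCP):
        print(family.name, sockaddr[0])

    # If the site moves to a new ISP (and hence to new addresses), only the
    # DNS record changes; this code keeps working unmodified.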

Networked storage on the cheap

As hard drives get denser, the cost of raw storage is getting ridiculously cheap – well under a dollar per gigabyte as I write. The cost of managed storage, however, is an entirely different story.

Managed storage is the kind required for “enterprise applications”, i.e. when money is involved. It builds on raw storage by adding redundancy, the ability to hot-swap drives, and the ability to add capacity without disruption. At the higher end of the market, additional manageability features include fault tolerance, the ability to take “snapshots” of data for backup purposes, and the ability to mirror data remotely for disaster recovery.

Traditionally, managed storage has been more expensive than raw disk by a factor of at least two, sometimes an order of magnitude or more. When I started my company in 2000, for instance, we paid $300,000, almost half of our initial capital investment, for a pair of clustered Network Appliance F760 filers with a total disk capacity of 600GB or so ($500/GB, at a time when raw disk drives cost about $10/GB). The investment was well worth it, as these machines have proven remarkably reliable, and the Netapps’ snapshot capability is vital for us: it lets us take instantaneous snapshots of our Oracle databases, which we can then back up in a leisurely backup window without having to keep Oracle in its performance-sapping backup mode the whole time.

Web serving workloads and the like can easily be distributed across farms of inexpensive rackmount x86 servers, an architecture pioneered by ISPs. Midrange servers (up to 4 processors), pretty much commodities nowadays, are adequate for all but the very highest transaction volume databases. Storage and databases are the backbone of any information system, however, and a CIO cannot afford to take any risks with them; that is why storage represents such a high proportion of hardware costs for most IT departments, and why specialists like EMC have the highest profit margins in the industry.

Most managed storage is networked, i.e. does not consist of hard drives directly attached to a server, but instead of disks attached to a specialized storage appliance connected to the server with a fast interconnect. There are two schools:

  • Network-Attached Storage (NAS), like our Netapps, that basically act as network file servers using common protocols like NFS (for UNIX) and SMB (for Windows). These are more often used for midrange applications and unstructured data, and connect using the inexpensive Ethernet (Gigabit Ethernet, in our case) networks every network administrator is familiar with. NAS devices are available for home or small office use at prices of $500 and up.
  • Storage Area Networks (SAN) offer a block-level interface: they behave like virtual hard drives that serve fixed-size blocks of data, without any understanding of what is in them (a short sketch contrasting the two access models follows this list). They currently use Fibre Channel, a fast, low-latency interconnect that is unfortunately also terribly expensive (FC switches are over ten times more expensive than equivalent Gigabit Ethernet gear). The cost of setting up a SAN usually limits them to high-end, mainframe-class data centers. Exotic cluster filesystems or databases like Oracle RAC need to be used if multiple servers are going to access the same data.
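
The file-level versus block-level distinction in a nutshell, as a rough Python sketch (the mount point and device path are hypothetical placeholders):

    import os

    # File-level access (NAS): the appliance understands files and directories.
    # "/mnt/nas" stands in for a hypothetical NFS or SMB mount point.
    with open("/mnt/nas/reports/q1.txt") as f:
        data = f.read()

    # Block-level access (SAN): the array just serves numbered blocks; any notion
    # of files lives in the filesystem the host builds on top of the raw device.
    # "/dev/sdb" stands in for a hypothetical disk exposed by the SAN.
    fd = os.open("/dev/sdb", os.O_RDONLY)
    block = os.pread(fd, 4096, 4096 * 1000)   # read 4 KB at block offset 1000
    os.close(fd)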

One logical way to lower the cost of SANs is to use inexpensive Ethernet connectivity. This was recently standardized as iSCSI, which is essentially SCSI running on top of TCP/IP. I recently became aware of Ximeta, a company that makes external drives that apparently implement iSCSI, at a price that is very close to that of raw disks (since iSCSI does not have to manage state for clients the way a more featured NAS does, Ximeta can shun expensive CPUs and RAM, and use a dedicated ASIC instead).

The Ximeta hardware is not a complete solution by itself: the driver software manages the metadata for the cluster of networked drives, such as the information that allows multiple drives to be concatenated to add capacity while keeping the illusion of a single virtual disk. The driver is also responsible for RAID, although Windows, Mac OS X and Linux all have volume managers capable of that. There are apparently some Windows-only provisions to allow multiple computers to share a drive, but I doubt they constitute a full-blown clustered filesystem. There are very few real-world cases in the target market where anything more than a cold standby is required, and it makes a lot more sense to designate one machine to share a drive for the others on the network.
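
To make the “single virtual disk” illusion concrete, here is a toy sketch of how a driver might map a logical block onto a concatenation of drives of different sizes (purely illustrative, not Ximeta’s actual implementation; the drive names and sizes are made up):

    # Hypothetical member drives and their sizes, in blocks.
    drives = [("disk0", 1000), ("disk1", 2000), ("disk2", 500)]

    def locate(logical_block):
        """Map a block of the virtual volume to (physical drive, local block)."""
        for name, size in drives:
            if logical_block < size:
                return name, logical_block
            logical_block -= size
        raise ValueError("block beyond end of virtual volume")

    print(locate(0))      # ('disk0', 0)
    print(locate(1500))   # ('disk1', 500)
    print(locate(3200))   # ('disk2', 200)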

I think this technology is very interesting and has the potential to finally make SANs affordable for small businesses, as well as for individuals (imagine extending the capacity of a TiVo by simply adding networked drives in a stack). Disk-to-disk backups are replacing sluggish and relatively low-capacity tape drives, and these devices are interesting for that purpose as well.

Keyspan USB Server review

I saw the Keyspan USB Server at MacWorld SF a few months ago, but it has only recently started to ship (I received mine yesterday). This device allows you to connect a Mac or PC to up to 4 USB 1.1 peripherals remotely over Ethernet, much as a print server allows you to access remote printers. It also allows sharing of USB devices between multiple computers.

I use it to reduce clutter in my apartment by moving bulky items like my HP 7660 printer and my Epson 3170 scanner away from the iMac in my living room, which has progressively become my main computer, even though it is probably the slowest machine I have.

You install the driver software (Windows 2000/XP or Mac OS X; no drivers for Linux so far), and it creates a simulated USB hub device that takes care of bridging USB requests over Ethernet. A management program lets you configure the settings on the USB Server, such as its IP address (Zeroconf, a.k.a. Rendezvous, is supported, a nice touch), password and access mode. The user interface is functional, if not perfectly polished. To use a USB peripheral hooked to the USB Server, you fire up the admin client, select one of the USB devices and take a “lease” on it. I have links to some screen shots of the GUI below.

The process is as smooth as it can possibly be, given that USB devices are not designed to be shared between multiple hosts, and thus some form of locking had to be provided. I tried my scanner over the Ethernet, and have not noticed any perceptible degradation in performance. The software copes with sleep mode correctly. The only nit I would have to pick is that the power adapter “wall wart” DC connector slips off the device too easily (not enough friction to hold it in place), disconnecting it.

Many families are becoming multi-computer households. The Keyspan USB Server is a surprisingly effective way to share peripherals or to move bulky and seldom used peripherals out of the way. At a street price of around $100, it is not inexpensive, but I found it a very worthwhile accessory for my home network.