Friday, July 10, 2009

Hazards of working on the 13th floor

I'm not superstitious, I promise.

I work on the 13th floor, and there's something hinky with the elevator. Car #3.

At least half the time I walk near the elevator shaft, Car #3 is either open, or opening.

Walk out of the bathroom? Ding! and the doors are opening. Like it hears you coming. (I'm not superstitious, but I'm not opposed to anthropomorphization...)

Here's the best part. When you get on the elevator, the 7th floor lights up. On its own. As the doors close.

So, Unlucky 13, Lucky 7, and then the lobby.

Monday, July 6, 2009

Goodbye CompuServe

Back in the heady days of the 90s, I worked at CompuServe, and I was there for the AOL purchase.

Since CompuServe is now officially dead, I thought I'd write up some trivia that I remember. Some of this stuff should go in a textbook on how to get complacent and kill yourself as a business.

While I was there, CompuServe was owned by H&R Block. That might seem like an odd match; it was. The overly conservative management team there was not ready to deal with the changes required to adapt to the Internet era. Instead, they came out with Wow!. We used to laugh and say that the "tc-chk" noise that it made when you clicked on a "pog" was really coming from a "User Click Interaction Specialist II" (yes, there really were titles like that) sitting at a microphone in Columbus, Ohio.

CompuServe had some long-term investment in DEC 10 technology. So much so, that when DEC (Digital Equipment Corporation) decided to discontinue them, CompuServe started making their own-- they had a hardware team that managed to shoehorn a DEC 10 into a pair of VME boards with a mess of VLSI chips. These were used for at least half of everything that made up CompuServe at the time: all the Terminal Access Nodes (TANs), the "chatrooms" (which had exactly 40 channels-- it was designed to mimic a CB radio), etc.

The TANs were what sat in various closets/server rooms around the country, and had modems hanging off them; they handled all the local dialup for which CompuServe charged so handily. They also handled another function: Credit Cards.

One of the years that I was there, I remember some senior manager coming out from Columbus, and sharing some of the following facts (and, I'll admit that things have gotten a little hazy in the intervening dozen years):

  • 1 billion credit card transactions were handled on the CompuServe network.
  • The revenue per employee was something like $180,000
  • CompuServe had been logging a 20% CAGR for something like 20 years. Note to the wise: I've seen lots of companies tank after making a statement similar to this.
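For a sense of scale, here is a quick sketch of what that last figure compounds to. It takes the 20% and the 20 years at face value, and both are the hazy recollections above rather than audited numbers.

```python
# Compound growth: size after n years = start * (1 + rate) ** n.
rate = 0.20   # 20% CAGR, as recalled above
years = 20    # "something like 20 years"

growth_multiple = (1 + rate) ** years
print(f"Growth multiple over {years} years: {growth_multiple:.1f}x")
```

That works out to roughly a 38-fold increase, which goes some way toward explaining the confidence.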

Some additional bits that I remember:

  • The entire (okay, maybe not entire) network ran on X.25. This made for amusing things like dialing into a TAN and having the TAN give you a PPP (Point-to-Point Protocol) connection, which really worked by handing you a direct connection to a server in Columbus that actually hosted your PPP session. The connection from the TAN to Columbus was over X.25, so you had PPP riding on top of an X.25 connection.
  • CompuServe really did have a good network, and I suspect, without looking, that CompuServe Network Services is still around in one form or another. The only really weird bit was the X.25 stuff.

Where did CompuServe go wrong? I think that the biggest hint is in the DEC 10 hardware. They started to go wrong when they made the decision to stay with the old hardware, even when it meant that they had to do their own hardware engineering. Forget that the Internet was going into exponential growth: they ignored Moore's Law.

I'm assuming that they did this because they had a comfort level with rocking the 36-bit processor. In fact, when the question was asked, CompuServe management responded with some vague platitudes about the benefits of a 36-bit platform. I don't suppose it was ever really evaluated properly whether making hardware was part of the core business.

The conversion pain should have happened in the mid 80s, when there was a chance. Instead, systems calcified around a dead architecture. When someone finally saw the handwriting on the wall, the possibility of change was gone.

CompuServe, I believe, was a victim of their own success; there was a prevalent attitude of imperviousness (20% CAGR for 20 years!), fueled by the fact that they were printing cash. This was then compounded by the fact that they were owned by a financial services firm, where the dominant paradigm seems to be to leave something that is making money alone (current financial crisis, anyone?).

Even though it was clear that this had to happen eventually, I'm still sad to see CompuServe go. I worked with many great, smart people there, and learned a tremendous amount about internet scale engineering.

Sunday, June 21, 2009

Internet scale databases

After going to the Amazon Web Services event at Mariner Stadium, I've been doing quite a bit of ruminating on building highly scalable web sites.

I suppose that part of this is my roots in webmastering (back when that was a singular title!).

So, I've decided to do some analysis of various technologies that contribute to building highly scalable web sites. Today's analysis is on databases (and whether you should even use one!).

The default: use a relational database.

There is a whole raft of issues with this, not the least of which is write performance. Eventually, a single database cannot be scaled (no matter how much money you throw at it) to support the write volumes required by high-volume web applications. There are a few solutions:

Database sharding:

This can take many forms, but there are several basic versions:

  • Table partitioning: moving tables to separate databases.
  • Range partitioning: e.g. putting users 1-100,000 on server A, 100,001-200,000 on server B, etc.
  • Directory partitioning: put a directory in front of the database, so you have to ask two questions to retrieve data (where is this user's info stored, and then what is the data on that server).
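Range partitioning is simple enough to sketch in a few lines. This is a minimal illustration, assuming integer user IDs; the range boundaries and server names are hypothetical.

```python
import bisect

# Hypothetical layout: the inclusive upper bound of each user-ID range,
# paired with the server that holds that range.
RANGE_UPPER_BOUNDS = [100_000, 200_000, 300_000]
RANGE_SERVERS = ["server-a", "server-b", "server-c"]


def server_for_user(user_id: int) -> str:
    """Return the server that holds this user under range partitioning."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    # bisect_left finds the first range whose upper bound is >= user_id.
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if index == len(RANGE_SERVERS):
        raise KeyError(f"no shard configured for user {user_id}")
    return RANGE_SERVERS[index]
```

The lookup is cheap and needs no extra service, but the boundaries are baked into the application, which is what makes rebalancing painful later.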

More in-depth analysis here.

Of these, I think that the last one is quite interesting, especially if architected in from the beginning. It would be hard to retrofit, but the biggest advantage is that repartitioning is quite feasible.
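To make the repartitioning point concrete, here is a toy sketch of a directory in front of the shards. Everything here is an assumption for illustration: plain dicts stand in for the database servers, and the class and method names are made up.

```python
class ShardDirectory:
    """Maps each user to a shard; in-memory dicts stand in for real servers."""

    def __init__(self, shard_names):
        self.shards = {name: {} for name in shard_names}
        self.location = {}  # user_id -> shard name (this is the directory)

    def put(self, user_id, record):
        # New users go to the least-loaded shard; existing users stay put.
        shard = self.location.get(user_id)
        if shard is None:
            shard = min(self.shards, key=lambda name: len(self.shards[name]))
            self.location[user_id] = shard
        self.shards[shard][user_id] = record

    def get(self, user_id):
        shard = self.location[user_id]       # question 1: where is it?
        return self.shards[shard][user_id]   # question 2: fetch it

    def move(self, user_id, new_shard):
        """Repartition one user: copy, repoint the directory, drop the old copy."""
        old_shard = self.location[user_id]
        if old_shard == new_shard:
            return
        self.shards[new_shard][user_id] = self.shards[old_shard][user_id]
        self.location[user_id] = new_shard
        del self.shards[old_shard][user_id]
```

Because callers only ever ask the directory, moving a user is a local operation and nothing else in the application has to know about it. The cost is the extra lookup, and the directory itself becomes something that has to be kept fast and available.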

Caching:

Memcached has become de rigueur in building high scalability applications like web sites. Its ability to speed up websites has been well documented, and I think it is a fantastic solution not only for improving read performance on databases, but also for caching the results of expensive operations.

I do have one nit, however, and that is what happens when a memcached server fails: there is no redundancy (which is acceptable, and can be planned for), and the decision about where data should go is dependent on the number of machine instances that are running. It would be better for this to be able to scale up and down easily.
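The sketch below illustrates the nit, and one commonly used way around it. It assumes the client picks a server by hashing the key modulo the server count, and compares that with consistent hashing (a hash ring), which is a technique swapped in here for comparison rather than something memcached does for you. Server names, key names, and the number of ring points are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(text: str) -> int:
    """Hash that is stable across runs (unlike Python's built-in hash())."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def modulo_server(key, servers):
    """Naive placement: depends directly on how many servers exist."""
    return servers[stable_hash(key) % len(servers)]


class HashRing:
    """Consistent hashing: each server owns many points on a ring."""

    def __init__(self, servers, points_per_server=100):
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [h for h, _ in self._ring]

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def fraction_moved(assign_before, assign_after, keys):
    moved = sum(1 for k in keys if assign_before(k) != assign_after(k))
    return moved / len(keys)


keys = [f"user:{n}" for n in range(10_000)]
four = ["cache1", "cache2", "cache3", "cache4"]
five = four + ["cache5"]

modulo_moved = fraction_moved(
    lambda k: modulo_server(k, four), lambda k: modulo_server(k, five), keys
)
ring_before, ring_after = HashRing(four), HashRing(five)
ring_moved = fraction_moved(ring_before.server_for, ring_after.server_for, keys)
print(f"modulo: {modulo_moved:.0%} of keys moved, ring: {ring_moved:.0%} moved")
```

Going from four servers to five, modulo placement should relocate most keys (which means a mostly cold cache), while the ring should relocate only about the new server's share.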

Google's Bigtable, Hypertable and Amazon SimpleDB:

The only one of these that is available as open source appears to be Hypertable. The idea here is that you forgo the whole relational model for a simple data model (think giant spreadsheet), with much more flexibility in what you can stuff into a column, e.g. columns can store multiple attributes. Underneath this simple exterior is a highly distributed datastore (designed to run on many machines) that is designed to handle huge throughput.
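As a rough picture of the "giant spreadsheet" shape, here is a toy in-memory sketch. It illustrates the data model only; it is not the API of Bigtable, Hypertable, or SimpleDB, and the function names and sample data are made up.

```python
from collections import defaultdict

# row key -> column name -> list of values.
# Rows are sparse: a row only has the columns that were actually written.
table = defaultdict(lambda: defaultdict(list))


def put(row_key, column, value):
    """Add a value to a cell; a cell may hold several values."""
    table[row_key][column].append(value)


def get(row_key, column):
    """Return a copy of the cell's values, or [] if the cell is empty."""
    return list(table.get(row_key, {}).get(column, []))


put("alice@example.com", "phone", "555-0100")
put("alice@example.com", "phone", "555-0101")   # second value, same column
put("bob@example.com", "favorite_color", "green")
```

Note that the two rows share no columns at all, and nobody had to declare a schema first.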

And now for the crazy idea:

If, e.g., you only needed to store customer records, and each of those records were identified by a natural key (email address), then you might not need a database at all (or at least not for this purpose).

What if you combined Google's Protocol Buffers with something like Danga Interactive's MogileFS? Every object is stored as a protocol buffer for fast serialization/deserialization, and MogileFS takes care of storing the serialized file. It handles replication, managing redundancy, etc. This could also work on Amazon's Simple Storage Service (S3).

I realize that this does not handle things like iteration, but what if that isn't in the use case? Alternately, what if the architecture is something like Memcached -> Protobuf + MogileFS -> Sharded DB? You could have the best of all worlds: fast access through Memcached, if that fails, retrieve the record from MogileFS (and push into Memcached), and if you need iteration, or an alternate lookup, you still have the relational DB in a scalable format.
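Here is a minimal sketch of that read path, under loud assumptions: plain dicts stand in for Memcached, for MogileFS (or S3), and for the sharded database, and JSON stands in for Protocol Buffers so the example runs without any generated classes. The class and method names are hypothetical.

```python
import json


class TieredStore:
    def __init__(self, cache, blob_store, database):
        self.cache = cache            # stand-in for Memcached
        self.blob_store = blob_store  # stand-in for MogileFS or S3
        self.database = database      # stand-in for the sharded relational DB

    def save(self, email, record):
        # JSON here is a stand-in for protobuf serialization.
        blob = json.dumps(record).encode("utf-8")
        self.blob_store[email] = blob
        self.database[email] = record
        self.cache[email] = blob

    def load(self, email):
        """Key lookup: cache first, then the blob store, refilling the cache."""
        blob = self.cache.get(email)
        if blob is None:
            blob = self.blob_store.get(email)
            if blob is None:
                return None
            self.cache[email] = blob
        return json.loads(blob.decode("utf-8"))

    def find_by(self, field, value):
        """Iteration and alternate lookups go to the relational side."""
        return [r for r in self.database.values() if r.get(field) == value]
```

A lookup by email never touches the database; only scans and alternate lookups do. The sketch ignores the hard part, which is keeping the three tiers consistent when a write fails partway through.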

Monday, May 11, 2009

Why do I have physical infrastructure?

I've been off on one of my ADD-fueled research binges lately-- investigating Cloud Computing. I looked at several offerings:

  1. Nimbus
  2. Amazon EC2
  3. Google AppEngine

And here are the conclusions that I came to:

Nimbus:
A nice layer on top of Xen to manage VM instances on a set of machines. This is really targeted at scientific computing, and does have a bit of a learning curve. If you have a pile of hardware that is being underutilized, some technical know-how and elbow grease (a.k.a. 'Round Tuits'), this is a nice-looking solution to run a mess of VMs on. Especially if there are many images, but they don't necessarily run all the time (e.g. testing machines, support/demo environments). Turning VMs on and off is fairly straightforward. It offers an emulation of the Amazon EC2 WSDL interfaces, which is nice.

Amazon EC2:
This is an online cloud computing platform. Upload your VM(s), and use the Web Services interface to activate machines. Pay per hour, or per year. I'd have a hard time justifying capital expenditure (on server hardware) in a startup with this service available. The user can activate as many VMs as they need to service demand. The amount of CPU and disk available is quite elastic (hence the name, Elastic Compute Cloud).

Google AppEngine:
This is an interesting service-- it basically offers a stripped Python (and now Java) environment. Applications need to be specifically ported to the environment, but it does utilize (require) a number of best practices so that your application should be somewhat "forced" to be scalable. Probably a good option for "green field" web 2.0 applications, but if you have code already, might not be the best choice. Integration seems like it might be a little problematic (e.g. no MySQL/Postgres/SQL Server or C Library support).

Summary
Of these, Amazon EC2 is the one that fascinates me. We regularly build VMs that mimic customer environments, and these might only get used for a couple dozen hours per year. EC2 would be a great solution.

We also have regular requirements to do performance testing under specific circumstances. We can allocate 12 hours on a fairly beefy machine at very low cost, get our testing done, and have results for around $20 or less. That is amazing.
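The arithmetic behind that is trivial, but worth seeing. This sketch uses only the two figures above (12 hours, about $20); the function names are made up, and no actual price list is assumed.

```python
def max_hourly_rate(budget_dollars: float, hours: float) -> float:
    """Highest hourly price that still fits the budget."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return budget_dollars / hours


def run_cost(hourly_rate: float, hours: float, instances: int = 1) -> float:
    """Total cost of running some number of instances for some hours."""
    return hourly_rate * hours * instances


rate_ceiling = max_hourly_rate(20.0, 12)
print(f"A 12-hour run stays under $20 at up to ${rate_ceiling:.2f}/hour")
```

Any machine priced below that ceiling fits the budget, and nothing is left sitting in a rack afterward.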

Tuesday, April 14, 2009

Zawinski's Law of Software Envelopment

Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.
Jamie Zawinski, Jargon file entry

My friend Robert and I were talking the other day about Twitter, and he related that he's been seeing people using it like they would use email. This is no surprise; every new "social technology" that comes along ends up being treated like email. Instant messaging did it, etc.

So I'm coining Young's Corollary to Zawinski's Law:

Every social networking software or web application eventually implements an email analogue, or its users will use some function of the network as email.

Of course, you could just use email, but that wouldn't be "Web 2.0".

Wednesday, April 8, 2009

Specialization is for Insects

A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.

-Robert A. Heinlein

As a hiring manager, I've always looked for people that tend to generalize. I'm not just looking for someone that has narrowed down on one skill set, or even focused on development work to the exclusion of other pursuits.

The people that I find most effective have other interests; they tend to be "renaissance men". Maybe they paint, or cook, or work on cars. They might play volleyball, or soccer, or softball.

Almost universally, they play an instrument.

I've seen the other side of this, after working with mainframes for several years: aging COBOL programmers that are just waiting to be put out to pasture. That isn't a dig against COBOL or COBOL programmers-- but there is a certain crowd that never updated their skills, never did anything else, and now are trapped in an evolutionary dead-end. I suspect that there are many Visual Basic programmers sitting in IT departments right now that are gearing up for this fate (and no, I'm not talking about VB.Net).

Can you talk to customers? You're more valuable than someone who can't.

Can you give a demo? Think on your feet?

Do you know how to wear appropriate attire to meet with important people? I've met many programmers over the years that never learned how to dress "business casual", and tend to look really uncomfortable when forced to. People that can effectively "dress for success" are worth more.

If I'm hiring a Java programmer, I usually want to see that you've used a scripting language, and that you have some experience with database development and optimization. I want to know that you've done network engineering, and can explain at a high level what load balancing looks like. I want you to be able to explain sockets (this is one of those things, like pointers, that you either get, or don't get, and those who "get it" are better suited for the job).

Tuesday, April 7, 2009

Stem Cell Research to Heal Broken Bones

Another breakthrough in stem cell research, the adult variety. No surprise about that last bit.

The treatment uses stem cells from the bone marrow, and then they use magnets to guide them to the location of the bone break. They are expecting to be in clinical trials within 5 years.

Amazing times that we live in. Wake me up when I can inject some additional grey matter.

Monday, April 6, 2009

Why I think the copyright system is broken, an anecdote


The EFF has an excellent article about President Obama's gift to Queen Elizabeth: an iPod filled with music and video.

Article here.

If the gift were physical copies of the items (e.g. if he wanted to give her DVDs that would not play in her DVD player, as he did with Gordon Brown), then there is no question that the first sale doctrine applies. However, in this post-Napster era, copyright owners seem to have gone completely insane with attempting to impose control over digital distribution. The man on the street can tell you that this is folly, but they persist nonetheless.

Just for reference, I think that content producers should get paid for their art. That does not generate any love for the RIAA, the record companies, or the other racketeers that make up the current copyright thugs.

Image above licensed under Creative Commons.

Tuesday, March 31, 2009

Why can't we just have data types?

I'm not the biggest fan of object oriented programming. There, I said it.

Also, the world is round, and circles the sun. Regardless of obsoleted dogma.

I just wanted to cook up a hash containing hashes as values (with a couple levels of nesting). So, what do I end up with, since I'm currently stuck with one of those "I know, we'll make everything an object!" languages?

This:

((ArrayList)((HashMap)dicts.get(pattern.get("Dictionary"))).get("Elements")).add(pattern);

That really should have been simple, and it's a great argument for DATA TYPES. Not Objects, not Aspects, just data. Really.

Perl's been accused of being cryptic, but here's the equivalent:

push(@{ $dicts{ $pattern->{Dictionary} }->{Elements} }, $pattern);

Of course, you don't actually have to do it that way, since Perl is actually quite flexible. It has a very limited range of actual data types, but then has many operators that can manipulate the data (in wondrous and fantastic ways!).
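For comparison, here is a sketch of the same operation in a language with built-in dict and list types, following the Java line above (look up the dictionary named by the pattern, then append the pattern to its "Elements" list). The sample data is hypothetical.

```python
# Hypothetical sample data: the pattern names the dictionary it belongs to.
pattern = {"Dictionary": "colors", "Regex": r"\bred\b"}
dicts = {"colors": {"Elements": []}}

# No casts, no wrapper classes: just nested data.
dicts[pattern["Dictionary"]]["Elements"].append(pattern)

# If the nested levels might not exist yet, setdefault creates them on
# the way down (roughly what Perl's autovivification does for free).
other = {"Dictionary": "shapes", "Regex": r"\bcircle\b"}
dicts.setdefault(other["Dictionary"], {}).setdefault("Elements", []).append(other)
```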

Saturday, March 28, 2009

Computer Science/Engineering, an analogy



Computer science is to programming as pigmentation science is to painting.

I have asked the same question of people that I have worked with for the last 10-15 years (a long time). Here it is:

Is programming an art or a science?

Before I give you the answer, think about this: when you look at a beautiful building, there is a lot of science involved. But the thing that made it beautiful was not the science, it was the artistry that built on that science.

In case it's not immediately obvious, the near unanimous answer was: it is an art. Anecdotally, the better the person was at getting things done the more likely they were to answer "art", and quickly.

The sooner we get over the idea of programming as an "engineering discipline" or "science", the better.

Next time you hear someone going on about how we need "formal proofs" or "stronger engineering" for applications, ask them: "What have you actually built lately?"

(Image from user LukeGordon1 on Flickr, and is Creative Commons licensed)