
Monday, November 16, 2009

Great customer service

Those of you following my build on Facebook know that I'm building a fEarful 12/6, which uses the Eminence 3012LF speaker. After assembling, I noticed that the speaker had a bunch of distortion, and found a hairline crease in the cone.

Called Eminence, got an RMA number, and shipped off my speaker with crossed fingers.

Someone from Eminence called today and said that the speaker looks fine, sounds fine, and that the crease I saw was probably a casting mark or something. Uh-oh, I'm thinking.

"But we'll get a new speaker right out to you."

Nice.

Tuesday, September 29, 2009

Swine Flu - Remarkable?

I was listening to NPR this morning, and they had a doctor (epidemiologist? immunologist? something like that) on to talk about Swine Flu (H1N1).

Now, there is a lot of coverage on Swine Flu lately, and I thought some of the things that the doctor said were very interesting.

  1. That Swine Flu and the Seasonal Flu have the same symptoms.
  2. That Swine Flu and the Seasonal Flu are treated using the same methods (fever reducer and fluids).

Now, you might forgive me for thinking at this point: what's the difference? If swine flu is no more dangerous, and is treated the same way, why the hullabaloo?

It turns out that the core difference is that the flu season has started early this year.

Interestingly, we had a relatively dry, cool summer. The areas that are being hardest hit (the south) had the driest summers. The flu spreads best in cold, dry environments. If you have been told that the flu has nothing to do with the weather, you've been misinformed.

Now, I'm not trying to imply causality here, but there's been so much misinformation, e.g. "57% of those tested showed positive for swine flu", that I thought I'd offer a possible explanation.

By the way, that statistic above? You probably didn't catch that 57% was the number of people who:

  1. Came to the hospital with flu-like symptoms
  2. Were tested (and they "selectively test"!). Selective testing means that they had severe symptoms and another compounding medical condition, like asthma.

You'll forgive me, then, for maybe thinking that the press is playing fast and loose with the numbers. And I love that the CDC publishes a map that shows "how geographically dispersed" the flu is. NO INDICATION OF SEVERITY. So a couple of cases, spread around == pandemic. Nuts.

Attribution: photo from Patrick Long

Saturday, September 12, 2009

Remove me from your mailing list.

If you are sending out newsletters, here's some advice:

Make it very easy to unsubscribe.

Here's why:

  1. It's just common decency.
  2. Users will not appreciate your lock-in tactics, and it will tarnish your brand.
  3. Since it's hard to unsubscribe, they'll just mark you as spam.

If you require me to log in to unsubscribe, I'm marking you as spam.
If there are more than one or two steps to unsubscribe, I'm marking you as spam.

Ideally, put a link at the bottom of the email that says "unsubscribe". When clicked, it optionally asks, "are you sure?", and then unsubscribes the user. No fuss, no muss.

Image from Sarah G

Saturday, September 5, 2009

Android is actually "Getting There"

I finally have all the applications I need for the G1 to be truly usable, and to at least give it parity with the BlackBerry or iPhone.

The main things:

Push (Exchange) email, in the form of a $25 download called Touchdown. This app is a bit wonky from a usability standpoint, but has consistently gotten better as the developer has released improved versions.

While I feel that this should be DEFAULT FUNCTIONALITY (are you listening Google?!), I can say it's worth the price, and I haven't really had any major problems with it once it was configured (configuration is a bit of a pain).

PPTP VPN: Thanks to Cyanogen for backporting this to CyanogenMod (if you're not running this, you're missing out on a better experience). This is slated to be included in the Donut release, but CyanogenMod has it now, and it works (although I haven't been able to get it working on WiFi).

Those were the two biggest things that were just *missing* from the Android stack, and I can actually recommend the G1 for business users. The PPTP thing makes me really happy, especially since the PPP daemon has probably supported it since the launch, but there was no way to properly drive it.

Photo by Max.

Tuesday, September 1, 2009


My son has been spending all his "computer time" lately on a site called PlayCrafter: basically a Flash gaming site, but one that also allows you to create your own games. The result is below:

I'm obviously biased, but he made the "hottest games" list. I'm trying to use this as an encouragement to delve deeper into the programming aspects of game development (wish me luck!).

Friday, July 10, 2009

Hazards of working on the 13th floor

I'm not superstitious, I promise.

I work on the 13th floor, and there's something hinky with the elevator. Car #3.

At least half the time I walk near the elevator shaft, Car #3 is either open, or opening.

Walk out of the bathroom? Ding! and the doors are opening. Like it hears you coming. (I'm not superstitious, but not opposed to anthropomorphization...)

Here's the best part. When you get on the elevator, the 7th floor lights up. On its own. As the doors close.

So, Unlucky 13, Lucky 7, and then the lobby.

Monday, July 6, 2009

Goodbye CompuServe

Back in the heady days of the 90s, I worked at CompuServe, and I was there for the AOL purchase.

Since CompuServe is now officially dead, I thought I'd write up some trivia that I remember. Some of this stuff should go in a textbook on how to get complacent and kill yourself as a business.

While I was there, CompuServe was owned by H&R Block. That might seem like an odd match; it was. The overly conservative management team there was not ready to deal with the changes required to adapt to the Internet era. Instead, they came out with Wow!. We used to laugh and say that the "tc-chk" noise that it made when you clicked on a "pog" was really coming from a "User Click Interaction Specialist II" (yes, there really were titles like that) sitting at a microphone in Columbus, Ohio.

CompuServe had some long-term investment in DEC 10 technology. So much so, that when DEC (Digital Equipment Corporation) decided to discontinue them, CompuServe started making their own-- they had a hardware team that managed to shoehorn a DEC 10 into a pair of VME boards with a mess of VLSI chips. These were used for at least half of everything that made up CompuServe at the time: all the Terminal Access Nodes (TANs), the "chatrooms" (which had exactly 40 channels-- it was designed to mimic a CB radio), etc.

The TANs were what sat in various closets/server rooms around the country, and had modems hanging off them; they handled all the local dialup for which CompuServe charged so handily. They also handled another function: Credit Cards.

One of the years that I was there, I remember some senior manager coming out from Columbus, and sharing some of the following facts (and, I'll admit that things have gotten a little hazy in the intervening dozen years):

  • 1 billion credit card transactions were handled on the CompuServe network.
  • The revenue per employee was something like $180,000
  • CompuServe had been logging a 20% CAGR for something like 20 years. Note to the wise: I've seen lots of companies tank after making a statement similar to this.

Some additional bits that I remember:

  • The entire (okay, maybe not entire) network ran on X.25. This made for amusing things like dialing into a TAN and the TAN giving you a PPP (Point-to-Point Protocol) connection, which really worked by giving you a direct connection to a server in Columbus that actually hosted your PPP connection. The connection from the TAN to Columbus was over X.25. So you had an X.25 connection that PPP rode on top of.
  • CompuServe really did have a good network, and I suspect, without looking, that CompuServe Network Services is still around in one form or another. The only really weird bit was the X.25 stuff.

Where did CompuServe go wrong? I think that the biggest hint should be in the DEC 10 hardware. They started to go wrong when they made the decision to stay with the old hardware, even when it meant that they had to do their own hardware engineering. Forget that the Internet was going into exponential growth: they ignored Moore's Law.

I'm assuming that they did this because they had a comfort level with rocking the 36-bit processor. In fact, when the question was asked, CompuServe management responded with some vague platitudes about the benefits of a 36-bit platform. I don't suppose it was ever really evaluated properly whether making hardware was part of the core business.

The conversion pain should have happened in the mid 80s, when there was a chance. Instead, systems calcified around a dead architecture. When someone finally saw the handwriting on the wall, the possibility of change was gone.

CompuServe, I believe, was a victim of their own success; there was a prevalent attitude of imperviousness (20% CAGR for 20 years!), fueled by the fact that they were printing cash. This was then compounded by the fact that they were owned by a financial services firm, where the dominant paradigm seems to be to leave something that is making money alone (current financial crisis, anyone?).

Even though it was clear that this had to happen eventually, I'm still sad to see CompuServe go. I worked with many great, smart people there, and learned a tremendous amount about internet scale engineering.

Sunday, June 21, 2009

Internet scale databases

After going to the Amazon Web Services event at Mariner Stadium, I've been doing quite a bit of ruminating on building highly scalable web sites.

I suppose that part of this is my roots in webmastering (back when that was a singular title!).

So, I've decided to do some analysis of various technologies that contribute to building highly scalable web sites. Today's analysis is on databases (and whether you should even use one!).

The default: use a relational database.

There is a whole raft of issues with this, not the least of which is write performance. Eventually, a single database cannot be scaled (no matter how much money you throw at it) to support the write volumes required by high volume web applications. There are a few solutions:

Database sharding:

This can take many forms, but there are several basic versions:

Table partitioning: moving tables to separate databases
Range partitioning: e.g. putting user number 1-100,000 on server a, 100,001-200,000 on server b, etc.
Directory partitioning: put a directory in front of the database, so you have to ask two questions (where is this user info stored, and ask for the data from that server) to retrieve data.

More in-depth analysis here.

Of these, I think that the last one is quite interesting, especially if architected in from the beginning. It would be hard to retrofit, but the biggest advantage is that repartitioning is quite feasible.
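To make the difference between range and directory partitioning concrete, here is a minimal sketch in Python. The shard names (`server_a`, `server_b`, `server_c`) and the `ShardDirectory` class are made up for illustration, and the "directory" is just an in-memory dict standing in for whatever lookup service you would really put in front of the databases.

```python
import bisect

# Hypothetical shard names; the ranges mirror the example above.
RANGE_UPPER_BOUNDS = [100_000, 200_000]  # inclusive upper bound of each shard
RANGE_SHARDS = ["server_a", "server_b"]


def range_shard(user_id):
    """Range partitioning: the shard is computed from the id itself."""
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if user_id < 1 or index >= len(RANGE_SHARDS):
        raise KeyError(f"no shard configured for user {user_id}")
    return RANGE_SHARDS[index]


class ShardDirectory:
    """Directory partitioning: ask the directory first, then the shard."""

    def __init__(self):
        self._location = {}  # user id -> shard name

    def assign(self, user_id, shard):
        self._location[user_id] = shard

    def lookup(self, user_id):
        return self._location[user_id]

    def move(self, user_id, new_shard):
        # Repartitioning is a directory update (after copying the rows).
        self._location[user_id] = new_shard
```

With the range scheme, moving user 42 means changing the ranges for everyone. With the directory, `directory.move(42, "server_c")` changes one entry, which is why repartitioning is feasible.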


Memcached:

Memcached has become de rigueur in building high scalability applications like web sites. Its ability to speed up websites has been well documented, and I think it is a fantastic solution not only for improving read performance on databases, but also for caching the results of expensive operations.

I do have one nit, however, and that is what happens when a memcached server fails: there is no redundancy (which is acceptable, and can be planned for), and the decision about where data should go is dependent on the number of machine instances that are running. It would be better for this to be able to scale up and down easily.
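The usual answer to that nit is consistent hashing. The sketch below is a toy illustration, not how any particular memcached client does it: `modulo_server` is the naive "hash the key, mod the server count" placement, and `HashRing` is a simple ring with an assumed 100 points per server. The server names and `moved_fraction` helper are hypothetical.

```python
import bisect
import hashlib


def stable_hash(text):
    """Hash that is stable across processes (unlike the built-in hash())."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def modulo_server(key, servers):
    """Naive placement: depends directly on the number of servers."""
    return servers[stable_hash(key) % len(servers)]


class HashRing:
    """Consistent hashing: each server owns many points on a ring."""

    def __init__(self, servers, points_per_server=100):
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [point for point, _ in self._ring]

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(before, after, keys):
    """Fraction of keys whose server changes between two placements."""
    return sum(before(key) != after(key) for key in keys) / len(keys)
```

Going from three servers to four, modulo placement reshuffles most keys (so most of the cache goes cold at once), while the ring only moves the keys that land on the new server.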

Google's Bigtable, Hypertable and Amazon SimpleDB:

The only one of these that is available as open source appears to be Hypertable. The idea here is that you forgo the whole relational model for a simple data model (think giant spreadsheet), with much more flexibility in what you can stuff into a column, e.g. columns can store multiple attributes. Underneath this simple exterior is a highly distributed datastore (designed to run on many machines) that is designed to handle huge throughput.

And now for the crazy idea:

If, e.g., you only needed to store customer records, and each of those records were identified by a natural key (email address), then you might not need a database at all (or, at least for this purpose).

What if you combined Google's Protocol Buffers with something like Danga Interactive's MogileFS? Every object is stored as a protocol buffer for fast serialization/deserialization, and MogileFS takes care of storing the serialized file. It handles replication, managing redundancy, etc. This could also work on Amazon's Simple Storage Service (S3).

I realize that this does not handle things like iteration, but what if that isn't in the use case? Alternately, what if the architecture is something like Memcached -> Protobuf + MogileFS -> Sharded DB? You could have the best of all worlds: fast access through Memcached, if that fails, retrieve the record from MogileFS (and push into Memcached), and if you need iteration, or an alternate lookup, you still have the relational DB in a scalable format.
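Here is a rough sketch of the first two tiers of that read path, keyed by email address. Everything in it is a stand-in: plain dicts play the part of Memcached and MogileFS (or S3), JSON plays the part of Protocol Buffers, and the `TieredCustomerStore` class name is made up. The sharded database tier is left out.

```python
import json


class TieredCustomerStore:
    """Read path: cache first, then the blob store, repopulating the cache."""

    def __init__(self, cache, blob_store):
        self.cache = cache            # stand-in for Memcached
        self.blob_store = blob_store  # stand-in for MogileFS or S3

    def put(self, email, record):
        payload = json.dumps(record).encode("utf-8")  # stand-in for protobuf
        self.blob_store[email] = payload  # durable copy
        self.cache[email] = payload       # fast copy

    def get(self, email):
        payload = self.cache.get(email)
        if payload is None:
            payload = self.blob_store.get(email)
            if payload is None:
                return None  # unknown customer
            self.cache[email] = payload  # push back into the cache
        return json.loads(payload.decode("utf-8"))
```

If the cache loses a record (or the whole cache server dies), the next `get` quietly falls through to the blob store and refills the cache.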

Monday, May 11, 2009

Why do I have physical infrastructure?

I've been off on one of my ADD-fueled research binges lately-- investigating Cloud Computing. I looked at several offerings:

  1. Nimbus
  2. Amazon EC2
  3. Google AppEngine

And here are the conclusions that I came to:

Nimbus:
A nice layer on top of Xen to manage VM instances on a set of machines. This is really targeted at scientific computing, and does have a bit of a learning curve. If you have a pile of hardware that is being underutilized, some technical know-how and elbow grease (a.k.a. 'Round Tuits'), this is a nice looking solution to run a mess of VMs on. Especially if there are many images, but they don't necessarily run all the time (e.g. testing machines, support/demo environments). Turning VMs on and off is fairly straightforward. It offers an emulation of the Amazon EC2 WSDL interfaces, which is nice.

Amazon EC2:
This is an online cloud computing platform. Upload your VM(s), and use the Web Services interface to activate machines. Pay per hour, or per year. I'd have a hard time justifying capital expenditure (on server hardware) in a startup with this service available. The user can activate as many VMs as they need to service demand. The amount of CPU and disk available is quite elastic (hence the name, Elastic Compute Cloud).

Google AppEngine:
This is an interesting service-- it basically offers a stripped Python (and now Java) environment. Applications need to be specifically ported to the environment, but it does utilize (require) a number of best practices so that your application should be somewhat "forced" to be scalable. Probably a good option for "green field" web 2.0 applications, but if you have code already, might not be the best choice. Integration seems like it might be a little problematic (e.g. no MySQL/Postgres/SQL Server or C Library support).

Of these, Amazon EC2 is the one that fascinates me. We regularly build VMs that mimic customer environments, and these might only get used for a couple dozen hours per year. EC2 would be a great solution.

We also have regular requirements to do performance testing under specific circumstances. We can allocate 12 hours on a fairly beefy machine at very low cost, get our testing done, and have results for under $20 or so. That is amazing.

Tuesday, April 14, 2009

Zawinski's Law of Software Envelopment

Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.
Jamie Zawinski, Jargon file entry

My friend Robert and I were talking the other day about Twitter, and he related that he's been seeing people using it like they would use email. This is no surprise; every new "social technology" that comes along ends up being treated like email. Instant messaging did it, etc.

So I'm coining Young's Corollary to Zawinski's Law:

Every social networking software or web application eventually implements an email analogue, or its users will use some function of the network as email.

Of course, you could just use email, but that wouldn't be "Web 2.0".

Wednesday, April 8, 2009

Specialization is for Insects

A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.

-Robert A. Heinlein

As a hiring manager, I've always looked for people that tend to generalize. I'm not just looking for someone that has narrowed down on one skill set, or even focused on development work to the exclusion of other pursuits.

The people that I find most effective have other interests; they tend to be "renaissance men". Maybe they paint, or cook, or work on cars. They might play volleyball, or soccer, or softball.

Almost universally, they play an instrument.

I've seen the other side of this, after working with mainframes for several years: aging COBOL programmers that are just waiting to be put out to pasture. That isn't a dig against COBOL or COBOL programmers-- but there is a certain crowd that never updated their skills, never did anything else, and now are trapped in an evolutionary dead-end. I suspect that there are many Visual Basic programmers sitting in IT departments right now that are gearing up for this fate (and no, I'm not talking about VB.Net).

Can you talk to customers? You're more valuable than someone who can't.

Can you give a demo? Think on your feet?

Do you know how to wear appropriate attire to meet with important people? I've met many programmers over the years who never learned how to dress "business casual", and tend to look really uncomfortable when forced to. People that can effectively "dress for success" are worth more.

If I'm hiring a Java programmer, I usually want to see that you've used a scripting language, and that you have some modicum of experience with database development and optimization. I want to know that you've done network engineering, and can explain at a high level what load balancing looks like. I want you to be able to explain sockets (this is one of those things, like pointers, that you either get, or don't get, and those who "get it" are better suited for the job).

Tuesday, April 7, 2009

Stem Cell Research to Heal Broken Bones

Another breakthrough in stem cell research, the adult variety. No surprise about that last bit.

The treatment uses stem cells from the bone marrow, and then they use magnets to guide them to the location of the bone break. They are expecting to be in clinical trials within 5 years.

Amazing times that we live in. Wake me up when I can inject some additional grey matter.

Monday, April 6, 2009

Why I think the copyright system is broken, an anecdote

The EFF has an excellent article about President Obama's gift to Queen Elizabeth: an iPod filled with music and video.

Article here.

If the gift were physical copies of the items (e.g. if he wanted to give her DVDs that would not play in her DVD player, as he did with Gordon Brown), then there is no question that the first sale doctrine applies. However, in this post-Napster era, copyright owners seem to have gone completely insane with attempting to impose control over digital distribution. The man on the street can tell you that this is folly, but they persist nonetheless.

Just for reference, I think that content producers should get paid for their art. That does not generate any love for the RIAA, the record companies, or the other racketeers that make up the current copyright thugs.

Image above licensed under creative commons.

Tuesday, March 31, 2009

Why can't we just have data types?

I'm not the biggest fan of object oriented programming. There, I said it.

Also, the world is round, and circles the sun. Regardless of obsoleted dogma.

I just wanted to cook up a hash, containing hashes as values (with a couple levels of nesting). So, what do I end up with, since I'm currently stuck with one of those "I know, we'll make everything an object!" languages?



That really should have been simple, and it's a great argument for DATA TYPES. Not Objects, not Aspects, just data. Really.

Perl's been accused of being cryptic, but here's the equivalent:

push(@{ $dicts{Dictionary}->{Elements} }, $pattern);

Of course, you don't actually have to do it that way, since Perl is actually quite flexible. It has a very limited range of actual data types, but then has many operators that can manipulate the data (in wondrous and fantastic ways!).
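For comparison, here is roughly the same hash-of-hashes-of-arrays shape in Python, which also has plain built-in data types. This is only a sketch: the value of `pattern` is made up (the original `$pattern` isn't shown), and the `defaultdict` variant only approximates Perl's autovivification for a fixed nesting depth.

```python
from collections import defaultdict

pattern = "[a-z]+"  # hypothetical value; the original $pattern is not shown

# Plain dicts: create each missing level on demand, then append.
dicts = {}
dicts.setdefault("Dictionary", {}).setdefault("Elements", []).append(pattern)

# defaultdict: missing levels spring into existence, two levels deep.
auto = defaultdict(lambda: defaultdict(list))
auto["Dictionary"]["Elements"].append(pattern)
```

Either way, it's just data: a dict holding a dict holding a list.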

Saturday, March 28, 2009

Computer Science/Engineering, an analogy

Computer science is to programming as pigmentation science is to painting.

I have asked the same question of people that I have worked with for the last 10-15 years (a long time). Here it is:

Is programming an art or a science?

Before I give you the answer, think about this: when you look at a beautiful building, there is a lot of science involved. But the thing that made it beautiful was not the science, it was the artistry that built on that science.

In case it's not immediately obvious, the near unanimous answer was: it is an art. Anecdotally, the better the person was at getting things done the more likely they were to answer "art", and quickly.

The sooner we get over the idea of programming as an "engineering discipline" or "science", the better.

Next time you hear someone going on about how we need "formal proofs" or "stronger engineering" for applications, ask them: "What have you actually built lately?"

(Image from user LukeGordon1 on Flickr, and is Creative Commons licensed)

Wednesday, March 25, 2009

Covington Friday Night Car Shows Getting Ready to Start

It's almost time for the Friday night Covington Car shows to start!

The first Friday in May (May 1, 2009) through the last Friday in September, weather permitting, from 4PM - 8PM.

I have additional information (directions), and pictures, here.

The pictures should be convincing enough, but there are a lot of really nice cars there, the folks are friendly, and it's very family friendly. Take your kids.

Tuesday, March 24, 2009

The Hundredth Idiot (or, why not to read business books)

One hundred idiots make idiotic plans and carry them out. All but one justly fail. The hundredth idiot, whose plan succeeded through pure luck, is immediately convinced he's a genius.
--Iain M Banks, MATTER

Iain Banks makes one of the most astute observations I've ever read.

This is so applicable to:

  1. The business model du jour: How To Use Social Networks To Facilitate Enterprise Application Development!
  2. The development model du jour: Standing On Your Head To Increase Bloodflow and Facilitate New Development Methodologies!

Now, before you hie off on the "next big thing", remember that most of these "big things" are just hundredth idiots. Or, you can read a book like The Mythical Man-Month by Fred Brooks and ask yourself: how is it that a book written 35 years ago, about a project completed 45 years ago, is still 95% relevant?

Could it be that there's nothing new under the sun?

Monday, March 23, 2009

Frame-work (an oxymoron)

One of the most difficult things to learn about project management/development is the difference between a framework that offers adaptability and one that "does the work for you". People who have not worked with the latter might tend to desire it (after all, who does not want less work?), but my experience has been that these types of systems always end up extracting their "pound of flesh". There Ain't No Such Thing As A Free Lunch.

The first time I used Struts, I was enamored of it for about a day. Then I realized what a pain it was to work with. It has been a while since I've used it, but my memory is primarily one of, "What magic incantation do I need to stuff in some random configuration file to actually make this page transition work?"

Another example, Ruby on Rails: I always thought this name apropos. You need to jump off and do some 4-wheeling? No dice, dude! You're on Rails!

How about this description of Cocoon:

Apache Cocoon is a web development framework built around the concepts of separation of concerns and component-based web development. Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming. (emphasis mine)

Hint: anyone who suggests that you can program without programming is suffering from a logical fallacy. The Cocoon folks avoid this by offering that you can "hook together components into pipelines" without programming. How pipelines become applications, that I'm curious about.

I don't mean to pick on specific projects, but I'm just pointing out examples. There are certain advantages to having components, but the end goal should be adaptability and maintainability. Not the mess of an MBean talking to an EJB Session Bean that was looked up over JNDI that talks to a JMX Component that I have seen one too many times. ("See, we've got this framework, but it won't talk to this other framework, so now we're writing a Facade class (ooh! Design Patterns! It must be good!) to expose things to the other framework...")

For whatever reason, this appears to be largely a Java phenomenon. There are more Java frameworks than there are applications built on them!

...well over half of the time you spend working on a project (on the order of 70 percent) is spent thinking, and no tool, no matter how advanced, can think for you. Consequently, even if a tool did everything except the thinking for you -- if it wrote 100 percent of the code, wrote 100 percent of the documentation, did 100 percent of the testing, burned the CD-ROMs, put them in boxes, and mailed them to your customers -- the best you could hope for would be a 30 percent improvement in productivity. In order to do better than that, you have to change the way you think.

Fred Brooks [paraphrased] as quoted from Allen Holub's

The main thing that a framework needs to do is stay out of your way. It should not be an impediment to future progress. You should be able to say, "I want to change the sorting algorithm for this list," and be able to accomplish it quickly without the framework getting in the way.

I'm not suggesting that you should go through "Not Invented Here" syndrome, and re-implement wheels, airbags, windshields and turn signals, but that you should understand the balance between adaptability and the enforced consistency of a framework. After all:

A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.

Ralph Waldo Emerson

Thursday, March 12, 2009

Now you have two problems

This is a classic quote from Jamie Zawinski:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

This has made me laugh many, many times over the years, as I've been trying to not weep over some dreadful regular expression.

Tuesday, March 10, 2009

Dear Mom...

My brother sent me this picture, and while I don't normally post "family" articles, this one will have to be the exception.

That's my mom, racing Purple Jeep. I'm guessing circa 1976-1978. I think she threw out the trophies, but she won races.

Thursday, February 26, 2009

Google blocking paid apps for Developer Phones

Apparently, if you paid the $425 to purchase an Android Developer Phone (ADP1), you are now a second class citizen. Google decided that you can't have paid applications.

Article here.

It's pretty surprising that they'd do this, since we're basically talking about alienating the most hardcore fans and the developers that make your platform worthwhile. There were a lot of areas where Google either just didn't think things through or rushed to market before things were ready (the Email app!).

Here is the rub, as far as I'm concerned:

Google depends on file permissions to control applications.
The developer phone has the ability to copy paid applications.
*All* rooted phones have the ability to copy paid applications (since using file permissions isn't very secure).

So, Google has decided to punish all users that have the ADP1 version of the operating system (all phones can currently be switched between all the different versions of the OS: US/UK/Europe/ADP). Note that this isn't the same set of users that have the ADP1 hardware; at least one person I know has an ADP1, but has a rooted version of RC33 (the US version).

If you took the time to install a rooted version of one of the consumer versions, then you are unaffected by this. And, since clearly all the people who spent $425 on the ADP1 are merely agents of software piracy, they will all be doing this so they can pirate all the paid applications in the Android Market.

Pirates! Arr.

Wednesday, February 25, 2009

Duplicate message ids in the sendmail log

Here's a quick little perl script that can help you find duplicate message ids in your sendmail log. Script and explanation after the break.


while (<>) {
    next unless /msgid=<(.*?)>?,/;    # skip lines without a message id
    print "$1\n" if ++$ids{$1} == 2;  # print an id the second time it is seen
}

Where to use this script? Let's say, hypothetically, that you have email coming in that is causing the mail server to tempfail a message (over and over). Or you have a server that is sending a mail message over and over. This script can help you find those messages quickly.

Stem Cell Research (what hill do you want to die on?)

I think that stem cell research is an important avenue of medical research.

Embryonic stem cell research (as opposed to adult stem cell) has ethical issues related to it. Read on, and I'll tell you why embryonic stem cell research might not be such a great idea, and maybe some insight into the organizational management issues related to it.

This post was triggered by an article on the Examiner.

Yet another breakthrough for stem cell research. Make sure you understand, though: that's adult stem cell research. All of the innovation in stem cell research (including actual medical treatments that are available today) is in adult stem cell research.

It makes no sense to me to spend money on embryonic stem cell research when over ten years of research, and tens or (more likely) hundreds of millions of dollars, have been poured into it with the net result: zero innovation, zero breakthroughs, zero treatments based on it. What a waste of money and time.

Here's the thing though: once it became a political issue, there was a certain camp that demanded that it be pursued, simply because "the other guys" said that they didn't like it. This isn't a discussion of the politics of the issue, but the idea that people will do things that make no sense just because someone else tells them not to. Those of you with children understand what I'm saying.

Now, I've been a victim of this thinking. I even saw it coming, and it still happened.

I worked for someone who could always get me to take action just by saying, "well, if you think you can't do it...". I knew his strategy, but was still susceptible.

In life (and work) we have to be able to take an objective step back, and ask ourselves: is this the hill I want to die on? Is this actually productive, or am I just pursuing it for emotional reasons?

Tuesday, February 24, 2009

Thinking about reputation services

One of the most popular methods of blocking spam these days is to use a device which implements a reputation service. Unfortunately, there is a significant issue with these types of services. With a little research, a spammer could effectively bypass this protection or pollute the reputations of so-called "good" hosts.

A short description of how reputation services work, and then a breakdown of the failure spot after the break.

Reputation services work on a simple principle: they keep a list of hosts and their respective reputations, and they perform actions based on the reputation of the connecting host. This system "trains" like a Bayesian filter does--hosts can improve or decline in status over time. Typically this is augmented with a service which updates this host list.

The two important points to keep in mind here are that a) they track IP addresses, and b) they use training.
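To make those two points concrete, here is a toy model of a reputation table in Python. It is not how any real product works: the `ReputationTable` class, the scoring rule (a simple moving average toward 0 for spam and 1 for good mail), and every threshold are assumptions picked for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReputationTable:
    """Per-IP scores between 0 and 1; higher means more trusted."""

    learning_rate: float = 0.2   # assumed training speed
    default_score: float = 0.5   # assumed score for unknown hosts
    reject_below: float = 0.2    # assumed action thresholds
    throttle_below: float = 0.4
    scores: dict = field(default_factory=dict)

    def train(self, ip, was_spam):
        """Move the host's score toward 0 for spam or 1 for good mail."""
        target = 0.0 if was_spam else 1.0
        current = self.scores.get(ip, self.default_score)
        updated = current + self.learning_rate * (target - current)
        self.scores[ip] = updated
        return updated

    def action(self, ip):
        """Decide what to do with a connection from this address."""
        score = self.scores.get(ip, self.default_score)
        if score < self.reject_below:
            return "reject"
        if score < self.throttle_below:
            return "throttle"
        return "accept"
```

Note that `train` trusts whatever address it is handed. Anything that lets spam be attributed to someone else's address drags that host's score down just as effectively as real spam would.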

A brief sidebar to talk about the Internet Protocol (IP). IP packets can become fragmented for various reasons, including intentionally. The IP specification has a method to re-assemble these fragmented packets:

To assemble the fragments of an internet datagram, an internet
protocol module (for example at a destination host) combines
internet datagrams that all have the same value for the four fields:
identification, source, destination, and protocol. The combination
is done by placing the data portion of each fragment in the relative
position indicated by the fragment offset in that fragment's
internet header. The first fragment will have the fragment offset
zero, and the last fragment will have the more-fragments flag reset
to zero.

The important section there is "placing the data portion of each fragment in the relative position indicated by the fragment offset". In other words, each fragment gets to determine where it starts. By specially crafting packets, it is possible to forge the source and destination IP addresses, but this is only evident if the packets are re-assembled. If you are running at high volume, packet re-assembly is expensive.
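A small sketch of "each fragment gets to determine where it starts", in Python. This is a simplified model, not a real IP stack: the `reassemble` function and its tuple format are made up, it does not check for holes, and it assumes that a later fragment overwrites any bytes it overlaps (real implementations differ on how they resolve overlaps).

```python
def reassemble(fragments):
    """Rebuild a payload from (offset_units, more_fragments, data) tuples.

    Offsets are in 8-byte units, as in the IP header. Fragments are applied
    in arrival order, so a later fragment overwrites bytes it overlaps.
    """
    buffer = bytearray()
    total_length = None
    for offset_units, more_fragments, data in fragments:
        start = offset_units * 8
        end = start + len(data)
        if end > len(buffer):
            buffer.extend(b"\x00" * (end - len(buffer)))  # grow to fit
        buffer[start:end] = data  # the fragment chooses its own position
        if not more_fragments:
            total_length = end  # the last fragment fixes the total length
    if total_length is None:
        raise ValueError("final fragment (more-fragments = 0) not seen")
    return bytes(buffer[:total_length])
```

Under that overlap assumption, a device that only inspects the first fragment it sees at a given offset can end up looking at different bytes than the destination host finally assembles.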

Here's the potential problem: if you have a reputation service (which, by definition, needs to handle high volume), it could easily fall prey to clever spammers using IP fragmentation to bypass or pollute the reputation data. If they were truly evil, they would deliberately ruin the reputation of valid, non-spamming organizations.

Worse still, most of these reputation devices communicate back to a central reputation store, so it would be possible to create a denial of service against certain organizations: the ruined reputation now propagates out to other subscribers of the service.

Defense in depth is the only strategy that truly works long term.