March 2010 Archives

manpurse.jpg

I’ve written my fair share of data access code. Recently I’ve come up with a nice solution for the case when only some of the requested keys are in the database.

My initial reaction to the “some found” scenario was to implement one of two options:

  1. Raise an Exception
  2. Return only the found items

The first option is attractive because asking for an ID carries with it the assumption that the item exists. If it doesn’t, it’s an error.

The second option becomes more desirable if you don’t want to crash the whole operation just because a tiny fraction of items is hosed. Imagine manpurses.com, where a page normally shows 100 purses, but one of them is missing from the database. Should we kill all hope of showing the 99 good records?

Recently I came up with what I think is a Pretty Good solution to the whole scenario. Indulge me as I walk through my thinking.

A Data Adapter for Man Purses

class BagException(Exception):
    """Raised when one or more requested bags can't be loaded."""


class UnforgivingManPurseDataAdapter(object):

    def __init__(self, db):
        self.db = db

    def get_bags(self, bag_ids):
        """Get bags for a list of ids.

        Raise an exception if any bags aren't found.

        """
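        # Join the ids into a comma-separated list (assumes integer ids;
        # int() rejects anything else before it reaches the SQL).
        id_list = ", ".join(str(int(bag_id)) for bag_id in bag_ids)
        bags = self.db.query("select * from BAGS where id in (" + id_list + ")")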
        if len(bags) < len(bag_ids):
            raise BagException("problem getting bags for ids " + str(bag_ids))
        return bags

The UnforgivingManPurseDataAdapter implements the first option from above: it raises an error if any of the bags aren’t found in the database. Deeming that too harsh for a production situation, I came up with a solution like this:
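To see the unforgiving behavior end to end, here is a minimal runnable sketch. It assumes an in-memory SQLite database stands in for the db object. The BAGS columns, the sample rows, the BagException definition, and the use of "?" placeholders are all assumptions made for illustration, not part of the adapter above.

```python
import sqlite3


class BagException(Exception):
    """Hypothetical error type; the post does not show its definition."""


def get_bags(conn, bag_ids):
    """Return rows for bag_ids, raising if any id has no row."""
    # One "?" placeholder per id keeps the ids out of the SQL text.
    placeholders = ", ".join("?" for _ in bag_ids)
    sql = "select * from BAGS where id in (" + placeholders + ")"
    bags = conn.execute(sql, list(bag_ids)).fetchall()
    if len(bags) < len(bag_ids):
        raise BagException("problem getting bags for ids " + str(bag_ids))
    return bags


# Assumed schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("create table BAGS (id integer primary key, name text)")
conn.executemany("insert into BAGS values (?, ?)",
                 [(1, "satchel"), (2, "messenger")])

found = get_bags(conn, [1, 2])      # both ids exist, so two rows come back

try:
    get_bags(conn, [1, 2, 3])       # id 3 has no row
    failed = False
except BagException:
    failed = True                   # the two good rows are lost to the caller
```

The second call is the manpurses.com problem in miniature: two perfectly good rows were fetched, but the caller never sees them.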

Squishy Man Purse Adapter

class SquishyManPurseDataAdapter(object):

    def __init__(self, db):
        self.db = db

    def get_bags(self, bag_ids, is_missing_ok=True):
        """Get bags for a list of ids

        Raise an exception if is_missing_ok is False and not
        all bags are found in the database.

        """
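        # Join the ids into a comma-separated list (assumes integer ids;
        # int() rejects anything else before it reaches the SQL).
        id_list = ", ".join(str(int(bag_id)) for bag_id in bag_ids)
        bags = self.db.query("select * from BAGS where id in (" + id_list + ")")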
        if len(bags) < len(bag_ids) and not is_missing_ok:
            raise BagException("problem getting bags for ids " + str(bag_ids))
        return bags

The SquishyManPurseDataAdapter lets the caller decide if missing bags should be an error. This is fine, but the method signature is kind of polluted. Plus, in the default case, the operation fails silently. That’s not awesome.
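The silent failure is easy to miss from the caller’s side. The sketch below assumes a hypothetical in-memory FakeDb with a find method in place of the SQL query; only the is_missing_ok check mirrors the adapter above.

```python
class BagException(Exception):
    """Hypothetical error type, not defined in the original post."""


class FakeDb(object):
    """Hypothetical stand-in for the real database: a dict keyed by bag id."""

    def __init__(self, rows):
        self.rows = rows

    def find(self, bag_ids):
        # Return only the rows that exist, like a SQL "id in (...)" query.
        return [self.rows[i] for i in bag_ids if i in self.rows]


def get_bags(db, bag_ids, is_missing_ok=True):
    bags = db.find(bag_ids)
    if len(bags) < len(bag_ids) and not is_missing_ok:
        raise BagException("problem getting bags for ids " + str(bag_ids))
    return bags


db = FakeDb({1: "satchel", 2: "messenger"})

# Default call: id 3 is missing, yet nothing tells the caller.
quiet = get_bags(db, [1, 2, 3])

# Strict call: the same request now raises.
try:
    get_bags(db, [1, 2, 3], is_missing_ok=False)
    strict_raised = False
except BagException:
    strict_raised = True
```

In the default call, three ids go in and two bags come out, with no hint about which one went missing.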

Another implementation of option two is to return a tuple of found and missing products:

Tuple Man Purse Data Adapter

class TupleManPurseDataAdapter(object):

    def __init__(self, db):
        self.db = db

    def get_bags(self, bag_ids):
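        """Get bags for a list of ids.

        Return a tuple of (bags, missing_ids). missing_ids is empty
        when every bag is found in the database.

        """
        # Join the ids into a comma-separated list (assumes integer ids;
        # int() rejects anything else before it reaches the SQL).
        id_list = ", ".join(str(int(bag_id)) for bag_id in bag_ids)
        bags = self.db.query("select * from BAGS where id in (" + id_list + ")")
        missing_ids = []
        if len(bags) < len(bag_ids):
            # _get_missing_ids (not shown) diffs bag_ids against the ids in bags.
            missing_ids = self._get_missing_ids(bag_ids, bags)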
        return (bags, missing_ids)

This is bad because now the caller needs to check if there are any missing ids. Plus the method’s name implies it gets bags, but it really gets bags AND missing bag ids. Grossout.
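Here is roughly what that burden looks like at a call site. This is a sketch only: a plain dict named rows stands in for the BAGS table, and get_bags is a simplified, hypothetical version of the adapter method.

```python
def get_bags(rows, bag_ids):
    """Return (bags, missing_ids); rows is a dict standing in for the BAGS table."""
    bags = [rows[i] for i in bag_ids if i in rows]
    missing_ids = [i for i in bag_ids if i not in rows]
    return (bags, missing_ids)


rows = {1: "satchel", 2: "messenger"}

# Every call site has to unpack both values and remember to check the second.
bags, missing_ids = get_bags(rows, [1, 2, 3])
if missing_ids:
    warning = "missing bags: " + str(missing_ids)
else:
    warning = None

# A caller that forgets the check silently drops the missing ids.
careless_bags = get_bags(rows, [1, 2, 3])[0]
```

Nothing forces the second kind of caller to notice the problem, so this ends up no louder than the squishy version.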

My Solution: A Smart and Friendly Data Error Class


class PursesNotFoundError(Exception):
    """Some purses are missing.

    Check the missing_ids attribute for the missing purse ids.
    The found_purses attribute holds the purses that were found.

    """
    missing_ids = None
    found_purses = None

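    def __init__(self, missing_ids=None, found_purses=None):
        self.missing_ids = missing_ids or []
        self.found_purses = found_purses or []
        # Give the base class a message so str(error) and tracebacks are readable.
        Exception.__init__(self, "purses not found for ids " + str(self.missing_ids))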


class HumaneManPurseDataAdapter(object):

    def __init__(self, db):
        self.db = db

    def get_bags(self, bag_ids):
        """Get bags for a list of ids.

        Raise an exception if any bags aren't found.

        """
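        # Join the ids into a comma-separated list (assumes integer ids;
        # int() rejects anything else before it reaches the SQL).
        id_list = ", ".join(str(int(bag_id)) for bag_id in bag_ids)
        bags = self.db.query("select * from BAGS where id in (" + id_list + ")")
        if len(bags) < len(bag_ids):
            # _get_missing_ids (not shown) diffs bag_ids against the ids in bags.
            missing_ids = self._get_missing_ids(bag_ids, bags)
            raise PursesNotFoundError(missing_ids, bags)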
        return bags

I really like this solution because it:

  1. Makes no bones about the fact that something went wrong. An exception will always be raised on missing purses.
  2. Allows the caller to recover from the error with minimal effort (if it chooses to do so).
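A caller that wants to recover might look something like the sketch below. It is written for Python 3 and makes a few assumptions for the sake of a runnable example: a dict named rows stands in for the database, get_bags is a simplified version of the adapter method, and the logged list is a stand-in for real logging.

```python
class PursesNotFoundError(Exception):
    """Carries the missing ids and the purses that were found."""

    def __init__(self, missing_ids=None, found_purses=None):
        self.missing_ids = missing_ids or []
        self.found_purses = found_purses or []
        super().__init__("purses not found for ids " + str(self.missing_ids))


def get_bags(rows, bag_ids):
    """rows is a hypothetical dict of id -> purse standing in for the database."""
    bags = [rows[i] for i in bag_ids if i in rows]
    if len(bags) < len(bag_ids):
        missing_ids = [i for i in bag_ids if i not in rows]
        raise PursesNotFoundError(missing_ids, bags)
    return bags


logged = []  # stand-in for a real logger


def purses_for_page(rows, bag_ids):
    """A forgiving caller: show what was found, make a note of what wasn't."""
    try:
        return get_bags(rows, bag_ids)
    except PursesNotFoundError as e:
        # Recovery needs nothing beyond what the exception already carries.
        logged.append(e.missing_ids)
        return e.found_purses


rows = {1: "satchel", 2: "messenger"}
shown = purses_for_page(rows, [1, 2, 3])
```

A stricter caller simply skips the try/except and lets the error travel up the stack, so the same get_bags serves both kinds of caller without a flag in its signature.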

In retrospect, I really need to modify my thinking around Exceptions. They’re not just vessels for an error string that rocket their way up through the stack when Things Go Wrong. They can and should be packed with data and functionality that make it easier for the caller to recover from the error.

In fact, Python 2.6 deprecates the “message” attribute of Exception (and Python 3 removes it), which further reinforces the idea that an exception should probably be a fairly rich object.

Ted Dziuba has put into words perfectly what I feel every time someone utters “Because relational databases suck”.

by replacing MySQL or Postgres with a different, new data store, you have traded a well-enumerated list of limitations and warts for a newer, poorly understood list of limitations and warts, and that is a huge business risk.

The points he makes kind of apply to any new hotness. People often discount the business value of understood systems. Better the devil you know and all that.

It’s seductive to us nerd-types to use the latest and greatest stuff. Before you go chucking your existing architecture, take a teeny moment to consider if you’re doing just fine with what you’ve got.

I Can’t Wait for NoSQL to Die - [Ted Dziuba]

I’ve long held that the most important part of the iconic scrum-agile-xp user story format (“as a…I want…so…”) is the “So” section. You know, the part that gives the reason for the story?

As this post points out, the “so” is often given over to tautology or ignored entirely.

A few of us got talking while doing an inception early in 2009 that the ‘so that’ statement was often either repeating the requirement, was too general or was left off completely when capturing stories. This meant that we were missing the true goal of the business and lead to problems with scope and misunderstandings of what a story truly meant once we got into delivery.

The proposed solution is simple, elegant, and awesome: put the “so” before the “what”.

Check out the examples given in the post. With this new format you can’t write a story without thinking about and documenting its motivation. It’s impossible. It’s great.

so that… so what? - jkBlog

USE ME! My Flip Video Camera


This week my awesome wife bought me a Flip Ultra HD video camera (in black). It’s fairly well-known that this little camera was built on very pragmatic principles, achieving its success due to a palatable price and simple feature set.

No Dumb CD

This pragmatism really hit me when I found that the installer for FlipShare, the bundled video editing and management software, was stored in the camera itself. No CD needed. Kind of a no-brainer when you think about it.

It Waits to Find Updates Until You’re Done!

The other subtle but clever thing about the software is that FlipShare looks for updates when you close the program, not when you first open it. This is total genius.

People start programs to do something with them, not to see if there are any updates. Getting a “Hey there’s a new version!” alert box shoved in your face creates a minor but tangible block to your work.

Waiting until the user is done lets them immediately do what they need to, then gives a little “if it’s not too much trouble…” nudge when they turn it off. Very nice.

I’m wondering if this is the future of rich clients: Get emergency updates upon startup, everything else on shutdown.
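As a rough sketch of that rule, and not a description of how FlipShare actually works, the policy fits in a few lines. The function name, the phase strings, and the shape of the update records are all made up for illustration.

```python
def updates_to_offer(available, phase):
    """Return the updates worth surfacing during the given phase.

    available is a list of dicts with "name" and "emergency" keys
    (a made-up shape for this sketch); phase is "startup" or "shutdown".
    """
    if phase == "startup":
        # Only interrupt the user's work for fixes that can't wait.
        return [u for u in available if u["emergency"]]
    if phase == "shutdown":
        # Everything else is offered once the user is done.
        return [u for u in available if not u["emergency"]]
    raise ValueError("unknown phase: " + repr(phase))


available = [
    {"name": "security patch", "emergency": True},
    {"name": "new themes", "emergency": False},
]
at_startup = [u["name"] for u in updates_to_offer(available, "startup")]
at_shutdown = [u["name"] for u in updates_to_offer(available, "shutdown")]
```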

The Flip has tons of other ways it quietly screams “You can use me right away!”, and they all combine into a cutely pleasant experience.

Now to make enterprise applications so much fun…

Ha! Code Entropy Explained


This cartoon is probably the best depiction I’ve seen of code entropy.

entropy_explained.jpg

Simply Explained: Entropy - [Geek & Poke]

Helmuth_Karl_Bernhard_von_Moltke.jpg

Field Marshal Helmuth Karl Bernhard von Moltke:

No plan of operations extends with certainty beyond the first encounter with the enemy’s main strength

Which means that you should construct a thorough and high-quality plan, but that plan must be immediately and constantly adjusted.

Software projects can be really painful if the project plan is never altered. People don’t like doing it. I suspect it’s because making project plans is difficult and time-consuming. Updating and changing one means admitting the initial plan was somehow wrong.

Or maybe managers are terrified by the thought of constantly changing how we’re working.

Or maybe we feel like it’s a waste of time. We’ve got a plan, let’s follow the damn thing and finish up.

Or maybe it’s panic. The more time we spend planning, the less time we have to finish the work.

Updating the project plan isn’t wasteful, nor is it an admission of failure. It’s reality. With something as hard to predict as software, we should expect that our initial plan won’t line up with what’s happening.

Don’t get me wrong. It’s critical to have a plan. Just remember that it should change. Constantly.

The Last Test You'll Ever Write


What Makes You Think It Will Work?

  • In production?
  • When the database goes down?
  • If we get ten times our normal traffic?
  • If an exception is thrown by the confabulator module?
  • If someone doesn’t enter any input at all?
  • If we need to revert our code?
  • In Opera/Safari/Konqueror?

You can add your own predicates to this question, but you get the idea. I find that this is a great principle to follow when building software. This concept glues together all the other kinds of testing we do, but it’s easy to lose sight of it.
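Some of these questions translate directly into automated checks. The sketch below turns two of them, the missing input and the exception thrown by a dependency, into tests using Python’s unittest module. The greet function and its lookup_title parameter are hypothetical, invented only to give the tests something to exercise.

```python
import unittest


def greet(name, lookup_title=None):
    """Hypothetical function under test: builds a greeting for a user."""
    if not name or not name.strip():
        return "Hello, stranger"
    title = ""
    if lookup_title is not None:
        try:
            title = lookup_title(name) + " "
        except Exception:
            # A failing dependency should degrade the greeting, not break it.
            title = ""
    return "Hello, " + title + name.strip()


class WhatMakesYouThinkItWillWork(unittest.TestCase):

    def test_no_input_at_all(self):
        self.assertEqual(greet(""), "Hello, stranger")
        self.assertEqual(greet(None), "Hello, stranger")

    def test_dependency_throws(self):
        def broken_lookup(name):
            raise RuntimeError("confabulator is down")
        self.assertEqual(greet("Lenny", broken_lookup), "Hello, Lenny")


# Run the suite programmatically so the example works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WhatMakesYouThinkItWillWork)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Questions like “in production?” or “if we need to revert our code?” don’t reduce to a function call like this, which is why the other kinds of testing still matter.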

Passing unit tests aren’t good evidence that your system will work. They’re good evidence that your code isn’t fundamentally broken, but they say nothing about what the end-user might experience.

You still need to verify that your code will integrate properly with other systems, deploy well, and work in its live environment.

Better than just unit tests is a combination of automated and human test flavors: unit tests, integration tests, automated browser tests, human-driven testing, and any number of others.

All this testing creates a level of confidence that things will be OK. It’s never really a guarantee, and each team’s mix will vary.

It might be ok for your shop of one to have no automated tests for the one page you manage. You just need to be prepared to click all the links on it every time you make a change.

Your 100-person team building an enterprise portal probably needs a little more automation to achieve the same level of comfort, and so will likely rig some integration tests and browser testing.

Or maybe you just decide that you can and should staff fifty full-time testers to continuously click things.

Whatever the mix, the purpose remains the same: help build confidence that the final product will work.

Comment Gadget Fail


disqus.jpg

I’ve switched back from the Google Friend Connect comment Gadget to the original Disqus comment setup.

Why I Wanted Friend Connect

I love the idea of readers “joining” Code Softly, much like they might follow a site on Blogger. So I added Friend Connect’s members gadget to the sidebar (you should totally join).

Once that was in, I found the recommendation widget, which was a nice, generic thumbs-up gizmo, kinda like Digg or Reddit.

So far, so good. You can join the community and then flag posts you like.

Then I thought it would be cool if people could comment using the same identity they created when joining and recommending. That’s when things got rocky.

Where the Comment Gadget Falls Down

I was happy to see the commenting gadget in the Friend Connect gallery. I yanked the code for that sucker, tweaked it a bit, and punched it onto the site.

comments.png

I immediately saw a few problems:

  • No aggregation. There was no easy way to indicate on the homepage how many comments a post had. I got around this by showing a collapsed widget at the bottom of a post, but it still felt kinda awkward.

  • No formatting. Comments get all of their formatting removed. Things like whitespace are important for aesthetics, especially when people might be posting code snippets and stuff.

  • Strange comment counting. A reply to a comment isn’t “counted” as a comment. For example, if Lenny makes a comment, then Carl replies to Lenny, the total comment count is one.

  • No easy view of all the comments on the site.

With My Tail Between My Legs

After a few posts, I realized that feedback and discussion were suffering because of the limitations of the widget. Plus, the folks at Disqus had made some speed improvements.

So I put the older, richer, and yes better comments up. Maybe someday the Friend Connect widget will be improved upon, or someone can refute all my criticisms and I’ll be able to try it again. For now I’m sticking with what works.

gilligan.jpg

What makes an 11-year veteran programmer better than a hotshot n00b right out of school?

I think it’s fair to say that experienced programmers are better than rookies. Sure, there are exceptions, but in my time in the business I’ve rarely seen someone with a couple of years under their belt do better at creating good stuff on time than someone who’s been at it for a decade.

Before everyone goes nuts, let me qualify this a bit. I define “better” as “better able to deliver working software within a mid-size shop”. This narrows things down, but it’s where my experiences lie, so I choose to focus on it.

I have my own thoughts on why I think this is true, but I’m very interested in hearing what others have to say.

So, thoughts?

All too often we slap together solutions in the name of finishing on time. This has been described as incurring technical debt:

Just as a business incurs some debt to take advantage of a market opportunity developers may incur technical debt to hit an important deadline.

I think programmers intuitively feel this is correct. Good ones learn where the cutoff is; bad ones either create piles of technical debt or paralyze their shop in the quest for gold-plated code.

One of the insidious things about technical debt is that eliminating it is more complex than just paying off financial debt plus interest. There’s a psychological component that can sometimes hide the debt entirely.

For example, in a rush, Flavio crams in a complex but crude workaround to a problem. His solution gets the job done, but its complexity hides the fact that he has broken the object model and implemented a really crummy design.

Over time, this kludge is mistaken for “the way we do it” by other developers who come across the design. They reinforce it by building upon it, oblivious to its rotten nature.

When Flavio peeks back in and sees the horrible mess built upon his once-tactical hack, he runs for the hills. The technical debt has ballooned into some kind of sick nerd reverse-mortgage crisis.

Normal debts have creditors who are very good at informing their debtors of exactly how much is owed. Software doesn’t extend us that courtesy.

We’re kind of on the hook for keeping track of our own technical debt, which means we need to be disciplined and pay things off as soon as possible.

normal-distribution.jpg

Your project probably isn’t epic, nor is it trivial. That’s precisely why it’s hard.

John D. Cook summarizes the mathematical “law of medium numbers”. In short: the hard stuff is in the middle.

Atoms are simple, and so are stars, but medium-sized things like birds are complicated. Medium-sized systems are where you see chaos.

With minimal effort, I humbly restate this as The Law Of Medium Software Projects:

For medium-sized software projects, we can expect that large fluctuations, irregularities, and discrepancies in time and effort will occur more or less regularly.

When you start from scratch with a small problem space, like “I want a portfolio site”, the solution is pretty clear-cut: Pick a basic publishing platform, get your images, prose, then organize and launch.

Super-large efforts can also be simple. If you’re building out a titanic data center, the question of “how much power we gonna’ need?” boils down to square-footage and math.

It’s the middle-sized stuff that is tricky: the four-month project, the 300-user web app, the blog that gets enough traffic to create non-trivial hosting costs, the database that’s big enough to need DBAs, but not big enough for bleeding-edge storage stuff.

The normal curve tells us that most projects fall in the middle. The realities of our profession dictate the same.

If a job is so small as to be trivially simple, people rig it themselves or unload it to a third party provider.

If a project is titanic in scope and size, then we staff a lot of people and plan for a lot of time. While the details can be gnarly and intricate, the 10,000-foot view of the project boils down to predicting person-hours and hardware costs.

What the Law Means to You

We need to be careful which methodologies and anecdotes we latch onto. Just because Google leans on Bigtable doesn’t mean it’s time to drop Oracle like a bad habit. Just because 37Signals says “simple is best” doesn’t mean you can ignore 95% of the feature requests from your department.

The trick is to map these successes onto our own situation. Think of it like the breakfast buffet at Denny’s. Sure, they’ve got cantaloupe, but you want eggs and pancakes, so admire the melon. Appreciate what it can lend to breakfast. Maybe even take a teeny piece, but don’t dump what you’ve accumulated just to make room for honeydew.