Steve Squyres: Roving Mars

You could be excused for thinking that Roving Mars: Spirit, Opportunity, and the Exploration of the Red Planet is a science book. It’s got a Martian landscape on the front cover, and the author was the “Principal Investigator” of the projects it chronicles. If you’re not careful, you might even learn a little bit about geology.

Mostly, though, Roving Mars is a book about project management. Squyres often speaks, somewhat disconcertingly, about “doing science” as if science were merely a product of having assets correctly positioned, in the same way that a movie’s revenue is the product of having copies of the film in theatres. He admits that, from his perspective, one of the critical goals of the Spirit and Opportunity missions was to justify more Mars missions, in the same way a successful product generates more demand in the marketplace.

Much of the ground Squyres covers will be familiar to anyone who’s managed a difficult project (perhaps especially a software development effort). He covers initial brainstorming; marketing and proposal development; forming strategic alliances with competitors; the struggle for budgetary, schedule, and manpower resources; risk mitigation strategies; motivational techniques; benefits and drawbacks of delegation and outsourcing; troubleshooting and quality assurance; and approaches to consensus-building and fostering effective decision-making. It’s a fast and engaging read. Several chapters are written in the form of Squyres’ journal entries, which gives the book a “you are there” sort of immediacy. For a book about project management, it’s often surprisingly suspenseful and moving, and Squyres’ “boldly go where no one has gone before”-style enthusiasm is palpable.

Throughout he makes a solid case for his own talents as a manager (despite his penchant for tantrums). And throughout he reinforces my growing sense that there is something fundamentally and systemically wrong with the current best-practice management of complex engineering development efforts.

The Mars rover project is repeatedly stymied by mistakes that simply shouldn’t be made: instruments designed to work sideways but not upright, confusion between English and metric units, pieces that are fabricated to the wrong size. It’s perhaps especially disheartening to compare these errors to the highly publicized mistakes NASA has made in recent history, from grinding the Hubble’s mirror to the wrong spec to the materials science failures that cost the lives of space shuttle astronauts.

Also disturbing, though eerily familiar to me, was the degree to which the developers of the Mars rover software were unable to predict its behavior. I was shocked by how frequently the rover team was faced with problems I’ve encountered in notoriously buggy commercial software. Computer that crashes as soon as it boots up? Been there, fixed that. Corrupted flash memory? Ate my second cellphone alive.

I’m convinced that the issue isn’t stupidity or incompetence on the part of the team, not just because these folks have high-falutin’ degrees in their fields, but also because every smart team I’ve had a chance to observe or directly work with — including some folks who made me feel positively dim — has made similarly obvious mistakes on sufficiently complex projects. On the biggest projects I’ve been associated with, it was sometimes painfully obvious that no single person understood the whole requirements document. I once saw a data entity diagram that covered a large conference room wall from floor to ceiling. I saw team members literally start sobbing when it became evident that fundamental assumptions underlying that diagram — which represented over a year of work and several million dollars — had never been valid.

I’ve begun to think of it as a big picture/little picture problem. When teams are stovepiped, each group can do its “little-picture” work and check and resolve its internal errors. On small, well-characterized projects, group leaders can grasp the “big picture” at a level of detail that permits identification and resolution of problems that cross group lines. But on projects that are bigger and more uncertain, it becomes impossible for anyone to grasp the gestalt of the project at a sufficient level of detail. Things start to slip through the cracks.

Since Malcolm Gladwell’s books — particularly The Tipping Point — have had more influence on my thinking than any others in a decade or so, I’m inclined to wonder if large engineering projects are being constrained by the fundamental limits of human cognition. I’m even tempted to wonder if Gladwell’s “magic number” 150 might crop up somewhere in a calculation of maximum manageable size.

I don’t think the problem is insoluble, but I think it calls for new techniques for asserting correctness. There are mathematical methods for “proving” the correctness of software. They’re seldom applied in the real world, partly because they’re cumbersome and expensive, but also, I think, because they rely on not changing requirements during development. I argue that since no one ever fully understands the requirements for complex projects, it’s almost inevitable that the requirements will change when one or more deficiencies are identified midstream. My anecdotal experience suggests strongly that many serious engineering errors arise from failure to understand the consequences of a requirements change during the development cycle.

The engineering development process of the future should attack this problem from three angles:

  • The requirements definition phase must systemically address the inability of humans to fully characterize the behavior of extremely complex systems.
  • Throughout the development cycle it must embody consistency checks that prevent errors of the English/metric variety.
  • Throughout the development cycle it must explicitly maintain the constraints on its own behavior, so that flaws resulting from requirements changes are immediately evident.
    (Software often has implicit constraints, e.g., it only works if only one document is open. Currently, information about these constraints may only exist in the mind of a single developer.)
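
As a rough illustration of the second and third points, here is a minimal Python sketch. It is my own toy example, not anything from the book or from the rover software, and the names Length, Constraint, CONSTRAINTS, and violated are all hypothetical. The idea is that a quantity carries its unit with it, so mixing feet and metres fails loudly, and that an implicit assumption (such as "only one document is open") is written down as a named check instead of living in one developer's head.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Conversion factors to metres. A tiny illustrative table, not a full unit system.
_TO_METRES: Dict[str, float] = {"m": 1.0, "ft": 0.3048, "in": 0.0254}


@dataclass(frozen=True)
class Length:
    """A length that always carries its unit (hypothetical helper type)."""
    value: float
    unit: str

    def __post_init__(self) -> None:
        if self.unit not in _TO_METRES:
            raise ValueError(f"unknown length unit: {self.unit!r}")

    def to(self, unit: str) -> "Length":
        """Convert explicitly; this is the only way a unit ever changes."""
        if unit not in _TO_METRES:
            raise ValueError(f"unknown length unit: {unit!r}")
        metres = self.value * _TO_METRES[self.unit]
        return Length(metres / _TO_METRES[unit], unit)

    def __add__(self, other: object) -> "Length":
        if not isinstance(other, Length):
            return NotImplemented
        if other.unit != self.unit:
            # Refuse to guess: the caller must convert on purpose.
            raise TypeError(
                f"cannot add {other.unit} to {self.unit} without an explicit conversion"
            )
        return Length(self.value + other.value, self.unit)


@dataclass(frozen=True)
class Constraint:
    """An assumption the system depends on, stated as a named, checkable rule."""
    name: str
    check: Callable[[dict], bool]


# The "only one document is open" example from the text, made explicit.
CONSTRAINTS: List[Constraint] = [
    Constraint("at most one document open", lambda s: s["open_documents"] <= 1),
]


def violated(constraints: List[Constraint], state: dict) -> List[str]:
    """Return the names of all constraints that the given state breaks."""
    return [c.name for c in constraints if not c.check(state)]
```

With this in place, Length(1.0, "m") + Length(1.0, "ft") raises a TypeError, while Length(1.0, "m") + Length(1.0, "ft").to("m") works. If a requirements change later allows two open documents, violated(CONSTRAINTS, {"open_documents": 2}) names the broken assumption right away. A real project would presumably want something far more rigorous; this only shows the shape of the idea.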

Two other takeaways from Roving Mars:

  • Good golly, rocket scientists drink more than I would have guessed.
  • Wow, a lot of Mars probes have just flat out disappeared. Some enterprising sci-fi writer ought to be able to get at least a short story out of the conceit that the Martians shoot down any probe that gets too close to their cities, and play games keeping just out of camera range of the ones they allow to land.

Needs More Demons? No, Squyres’ project is plenty bedevilled.

Published by therealsummervillain

likes: equality, making things easier to use, biking, jangle, distortion, monogamy
dislikes: bigotry, policies that jeopardize people, lack of transparency

2 thoughts on “Steve Squyres: Roving Mars”

  1. What you’re saying is exactly true, based on my own experiences from the last couple of years working on the most complex system I’ve ever worked on. When a project is so big that no one person, either on the contractor side or on the customer side, understands the basics of the whole thing, then requirements are going to change drastically midstream somewhere and you just have to hope that your initial design is robust enough to accommodate those changes without totally going back to the drawing board. As more and more legacy systems get retired and integrated together with other systems into whole new ones, this is becoming more and more of a problem. The human brain just doesn’t seem capable of wrapping itself around the things.

  2. I’ve had some great discussions tangentially inspired by this item, but unfortunately I can’t share them here.

    I should acknowledge that the Correctness by Construction article by Martin Croxford and Dr. Roderick Chapman (both of Praxis High Integrity Systems) was of great interest. Their approach to correctness relies on creating accurate specifications.

    I heard from a co-worker of the folks who worked on the lander software, who is currently working on a large project following an agile process. The developer expressed some frustration with that effort.

    Agile development treats requirements and specifications more flexibly than the traditional “waterfall” processes. Agile projects strive for early release of working code and frequent iterative improvements. I’ve actually been a member of an agile team for the past several months; there are a handful of us, and it works very well. The anecdotal evidence I’ve been collecting suggests that agile development doesn’t scale well; it works best with small teams.

    I’ve heard it alleged that Scrum, a fairly stringent breed of agile development, scales better. I have no direct experience with Scrum to date.
