Debugging software that is running 150 million miles away is something most of us will never have to do, thankfully. But former NASA software engineer Ron Garret shared his experience of diagnosing faulty Lisp software on a Deep Space spacecraft mission in a recent episode of Adam Gordon Bell’s Corecursive podcast.
Garret told a remarkable story about debugging in deep space, along with some memories from the early days of programming. Along the way, he offered a refreshing perspective on what’s changed – and what hasn’t – in the world of programming, and explored the unique challenges of writing code for a spacecraft.
And he remembered his starring role in one truly glorious moment from the history of Lisp.
Garret worked as a research scientist at NASA’s Jet Propulsion Laboratory from 1988 until 2000 – and again from 2001 through 2004. Garret’s specialty: autonomous mobile robots. He helped to pioneer what is today the de facto standard autonomous mobile robot control architecture.
Garret’s team worked on prototypes for the robotic Mars rover Sojourner.
But then there was Lisp – a language based on abstracting problems cleanly into lists and functions. And while C programmers worry about things like dangling pointers, Lisp also has automatic memory management. “It’s just so much faster and easier to get things done when the language you’re using provides you with some of these high level abstractions,” Garret remembered on the podcast. “And in a world where the only language that has that is Lisp, knowing Lisp really is like a superpower.
With Lisp, “every problem becomes a compiler problem”
“It just blew everything else out of the water back in those days.”
Back in the day, Lisp really wasn’t used that much around NASA though.
“There was quite a bit of prejudice against Lisp because it was weird and unfamiliar, and it had this strange garbage collection technology that you just never knew when it would just stop your process dead in its tracks,” Garret recalled.
Garret’s group found it useful for memory-constrained hardware. Lisp could be used to fashion a custom language specifically for the problem at hand, and then compile it for the robot’s hardware. Or, as Bell puts it, “every problem becomes a compiler problem.” Garret’s team painstakingly wrote and tested their code on a robot simulator (on a Macintosh computer) before installing it in the actual rover and performing a time-consuming test drive out into the Arroyo.
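To make the “compiler problem” idea concrete, here is a minimal sketch in Python rather than Lisp. Everything in it is hypothetical: the tuple-based plan language, the `compile_plan` function and the `DRIVE`/`TURN` opcodes are invented for illustration and are not taken from the JPL code base. It only shows the general shape of the approach: describe the task in a small custom language, then translate that description into flat instructions a constrained target could execute.

```python
# Hypothetical mini-language: plans are nested tuples such as
# ("seq", ("drive", 2), ("turn", 90)). None of this is JPL's actual language.

def compile_plan(expr):
    """Flatten a nested plan into a list of (opcode, argument) instructions."""
    op, *args = expr
    if op == "seq":
        # Run sub-plans in order: concatenate their compiled instructions.
        return [ins for sub in args for ins in compile_plan(sub)]
    if op == "repeat":
        # ("repeat", n, body): unroll the loop at compile time.
        count, body = args
        return compile_plan(body) * count
    if op in ("drive", "turn"):
        # Primitive commands take exactly one numeric argument.
        (value,) = args
        return [(op.upper(), value)]
    raise ValueError(f"unknown form: {op!r}")


plan = ("seq", ("drive", 2), ("repeat", 2, ("seq", ("turn", 90), ("drive", 1))))
program = compile_plan(plan)
for instruction in program:
    print(instruction)
```

Because the loop is unrolled during compilation, the target only ever sees a straight-line list of primitive commands, which is the kind of trade that can suit memory-constrained hardware.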
Despite the code base the group developed, when the Sojourner rover reached Mars, it was powered with C code.
Yet in 1998 a new NASA director launched NASA’s New Millennium project – a pilot program for demonstrating different (and cheaper) technologies, through a number of deep space exploratory missions.
This meant their Lisp code got a second life, Garret remembered on the podcast. The autonomy technology that the team had started developing for rovers was repurposed. Its new mission? Flight controller.
Garret’s team worked on innovative decision-making software – using a custom language written in Lisp specifically designed to avoid the possibility of a dreaded “race condition” (where two concurrently running threads touch the same memory without coordination, so the result depends on timing). “It was tested for days and days and days” – on the exact same hardware that was going into space. “So we were very confident that it was going to work.
“And it did not work…”
Deep Space Failure
Garret explains that during their three days of flight-controlling, “There was a time at which it was supposed to do something and that time came and went and it did not do the thing it was supposed to do. And alarm bells rang…
“Now this code that’s been proven deadlock-free seems to be frozen 150 million miles from home.”
It was a tense situation. “We had no idea what was going on… And everything that we did when we decided to do something, we would do it and then we’d sit around and wait an hour for the result.” After a team in a conference room reached their consensus, their commands “went through a review process that consisted of a number of layers of management, all of whom had to sign off on it.”
After approvals were obtained, the commands went out through a dedicated hardwired network to one of the Deep Space Network’s 70-meter-wide antennas, which sent the commands flying through space at the speed of light.
First they requested a backtrace – a common debugging operation that, in this case, listed all the currently active processes (and, as Garret described it, “what they’re waiting for”).
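For readers who have never asked a running program for this kind of report, the sketch below shows a rough Python analogue. It is not the spacecraft’s Lisp tooling; the `dump_thread_stacks` and `waiter` names and the `stuck-waiter` thread are made up for the example, and `sys._current_frames()` is a CPython-specific helper. The idea is simply to list every live thread along with the call stack showing where it is currently blocked.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return {thread name: formatted call stack} for every live thread."""
    frames = sys._current_frames()  # CPython-specific: thread id -> current frame
    report = {}
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is not None:
            report[thread.name] = "".join(traceback.format_stack(frame))
    return report


def waiter(started, never_set):
    started.set()      # tell the main thread we are running
    never_set.wait()   # block on an event that nobody fires


started = threading.Event()
never_set = threading.Event()
worker = threading.Thread(
    target=waiter, args=(started, never_set), name="stuck-waiter", daemon=True
)
worker.start()
started.wait()         # make sure the worker is inside waiter()

report = dump_thread_stacks()
print(report["stuck-waiter"])  # the stack passes through waiter()

never_set.set()        # release the worker so the example exits cleanly
worker.join()
```

In a report like this, a thread parked in a wait call that should have returned long ago stands out quickly.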
“It was actually almost immediately obvious what was going wrong because there was this one process that was waiting for something that should have already happened…
“The problem was that there was, in fact, a race condition. Which was supposed to have been impossible.” Unfortunately, one of Garret’s coders had called a lower-level Lisp function – which had inadvertently created “an end-run around the safety guarantees” of their carefully-customized language. (Garret blames himself for not explaining this more clearly to the coder.)
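The podcast does not reproduce the offending code, so the following Python sketch is only an illustration of one common form this kind of bug takes: a “lost wake-up,” where a signal fires before anyone is listening for it. The `UnlatchedSignal` class is hypothetical and stands in for a low-level primitive with no memory of past events; `threading.Event` stands in for a safer, latched abstraction. To keep the outcome deterministic, the example forces the bad ordering by hand instead of relying on thread timing.

```python
import threading


class UnlatchedSignal:
    """Hypothetical signal that wakes only the waiters already blocked on it."""

    def __init__(self):
        self._cond = threading.Condition()

    def fire(self):
        with self._cond:
            self._cond.notify_all()  # if nobody is waiting yet, the wake-up is lost

    def wait(self, timeout):
        with self._cond:
            return self._cond.wait(timeout)  # False if the timeout expires


# Bad ordering: the event fires *before* the waiter starts waiting.
unlatched = UnlatchedSignal()
unlatched.fire()
woke_unlatched = unlatched.wait(timeout=0.2)

# A latched event remembers that it already fired, so a late waiter returns at once.
latched = threading.Event()
latched.set()
woke_latched = latched.wait(timeout=0.2)

print("unlatched waiter woke:", woke_unlatched)
print("latched waiter woke:  ", woke_latched)
```

In this toy version, the unlatched waiter would sit forever without the timeout, waiting for something that has already happened. Firing the signal a second time would release it.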
The team decided to “manually” trigger the event – which got the software running again.
“We did not lose the spacecraft and we did accomplish all of the mission objectives – so technically it was a success,” Garret said on the podcast. “But the development process was so painful and fraught with difficulty – and again, there were politics. So despite the fact that we actually did manage to get it to work, the autonomy project was canceled after that and it never flew again.”
A 2002 essay on Garret’s personal web site argues that “The demise of Lisp at JPL is a tragedy. The language is particularly well suited for the kind of software development that is often done here: one-of-a-kind, highly dynamic applications that must be developed on extremely tight budgets and schedules.”
But Lisp was passed over for C++, and then Java, with the rationale given as an attempt to follow “best practices.” Garret’s response? “We’re confusing best practice with standard practice. The two are not the same.” And even beyond that, what is ultimately best isn’t an unvarying standard, but should depend on the particulars of the project at hand.
But in a discussion on Hacker News, one commenter identified themself as a NASA engineer who’d been the payload software engineer for a 2009 mission exploring the moon’s south pole – and said they’d used Lisp to write their own custom language for instrument command sequences (and for simulating the computer). “Lisp’s simple, flexible syntax and macros made it easy to express patterns of commanding and timing for this.”
So they left Garret with a reassuring thought: “I think Lisp is still used in various nooks and crannies of NASA.”
Feature image by NASA / JPL, Public Domain.