Ron Garret talks about DS1, a project in the 90s where a small team used Lisp to send an autonomous craft into space in a third of the time and at a fraction of the cost of previous comparable projects. (Yes, I couldn't have made up a more HN-friendly story if I tried.)
I seem to remember a Scientific American article from the 1980s where they also had to rewrite code around a dead bit in RAM. They've made some pretty drastic changes since, like using the backup computer to add image compression on the fly (lots more detail here): http://history.nasa.gov/computers/Ch6-2.html
Sure, 2.5 billion dollars is a large sum of money. Imagine how you'd break this news to the public. You'd have practically crushed people's hopes and dreams. "Umm, guys, we just accidentally bricked the rover because of a firmware upgrade gone rogue." I'd end up in tears if I had done this.
Speaking of firmware, I haven't been able to find any information on whether they have a dual-BIOS kind of system where, if the boot fails, the rover can recover from a backup firmware image.
Wait, they're doing this by hand? Production deployments to a computer on a different planet? In the 21st century? If there's one deploy I'd want to automate the hell out of (and test, over a 2 baud network), it's this one.
It's not like these guys never did a remote software update before. So I must be missing something.
It depends on what you mean by "doing it by hand." The update process is "automated" in the sense that it's completely scripted. Every step is known ahead of time, and has been extensively tested on ground-based duplicates of the flight hardware. The only aspect of the process that is done "by hand" is "pushing the big red button" to initiate the next step of the process, and even this is a very stylized and well-rehearsed process. The only reason for having even this step of the process be manual is so that humans can assess the situation between steps and satisfy themselves that nothing has gone wrong with one step before proceeding to the next. That's the real challenge in a situation like this: your communications link is operating on the hairy edge of the limits imposed by the laws of physics, so lots of things can go wrong in production that worked in rehearsal.
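The pattern described, every step scripted in advance but a human gating each transition, can be sketched roughly like this. The step names and checks below are purely illustrative stand-ins, not the actual JPL procedure:

```python
# Sketch of a fully scripted update where a human confirms each step.
# Step names and their success checks are hypothetical placeholders.
STEPS = [
    ("uplink new flight software image", lambda: True),
    ("verify image checksum on the spacecraft", lambda: True),
    ("swap boot pointer to the new image", lambda: True),
    ("reboot and confirm telemetry", lambda: True),
]

def run_update(confirm):
    """Run each pre-planned step; `confirm(name)` is the human go/no-go."""
    for name, check in STEPS:
        if not confirm(name):          # the "big red button" moment
            return f"aborted before: {name}"
        if not check():                # assess telemetry before moving on
            return f"failed at: {name}"
    return "update complete"
```

The point of the design is that automation covers the *steps*, while judgment covers the *transitions*: `run_update(lambda step: True)` runs straight through, but an operator can stop at any boundary.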
Automate it in a test environment that simulates production as closely as possible. Once the automated software updates work flawlessly under various conditions, use that in production.
Then an unexpected condition happens and the automated response breaks something badly, and suddenly your expensive robot is now an expensive piece of junk.
I find it weird that the rover wouldn't have enough storage to hold both the landing software and the surface-mission software. I mean, SSDs nowadays can hold hundreds of gigs and are very small.
Does anybody have more details on that? A complete software update just sounds like a really risky thing, for something that you have no physical access to.
My opinion is that they probably have really strict protocols from legacy missions, and since they just work, they're not going to change them.
Every time there's a report about Curiosity doing something that overlaps with software or hardware that we at HN are familiar with, comments like this appear.
The bottom line is: (a) Curiosity was designed 8 years ago, so the tech on board is old; (b) everything on board has to be radiation hardened; and (c) NASA is conservative about 'new stuff' because they need it to work (e.g. the thrusters on the skycrane were derived from the Viking landers).
It's able to supply a steady 125 W of power for 14 years before dropping to 100 W. The technology has been used reliably on multiple projects. Based on the longevity of the previous rovers that have safely made it to Mars, it's worth considering that, barring mechanical malfunctions and hardware failures (it has two main computers), the rover could turn off power to non-critical components after 14 years and continue to run on 100 W. It might not be far-fetched to think that this rover could still be running if/when humans arrive in the 2030s.
That's far cooler than what you would get running on the latest and greatest for convenience.
I wonder whether it will get to the point where they just send processors with a large amount of redundancy (multicore/multiprocessor) and, rather than trusting any one result, vote probabilistically across operations run in parallel.
It must be unbearably frustrating building this stuff to work around the limitations of 10+ year-old technologies when the superior tech inside even the phone in your pocket could achieve so much more.
Almost all of the radiation effects appear to be single event upsets (SEUs). So in principle, it should be possible to design robust software to detect and recover from faults (SEUs do no permanent damage to hardware).
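Since SEUs flip state without damaging the hardware, the classic software defense is to compute each result redundantly and take a majority vote. A minimal sketch of that idea (triple modular redundancy in software; in real systems the three copies would run on separate hardware or at staggered times):

```python
def tmr(f, *args):
    """Run f three times and majority-vote the results.

    A single-event upset corrupting one run is outvoted by the other
    two. If all three results disagree, there is no majority, so raise
    and let the caller retry or restore from a checkpoint.
    """
    a, b, c = f(*args), f(*args), f(*args)
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority; retry from last checkpoint")
```

For example, if one of the three runs returns a corrupted value, `tmr` still returns the value the other two agree on.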
There have been efforts within NASA to design robust OS extensions -- redundancy, watchdogs, heartbeats, checkpoints, etc. -- so that fast/cheap COTS hardware could be used, at least for some functions like image processing. This has never reached critical mass and been carried to completion, however.
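One of the mechanisms mentioned, a watchdog that notices when a task's heartbeat stops, can be sketched as follows. The class name and timeout are illustrative, not any particular NASA design:

```python
import time

class Watchdog:
    """Trigger a recovery action if a task misses its heartbeat deadline."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_beat = time.monotonic()

    def heartbeat(self):
        # The monitored task calls this periodically while healthy.
        self.last_beat = time.monotonic()

    def expired(self):
        # A supervisor polls this; True means the task hung or crashed
        # (e.g. after an upset), so it should be restarted or the system
        # rolled back to a known-good checkpoint.
        return time.monotonic() - self.last_beat > self.timeout_s
```

A supervisor loop would poll `expired()` and restart the offending task on expiry, which is how COTS hardware could tolerate occasional upsets in non-critical functions like image processing.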
So yes, as you guessed, there is significant pain and heartburn that computational capabilities are so far behind. Engineers have to spend a lot of time and cleverness squeezing, say, stereo vision processing, or image compression, into the hardware available.
Indeed, their shopping list 8 years ago was not from Newegg with whatever Intel had at the time, it was whatever BAE had spent many years rad hardening and qualifying by that point, which would itself have been based on an already proven CPU (read: older) when BAE first started looking. All of a sudden it's 1993.
"Many people think hard drives are hermetically sealed, but they're not - they just have no through-flow ventilation. In vacuum, a hard drive will instantly eat itself when turned on.
[…] Even if you run at full atmospheric pressure, you don't want all your computers to die if you lose some air."
Very good point. I guess the other thing I hadn't thought about is temperature (variability in particular) which must affect hard drives with all their moving/expanding metal parts a lot more than it does SSDs.
It probably wouldn't be too hard to do. I think the problem for an application like this would be standard physical hard-drive reliability issues. More moving parts, no matter how reliable we normally think of them, probably wouldn't be something that would thrill the people who build these rovers.