Ron Garret talks about DS1, a project in the 90s where a small team used Lisp to send an autonomous craft into space in a third of the time and at a fraction of the cost of previous comparable projects. (Yes, I couldn't have made up a more HN-friendly story if I tried.)
I seem to remember a Scientific American article from the 1980s where they also had to rewrite code around a dead bit in RAM. They've made some pretty drastic changes since, like using the backup computer to add image compression on the fly (lots more detail here): http://history.nasa.gov/computers/Ch6-2.html
Sure, 2.5 billion dollars is a large sum of money. Imagine how you'd break this news to the public. You'd have practically crushed people's hopes and dreams. "Umm, guys, we just accidentally bricked the rover because of a firmware upgrade gone rogue." I'd end up in tears if I had done this.
Speaking of firmware, I haven't been able to find any information on whether they have a dual-BIOS kind of system where, if the boot fails, the rover can recover from a backup firmware image.
Wait, they're doing this by hand? Production deployments to a computer on a different planet? In the 21st century? If there's one deploy I'd want to automate the hell out of (and test, over a 2 baud network), it's this one.
It's not like these guys never did a remote software update before. So I must be missing something.
It depends on what you mean by "doing it by hand." The update process is "automated" in the sense that it's completely scripted. Every step is known ahead of time, and has been extensively tested on ground-based duplicates of the flight hardware. The only aspect of the process that is done "by hand" is "pushing the big red button" to initiate the next step of the process, and even this is a very stylized and well-rehearsed process. The only reason for having even this step of the process be manual is so that humans can assess the situation between steps and satisfy themselves that nothing has gone wrong with one step before proceeding to the next. That's the real challenge in a situation like this: your communications link is operating on the hairy edge of the limits imposed by the laws of physics, so lots of things can go wrong in production that worked in rehearsal.
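The pattern described, every step scripted in advance but a human gating each transition, can be sketched roughly like this. The step names and checks below are purely illustrative stand-ins, not the actual JPL procedure:

```python
# Sketch of a fully scripted update where a human confirms each step.
# Step names and their success checks are hypothetical placeholders.
STEPS = [
    ("uplink new flight software image", lambda: True),
    ("verify image checksum on the spacecraft", lambda: True),
    ("swap boot pointer to the new image", lambda: True),
    ("reboot and confirm telemetry", lambda: True),
]

def run_update(confirm):
    """Run each pre-planned step; `confirm(name)` is the human go/no-go."""
    for name, check in STEPS:
        if not confirm(name):          # the "big red button" moment
            return f"aborted before: {name}"
        if not check():                # assess telemetry before moving on
            return f"failed at: {name}"
    return "update complete"
```

The point of the design is that automation covers the *steps*, while judgment covers the *transitions*: `run_update(lambda step: True)` runs straight through, but an operator can stop at any boundary.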
Automate it in a test environment that simulates production as closely as possible. Once the automated software updates work flawlessly under various conditions, use that in production.
Then an unexpected condition happens and the automated response breaks something badly, and suddenly your expensive robot is now an expensive piece of junk.
I find it weird that the rover wouldn't have enough storage to hold both the landing software and the surface-mission software. I mean, SSDs nowadays can hold hundreds of gigs and are very small.
Does anybody have more details on that? A complete software update just sounds like a really risky thing, for something that you have no physical access to.
My opinion is that they probably have really strict protocols from legacy missions, and since they just work, they're not going to change them.
Every time there's a report about Curiosity doing something that overlaps with software or hardware that we at HN are familiar with, comments like this appear.
The bottom line is: (a) Curiosity was designed 8 years ago, so the tech on board is old; (b) everything on board has to be radiation hardened; and (c) NASA is conservative about 'new stuff' because they need it to work (e.g. the thrusters on the skycrane were derived from the Viking landers).
It's able to supply a steady 125 W of power for 14 years before dropping to 100 W. The technology has been used reliably on multiple projects. Based on the longevity of the previous rovers that have safely made it to Mars, it's worth considering that, barring mechanical malfunctions and hardware failures (it has two main computers), the rover could turn off power to non-critical components after 14 years and continue to run on 100 W. It might not be far-fetched to think that this rover could still be running if/when humans arrive in the 2030s.
That's far cooler than what you would get running on the latest and greatest for convenience.
I wonder whether it will get to the point where they just send processors with a large amount of redundancy (multicore/multiprocessor) and, rather than trusting any one result, vote probabilistically across operations run in parallel.
It must be unbearably frustrating building this stuff to work around the limitations of 10+ year-old technologies when the superior tech inside even the phone in your pocket could achieve so much more.
Almost all of the radiation effects appear to be single event upsets (SEUs). So in principle, it should be possible to design robust software to detect and recover from faults (SEUs do no permanent damage to hardware).
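Since SEUs flip state without damaging the hardware, the classic software defense is to compute each result redundantly and take a majority vote. A minimal sketch of that idea (triple modular redundancy in software; in real systems the three copies would run on separate hardware or at staggered times):

```python
def tmr(f, *args):
    """Run f three times and majority-vote the results.

    A single-event upset corrupting one run is outvoted by the other
    two. If all three results disagree, there is no majority, so raise
    and let the caller retry or restore from a checkpoint.
    """
    a, b, c = f(*args), f(*args), f(*args)
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority; retry from last checkpoint")
```

For example, if one of the three runs returns a corrupted value, `tmr` still returns the value the other two agree on.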
There have been efforts within NASA to design robust OS extensions -- redundancy, watchdogs, heartbeats, checkpoints, etc. -- so that fast/cheap COTS hardware could be used, at least for some functions like image processing. This has never reached critical mass and been carried to completion, however.
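One of the mechanisms mentioned, a watchdog that notices when a task's heartbeat stops, can be sketched as follows. The class name and timeout are illustrative, not any particular NASA design:

```python
import time

class Watchdog:
    """Trigger a recovery action if a task misses its heartbeat deadline."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_beat = time.monotonic()

    def heartbeat(self):
        # The monitored task calls this periodically while healthy.
        self.last_beat = time.monotonic()

    def expired(self):
        # A supervisor polls this; True means the task hung or crashed
        # (e.g. after an upset), so it should be restarted or the system
        # rolled back to a known-good checkpoint.
        return time.monotonic() - self.last_beat > self.timeout_s
```

A supervisor loop would poll `expired()` and restart the offending task on expiry, which is how COTS hardware could tolerate occasional upsets in non-critical functions like image processing.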
So yes, as you guessed, there is significant pain and heartburn that computational capabilities are so far behind. Engineers have to spend a lot of time and cleverness squeezing, say, stereo vision processing, or image compression, into the hardware available.
Indeed, their shopping list 8 years ago was not from Newegg with whatever Intel had at the time, it was whatever BAE had spent many years rad hardening and qualifying by that point, which would itself have been based on an already proven CPU (read: older) when BAE first started looking. All of a sudden it's 1993.
"Many people think hard drives are hermetically sealed, but they're not - they just have no through-flow ventilation. In vacuum, a hard drive will instantly eat itself when turned on.
[…] Even if you run at full atmospheric pressure, you don't want all your computers to die if you lose some air."
Very good point. I guess the other thing I hadn't thought about is temperature (variability in particular) which must affect hard drives with all their moving/expanding metal parts a lot more than it does SSDs.
It probably wouldn't be too hard to do. I think the problem for an application like this would be standard physical hard-drive reliability issues. More moving parts, no matter how reliable we normally think of them, probably wouldn't be something that would thrill the people who build these rovers.