As a caveat, if you are thinking of buying a voice like Lucy, it does not work with Windows' built-in text-to-speech (for example, Adobe Reader's text-to-speech feature). It looks like Lernout & Hauspie has some kind of agreement with Microsoft to lock out competitors.
Wow. They even have tons of foreign languages. This would have been extremely useful when I was learning how to pronounce words in Dutch (still might be ;) ).
I don't use their API. I just use the TextAloud program to occasionally read websites and documentation to me. (Don't do this at home, folks: it lets you surf the web without 'reading' it. It is obviously not at the 'generate your own audiobook' level [1], but it's way beyond E.T. and that speech synthesizer.)
[1] That would require bridging the gap from 'Dry eyes. dry eyes' monotone to an actor doing a dramatic script reading. Vocal variety, appropriate pauses, inflections, tone - wow - a lot of things. Possible startup opportunity here?
As an aside, it takes much longer to listen than to read (I'm thinking 10x for some material).
My possible startup idea on the web/speech front was a bookmarklet that would send the URL of the page you are on to my service. The service would check for recordings of that page and, if any were found, embed a Flash player in the page so you could hear the text.
If not, you could record something and send it in. It would be a web2 site in the sense that other people would do most of the work ;)
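For what it's worth, the server side of that lookup could be sketched roughly like this. It is only a minimal sketch under my own assumptions: the in-memory `RECORDINGS` store, the function names and the shape of the response are all made up for illustration, and a real service would need a database plus the moderation step discussed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical in-memory store: normalized page URL -> list of recording URLs.
RECORDINGS: dict[str, list[str]] = {}


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def submit(url: str, recording_url: str) -> None:
    """Register a user-submitted recording for a page (no moderation here)."""
    RECORDINGS.setdefault(normalize(url), []).append(recording_url)


def lookup(url: str) -> dict:
    """Tell the bookmarklet whether to embed a player or invite a recording."""
    found = RECORDINGS.get(normalize(url), [])
    if found:
        return {"action": "embed_player", "recordings": found}
    return {"action": "invite_recording", "recordings": []}
```

Normalizing the URL first means that `http://Example.com/page/` and `http://example.com/page#intro` hit the same recordings, which matters because the bookmarklet just sends whatever is in the address bar.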
However, I foresee more problems than really useful features. Submitted recordings would have to be checked for accuracy and for hidden verbal abuse. The sheer amount of stuff on the internet makes it unlikely that you'd find a recording of the page you're looking for. People who are really (near) blind already have screen readers, so an embedded Flash player won't help them much. And comments on blogs wouldn't be a good fit for it.
Voice recognition and speech synthesis will only really come into play once (if) we get some kind of a.i. going.
For the longest time I used to think that the next real breakthrough in computers would be proper speech recognition and text-to-speech conversion, but once I clued in to how slow the audio channel is, I let go of that.
It seems like such a natural fit, but language is so full of formal 'fluff' that you'd need a pretty good a.i. on the other side of the speech link to get any net improvement over traditional input methods.
The ideal secretary would be able to interpret 'send an email to John regarding the outstanding invoices' and be done with it. Before you've got a computer at that level, we'll be writing 2020 or later.
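To make the point concrete, here is roughly what you get without any a.i. on the other end: a rigid pattern match. This is only an illustrative sketch; the pattern, the `parse_command` name and the returned fields are my own assumptions, not any real assistant's API.

```python
import re

# One rigid pattern for one phrasing; everything here is illustrative.
PATTERN = re.compile(
    r"^send an? (?:email|e-mail) to (?P<recipient>\w+) "
    r"(?:regarding|about) (?P<subject>.+)$",
    re.IGNORECASE,
)


def parse_command(utterance: str):
    """Return a structured command, or None if the phrasing is not recognized."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    return {
        "intent": "send_email",
        "recipient": match.group("recipient"),
        "subject": match.group("subject"),
    }
```

It handles the exact sentence above, but something like 'could you remind John about those invoices by mail' falls straight through and returns `None`. It also has no idea which John is meant or what the email should actually say, and that is the part that needs the a.i.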
http://www.research.ibm.com/tts/