IBM has developed a computerised voice that is almost indistinguishable from a human. (telegraph.co.uk)
27 points by vaksel on Feb 2, 2009 | hide | past | favorite | 25 comments


Here's a page with some sample audio clips. There are also some papers credited to the researcher mentioned in the article on the publications page:

http://www.research.ibm.com/tts/


Acapela Group's voices are already pretty good. I bought Lucy, the British one. She sounds almost like a voice-response automated phone system.

http://www.acapela-group.com/text-to-speech-interactive-demo...


If you don't mind my asking, roughly how much was "Lucy"? $100s? $1,000s? $10,000s? More? I couldn't find a price list on the site.


Lucy costs $35 USD, plus the cost of the text-to-speech program (e.g. TextAloud, which I think is <$50): http://www.nextup.com/acapela.html

(Acapela doesn't sell directly to consumers; they license their technology to resellers.)


As a caveat, if you are thinking of buying a voice like Lucy: it does not work with Windows' built-in text-to-speech (for example, Adobe Reader's text-to-speech feature). It looks like Lernout & Hauspie has some kind of agreement with Microsoft to lock out competitors.


Wow. They even have tons of foreign languages. This would have been extremely useful when I was learning how to pronounce words in Dutch (still might be ;) ).


I'm so close to buying the Spanish voice to practice my Spanish listening skills.


You use this in an API or just personally? Is it possible to use these voices over an API (for non-commercial use)?


I don't use their API. I just use the TextAloud program to occasionally read websites and documentation to me. (Don't do this at home, folks. It lets you surf the web without 'reading' it, but it's obviously not at the 'generate your own audiobook' level [1]. Still, it's way beyond E.T. and that speech synthesizer.)

API product page (iPhone version available now too!): http://www.acapela-group.com/acapela-multimedia-8-speech-sol...

[1] That would require bridging the gap from a 'Dry eyes, dry eyes' monotone to an actor doing a dramatic script reading. Vocal variety, appropriate pauses, inflections, tone: wow, a lot of things. Possible startup opportunity here?

As an aside, it takes much longer to listen than to read (I'm thinking 10x for some material).


My possible startup idea for web/speech was a bookmarklet that would send the URL of the page you're on to my service, which would check for recordings of that page and, if any were found, embed a Flash player in the page so you could hear the text.

If none were found, you could record something and send it in. It would be a Web 2.0 site in the sense that other people would do most of the work ;)

However, I foresee more problems than really useful features, including: checking submitted recordings for accuracy and hidden verbal abuse; the sheer amount of content on the internet making it unlikely you'd find what you're looking for; the fact that (near-)blind people already have screen readers, so embedded Flash won't help much; and blog comments being a poor fit for it.
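The lookup step of that bookmarklet idea could be sketched roughly like this. Note that the service hostname and the "lookup" endpoint are invented for illustration; this is just the shape of it, not a real API:

```javascript
// Sketch of the bookmarklet idea: look up the current page's URL on a
// hypothetical recording service. "recordings.example.com" and the
// "lookup" endpoint are made up for illustration.

// Build the service lookup URL for a given page address.
function lookupUrl(pageUrl) {
  return "https://recordings.example.com/lookup?url=" +
         encodeURIComponent(pageUrl);
}

// The bookmarklet itself would inject a script tag asking the service
// to embed a player when a recording exists, roughly:
//
//   javascript:(function () {
//     var s = document.createElement("script");
//     s.src = "https://recordings.example.com/lookup?url=" +
//             encodeURIComponent(location.href) + "&embed=player";
//     document.body.appendChild(s);
//   })();
```

The hard part isn't this plumbing; it's the moderation and coverage problems listed above.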


Voice recognition and speech synthesis will only really come into play once - if - we get some kind of a.i. going.

For the longest time I thought the next real breakthrough in computers would be proper speech recognition and text-to-speech conversion, but once I clued in to how slow the audio channel is, I let go of that.

It seems like such a natural fit, but language is so full of formal 'fluff' that you'd need a pretty good a.i. on the other side of the speech link to get any net improvement over traditional input methods.

The ideal secretary would be able to interpret 'send an email to John regarding the outstanding invoices' and be done with it; before you've got a computer at that level, we'll be writing 2020 or later.
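For contrast, here's a toy sketch (my own, not anyone's product) of what command "understanding" looks like without a.i.: a brittle pattern match over that example command.

```javascript
// Toy intent parser for the secretary example. A plain regex handles
// exactly one phrasing; any other wording returns null, which is why
// real semantic understanding is needed.
function parseCommand(text) {
  const m = text.match(/^send an email to (\w+) regarding (.+)$/i);
  return m ? { action: "email", recipient: m[1], subject: m[2] } : null;
}
```

parseCommand("send an email to John regarding the outstanding invoices") picks out John and the subject, but the equally natural "email John about the invoices" falls straight through to null.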


I know of a possible solution for the problem. I'm going to publish a paper in a couple of months that will include my ideas on this.


Will you please post it here, or mail me a link? j@ww.com

I'd be very (very!) happy to read it.


Will do.


I don't see an iPhone version there.



Here you go:

http://www.patents.com/Generating-paralinguistic-phenomena-v...

But that said, it mostly just sounds like a way to insert a bunch of annoying "umms" and "ahs" into artificial conversation.


So it looks like this system has people use simple textual markup. From the patent:

"For example, the developer may specify: <prosody style="bad news">Well, \sigh I cannot answer that question;</prosody>"



I'm pretty sure I've spoken to a number of these computerised voices at some customer service call centres.


IBM's new business plan: run the works of the Clash through their software, crank it through Songsmith, and make millions on YouTube AdSense.


Sounds like they still have some trouble with phoneme cadence and blending.


Does anyone have a link for any further details?


Um, I don't, like, want that, er, in my, you know, like, conversations with robots.


No place to, uh, hear it?



