IBM has developed a computerised voice that is almost indistinguishable from a human. (telegraph.co.uk)
27 points by vaksel on Feb 2, 2009 | hide | past | favorite | 25 comments


Here's a page with some sample audio clips. There are also some papers credited to the researcher mentioned in the article on the publications page:

http://www.research.ibm.com/tts/


Acapela Group's voices are already pretty good. I bought Lucy, the British one. She sounds almost like a voice-response automated phone system.

http://www.acapela-group.com/text-to-speech-interactive-demo...


If you don't mind my asking, roughly how much was "Lucy"? $100s? $1,000s? $10,000s? More? I couldn't find a price list on the site.


Lucy costs $35 USD, plus the cost of the text-to-speech program (e.g. TextAloud, which I think is <$50): http://www.nextup.com/acapela.html

(Acapela doesn't sell directly to consumers; they license their technology to resellers.)


As a caveat, if you are thinking of buying a voice like Lucy: it does not work with Windows' built-in text-to-speech (for example, Adobe Reader's text-to-speech feature). It looks like Lernout & Hauspie has some kind of agreement with Microsoft to lock out competitors.


Wow. They even have tons of foreign languages. This would have been extremely useful when I was learning how to pronounce words in Dutch (still might be ;) ).


I'm so close to buying the Spanish voice to practice my Spanish listening skills.


You use this in an API or just personally? Is it possible to use these voices over an API (for non-commercial use)?


I don't use their API. I just use the TextAloud program to occasionally read websites and documentation to me. (Don't do this at home, folks. It lets you surf the web without 'reading' it, but it's obviously not at the 'generate your own audiobook' level [1]. Still, it's way beyond E.T. and that speech synthesizer.)

API product page (iPhone version available now too!): http://www.acapela-group.com/acapela-multimedia-8-speech-sol...

[1] That would require bridging the gap from a 'Dry eyes, dry eyes' monotone to an actor doing a dramatic script reading. Vocal variety, appropriate pauses, inflections, tone: wow, a lot of things. Possible startup opportunity here?

As an aside, it takes much longer to listen than to read (I'm thinking 10x for some material).


My possible startup idea for web/speech was a bookmarklet that would send the URL of the page you're on to my service, which would check for recordings of that page and, if any were found, embed a Flash player in the page so you could hear the text.

If none were found, you could record something and send it in. It would be a Web 2.0 site in the sense that other people would do most of the work ;)

However, I foresee more problems than really useful features, including: checking submitted recordings for accuracy and hidden verbal abuse; the sheer amount of content on the internet making it unlikely you'd find what you're looking for; the fact that (near-)blind people already have screen readers, so embedded Flash won't help much; and blog comments being a poor fit for it.
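The lookup step of that bookmarklet idea could be sketched roughly like this. Note that the service hostname and the "lookup" endpoint are invented for illustration; this is just the shape of it, not a real API:

```javascript
// Sketch of the bookmarklet idea: look up the current page's URL on a
// hypothetical recording service. "recordings.example.com" and the
// "lookup" endpoint are made up for illustration.

// Build the service lookup URL for a given page address.
function lookupUrl(pageUrl) {
  return "https://recordings.example.com/lookup?url=" +
         encodeURIComponent(pageUrl);
}

// The bookmarklet itself would inject a script tag asking the service
// to embed a player when a recording exists, roughly:
//
//   javascript:(function () {
//     var s = document.createElement("script");
//     s.src = "https://recordings.example.com/lookup?url=" +
//             encodeURIComponent(location.href) + "&embed=player";
//     document.body.appendChild(s);
//   })();
```

The hard part isn't this plumbing; it's the moderation and coverage problems listed above.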


Voice recognition and speech synthesis will only really come into play once - if - we get some kind of a.i. going.

For the longest time I thought the next real breakthrough in computers would be proper speech recognition and text-to-speech conversion, but once I clued in to how slow the audio channel is, I let go of that.

It seems like such a natural fit, but language is so full of formal 'fluff' that you'd need a pretty good a.i. on the other side of the speech link to get any net improvement over traditional input methods.

The ideal secretary would be able to interpret 'send an email to John regarding the outstanding invoices' and be done with it; before you've got a computer at that level, we'll be writing 2020 or later.
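For contrast, here's a toy sketch (my own, not anyone's product) of what command "understanding" looks like without a.i.: a brittle pattern match over that example command.

```javascript
// Toy intent parser for the secretary example. A plain regex handles
// exactly one phrasing; any other wording returns null, which is why
// real semantic understanding is needed.
function parseCommand(text) {
  const m = text.match(/^send an email to (\w+) regarding (.+)$/i);
  return m ? { action: "email", recipient: m[1], subject: m[2] } : null;
}
```

parseCommand("send an email to John regarding the outstanding invoices") picks out John and the subject, but the equally natural "email John about the invoices" falls straight through to null.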


I know of a possible solution for the problem. I'm going to publish a paper in a couple of months that will include my ideas on this.


Will you please post it here, or mail me a link? j@ww.com

I'd be very (very!) happy to read it.


Will do.


I don't see an iPhone version there.



Here you go:

http://www.patents.com/Generating-paralinguistic-phenomena-v...

But that said, it mostly just sounds like a way to insert a bunch of annoying "umms" and "ahs" into artificial conversation.


So it looks like this system has people use simple textual markup. From the patent:

"For example, the developer may specify: <prosody style="bad news">Well, \sigh I cannot answer that question;</prosody>"



I'm pretty sure I've spoken to a number of these computerised voices at some customer service call centres.


IBM's new business plan: run the works of the Clash through their software, crank it through Songsmith, and make millions on YouTube AdSense.


Sounds like they still have some trouble with phoneme cadence and blending.


Does anyone have a link for any further details?


Um, I don't, like, want that, er, in my, you know, like, conversations with robots.


No place to, uh, hear it?



