- Apple is selling AI-narrated audiobooks and accepting submissions for its AI audiobook narration service.
- The voices are sometimes annoying, but mostly impressive.
- They’re good for accessibility and indie authors, bad for human voice actors.
Apple's Books store now sells audiobooks read by AI-synthesized voices.
Depending on your point of view, this is either a massive boon for indie authors, who could never afford to pay for a full audiobook production, or yet another story of artificial intelligence (AI) coming to take over the jobs of human artists. But the real story is more complicated, with plot twists that include accessibility, Apple’s weird decisions regarding who can use the service, and whether it will make much of a difference at all.
“The advantage of this, of course, is that it’s a big win not only for those who prefer audiobooks but for accessibility as well. There are millions of titles out there that aren’t available to individuals with disabilities that make it difficult or impossible for them to read ebooks or print books,” author and journalist Dan Moren writes on his Six Colors blog. “The addition of easy-to-produce audio versions could open up a wealth of content.”
Before we get to the implications, just how good are these voices? It's one thing to create a voice that can read back answers for a voice assistant like Siri and quite another to come up with a voice that won't start to drive you nuts after five minutes. A good audiobook narrator brings life to the story and characters, but above all, they have to get out of the way. If their grand performance distracts from the story or starts to grate, then they've failed.
You can listen to Apple’s audiobook voices without having to purchase a book, and for me, this is already enough to know I don’t want to listen to them. For example, Madison, a “fiction/romance soprano,” drags out the final word of each sentence and drops a little in pitch at the same time. It sounds great the first time, lending a realistic touch, but as soon as you realize it’s there on every single sentence, it’s distracting.
Jackson, the "fiction/romance baritone," is better, without any repeating tics in the delivery. It's quite impressive, in fact. I could listen to it quite happily. And Helena and Mitchell, the "self-development" voices, are equally good.
The process of submitting your book and getting it processed is, as you'd expect from Apple, somewhat arbitrary. To start, you can only "nominate" your title for consideration, as if this were the book Oscars. Apple will then evaluate it to ensure it meets the company's criteria, including "content compatibility" (not too many foreign words), a requirement that could be due to the limitations of Apple's current AI voice capabilities.
The other oddity is that indie authors cannot submit their books directly to Apple. While they can self-submit e-books to the Books Store using Apple’s iTunes Connect tool, Apple only accepts independent audiobook submissions via third-party services Draft2Digital and Ingram CoreSource.
Given that this service is most appealing to indie authors who wouldn't even consider an audiobook version otherwise, this seems backward.
“Apple is only offering this to big titles who can afford to produce a professional audiobook anyway,” independent author Graham Bower told Lifewire via direct message.
If you go with Apple as your audiobook provider, the audiobook will be exclusive to Apple, says Bower. "Because the whole point of using Ingram and D2D is they're cross-platform (you submit it to them, and they submit as an intermediary to Amazon, Kobo, Google Play, and Apple). But Apple's AI audiobooks are only Apple."
One assumes that some of this is down to the newness of the whole service. Apple is still in the testing phase. But if it shakes out, this could be a great tool for indie authors and a boon for accessibility.
As for the audiobook voice actors? If you listened to the demos, you’ll have heard how good these voices are. And unlike open-ended AI services like ChatGPT and Stable Diffusion, which are tasked with creating arbitrary art, AI-based text-to-voice is a relatively closed environment, which makes it all the easier to get results that could put humans out of a job while only offering very limited diversity in author voices. It doesn’t look good.