Human Actor or AI Text-to-Speech?
The replacement of human narrators with AI-generated voices in audiobook production is a highly controversial topic in today’s authoring world.
(This article is part of my Udemy Course, Becoming an Author: The Master Class Series. If you are reading this article as a standalone piece, you might consider joining the course and learning the specific skill of producing Text-to-Voice Narratives. You can also take advantage of the sessions that came before this and those that come after.)
Cost Can Be a Big Factor in Your Decision
As an emerging author, my effort to understand the craft led me to analyze the publishing industry, which became a key factor in my decision to create an authoring system, develop a course on it, and offer it to you on Udemy. If all you want to do is write stories for your own benefit or to share with friends, then any effort is probably sufficient. If you want to make money from your craft, a serious evaluation of your skills, methods, and the various ways to capitalize on your work is essential. The math matters, and decisions will need to be made from a business model perspective.
There are roughly 4.2 million books published every year, and traditional publishers account for only about 10% of those, most of them by authors already under contract. Fewer than 2% of books submitted by new authors are published, and 95% of those submissions are rejected immediately due to poor formatting, grammatical issues, or the lack of a well-developed plot. Even among the polished manuscripts pitched by agents, fewer than 5% are offered a traditional contract. The decisions on the part of agents and publishers regarding whether to accept a book are often made on non-literary grounds. It has long been known that if the author or plot is not from the demographic they want to promote, getting represented is almost impossible. I was even informed by one agency that they didn’t represent people like me. Whatever that meant.
Most self-published authors sell fewer than 100 copies of a book in their lifetime; only 5% sell more than 250. Even traditionally published books sell only around 3,000 copies. This means the median income per book is about $6,000 over an author’s lifetime, and most of that is consumed by production costs unrelated to printing and distribution. A human actor typically charges $3,000 to $5,000 to narrate a 100,000-word book. Professional editing can cost between $3,000 and $6,000 for the same book. The result is that editing and audio production together can cost $6,000 to $11,000, which equals or exceeds the author’s expected income, and the author pays it in advance with no guarantee of recouping the investment.
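To make that math concrete, here is a minimal sketch in Python that uses only the figures quoted above. The variable and function names are my own illustrations, not part of any tool or of the course materials, and the numbers are the rough estimates from this article rather than audited data.

```python
# Figures taken from the paragraph above; all names here are illustrative.
EXPECTED_LIFETIME_INCOME = 6_000   # approximate income per book, in USD
NARRATION_COST = (3_000, 5_000)    # human narrator, 100,000-word book (low, high)
EDITING_COST = (3_000, 6_000)      # professional editing, same book (low, high)


def upfront_cost_range(*cost_ranges):
    """Add several (low, high) cost ranges into one combined (low, high) range."""
    low = sum(r[0] for r in cost_ranges)
    high = sum(r[1] for r in cost_ranges)
    return low, high


def net_range(income, cost_range):
    """Return (worst case, best case) income left after paying upfront costs."""
    low_cost, high_cost = cost_range
    return income - high_cost, income - low_cost


costs = upfront_cost_range(NARRATION_COST, EDITING_COST)
worst, best = net_range(EXPECTED_LIFETIME_INCOME, costs)

print(f"Upfront cost per book: {costs[0]:,} to {costs[1]:,} USD")
print(f"Net after costs: {worst:,} to {best:,} USD")
```

With these inputs, the best case is breaking even and the worst case is a $5,000 shortfall on a single book. Swapping in your own quotes for editing and narration shows quickly whether outsourcing fits your business model.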
The investment discussion is one part of the financial analysis; the cash flow of a book is another. I know that I am not the norm, but I wrote and published nine full-length novels in six years, have seven more in development, and created about a dozen short stories beyond those. In the traditional world, that represents far more than $100,000 in editing and narration costs, and that doesn’t even consider promotional and marketing expenses.
Learning to Use Text-to-Speech Generation is an Investment in a Developing Technology
This is one of the biggest factors in my decision to use text-to-voice (T2V) generation rather than a human narrator for my books: I wanted to become skilled with that technology, even if I chose to employ human actors to narrate some of them. In just the past year, we have seen huge improvements in this technology, and I anticipate that, for some types of narration, the next few years will bring us to the point where you can’t tell whether you are listening to a human or not. One example is a favorite Brazilian Bossa Nova track I listen to almost daily. The melody and the singers’ voices are incredibly relaxing. I was shocked to learn recently that it is AI-generated. Now I listen to it and try to find any hint that it isn’t a real woman singing. It is perfect.
Using T2V Helps a Writer Become a Better Author
As I have explained in my course, using technology to perform work that would normally be outsourced can be a great help in a writer’s development toward becoming an author. It is a critical element in the system I created, which was instrumental in my ability to publish books and stories that have received excellent ratings. As I shared before, to get a fair determination of my system’s value, I couldn’t rely on book sales or ratings on publishing platforms. For most authors, the numbers are too few to constitute an accurate sample, and there is far too much manipulation of those metrics for me to rely on them. What I did was create some stories across different genres and place them on what I call the “Free to post, free to read” websites available on the internet. I did that under pseudonyms for various reasons, including a desire to keep the metrics clean of bias. The results showed that I reached more than 50,000 reads per story in the first month and received ratings ranging from 4.86 to 4.93 stars. Further, my rankings among the thousands of stories submitted the previous month placed three of my stories in the top ten. According to the platform’s tabulations, these levels of achievement are nearly unprecedented, especially for a new author with no prior following on the site.
One Important Factor That Affects Your Decision is Whether the Narration of Your Story Should be Acted Out or Just Read
Some stories require a human voice to accurately convey mood, passion, suspense, or perspective. In those cases, the use of a human voice can be critical. I don’t wish to imply that the voices available on T2V sites don’t sound human; they do, and in many cases are specifically constructed with tones and inflections that are designed for optimal appeal. That means they sound better than most human voices. What they can’t provide is the depth or range of emotion that some stories’ themes demand. In other cases, however, simply reading the manuscript is sufficient, or even preferable, if you don’t want the result to be audio that sounds like an old radio show. I host The First Chapter Podcast, a show that introduces authors to new audiences. Some of the books would not come across the same way if a non-human narrator were used, because they require a sultry, ghostly, or perhaps childlike voice. AI isn’t going to deliver that, at least in its present form.
Other books don’t have that need, as they are historical, technical, or subject-driven. In those cases, it could be argued that using a non-human voice might offer advantages, especially when the alternative is the author reading their own work out of budgetary necessity. Remember, today, in many cases, it is almost impossible to distinguish human from generated voices. These judgments can also be personal. In my case, I have listened to many books by highly published authors that I didn’t like because of the human narration. Personally, I find it distracting when the narrator changes their voice to match the character. Men adopting a woman’s voice, or a woman a man’s, isn’t something I find appealing. I would rather have had those stories read than “acted.” In that case, given the technology’s excellence, it is hard to justify the cost of hiring a human.
There is another option, and that is to hire different actors to “read” the parts of the characters, then one to be the narrator. The cost of talent and production for that option is outside of most people’s budgets, but there are T2V platforms that offer nearly every language, accent, and dialect. Suppose you wanted to spend time creating an audiobook that uses different voices to represent your characters. That is quickly becoming a reality, and I am aware of developments that will soon enable audio and video productions that cannot be distinguished from productions by human actors. In truth, much of the video content we see in movies is created using green screens and actor avatars. As alarming as it might seem, data processing and AI are where we are going, and the industry is spending trillions of dollars to develop them.
Determining Whether the Best Narration is Human or Technological Might Depend on the Subject Matter
Some works contain specialized language: accents, technical or cultural terms, international place names, or even slang. In those cases, it can be difficult to decide which option is least problematic. Finding either a human actor or an AI-generated option capable of doing justice to a book on medical technology, virology, zoology, or military terminology would be a challenge. I have listened to human narrations in which the actors didn’t pronounce the subject-specific words accurately. Obviously, that can also be the case with T2V options. In some of those cases, the best option is to have the author narrate their own work. If the author’s voice or abilities are insufficient, then the challenge is to find someone within that industry or culture who can perform the work. Platforms that offer different actors can be of help in finding someone.
There is No “Perfect”
In the end, converting a script to audio or even video will leave something lacking. How many times have we watched a movie and commented that it wasn’t as good as the book? The human mind has an amazing ability to visualize. There is simply something about the written medium, if done with excellence, that allows us to interpret the story in its best version. Still, it is valuable to have your work in audio. In today’s multitasking world, fewer people take the time to read. I drive a lot, and when my wife and I are on the road with our RV, one of our favorite pastimes is to listen to one of my stories. In many situations where you want a story, reading just isn’t advised. I think this discussion over what is best, human effort or technology, will be with us for a long time. I, for one, grieve the loss of reading skills and attention spans in the younger generation. Be that as it may, there is little that I find more relaxing than sitting in my garden on a warm summer evening and listening to one of my stories as I watch the hummingbirds feed.
In this class, I will demonstrate how to edit Text-to-Voice conversions and produce the audio files for publication.
Don’t be one of the 95% of authors whose work is rejected due to a lack of merit. These skills can be learned. To get the first ten sessions of the course, along with the rest of the classes in Becoming an Author: The Master Class Series, use this discount coupon and pay only $19.99 for a $150 value! Save yourself thousands of dollars and learn to perform every aspect of authoring your work. In the coming weeks, more class sessions will be uploaded as they become available.
https://www.udemy.com/course/becoming-an-author-the-master-class-series/?couponCode=VSCAMPBELL
That coupon is a limited-time offer, but I might extend it if there is enough demand. If it expires, please contact me at [email protected], and I can provide you with the current discount.