Recording a voice-over is challenging enough. You go through way too many takes to get what you want. You don’t have enough time to rehearse and hit your tone and intention targets. You read endless audio editing software guides to make sure your voice sounds good. And even if you nail all of these things, if you don’t have access to a studio, your perfect performance will be riddled with background noise.
So should you give up and hire a voice actor? Not yet: AI voice generators can deliver impressive results. These AI text-to-speech apps have been improving in quality, realism, and controls, helping you create a natural rendition of text without even having to plug a mic into your computer.
I spent a few weeks testing all the AI voice generator tools I could get my hands on, and based on my experiences with them, these are the six best.
What makes the best AI voice generator?
The best AI voice generators are pretty easy to spot: the generated speech sounds natural and realistic, almost (almost!) as if a real person is saying the words.
Beyond this intuitive check, each platform offers a range of settings that help you steer the generation, such as pronunciation, pitch, volume, or pace. And if you’re planning to go full AI voice, you can learn Speech Synthesis Markup Language (SSML), which gives you the highest level of control by letting you mark how each word should be performed. Don’t overdo the markup, though: it can reduce the quality and realism of the output.
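To make the SSML idea concrete, here is a minimal Python sketch that assembles a short SSML snippet with the standard library. The `speak`, `prosody`, `break`, and `emphasis` elements come from the W3C SSML specification, but the helper name `build_ssml`, the sample sentences, and the attribute values are illustrative assumptions. Support for each tag varies by platform, so check your tool’s documentation before relying on any of them.

```python
import xml.etree.ElementTree as ET


def build_ssml(intro: str, key_point: str, pause_ms: int = 500) -> str:
    """Build SSML that slows the intro, pauses, then stresses the key point.

    Hypothetical helper for illustration; the values are arbitrary examples.
    """
    speak = ET.Element("speak")

    # <prosody> adjusts pace and pitch for the text it wraps.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "+2st"})
    prosody.text = intro

    # <break> inserts a pause of a fixed length.
    pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    pause.tail = " "

    # <emphasis> asks the engine to stress the wrapped words.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = key_point

    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(
    "Recording a voice-over is challenging enough.",
    "AI voice generators can help.",
)
print(ssml)
```

Paste the printed markup into any editor that accepts SSML input. Keeping the markup to a few tags per sentence is in line with the advice above about not overdoing it.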
With that in mind, here’s what I looked for as I was testing the best AI voice generators:
- Realism. These text-to-voice apps offer realistic speech, with variations, natural changes in tone, and adequate pauses.
- Available controls. Pitch, volume, pace, and pronunciation controls, among others, will let you tune the generation to your needs.
- Audio quality. I looked for the highest export audio quality possible, so you can use these voices in any project.
- Voice library. Multiple voices, including voices in other languages, can fit a wider range of projects, so you can have greater flexibility as you work.
- Extras. If any app has any useful extra tools for generating voice, such as audio-to-audio or AI model training, I took that into consideration. But I didn’t consider any AI video generation apps for this list, even though some do offer text-to-voice as an add-on.
I also went a little further. Before becoming a writer, I was an actor for ten years, and back in the day, I did a one-month workshop on voice acting and dubbing. I used that experience to judge these voices based on additional parameters:
- Narration pacing. Humans make variations in reading speed, which is useful for adding emphasis or increasing engagement. Bad AI usually evens everything out, so I paid attention to the models that introduced the best variations.
- Intonation. Intonation deals with the variations of pitch throughout sentences. The worst AI models make everything predictable, robotic, and lifeless; many were excluded because of this.
- Emotional performance. Some apps let you choose sad, excited, or whispered renditions of the text. I excluded those that weren’t subtle, heavily over- or under-acting the script. Still, it’s hard for AI to give an accurate performance here, so if you need something nuanced, you might consider working with a professional voice actor.
I spent over three weeks signing up for every AI voice generator I could find. I used the same text in every one of them to better home in on the differences. I tried the controls to gauge their power and see whether they’d help me improve the final result. I saved samples from every app: there’s a link to hear a brief excerpt from each below.
When judging the best AI voice generator for your purposes, keep in mind that your audience will probably be paying attention to other details of your content as well. A few imperfections here and there are completely forgivable. With all this in mind, here are this year’s best picks.
Best AI voice generator for hundreds of realistic voices
ElevenLabs (Web)
Listen to the results: ElevenLabs example output
ElevenLabs leads the pack with a voice library featuring over 300 voices, including licensable AI-powered versions of real people, like Christy Carlson Romano, the TV actress who voiced Disney’s Kim Possible.
With so many voices to choose from, it’s great to see good search and filtering tools. Click Voices on the left-side menu and then the Voice Library tab at the top of the screen. If a friend or coworker gave you a tip for a good voice, you can search for it by name. If you’re in a browsing mood instead, use the categories to filter voices based on style or purpose: from conversational to advertisement-oriented voices, there’s a bit of everything to fit any kind of project. On the right side of these categories, you can click to sort based on four properties, from trending voices to those that have generated a high number of outputs. Right next to that, you have the advanced filters, which are great for narrowing down voices by category, gender, age, language, and accent.
As you hear voices you like, add them to the Voice Lab. This will let you select them in the speech generation tool, which you can access by clicking Speech. Paste in your text or upload an audio track, click the voice name dropdown to choose your voice, and hit Generate. If you’re not happy with the first shot, there are two major ways you can tweak:
- The first one is by selecting a different AI model. Each model has a different range of settings, with one being better for multi-language generation and another for low latency, for example.
- Then, based on which model you selected, you can control stability (a low setting means more emotional variation), similarity (a low setting means more difference from the sample voice), style exaggeration (a high setting amplifies the variation in general), and speaker boost (which further grounds the output in the original AI training data).
Currently valued at $1B, ElevenLabs has the funding to grow into an even more powerful AI voice generation platform. It definitely has the flexibility and quality for that, even if its controls are less powerful than those of other platforms on this list.
ElevenLabs price: Free for ~10 minutes of audio every month; paid plans start at $5/month (or $50/year) for ~30 minutes of audio and extra features like voice cloning
Best AI voice generator for human-like cadence
Speechify (Web, iOS, Android)
Listen to the result: Speechify example output
Cadence: the rhythm as someone reads a text, the spaces between words, and the overall speed. Speechify is ahead of the competition, generating in one shot a pleasing output that sounds like a creative, experienced voice actor. Calm, well-paced, with a good balance between variation and consistency.
The website’s home page may be confusing as Speechify brands itself as a platform for reading text out loud, mostly for productivity use cases. You can use it while you drive or take a walk outside. And with available voices such as Snoop Dogg and Gwyneth Paltrow, it’s fun to listen to a list of your favorite digital marketing blogs in the legendary style of the D-O-double-G.
If you want to generate and download voices for your projects instead, click the button at the top of the screen to go to Speechify Studio. While you can’t use the famous voices—boo—you’ll see that the existing options are top-notch. As you paste your script and start generating, you can increase or decrease speed, control pitch, change the volume, add custom pronunciation, and set pauses at different parts of the text.
There are two good extras here. If you usually create slide-based videos, Speechify has a tool that can put together a simple presentation. Just generate the voice, add a background music track, and export. The second extra lets you add your own voice to the platform, so you can generate speech that sounds like you.
Speechify price: Free without the option to download; paid plans start at $24/user/month (billed annually) or $69/user/month (billed monthly)
Best AI voice generator for word-by-word control
WellSaid (Web)
Listen to the result: WellSaid Labs example output
Where other platforms go general, WellSaid Labs offers full control over sections of your script, down to individual words if necessary.
How does this work? Open the editor, and paste in your script. On the right-side tab, click Cues to open the controls. The words on the screen become outlined: click on the word or combination of words to select, and then adjust loudness or pace. If you select a comma or a period instead, you can adjust how long the pause should be.
When you’re finished editing one section, click anywhere on the central part of the screen to deselect it. You’ll notice that what you just edited is now underlined with color: if you changed pace, it’s green; if you edited loudness, it’s blue; for punctuation pauses, it’s purple. This is a good guide in case you want to come back and make adjustments. One word of advice: don’t make drastic changes—the biggest variations here can reduce the overall realism.
Pronunciation controls don’t live in the generation editor. Instead, look to the left-side menu, click Pronunciation, and add your replacements. Start by adding the original word, and then type out how it should sound instead—even if it butchers the spelling. There’s a learning curve and experimentation process around this, so make sure to take a look at the respelling guide.
To make the most of the tools here, there’s a Resources section with entry points to the most important topics in the documentation. There are step-by-step guides to help you get started, improve your voice generation workflow, or work with pronunciations. And if you’re collaborating with others, you can quickly share a link to a project to gather feedback.
WellSaid Labs price: Free trial available; paid plans start at $44/month (billed annually) or $49/month (billed monthly)
Best AI voice generator for engaging speech variations
Respeecher (Web)
Listen to the result: Respeecher example output
Tired of hearing robotic speech that sounds like a long, boring straight line? Respeecher introduces variations that make the narration more interesting to hear, increasing how natural and realistic each voice sounds.
The best part is you don’t have to engineer this at all. When you input your text, you can try generating it with different voices or narration styles. Each generation will be grouped under the appropriate part of the script, with natural-sounding variations.
The user interface is intuitive, so it was a surprise to find generation controls hidden away from the main editor screen. Click the Settings tab on the left side and tweak pitch calibration, emotional range, and general audio properties. Changes made here apply to all future outputs, so remember to come back if you need something different.
In addition to pasting in your text or uploading an audio file, you can use your microphone to record it live. In this case, all the app does is change your voice to match the template’s, giving you full control over the performance of the text. If you have some acting experience or natural talent here, be sure to give it a try.
You can train an AI model with your own voice or the voices of others, so you can play an entire cast of characters using your keyboard. As this could make deepfakes easier to produce, Respeecher runs a security check to verify who you are, and using this feature also raises the monthly subscription price sharply.
I tried multiple voices with the same text, and there’s a more creative vibe here when compared with others on this list. This enunciation and voice style is a good match for cartoons and quirkier projects. This doesn’t mean it’s off-limits for serious business use, but it could turn off people looking for a more professional-sounding avatar. A drawback or an opportunity to differentiate from your competition? Up to you to judge.
Respeecher price: From $4/month
Best AI voice generator for narration style variety
Altered (Web, Desktop)
Listen to the result: Altered example output
Narration style acts as a general pitch and rhythm change that gives the generated speech a unique feel. The app that has the widest range of options here is Altered. Beyond style, the platform has more possibilities than others on this list, so it’ll take you a bit longer to get familiar with all the corners. Let’s take a walk through everything you can do here.
Real-time morphing enables the Altered Virtual Microphone, changing your original voice to that of an AI avatar in real time. A fun thing to do when you’re 14 and chatting online with your gamer friends, but business-oriented grown-ups can use it to record this voice directly into another audio editing app, streamlining the workflow.
Post-production morphing is a fancy name for audio-to-audio generation. Add a recording of a text, choose the target voice, and hit generate. Download the results, and plug them into your project.
Rapid voice creation lets you add clean 4- to 8-second clips of a voice to the platform, so you can clone it and use it for generation. (Terms and conditions apply.)
Text-to-speech opens the expected editor to input your script and select your voice. Narration styles depend on the voice you choose, so click through each to see the main differences. The possibilities here range from “Just Below Neutral” for consistency to “Positive, Shout” for emphasis and energy. Mind that, depending on the script and the tone you choose, the results may be inconsistent, strange, funny, or all of the above.
Finally, Altered also packs an Audio Editor with a good number of controls. You can upload your audio (any kind of audio) and access transcription, speech generation, or noise removal, among many other possibilities. The learning curve is a bit steep here, as this screen has a real audio editor vibe: be sure to open the docs and use them as a companion.
Altered price: Limited free plan available; paid plans from $6/month
Best AI voice generator for emphasis control
Murf (Web)
Listen to the result: Murf AI example output
Try this simple beginner acting exercise: pick a sentence from this article, and read it out loud. Then repeat it emphasizing a different word each time. As you do so, notice how the meaning and feel of the whole sentence changes. Murf lets you do this for your AI-generated voices.
The emphasis control button is easy to miss. When working on a project, start adding text to the first block. As you do so, take a look at the icon to the left of the play button (it looks like a comment icon) and click on it. A pop-up appears with a sequence of all the words in that block, with a high-medium-low scale: click anywhere to add a point. Where you click matters, so experiment with placing points at different positions along both the horizontal and vertical axes.
Beyond these controls, you can adjust general speed and pitch, add pauses, or add custom pronunciation. If you choose the Ken voice, you’ll also have access to the widest range of narrative styles, a total of nine, from Storytelling to Sad. I tried the Sobbing setting expecting a bad result but was surprised by the subtle acting. Good one, Ken.
When you look to the bottom of the screen, you can expand the timeline to reveal more features. You can add video and music directly into the platform to produce content and export it directly from Murf AI, ready to share. As you move your content strategy forward, you can invite your teammates and collaborate on voice generation projects: anyone can leave comments on each script block, so you can keep tweaking until you reach the best result possible.
One last word of advice: the voices on the paid plan sound much better than the ones on the free tier. If you’re serious about voice generation and like Murf AI’s controls, consider investing sooner.
Murf price: Free for 10 minutes of voice generation and 2 projects; paid plans start at $23/month (billed annually) or $29/month (billed monthly)
Does OpenAI have an AI voice generation model?
Yes, the creators of ChatGPT are in the game. The only way to use OpenAI’s text-to-speech model is via the API, which takes a bit of technical know-how to set up.
They also have an AI voice cloning model that’s reportedly so powerful that it’s not available for general use. (Yikes.) There’s no estimate as to when a commercial version will pop up. Read more in the official blog post on the challenges and opportunities of synthetic voices.
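For a sense of what the API route involves, here is a minimal Python sketch that prepares a request to OpenAI’s text-to-speech endpoint using only the standard library. The helper names `build_tts_request` and `synthesize`, the output filename, and the sample text are illustrative assumptions. The model name `tts-1` and the voice `alloy` are options that exist at the time of writing, but the available models and voices may change, so treat them as placeholders and check the current API reference.

```python
import json
import os
import urllib.request

# OpenAI's text-to-speech endpoint.
API_URL = "https://api.openai.com/v1/audio/speech"


def build_tts_request(
    text: str,
    api_key: str,
    voice: str = "alloy",  # assumed voice name; others may be available
    model: str = "tts-1",  # assumed model name; check the current docs
) -> urllib.request.Request:
    """Prepare (but do not send) a text-to-speech request."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech.mp3") -> None:
    """Send the request and save the returned audio bytes to out_path."""
    api_key = os.environ["OPENAI_API_KEY"]  # expects your own key in the environment
    request = build_tts_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())


# Example call (requires a valid key and network access):
# synthesize("Now you can speak using just your keyboard.")
```

Splitting request building from sending keeps the sketch easy to inspect without spending API credits: you can check the payload first and only call `synthesize` once it looks right.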
Are AI-generated voices legal?
All the platforms on this list offer a collection of voices that were created by fine-tuning the training data or modeling a real person’s voice with their consent. Using these voices is legal, provided you remain within the service and licensing terms of the app you’re using.
The main problem lies with AI voice cloning. With just a few samples of a real person’s voice, anyone could tune an AI model to talk like anyone—including famous people. And including you. Creating and using these deepfakes can lead to identity theft, manipulation, misinformation, blackmail, or infringement of copyright laws (when talking about artists and their work).
Depending on where you are in the world, there may be legislation to control these kinds of uses, meaning there are legal consequences if consent isn’t secured or if the voice is used with criminal intent—or in a way that can be interpreted as such. If you’re cloning someone else’s voice and using it to generate with AI, always secure their (preferably written) consent before using the outputs.
Speaking without a mouth
With an AI voice generator, you can turn scripts into a flowing narrative, ready to add as a voice-over on a video, without dozens of takes and without hiring a production team.
All the platforms on this list offer ways to try out the features and voices, so pick one of your scripts and run your tests. It’s also important to find one that has controls that make sense to you, so take some time to feel how each one works. Now that you can speak using just your keyboard, what will you create next?