Transcription and Voice Recognition Tools
This is a comparative review of transcription and voice recognition tools including speech-to-text (voice recognition using Google and Dragon), paid transcription services, and slick technologies to make the DIY transcribing process more efficient. I write memoirs and do family history work for clients, so I have spent a ridiculous amount of time experimenting with methods to transcribe interviews and audio. It is an important but tedious process. The best answer for me varies depending on the particulars of each project. In short, I’ll show you how to do dictation and transcription using the most cost-effective and efficient tools.
As a note, I recently had six hours of audio from a day a client spent telling stories. It would have taken me forever to transcribe it all myself, so I clipped the audio into half-hour chunks and sent them out to various services, while transcribing some of the work myself using different methods. This got the job done, and also gave a good side-by-side comparison of the costs and time involved with each approach.
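For readers comfortable with a little scripting, the half-hour clipping can be planned with a few lines of Python. This is only a sketch: it works out where each chunk would start and end and does not cut any audio. The function names `chunk_boundaries` and `fmt` are made up for this illustration, and the actual clipping would still happen in whatever audio editor you prefer.

```python
def chunk_boundaries(total_seconds, chunk_seconds=30 * 60):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds must be > 0")
    bounds = []
    start = 0
    while start < total_seconds:
        # The last chunk may be shorter than the others.
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def fmt(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Six hours of audio, as in the project described above.
for number, (start, end) in enumerate(chunk_boundaries(6 * 3600), start=1):
    print(f"chunk {number:02d}: {fmt(start)} to {fmt(end)}")
```

With six hours of audio and half-hour chunks, this lists twelve segments, which is a handy checklist when sending files out to different services.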
Google Speech-to-Text, a free online tool
If you’re not a great typist, speech-to-text technologies can be wonderful, and Google launched its free service to much fanfare. Its powerful voice recognition does not, however, allow you to upload a file to be transcribed. Some people suggest that you can play audio next to your computer microphone and have it transcribe, but when I tried, the result was garbage.
There is a workaround method. You can listen to an audio file and speak the words aloud. I have found that it is most efficient to listen to the audio on my Android phone with headphones using a free app called “Easy record Rewind Transcription” (see brief review below). This app is handy because common music players do not have good pause and rewind buttons. Another important feature for transcribing is auto-rewind a second or two after hitting pause.
These are the steps for using the workaround dictation method:
- Open a Google Doc using Chrome as your browser. Other browsers won’t work.
- Make sure your microphone is on and functioning. Side note: This step is a bit buggy on my Mac, and I often have to monkey with the settings until it will read my external microphone.
- Click Tools on the navigation bar, select “voice typing.” Then click the large microphone icon that pops up.
- Listen to the audio file using your phone or other device with headphones on. Without headphones, Google would hear your warm voice plus the audio playing in the background. Messy!
- Then start speaking what you hear.
Here is a video that shows me actually doing the listen/dictate process using Google speech to text. My body is not seen in the frame because I am sitting in the chair facing the computer, but I am holding my phone and speaking into my desktop computer microphone. You can’t hear the audio because I am listening with earphones (otherwise two voices would confuse the program). You can hear my voice saying the words I hear, and onscreen Google is doing a reasonable job of taking dictation.
It does a decent job–not as accurate or fast as Dragon–but hey, it’s free. Also, you shouldn’t need a powerhouse computer. This method takes me about the same amount of time as typing a file using oTranscribe, or 1 hour for 30 minutes of audio. (My typing test speed is 85 WPM).
Don’t forget: you have to use Chrome as your browser.
Video of me dictating an audio file to Google.
I use Dragon NaturallySpeaking voice recognition software a lot (actually Dragon for Mac, but I have used the PC version too). Dragon was a game changer for my workflow, and I did not expect that. The cost as of writing this article is $60 for the home version on the Dragon site, about $40 on Amazon, $267 for the professional version on Amazon, and $300 for the Mac professional version. I have not used the home edition, so I cannot say why it costs less.
There are four ways I have used Dragon software.
Method #1 Train it to your own voice:
Dictate your own story, emails or other documents using your voice. This is the software’s real strength since it is set up for you to “train the Dragon.” You read stories and it gets smarter by adapting to your own speech patterns and accent. Another amazing feature is running documents you have written and sent emails through it. This teaches the software phrases and acronyms you commonly use. Once I took the time to configure Dragon and learn the voice commands (“Go to sleep” or “Scratch that”), I have found Dragon Dictate to be very accurate in dictating my speech faster than I can type (85 words-per-minute type test speed). It becomes more efficient if you combine real-time keyboard and mouse along with voice commands. I did not believe I would like it as much as I do, but now I use Dragon to dictate emails and other documents.
One note is that I like using a headset or my podcast-grade Blue Yeti microphone that sits closer to my mouth. Although the internal mic on my Mac is pretty good, it still strains my voice after a while if I try to speak loud enough.
There is some learning curve in setup and becoming good at dictation through use of commands, but the payoff for me has been real.
Method #2 Dictate on the fly:
The second way to use Dragon is to dictate into a digital recorder or the app. Then you can upload files to be processed by the software. The principle is the same as real-time dictation except there is no ability to make corrections and combine keystrokes. This means it is not as accurate, but portability is essential sometimes.
Method #3 – Process a file in someone else’s voice
It is possible to create a profile for different speakers and to upload an audio file into the software for processing. This method is a lot less accurate and there will be no punctuation. Because of these limitations, if I need a clean transcription, it takes more time to clean up Dragon’s work than to just transcribe the audio in the first place, especially if the interviewee has an accent or the audio sounds far away.
However, if I am doing a large number of interviews and only need the basic gist, I run all the audio files through Dragon as I go. This gives me enough reference to be of use later. (I have found that if I spend just a few minutes editing major words, it improves the ability to search keywords later.) So when I am working on writing a full life history or memoir, with many details from interviews I may want to revisit later, Dragon’s rough-cut accomplishes that. After I return from an interview, I run the audio file through Dragon, usually starting before I go to bed since it takes a while to process. In the morning, I paste the new transcription into a master Word document. Later, when I am in the thick of writing, I can search by keyword and find related conversations and refresh my memory on details. Here are Dragon’s settings:
Method #4 – Simultaneously Listen and Dictate:
If I want to do a more accurate transcription of another speaker, the same workaround described with Google speech-to-text applies. I listen to the audio (I use “Easy record Rewind Transcription” to listen on my Android phone) and speak the words aloud, like doing simultaneous interpretation. Using this method, it took me 45 minutes to dictate a 30-minute segment, plus a little time for uploading and saving files. For comparison, it takes me an hour to type a 30-minute segment using oTranscribe (reviewed below). As one might expect, the result when dictating with my trained Dragon was a little more accurate and faster than with Google speech-to-text. For example, Dragon did a much better job recognizing punctuation commands in my voice.
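To see what those timings mean over a whole project, here is a rough Python sketch of the arithmetic. It assumes the figures quoted above (45 minutes of dictating or 60 minutes of typing per 30 minutes of audio) hold steady across a longer recording, which will not always be true. The names `real_time_factor` and `projected_hours` are invented for this example.

```python
AUDIO_MINUTES = 30  # length of the test segment

# Minutes of work needed for the 30-minute segment, per the timings above.
work_minutes = {
    "Dragon listen-and-dictate": 45,
    "oTranscribe typing": 60,
}


def real_time_factor(work, audio):
    """Minutes of work needed per minute of audio."""
    return work / audio


def projected_hours(work, audio, total_audio_hours):
    """Scale the measured pace up to a longer recording."""
    return real_time_factor(work, audio) * total_audio_hours


# Project each method onto a six-hour recording.
for method, minutes in work_minutes.items():
    factor = real_time_factor(minutes, AUDIO_MINUTES)
    hours = projected_hours(minutes, AUDIO_MINUTES, 6)
    print(f"{method}: {factor:.1f}x real time, about {hours:.1f} hours for 6 hours of audio")
```

At these rates, six hours of audio works out to roughly 9 hours of dictating versus 12 hours of typing, before any cleanup or file handling.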
Warning! Dragon is a Resource Hog.
Speech recognition is powerful software, which means it needs resources to run. I learned this the hard way on a four-year-old PC at work and a Mac of the same age at home. Installing Dragon ground both machines to a halt. Not only would the software not function properly, but it gobbled up so much capacity that it hosed my whole computer. I ended up rebuilding my Mac so I could function again, minus Dragon, and upgrading my machine at my day job. Recently I bought a powerful new Mac desktop for home and sprang for the latest version of Dragon. Now it runs like a dream and I love it. The software upgrade was enough of an improvement on the prior version to be worth the money. They seem to come out with new versions of Dragon about every year, and because the field of speech recognition is still developing, each upgrade begs installation. It can be frustrating, though, to keep shelling out money. For these reasons, occasional users may want to stick with free Google voice-to-text.
Lesson: If you don’t have a fast machine, user beware.
For listening to audio files for transcribing, I haven’t found anything better than the “Easy record Rewind Transcription” app for my Android phone. (Note: this app does not actually transcribe; I use it for listening only.) Unfortunately, I cannot find an equivalent for the iPhone or iPad, so if someone knows of one, please comment on this article.
Seriously, it’s a huge pain to transcribe a file listening to iTunes because the controls are not designed for pause and rewind and I’m constantly losing my place or wasting time rewinding too far. The primary strength of this app is the way it automatically backs up a couple seconds when you hit pause (customizable).
The way I use this app is to go into the Dropbox app installed on my phone and select the audio file I want to transcribe. Then I click “open with” and select EasyRecordTranscription.
It does freeze on occasion, requiring a restart and loss of where I was in the file. But overall it has made my work more efficient.
You can download the app from the Google Play Store here.
In the olden days when I worked at a law firm, I typed dictation using a foot pedal dictation machine and headset. The machine used little tapes, and the foot pedal setup was very efficient. These units can still be purchased for transcribing digital files.
Along those lines, oTranscribe is a free online app that has keyboard shortcuts instead of using a foot pedal. You can listen to a digital file and type right in the program online. When I am done, I paste the text into Microsoft Word. Also, oTranscribe does a good job saving the file as you go. When I close the browser and return later, the file is still uploaded and I can begin where I left off (I didn’t expect that). It does occur to me that these files are hanging out there on the Internet somewhere, so when I am processing a confidential or sensitive file, I work offline using Dragon in Microsoft Word instead.
I hope I haven’t confused you about when I use oTranscribe and when I rely on the Android app to listen, so here is a note for clarification. You cannot use Google speech-to-text or Dragon Dictate together with oTranscribe online. Google speech-to-text must pair with a Google Doc, and for Dragon I dictate into Microsoft Word. (You can use Dragon’s own text editor too). Hence, I open oTranscribe solely for the straight-up process of listening to and typing up a file, no voice dictation or speech recognition involved.
Bottom line, oTranscribe has been a great productivity boost when I want to physically type up an audio file, such as when my voice is tired and I need a break.
For the sake of comparison, using oTranscribe takes me an hour and five minutes to transcribe a 30-minute audio segment (my typing test speed is 85 WPM). Here is a screen shot of a transcription I did a while back:
TranscribeMe is a big professional service online for transcribing audio. They have a “first pass” service that I understand uses a combination of machine and human transcription to save money. The cost for this service is $0.79 per audio minute. I sent in four audio files that were each about 30 minutes in length, and was surprised at the quality returned–not perfect, but really quite good. I submitted the files at about 1 p.m. on a Saturday and all four were returned by 8 a.m. on Monday. Bottom line: if the audio quality is decent and you don’t need 100% perfect transcription, I was pleased with the mix of quality for the price. I would definitely use this service again when I don’t have time to transcribe something myself. Here is a link to their website.
Rev is another big transcription company online, with straightforward pricing at $1 per audio minute. They use humans to transcribe, and of all the methods reviewed in this article, this service provided the most accurate, cleanest final result. I was very, very pleased, and if I ever need to get a transcription close to perfect (and don’t have time to do it myself), I would use their service again. I submitted a file at 1:00 p.m. on a Saturday and it was returned to me at 5:00 p.m. the same day, even though I did not request expedited service. That particular file was 30 minutes in length, so the cost was $30. The word count was 5,090. From experience, I know that it usually takes me about an hour to transcribe 30 minutes of audio, but my version would not have been as polished as what they returned, so it really would have taken me longer to get an equivalent result.
Here is a link to Rev’s website.
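For a quick side-by-side of the two paid services, the per-minute prices quoted above ($0.79 for TranscribeMe’s first pass and $1 for Rev) can be plugged into a short Python sketch. Prices change, so treat the rates as the figures from this review rather than current quotes. The function name `cost_dollars` is made up for this example.

```python
# Per-minute rates in cents, as quoted in this review.
RATES_CENTS_PER_MINUTE = {
    "TranscribeMe first pass": 79,
    "Rev": 100,
}


def cost_dollars(rate_cents, audio_minutes):
    """Total cost in dollars for a given number of audio minutes."""
    return rate_cents * audio_minutes / 100


for service, rate in RATES_CENTS_PER_MINUTE.items():
    per_file = cost_dollars(rate, 30)        # one half-hour chunk
    whole_job = cost_dollars(rate, 6 * 60)   # the full six hours of audio
    print(f"{service}: ${per_file:.2f} per 30-minute file, ${whole_job:.2f} for 6 hours")

# The 30-minute Rev file came back at 5,090 words.
words_per_audio_minute = 5090 / 30
print(f"About {words_per_audio_minute:.0f} words per audio minute")
```

For a half-hour file, that is $23.70 versus $30.00, and the gap grows to $284.40 versus $360.00 across six hours of audio.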
Keep in mind that if you would like to hand a file or even a digital recorder over to us, we would be happy to handle the files and transcribe your audio in a professional and confidential manner. We charge $1 per audio minute plus a file handling fee if we are pulling files from a device. Note that our service is quite accurate but we do not guarantee 100% perfect because it may be impossible to decipher every word. We also filter out “um” and filler words. In other words, we do not specialize in court-reporting perfection but you will get a nice document in a format suitable for family history purposes. Click here to request this service.
This online service uses machines to transcribe audio. I haven’t used it yet, so I invite comments by anyone who has. Here is a link to their website.
This handy service allows you to tape any call on an iPhone or Android. There is a per-minute option, or current pricing as of 2017 is $7.99 per year for unlimited recording. I haven’t used it yet, but know I will in the future, and will write a review then. Click here for the TapeACall website.
Did you know that YouTube can transcribe the words that are spoken in a video? This was designed for closed captioning, and also for search engines to pick up your video content. One of the coolest features is to transcribe into another language. So if you are working with a video interview instead of audio, this might be the way to go. Alternatively, you can convert an audio file into a video format and then run it through YouTube, but online reviews suggest that the accuracy isn’t all that great. It seems like a lot of steps to convert audio into video, then upload to YouTube and run it through, especially if the results are mediocre. Many of the comments mention that they couldn’t get it to work. Still, I mention it as an option, especially if language translation is important, or if you are working with video already. If you want to give it a whirl, just do a search on YouTube and there are some good video tutorials showing how to get a transcription using YouTube. Here is a link to a 6 minute tutorial with over 100,000 views. https://youtu.be/iWNCPj5jTWM
Fiverr.com bills itself as “freelance services for the lean entrepreneur.” Services start at $5 apiece, and there are hundreds of individual vendors of transcription services. I haven’t used it, but it is certainly an option. https://www.fiverr.com
Amazon Mechanical Turk (Mturk):
Amazon offers a facilitation service that is a little like Fiverr but better suited to more involved projects. There are transcriptionists available in their network. This is another service that I have not used, and I saw mixed reviews online. One of the complaints is that it can be a bit unreliable. You really don’t know who you are going to get to do a job. College students might be earning some extra cash but can be notoriously flaky in completing freelance jobs. Here is what the official Amazon website says about the service at https://requester.mturk.com:
Mechanical Turk gives businesses and developers access to an on-demand, scalable workforce
- Flexibility: Scale your workforce up and down quickly
- Accuracy: Get high-quality, cost-effective results
- Speed: Start receiving results in minutes
The reality is there is so much more to transcription than meets the eye. Hopefully some of these resources have answered your questions. If you have other questions, or thoughts about tools you have used that work well, please let us know in the comments!
Rhonda Lauritzen is the founder and an author at Evalogue.Life, where we tell personal and family stories that inspire. (Let us help you tell yours!) Rhonda lives to hear and tell about people’s lives, especially the uncanny moments. She and her husband Milan restored an old Victorian in Ogden and work together in Evalogue.Life, weaving family and business together.