I haven’t used speech recognition software since 2001. Back then, I was working in a hospital and we were experimenting with Dragon NaturallySpeaking (now owned by Nuance) as a way to streamline the medical transcription process. The software was interesting, but far too primitive to save time for anyone who typed with more than two fingers. When I was offered a review copy of MacSpeech Dictate, I was intrigued. How much of a difference would eight years of development and processor speed bumps make in speech recognition?
Evidently, thanks to Moore’s Law, eight years makes a huge difference. I have honestly been blown away by how accurate the software is. In fact, after only 30 minutes of use, 99% of the errors I encountered came from skipping the manual and simply trying to guess what the “control” words were. By control words, I mean the commands that tell the software to capitalize something, select your last sentence, move around the document and handle punctuation. For example, saying “quotation mark” makes the words “quotation mark” appear on your screen. With a little experimenting, I discovered that saying “open quote” and “close quote” gave the desired punctuation.
The actual errors (where I say one thing and it types something else) don’t seem any more frequent than the errors I make while typing normally. However, until you get used to the editing commands, it can take a lot longer to correct something by voice than using the keyboard commands. If you want to really get the most out of it, you need to spend some time learning the editing commands.
MacSpeech Dictate is actually pretty smart about understanding what you mean based on your pauses. For example, earlier in this review I was able to type the words “open quote” simply by saying them with a slight pause in between. Saying them close together produces the punctuation mark instead.
Using voice recognition software is quite a bit different from simply recording what you want to say and turning it over to an assistant to type up. The software can’t figure out where you want to end your sentences or put punctuation. A real-life person will have no trouble figuring out these types of things. Software isn’t quite as smart. It can’t understand the meaning behind your words. However, MacSpeech Dictate does a very good job of using the surrounding words as context.
MacSpeech Dictate attempts to figure out what word you want based on the words that come before and after. This seems to work surprisingly well. The software makes it very easy to do “phrase training,” where you teach it not only how you pronounce a particular word, but the context in which that word is used.
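To make the idea of choosing a word from its neighbors a bit more concrete, here is a toy Python sketch. It is only an illustration of the general principle, not how MacSpeech Dictate or the Nuance engine actually works. The tiny sample text, the `pick_word` function and the simple pair-counting approach are all assumptions made up for this example.

```python
from collections import Counter

# Toy training text (hypothetical). A real recognizer would learn
# from a vastly larger body of text and from audio as well.
CORPUS = (
    "please write the report today . "
    "turn right at the corner . "
    "you are right about that . "
    "write a letter to the editor . "
    "the right answer is here ."
).split()

# Count how often each pair of adjacent words appears in the sample text.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def pick_word(previous, candidates, following):
    """Return the sound-alike candidate that best fits between its neighbors.

    The score is simply how often the candidate followed `previous`
    plus how often it preceded `following` in the sample text.
    """
    def score(word):
        return BIGRAMS[(previous, word)] + BIGRAMS[(word, following)]

    return max(candidates, key=score)


# "right" and "write" sound the same; the neighbors decide which is typed.
print(pick_word("please", ["right", "write"], "the"))
print(pick_word("turn", ["right", "write"], "at"))
```

Phrase training, in this simplified picture, would amount to adding your own sentences to the sample text so that the word pairs you actually use get counted.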
I was impressed that the software came with its own USB mic. I have a Plantronics microphone that seems to work just fine with the system. You will need a quality microphone to get good results. You need something that isn’t going to pick up all the ambient noise in the room. Even with a good microphone I noticed quite a drop in accuracy after switching on a small (but noisy) fan.
If you don’t like the included microphone, I would still suggest getting some sort of headset mic. You want the pickup to be near your mouth to get the most accurate transcription. MacSpeech sells an interesting-looking Bluetooth microphone on their website. You can use it as a handheld or plug in a cell-phone-style microphone and earpiece.
What I found to be most interesting about dictating instead of typing is how you must think differently. I am so much in the habit of typing out what I’m thinking that trying to say it seems unnatural and requires a great deal of thought. Part of this might be due to the fact that I’m more aware of how my sentences sound when speaking and watching them appear on the screen. This is probably a good thing and may make my writing a bit more natural. I would also assume that some of my mental slowness just comes from the awkwardness of using a new tool for the first time.
The above paragraph was written when I first started using Dictate. After using it a bit longer, I have a different theory. I don’t think that voice recognition is making me think more slowly. I think that I normally think slowly because I have to wait to type what I’m thinking. I type pretty fast–around 60 to 70 wpm, so I have always thought that I was typing about as fast as I could think up what I wanted to write. Now I’m not so sure. I think it feels slower to use Dictate because I notice how often the computer is waiting on me to decide what I want to say next. When I’m typing I feel like I’m thinking quickly because the bottleneck is my ability to type. When I’m using Dictate, I feel like I’m thinking slowly because the bottleneck is my ability to think of the next sentence.
On the downside, looking over what I’ve written so far, I think I tend to be much more wordy when speaking than typing. Who knows? Using this for a while may make my speech more succinct.
The part of your brain that is active when typing on a computer is different from the part that is active when writing with a pen. I would guess that writing by voice uses still different areas. This might not make a huge difference, but it might help you if you ever get writer’s block and want to switch to a different method just to change things up a bit.
MacSpeech seems to work pretty much anywhere you can put your cursor. It ties into Apple’s assistive technology framework, so it is pretty well integrated into the operating system. This means that in addition to using MacSpeech Dictate’s Notepad program, you can type directly into WordPress to post on your blog, iChat for instant messaging or even try reading numbers into Excel. (Although I’m not sure I’d recommend that.)
Dictate comes with a word processor that they recommend for doing dictation. (Dictate also works with other programs like Word, Mail, web pages, etc.) The word processor seems to slow down a lot if you leave it open too long. Shutting it down and starting it back up seems to solve the problem. I also had trouble opening items in the Open Recent list. For some reason, they wouldn’t launch no matter how many times I clicked on them. I was able to open documents using the Open command, so I’m not sure what was happening there.
I was surprised that I couldn’t find a way to import an audio file. There doesn’t appear to be a way to record on a portable device and import the recording for transcription later. You might be able to play it back in real time through your microphone port, but it seems like that would be less accurate. Previous versions of iListen (the predecessor to Dictate) had an import capability, so it seems odd that it isn’t in Dictate.
I once ran into an issue where Dictate wanted to add a capital letter A or S at the end of the line whenever I said PERIOD. This happened when I was typing into a wiki, and I couldn’t get it to happen again later. I’m not sure if this was an issue with Dictate or something funny happening with my operating system. Regardless, it was only a minor inconvenience and went away the next time I tried it.
Tips for speech recognition
- Make sure you have a good quality microphone. Speech recognition does not work very well if the computer can’t hear what you are saying.
- Reduce the ambient noise by closing your office door or turning off noisy equipment. In particular, you don’t want to have a bunch of people talking while you are trying to do dictation.
- Read the instructions. In particular, make sure you understand the control words. If you have to jump back and forth to the keyboard in order to create punctuation, edit or navigate, it will slow you down drastically. You also need to make sure you understand how to “train” the system so you can quickly correct anything it doesn’t understand out of the box.
- Give it some time. Speech recognition will take a little while to get used to. This isn’t so much because of the technology as because it requires you to think in a different way than when you are typing.
- Proof your work. If you are talking away and not watching the screen, little things can slip through. Of course, if you are typing anything important you should be proofing that as well. The biggest mistake I ran into was the computer hearing “can” when I had said “can’t”.
- While I think it would be terrible to use speech recognition software as an excuse for not learning to type, it is a great option for people who do not have the physical ability to type.
- Some people are using MacSpeech Dictate to transcribe written diaries into text. Optical character recognition software can’t read most handwriting — at least not very accurately. A human reading handwritten text into good speech recognition software can be pretty accurate and efficient.
- People with repetitive stress injury, Parkinson’s disease and arthritis are using speech recognition software to reduce the amount of time they have to spend with their hands on a keyboard.
Pricing, Versions and Mics
MacSpeech sells their standard version of Dictate for $199. Nuance also sells it along with the Windows-based Dragon software. It is available on Amazon for about $50 less. The package includes a microphone with an adapter to plug into the USB port. For testing, I used a Plantronics MX-500i microphone because I prefer the smaller size.
MacSpeech also sells legal and medical versions at an additional cost.
If you use a PC, you might want to check out Dragon NaturallySpeaking from Nuance. It is a different program, but it shares the same speech recognition engine that MacSpeech Dictate uses. Nuance is the company that owns the recognition engine and licenses it to MacSpeech. The Dragon NaturallySpeaking software has some additional features. Depending on what version you have, it will do things like:
- Save an audio copy of what was said so you can go back if there are any questions.
- Automatically transcribe an audio file once it gets placed in a specified folder.
- Use Text to Speech to read things back to you. (OS X has this feature built in to the operating system.)
- Work with a handheld digital recorder.
- Add natural punctuation. (This will automatically put in periods and commas.)
Dictate is amazing. I am very impressed with how well it transcribes speech with very little training. If you type fast, tried voice recognition 5 or 6 years ago and gave up because it didn’t save you any time, you may find that the improved accuracy makes it worthwhile (especially if you’ve developed carpal tunnel over the last 5 years). If you type slowly, the software could pay for itself very quickly.