Full disclosure: I'm a stenographic CART provider, so I'm in direct
competition with this sort of service, which sells a "voice captioning system" consisting of voice recognition software, a computer, a voicemask microphone, and the promise that it will take only a few hours of training for an employee of the college to achieve realtime speed and accuracy.
That said, I really recommend you get a live demonstration of this
technology before you buy it. I work for a university whose disability
accommodations department bought a system very like this one; I don't
know whether or not it was exactly the same. I do know that it was a
voice captioning system, and that the disabilities coordinator hired ASL
interpreters to train with it and use it. (The student used hearing
aids and her first language was English, but she understood ASL, and
the school had been offering her interpreters because they had been
unable to find a CART provider. She had gone through undergraduate
school using remote stenographic CART, but told me that she preferred
onsite CART to both remote CART and onsite ASL interpretation.) It was
a disaster for the student. The transcripts were apparently
unreadable, and the captioning system (which, I got the impression,
nearly broke the disability accommodations department's budget for the
semester) is currently moldering in a cupboard. I began providing
CART there immediately afterwards, and this fall I'll be starting my
fourth year with them.
UPDATE, JUNE 2010:
I recently asked the school's disability services director her
specific reasons for switching from the voice captioning
system to CART, and she emailed me the following:
"1) The Captioning training software was difficult to access and use, and the training itself proved difficult and laborious.
2) A prospective captioner would have to put in literally hundreds of hours of training just for the system to accurately recognize their voice.
3) The process proved to be disruptive to other students in the classroom.
4) One captioner felt that the microphone setup was claustrophobic.
5) I was having sign language interpreters trained on the captioning system. At first the interpreters were eager to learn a new technology, but as the training droned on and on, and when they used it in the classroom and it was glaringly apparent that students didn't derive the same level of cogent, expressive interpreting they'd had with interpreters, the interpreters felt they were doing the students a grave injustice. They all hated it and felt either they should be left to do real sign language interpreting or that CART would be a much better alternative.
6) The students felt the resulting transcript contained too many errors and the lag time was too long.
7) The end result is that I have a $5000 Captioning System just
sitting in my storage room collecting dust."
-- Mai McDonald, Pratt Institute Disability Services. Posted with permission.
Computerized transcription is a fantastic technology, and it's particularly
useful for those who can speak easily but find typing difficult. When
someone is composing off the top of their head, they generally tend to
speak at a slower rate than their ordinary conversational speed, and
if they see that the program has made a mistake, they can easily stop
and correct it. It's a very different situation when it comes to
realtime transcription of someone else's speech, particularly in a
university environment. It is possible to produce an accurate
transcript using only a microphone and a computer with sufficiently
advanced software and processing power, but it is by no means a simple
matter.
A distinction should be made here between speech
recognition and voice recognition. Speech recognition is completely
automated. A computer picks up natural live speech from any speaker through a
microphone and instantly translates it to text without any human
intermediary. In practice, speech recognition is unsuited for the
task of live transcription. It can be effective in distinguishing among
a small number of options or commands (such as the automated voice menus
that have replaced "press one for yes or two for no" in some commercial
telephone systems), but it is unable to distinguish between speakers
or insert punctuation, and its accuracy is wildly variable,
ranging from not very good to abysmal.
Voice writing, on the other hand, employs voice recognition, requiring
a period of training in which the computer learns the
individual patterns of a speaker's voice, while the speaker learns
how to make their speech consistent enough to be reliably
transcribed by the computer. There are several excellent voicewriters
working today who are able to provide verbatim transcription, but
they've put in thousands of hours training their voice, their
software, and their transcription theory (coming up with different
ways to pronounce homophones, for instance, since current
technology is very far from solving the "their/they're/there"
problem). It's probably more difficult to find a truly verbatim
voicewriter than it is to find a truly verbatim stenographic CART provider.
They're quite rare, and they charge prices equivalent to those of stenographic CART providers.
Unfortunately, a majority of voicewriting services currently serving
universities position themselves as economical alternatives to CART, and
they generally save that money by hiring voicewriters with insufficient
training to deliver an accurate verbatim transcript.
This can seem counterintuitive. Most people, after all,
find speaking easier than typing, and so they assume that voice
writing should naturally be easier than realtime stenography. The
trouble is that, in ordinary conversation, the translation engine
is the human brain, which is vastly better than a computer is at compensating
for inconsistent pronunciation, muffled speech, accents and dialects,
homonyms, rate variation, and unfamiliar vocabulary. By drawing on
context, outside knowledge, speech-reading, and memories of previous
conversations, humans are able to compensate for an extraordinary number
of gaps and errors which computers simply cannot parse. In order to transcribe
speech accurately, a computer needs input that is even, regular, perfectly
articulated, cleanly delineated, and without potential duplicates
or ambiguities in pronunciation.
I sometimes like to say that voicewriters are to stenotypists as
beatboxers are to drummers. Without training, it's easier to say
"paradiddle" once than it is to beat one on a drum set, but try saying "paradiddle
pataflafla flam dragadiddle ratamacue" ten times fast without tripping
over your tongue. It usually takes a drummer only a few years
of solid practice to play at a competent level. Good beatboxers,
on the other hand, have to be fiendishly talented and practice fiendishly
hard, because the human voice is not easily able to imitate the
complex percussive rhythms which come naturally to drummers.
The human hand is generally a more accurate instrument than the human
voice for swift, repetitive motions, which are precisely
the sorts of signals a computer needs to receive in order to deliver
consistent output. This is why, despite the wide availability of
voice recognition software and the relative scarcity and cost of stenographic
technology, the majority of realtime court reporters and CART providers
working today are stenographers. Realtime verbatim voice writing is
tremendously difficult to achieve; consequently, it's rare to find
a voice writer capable of providing the level of service that
qualified CART providers take for granted.
Putting all that aside, even assuming the voice captioning service
provides highly trained, highly accurate voice writers, another
big disadvantage of a voicewriter versus a CART provider is that sound
inevitably bleeds through the mask they use to cover their computer's microphone.
Though the mask muffles the sound to some degree, it can still be distracting
in an academic environment. Some services propose to solve this
by offering remote captioning. Remote captioning,
whether provided by stenographers or voicewriters, is a
good solution when no one can be found to provide onsite services, but
it has several drawbacks.
For one, remote captioners can't read terminology
written on PowerPoint slides or speak up to clarify unclear phrases at the
student's request, and they can't voice for students who don't
use speech but who want to contribute to class discussions. I'm
currently doing some work for a university that hired a remote
stenographic CART provider to caption some highly technical classes
for a pharmacy student. One of the professors had a thick accent,
spoke very quietly, used complex biochemical terminology,
and taught in a classroom with thick concrete walls. I
read some of the transcripts, and the captioner was
clearly quite skillful, but every third word was (inaudible); the Skype
reception kept cutting out, the professor's accent made
his speech very hard to catch without watching his mouth, and none of
the extensive information displayed on the screen or in paper handouts
was available for the remote captioner to use. Three weeks
into the semester, I was called in to provide CART onsite,
and the following semester the student requested me
for all of his lecture classes.
Another disadvantage of remote captioning, whether
stenographic or voice-based, is that only the person wearing the
microphone will be transcribed. This means that questions from
students, unless the professor repeats them before answering,
will be marked as inaudible, and the student might feel left out of
the discussion. I'm certainly not opposed to working remotely on
principle. I own remote captioning software, and I'm more than willing
to provide it in cases where an onsite CART provider
is not available; but, by and large, onsite CART is more likely to
ensure that realtime output is accurate and complete. Sometimes services
claim that remote captioning is preferable for students who might not
like their classmates to know that they're receiving realtime captioning,
and who might feel awkward with a captioner sitting next to them.
In that case, a good solution is to find a captioner or CART provider
with two computers that transmit text wirelessly back and forth.
In situations where my clients have told me that they'd prefer to sit
on their own rather than reading from my laptop, I set up my equipment
in the back of the room, give them a computer connected via Bluetooth,
and let them sit wherever they like.
I know CART providers are hard to find and sometimes can seem prohibitively
expensive, but very often speech-to-text companies claim a lot more
than they're able to deliver, and the shortfall can severely affect the quality
of a student's access. It's worth being a little cautious
before laying out a lot of cash on something that might not be as good
as it sounds, which is why I advise prospective clients to ask for a
demonstration of various technologies before settling on a provider.
I'm always happy to provide demonstrations of CART on request, free of charge.