
Table 3 Summary of speech recognition (SR) review results

From: A systematic review of speech recognition technology in health care

Each entry lists: author, year, country, and study design; aim; setting, sample, and speech technology (ST); outcome measures; and results.

Al-Aynati and Chorneyko 2003 [18] (Canada; experimental)
Aim: To compare SR software with HT for generating pathology reports.
Setting: Surgical pathology. Sample: 206 pathology reports.
ST: IBM ViaVoice Pro version 8 with pathology vocabulary dictionary.
Outcome measures: 1. Accuracy rate. 2. Recognition/transcription errors.
Results:
Accuracy rate (mean %): SR 93.6; HT 99.6.
Mean recognition errors: SR 6.7; HT 0.4.

Mohr et al. 2003 [22] (USA; experimental)
Aim: To compare SR software with HT for clinical notes.
Setting: Endocrinology and psychiatry. Sample: 2,354 reports.
ST: Linguistic Technology Systems LTI with clinical notes application.
Outcome measures: 1. Dictation/recording time + transcription time (minutes) = report turnaround time (RTT).
Results: RTT (mins):
Endocrinology: SR (recording + transcription) 23.7; HT (dictation + transcription) 25.4. SR 87.3% (CI 83.3, 92.3) as productive as HT.
Psychiatry, transcriptionists: SR (recording + transcription) 65.2; HT (dictation + transcription) 38.1. SR 63.3% (CI 54.0, 74.0) as productive as HT.
Psychiatry, secretaries: SR (recording + transcription) 36.5; HT (dictation + transcription) 30.5. SR 55.8% (CI 44.6, 68.0) as productive as HT.
Author, secretary, and type of notes were predictors of productivity (p < 0.05).

NSLHD 2012 [29] (Australia; experimental)
Aim: To compare accuracy and time between SR software and HT in producing emergency department reports.
Setting: Emergency department. Sample: 12 reports.
ST: Nuance Dragon Voice Recognition.
Outcome measures: 1. RTT.
Results: RTT mean (range) in minutes: SR 1.07 (46 sec, 1.32); HT 3.32 (2.45, 4.35). HT showed spelling and punctuation errors; SR showed occasional misplaced words.

Alapetite 2008 [30] (Denmark; non-experimental)
Aim: To evaluate the impact of background noise (sounds of alarms, aspiration, metal, people talking, scratch, silence, ventilators) and other factors affecting SR accuracy when used in operating rooms.
Setting: Simulation laboratory. Sample: 3,600 short anaesthesia commands.
ST: Philips SpeechMagic 5.1.529 SP3 and SpeechMagic InterActive, Danish language, with Danish medical dictation adapted by Max Manus.
Outcome measures: 1. Word recognition rate (WRR).
Results (WRR):
Microphone: headset 83.2%; handset 73.9%.
Recognition mode: command 81.6%; free text 77.1%.
Background noise: scratch 66.4%; silence 86.8%.
Gender: male 76.8%; female 80.3%.

Alapetite et al. 2009 [31] (Denmark; non-experimental)
Aim: To identify physicians' perceptions, attitudes, and expectations of SR technology.
Setting: Hospital (various clinical settings). Sample: 186 physicians.
Outcome measures: 1. Users' expectations and experience (predominant response noted).
Results:
Overall: Q1 expectation positive 44%; Q1 experience negative 46%.
Performance: Q8 expectation negative 64%; Q8 experience negative 77%.
Time: Q14 expectation negative 85%; Q14 experience negative 95%.
Social influence: Q6 expectation negative 54%; Q6 experience negative 59%.

Callaway et al. 2002 [20] (USA; non-experimental)
Aim: To compare off-the-shelf SR software with manual transcription services for radiology reports.
Setting: 3 military medical facilities. Sample: Facility 1, 2,042 reports; Facility 2, 26,600 reports; Facility 3, 5,109 reports.
ST: Dragon Medical Professional 4.0.
Outcome measures: 1. RTT (referred to as TAT). 2. Costs.
Results:
RTT: Facility 1 decreased from 15.7 hours (HT) to 4.7 hours (SR). Completed in <8 h: SR 25%; HT 6.8%. Facility 2 decreased from 89 hours (HT) to 19 hours (SR).
Cost: Facility 2 saved $42,000; Facility 3 saved $10,650.

Derman et al. 2010 [32] (Canada; non-experimental)
Aim: To compare SR with existing methods of data entry for the creation of electronic progress notes.
Setting: Mental health hospital. Sample: 12 mental health physicians.
ST: Details not provided.
Outcome measures: 1. Perceived usability. 2. Perceived time savings. 3. Perceived impact.
Results:
Usability: 50% preferred SR.
Time savings: no sig. diff. (p = 0.19).
Impact: quality of care, no sig. diff. (p = 0.086); documentation, no sig. diff. (p = 0.375); workflow, no sig. improvement (p = 0.59).

Devine et al. 2000 [33] (USA; non-experimental)
Aim: To compare 'out-of-box' performance of 3 continuous SR software packages for the generation of medical reports.
Sample: 12 physicians from Veterans Affairs facilities, New England.
ST: System 1 (S1), IBM ViaVoice98 with General Medicine Vocabulary; System 2 (S2), Dragon NaturallySpeaking Medical Suite, version 3.0; System 3 (S3), L&H Voice Xpress for Medicine, General Medicine Edition, version 1.2.
Outcome measures: 1. Recognition errors (mean error rate). 2. Dictation time. 3. Completion time. 4. Ranking. 5. Preference.
Results:
Recognition errors (mean %) by vocabulary: S1 7.0-9.1; S3 13.4-15.1; S2 14.1-15.2. S1 performed best with general English and medical abbreviations.
Dictation time: no sig. diff. (p < 0.336).
Completion time (mean): S2 12.2 min; S1 14.7 min; S3 16.1 min.
Ranking: 1st S1; 2nd S2; 3rd S3.

Irwin et al. 2007 [34] (USA; non-experimental)
Aim: To compare SR features and functionality of 4 dental software application systems.
Setting: Simulated dental environment. Sample: 4 participants (3 students, 1 faculty member).
ST: System 1 (S1), Microsoft SR with Dragon NaturallySpeaking; System 2 (S2), Microsoft SR; Systems 3 and 4 (S3, S4), default speech engine.
Outcome measures: 1. Training time. 2. Charting time. 3. Completion. 4. Ranking.
Results:
Training time: S1 11 min 8 sec; S2 9 min 1 sec (no data reported for S3 and S4).
Charting time: S1 5 min 20 sec; S2 9 min 13 sec (no data reported for S3 and S4).
Completion (%): S1 100; S2 93; S3 90; S4 82.
Ranking: 1st S1 (104/189); 2nd S2 (77/189).

Kanal et al. 2001 [35] (USA; non-experimental)
Aim: To determine the accuracy of continuous SR for transcribing radiology reports.
Setting: Radiology department. Sample: 72 radiology reports; 6 participants.
ST: IBM MedSpeak/Radiology software version 1.1.
Outcome measures: 1. Error rates.
Results: Error rates (mean ± SD, %): overall 10.3 ± 3.3; significant errors 7.8 ± 3.4; subtle significant errors 1.2 ± 1.6.

Koivikko et al. 2008 [36] (Finland; non-experimental)
Aim: To evaluate the effect of speech recognition on radiology workflow systems over a period of 2 years.
Setting: Radiology department. Sample: >20,000 reports; 14 radiologists.
ST: Finnish Radiology Speech Recognition System (Philips Electronics); HT: cassette-based reporting. Training: 10-15 minutes of training in SR.
Outcome measures: 1. RTT (referred to as TAT) at 3 collection points: HT, 2005 (n = 6,037); SR1, SR in 2006 (n = 6,486); SR2, SR in 2007 (n = 9,072). 2. Reports completed ≤ 1 hour.
Results:
RTT (mean ± SD) in minutes: HT 1,486 ± 4,591; SR1 323 ± 1,662; SR2 280 ± 763.
Reports completed ≤ 1 hour (%): HT 26; SR1 58.

Langer 2002 [37] (USA; non-experimental)
Aim: To compare the impact of SR on radiologist productivity across 4 workflow systems.
Setting: Radiology departments. Sample: over 40 radiology sites.
Workflow systems: System 1, film, report dictated, HT; System 2, film, report dictated, SR; System 3, picture archiving and communication system (PACS) + HT; System 4, PACS + SR.
Outcome measures: 1. RTT (referred to as TAT). 2. Report productivity (RP), number of reports per day.
Results: RTT (mean ± SD %) in hours / RP:
System 1: RTT 48.2 ± 50; RP 240.
System 2: RTT 15.5 ± 93; RP 311.
System 3: RTT 13.3 ± 119 (t value at 10%); RP 248.
System 4: RTT 15.7 ± 98 (t value at 10%); RP 310.

Singh et al. 2011 [23] (USA; non-experimental)
Aim: To compare accuracy and turnaround times between SR software and a traditional transcription service (TS) when used for generating surgical pathology reports.
Setting: Surgical pathology. Sample: 5,011 pathology reports.
ST: VoiceOver (version 4.1) with Dragon NaturallySpeaking software (version 10).
Phases: Phase 0, 3 years prior to SR; Phase 1, first 35 months of SR use (gross descriptions); Phases 2-4, SR used for gross descriptions and final diagnosis.
Outcome measures: 1. RTT (referred to as TAT). 2. Reports completed ≤ 1 day. 3. Reports completed ≤ 2 days.
Results:
RTT in days: Phase 0, 4; Phase 1, 4; Phases 2-4, 3.
Reports ≤ 1 day (%): Phase 0, 22; Phase 1, 24; Phases 2-4, 36.
Reports ≤ 2 days (%): Phase 0, 54; Phase 1, 60; Phases 2-4, 67.

Zick et al. 2001 [38] (USA; non-experimental)
Aim: To compare accuracy and RTT between SR software and a traditional transcription service (TS) when used for recording in patients' charts in the ED.
Setting: Emergency department. Sample: two physicians; 47 patients' charts.
ST: Dragon NaturallySpeaking Medical Suite version 4.
Outcome measures: 1. RTT (referred to as TAT). 2. Accuracy. 3. Errors per chart. 4. Dictation and editing time. 5. Throughput.
Results:
RTT (mins): SR 3.55; TS 39.6.
Accuracy % (mean and range): SR 98.5 (98.2-98.9); TS 99.7 (99.6-99.8).
Average errors per chart: SR 2.5 (2-3); TS 1.2 (0.9-1.5).
Average dictation time in mins (mean and range): SR 3.65 (3.35-3.95); TS 3.77 (3.43-4.10).
Throughput (words/minute): SR 54.5 (49.6-59.4); TS 14.1 (11.1-17.2).

  1. Report productivity (RP): normalises the output of staff to the daily report volume.
  2. Note: SR = speech recognition; ST = speech technology; HT = human transcription; RTT = report turnaround time; WRR = word recognition rate; PACS = picture archiving and communication system; RP = report productivity; TS = traditional transcription service; ED = emergency department; sig. = significant; diff. = difference; TAT = turnaround time, equivalent to RTT.
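Several entries above report accuracy as a word recognition rate (WRR) or a recognition error rate (e.g. Alapetite 2008 [30]; Devine et al. 2000 [33]; Kanal et al. 2001 [35]). The table does not reproduce how these were computed; a common convention in SR evaluation, which individual studies may or may not have followed exactly, aligns the recognized text against an N-word reference transcript and counts substitutions (S), deletions (D), and insertions (I):

\[
\mathrm{WER} = \frac{S + D + I}{N}, \qquad \mathrm{WRR} = 1 - \mathrm{WER} = \frac{N - S - D - I}{N}
\]

Under this convention, for example, a headset WRR of 83.2% corresponds to roughly 17 recognition errors per 100 reference words.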
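Likewise, RTT/TAT and the "reports completed within a threshold" percentages (Koivikko et al. [36]; Singh et al. [23]) reduce to simple timestamp arithmetic. The following Python sketch is illustrative only; the timestamps and field layout are invented, not taken from any reviewed study:

from datetime import datetime, timedelta

# Hypothetical (dictation finished, transcript available) timestamp pairs;
# none of these values come from the reviewed studies.
reports = [
    (datetime(2007, 3, 1, 9, 0), datetime(2007, 3, 1, 9, 40)),
    (datetime(2007, 3, 1, 10, 0), datetime(2007, 3, 1, 12, 30)),
    (datetime(2007, 3, 2, 8, 15), datetime(2007, 3, 2, 8, 50)),
]

# RTT (a.k.a. TAT): elapsed time from end of dictation to report availability.
rtts = [available - dictated for dictated, available in reports]

mean_rtt = sum(rtts, timedelta()) / len(rtts)
pct_within_1h = 100 * sum(rtt <= timedelta(hours=1) for rtt in rtts) / len(rtts)

print(f"Mean RTT: {mean_rtt}")                     # Mean RTT: 1:15:00
print(f"Reports <= 1 hour: {pct_within_1h:.0f}%")  # Reports <= 1 hour: 67%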