IIT Real-Time Communication, WebRTC, Mobility, VoIP, NG911 Conference & Expo


Real Time Communications Conference & Expo at Illinois Tech

IEEE International Conference


Presentation

Track: VoiceTech
TranscribeX: LLM-Enhanced ASR Transcription
When transcribing telephony audio, Automatic Speech Recognition (ASR) engines often produce noisy output with high word error rates (WER). This impacts the efficacy of downstream analyses that process this transcribed text (intent determination, sentiment analysis, etc.). In this talk, we present two experiments demonstrating how a Large Language Model (LLM) can be used to improve the quality of telephony-based transcripts.
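As a point of reference for the figures quoted below, WER is conventionally computed as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the lowercase, whitespace-split normalization are illustrative assumptions, not details taken from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a hypothesis that drops one word from a six-word reference scores a WER of 1/6.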

In Experiment 1, we introduce an LLM choice method: an LLM is given two or more ASR-generated transcripts of the same audio file and instructed to select the best one. The prompt also includes information about the domain and the comparative performance of the ASR engines. Tested on an internal dataset of customer experience surveys, this approach yields a 1.7% WER improvement over the best-performing ASR. The gains are larger when the method is targeted at the documents on which the ASR engines disagree most: on these high-disagreement subsets, WER improves by up to 5%.
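One way the LLM choice method and the disagreement-based targeting might be wired together is sketched below. Everything here is a hypothetical illustration: the prompt wording, the function names, the `llm` callable (any function mapping a prompt string to a reply string), the fallback rule, and the use of `difflib.SequenceMatcher` as a disagreement score are assumptions, not the prompt or metric used in the experiments.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def build_choice_prompt(candidates: Dict[str, str], domain: str,
                        asr_ranking: List[str]) -> str:
    """Assemble a prompt asking the model to pick one candidate by number."""
    lines = [
        f"The following are ASR transcripts of one {domain} phone recording.",
        "Engines, from historically most to least accurate: "
        + ", ".join(asr_ranking) + ".",
        "Reply with only the number of the most accurate transcript.",
        "",
    ]
    for idx, (engine, text) in enumerate(candidates.items(), start=1):
        lines.append(f"{idx}. [{engine}] {text}")
    return "\n".join(lines)


def choose_transcript(candidates: Dict[str, str], domain: str,
                      asr_ranking: List[str],
                      llm: Callable[[str], str]) -> str:
    """Return the transcript the model selects.

    Assumes asr_ranking lists the keys of candidates, best engine first.
    Falls back to the best-ranked engine if the reply cannot be parsed.
    """
    engines = list(candidates)
    reply = llm(build_choice_prompt(candidates, domain, asr_ranking))
    match = re.search(r"\d+", reply)
    if match and 1 <= int(match.group()) <= len(engines):
        return candidates[engines[int(match.group()) - 1]]
    return candidates[asr_ranking[0]]


def disagreement(a: str, b: str) -> float:
    """Word-level dissimilarity in [0, 1]; 0 means identical word sequences."""
    return 1.0 - SequenceMatcher(None, a.split(), b.split()).ratio()
```

In a pipeline along these lines, documents could be sorted by `disagreement` and only the highest-scoring ones sent to the LLM, leaving the best-performing ASR output untouched elsewhere.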

In Experiment 2, we further test the LLM choice method on a dataset from the same domain but drawn from a different distribution. The method is less effective on this dataset overall but remains useful for some documents. In particular, short documents benefit: for transcripts of 1-5 words, the LLM choice method yields a 2% WER improvement over the best-performing ASR.
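A per-length breakdown like the one described above could be produced along the following lines. This is only a sketch: it assumes per-document word-level error counts have already been computed for both the best-performing ASR and the LLM-selected transcript, and every bucket boundary other than 1-5 words is a made-up example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def length_bucket(n_words: int) -> str:
    """Map a reference length to a bucket label (boundaries are hypothetical)."""
    if n_words <= 5:
        return "1-5"
    if n_words <= 20:
        return "6-20"
    return "21+"


def wer_by_bucket(
    docs: Iterable[Tuple[int, int, int]],
) -> Dict[str, Tuple[float, float]]:
    """Aggregate WER per length bucket.

    Each document is (reference_word_count, baseline_errors, llm_choice_errors),
    with reference_word_count >= 1. Returns bucket -> (baseline WER, LLM choice
    WER), each computed as total errors divided by total reference words.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for ref_words, base_err, llm_err in docs:
        bucket_totals = totals[length_bucket(ref_words)]
        bucket_totals[0] += ref_words
        bucket_totals[1] += base_err
        bucket_totals[2] += llm_err
    return {
        bucket: (base / words, llm / words)
        for bucket, (words, base, llm) in totals.items()
    }
```

Comparing the two values in each bucket shows where the LLM choice step helps and where the ASR output is better left unchanged.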

These experiments provide a proof-of-concept that ASR transcriptions of telephony audio can be improved via an LLM enhancement approach like the LLM choice method we propose. However, maximizing the performance gains from this approach requires a targeted improvement strategy specific to the domain and distribution of a dataset.
  • Grace LeFevre - Speaker
Presentation Video
Presentation Notes
LEFEVRE_TranscribeXLLMEnhancedASRTranscription.pdf


© 2012-2013 Illinois Institute of Technology School of Applied Technology 201 East Loop Road, Wheaton IL 60189 630.682.6000
3424 South State, Chicago IL 60616 312.567.5280 Emergency Information

© Copyright 2024 RTC-Conference · All Rights Reserved
