IIT Real-Time Communication, WebRTC, Mobility, VoIP, NG911 Conference & Expo


Real Time Communications Conference & Expo at Illinois Tech

IEEE International Conference


Presentation

Track: VoiceTech
Digits Micro-Model: Enhancing Digit Recognition with Domain-Specific ASR
Digit recognition is essential for processing payment information, phone numbers, and other numerical data. Accurate and efficient digit recognition ensures seamless user experiences and prevents errors in critical tasks. In this project, our primary goal is therefore to train a domain-specific Kaldi Automatic Speech Recognition (ASR) model that recognizes digit strings of up to five digits. Recent advances in ASR have largely focused on large-scale, domain-general models. In tightly constrained domains, however, a domain-specific "micro" model may outperform general-purpose models, and micro models are lightweight compared to large-scale, general ones. Using a general ASR model such as Whisper or Amazon Transcribe for digit recognition is akin to cracking a peanut with a sledgehammer: the results will likely be adequate, but there are more efficient approaches. We therefore train a Kaldi model on open-source single-digit utterances and test its ability to recognize variable-length digit strings of up to five digits.
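The abstract does not say how decoding is constrained to strings of one to five digits. One common approach in WFST-based recognizers such as Kaldi is a "digit loop" grammar. The sketch below is hypothetical and not the authors' setup: it writes such a grammar as an acceptor in the OpenFst text format (one `src dst ilabel olabel` line per arc, followed by one line per final state). The word labels and the `digit_grammar` name are assumptions. Compiling the text into a recipe's `G.fst`, for example with OpenFst's `fstcompile` and a word symbol table, is not shown.

```python
# Hypothetical word labels; a real lexicon might also include "oh" for zero.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def digit_grammar(max_len: int = 5) -> str:
    """Return an acceptor in OpenFst text format that accepts 1..max_len digit words."""
    lines = []
    # State k means "k digits consumed so far". State 0 is listed first,
    # so OpenFst treats it as the start state.
    for state in range(max_len):
        for word in DIGIT_WORDS:
            # Acceptor: the input label equals the output label.
            lines.append(f"{state} {state + 1} {word} {word}")
    # States 1..max_len are final, so any string of 1 to max_len digits is accepted.
    for state in range(1, max_len + 1):
        lines.append(str(state))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(digit_grammar(5), end="")
```

Because the grammar is a simple chain of states, it rules out six-digit hypotheses by construction. The acoustic model only needs to distinguish ten words, which fits with training on single-digit utterances.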

To achieve robust digit recognition, we also curate a dataset that covers digit strings of various lengths and includes training examples of numbers pronounced in different ways. For instance, the number 653 may be spoken as "six hundred and fifty-three," "six fifty-three," or even "sixty-three five." This diversity in length and pronunciation style helps the model handle the numeric forms it will encounter in real-world use. Our dataset comprises 14,000 instances collected from three data sources, providing a representative collection of real-world numeric patterns. Through this project, we aim to advance domain-specific ASR models and enable more efficient and accurate digit recognition in critical applications.
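Curating such a dataset requires mapping each spoken variant to the same digit-string label. The following is a hypothetical normalization sketch, not the authors' pipeline. It is assumed to run on word transcripts and handles digit-by-digit readings ("six five three"), paired readings ("six fifty-three"), and "hundred" readings ("six hundred and fifty-three"). It does not handle "thousand" or reordered forms such as "sixty-three five," which cannot be mapped back to 653 by simple rules. The function and table names are illustrative.

```python
# Hypothetical word-to-value tables for spoken English numbers.
ONES = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
        "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def _below_100(tokens, i):
    """Parse one value below 100 starting at tokens[i]; return (value, next_index)."""
    if i >= len(tokens):
        return None, i
    tok = tokens[i]
    if tok in TEENS:
        return TEENS[tok], i + 1
    if tok in TENS:
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # "fifty three" -> 53; "fifty" alone -> 50.
        if nxt in ONES and ONES[nxt] > 0:
            return TENS[tok] + ONES[nxt], i + 2
        return TENS[tok], i + 1
    if tok in ONES:
        return ONES[tok], i + 1
    return None, i


def spoken_to_digits(phrase: str) -> str:
    """Convert a spoken number phrase into a digit-string label."""
    tokens = [t for t in phrase.lower().replace("-", " ").split() if t != "and"]
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] in ONES and i + 1 < len(tokens) and tokens[i + 1] == "hundred":
            # "six hundred (and) fifty three" -> 600 + 53, always three digits.
            hundreds = ONES[tokens[i]] * 100
            rest, i = _below_100(tokens, i + 2)
            out.append(f"{hundreds + (rest or 0):03d}")
        else:
            value, j = _below_100(tokens, i)
            if value is None:
                raise ValueError(f"unsupported word: {tokens[i]!r}")
            # Single digit words give one digit; teens and tens give two.
            out.append(str(value))
            i = j
    return "".join(out)


if __name__ == "__main__":
    for p in ["six hundred and fifty-three", "six fifty-three", "six five three"]:
        print(p, "->", spoken_to_digits(p))
```

Each variant above normalizes to "653", so utterances with different pronunciations can share one target label.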
  • Chirag Chhablani - Speaker
Presentation Video
Presentation Notes
CHHABLANI-DIGIT-MICRO-MODEL.pptx



© 2012-2013 Illinois Institute of Technology School of Applied Technology, 201 East Loop Road, Wheaton IL 60189, 630.682.6000
3424 South State, Chicago IL 60616, 312.567.5280

© Copyright 2023 RTC-Conference · All Rights Reserved
