GL's Speech Transcription Server: Automated Speech-to-Text and IVR Testing Solution
Welcome to the latest issue of GL's Newsletter! We are excited to announce the release of the Speech Transcription Server software, a PC-based application for automated speech-to-text conversion that also supports on-the-fly generation of audio files from user-defined text.
Overview
GL’s Speech Transcription Server (STS) is a speech-to-text conversion application that translates spoken language into text and provides analysis of the transcribed speech. The STS converts recorded audio files (PCM or WAV) into text format. It continuously monitors single or multiple folders for short audio files. Once detected, the files are placed in the speech transcription queue for processing. The audio files are then sent to a cloud-based transcription service, where they are accurately converted into text.
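The folder-monitoring step described above can be pictured with a short Python sketch. It is only an illustration of the general pattern (poll folders, queue new audio files), not GL's implementation; the function name `scan_folders`, the `seen` set, and the suffix list are assumptions made for this example.

```python
import queue
from pathlib import Path

# File extensions assumed for the PCM and WAV formats named in the text.
AUDIO_SUFFIXES = {".wav", ".pcm"}


def scan_folders(folders, seen, work_queue):
    """Queue audio files not seen in earlier scans; return how many were added."""
    added = 0
    for folder in folders:
        for path in sorted(Path(folder).iterdir()):
            is_audio = path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
            if is_audio and path not in seen:
                seen.add(path)          # remember the file so it is queued only once
                work_queue.put(path)    # hand it to the transcription worker
                added += 1
    return added
```

A monitoring loop would call `scan_folders` every few seconds with the same `seen` set and a shared `queue.Queue()`, while a separate worker takes paths off the queue and submits them for transcription.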
Network providers utilize the STS to:
- Verify the accuracy of IVR voice prompts and ensure the proper flow of the IVR system
- Validate the accuracy of Voice Mail content and confirm that the Voice Mail functionality is operating correctly
- Ensure accurate and seamless voice transmission across any network, maintaining clarity and proper flow
The STS works with GL’s Audio File Conversion Utility, which supports all industry-standard voice codec formats. The utility converts recorded voice files from their native codec format to a GL-standard format, enhancing the STS’s capability to handle voice files with higher density and multiple codec options.
The STS can function as a standalone utility or integrate with other GL test tools to enable automation, precise call control, and quality analysis. The STS supports REST APIs, allowing seamless integration with GL’s intrusive test tools, such as Message Automation and Protocol Simulation (MAPS™) and VQuad™. Users can send and receive transcription requests and retrieve transcription results directly from a database.
MAPS™ features a unique architecture for multi-interface, multi-protocol simulation, making it ideal for testing core networks, access networks, and interoperability functions. The MAPS™ High Density network appliance is designed to achieve up to 64,000 simultaneous calls per appliance (i.e., 8,000 RTP media sessions per port). By stacking multiple servers, a larger test system with 100K-200K calls is achievable for enterprise-grade to carrier-grade testing. The VQuad™ Probe HD is a versatile, all-in-one hardware solution that supports multiple physical interfaces, enabling connections to any wired or wireless network. It performs automated, end-to-end voice and data testing across any network.
By integrating GL’s STS with the MAPS™ IVR and VQuad™ platforms, users can automate the testing of IVR tree traversal and check pass/fail conditions for each step. The system automatically records each IVR menu prompt and analyzes the audio files using speech-to-text transcription. Both the MAPS™ and VQuad™ testing platforms support the use of the STS utility across various networks, including 2-Wire (FXO, FXS), TDM, IP, and wireless networks (5G, VoLTE, UMTS, and GSM).
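As a rough illustration of a pass/fail check on a transcribed IVR prompt, the Python sketch below compares the expected prompt text with a transcript after simple normalization. The function names, the 0.85 threshold, and the use of `difflib` similarity are assumptions chosen for this example; the source does not describe how the STS computes its verdict.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop punctuation so formatting differences are not counted."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def prompt_matches(expected, transcript, threshold=0.85):
    """Return (passed, similarity) for one IVR prompt.

    similarity is difflib's word-level ratio in the range 0..1;
    threshold is a hypothetical cut-off, not a documented STS setting.
    """
    ratio = SequenceMatcher(None, normalize(expected), normalize(transcript)).ratio()
    return ratio >= threshold, ratio
```

For example, `prompt_matches("Press 1 for billing.", "press 1 for billing")` passes, because the two texts are identical once case and punctuation are ignored. The normalization shown here only suits English-like text.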
The STS software includes a Speech-to-Text feature that converts recorded speech into text. Supporting formats like PCM, WAV, and GLW, it is ideal for verifying voice prompts and assisting with IVR systems. The utility provides transcription accuracy with a confidence score (0 to 1) and can process audio in real-time, storing results for analysis or further use.
The STS utility includes a Text-to-Speech feature that allows automated, on-the-fly generation of audio files from user-defined text. The speech synthesizer provides options to configure languages (over 50 supported) and save audio files in PCM/WAV format. Text can be manually entered to generate voice prompts in the selected audio file format. The converted audio files are automatically saved in the specified directory path.
The STS can also be used with GL’s Voice Quality Testing solution to measure network voice quality, assess the impact of different codecs on speech transcription quality, and evaluate the effects of Noise, Echo, and Bit Error Rates. Additionally, the STS supports various industry-standard voice codecs. If access to both sides of the call is unavailable, the speech-to-text method can record a pre-specified sentence at the far end and provide a certainty score based on the expected text. This conversion verifies if the received audio (e.g., a customer speaking) matches expectations, confirming good audio quality on the network.
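One common way to quantify how closely a transcript matches a pre-specified sentence is word error rate (WER). The Python sketch below computes it with a word-level edit distance. WER is used here only as a familiar illustrative metric; the source does not say how the STS derives its certainty score, and the function name is hypothetical.

```python
def word_error_rate(expected, received):
    """Word-level edit distance between the texts, divided by the expected word count."""
    ref = expected.lower().split()
    hyp = received.lower().split()
    if not ref:
        raise ValueError("expected text must not be empty")
    # prev[j] holds the edit distance between the ref words handled so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # a reference word is missing from the transcript
            insertion = curr[j - 1] + 1  # the transcript contains an extra word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Running the same reference sentence through different codecs or impairment levels and comparing the resulting WER values gives a simple way to see how each condition affects transcription.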
Key Features
- Assists with IVR/VM testing where voice responses are required and provides a Pass/Fail result
- Supports up to 50 languages and variants while converting both PCM and WAV file formats
- Cloud-based speech synthesizer provides accurate speech files
- Text-to-speech feature supports the latest voice codecs
- Transcribes speech files of up to 30 seconds into text
- REST API support for fast and easy integration with third-party testing platforms
- REST API server allows one STS instance to serve multiple clients
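Because transcription is limited to speech files of up to 30 seconds, it can be useful to check a recording's length before submitting it. The sketch below does this for WAV files with Python's standard `wave` module. The helper names are hypothetical, and headerless PCM files would need the sample rate and sample width supplied separately.

```python
import wave

MAX_SECONDS = 30.0  # limit stated in the feature list above


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds, based on its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_limit(path, max_seconds=MAX_SECONDS):
    """True if the recording is short enough to be submitted for transcription."""
    return wav_duration_seconds(path) <= max_seconds
```

Longer recordings could be split into shorter segments before being placed in a monitored folder.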