Automation testing of Text to Speech web app

As part of the Qxf2Services Hackathon, I picked up a project to automate the testing of a readily available Text to Speech web app. To follow along, you should have some familiarity with Python and Selenium.

Overview of Text to Speech Demo app

To try out testing Text to Speech, I needed a web app that was already built and hosted. After some googling, I found a hosted Text to Speech Demo web app. This Text to Speech service takes text in natural language and generates synthesized audio output with appropriate cadence and intonation.

Working of Text to Speech Demo app

To use the Text to Speech Demo app, the user needs to:

1. Select the voice language of their choice from the dropdown
2. Input the text (note that the text language must match the selected voice language)
3. Click on the Speak button to hear the speech, or click on the Download button to get an MP3 audio file that speaks the text given as input in step 2

Our Test Scenario

1. Open the Demo app
2. Select the voice. By default we are going to use American English (en-US): Allison (female, expressive, transformable)
3. Input the text. Our input text is Thank You
4. Click on the Download button
5. Convert the MP3 audio file to a .wav file using pydub
6. Detect the text from the .wav file
7. Check that the input text given in step 3 matches the text detected in step 6

Automating Our Test Scenario

Create a file named test_voice_demo_app.py with the following content:

"""
This is an automation test for the Text to Speech Demo app
"""
import os
import unittest
import time
from selenium import webdriver
 
class VoiceWebAppTest(unittest.TestCase):
    "Class to run tests against voice web app"
    def setUp(self):
        "Setup for the test"
        chrome_options = webdriver.ChromeOptions() 
        prefs = {'download.default_directory' : 'path to your preferred download directory'}
        chrome_options.add_experimental_option('prefs', prefs)
        self.driver = webdriver.Chrome(chrome_options=chrome_options)
        self.driver.maximize_window()
 
    def test_voice_web_app(self):
        "Test the voice web app text to speech"        
        url = 'https://text-to-speech-demo.ng.bluemix.net/'
        print('Opening %s' % url)
 
        #Open the Voice Demo App
        self.driver.get(url)
        time.sleep(5)
        #Scroll the page by -150 pixels vertically (a negative value scrolls up)
        self.driver.execute_script("window.scrollBy(0, -150);")
        time.sleep(5)
        #Select the voice
        self.driver.find_element_by_xpath("//select[@name='voice']").click()
        time.sleep(2)
        self.driver.find_element_by_xpath("//select[@name='voice']/option[@value='en-US_AllisonVoice']").click()
 
        #Set input text
        keyword_input = 'Thank You'
        print('Input text is : %s' % keyword_input)
        input_text_area = self.driver.find_element_by_xpath("//div[@data-id='Text']/textarea[@class='base--textarea textarea']")
        input_text_area.clear()
        input_text_area.send_keys(keyword_input)       
 
        #Download speech Audio Mp3 file
        self.driver.find_element_by_xpath("//button[text()='Download']").click()
 
    def tearDown(self):
        "Tear down the test"
        self.driver.quit()
 
#---START OF SCRIPT
if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(VoiceWebAppTest)
    unittest.TextTestRunner(verbosity=2).run(suite)

At this point, our automated test downloads transcript.mp3 to the desired location.
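The script relies on a fixed time.sleep() after clicking Download. As an alternative, here is a minimal sketch of a polling helper that waits until the file shows up. The name wait_for_download and its parameters are my own and not part of the original script. It assumes the browser only places the file under its final name once the download has finished, and it is written for Python 3.

```python
import os
import time


def wait_for_download(file_path, timeout=30, poll_interval=0.5):
    """Poll until file_path exists and is non-empty.

    Returns True when the file is found, False if `timeout` seconds pass first.
    Hypothetical helper: it assumes the browser only writes the final file name
    once the download is complete.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(file_path) and os.path.getsize(file_path) > 0:
            return True
        time.sleep(poll_interval)
    return False
```

In the test, a call such as wait_for_download(os.path.join(download_dir, 'transcript.mp3')) could replace the sleep after the Download click, where download_dir is whatever directory was configured in setUp.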

Converting Audio file format

We now have the audio for the input text as an MP3 file. The SpeechRecognition library used in the next section does not read MP3 files directly, so we first convert the file to .wav, which is the approach I settled on after some googling around.

To convert the .mp3 file to a .wav file we will use pydub, an open source Python package that can convert audio files between various formats.
To install the package, run pip install pydub

pydub relies on FFmpeg internally. FFmpeg is a leading multimedia framework that can decode, encode, transcode, mux, demux, stream, filter and play most audio and video formats. You can download a build for your system from the FFmpeg builds page and add it to the PATH.

from pydub import AudioSegment
 
sound = AudioSegment.from_mp3("transcript.mp3") #Downloaded transcript.mp3 file which we need to convert
sound.export("eng.wav", format="wav") #Output eng.wav file to detect text from
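If the conversion fails, the usual suspects are a missing FFmpeg or a broken output file. The sketch below shows two optional sanity checks using only the Python 3 standard library. The helper names converter_on_path and describe_wav are my own, not part of pydub.

```python
import shutil
import wave


def converter_on_path(name='ffmpeg'):
    """Return the full path of the executable `name` if it is on PATH, else None."""
    return shutil.which(name)


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a WAV file.

    wave.open raises wave.Error if the file is not a valid WAV file.
    """
    with wave.open(path, 'rb') as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        frame_count = wav_file.getnframes()
    return channels, sample_rate, frame_count / float(sample_rate)
```

Calling converter_on_path() before the conversion and describe_wav('eng.wav') after it gives a quick indication of whether FFmpeg was found and whether the exported file is readable and has a plausible duration.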

Recognizing text from Audio file

Next, we need to detect the text from the speech in the audio file. This can be done with the SpeechRecognition Python library.

To install the package, run pip install SpeechRecognition

This library supports various speech recognition engines, which are listed in the Speech Recognition Documentation. For my project I arbitrarily chose Google Speech Recognition.

import speech_recognition as sr
 
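audio_file = 'eng.wav' #name of the file
r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source) #read the entire audio file
try:
    recognized_text = r.recognize_google(audio,language='en-US')
    print('Decoded text from Audio is {}'.format(recognized_text))
except sr.UnknownValueError:
    print('Sorry could not recognize your voice')
except sr.RequestError as error:
    print('Could not reach the speech recognition service: {}'.format(error))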

Asserting input text with the recognized text

assert keyword_input.lower() == recognized_text.lower(),"Detected speech text does not match the input text"
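A recognizer may return the text with different spacing or punctuation than the input, in which case a plain lowercase comparison fails even though the speech was correct. Here is an optional, slightly more forgiving comparison as a minimal sketch. The helpers normalize_text and texts_match are my own names and are not used in the script in this post.

```python
import re


def normalize_text(text):
    """Lowercase the text, drop punctuation and collapse repeated whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def texts_match(expected, recognized):
    """Return True when both texts are equal after normalization."""
    return normalize_text(expected) == normalize_text(recognized)
```

With these helpers the assertion could be written as assert texts_match(keyword_input, recognized_text). Whether this looser check is appropriate depends on how strict you want the test to be.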

Combining all the above pieces, our final test_voice_demo_app.py script looks like this:

import os
import unittest
import time
from selenium import webdriver
from pydub import AudioSegment
import speech_recognition as sr
 
class VoiceWebAppTest(unittest.TestCase):
    "Class to run tests against voice web app"
    def setUp(self):
        "Setup for the test"
        chrome_options = webdriver.ChromeOptions() 
        prefs = {'download.default_directory' : 'E:/workspace-qxf2/hackathon/Voicewebapptest'}
        chrome_options.add_experimental_option('prefs', prefs)
        self.driver = webdriver.Chrome(chrome_options=chrome_options)
        self.driver.maximize_window()
 
 
    def test_voice_web_app(self):
        "Test the voice web app text to speech"        
        url = 'https://text-to-speech-demo.ng.bluemix.net/'
        print('Opening %s' % url)
 
        #Open the Voice Demo App
        self.driver.get(url)
        time.sleep(5)
        #Scroll the page by -150 pixels vertically (a negative value scrolls up)
        self.driver.execute_script("window.scrollBy(0, -150);")
        time.sleep(5)
        #Select the voice
        self.driver.find_element_by_xpath("//select[@name='voice']").click()
        time.sleep(2)
        self.driver.find_element_by_xpath("//select[@name='voice']/option[@value='en-US_AllisonVoice']").click()
 
        #Set input text
        keyword_input = 'Thank You'
        print('Input text is : %s' % keyword_input)
        input_text_area = self.driver.find_element_by_xpath("//div[@data-id='Text']/textarea[@class='base--textarea textarea']")
        input_text_area.clear()
        input_text_area.send_keys(keyword_input)       
 
        #Download speech Audio Mp3 file
        self.driver.find_element_by_xpath("//button[text()='Download']").click()
        time.sleep(5)
 
        #Convert Mp3 file to .Wav file
        #(relative paths: run the test from the download directory set in setUp)
        sound = AudioSegment.from_mp3("transcript.mp3")
        sound.export("eng.wav", format="wav")
 
 
        #Recognize the text from the .Wav Audio file
        audio_file = 'eng.wav' #name of the file
        r = sr.Recognizer()
        with sr.AudioFile(audio_file) as source:
            audio = r.record(source)
        try:
            recognized_text = r.recognize_google(audio,language='en-US')
            print('Decoded text from Audio is {}'.format(recognized_text))
        except (sr.UnknownValueError, sr.RequestError) as error:
            #Fail here so the assert below never sees an undefined recognized_text
            self.fail('Could not recognize the speech: {}'.format(error))
 
        assert keyword_input.lower() == recognized_text.lower(),"Detected speech text does not match the input text"
 
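    def tearDown(self):
        "Tear down the test"
        self.driver.quit()
        #Remove the audio files only if they were actually created
        for file_name in ('transcript.mp3', 'eng.wav'):
            if os.path.exists(file_name):
                os.remove(file_name)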
 
 
 
 
#---START OF SCRIPT
if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(VoiceWebAppTest)
    unittest.TextTestRunner(verbosity=2).run(suite)


How to run

To run the script, use the command:

python test_voice_demo_app.py

Output

What next?

You can extend this test script to cover the other voice languages available in the Text to Speech Demo app, using one of the Python language translator packages to produce the input text for each language.
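One way to organize such an extension is a small table of test cases, one row per voice. This is a rough sketch under stated assumptions. Only the en-US_AllisonVoice row comes from the script above; the second row is a hypothetical placeholder, so check the dropdown for the real option values. The helper names are my own, and recognizer_language assumes every option value follows the same language_VoiceName pattern.

```python
from collections import namedtuple

# One row per voice: the dropdown option value and the text to type in
VoiceCase = namedtuple('VoiceCase', ['voice_value', 'input_text'])

VOICE_CASES = [
    VoiceCase('en-US_AllisonVoice', 'Thank You'),  # from the script in this post
    VoiceCase('es-ES_LauraVoice', 'Gracias'),      # hypothetical placeholder row
]


def voice_option_xpath(case):
    """Build the XPath for the dropdown option of this voice."""
    return "//select[@name='voice']/option[@value='%s']" % case.voice_value


def recognizer_language(case):
    """Derive the language code, e.g. 'en-US_AllisonVoice' -> 'en-US'.

    Assumes option values follow the <language>_<name> pattern.
    """
    return case.voice_value.split('_')[0]
```

The test could then loop over VOICE_CASES, click the option found with voice_option_xpath(case), type case.input_text, and pass recognizer_language(case) as the language argument to r.recognize_google.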

Rohan Joshi

I am a software tester with more than 3 years of experience. I started my career in an e-commerce startup called Browntape Technologies. I was looking to work with a software testing organization that would help me showcase my testing and technical skills, so I joined Qxf2. I love scripting in Python and using Selenium. I live in Goa and enjoy its beaches. My hobbies include playing cricket, driving and exploring new places.

6 thoughts on “Automation testing of Text to Speech web app”

QA says:

December 12, 2019 at 6:12 am

Hi,

Can you help me little more on it

1. Indira Nellutla says:
  
  December 12, 2019 at 7:19 am
  
  Hi, Could you please provide more details
  
swapnil says:

January 22, 2020 at 1:02 am

can we automate websocket api?

1. Rahul Bhave says:
  
  January 22, 2020 at 2:15 am
  
  Hi Swapnil,
  
  Following references may be helpful to you:
  
  https://www.twilio.com/docs/voice/tutorials/consume-real-time-media-stream-using-websockets-python-and-flask
  https://techtutorialsx.com/2018/02/11/python-websocket-client/
  https://developer.nexmo.com/use-cases/voice-call-websocket-python
  
Sreenath says:

July 6, 2020 at 11:56 am

Hi Can we Automate “speech to text and text to speech” using the above?If yes can you elaborate a little ?

1. Rahul Bhave says:
  
  July 7, 2020 at 6:40 am
  
  Hi Sreenath,
  
  Would you please provide more details about your query?
  
  Regards,
  Rahul
