Automation testing of Text to Speech web app

As part of the Qxf2Services Hackathon, I picked up a project to automate testing of a readily available Text to Speech web app. To follow along, you should have some familiarity with Python and Selenium.


Overview of the Text to Speech Demo app

To try out Text to Speech testing, I looked for a readily available web app that could help me achieve my goal. After some googling, I found a hosted Text to Speech Demo web app. The Text to Speech service behind it processes text and natural language to generate synthesized audio output with appropriate cadence and intonation.

Working of Text to Speech Demo app

To use the Text to Speech Demo app, the user needs to:

  1. Select a voice language from the dropdown
  2. Input the text (note that the text language must match the selected voice language)
  3. Click the Speak button to hear the speech, or click the Download button to get an MP3 audio file that speaks the text entered in step 2

Our Test Scenario

  1. Open the Demo app
  2. Select the voice: we are going to use American English (en-US): Allison (female, expressive, transformable)
  3. Input the text: our input text is Thank You
  4. Click the Download button
  5. Convert the MP3 audio file to a .wav file using pydub
  6. Detect the text from the .wav file
  7. Check that the input text given in step 3 matches the text detected in step 6

Automating Our Test Scenario

    Create a file named test_voice_demo_app.py with the following content:

    """
    This is an automation test for the Text to Speech Demo app
    """
    import os
    import unittest
    import time
    from selenium import webdriver
     
    class VoiceWebAppTest(unittest.TestCase):
        "Class to run tests against voice web app"
        def setUp(self):
            "Setup for the test"
            chrome_options = webdriver.ChromeOptions() 
            prefs = {'download.default_directory' : 'path to your preferred download directory'}
            chrome_options.add_experimental_option('prefs', prefs)
            self.driver = webdriver.Chrome(chrome_options=chrome_options)
            self.driver.maximize_window()
     
        def test_voice_web_app(self):
            "Test the voice web app text to speech"
            url = 'https://text-to-speech-demo.ng.bluemix.net/'
            print('Opening %s' % url)

            #Open the Voice Demo App
            self.driver.get(url)
            time.sleep(5)
            #Scroll the window vertically by -150 pixels (a negative offset scrolls up)
            self.driver.execute_script("window.scrollBy(0, -150);")
            time.sleep(5)
            #Select the voice
            self.driver.find_element_by_xpath("//select[@name='voice']").click()
            time.sleep(2)
            self.driver.find_element_by_xpath("//select[@name='voice']/option[@value='en-US_AllisonVoice']").click()

            #Set input text
            keyword_input = 'Thank You'
            print('Input text is : %s' % keyword_input)
            input_text_area = self.driver.find_element_by_xpath("//div[@data-id='Text']/textarea[@class='base--textarea textarea']")
            input_text_area.clear()
            input_text_area.send_keys(keyword_input)       
     
            #Download speech Audio Mp3 file
            self.driver.find_element_by_xpath("//button[text()='Download']").click()
     
        def tearDown(self):
            "Tear down the test"
            self.driver.quit()
     
    #---START OF SCRIPT
    if __name__ == '__main__':
        suite = unittest.TestLoader().loadTestsFromTestCase(VoiceWebAppTest)
        unittest.TextTestRunner(verbosity=2).run(suite)

    By this point, we have automated our test to download transcript.mp3 to our desired location.
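One thing to watch: the click on Download returns before Chrome has finished writing the file, so the next step can start too early. A minimal sketch of a polling helper follows. The function name `wait_for_download` and its parameters are my own, not part of Selenium, and it assumes Chrome keeps its usual `.crdownload` suffix for in-progress downloads.

```python
import os
import time


def wait_for_download(path, timeout=30.0, poll_interval=0.5):
    """Wait until `path` exists, is non-empty and has no partial file.

    Returns True once the file looks complete, or False if `timeout`
    seconds pass first. Assumes Chrome writes `<path>.crdownload`
    while the download is still in progress.
    """
    partial = path + ".crdownload"
    deadline = time.monotonic() + timeout
    while True:
        finished = (
            os.path.isfile(path)
            and not os.path.exists(partial)
            and os.path.getsize(path) > 0
        )
        if finished:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

In the test, a call such as `wait_for_download('transcript.mp3')` could replace a fixed `time.sleep` after the Download click, failing the test if it returns False.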

    Converting Audio file format

    We now have the audio for the input text as an MP3 file. The speech recognition library used in the next step cannot read MP3 files directly, so we need to convert the file to .wav, which is the option I settled on after some googling around.

    To convert the .mp3 file to a .wav file, we will use pydub, an open source Python package that can convert .mp3 files to various other audio formats.
    To install this package, run pip install pydub

    pydub internally uses FFmpeg, a leading multimedia framework that can decode, encode, transcode, mux, demux, stream, filter and play most audio and video formats. You can download a build for your system from the FFmpeg builds page and add it to the PATH.
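If FFmpeg is not on the PATH, the conversion fails with an error that is not always easy to read. As a rough sketch, the check below looks the tool up before the test starts. The helper name `require_tool` is hypothetical; `shutil.which` is from the Python 3 standard library.

```python
import shutil


def require_tool(name="ffmpeg"):
    """Return the full path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            "%s was not found on PATH; install it before converting audio" % name
        )
    return path
```

Calling `require_tool()` at the top of `setUp` would make a missing FFmpeg install show up as a clear message rather than a failure in the middle of the test.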

    from pydub import AudioSegment
     
    sound = AudioSegment.from_mp3("transcript.mp3") #Downloaded transcript.mp3 file which we need to convert
    sound.export("eng.wav", format="wav") #Output eng.wav file to detect text from
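Before handing the converted file to the recognizer, it can help to confirm that it really is a readable WAV file with some audio in it. Here is a small sketch using only the standard library `wave` module. It assumes the export produced an uncompressed PCM WAV file, and `describe_wav` is a name I made up for illustration.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    return channels, sample_rate, frames / float(sample_rate)
```

A duration of zero seconds for `eng.wav` would suggest that the download or the conversion went wrong, which is cheaper to find out here than after a call to a recognition service.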

    Recognizing text from Audio file

    Next, we need to detect the text from the speech in the audio file. This can be done with the SpeechRecognition Python library.

    To install the package, run pip install SpeechRecognition

    This library supports various speech recognition engines, which are listed in the Speech Recognition documentation. For my project I arbitrarily chose Google Speech Recognition.

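    import speech_recognition as sr

    audio_file = 'eng.wav' #name of the file
    r = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio = r.record(source) #read the entire audio file
    try:
        recognized_text = r.recognize_google(audio, language='en-US')
        print('Decoded text from Audio is {}'.format(recognized_text))
    except sr.UnknownValueError:
        #Raised when the speech could not be understood
        recognized_text = ''
        print('Sorry could not recognize your voice')
    except sr.RequestError as error:
        #Raised when the recognition service could not be reached
        recognized_text = ''
        print('Could not reach the recognition service: {}'.format(error))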
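Google Speech Recognition is called over the network, so an occasional request can fail for reasons that have nothing to do with the audio. A generic retry wrapper is one way to soften that. This is only a sketch; `call_with_retries` and its parameters are hypothetical names, not part of the SpeechRecognition library.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call func() until it succeeds, at most `attempts` times.

    Sleeps `delay` seconds between tries and re-raises the last error
    if every attempt fails. Exceptions not listed in `retry_on`
    propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

With the recognizer above, a call could look like `call_with_retries(lambda: r.recognize_google(audio, language='en-US'), retry_on=(sr.RequestError,))`, so that only connection problems are retried and unintelligible audio still fails straight away.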

    Asserting that the input text matches the recognized text

    assert keyword_input.lower() == recognized_text.lower(),"Detected speech text does not match the input text"
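Lowercasing handles differences in capitalization, but a recognizer may also return slightly different spacing or punctuation than the input. A slightly more forgiving comparison is sketched below. The `normalize` helper is my own illustration, and it only strips ASCII punctuation.

```python
import re
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation and collapse runs of whitespace."""
    without_punctuation = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", without_punctuation).strip().lower()
```

The assertion would then compare `normalize(keyword_input)` with `normalize(recognized_text)`. Note that this also removes apostrophes, so "don't" becomes "dont" on both sides of the comparison.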

    Combining all the above pieces, our final test_voice_demo_app.py script looks like this:

    import os
    import unittest
    import time
    from selenium import webdriver
    from pydub import AudioSegment
    import speech_recognition as sr
     
    class VoiceWebAppTest(unittest.TestCase):
        "Class to run tests against voice web app"
        def setUp(self):
            "Setup for the test"
            chrome_options = webdriver.ChromeOptions() 
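            #Use the directory you run the script from, because the test reads transcript.mp3 from the current directory
            prefs = {'download.default_directory' : 'E:/workspace-qxf2/hackathon/Voicewebapptest'}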
            chrome_options.add_experimental_option('prefs', prefs)
            self.driver = webdriver.Chrome(chrome_options=chrome_options)
            self.driver.maximize_window()
     
     
        def test_voice_web_app(self):
            "Test the voice web app text to speech"
            url = 'https://text-to-speech-demo.ng.bluemix.net/'
            print('Opening %s' % url)

            #Open the Voice Demo App
            self.driver.get(url)
            time.sleep(5)
            #Scroll the window vertically by -150 pixels (a negative offset scrolls up)
            self.driver.execute_script("window.scrollBy(0, -150);")
            time.sleep(5)
            #Select the voice
            self.driver.find_element_by_xpath("//select[@name='voice']").click()
            time.sleep(2)
            self.driver.find_element_by_xpath("//select[@name='voice']/option[@value='en-US_AllisonVoice']").click()

            #Set input text
            keyword_input = 'Thank You'
            print('Input text is : %s' % keyword_input)
            input_text_area = self.driver.find_element_by_xpath("//div[@data-id='Text']/textarea[@class='base--textarea textarea']")
            input_text_area.clear()
            input_text_area.send_keys(keyword_input)       
     
            #Download speech Audio Mp3 file
            self.driver.find_element_by_xpath("//button[text()='Download']").click()
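            time.sleep(5)

            #Convert Mp3 file to .Wav file
            sound = AudioSegment.from_mp3("transcript.mp3")
            sound.export("eng.wav", format="wav")

            #Recognize the text from the .Wav audio file
            audio_file = 'eng.wav' #name of the file
            r = sr.Recognizer()
            with sr.AudioFile(audio_file) as source:
                audio = r.record(source)
            try:
                recognized_text = r.recognize_google(audio, language='en-US')
                print('Decoded text from Audio is {}'.format(recognized_text))
            except sr.UnknownValueError:
                #Fail the test when the speech could not be understood
                self.fail('Sorry could not recognize your voice')
            except sr.RequestError as error:
                #Fail the test when the recognition service could not be reached
                self.fail('Could not reach the recognition service: {}'.format(error))

            assert keyword_input.lower() == recognized_text.lower(),"Detected speech text does not match the input text"

        def tearDown(self):
            "Tear down the test"
            self.driver.quit()
            #Remove the generated audio files if they exist
            for file_name in ('transcript.mp3', 'eng.wav'):
                if os.path.exists(file_name):
                    os.remove(file_name)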
     
     
     
     
    #---START OF SCRIPT
    if __name__ == '__main__':
        suite = unittest.TestLoader().loadTestsFromTestCase(VoiceWebAppTest)
        unittest.TextTestRunner(verbosity=2).run(suite)

    How to run

    To run the script, use the command:

    python test_voice_demo_app.py

    What next?

    You can extend this test script to cover the other voice languages available in the Text to Speech Demo app, using one of the Python language translation packages to produce input text in each language.
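When covering more voices, the recognizer also needs a matching language code. The option value used in this post, `en-US_AllisonVoice`, starts with the same `en-US` tag that is passed to the recognizer, so one possible approach is to derive the tag from the voice id. This sketch assumes the other voices in the dropdown follow the same `<language>_<Name>Voice` pattern, which I have not verified; `language_for_voice` is a hypothetical helper.

```python
def language_for_voice(voice_id):
    """Return the language tag at the start of a voice id.

    Assumes ids shaped like 'en-US_AllisonVoice', where the part
    before the first underscore is the language tag.
    """
    language, separator, _name = voice_id.partition("_")
    if not separator or not language:
        raise ValueError("Unexpected voice id format: %r" % voice_id)
    return language
```

The result could be passed as the `language` argument of `recognize_google`, provided the recognition engine supports that language tag.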

     

     
