AI App Speech to Text

In this post we are going to implement an app that converts speech (a voice recording) to text.

Prerequisite: Python is installed.

Install google-cloud-speech

Start a command prompt and run the following:

pip install --trusted-host pypi.org --trusted-host pypi.python.org --trusted-host=files.pythonhosted.org google-cloud-speech==2.22.0

Verify Installation:
Open a Python interpreter or a Python script and import the google-cloud-speech library to ensure it’s installed correctly without any errors by running the following command:

C:\Utvecklingprogram\OpenAI>python
>>> import google.cloud.speech
>>>

If there are no errors, it means the library is successfully installed and accessible.
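If you also want to see which version pip actually installed, a small sketch like the following could help. It assumes Python 3.8 or newer (for importlib.metadata); the helper name installed_version is my own and not part of any Google library.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a pip package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "google-cloud-speech" is the pip package name from the install command above.
    print("google-cloud-speech:", installed_version("google-cloud-speech"))
```

If the install command above succeeded, this should print the pinned version; None means pip does not see the package in this Python environment.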

Set Up Authentication

In this post I am using Google Cloud services.

You’ll need to set up authentication. This typically involves creating a service account and providing its credentials to your application. The exact process may vary depending on your use case; refer to the Google Cloud documentation for details. Here is how to set up authentication on Google Cloud services, step by step:

1. Open the Google Cloud Console:
Open a web browser and type the following URL into the address bar:
https://console.cloud.google.com/
Sign in to your Google Account:
If you’re not already signed in to your Google Account, you’ll be prompted to sign in. Enter your Google account credentials (email and password) to sign in.

2. Select or Create a Project:
After signing in, you’ll be directed to the Google Cloud Console. If you have existing projects, you’ll see them listed. If not, you may need to create a new project.

To select an existing project, click on the project name dropdown in the top navigation bar and choose the desired project.
To create a new project, click on the project name dropdown, then click on the “New Project” button, follow the prompts to create the project, and ensure it’s selected in the top bar.

3. Navigate to Service Accounts:
Once you’ve selected or created a project, click on the menu icon (☰) in the upper left corner of the Cloud Console.
Go to “IAM & Admin” -> “Service Accounts” from the left-hand side menu.

Create and Manage Service Accounts:
Create a new service account, grant it the necessary permissions, and download the JSON key file, as described in the steps below.

4. On the “Service accounts” page for project “My First Project”, enter the service account details:
Service account name: give a unique name (for me: mehzanxx)
Service account ID: give a unique ID (for me: mehzanxx)
Service account description: Test AI Speech
Then press the “Create and continue” button.

Grant users access to this service account (optional):
Service account users role: mehzan07@yahoo.com
Service account admins role: mehzan07@yahoo.com

Create and download the key:

Visit the Google Cloud Console and sign in with your Google Account credentials.
Navigate to Service Accounts:

Click on the menu icon (☰) in the upper left corner to access the navigation menu.
Go to “IAM & Admin” -> “Service Accounts” from the left-hand side menu.
Select the Service Account:

Locate the service account for which you want to create a key.
Click on the service account’s name or email address.
Create a Key:

In the service account details page, locate the “Keys” section.
Click on the “Add Key” drop-down button and select “Create New Key.”
Choose Key Type and Create:

Select the key type as “JSON.”
Click the “Create” button. This action will download a JSON file containing your service account’s credentials to your computer.

The file will be downloaded to your computer’s default download location.
You may be prompted to choose a location to save the file. If prompted, select a secure location on your computer to save the JSON key file.

The file name looks like: natural-osprey-408612-73337e3649b4.json.

Create a new folder on your computer and move the file into it; for me it is: C:\GoogleServiceAccount
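Before using the key, it can be worth checking that the file you moved really is a service account key. The sketch below is only an illustration: the helper name check_key_file and the example file name your-key-file.json are hypothetical, and the checked fields are the ones I expect in a key downloaded from the Cloud Console. It never prints the key contents.

```python
import json
from pathlib import Path

# Fields I expect in a service account key file downloaded from the Cloud Console.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the key file (an empty list means it looks fine)."""
    key_path = Path(path)
    if not key_path.is_file():
        return ["file not found: {}".format(key_path)]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return ["not valid JSON: {}".format(exc)]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = ["missing field: {}".format(name) for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


if __name__ == "__main__":
    # Hypothetical file name; replace it with the name of your own downloaded key.
    print(check_key_file(r"C:\GoogleServiceAccount\your-key-file.json"))
```

An empty list means the file has the expected shape; it does not prove that the key is still valid on the Google side.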

Enable the Cloud Speech-to-Text API

Click on the menu icon (☰) in the upper left corner of the Cloud Console, select “APIs & Services”, then find the Cloud Speech-to-Text API and enable it. (You can also enable the Cloud Text-to-Speech API, which the Text to Speech section later in this post uses.)

Now you are ready to run the Python script that converts a voice file to text.

Create your Python script

  1. Create a folder in your local computer (for me: C:\Utvecklingprogram\OpenAI\speech_to_text)
  2. Copy and paste the following code to your Python script file in your folder:
    import os
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\GoogleServiceAccount\\natural-osprey-408612-73337e3649b4.json"
    
    
    from google.cloud import speech
    
    # Create a SpeechClient object
    client = speech.SpeechClient()
    
    # Specify the audio file to transcribe
    audio_file = "C:\\Utvecklingprogram\\OpenAI\\speech_to_text\\audio_files\\synthesis.wav"
    sample_rate = 24000  # Replace this with the actual sample rate of your audio file
    
    
    
    
    # Load the audio file
    with open(audio_file, "rb") as f:
        audio_data = f.read()
    
    # Configure the speech recognition
    audio = speech.RecognitionAudio(content=audio_data)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code="en-US",
    )
    
    # Perform the speech recognition
    response = client.recognize(config=config, audio=audio)
    
    # Print the recognized text
    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))
    
  3. Description of the code above:
  • Lines 1 to 2 set up authentication by pointing the GOOGLE_APPLICATION_CREDENTIALS environment variable at the service account key file.
  • Line 5 imports speech from Google Cloud.
  • Line 8 creates the client.
  • Line 11 sets the audio file path. (I created the audio file synthesis.wav and stored it in the folder .\OpenAI\speech_to_text\audio_files. You can use any tool to create a voice file.)
  • The remaining code opens the audio file, reads it, and converts it to text.
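The script needs the actual sample rate of the audio file. For a WAV file, one way to find it is to read the header with Python's built-in wave module, as in this sketch. The helper name read_wav_format is my own, and the demo writes one second of silence as a stand-in for a real recording.

```python
import os
import tempfile
import wave


def read_wav_format(path):
    """Return (sample_rate_hz, channels, bytes_per_sample) from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


if __name__ == "__main__":
    # Write one second of 16-bit mono silence at 24000 Hz as a stand-in recording.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(24000)
        wav.writeframes(b"\x00\x00" * 24000)
    print(read_wav_format(demo_path))  # (24000, 1, 2)
```

For the LINEAR16 encoding used in the script I would expect 2 bytes per sample. As far as I know, recognition assumes single-channel audio unless configured otherwise, so a channel count other than 1 is worth checking before sending the file.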

Run the App

  • Go to the folder where the Python code is stored (for me it is C:\Utvecklingprogram\OpenAI\speech_to_text and the file is ai_app_SpeechToText.py).
  • Start a command prompt and cd to your folder.
  • Run the following python command:
    python ai_app_SpeechToText.py
  • The output should be as follows:

ai-app-speech-to-text-1.png

As you can see, the text is “I am fine”, which is what is spoken in the voice file.
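For a longer recording the response can contain several results, each printed on its own line. If you would rather have one string, a helper along these lines could be used. This is a sketch, not part of the Google library: collect_transcripts is my own name, and the demo uses SimpleNamespace objects as stand-ins for a real response so that it runs without calling the API.

```python
from types import SimpleNamespace


def collect_transcripts(response):
    """Join the top alternative of every result into one string."""
    parts = []
    for result in response.results:
        if result.alternatives:  # skip results that carry no alternatives
            parts.append(result.alternatives[0].transcript.strip())
    return " ".join(parts)


if __name__ == "__main__":
    # Stand-in object with the same attribute names the script reads from the response.
    fake_response = SimpleNamespace(
        results=[SimpleNamespace(alternatives=[SimpleNamespace(transcript="I am fine")])]
    )
    print("Transcript: {}".format(collect_transcripts(fake_response)))
```

In the real script you would pass the object returned by client.recognize(...) instead of the stand-in.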

Source code is on my GitHub

Text to Speech

Now we want to convert text to speech.

You need the Google Cloud Text-to-Speech client library installed. You can install it via pip:

>pip install google-cloud-texttospeech

For the speech-to-text option you should have a voice file in the audio_files folder (synthesis.wav) that you want to convert to text.

Copy and paste the following code into a Python file and give it a name (e.g. TTS_STT.py):

import os
from google.cloud import speech_v1p1beta1 as speech
from google.cloud import texttospeech

# Set up Google Cloud credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\GoogleServiceAccount\\natural-osprey-408612-73337e3649b4.json"

# Function for Speech-to-Text (STT)
def speech_to_text(audio_file, sample_rate):
    # Create a SpeechClient object
    client = speech.SpeechClient()

    # Load the audio file
    with open(audio_file, "rb") as f:
        audio_data = f.read()

    # Configure the speech recognition
    audio = speech.RecognitionAudio(content=audio_data)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code="en-US",
    )

    # Perform the speech recognition
    response = client.recognize(config=config, audio=audio)

    # Print the recognized text
    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))

# Function for Text-to-Speech (TTS)
def text_to_speech(text):
    # Initialize a Text-to-Speech client
    tts_client = texttospeech.TextToSpeechClient()

    # Set text input
    synthesis_input = texttospeech.SynthesisInput(text=text)

    # Select voice parameters and audio format
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",  # Choose language code
        name="en-US-Wavenet-D",  # Choose voice name
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL  # Choose gender
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16  # Choose audio format
    )

    # Generate the speech
    response = tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Save the audio response to a file
    with open("output_audio.wav", "wb") as out:
        out.write(response.audio_content)

    print("Text-to-Speech conversion completed. Output saved as output_audio.wav")

# Ask the user for their choice
print("Choose an option:")
print("1. Speech to Text")
print("2. Text to Speech")
option = input("Enter your choice (1 or 2): ")

# Perform the selected action based on user input
if option == "1":
    # Assuming the audio file and sample rate are already defined
    audio_file = "C:\\Utvecklingprogram\\OpenAI\\speech_to_text\\audio_files\\synthesis.wav"
    sample_rate = 24000  # Replace this with the actual sample rate of your audio file
    speech_to_text(audio_file, sample_rate)
elif option == "2":
    text_to_convert = input("Enter the text you want to convert to speech: ")
    text_to_speech(text_to_convert)
else:
    print("Invalid choice. Please enter either 1 or 2.")

Start a command prompt and cd to the folder where the code is (e.g. C:\Utvecklingprogram\OpenAI\TextToSpeech_And_SpeechToText),

then run the following python command:

> python TTS_STT.py
It asks you to choose:
1. Speech to Text
2. Text to Speech

If you enter 1, the output should be:

Transcript: I am fine

If you enter 2, it asks you to enter a text. After you enter it, the script creates a voice file (output_audio.wav); open the file and you can hear the text you entered.
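If you want to confirm from Python that the generated file contains audio, one option is to compute its length. The sketch below assumes that output_audio.wav carries a standard WAV header, which is my understanding of LINEAR16 output from the Text-to-Speech API; the helper name wav_duration_seconds is my own.

```python
import os
import wave


def wav_duration_seconds(path):
    """Return the playing time of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    # Only report a duration if the script has already produced the file.
    if os.path.exists("output_audio.wav"):
        print("Duration: {:.2f} s".format(wav_duration_seconds("output_audio.wav")))
    else:
        print("output_audio.wav not found; run option 2 first.")
```

A duration of zero, or an error from wave.open, would suggest the file was not written as expected.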

Source code is on my GitHub

Conclusion

In this post we have installed google-cloud-speech, created a Google service account, created and downloaded a key, and enabled the Cloud Speech-to-Text API. We also installed google-cloud-texttospeech and created Python scripts that convert a voice file to text or text to speech.

My next post is: AI Health Check basic

This post is part of AI (Artificial Intelligence) step by step
