Azure Text to Speech

How To Convert Text To Speech With Azure Cognitive Services

This Azure tutorial will discuss how to convert text to speech with Azure Cognitive Services.

Before discussing the actual functionality, we should know about Azure Speech Services.

Azure Speech Services

This section gives a short overview of Azure Speech Services. Azure Speech Services adds speech-processing capabilities to your application and lets developers implement them with very little code.

The speech-processing capabilities include the following:

  • Speech to text: Converts spoken audio to readable text with the Azure Cognitive Services Speech API.
  • Text to speech: Converts text to audible speech with the Speech API.
  • Speech translation: Lets you integrate speech translation into your apps using the Speech API.
  • Speaker recognition: Identifies who is speaking based on the audio.

Before getting started, make sure you have the prerequisites below.

Prerequisites

  • You must have a valid Azure subscription or Azure account. If you don’t have one yet, create an Azure Free Account now.
  • You need a Speech service resource in addition to the Azure subscription. The steps below show how to create one.
  • Visual Studio 2019 must be installed on your local machine. If you don’t have it, install Visual Studio 2019 now.

Assuming you have all the prerequisites in place, let’s start the development activity. The first step is creating the Azure Cognitive Services Speech API.

Create the Azure Cognitive Services Speech API

Follow the steps below to create the Azure Cognitive Services Speech API using the Azure Portal.

1. Log in to the Azure Portal (https://portal.azure.com/)

2. Search for Speech and click the Speech result under Marketplace.

3. On the Create window, provide the details below.

  • Name: Provide a name for the Azure Cognitive Services Speech API.
  • Subscription: Choose the subscription you want to use to create the Speech service.
  • Location: Choose a location for the Speech service.
  • Pricing tier: Choose Free F0, which is free and sufficient for this demo. You can click the View full pricing details link to check all pricing options and select one based on your requirements.
  • Resource group: Select an existing resource group, or click the Create new link to create one if you don’t have any.

Once you have provided all the above options, click the Create button to create the Speech Azure Cognitive Service.

On the next screen, click the Go to resource button to navigate to the Speech service you created.

Now you can see the Cognitive Service you just created. Click on the Overview tab, where you can see the location of the Speech service. Note down this location, because we need to use it in the code in the steps below.

We also need the key for the Speech service to use in our code. On the Cognitive Service page, click on the Keys and Endpoint link in the left navigation. You will see Key 1 and Key 2. Click the copy button to copy KEY 1 to the clipboard. We will use this key value in the code.

If you no longer want to use an old key, click the Regenerate Key1 or Regenerate Key2 button to generate a new one. To view the key values, click the Show Keys button.
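
Since the key is a secret, one option is to keep it out of the source code. The sketch below is a hypothetical helper (SpeechSettings is not part of the Speech SDK) that reads the key and location from environment variables. The variable names SPEECH_KEY and SPEECH_REGION are assumptions chosen for illustration.

```csharp
using System;

namespace DemoSpeechService
{
    // Hypothetical helper for illustration; not part of the Speech SDK.
    public static class SpeechSettings
    {
        // Assumed environment variable names; rename them to suit your setup.
        public const string KeyVariable = "SPEECH_KEY";
        public const string RegionVariable = "SPEECH_REGION";

        // Returns the trimmed value of a required environment variable,
        // or throws with a clear message if it is missing or empty.
        public static string Require(string variableName)
        {
            string value = Environment.GetEnvironmentVariable(variableName);
            if (string.IsNullOrWhiteSpace(value))
            {
                throw new InvalidOperationException(
                    $"Environment variable '{variableName}' is not set.");
            }
            return value.Trim();
        }
    }
}
```

With this in place, the two values passed to SpeechConfig.FromSubscription could come from SpeechSettings.Require(SpeechSettings.KeyVariable) and SpeechSettings.Require(SpeechSettings.RegionVariable) instead of hard-coded constants.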

Now we are done with the first step of the development activity. The next step is to create a console application in Visual Studio 2019 and add the C# code that converts text to speech. Let’s create the console application.

Create a console application using Visual Studio 2019

Open Visual Studio 2019 from your local machine and click the Create a New Project button in the Get Started window.

Choose the Project template as Console App (.NET Framework) and click the Next button.

Provide the details below on the Configure your new project window:

  • Project Name: Provide a unique name for your console application.
  • Location: Choose a location on your local machine to save your console application.
  • Framework: Select the latest framework version.

Finally, click on the Create button to create the console application.

You can see that the project was created successfully without any issues.

Installing the Speech SDK

The next step is to work with the Azure Speech service. For that, we need to add a NuGet package named Microsoft.CognitiveServices.Speech. To add the NuGet package to your project, follow the steps below.

Right-click on the project and then click on the Manage NuGet Packages link.

Now, click the Browse tab and search for Speech. Once Microsoft.CognitiveServices.Speech appears in the search results, select the package and click the Install button to install it.

Text To Speech

Now, we will see how to convert text to speech with Azure Cognitive Services. Add the code below to your Program.cs file.

Note: Make sure to change the key value and location to match the Speech service you created above.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace DemoSpeechService
{
    class Program
    {
        // Replace with the key and location of your own Speech service.
        private const string Key = "<your-speech-key>";
        private const string Location = "eastus"; // Azure Speech Service Location

        public static async Task ConvertTextToSpeech(string text)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);

            // Without an audio configuration, the synthesizer plays
            // the speech through the default speaker.
            using (var synthesizer = new SpeechSynthesizer(config))
            {
                using (var result = await synthesizer.SpeakTextAsync(text))
                {
                    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                    {
                        Console.WriteLine($"Speech converted to speaker for text [{text}]");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        // Report why synthesis was cancelled, including error details.
                        var cancellation =
                            SpeechSynthesisCancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine(
                                $"Cancelled with Error Code {cancellation.ErrorCode}");
                            Console.WriteLine(
                                $"Cancelled with Error Details [{cancellation.ErrorDetails}]");
                        }
                    }
                }
                Console.WriteLine("Waiting to play the audio again...");
                Console.ReadKey();
            }
        }

        static void Main()
        {
            ConvertTextToSpeech("Hello, how are you? " +
                "Welcome to AzureLessons").Wait();
        }
    }
}

Now, run the application. You will get the error “Cognitive Services Speech SDK doesn’t support ‘Any CPU’ as a platform.” This happens because, by default, Visual Studio sets the project to compile for Any CPU, so you need to choose a specific target platform. Follow the steps below to fix this issue.

Click the drop-down arrow next to Any CPU, and then click the Configuration Manager option to set up a target platform.

Click New in the Active Solution Platform drop-down menu.

If you are using a 32-bit operating system, you must choose x86. If you are using a 64-bit operating system, you can target x86 or x64. Select Any CPU under Copy settings from, and then click the OK button.

Now you can see that the project is assigned to the platform you selected.

Now, run the application again. You won’t get the error this time, and you will hear “Hello, how are you? Welcome to AzureLessons” through your speaker.

Text To Audio

Above, we wrote the C# logic to play the text directly as speech. Now, instead of playing the speech directly, we will write code that creates an audio file containing the spoken text.

We will refactor the existing code, add a new method, and add a few new namespaces.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

The new method looks like this. It calls a Conversion helper, which is defined in the complete listing later in this article.

public static async Task ConvertTextToAudioFile(string text, string filePath)
{
    var config = SpeechConfig.FromSubscription(Key, Location);

    // Create (or truncate) the output file and write the "RIFF" marker
    // that starts a WAV file.
    using (FileStream fs = new FileStream(filePath, FileMode.Create))
    using (BinaryWriter wr = new BinaryWriter(fs))
    {
        wr.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));
    }

    // Send the synthesized audio to the WAV file instead of the speaker.
    using (var fileOutput = AudioConfig.FromWavFileOutput(filePath))
    using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
    {
        await Conversion(text, synthesizer);
    }
}

The Main method then looks like this

static void Main()
{
    // Note the trailing space so the two sentences are not run together.
    string txt = "Hello, how are you? " +
        "Welcome to AzureLessons";
    string fn = @"E:\Bijay\Test\hello.wav";
    ConvertTextToAudioFile(txt, fn).Wait();
}

Text-to-speech and audio code

The complete code for converting text to speech and to an audio file is below

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        private const string Key = "<your-speech-key>"; // Replace with your own key
        private const string Location = "eastus"; // Azure Speech Service Location
        public static async Task ConvertTextToSpeech(string text)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            using (var ss = new SpeechSynthesizer(config))
                await Conversion(text, ss);
        }
        public static async Task ConvertTextToAudioFile(string text, string filePath)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);

            // Create (or truncate) the output file and write the "RIFF" marker
            // that starts a WAV file.
            using (FileStream fs = new FileStream(filePath, FileMode.Create))
            using (BinaryWriter wr = new BinaryWriter(fs))
            {
                wr.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));
            }

            // Send the synthesized audio to the WAV file instead of the speaker.
            using (var fileOutput = AudioConfig.FromWavFileOutput(filePath))
            using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
            {
                await Conversion(text, synthesizer);
            }
        }
        private static async Task Conversion(string text,
        SpeechSynthesizer synthesizer)
        {
            using (var r = await synthesizer.SpeakTextAsync(text))
            {
                if (r.Reason == ResultReason.SynthesizingAudioCompleted)
                    Console.WriteLine($"Speech converted " +
                    $"to speaker for text [{text}]");
                else if (r.Reason == ResultReason.Canceled)
                {
                    var cancellation =
                    SpeechSynthesisCancellationDetails.FromResult(r);
                    Console.WriteLine($"CANCELED: " +
                    $"Reason={cancellation.Reason}");
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"Cancelled with " +
                        $"Error Code {cancellation.ErrorCode}");
                        Console.WriteLine($"Cancelled with " +
                        $"Error Details " +
                        $"[{cancellation.ErrorDetails}]");
                    }
                }
            }
            Console.WriteLine("Waiting to play " +
            "the audio again...");
            Console.ReadKey();
        }

        static void Main()
        {
            string txt = "Hello, how are you? " +
            "Welcome to AzureLessons";
            string fn = @"E:\Bijay\Test\hello.wav";
            ConvertTextToSpeech(txt).Wait();
            ConvertTextToAudioFile(txt, fn).Wait();
        }
    }

}

Once you run the above code, the .wav file is generated at the specified local path.
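
As a quick sanity check on the generated file, the sketch below tests whether a file starts with a WAV header. WavFileCheck is a hypothetical helper written for this article, not part of the Speech SDK. It relies only on the standard WAV layout, where bytes 0 to 3 are "RIFF" and bytes 8 to 11 are "WAVE", and it does not validate the rest of the file.

```csharp
using System;
using System.IO;
using System.Text;

namespace DemoSpeechService
{
    // Hypothetical helper for illustration; not part of the Speech SDK.
    public static class WavFileCheck
    {
        // Returns true when the file begins with a RIFF/WAVE header.
        public static bool LooksLikeWav(string path)
        {
            byte[] header = new byte[12];
            using (FileStream fs = File.OpenRead(path))
            {
                int read = 0;
                while (read < header.Length)
                {
                    int n = fs.Read(header, read, header.Length - read);
                    if (n == 0)
                    {
                        return false; // File is shorter than a WAV header.
                    }
                    read += n;
                }
            }

            return Encoding.ASCII.GetString(header, 0, 4) == "RIFF"
                && Encoding.ASCII.GetString(header, 8, 4) == "WAVE";
        }
    }
}
```

For example, calling WavFileCheck.LooksLikeWav(fn) after ConvertTextToAudioFile(txt, fn).Wait() should return true if the synthesis wrote a complete file.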

Azure Text To Speech Rest API

You can also convert text to speech and get a list of the voices supported in a region by using the REST APIs. One point to note is that each REST API endpoint belongs to a particular region.
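
To illustrate the per-region endpoints, here is a minimal sketch of a hypothetical helper (SpeechRestEndpoints is not an SDK type) that builds the text-to-speech URLs for a region. The host pattern <region>.tts.speech.microsoft.com and the two paths reflect the documented REST endpoints as we understand them, so verify them against the current Azure documentation before relying on them.

```csharp
using System;

namespace DemoSpeechService
{
    // Hypothetical helper for illustration; not part of the Speech SDK.
    public static class SpeechRestEndpoints
    {
        // URL that returns the list of voices available in the given region.
        public static Uri VoicesList(string region)
        {
            return new Uri(
                $"https://{Normalize(region)}.tts.speech.microsoft.com/cognitiveservices/voices/list");
        }

        // URL that accepts a synthesis request and returns audio.
        public static Uri Synthesize(string region)
        {
            return new Uri(
                $"https://{Normalize(region)}.tts.speech.microsoft.com/cognitiveservices/v1");
        }

        // Region names such as "eastus" are used in lowercase in the host name.
        private static string Normalize(string region)
        {
            if (string.IsNullOrWhiteSpace(region))
            {
                throw new ArgumentException("Region must not be empty.", nameof(region));
            }
            return region.Trim().ToLowerInvariant();
        }
    }
}
```

A request to either URL would typically carry the key copied from the Keys and Endpoint page in the Ocp-Apim-Subscription-Key header. The region passed in should be the same location noted on the Overview tab.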

The text-to-speech REST API offers two types of voices:

  • Neural text-to-speech voice
  • Standard text-to-speech voice

It’s quite easy to implement text-to-speech functionality using the REST API with Azure Cognitive Services. Check out an example on Azure Text To Speech Rest API now.

Wrapping Up

In this article, we discussed how to convert text to speech with Azure. I hope you have enjoyed this article!