How To Convert Text To Speech With Azure Cognitive Services

In this Azure tutorial, we will discuss how to convert text to speech with Azure Cognitive Services. Along with this, we will also cover the topics below:

  • Azure Speech Services
  • Create the Azure Cognitive Services Speech API
  • Installing the Speech SDK
  • Text To Speech
  • Cognitive Services Speech SDK doesn’t support ‘Any CPU’ as a platform.
  • Text To Audio
  • Speech to text
  • How to convert speech to text with Azure Cognitive Services
  • Audio To Text

How to convert text to speech with Azure Cognitive Services

Here we will discuss how to convert text to speech with Azure Cognitive Services. Before getting into the actual functionality, we should know about Azure Speech Services.

Azure Speech Services

This is an end-to-end overview of Azure Speech Services. Azure Speech Services brings speech processing capabilities to your application, and it helps developers implement those capabilities with very little coding effort.

The speech processing capabilities include the following:

  • Speech to text: Convert speech to readable text with the help of the Azure Cognitive Services Speech API.
  • Text to speech: Convert text to audible speech easily with the help of the Azure Speech API.
  • Speech translation: Integrate speech translation into your apps using the Azure Cognitive Services Speech API.
  • Speaker recognition: Recognize who is speaking based on the audio.

As part of this article, we will see how to convert text to speech with Azure Cognitive Services, and we will also discuss how to convert speech to text. Before starting the actual functionality, let's look at the prerequisites.

Prerequisites

  • You must have a valid Azure subscription or a valid Azure account. If you don't have one yet, create an Azure free account now.
  • You need a Speech service resource along with your Azure subscription; we will create one in the next section.
  • Visual Studio 2019 needs to be installed on your local machine. If you don't have it, install Visual Studio 2019 now.

Assuming you have all the prerequisites in place, let's start the development activity. The first step is to create the Azure Cognitive Services Speech API.

Create the Azure Cognitive Services Speech API

Follow the steps below to create the Azure Cognitive Services Speech API using the Azure portal.

Log in to the Azure portal (https://portal.azure.com/).

Then, search for Speech and click on the Speech result under Marketplace, as highlighted below.

Setting up Azure Cognitive Services Azure Portal

On the Create window, you need to provide the following details:

  • Name: Provide a name for the Azure Cognitive Services Speech API.
  • Subscription: Provide a valid subscription that you want to use to create the Speech Cognitive Services resource.
  • Location: Provide a location for the Speech Cognitive Services resource.
  • Pricing tier: Choose the Free F0 pricing tier, which is free and suitable for demo purposes. You can click on the View full pricing details link to check all price details and select a tier based on your requirements.
  • Resource group: Select an existing resource group, or create a new one by clicking on the Create new link if you don't have any.

Once you have provided all the above options, click on the Create button to create the Speech Azure Cognitive Services resource.

How to configure Azure Cognitive Services Azure Portal

On the screen below, click on the Go to resource button to navigate to the Speech Cognitive Services resource that you created.

azure speech service configuration

Now you can see the Cognitive Services resource you just created. Click on the Overview tab; on the resource page you can see the location of the Speech resource. Note down this location, as we will use it in the code in the steps below.

How to configure Azure Cognitive Services using Azure Portal

We also need the key for the Speech resource to use in our code. On the resource page, click on the Keys and Endpoint link in the left navigation. You can now see the KEY 1 and KEY 2 values; click on the copy button to copy KEY 1 to the clipboard, as highlighted below. We will use this key value in the code.

If you no longer want the old keys, you can click on the Regenerate Key1 or Regenerate Key2 button to generate new ones. To reveal the key values, click on the Show Keys button.

Setting up Speech Azure Cognitive Services Azure Portal
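To see where these two values go, here is a minimal sketch with placeholder values (not real ones): later in this article, the key and the location are passed to SpeechConfig.FromSubscription, key first and location/region second.

using System;
using Microsoft.CognitiveServices.Speech;

class SpeechConfigSketch
{
    static void Main()
    {
        // Placeholders: paste the KEY 1 value and the location shown on your own Speech resource.
        const string Key = "<your-speech-key>";
        const string Location = "<your-location>"; // for example, the region shown on the Overview tab

        // FromSubscription takes the subscription key first and the service region/location second.
        var config = SpeechConfig.FromSubscription(Key, Location);
        Console.WriteLine($"SpeechConfig created for location {Location}");
    }
}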

Now we are done with the first step of the development activity. Next, we will create a console application and add the C# code to convert speech to text and text to speech using Visual Studio 2019. Let's create the console application.

Create a console application using Visual Studio 2019

Open Visual Studio 2019 on your local machine and click on the Create a new project button on the Get started window.

Choose the Project template as Console App (.NET Framework) and then click on the Next button.

azure speech

Provide the following details on the Configure your new project window:

  • Project Name: Provide a unique name for your console application.
  • Location: Choose a location in your local machine to save your console application.
  • Framework: Select the latest framework version.

Finally, click on the Create button to create the console application.

azure speech to text tutorial

Now you can see that the project was created successfully without any issues.

microsoft azure speech to text

Installing the Speech SDK

The next step is to add the NuGet package that the Azure Speech service requires, named Microsoft.CognitiveServices.Speech. To add the NuGet package to your project, follow the steps below.

Right-click on the project and then click on Manage NuGet Packages, as shown below.

Cognitive Speech Services

Now, click on the Browse tab and search for Speech. Once the Microsoft.CognitiveServices.Speech package appears in the search results, select it and then click on the Install button to install the NuGet package.

azure speech to text api example
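If you prefer the command line, the same package can be installed from the Package Manager Console with Install-Package Microsoft.CognitiveServices.Speech, or from a terminal with dotnet add package Microsoft.CognitiveServices.Speech.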

Text To Speech

Now we will see how to convert text to speech with Azure Cognitive Services. Add the code below to your Program.cs file.

Note: Make sure to change the key value and location to match the Speech service you created above.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace DemoSpeechService
{
    class Program
    {
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        public static async Task ConvertTextToSpeech(string text)
        {
            // Build the speech configuration from the key and location noted earlier.
            var config = SpeechConfig.FromSubscription(Key, Location);

            using (var synthesizer = new SpeechSynthesizer(config))
            {
                // Synthesize the text and play it on the default speaker.
                using (var result = await synthesizer.SpeakTextAsync(text))
                {
                    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                    {
                        Console.WriteLine($"Speech converted to speaker for text [{text}]");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"Cancelled with Error Code {cancellation.ErrorCode}");
                            Console.WriteLine($"Cancelled with Error Details [{cancellation.ErrorDetails}]");
                        }
                    }
                }

                Console.WriteLine("Waiting to play the audio again...");
                Console.ReadKey();
            }
        }

        static void Main()
        {
            ConvertTextToSpeech("Hello, how are you? Welcome to AzureLessons").Wait();
        }
    }
}

Cognitive Services Speech SDK doesn’t support ‘Any CPU’ as a platform.

Now, run the application. You will get the error "Cognitive Services Speech SDK doesn't support 'Any CPU' as a platform." This happens because, by default, Visual Studio sets the project to compile for Any CPU, so you need to choose a specific target platform. Follow the steps below to fix this issue.

You can do this by clicking the drop-down arrow next to Any CPU and then clicking the Configuration Manager option to set up a target platform.

Cognitive Services Speech SDK doesn't support 'Any CPU'

In the Active solution Platform drop-down menu, click New, as shown below.

microsoft speech to text api

If you are using a 32-bit operating system, choose x86. If your operating system is 64-bit, you can target either x86 or x64. Then select Any CPU under Copy settings from and click on the OK button.

microsoft speech to text

Now you can see that the Speech project has been assigned to the build platform you selected, as shown below.

azure cognitive services speech to text api

Now run the application again; you won't get the error this time. Once it runs, you will hear the statement "Hello, how are you? Welcome to AzureLessons" played through your laptop speaker.

azure speech to text rest api
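By default the service picks a voice for you. If you want the audio in a specific voice, the SDK lets you set a voice name on the SpeechConfig before creating the synthesizer. Below is a minimal sketch of a method you could add to the same Program class (so it reuses the Key and Location constants); the voice name en-US-JennyNeural is only an example, so substitute any voice available in your region.

// A minimal sketch (not part of the original listing): synthesize with a specific voice.
// "en-US-JennyNeural" is only an example voice name; use any voice available in your region.
public static async Task ConvertTextToSpeechWithVoice(string text, string voiceName)
{
    var config = SpeechConfig.FromSubscription(Key, Location);
    config.SpeechSynthesisVoiceName = voiceName;

    using (var synthesizer = new SpeechSynthesizer(config))
    using (var result = await synthesizer.SpeakTextAsync(text))
    {
        Console.WriteLine($"Synthesis finished with reason: {result.Reason}");
    }
}

You would call it from Main in the same way, for example ConvertTextToSpeechWithVoice("Hello", "en-US-JennyNeural").Wait();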

Text To Audio

Above, we wrote the C# logic to convert text to speech directly. Now, instead of playing the text straight to the speaker, we will write code that creates an audio file containing the spoken text.

We will refactor the existing code, add a new method, and add a few new namespaces.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

The new method looks like this:

public static async Task ConvertTextToAudioFile(string text, string fileName)
{
    var config = SpeechConfig.FromSubscription(Key, Location);

    // Create the output file and write a "RIFF" marker as a placeholder WAV header.
    using (FileStream stream = new FileStream(fileName, FileMode.Create))
    using (BinaryWriter writer = new BinaryWriter(stream))
        writer.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));

    // Direct the synthesizer output to the .wav file instead of the speaker.
    using (var fileOutput = AudioConfig.FromWavFileOutput(fileName))
    using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
        await Conversion(text, synthesizer);
}

The code inside the Main method looks like this:

static void Main()
{
    string txt = "Hello, how are you? Welcome to AzureLessons";
    string fn = @"E:\Bijay\Test\hello.wav";
    ConvertTextToAudioFile(txt, fn).Wait();
}

Text to speech and audio code

The complete code for converting text to speech and to an audio file is shown below.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        // Synthesize the text and play it on the default speaker.
        public static async Task ConvertTextToSpeech(string text)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            using (var synthesizer = new SpeechSynthesizer(config))
                await Conversion(text, synthesizer);
        }

        // Synthesize the text into a .wav file instead of the speaker.
        public static async Task ConvertTextToAudioFile(string text, string fileName)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);

            // Create the output file and write a "RIFF" marker as a placeholder WAV header.
            using (FileStream stream = new FileStream(fileName, FileMode.Create))
            using (BinaryWriter writer = new BinaryWriter(stream))
                writer.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));

            using (var fileOutput = AudioConfig.FromWavFileOutput(fileName))
            using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
                await Conversion(text, synthesizer);
        }

        // Shared synthesis logic with basic error handling.
        private static async Task Conversion(string text, SpeechSynthesizer synthesizer)
        {
            using (var result = await synthesizer.SpeakTextAsync(text))
            {
                if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                {
                    Console.WriteLine($"Speech converted to speaker for text [{text}]");
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"Cancelled with Error Code {cancellation.ErrorCode}");
                        Console.WriteLine($"Cancelled with Error Details [{cancellation.ErrorDetails}]");
                    }
                }
            }

            Console.WriteLine("Waiting to play the audio again...");
            Console.ReadKey();
        }

        static void Main()
        {
            string txt = "Hello, how are you? Welcome to AzureLessons";
            string fn = @"E:\Bijay\Test\hello.wav";

            ConvertTextToSpeech(txt).Wait();
            ConvertTextToAudioFile(txt, fn).Wait();
        }
    }
}

Once you run the above code, you will find that the .wav file is generated in the local path you specified.
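If you want to verify the generated file from code, a .NET Framework console app can play it back with System.Media.SoundPlayer. This is only a small optional sketch, not part of the original walkthrough; the path below is the one used above.

using System.Media;

class PlayGeneratedWav
{
    static void Main()
    {
        // Path of the .wav file generated by ConvertTextToAudioFile above.
        var player = new SoundPlayer(@"E:\Bijay\Test\hello.wav");
        player.PlaySync(); // blocks until playback finishes
    }
}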

Speech to text

So far we have discussed how to convert text to speech with Azure Cognitive Services. Now let's see how to convert speech to text with Azure Cognitive Services. Here we will use the SpeechRecognizer class instead of the SpeechSynthesizer class.

How to convert speech to text with Azure Cognitive Services

Add the code below to your Program.cs file.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        // Listen on the default microphone and recognize a single utterance.
        public static async Task ConvertSpeechToText()
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            using (var recognizer = new SpeechRecognizer(config))
                await Identify(recognizer);
        }

        private static async Task Identify(SpeechRecognizer recognizer)
        {
            var result = await recognizer.RecognizeOnceAsync();

            if (result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"Recognized: {result.Text}");
            }
            else if (result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine("Speech could not be recognized.");
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = CancellationDetails.FromResult(result);
                Console.WriteLine($"Cancelled due to reason={cancellation.Reason}");

                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error code={cancellation.ErrorCode}");
                    Console.WriteLine($"Error details={cancellation.ErrorDetails}");
                }
            }
        }

        static void Main()
        {
            ConvertSpeechToText().Wait();
            Console.ReadLine();
        }
    }
}

Now run the application and speak something; the recognized text will be displayed in the console window, as shown below.

How to convert speech to text with Azure Cognitive Services
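The recognizer above relies on the SDK defaults for the audio input and the recognition language. If you want to set these explicitly, here is a minimal sketch of a variant you could add to the same class; the language code en-US is only an example.

// A minimal sketch (not part of the original listing): set the microphone input
// and the recognition language explicitly instead of relying on the defaults.
public static async Task ConvertSpeechToTextExplicit()
{
    var config = SpeechConfig.FromSubscription(Key, Location);
    config.SpeechRecognitionLanguage = "en-US"; // example language code

    using (var audioInput = AudioConfig.FromDefaultMicrophoneInput())
    using (var recognizer = new SpeechRecognizer(config, audioInput))
        await Identify(recognizer);
}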

Audio To Text

Here we will use a .wav file stored on the local machine and transcribe the recorded speech to text. You can use the code below in your Program.cs file.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        // Read the given .wav file and transcribe it using continuous recognition.
        public static async Task AudioToTextContinuousAsync(string fn)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            using (var audioInput = AudioConfig.FromWavFileInput(fn))
            using (var recognizer = new SpeechRecognizer(config, audioInput))
                await RecognizeAllSpeech(recognizer);
        }

        private static async Task RecognizeAllSpeech(SpeechRecognizer recognizer)
        {
            // Completed when the whole file has been processed (or an error occurs).
            var tc = new TaskCompletionSource<int>();

            recognizer.Recognizing += (sender, eventargs) =>
            {
                // Fires with intermediate (partial) results; nothing to do here.
            };

            recognizer.Recognized += (sender, eventargs) =>
            {
                if (eventargs.Result.Reason == ResultReason.RecognizedSpeech)
                    Console.WriteLine($"Recognized: {eventargs.Result.Text}");
            };

            recognizer.Canceled += (sender, eventargs) =>
            {
                if (eventargs.Reason == CancellationReason.Error)
                    Console.WriteLine("Error reading the audio file.");
                if (eventargs.Reason == CancellationReason.EndOfStream)
                    Console.WriteLine("End of file.");

                tc.TrySetResult(0);
            };

            recognizer.SessionStarted += (sender, eventargs) =>
            {
                // Recognition session started.
            };

            recognizer.SessionStopped += (sender, eventargs) =>
            {
                // Recognition session finished; signal completion.
                tc.TrySetResult(0);
            };

            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Wait until the Canceled or SessionStopped handler signals completion.
            Task.WaitAny(new[] { tc.Task });

            await recognizer.StopContinuousRecognitionAsync();
        }

        // Single-shot recognition helper from the previous section (not used by Main here).
        private static async Task Identify(SpeechRecognizer recognizer)
        {
            var result = await recognizer.RecognizeOnceAsync();

            if (result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"Recognized: {result.Text}");
            }
            else if (result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine("Speech could not be recognized.");
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = CancellationDetails.FromResult(result);
                Console.WriteLine($"Cancelled due to reason={cancellation.Reason}");

                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error code={cancellation.ErrorCode}");
                    Console.WriteLine($"Error details={cancellation.ErrorDetails}");
                }
            }
        }

        static void Main()
        {
            string fn = @"C:\Users\Bijay\Desktop\VM\hello\sample1.wav";

            AudioToTextContinuousAsync(fn).Wait();
            Console.ReadLine();
        }
    }
}

Once you run the application, you will get the expected output.

speech to text with Azure Cognitive Services
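If the .wav file contains only one short utterance, you don't need continuous recognition. Here is a minimal sketch of a single-shot variant that reuses the Identify helper from the listing above:

// A minimal sketch (not part of the original article): single-shot recognition
// of a short .wav file, reusing the Identify helper defined above.
public static async Task AudioToTextOnceAsync(string fileName)
{
    var config = SpeechConfig.FromSubscription(Key, Location);
    using (var audioInput = AudioConfig.FromWavFileInput(fileName))
    using (var recognizer = new SpeechRecognizer(config, audioInput))
        await Identify(recognizer);
}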

So, using the above steps, we have discussed how to convert text to speech and how to convert speech to text with Azure Cognitive Services.

Wrapping Up

In this article, we discussed how to convert text to speech with Azure Cognitive Services, Azure Speech Services, creating the Azure Cognitive Services Speech API, installing the Speech SDK, text to speech, the "Cognitive Services Speech SDK doesn't support 'Any CPU' as a platform" error, text to audio, speech to text, how to convert speech to text with Azure Cognitive Services, and audio to text. Hope you have enjoyed this article!