Azure Text To Speech

This Azure tutorial will discuss how to convert text to speech with Azure Cognitive Services.

Table of Contents

Azure Text to Speech
Text To Audio
- Text-to-speech and audio code
Azure Text To Speech Rest API
Wrapping Up

Before discussing the actual functionality, we should know about Azure Speech Services.

Azure Speech Services

This section introduces Azure Speech Services, which adds speech-processing capabilities to your application. It lets developers implement speech features with very little coding effort.

The speech-processing capabilities include the following:

Speech to text: Convert speech to readable text with the Azure Cognitive Services Speech API.
Text to speech: Convert text to audible speech with the Speech API.
Speech translation: Integrate speech translation into your apps with the Speech API.
Speaker recognition: Recognize who is speaking based on their audio.

Before starting the actual functionality, we should be aware of the Prerequisites needed here.

Prerequisites

You must have a valid Azure subscription or Azure account. If you don’t have one yet, create an Azure free account.
You also need a Speech service resource in that subscription; the next section shows how to create one.
Visual Studio 2019 must be installed on your local machine. If it is not, install Visual Studio 2019 now.

Assuming you have all the Prerequisites needed for the development activity, let’s begin the actual development process. The first step is creating the Azure Cognitive Services Speech API.

Create the Azure Cognitive Services Speech API

Follow the steps below to create the Azure Cognitive Services Speech API using the Azure Portal.

1. Log in to the Azure Portal (https://portal.azure.com/)

2. Then, search for ‘Speech’ and click the ‘Speech’ result under Marketplace.

3. On the Create window, you need to provide the following details.

Name: Provide a name for the Speech resource.
Subscription: Choose the subscription in which you want to create the Speech resource.
Location: Choose a location (region) for the Speech resource.
Pricing tier: Select the Free F0 pricing tier, which is complimentary and can be used for the demo. You can click the ‘View full pricing details’ link to review all the price details and select one that meets your requirements.
Resource group: You can select your existing resource group or create a new one by clicking on the Create new link if you don’t have any existing Resource groups.

Once you have provided all the above options, click the Create button to create the Speech Azure Cognitive Service.

On the screen below, click the Go to resource button to navigate to the Speech Azure Cognitive Service you created.

You can now see the Cognitive Service you just created. Click the Overview tab, which shows the location (region) of the Speech resource. Note it down, because the code in the steps below needs it.

The code also needs a key for the Speech Cognitive Service. On the Cognitive Service page, click the Keys and Endpoint link in the left navigation, where Key 1 and Key 2 are listed. Click the copy button to copy KEY 1 to the clipboard. We will use this key value in the code.

If you no longer want to use an old key, click the Regenerate Key1 or Regenerate Key2 button to generate a new one. To view the key values, click the Show Keys button.
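Rather than pasting the key into source code, one option is to read it from environment variables. The sketch below is only an illustration: the variable names SPEECH_KEY and SPEECH_REGION and the helper class SpeechSettings are hypothetical and do not come from this tutorial's code.

```csharp
using System;

public static class SpeechSettings
{
    // Hypothetical variable names; choose whatever fits your environment.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    // Returns the trimmed value of a required environment variable,
    // or throws with a message that names the missing variable.
    public static string GetRequired(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                "Environment variable '" + name + "' is not set.");
        }
        return value.Trim();
    }
}

public static class SettingsDemo
{
    public static void Main()
    {
        try
        {
            string key = SpeechSettings.GetRequired(SpeechSettings.KeyVariable);
            string region = SpeechSettings.GetRequired(SpeechSettings.RegionVariable);
            // Never print the key itself; its length is enough to confirm it loaded.
            Console.WriteLine("Loaded a key of length " + key.Length +
                              " for region " + region);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

The two values can then be passed to SpeechConfig.FromSubscription in place of hard-coded constants, which keeps the key out of source control.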

Now, we have completed our first step in the development activity. The next step is to create a console application and add the C# code to convert text to speech using Visual Studio 2019. Let’s create a console application.

Create a console application using Visual Studio 2019

Open Visual Studio 2019 from your local machine and click the Create a New Project button in the Get Started window.

Choose the Project template as Console App (.NET Framework) and click the Next button.

Provide the details below on the Configure your new project window

Project Name: Provide a unique name for your console application.
Location: Choose a location on your local machine to save your console application.
Framework: Select the latest framework.

Finally, click on the Create button to create the console application.

You can see that the project was created successfully without any issues.

Installing the Speech SDK

Next, we will work with the Azure Speech service. We need to add a NuGet package named Microsoft.CognitiveServices.Speech. To add the NuGet package to your project, follow the steps below:

Right-click the project and then click Manage NuGet Packages.

Now, click the Browse tab and search for ‘Speech’. Once Microsoft.CognitiveServices.Speech appears in the search results, select it and click the Install button.

Text To Speech

Now, we will see how to convert text to speech with Azure Cognitive Services. Add the code below to your Program.cs file.

Note: Ensure that you update the key value and location according to the speech service you created above.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;


namespace DemoSpeechService
{
    class Program
    {
        // Replace with your own key; never commit a real key to source control.
        private const string Key = "YOUR_SPEECH_KEY";
        private const string Location = "eastus"; // Azure Speech Service Location
        public static async Task ConvertTextToSpeech(string text)
        {
            
            var confg = SpeechConfig.FromSubscription(Key, Location);
            
            using (var converter = new SpeechSynthesizer(confg))
            {
                using (var r = await converter.SpeakTextAsync(text))
                {
                    if (r.Reason ==
                    ResultReason.SynthesizingAudioCompleted)
                        Console.WriteLine($"Speech converted " +
                        $"to speaker for text [{text}]");
                    else if (r.Reason == ResultReason.Canceled)
                    {
                        var canc =
                       SpeechSynthesisCancellationDetails.FromResult(r);
                        Console.WriteLine($"CANCELED: " +
                         $"Reason={canc.Reason}");
                        if (canc.Reason ==
                        CancellationReason.Error)
                        {
                            Console.WriteLine($"Cancelled with " +
                            $"Error Code {canc.ErrorCode}");
                            Console.WriteLine($"Cancelled with " +
                            $"Error Details " +
                           $"[{canc.ErrorDetails}]");
                        }
                    }
                }
                Console.WriteLine("Waiting to play " +
                "the audio again...");
                Console.ReadKey();
            }
        }
        static void Main()
        {
            ConvertTextToSpeech("Hello, how are you? " +
            "Welcome to AzureLessons").Wait();
        }

    }
}

Now, run the application, and you will get the error “Cognitive Services Speech SDK doesn’t support ‘Any CPU’ as a platform.” This happens because, by default, Visual Studio sets the project to compile for Any CPU, so you need to choose a specific target platform. Follow the steps below to fix this issue:

You can do this by clicking the drop-down arrow next to Any CPU. Then, click the Configuration Manager option to set up a target platform

Click New in the Active Solution Platform drop-down menu, as shown below.

If you are using a 32-bit operating system, you must choose x86. If you are using a 64-bit operating system, you can target x86 or x64. Select Any CPU under Copy settings from, and then click the OK button.

Now you can see that the Speech project was assigned to the build option you selected, as shown below.

Now, run the application again. This time the error does not appear, and you can hear “Hello, how are you? Welcome to AzureLessons” through your speakers.

Text To Audio

Above, we wrote the C# logic to speak the text directly. Now, instead of playing the speech, we will write code that creates an audio file of the spoken text.

We will refactor the existing code, adding a new method and two new namespaces. The shared result handling moves into a Conversion helper, which appears in the complete code further down.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

The new method looks like this:

public static async Task ConvertTextToAudioFile(string text,
    string fileName)
{
    var config = SpeechConfig.FromSubscription(Key, Location);

    // Create (or truncate) the output file before handing its path to the SDK.
    using (FileStream fs = new FileStream(fileName, FileMode.Create))
    using (BinaryWriter wr = new BinaryWriter(fs))
        wr.Write(
            System.Text.Encoding.ASCII.GetBytes("RIFF"));

    // Direct the synthesized audio to the WAV file instead of the speaker.
    using (var fw = AudioConfig.FromWavFileOutput(fileName))
    using (var ss = new SpeechSynthesizer(config, fw))
        await Conversion(text, ss);
}

The code inside the Main method looks like this:

static void Main()
{
    // Note the trailing space so the two sentences do not run together.
    string txt = "Hello, how are you? " +
        "Welcome to AzureLessons";
    string fn = @"E:\Bijay\Test\hello.wav";
    ConvertTextToAudioFile(txt, fn).Wait();
}
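FileMode.Create throws a DirectoryNotFoundException when the target folder does not exist, so a hard-coded path like the one above only works on a machine that already has that folder. Here is a minimal sketch of one way to prepare the path first. The helper name PrepareOutputPath and the choice of the temp folder are assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class OutputPaths
{
    // Combines a folder and a file name, creates the folder if it is missing,
    // and returns the full path of the file to write.
    public static string PrepareOutputPath(string folder, string fileName)
    {
        // CreateDirectory does nothing if the folder already exists.
        Directory.CreateDirectory(folder);
        return Path.GetFullPath(Path.Combine(folder, fileName));
    }
}

public static class OutputPathDemo
{
    public static void Main()
    {
        // Hypothetical location: a "SpeechDemo" folder under the user's temp folder.
        string folder = Path.Combine(Path.GetTempPath(), "SpeechDemo");
        string fn = OutputPaths.PrepareOutputPath(folder, "hello.wav");
        Console.WriteLine("Audio will be written to " + fn);
    }
}
```

The returned path can be passed as the second argument of ConvertTextToAudioFile.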

Text-to-speech and audio code

Now, the complete code for converting text to speech and to an audio file is as follows:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        // Replace with your own key; never commit a real key to source control.
        private const string Key = "YOUR_SPEECH_KEY";
        private const string Location = "eastus"; // Azure Speech Service Location
        public static async Task ConvertTextToSpeech(string text)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            using (var ss = new SpeechSynthesizer(config))
                await Conversion(text, ss);
        }
        public static async Task ConvertTextToAudioFile(string text,
        string fileName)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            // Create (or truncate) the output file before handing its path to the SDK.
            using (FileStream fs = new FileStream(fileName, FileMode.Create))
            using (BinaryWriter wr = new BinaryWriter(fs))
                wr.Write(
                System.Text.Encoding.ASCII.GetBytes("RIFF"));
            // Direct the synthesized audio to the WAV file instead of the speaker.
            using (var fw = AudioConfig.FromWavFileOutput(fileName))
            using (var ss = new
            SpeechSynthesizer(config, fw))
                await Conversion(text, ss);
        }
        private static async Task Conversion(string text,
        SpeechSynthesizer synthesizer)
        {
            using (var r = await synthesizer.SpeakTextAsync(text))
            {
                if (r.Reason == ResultReason.SynthesizingAudioCompleted)
                    Console.WriteLine($"Speech converted " +
                    $"to speaker for text [{text}]");
                else if (r.Reason == ResultReason.Canceled)
                {
                    var cancellation =
                    SpeechSynthesisCancellationDetails.FromResult(r);
                    Console.WriteLine($"CANCELED: " +
                    $"Reason={cancellation.Reason}");
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"Cancelled with " +
                        $"Error Code {cancellation.ErrorCode}");
                        Console.WriteLine($"Cancelled with " +
                        $"Error Details " +
                        $"[{cancellation.ErrorDetails}]");
                    }
                }
            }
            Console.WriteLine("Waiting to play " +
            "the audio again...");
            Console.ReadKey();
        }

        static void Main()
        {
            // Note the trailing space so the two sentences do not run together.
            string txt = "Hello, how are you? " +
            "Welcome to AzureLessons";
            string fn = @"E:\Bijay\Test\hello.wav";
            ConvertTextToSpeech(txt).Wait();
            ConvertTextToAudioFile(txt, fn).Wait();
        }
    }

}

Once you run the above code, the WAV file is generated at the local path you specified.
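To confirm that the generated file really is a WAV file, you can inspect its header. A WAV file starts with the ASCII characters "RIFF", followed by a 4-byte chunk size and the characters "WAVE". The sketch below checks only those 12 bytes. The class and method names are hypothetical, and the check does not validate the audio data itself.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavCheck
{
    // Returns true if the first 12 bytes look like a RIFF/WAVE header:
    // "RIFF", a 4-byte chunk size, then "WAVE".
    public static bool LooksLikeWav(byte[] header)
    {
        if (header == null || header.Length < 12) return false;
        string riff = Encoding.ASCII.GetString(header, 0, 4);
        string wave = Encoding.ASCII.GetString(header, 8, 4);
        return riff == "RIFF" && wave == "WAVE";
    }

    // Reads the first 12 bytes of the file and checks them.
    public static bool FileLooksLikeWav(string path)
    {
        byte[] header = new byte[12];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(header, 0, header.Length);
            if (read < header.Length) return false; // file too short
        }
        return LooksLikeWav(header);
    }
}

public static class WavCheckDemo
{
    public static void Main(string[] args)
    {
        if (args.Length == 0 || !File.Exists(args[0]))
        {
            Console.WriteLine("Usage: pass the path of an existing .wav file");
            return;
        }
        Console.WriteLine(WavCheck.FileLooksLikeWav(args[0])
            ? "Looks like a WAV file"
            : "Not a WAV file");
    }
}
```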

Azure Text To Speech Rest API

You can also use the REST APIs to convert text into speech and to obtain a list of the voices supported in a region. Note that each REST API endpoint is associated with a specific region.

The text-to-speech REST API has two types of voices:

Neural text-to-speech voice
Standard text-to-speech voice

It’s quite easy to implement text-to-speech functionality using the REST API with the help of Azure Cognitive Services. Check out an example on the Azure Text To Speech Rest API now.
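As a rough illustration of the REST approach, the sketch below posts an SSML document to the region-specific endpoint and saves the returned audio. It is a sketch under stated assumptions, not a definitive implementation: the class SpeechRest, the environment variable names SPEECH_KEY and SPEECH_REGION, the voice en-US-JennyNeural, the output file name, and the User-Agent value are all choices made here for illustration. It also assumes the project references System.Net.Http.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Security;
using System.Text;
using System.Threading.Tasks;

public static class SpeechRest
{
    // Builds the region-specific text-to-speech endpoint.
    public static string BuildEndpoint(string region)
    {
        return "https://" + region + ".tts.speech.microsoft.com/cognitiveservices/v1";
    }

    // Builds the region-specific endpoint that lists the available voices.
    public static string BuildVoicesListEndpoint(string region)
    {
        return "https://" + region + ".tts.speech.microsoft.com/cognitiveservices/voices/list";
    }

    // Wraps plain text in a minimal SSML document, escaping XML special characters.
    public static string BuildSsml(string text, string voiceName)
    {
        return "<speak version='1.0' xml:lang='en-US'>" +
               "<voice name='" + voiceName + "'>" +
               SecurityElement.Escape(text) +
               "</voice></speak>";
    }

    // Sends the SSML to the service and writes the returned audio bytes to a file.
    public static async Task SynthesizeToFileAsync(
        string key, string region, string text, string voiceName, string outputPath)
    {
        using (var client = new HttpClient())
        using (var request = new HttpRequestMessage(HttpMethod.Post, BuildEndpoint(region)))
        {
            request.Headers.Add("Ocp-Apim-Subscription-Key", key);
            request.Headers.Add("X-Microsoft-OutputFormat", "riff-24khz-16bit-mono-pcm");
            request.Headers.Add("User-Agent", "DemoSpeechService");
            request.Content = new StringContent(
                BuildSsml(text, voiceName), Encoding.UTF8, "application/ssml+xml");

            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                byte[] audio = await response.Content.ReadAsByteArrayAsync();
                File.WriteAllBytes(outputPath, audio);
            }
        }
    }
}

public static class SpeechRestDemo
{
    public static void Main()
    {
        // Hypothetical environment variable names for the key and region.
        string key = Environment.GetEnvironmentVariable("SPEECH_KEY");
        string region = Environment.GetEnvironmentVariable("SPEECH_REGION");
        if (string.IsNullOrEmpty(key) || string.IsNullOrEmpty(region))
        {
            Console.WriteLine("Set SPEECH_KEY and SPEECH_REGION first.");
            return;
        }

        SpeechRest.SynthesizeToFileAsync(
            key, region, "Hello, how are you? Welcome to AzureLessons",
            "en-US-JennyNeural", "hello-rest.wav").GetAwaiter().GetResult();
        Console.WriteLine("Saved hello-rest.wav");
    }
}
```

Because the region is part of the host name, a key created for one region will not work against another region's endpoint.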

Wrapping Up

In this article, we discussed how to convert text to speech with Azure. I hope you have enjoyed this article!

Rajkishore

I am Rajkishore, a Microsoft Certified IT Consultant. I have over 14 years of experience in Microsoft Azure and AWS, including Azure Functions, Storage, Virtual Machines, Logic Apps, PowerShell commands, CLI commands, Machine Learning, AI, Azure Cognitive Services, and DevOps. I also have hands-on experience designing and developing cloud-native data integrations on Azure and AWS. I hope you will learn from these practical Azure tutorials.