
In this Azure tutorial, we will discuss how to convert text to speech with Azure Cognitive Services. Along with this, we will also discuss the below topics:
- Azure Speech Services
- Create the Azure Cognitive Services Speech API
- Installing the Speech SDK
- Text To Speech
- Cognitive Services Speech SDK doesn’t support ‘Any CPU’ as a platform.
- Text To Audio
- Speech to text
- How to convert speech to text with Azure Cognitive Services
- Audio To Text
- Cognitive Services Speech SDK
- Azure Speech To Text Python
- Azure Speech To Text JavaScript
- Azure Text To Speech Rest API
- Azure Text To Speech Pricing
Table of Contents
- How to convert text to speech with Azure Cognitive Services
- Azure Speech Services
- Prerequisites
- Create the Azure Cognitive Services Speech API
- Create a console application using Visual Studio 2019
- Installing the Speech SDK
- Text To Speech
- Cognitive Services Speech SDK doesn’t support ‘Any CPU’ as a platform.
- Text To Audio
- Text to speech and audio code
- Azure Speech to text
- How to convert speech to text with Azure Cognitive Services
- Audio To Text
- Cognitive Services Speech SDK
- Azure Speech To Text Python
- Azure Speech To Text JavaScript
- Azure Text To Speech Rest API
- Azure Text To Speech Pricing
- Wrapping Up
How to convert text to speech with Azure Cognitive Services
Well, here we will discuss How to convert text to speech with Azure Cognitive Services. Before discussing the actual functionality, we should know about Azure Speech Services.
Azure Speech Services
Well, here we will walk through an end-to-end tutorial on Azure Speech Services. Azure Speech Services adds speech processing capabilities to your application, and it helps developers implement speech processing with very little coding effort.
The speech processing capabilities include the following:
- Speech to text: Convert speech into readable text with the help of the Azure Cognitive Speech API.
- Text to speech: Convert text into audible speech easily with the help of the Azure Speech API.
- Speech translation: Integrate speech translation into your apps easily using the Azure Cognitive Services Speech API.
- Speaker recognition: Recognize who is speaking based on the audio.
As part of this article, we will see How to convert text to speech with Azure Cognitive Services, and we will also discuss How to convert speech to text with Azure Cognitive Services. Before starting with the actual functionality, let's go through the prerequisites.
Prerequisites
- You must have a valid Azure subscription or a valid Azure account. If you don't have one yet, create an Azure Free Account now.
- You need a Speech service resource in your Azure subscription; we will create one in the steps below.
- Visual Studio 2019 needs to be installed on your local machine. If you don’t have it on your local machine, Install Visual Studio 2019 now.
Assuming that you have all the prerequisites needed for the development activity, let's start. The first step is to create the Azure Cognitive Services Speech API.
Create the Azure Cognitive Services Speech API
Follow the below steps to Create the Azure Cognitive Services Speech API using Azure Portal.
Login to the Azure Portal (https://portal.azure.com/)
Then, search for Speech and click on the Speech search result under Marketplace, as highlighted below.

On the Create window, you need to provide the below details.
- Name: You need to provide a name for the Azure Cognitive Services Speech API.
- Subscription: You need to Provide a valid subscription that you want to use to create the Speech Azure Cognitive Services.
- Location: You need to provide a location for the Speech Azure Cognitive Services.
- Pricing tier: Choose the pricing tier as Free F0, which is free and can be used for demo purposes. You can click on the view full pricing details link to check all the price details and select a tier based on your requirements.
- Resource group: You can select your existing resource group or you can create a new one by clicking on the Create new link if you don’t have any existing Resource Group.
Once you have provided all the above options, click on the Create button to create the Speech Azure Cognitive Service.

On the below screen, click on the Go to resource button to navigate to the Speech Azure Cognitive Service that you have created.

Now you can see the Cognitive Service that you have just created. Click on the Overview tab on the Cognitive Service page, where you can see the location of the Speech Cognitive Service. Note down this location; we will need it in the code in the steps below.

We also need the key for the Speech Cognitive Service to use in our code. On the Cognitive Service page, click on the Keys and Endpoint link in the left navigation. You can now see the KEY 1 and KEY 2 options; click on the copy button to copy KEY 1 to the clipboard, as highlighted below. We will use this key value in the code.
If the key values are hidden, click on the Show Keys button. If you want to invalidate the old keys, you can click on the Regenerate Key1 or Regenerate Key2 button to generate new ones.
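Tip: rather than pasting the key directly into your source code, you can store the key and region in environment variables and read them at runtime, for example:
// Read the Speech key and region from environment variables (the variable names here are just examples)
string key = Environment.GetEnvironmentVariable("SPEECH_KEY");
string region = Environment.GetEnvironmentVariable("SPEECH_REGION");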

Now we are done with the first step of the development activity. Next, we will create a console application and add the C# code to convert text to speech and speech to text using Visual Studio 2019. Let's create the console application.
Create a console application using Visual Studio 2019
Open Visual Studio 2019 on your local machine and click on the Create a new project button on the Get started window.
Choose the Project template as Console App (.NET Framework) and then click on the Next button.

Provide the below details on the Configure your new project window:
- Project Name: Provide a unique name for your console application.
- Location: Choose a location on your local machine to save the console application.
- Framework: Select the latest .NET Framework version.
Finally, click on the Create button to create the console application.

Now, you can see that the project was created successfully without any issues.

Installing the Speech SDK
The next step: in order to work with the Azure Speech service, we need to add a NuGet package named Microsoft.CognitiveServices.Speech. To add the NuGet package to your project, follow the below steps.
Right-click on the project and then click on Manage NuGet Packages as shown below.

Now, click on the Browse tab and search for Speech. Once the Microsoft.CognitiveServices.Speech NuGet package appears in the search results, select it and then click on the Install button to install it.
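Alternatively, you can install the same package from the Package Manager Console in Visual Studio by running Install-Package Microsoft.CognitiveServices.Speech.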

Text To Speech
Now we will see How to convert text to speech with Azure Cognitive Services. Add the below code to your Program.cs file.
Note: Make sure to change the key value and location based on the Speech service that you created above.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace DemoSpeechService
{
    class Program
    {
        // Replace these with the key and location (region) of your own Speech service
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        public static async Task ConvertTextToSpeech(string text)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);

            // With no AudioConfig supplied, the synthesizer plays through the default speaker
            using (var converter = new SpeechSynthesizer(config))
            {
                using (var result = await converter.SpeakTextAsync(text))
                {
                    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                        Console.WriteLine($"Speech converted to speaker for text [{text}]");
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"Cancelled with Error Code {cancellation.ErrorCode}");
                            Console.WriteLine($"Cancelled with Error Details [{cancellation.ErrorDetails}]");
                        }
                    }
                }
                Console.WriteLine("Press any key to exit...");
                Console.ReadKey();
            }
        }

        static void Main()
        {
            ConvertTextToSpeech("Hello, how are you? " +
                "Welcome to AzureLessons").Wait();
        }
    }
}
Cognitive Services Speech SDK doesn’t support ‘Any CPU’ as a platform.
Now, run the application. You will get the error "Cognitive Services Speech SDK doesn't support 'Any CPU' as a platform." This happens because, by default, Visual Studio sets the project to compile for Any CPU, and the Speech SDK requires a specific target platform. Follow the below steps to fix this issue.
You can do this by clicking the drop-down arrow next to Any CPU and then clicking the Configuration Manager option to set up a target platform.

In the Active solution Platform drop-down menu, click New, as shown below.

If you are using a 32-bit operating system, you must choose x86. If your operating system is 64-bit, you can choose to target either x86 or x64. Then select Any CPU in the Copy settings from drop-down and click on the OK button.

Now you can see that the Speech project is assigned to the build platform you selected, as shown below.

Now run the application; you won't get the error this time. When you run it, you will hear the statement "Hello, how are you? Welcome to AzureLessons" played through your laptop speaker.

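By default, the service picks a default voice for your region. If you want a specific voice, the SpeechConfig class exposes a SpeechSynthesisVoiceName property that you can set before creating the synthesizer. A minimal sketch (en-US-AriaNeural is only an example; pick any voice available in your region):
var config = SpeechConfig.FromSubscription(Key, Location);
// Choose a specific voice instead of the regional default (example voice name)
config.SpeechSynthesisVoiceName = "en-US-AriaNeural";
using (var synthesizer = new SpeechSynthesizer(config))
{
    await synthesizer.SpeakTextAsync("Hello from a specific voice");
}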
Text To Audio
Above, we wrote the C# logic to convert text to speech and play it directly. Now, instead of playing the synthesized speech immediately, we will write code that creates an audio file containing the spoken text.
We will refactor the existing code, add a new method, and add a few new namespaces.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;
The new method will be like below
public static async Task ConvertTextToAudioFile(string text, string fileName)
{
    var config = SpeechConfig.FromSubscription(Key, Location);

    // Create the output file up front; the synthesizer will overwrite it with the actual WAV data
    using (FileStream fs = new FileStream(fileName, FileMode.Create))
    using (BinaryWriter wr = new BinaryWriter(fs))
        wr.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));

    // Direct the synthesizer's output to the WAV file instead of the speaker
    using (var fileOutput = AudioConfig.FromWavFileOutput(fileName))
    using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
        await Conversion(text, synthesizer);
}
The code inside the Main method will be like below
static void Main()
{
    string txt = "Hello, how are you? " +
        "Welcome to AzureLessons";
    string fn = @"E:\Bijay\Test\hello.wav";
    ConvertTextToAudioFile(txt, fn).Wait();
}
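By default the output is written in WAV/PCM format. If you would rather generate an MP3, the SpeechConfig class also has a SetSpeechSynthesisOutputFormat method that you can call before creating the synthesizer. A small sketch (the format value shown is just one of the formats the SDK exposes, and the output file extension should match it):
var config = SpeechConfig.FromSubscription(Key, Location);
// Request MP3 output instead of the default WAV/PCM format
config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3);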
Text to speech and audio code
Now the complete code for converting text to speech and to an audio file is as below.
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        // Replace these with the key and location (region) of your own Speech service
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        // Speak the text through the default speaker
        public static async Task ConvertTextToSpeech(string text)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            using (var synthesizer = new SpeechSynthesizer(config))
                await Conversion(text, synthesizer);
        }

        // Write the synthesized speech to a WAV file instead of the speaker
        public static async Task ConvertTextToAudioFile(string text, string fileName)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);

            // Create the output file up front; the synthesizer will overwrite it with the actual WAV data
            using (FileStream fs = new FileStream(fileName, FileMode.Create))
            using (BinaryWriter wr = new BinaryWriter(fs))
                wr.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));

            using (var fileOutput = AudioConfig.FromWavFileOutput(fileName))
            using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
                await Conversion(text, synthesizer);
        }

        private static async Task Conversion(string text, SpeechSynthesizer synthesizer)
        {
            using (var result = await synthesizer.SpeakTextAsync(text))
            {
                if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                    Console.WriteLine($"Speech converted to speaker for text [{text}]");
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"Cancelled with Error Code {cancellation.ErrorCode}");
                        Console.WriteLine($"Cancelled with Error Details [{cancellation.ErrorDetails}]");
                    }
                }
            }
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }

        static void Main()
        {
            string txt = "Hello, how are you? " +
                "Welcome to AzureLessons";
            string fn = @"E:\Bijay\Test\hello.wav";
            ConvertTextToSpeech(txt).Wait();
            ConvertTextToAudioFile(txt, fn).Wait();
        }
    }
}
Once you run the above code, you will find that the .wav file has been generated at the specified local path.
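If you also want the console application itself to play back the generated file, a minimal sketch using the built-in System.Media.SoundPlayer class (Windows/.NET Framework) could look like the below:
using System.Media;
// Play the generated WAV file synchronously through the default audio device
using (var player = new SoundPlayer(@"E:\Bijay\Test\hello.wav"))
{
    player.PlaySync();
}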
Azure Speech to text
So far we have discussed How to convert text to speech with Azure Cognitive Services. Now let's see How to convert speech to text with Azure Cognitive Services. Here we will use the SpeechRecognizer class instead of the SpeechSynthesizer class.
How to convert speech to text with Azure Cognitive Services
You need to add the below code in your Program.cs file.
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        // Replace these with the key and location (region) of your own Speech service
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        public static async Task ConvertSpeechToText()
        {
            var config = SpeechConfig.FromSubscription(Key, Location);

            // With no AudioConfig supplied, the recognizer listens to the default microphone
            using (var recognizer = new SpeechRecognizer(config))
                await Identify(recognizer);
        }

        private static async Task Identify(SpeechRecognizer recognizer)
        {
            // Listen for a single utterance and print the recognized text
            var result = await recognizer.RecognizeOnceAsync();
            if (result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Recognized: {result.Text}");
            else if (result.Reason == ResultReason.NoMatch)
                Console.WriteLine("Speech could not be recognized.");
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = CancellationDetails.FromResult(result);
                Console.WriteLine($"Cancelled due to reason={cancellation.Reason}");
                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error code={cancellation.ErrorCode}");
                    Console.WriteLine($"Error details={cancellation.ErrorDetails}");
                }
            }
        }

        static void Main()
        {
            ConvertSpeechToText().Wait();
            Console.ReadLine();
        }
    }
}
Now run the application and speak something; the recognized text will be displayed in the console window as shown below.

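By default, the recognizer uses the service's default recognition language. If you are speaking a different language, you can set it on the SpeechConfig before creating the recognizer. A small sketch (the language code is just an example):
var config = SpeechConfig.FromSubscription(Key, Location);
// Tell the recognizer which language to expect, for example US English
config.SpeechRecognitionLanguage = "en-US";
using (var recognizer = new SpeechRecognizer(config))
{
    var result = await recognizer.RecognizeOnceAsync();
    Console.WriteLine(result.Text);
}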
Audio To Text
Here I will use a .wav file kept at a path on my local machine and convert the recorded speech in it to text. You can use the below code in your Program.cs file.
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        // Replace these with the key and location (region) of your own Speech service
        private const string Key = "b8525d4d635c4ead882b8f56b404bff4";
        private const string Location = "eastus"; // Azure Speech Service location

        public static async Task AudioToTextContinuousAsync(string fn)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);

            // Read the audio from a local WAV file instead of the microphone
            using (var audioInput = AudioConfig.FromWavFileInput(fn))
            using (var recognizer = new SpeechRecognizer(config, audioInput))
                await RecognizeAllSpeech(recognizer);
        }

        private static async Task RecognizeAllSpeech(SpeechRecognizer recognizer)
        {
            // Completed when the whole file has been processed or an error occurs
            var tc = new TaskCompletionSource<int>();

            recognizer.Recognizing += (sender, eventargs) =>
            {
                // Partial (interim) results arrive here while the audio is being processed
            };
            recognizer.Recognized += (sender, eventargs) =>
            {
                if (eventargs.Result.Reason == ResultReason.RecognizedSpeech)
                    Console.WriteLine($"Recognized: {eventargs.Result.Text}");
            };
            recognizer.Canceled += (sender, eventargs) =>
            {
                if (eventargs.Reason == CancellationReason.Error)
                    Console.WriteLine("Error reading the audio file.");
                if (eventargs.Reason == CancellationReason.EndOfStream)
                    Console.WriteLine("End of file.");
                tc.TrySetResult(0);
            };
            recognizer.SessionStarted += (sender, eventargs) =>
            {
            };
            recognizer.SessionStopped += (sender, eventargs) =>
            {
                tc.TrySetResult(0);
            };

            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
            // Block until either Canceled or SessionStopped signals completion
            Task.WaitAny(new[] { tc.Task });
            await recognizer.StopContinuousRecognitionAsync();
        }

        // Single-shot recognition helper (same as in the previous example; not used here)
        private static async Task Identify(SpeechRecognizer recognizer)
        {
            var result = await recognizer.RecognizeOnceAsync();
            if (result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Recognized: {result.Text}");
            else if (result.Reason == ResultReason.NoMatch)
                Console.WriteLine("Speech could not be recognized.");
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = CancellationDetails.FromResult(result);
                Console.WriteLine($"Cancelled due to reason={cancellation.Reason}");
                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error code={cancellation.ErrorCode}");
                    Console.WriteLine($"Error details={cancellation.ErrorDetails}");
                }
            }
        }

        static void Main()
        {
            string fn = @"C:\Users\Bijay\Desktop\VM\hello\sample1.wav";
            AudioToTextContinuousAsync(fn).Wait();
            Console.ReadLine();
        }
    }
}
Once you run the application, you will get the expected output.

So, using the above steps, we have discussed How to convert text to speech with Azure Cognitive Services and How to convert speech to text with Azure Cognitive Services.
Cognitive Services Speech SDK
Well, let's discuss an important topic here, i.e., the Azure Cognitive Services Speech SDK. The SDK exposes many of the Speech service capabilities and helps you develop different kinds of applications with speech functionality.
One important point: the SDK is available in multiple languages like C#, C++, Java, JavaScript, Objective-C/Swift, and Python, and on different platforms such as Windows, Linux, macOS, Android, Node.js, and iOS.
Speech-to-text is available in C++ (Windows, Linux, and macOS), C#, Java, JavaScript, Python, Swift, Objective-C, Go, etc.
In the same way, text-to-speech is available in C++ (Windows and Linux), C# (Windows, UWP, Unity), Java, Python, Swift, Objective-C, and through the REST API.
You can also use the Speech SDK for voice assistants, which helps developers implement natural, human-like conversational interfaces in their applications. Voice assistants are supported in C# (Windows), C++ (Windows, Linux, and macOS), Java, etc.
Another interesting thing is that the Speech SDK also supports a keyword spotting feature. With the help of this feature, you can identify a specific keyword in speech, as sketched below.
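Keyword spotting in the SDK is exposed through the KeywordRecognizer and KeywordRecognitionModel classes. A rough sketch is below; it assumes you have already generated a keyword model file (for example keyword.table) beforehand, and the file path is only a placeholder:
// Load a custom keyword model created beforehand (path is a placeholder)
var keywordModel = KeywordRecognitionModel.FromFile(@"C:\models\keyword.table");
using (var audioConfig = AudioConfig.FromDefaultMicrophoneInput())
using (var keywordRecognizer = new KeywordRecognizer(audioConfig))
{
    // Wait until the keyword is heard on the microphone
    var result = await keywordRecognizer.RecognizeOnceAsync(keywordModel);
    if (result.Reason == ResultReason.RecognizedKeyword)
        Console.WriteLine("Keyword detected.");
}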
The Speech SDK also provides a Conversation Transcription feature that combines speech recognition, speaker identification, and sentence attribution to each speaker.
It also supports the Multi-device Conversation feature, which connects multiple devices in a conversation to exchange text-based and speech-based messages.
The Speech service additionally exposes REST APIs that can be used for many of these capabilities.
You can check out How to get the speech SDK and the System requirements now.
Azure Speech To Text Python
You can also implement the Azure speech to text functionality using the Python language with Azure Cognitive Services. You can find an example on Azure Speech To Text Python now.
Azure Speech To Text JavaScript
Using JavaScript as well, you can implement the Azure speech to text functionality easily with the help of Azure Cognitive Services. You can find an example on Azure Speech To Text JavaScript now.
Azure Text To Speech Rest API
Using the REST APIs, you can convert text into speech and also get the list of supported voices for a region. One point to note here: in the case of the REST API, each endpoint belongs to a particular region.
The text to speech REST API supports two types of voices. Those are as below:
- Neural text-to-speech voice
- Standard text-to-speech voice
It's quite easy to implement the Azure text to speech functionality using the REST API with the help of Azure Cognitive Services; a minimal sketch is shown below. You can also check out an example on Azure Text To Speech Rest API now.
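Below is a minimal C# sketch of calling the text to speech REST endpoint directly with HttpClient. It assumes key-based authentication, and the region, key, voice name, and output format are placeholders you would replace with values valid for your own Speech resource:
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
class TtsRestDemo
{
    static async Task Main()
    {
        string region = "eastus";                 // your Speech resource region
        string key = "<your-speech-key>";         // your Speech resource key
        string endpoint = $"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1";
        // SSML body: which voice to use and the text to speak (example voice name)
        string ssml =
            "<speak version='1.0' xml:lang='en-US'>" +
            "<voice name='en-US-AriaNeural'>Hello from the REST API</voice>" +
            "</speak>";
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", key);
            client.DefaultRequestHeaders.Add("X-Microsoft-OutputFormat", "riff-24khz-16bit-mono-pcm");
            var content = new StringContent(ssml, Encoding.UTF8, "application/ssml+xml");
            var response = await client.PostAsync(endpoint, content);
            response.EnsureSuccessStatusCode();
            // The response body is the synthesized audio; save it to a file
            byte[] audio = await response.Content.ReadAsByteArrayAsync();
            System.IO.File.WriteAllBytes("hello-rest.wav", audio);
            Console.WriteLine($"Wrote {audio.Length} bytes of audio.");
        }
    }
}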
Azure Text To Speech Pricing
Let's discuss the Azure Text To Speech pricing details below.
| Instance | Feature | Cost Details |
| --- | --- | --- |
| Free (1 concurrent request) | Standard | 5 audio hours free per month. |
| Free (1 concurrent request) | Custom | 5 audio hours free per month. Endpoint hosting: 1 model free per month. |
| Free (1 concurrent request) | Multichannel Audio | 5 audio hours free per month. |
| Standard (20 concurrent requests) | Standard | $1 per audio hour. |
| Standard (20 concurrent requests) | Custom | $1.40 per audio hour, plus $0.0538 per model per hour for endpoint hosting. |
| Standard (20 concurrent requests) | Multichannel Audio | $2.10 per audio hour. |
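For example, based on the table above, 10 audio hours of Standard usage in the Standard tier would cost roughly 10 × $1 = $10 per month, while 10 audio hours of Custom usage would be roughly 10 × $1.40 = $14 plus the per-model endpoint hosting charge.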
For more information, check out Azure Text To Speech Pricing now.
You may also like following the below articles
- Build Intelligent C# Apps With Azure Cognitive Services
- Azure Cognitive Services Translator Text API Example
- Microsoft Cognitive Services Bing Search Example
- The term ‘get-azuresubscription’ is not recognized
- The Term ‘Connect-AzureRmAccount’ is Not Recognized
Wrapping Up
Well, in this article, we discussed how to convert text to speech with Azure Cognitive Services, Azure Speech Services, creating the Azure Cognitive Services Speech API, installing the Speech SDK, text to speech, fixing the "Cognitive Services Speech SDK doesn't support 'Any CPU' as a platform" error, text to audio, speech to text, how to convert speech to text with Azure Cognitive Services, audio to text, the Cognitive Services Speech SDK, Azure Speech To Text Python, Azure Speech To Text JavaScript, the Azure Text To Speech REST API, and Azure Text To Speech pricing. Hope you have enjoyed this article!