How to convert speech to text with Azure Cognitive Services

In my last article, we discussed how to convert text to speech with Azure Cognitive Services. Now, let’s see how to convert speech to text. Here, we will use the SpeechRecognizer class instead of the SpeechSynthesizer class.


Let’s start with the prerequisites.

Prerequisites

  • You must have a valid Azure account or subscription. If you don’t have one yet, create an Azure Free Account now.
  • You must have a Speech service resource in your Azure subscription.
  • You must have Visual Studio 2019 installed on your local machine. If you don’t have it, install Visual Studio 2019 now.

Azure Speech To Text

Follow the steps below to build the complete example.

Step 1: Create the Azure Cognitive Services Speech API

Step 2: Create a console application using Visual Studio 2019

Step 3: Installing the Speech SDK

Step 4: Add the below code to your Program.cs file.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        private const string Key = "YOUR_SPEECH_KEY"; // Replace with your Speech resource key
        private const string Location = "eastus"; // Azure Speech Service Location
        public static async Task ConvertSpeechToText()
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            // With no AudioConfig, the recognizer listens on the default microphone.
            using (var sr = new SpeechRecognizer(config))
                await Identify(sr);
        }
        private static async Task Identify(SpeechRecognizer recognizer)
        {
            // RecognizeOnceAsync returns after a single utterance has been processed.
            var rslt = await recognizer.RecognizeOnceAsync();
            if (rslt.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Recognized: {rslt.Text}");
            else if (rslt.Reason == ResultReason.NoMatch)
                Console.WriteLine("Speech could not be recognized.");
            else if (rslt.Reason == ResultReason.Canceled)
            {
                var cancellation = CancellationDetails.FromResult(rslt);
                Console.WriteLine($"Cancelled due to reason={cancellation.Reason}");
                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error code={cancellation.ErrorCode}");
                    Console.WriteLine($"Error details={cancellation.ErrorDetails}");
                }
            }
        }


        static async Task Main()
        {
            // Await the recognition instead of blocking on .Wait().
            await ConvertSpeechToText();
            Console.ReadLine();
        }
    }
}
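
The listing above keeps the subscription key and region as constants in Program.cs. One common alternative is to read them from environment variables so the key never ends up in source control. The following is only a minimal sketch of that idea: the SpeechSettings class and the variable names SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for illustration, not something the Speech SDK requires.

```csharp
using System;

static class SpeechSettings
{
    // Hypothetical variable names; pick whatever suits your environment.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    // Reads both values and fails fast with a clear message if either is missing.
    public static (string Key, string Region) Load()
    {
        string key = Environment.GetEnvironmentVariable(KeyVariable);
        string region = Environment.GetEnvironmentVariable(RegionVariable);

        if (string.IsNullOrWhiteSpace(key))
            throw new InvalidOperationException($"Set the {KeyVariable} environment variable.");
        if (string.IsNullOrWhiteSpace(region))
            throw new InvalidOperationException($"Set the {RegionVariable} environment variable.");

        return (key, region);
    }
}

static class SettingsDemo
{
    static void Main()
    {
        try
        {
            var (key, region) = SpeechSettings.Load();
            // Never print the key itself; its length is enough to confirm it loaded.
            Console.WriteLine($"Loaded a {key.Length}-character key for region {region}.");
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

With a helper like this, the call in the article's listing could become SpeechConfig.FromSubscription(key, region), using the two values returned by Load() in place of the Key and Location constants.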

Step 5: Run the application and say something. The recognized text will be displayed in the console window, as in the screenshot below.

azure speech to text api example

Azure Audio To Text

Here, I will use a .wav file stored on my local machine and convert the recorded speech to text. You can use the code below in your Program.cs file.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace DemoSpeechService
{
    class Program
    {
        private const string Key = "YOUR_SPEECH_KEY"; // Replace with your Speech resource key
        private const string Location = "eastus"; // Azure Speech Service Location

        public static async Task AudioToTextContinuousAsync(string fn)
        {
            var config = SpeechConfig.FromSubscription(Key, Location);
            using (var ai = AudioConfig.FromWavFileInput(fn))
            using (var recognizer = new SpeechRecognizer(config, ai))
                await RecognizeAllSpeech(recognizer);
        }
        private static async Task RecognizeAllSpeech(SpeechRecognizer recognizer)
        {
            // Completed when the session stops or recognition is canceled.
            var tc = new TaskCompletionSource<int>();

            // Recognized fires once for each finalized phrase.
            recognizer.Recognized += (sender, eventargs) =>
            {
                if (eventargs.Result.Reason == ResultReason.RecognizedSpeech)
                    Console.WriteLine($"Recognized: {eventargs.Result.Text}");
            };
            // Canceled fires on an error or when the end of the audio file is reached.
            recognizer.Canceled += (sender, eventargs) =>
            {
                if (eventargs.Reason == CancellationReason.Error)
                    Console.WriteLine($"Error {eventargs.ErrorCode}: {eventargs.ErrorDetails}");
                if (eventargs.Reason == CancellationReason.EndOfStream)
                    Console.WriteLine("End of file.");
                tc.TrySetResult(0);
            };
            recognizer.SessionStopped += (sender, eventargs) => tc.TrySetResult(0);

            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Wait asynchronously for the end of the session instead of blocking a thread.
            await tc.Task.ConfigureAwait(false);

            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }

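        static async Task Main()
        {
            // Path to a local .wav file; change this to a file on your machine.
            string fn = @"C:\Users\Bijay\Desktop\VM\hello\sample1.wav";

            // Await the recognition instead of blocking on .Wait().
            await AudioToTextContinuousAsync(fn);
            Console.ReadLine();
        }
    }
}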

Once you run the application, the speech recognized from the .wav file is printed to the console.

speech to text with Azure Cognitive Services
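
The continuous-recognition listing above uses a TaskCompletionSource so that the method can wait until the Canceled or SessionStopped event fires. To show that pattern in isolation, here is a small sketch that runs without the Speech SDK. FakeRecognizer is a hypothetical stand-in written for this example (it is not an SDK type); it raises a Recognized event for each phrase and then a SessionStopped event, and the caller awaits the completion task rather than blocking a thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a recognizer: raises Recognized once per phrase
// on a background thread, then raises SessionStopped.
class FakeRecognizer
{
    public event EventHandler<string> Recognized;
    public event EventHandler SessionStopped;

    public Task StartAsync(string[] phrases)
    {
        // The worker keeps running after StartAsync returns, like a real session.
        Task.Run(() =>
        {
            foreach (var phrase in phrases)
                Recognized?.Invoke(this, phrase);
            SessionStopped?.Invoke(this, EventArgs.Empty);
        });
        return Task.CompletedTask;
    }
}

static class CompletionDemo
{
    // Collects every phrase and returns only after the session has stopped.
    public static async Task<List<string>> RecognizeAllAsync(
        FakeRecognizer recognizer, string[] phrases)
    {
        var results = new List<string>();
        var stopped = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Subscribe before starting so that no event is missed.
        recognizer.Recognized += (sender, text) => results.Add(text);
        recognizer.SessionStopped += (sender, e) => stopped.TrySetResult(0);

        await recognizer.StartAsync(phrases);
        await stopped.Task; // asynchronous wait; no thread is blocked here
        return results;
    }

    static async Task Main()
    {
        var texts = await RecognizeAllAsync(
            new FakeRecognizer(), new[] { "hello", "world" });
        foreach (var text in texts)
            Console.WriteLine($"Recognized: {text}");
    }
}
```

TrySetResult is used rather than SetResult because, as in the article's listing, more than one event may try to complete the same task, and a second SetResult call would throw.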

Azure Speech To Text Python

You can also implement Azure speech-to-text functionality in Python with Azure Cognitive Services. For an example, see Azure Speech To Text Python.

Azure Speech To Text JavaScript

You can also implement Azure speech-to-text functionality in JavaScript with the help of Azure Cognitive Services. For an example, see Azure Speech To Text JavaScript.


Wrapping Up

In this article, we discussed how to convert speech to text with Azure Cognitive Services. Thanks for reading!