How To Convert m4a File To Text Using Azure Cognitive Services

In this Azure tutorial, we will discuss how to convert an m4a file to text using Azure Cognitive Services. Along the way, we will also cover how to convert the m4a file to a WAV file, how to set up Azure Cognitive Services in the Azure Portal, and Azure Speech To Text pricing.

How do we convert an m4a file to text using Azure Cognitive Services? We will do it with the following steps:

  • Convert the m4a file to a WAV file; the WAV file is what gets converted to text
  • Create an Azure Cognitive Services Speech resource using the Azure Portal
  • Create a .NET Core console application and implement the code using Visual Studio 2019

How To Convert m4a File To Text Using Azure Cognitive Services

Here we will walk through how to convert an m4a file to text using Azure Cognitive Services. Before starting the actual implementation, we should know the prerequisites.

Prerequisites

  • A valid Azure subscription or Azure account. If you don’t have one yet, create an Azure free account now.
  • A Speech service resource under that Azure subscription.
  • Visual Studio 2019 installed on your machine. If you don’t have it on your local machine, install Visual Studio 2019 now.

How to Convert the m4a file to WAV file

We can convert the m4a file to text directly, but I prefer to convert the m4a file to a WAV file first and then convert the WAV file to text. There are many ways to convert an m4a file to a WAV file, including any number of third-party tools. Our aim is simply to end up with a WAV file; how you get there doesn’t matter.

You can follow the steps below to do that.

Navigate to the URL (https://convertio.co/)

Click on the Choose Files button and upload the m4a file.

Once the file has uploaded, choose WAV as the output format and click on the Convert button. After a few seconds, the file will be converted to the WAV format.
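If you would rather script the conversion than use a website, the sketch below shows one possible way to do it from C#. It assumes that ffmpeg (a tool not mentioned elsewhere in this article) is installed and available on your PATH; the class and method names are hypothetical. The arguments ask for 16 kHz, mono, 16-bit PCM audio, which is the WAV format the Speech SDK accepts by default.

```csharp
using System;
using System.Diagnostics;

static class M4aToWav
{
    // Builds the ffmpeg arguments: overwrite output, 16 kHz, mono, 16-bit PCM.
    public static string[] BuildArguments(string inputPath, string outputPath) =>
        new[] { "-y", "-i", inputPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", outputPath };

    // Runs ffmpeg and returns its exit code (0 means the conversion succeeded).
    // Assumption: an "ffmpeg" executable can be found on the PATH.
    public static int Run(string inputPath, string outputPath)
    {
        var startInfo = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
        foreach (var argument in BuildArguments(inputPath, outputPath))
        {
            startInfo.ArgumentList.Add(argument);
        }

        using var process = Process.Start(startInfo);
        process.WaitForExit();
        return process.ExitCode;
    }
}
```

Calling `M4aToWav.Run("sample1.m4a", "sample1.wav")` would then produce a WAV file next to the original recording.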

Setting up Azure Cognitive Services Azure Portal

Now we need to configure the Speech service (part of Azure Cognitive Services) in the Azure Portal. Follow the steps below.

Log in to the Azure Portal (https://portal.azure.com/)

Once you have logged in to the Azure Portal, search for Speech and then click on the Speech search result under Marketplace.

On the Create window, provide the options below.

  • Name: Provide a name for the Speech Cognitive Service.
  • Subscription: Select the subscription that you want to use for the Speech Cognitive Service.
  • Location: Select a location for the Speech Cognitive Service.
  • Pricing tier: Choose Free F0, which is free and sufficient for demo purposes. You can click on the View full pricing details link to see all the prices and select a tier based on your requirements.
  • Resource group: Select an existing resource group, or click on the Create new link to create one if you don’t have any.

Once you have provided all the above options, click on the Create button to create the Speech Cognitive Service.

You will now see that the deployment is complete. Click on the Go to resource button to navigate to the Speech Cognitive Service that you have created.

Once you click on the Go to resource button, you will see the Cognitive Service that you have just created. Click on the Overview tab to see the location of the Speech Cognitive Service. Note down this location, because we need to use it in the code in the steps below.

The next thing we need is the key for the Speech Cognitive Service. On the Cognitive Service page, click on the Keys and Endpoint link in the left navigation under the RESOURCE MANAGEMENT option. You will see the Key 1 and Key 2 values; click on the copy button to copy Key 1 to the clipboard. We will use this key value in the code.

Let us now create a .NET console app using Visual Studio 2019. I am using a .NET Core app, but you can also use a .NET Framework app if you like.

Open Visual Studio 2019 and click on the Create a new project button.

Select Console App (.NET Core) as the project template and then click on the Next button.

On the Configure your new project window, provide a name for the project, choose the location where you want to save it, and then click on the Create button.

The project will now be created. The next thing we need to do is add a NuGet package to the project.

Right-click on the project name and then click on the Manage NuGet Packages option.

Now search for the Microsoft.CognitiveServices.Speech NuGet package, select it in the search results, click on the Install button, and then click on the I Accept button to install the package.

Now, if you open the projectname.csproj file, it should look like this:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.13.0" />
  </ItemGroup>

</Project>

Now add the code below to the Program.cs file. You can add to or modify the code based on your requirements.

Replace the subscription key and region with your own values. The subscription key is the key for your Speech Cognitive Service that you copied in the steps above; you can use either the Key 1 or the Key 2 value. The region is the location of the Speech Cognitive Service that you noted down earlier.

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using System;
using System.Text;
using System.Threading.Tasks;

namespace ConvertMP3ToText
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Completed when the recognition session stops or is canceled.
            var tsk = new TaskCompletionSource<int>();

            // Replace the placeholders with your own Speech resource key and region.
            var config = SpeechConfig.FromSubscription("Put your subscription key here", "eastus");

            // Collects the recognized text of every utterance in the file.
            var transcriptionStringBuilder = new StringBuilder();

            using (var input = AudioConfig.FromWavFileInput(@"C:\Users\Bijay\Desktop\VM\hello\sample1.wav"))
            {
                using (var recog = new SpeechRecognizer(config, input))
                {
                    recog.Recognizing += (sender, eventargs) =>
                    {
                        // Fires with partial (intermediate) results; handle them here if needed.
                    };

                    recog.Recognized += (sender, eventargs) =>
                    {
                        if (eventargs.Result.Reason == ResultReason.RecognizedSpeech)
                        {
                            // Final text for one utterance; one line per utterance.
                            transcriptionStringBuilder.AppendLine(eventargs.Result.Text);
                        }
                        else if (eventargs.Result.Reason == ResultReason.NoMatch)
                        {
                            // The audio segment could not be recognized as speech.
                        }
                    };

                    recog.Canceled += (sender, eventargs) =>
                    {
                        if (eventargs.Reason == CancellationReason.Error)
                        {
                            // Typical causes: a wrong key or region, or an unsupported audio file.
                            Console.WriteLine($"Error: {eventargs.ErrorCode} - {eventargs.ErrorDetails}");
                        }

                        if (eventargs.Reason == CancellationReason.EndOfStream)
                        {
                            // The whole file has been processed; print the transcription.
                            Console.WriteLine(transcriptionStringBuilder.ToString());
                        }

                        tsk.TrySetResult(0);
                    };

                    recog.SessionStarted += (sender, eventargs) =>
                    {
                        // Recognition session started.
                    };

                    recog.SessionStopped += (sender, eventargs) =>
                    {
                        // Recognition session ended.
                        tsk.TrySetResult(0);
                    };

                    await recog.StartContinuousRecognitionAsync().ConfigureAwait(false);

                    // Block until the session stops or is canceled.
                    Task.WaitAny(new[] { tsk.Task });

                    await recog.StopContinuousRecognitionAsync().ConfigureAwait(false);
                }
            }

            // Keep the console window open until a key is pressed.
            Console.ReadKey();
        }
    }
}
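The sample above keeps the key and region in the source code, which is fine for a quick demo. As a minimal sketch of one alternative, the helper below reads both values from environment variables so that the key never ends up in source control. The class name and the variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions made for illustration; use whatever names suit your environment.

```csharp
using System;

static class SpeechSettings
{
    // Reads a required setting from the environment and fails fast when it is missing.
    public static string Require(string variableName)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{variableName}' is not set.");
        }

        return value.Trim();
    }
}

// Possible usage inside Main, replacing the hardcoded values:
// var config = SpeechConfig.FromSubscription(
//     SpeechSettings.Require("SPEECH_KEY"),      // hypothetical variable name
//     SpeechSettings.Require("SPEECH_REGION"));  // hypothetical variable name
```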

Now you are done with all the changes. Press F5 to run the application. Once it runs, you should see the expected output in the console window: the text spoken in the m4a file.

Note: If you get an error, check the following:

  • Check that the WAV file is valid. Try another WAV file to confirm that the problem is not with your file.
  • Cross-check the key and location values to make sure both are correct.

This is how to convert an m4a file to text using Azure Cognitive Services by following the steps above.
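One way to check the WAV file is to look at its header. The sketch below is a rough diagnostic, not part of the Speech SDK: it assumes a canonical PCM WAV file whose `fmt ` chunk starts at byte 12, and the class name is hypothetical. By default the Speech SDK expects 16-bit mono PCM at 16 kHz (or 8 kHz), so other values are a likely cause of trouble.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

static class WavHeader
{
    // Reads channels, sample rate, and bit depth from a canonical 44-byte WAV header.
    public static (int Channels, int SampleRate, int BitsPerSample) Read(Stream stream)
    {
        var header = new byte[44];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream reached early
            read += n;
        }

        if (read < header.Length
            || Encoding.ASCII.GetString(header, 0, 4) != "RIFF"
            || Encoding.ASCII.GetString(header, 8, 4) != "WAVE"
            || Encoding.ASCII.GetString(header, 12, 4) != "fmt ")
        {
            throw new InvalidDataException("Not a canonical RIFF/WAVE file.");
        }

        // WAV header fields are stored in little-endian byte order.
        int channels = BinaryPrimitives.ReadInt16LittleEndian(header.AsSpan(22, 2));
        int sampleRate = BinaryPrimitives.ReadInt32LittleEndian(header.AsSpan(24, 4));
        int bitsPerSample = BinaryPrimitives.ReadInt16LittleEndian(header.AsSpan(34, 2));
        return (channels, sampleRate, bitsPerSample);
    }
}
```

Opening the file with `File.OpenRead` and passing the stream to `WavHeader.Read` gives you the three values to compare against the expected format.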

Azure Speech To Text Tutorial Overview

Here, we will give an overview of Azure Speech To Text. Azure speech to text is also called speech recognition. It transcribes audio to text in real-time scenarios. For this service, Microsoft uses the same recognition technology that it uses for Cortana and Office products.

One of the core features of the Speech service is its ability to recognize human speech and convert it to text.

This service supports more than 88 languages, such as Arabic, Bulgarian, Catalan, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Irish, Gujarati, Hindi, Croatian, Hungarian, Italian, Japanese, Korean, Lithuanian, Latvian, Marathi, Maltese, Norwegian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Telugu, Thai, Turkish, Chinese and many more. It is fair to say that it uses the Universal language model.

This service also offers a pronunciation assessment feature that evaluates the pronunciation of the speech and gives speakers feedback on accuracy and fluency.

To use the Speech service in your application to convert human speech to text, you need an Azure account and a Speech service subscription.
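When the audio is not in the default language, the recognition language can be set on the `SpeechConfig` object before the recognizer is created. The snippet below is a minimal sketch that assumes the Microsoft.CognitiveServices.Speech NuGet package is installed; the class name and the choice of Hindi (`hi-IN`) are only for illustration.

```csharp
using Microsoft.CognitiveServices.Speech;

static class LanguageExample
{
    // Creates a configuration that recognizes Hindi speech.
    public static SpeechConfig CreateHindiConfig(string key, string region)
    {
        var config = SpeechConfig.FromSubscription(key, region);

        // The value is a locale code such as "en-US", "hi-IN", or "fr-FR".
        config.SpeechRecognitionLanguage = "hi-IN";
        return config;
    }
}
```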

Create a speech configuration

We have already created a Speech service in the Azure Portal, and we discussed above how to use the Speech service from a .NET Core application. You can refer to that section for detailed information.

One important thing to point out here is that you need to initialize the SpeechConfig. There are multiple ways to do this. Below are the key ways to initialize the SpeechConfig.

  • You can use a subscription, by passing the key and the associated region of the Speech service (this is the approach we implemented above).
  • The second approach is by passing a Speech service endpoint. You can optionally pass a key or an authorization token along with it.
  • The third approach is by passing a host address, again optionally with a key or an authorization token.
  • The fourth approach is by passing an authorization token along with the associated region.
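The four approaches above can be sketched as follows. This assumes the Microsoft.CognitiveServices.Speech NuGet package is installed. Every key, token, region, and URI below is a placeholder chosen for illustration (the `localhost` host assumes a locally running Speech container), so substitute the values from your own resource.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

static class SpeechConfigExamples
{
    // 1. Subscription key plus region (the approach used earlier in this article).
    public static SpeechConfig FromKeyAndRegion() =>
        SpeechConfig.FromSubscription("YOUR_KEY", "eastus");

    // 2. Service endpoint, here combined with a subscription key.
    //    The URI is a placeholder; use the endpoint of your own resource.
    public static SpeechConfig FromEndpoint() =>
        SpeechConfig.FromEndpoint(
            new Uri("wss://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"),
            "YOUR_KEY");

    // 3. Host address, here without a key (hypothetical local container).
    public static SpeechConfig FromHost() =>
        SpeechConfig.FromHost(new Uri("ws://localhost:5000"));

    // 4. Authorization token plus region.
    public static SpeechConfig FromToken() =>
        SpeechConfig.FromAuthorizationToken("YOUR_TOKEN", "eastus");
}
```

Whichever factory method you choose, the resulting `SpeechConfig` is passed to the `SpeechRecognizer` constructor in the same way.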

You can also customize the Speech service model. A customized model helps you overcome speech recognition barriers such as speaking style, vocabulary, and background noise, which the standard Speech service model does not handle as well. You can customize the model based on your requirements.

One more thing to note when customizing the Speech service model is that the customization options differ by language and locale.

You can also transcribe a large number of audio files in a batch operation with the help of the Speech service API.

Another important thing to know is that the Azure Speech service provides two SDKs:

  • Speech SDK: This is the first SDK, and it provides the most functionality for interacting with the Speech service. It supports multiple languages.
  • Speech Devices SDK: This is the second SDK, intended for specific devices. It also supports multiple languages.
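Once a custom model has been deployed, the application points at it through its endpoint ID. The following is a minimal sketch, assuming the Microsoft.CognitiveServices.Speech NuGet package is installed; the class name is hypothetical and the endpoint ID is the one shown for your own deployment.

```csharp
using Microsoft.CognitiveServices.Speech;

static class CustomModelExample
{
    // Creates a configuration that sends recognition requests to a custom model.
    public static SpeechConfig Create(string key, string region, string endpointId)
    {
        var config = SpeechConfig.FromSubscription(key, region);

        // endpointId identifies the deployed custom model.
        config.EndpointId = endpointId;
        return config;
    }
}
```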

Azure Speech To Text Pricing

Here we will discuss the Azure Speech To Text pricing details. See the table below.

| Instance details | Category details | Features | Price ranges |
|---|---|---|---|
| Free: Web/Container, 1 concurrent request | Speech-to-Text | Standard | 5 audio hours free per month |
| | | Custom | 5 audio hours free per month; endpoint hosting: 1 model free per month |
| | | Multi-channel audio conversion | 5 audio hours free per month |
| Standard: Web/Container, 20 concurrent requests | Speech-to-Text | Standard | $1.28 per audio hour |
| | | Custom | $1.792 per audio hour; endpoint hosting: $0.0688 per model per hour |
| | | Multi-channel audio conversion | $2.69 per audio hour |
| | Speech translation | Standard | $3.20 per audio hour |
| | Speaker Recognition | Speaker verification | N/A per 1,000 transactions |
| | | Speaker identification | N/A per 1,000 transactions |
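To get a feel for what these rates mean in practice, here is a small estimate sketch. The rates are copied from the Standard tier figures above and may be out of date; the class and method names are hypothetical, and the free monthly allowance is not modelled.

```csharp
using System;

static class SpeechCostEstimate
{
    // Standard tier rates taken from the pricing figures above (assumed current).
    const decimal StandardPerAudioHour = 1.28m;
    const decimal CustomPerAudioHour = 1.792m;
    const decimal CustomHostingPerModelHour = 0.0688m;

    // Cost of transcribing audio with the standard model.
    public static decimal Standard(decimal audioHours) =>
        audioHours * StandardPerAudioHour;

    // Cost of transcribing with custom models, including endpoint hosting.
    public static decimal Custom(decimal audioHours, int models, decimal hostedHours) =>
        audioHours * CustomPerAudioHour
        + models * hostedHours * CustomHostingPerModelHour;
}
```

For example, `SpeechCostEstimate.Standard(40m)` gives 51.20 for 40 audio hours, and `SpeechCostEstimate.Custom(40m, 1, 720m)` gives 121.216 for the same audio with one custom model hosted for 720 hours.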

For more information on the pricing details, refer to the official Azure Speech service pricing page.

Wrapping Up

In this article, we discussed how to convert an m4a file to text using Azure Cognitive Services, how to convert the m4a file to a WAV file, how to set up Azure Cognitive Services in the Azure Portal, and Azure Speech To Text pricing. I hope you have enjoyed this article!