
In this Azure tutorial, we will discuss how to convert an m4a file to text using Azure Cognitive Services. Along the way, we will also cover how to convert the m4a file to a WAV file, how to set up Azure Cognitive Services in the Azure Portal, and Azure Speech to Text pricing.
How do you convert an m4a file to text using Azure Cognitive Services? We will do it with the steps below:
- Convert the m4a file to a WAV file, which will then be converted to text
- Create an Azure Cognitive Services Speech resource using the Azure Portal
- Create a .NET Core console application and implement the code using Visual Studio 2019
How To Convert m4a File To Text Using Azure Cognitive Services
Here we will walk through how to convert an m4a file to text using Azure Cognitive Services. Before starting the actual implementation, we should know the prerequisites for this functionality.
Prerequisites
- A valid Azure subscription or Azure account. If you don't have one yet, create an Azure free account now.
- A Speech service resource in that Azure subscription.
- Visual Studio 2019 installed on your machine. If you don't have it on your local machine, install Visual Studio 2019 now.
How to Convert the m4a file to WAV file
It is possible to convert the m4a file to text directly, but I prefer to convert the m4a file to a WAV file first and then convert the WAV file to text. There are many ways to convert an m4a file to a WAV file, and you can use any third-party tool. Our aim is only to end up with a WAV file; how you convert it doesn't matter.
You can follow the steps below to do that.
Navigate to the URL (https://convertio.co/)
Click on the Choose Files button and upload the m4a file.

Once the file is uploaded, choose WAV as the output format and then click on the Convert button. After a few seconds, the file will be converted to the WAV format.
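Whichever converter you use, it can help to confirm what kind of WAV file came out before sending it to the Speech service. The sketch below reads the "fmt " chunk of a RIFF/WAVE file and reports the sample rate, channel count, and bit depth. The WavInspector class and its method names are hypothetical helpers, not part of any SDK, and the check for 16-bit mono PCM at 16 kHz or 8 kHz is based on my understanding of the Speech SDK's default WAV input format, so treat it as an assumption to verify against the current documentation.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Speech SDK) that inspects a WAV header.
public static class WavInspector
{
    // Reads the "fmt " chunk from a seekable stream containing a RIFF/WAVE file.
    public static (ushort AudioFormat, ushort Channels, uint SampleRate, ushort BitsPerSample)
        ReadFormat(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        reader.ReadUInt32(); // overall RIFF size, not needed here
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        // Walk the chunks until the "fmt " chunk is found.
        while (stream.Position + 8 <= stream.Length)
        {
            string id = ReadTag(reader);
            uint size = reader.ReadUInt32();
            if (id == "fmt ")
            {
                ushort format = reader.ReadUInt16();   // 1 means uncompressed PCM
                ushort channels = reader.ReadUInt16();
                uint sampleRate = reader.ReadUInt32();
                reader.ReadUInt32();                   // byte rate, skipped
                reader.ReadUInt16();                   // block align, skipped
                ushort bits = reader.ReadUInt16();
                return (format, channels, sampleRate, bits);
            }
            // Skip this chunk; chunks are padded to an even number of bytes.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        throw new InvalidDataException("No fmt chunk found.");
    }

    // Assumption: the SDK's default WAV input is 16-bit mono PCM at 16 kHz or 8 kHz.
    public static bool LooksLikeSdkDefault(
        (ushort AudioFormat, ushort Channels, uint SampleRate, ushort BitsPerSample) f) =>
        f.AudioFormat == 1 && f.Channels == 1 && f.BitsPerSample == 16 &&
        (f.SampleRate == 16000 || f.SampleRate == 8000);

    private static string ReadTag(BinaryReader reader) =>
        Encoding.ASCII.GetString(reader.ReadBytes(4));
}
```

To use it, open the converted file with File.OpenRead, pass the stream to WavInspector.ReadFormat, and print the returned values. If LooksLikeSdkDefault returns false, converting again with a 16 kHz mono 16-bit setting is a reasonable first thing to try.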

Setting up Azure Cognitive Services in the Azure Portal
Now we need to configure the Speech service from Azure Cognitive Services in the Azure Portal. Follow the steps below to do so.
Log in to the Azure Portal (https://portal.azure.com/)
Once you have logged in to the Azure Portal, search for Speech and then click on the Speech search result under Marketplace.

On the Create window, provide the options below.
- Name: Provide a name for the Speech resource.
- Subscription: Choose the valid subscription that you want to use for the Speech resource.
- Location: Choose a location for the Speech resource.
- Pricing tier: Choose Free F0, which is free and suitable for demo purposes. You can click on the view full pricing details link to check all the prices and select a tier based on your requirements.
- Resource group: Select an existing resource group, or create a new one by clicking on the Create new link if you don't have one.
Once you have provided all the above options, click on the Create button to create the Speech resource.

You will now see that the deployment is complete. Click on the Go to resource button to navigate to the Speech resource that you have created.

Once you click on the Go to resource button, you will see the Cognitive Service that you have just created. On the Cognitive Service page, click on the Overview tab, where you can see the location of the Speech resource. Note down this location, because we need to use it in the code in the steps below.

The next thing we need is the key for the Speech resource. On the Cognitive Service page, click on the Keys and Endpoint link in the left navigation under RESOURCE MANAGEMENT. You will see the Key 1 and Key 2 options; click on the copy button to copy KEY 1 to the clipboard. We will use this key value in the code.

Let us now create a .NET console app using Visual Studio 2019. I am using a .NET Core app, but you can also use a .NET Framework app if you like.
Open Visual Studio 2019 and click on the Create a new project button.
Select Console App (.NET Core) as the project template and then click on the Next button.

On the Configure your new project window, provide a name for the project, choose the location where you want to save it, and then click on the Create button.

The project will now be created. Next, we need to add a NuGet package to the project.
Right-click on the project name and click on the Manage NuGet Packages option.

Search for the Microsoft.CognitiveServices.Speech NuGet package, select it in the search results, click on the Install button, and then click on the I Accept button to install the package.

If you now open the projectname.csproj file, it should look like this:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.13.0" />
  </ItemGroup>
</Project>
Now add the code below to the Program.cs file. You can add to or modify the code based on your requirements.
Replace the subscription key and region with your own values. The subscription key is the key of your Speech resource that you copied in the steps above; you can use either the Key 1 or the Key 2 value. The region is the location of the Speech resource, which you noted down from the Overview tab.
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using System;
using System.Text;
using System.Threading.Tasks;

namespace ConvertMP3ToText
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Completed when recognition is canceled or the session stops.
            var tsk = new TaskCompletionSource<int>();
            var config = SpeechConfig.FromSubscription("Put your subscription key here", "eastus");
            var transcriptionStringBuilder = new StringBuilder();

            using (var input = AudioConfig.FromWavFileInput(@"C:\Users\Bijay\Desktop\VM\hello\sample1.wav"))
            using (var recog = new SpeechRecognizer(config, input))
            {
                recog.Recognizing += (sender, eventargs) =>
                {
                    // Add code to handle intermediate (partial) results
                };

                recog.Recognized += (sender, eventargs) =>
                {
                    if (eventargs.Result.Reason == ResultReason.RecognizedSpeech)
                    {
                        // Collect each recognized phrase, separated by a space.
                        transcriptionStringBuilder.Append(eventargs.Result.Text).Append(' ');
                    }
                    else if (eventargs.Result.Reason == ResultReason.NoMatch)
                    {
                        // Code to handle audio that could not be recognized
                    }
                };

                recog.Canceled += (sender, eventargs) =>
                {
                    if (eventargs.Reason == CancellationReason.Error)
                    {
                        // Error handling
                    }
                    if (eventargs.Reason == CancellationReason.EndOfStream)
                    {
                        // The whole file has been read: print the full transcript.
                        Console.WriteLine(transcriptionStringBuilder.ToString());
                    }
                    tsk.TrySetResult(0);
                };

                recog.SessionStarted += (sender, eventargs) =>
                {
                    // Recognition session started
                };

                recog.SessionStopped += (sender, eventargs) =>
                {
                    // Recognition session ended
                    tsk.TrySetResult(0);
                };

                await recog.StartContinuousRecognitionAsync();
                // Wait asynchronously instead of blocking the thread with Task.WaitAny.
                await tsk.Task;
                await recog.StopContinuousRecognitionAsync();
            }

            Console.ReadKey();
        }
    }
}
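Hard-coding the subscription key in Program.cs is fine for a quick demo, but it is easy to commit the key to source control by accident. As a minimal sketch of one alternative, the helper below reads the key and region from environment variables. The SpeechSettings class and the variable names SPEECH_KEY and SPEECH_REGION are my own choices for illustration, not something the Speech SDK requires.

```csharp
using System;

// Hypothetical helper: loads Speech settings from environment variables.
public static class SpeechSettings
{
    // Assumed variable names; rename them to suit your environment.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    public static (string Key, string Region) Load()
    {
        string key = Require(KeyVariable).Trim();
        // Normalize the region to lowercase, e.g. "eastus".
        string region = Require(RegionVariable).Trim().ToLowerInvariant();
        return (key, region);
    }

    // Fails early with a clear message when a setting is missing.
    private static string Require(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value;
    }
}
```

In Main, you would call SpeechSettings.Load() and pass the returned Key and Region values to SpeechConfig.FromSubscription in place of the string literals.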

Now you are done with all the changes. Press F5 to run the application. Once it runs, the console prints the transcript of the audio, which in my case is the text spoken in my m4a file.

Note: If you get an error, check the following:
- Check that the WAV file is valid. Try another WAV file to confirm that the problem is not with your file.
- Cross-check the key and location values to make sure both are correct.
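When the cause of an error is not obvious, the Canceled event usually carries the most useful information. The sketch below attaches a handler that prints the cancellation reason together with the error code and error details reported by the SDK. The RecognizerDiagnostics class and its method name are hypothetical; Reason, ErrorCode, and ErrorDetails are the properties exposed on the event arguments as far as I know, so check them against the SDK version you have installed.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Hypothetical helper: prints why a recognition run was canceled.
public static class RecognizerDiagnostics
{
    // Call this right after constructing the SpeechRecognizer.
    public static void LogCancellation(SpeechRecognizer recognizer)
    {
        recognizer.Canceled += (sender, e) =>
        {
            Console.WriteLine($"Canceled: Reason={e.Reason}");
            if (e.Reason == CancellationReason.Error)
            {
                // A wrong key or region typically shows up here as an
                // authentication or connection failure.
                Console.WriteLine($"ErrorCode={e.ErrorCode}");
                Console.WriteLine($"ErrorDetails={e.ErrorDetails}");
            }
        };
    }
}
```

This needs the Microsoft.CognitiveServices.Speech NuGet package installed earlier. The handler is added alongside any existing Canceled handlers, so it does not replace the one in Program.cs.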
That is how to convert an m4a file to text using Azure Cognitive Services by following the steps above.
Azure Speech To Text Tutorial Overview
Here, we will give an overview of Azure Speech to Text. Azure speech to text is also called speech recognition. It transcribes audio to text in real time. The service uses the same recognition technology that Microsoft uses for Cortana and Office products.
One of the core features of the Speech service is its ability to recognize human speech and convert it to text.
This service supports more than 88 languages, including Arabic, Bulgarian, Catalan, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Irish, Gujarati, Hindi, Croatian, Hungarian, Italian, Japanese, Korean, Lithuanian, Latvian, Marathi, Maltese, Norwegian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Telugu, Thai, Turkish, Chinese and many more. By default, it uses the Universal language model.
The service also offers a pronunciation assessment feature that evaluates the pronunciation of the speech and gives speakers feedback on accuracy and fluency.
To use the Speech service in your application to convert human speech to text, you need an Azure account and a Speech service subscription.
Create a speech configuration
We have already created a Speech service resource in the Azure Portal, and we also discussed above how to use the Speech service from a .NET Core application. You can refer to that section for detailed information.
One important thing to point out here is that you need to initialize the SpeechConfig. There are multiple ways to do this. The key ways to initialize the SpeechConfig are:
- With a subscription, by passing the key and the associated region of the Speech service (this is the approach we implemented above).
- With an endpoint, by passing a Speech service endpoint. You can optionally use a key or an authorization token along with it.
- With a host, by passing a host address. Again, you can optionally use a key or an authorization token.
- With an authorization token, by passing the token along with the associated region.
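The four approaches map onto four static factory methods on SpeechConfig. The sketch below wraps each one so they can be compared side by side. The SpeechConfigFactory class and its method names are hypothetical; the SpeechConfig calls are the ones I believe the SDK exposes, so treat the exact overloads as an assumption to confirm in the API reference.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Hypothetical wrapper showing the four ways to build a SpeechConfig.
public static class SpeechConfigFactory
{
    // 1. Subscription key plus region (the approach used in Program.cs).
    public static SpeechConfig FromKey(string key, string region) =>
        SpeechConfig.FromSubscription(key, region);

    // 2a. Service endpoint with a subscription key.
    public static SpeechConfig FromEndpointWithKey(Uri endpoint, string key) =>
        SpeechConfig.FromEndpoint(endpoint, key);

    // 2b. Service endpoint with an authorization token instead of a key.
    public static SpeechConfig FromEndpointWithToken(Uri endpoint, string token)
    {
        var config = SpeechConfig.FromEndpoint(endpoint);
        config.AuthorizationToken = token;
        return config;
    }

    // 3. Host address with a subscription key.
    public static SpeechConfig FromHostWithKey(Uri host, string key) =>
        SpeechConfig.FromHost(host, key);

    // 4. Authorization token plus region.
    public static SpeechConfig FromToken(string token, string region) =>
        SpeechConfig.FromAuthorizationToken(token, region);
}
```

For most applications the first approach is the simplest. The token-based variants are mainly useful when you do not want to distribute the subscription key to the machine that runs the recognizer.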
You can also customize the Speech service model. A customized model helps you overcome speech recognition barriers such as speaking style, vocabulary, and background noise, which the standard Speech service model does not handle as well. You can customize the model based on your requirements.
One more thing to note when customizing the Speech service model is that the customization options differ by language and locale.
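On the client side, both the locale and a custom model are selected through properties on SpeechConfig. The sketch below sets the recognition language and, when one is supplied, the endpoint ID of a deployed custom model. The CustomSpeechSetup class, its parameter names, and the example locale are hypothetical; SpeechRecognitionLanguage and EndpointId are the SDK properties I understand to control these settings.

```csharp
using Microsoft.CognitiveServices.Speech;

// Hypothetical helper: configures the locale and an optional custom model.
public static class CustomSpeechSetup
{
    public static SpeechConfig Create(
        string key, string region, string locale, string customEndpointId)
    {
        var config = SpeechConfig.FromSubscription(key, region);

        // Locale in BCP-47 form, for example "en-US" or "de-DE".
        config.SpeechRecognitionLanguage = locale;

        // Only set when a custom model has been trained and deployed;
        // otherwise the service's default model is used.
        if (!string.IsNullOrEmpty(customEndpointId))
        {
            config.EndpointId = customEndpointId;
        }
        return config;
    }
}
```

The locale you pass should match the locale the custom model was trained for, since customization is tied to language and locale.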
You can also transcribe a large number of audio files in a batch operation with the help of the Speech service API.
Another important thing to know is that the Azure Speech service provides two SDKs:
- Speech SDK: This SDK provides the widest range of functionality for interacting with the Speech service. It supports multiple programming languages.
- Speech Devices SDK: This SDK is meant for specific devices. It also supports multiple programming languages.
Azure Speech To Text Pricing
Here we will discuss the Azure Speech to Text pricing details. See the table below.
| INSTANCE DETAILS | CATEGORY DETAILS | FEATURES | PRICE RANGES |
| --- | --- | --- | --- |
| Free: Web/Container, 1 concurrent request | Speech-to-Text | Standard | 5 audio hours free per month |
| Free: Web/Container, 1 concurrent request | Speech-to-Text | Custom | 5 audio hours free per month. Endpoint hosting: 1 model free per month |
| Free: Web/Container, 1 concurrent request | Speech-to-Text | Multi-channel audio conversion | 5 audio hours free per month |
| Standard: Web/Container, 20 concurrent requests | Speech-to-Text | Standard | $1.28 per audio hour |
| Standard: Web/Container, 20 concurrent requests | Speech-to-Text | Custom | $1.792 per audio hour. Endpoint hosting: $0.0688 per model per hour |
| Standard: Web/Container, 20 concurrent requests | Speech-to-Text | Multi-channel audio conversion | $2.69 per audio hour |
| Standard: Web/Container, 20 concurrent requests | Speech translation | Standard | $3.20 per audio hour |
| Standard: Web/Container, 20 concurrent requests | Speaker Recognition | Speaker verification | N/A per 1,000 transactions |
| Standard: Web/Container, 20 concurrent requests | Speaker Recognition | Speaker identification | N/A per 1,000 transactions |
For more information on pricing, refer to the Azure Speech service pricing page.
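To turn the per-hour rate into a rough budget, multiply the number of audio hours by the rate. The sketch below does this for the standard speech-to-text rate quoted above. The SpeechCostEstimator class is hypothetical, the $1.28 rate and the 5 free hours are taken from the table in this article rather than from live pricing, and the calculation assumes that the free and standard tiers are separate choices, so free hours are not deducted from a standard-tier bill.

```csharp
using System;

// Hypothetical helper: rough cost estimate from the rates quoted in this article.
public static class SpeechCostEstimator
{
    // Standard speech-to-text rate from the table above (verify before relying on it).
    public const decimal StandardRatePerAudioHour = 1.28m;

    // Monthly free-tier allowance from the same table.
    public const decimal FreeAudioHoursPerMonth = 5m;

    // Estimated standard-tier cost for the given number of audio hours.
    public static decimal EstimateStandardCost(decimal audioHours)
    {
        if (audioHours < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(audioHours));
        }
        return Math.Round(audioHours * StandardRatePerAudioHour, 2);
    }

    // True when the monthly workload fits within the free-tier allowance.
    public static bool FitsInFreeTier(decimal audioHoursPerMonth) =>
        audioHoursPerMonth <= FreeAudioHoursPerMonth;
}
```

For example, 10 audio hours at $1.28 per hour comes to 10 × 1.28 = $12.80, while a workload of 4 audio hours a month would fit within the free tier.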
Wrapping Up
In this article, we discussed how to convert an m4a file to text using Azure Cognitive Services, how to convert the m4a file to a WAV file, and how to set up Azure Cognitive Services in the Azure Portal. We also discussed Azure Speech to Text pricing. I hope you have enjoyed this article!