Azure AI Services

Add smart API capabilities to enable contextual interactions

Speech Services pricing

Speech Services

*The following prices are tax-inclusive.
INSTANCE: Free - Web, 1 concurrent request [1]
  Speech to Text
    Standard: 5 audio hours free per month
    Custom: 5 audio hours free per month
      Endpoint hosting: 1 model free per month [2]
    Enhanced add-on features (¥3.66 per audio hour per feature):
      Language identification
      Batch diarization for 3+ speakers
  Text to Speech
    Neural: 0.5M characters free per month
  Speech Translation
    Standard: 5 audio hours free per month

INSTANCE: Standard - Web, 20 concurrent requests [1]
  Speech to Text, Real-time
    Standard: ¥3 per audio hour
    Custom: ¥4.452 per audio hour
      Endpoint hosting: ¥0.547 per model per hour
    Enhanced add-on features (¥3.05 per hour per feature):
      Continuous Language identification
      Diarization
      Pronunciation Assessment (prosody, grammar, vocabulary, topic)
  Speech to Text, Batch (v3.2 API or higher) [3]
    Standard: ¥1.83 per hour
    Custom: ¥2.3 per hour
      Endpoint hosting: N/A
    Enhanced add-on features: Continuous Language Identification and Diarization included [4]
  Text to Speech
    Neural: ¥95.4 per 1M characters
  Speech Translation
    Standard: ¥10.176 per audio hour
    Custom Neural Training ¥529.152 per hour
    Custom Neural Long Audio Characters ¥1017.6 per 1M characters
    [1] The concurrent request limit applies to web endpoints only.
    [2] Unused models will be automatically decommissioned after 7 days.
    [3] To take advantage of this pricing, you need to use the Speech to text REST API v3.2 preview. See "Create a batch transcription" in the Speech service documentation on Microsoft Learn for information on using the v3.2 preview API.
    [4] Enhanced add-on features are included in the batch price for all Batch API versions.

    Commitment Tiers

    Instance: Azure-Standard
    Category: Text to Speech, Neural [1]
      Price (per month)                  Overage
      ¥6,105.6 for 80M characters        ¥76.32 per 1M characters
      ¥24,804 for 400M characters        ¥62.01 per 1M characters
      ¥95,400 for 2,000M characters      ¥47.7 per 1M characters
      ¥152,640 for 4,000M characters     ¥38.16 per 1M characters

    Instance: Connected container - Standard
    Category: Text to Speech, Neural [1]
      Price (per month)                  Overage
      ¥5,800.32 for 80M characters       ¥72.5 per 1M characters
      ¥23,563.8 for 400M characters      ¥58.9 per 1M characters
      ¥90,630 for 2,000M characters      ¥45.32 per 1M characters
      ¥145,008 for 4,000M characters     ¥36.252 per 1M characters

    [1] Real-time synthesis only; long audio creation is not included.
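To see when a commitment tier pays off compared with the pay-as-you-go Neural rate (¥95.4 per 1M characters), the following Python sketch may help. It is only an illustration: the names `Tier`, `tier_cost`, and `cheapest_option` are made up for this example, and it assumes that overage is charged only on characters beyond the included volume. The third tier's included volume is taken as 2,000M characters, inferred from its monthly fee divided by its overage rate.

```python
from dataclasses import dataclass

PAYG_RATE = 95.4  # CNY per 1M characters, Standard Neural pay-as-you-go price


@dataclass(frozen=True)
class Tier:
    monthly_fee: float   # CNY per month
    included_m: float    # included characters, in millions
    overage_rate: float  # CNY per 1M characters beyond the included volume


# Azure-Standard Text to Speech Neural commitment tiers.
# Assumption: the third tier includes 2,000M characters (95,400 / 47.7 = 2,000).
TIERS = [
    Tier(6105.6, 80, 76.32),
    Tier(24804.0, 400, 62.01),
    Tier(95400.0, 2000, 47.7),
    Tier(152640.0, 4000, 38.16),
]


def tier_cost(tier: Tier, chars_m: float) -> float:
    """Monthly cost in CNY for chars_m million characters on a commitment tier."""
    overage_m = max(0.0, chars_m - tier.included_m)
    return tier.monthly_fee + overage_m * tier.overage_rate


def cheapest_option(chars_m: float):
    """Return (option name, monthly cost) for the cheapest way to buy chars_m."""
    options = {"pay-as-you-go": chars_m * PAYG_RATE}
    for tier in TIERS:
        options[f"commit-{tier.included_m:g}M"] = tier_cost(tier, chars_m)
    name = min(options, key=options.get)
    return name, round(options[name], 2)


if __name__ == "__main__":
    for volume in (10, 100, 500):
        print(volume, cheapest_option(volume))
```

Under these assumptions, small monthly volumes stay cheaper on pay-as-you-go, and the 80M tier starts to win once usage approaches its included volume.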

    Computer Vision

    This cloud-based API gives developers access to advanced algorithms that extract rich information from images in order to categorize and process visual data. Capabilities include image analysis, tagging, celebrity recognition, text extraction, and smart thumbnail generation.

    Image Analysis

    Instance: Free (F0) - Web/Container
      Features: All
      Price: 5,000 free transactions per month, 20 transactions per minute

    Instance: Standard (S1) - Web/Container
      Group 1 features: Tag, Face, GetThumbnail, Color, Image Type, GetAreaOfInterest, People Detection (preview), Smart Crops, OCR, Adult, Celebrity, Landmark, Object Detection, Brand
      Group 1 price:
        0-1M transactions - ¥6.36 per 1,000 transactions
        1-10M transactions - ¥5.088 per 1,000 transactions
        10-100M transactions - ¥4.134 per 1,000 transactions
      Group 2 features: Describe, Read, Caption, Dense Captions
      Group 2 price:
        0-1M transactions - ¥9.54 per 1,000 transactions
        1M+ transactions - ¥3.82 per 1,000 transactions
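The volume bands above can be turned into a monthly estimate with a short Python sketch. The page does not say whether the bands are graduated (each rate applies only to the transactions inside its band) or whether the whole volume is billed at the rate of the band it lands in; the sketch assumes the graduated reading. `GROUP1_BANDS` and `graduated_cost` are illustrative names, not part of any Azure SDK.

```python
# (upper bound of the band in transactions, CNY per 1,000 transactions)
# Values are the Standard (S1) Group 1 prices listed above.
GROUP1_BANDS = [
    (1_000_000, 6.36),
    (10_000_000, 5.088),
    (100_000_000, 4.134),
]


def graduated_cost(transactions: int, bands) -> float:
    """Estimate the monthly cost in CNY, assuming graduated (per-band) billing."""
    if transactions < 0:
        raise ValueError("transactions must be non-negative")
    if transactions > bands[-1][0]:
        raise ValueError("volume exceeds the highest listed band")
    cost = 0.0
    lower = 0
    for upper, rate in bands:
        in_band = min(transactions, upper) - lower
        if in_band <= 0:
            break
        cost += in_band / 1000 * rate
        lower = upper
    return cost


if __name__ == "__main__":
    # 2M transactions: the first 1M at ¥6.36, the next 1M at ¥5.088 per 1,000.
    print(round(graduated_cost(2_000_000, GROUP1_BANDS), 2))
```

If billing turns out to be volume-based rather than graduated, the loop would be replaced by a single lookup of the band that contains the total.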

    Spatial Analysis

    Instance: Free (F0) - Web/Container
      Features: Spatial Analysis on Edge
      Price: 1 free camera/month

    Instance: Standard (S1) - Web/Container
      Features: Spatial Analysis on Edge
      Price: ¥0.07314 per hour

    Content Moderator

    Content Moderator enhances your ability to detect potentially offensive or unwanted images through machine-learning based classifiers, custom blacklists, and optical character recognition (OCR). It helps you detect potential profanity in more than 100 languages and match text against your custom lists automatically. Content Moderator also checks for possible personally identifiable information (PII). Each Text API call can contain up to 1,024 characters. Images (minimum 128 pixels, maximum 4 MB) can be scanned for adult and racy content and for text using OCR. You can also match images against custom image lists. Each API call is a transaction.

    *The following prices are tax-inclusive.
    INSTANCE: Free
      Moderate (1 TPS): 5,000 transactions free per month
      Review (1 TPS): N/A

    INSTANCE: Standard
      Moderate (10 TPS):
        0-1M transactions - ¥10.18 per 1,000 transactions
        1M-5M transactions - ¥7.63 per 1,000 transactions
        5M-10M transactions - ¥6.11 per 1,000 transactions
        10M+ transactions - ¥4.07 per 1,000 transactions

    Language Service

    Language Service API is a cloud-based service that provides advanced natural language processing over raw text. It includes three main functions: sentiment analysis, key phrase extraction, and language detection.

    *The following prices are tax-inclusive.
    INSTANCE: Free - Web
      Features: Sentiment Analysis, Key Phrase Extraction, Language Detection, Entity Extraction, Document summarization (Extractive), Conversational language understanding
      Inferencing: 5,000 transactions free per month

    INSTANCE: Standard (up to 100 requests per second and 1,000 requests per minute)
      Sentiment Analysis, Key Phrase Extraction, Language Detection, Entity Extraction:
        0-500,000 text records - ¥10.176 per 1,000 text records
        0.5M-2.5M text records - ¥7.632 per 1,000 text records
        2.5M-10.0M text records - ¥3.053 per 1,000 text records
        10M+ text records - ¥2.54 per 1,000 text records
      Document summarization (Extractive): ¥20.352 per 1,000 text records
      Conversational language understanding: ¥21.56 per 1,000 text records

    Translator Text

    Translator Text API is a cloud-based machine translation service that supports multiple languages, reaching more than 95% of the world's gross domestic product (GDP). Use Translator to build applications, websites, tools, or any solution that requires multi-language support.

    *The following prices are tax-inclusive.
    INSTANCE: Free
      Features: Text Translation, Language Detection, Bilingual Dictionary, Transliteration
      Price: 2M chars free per month

    INSTANCE: S1
      Features: Text Translation, Language Detection, Bilingual Dictionary, Transliteration
      Price: ¥102 per million chars
      Document Translation: ¥152.6 per million chars

    INSTANCE: S2
      Features: Text Translation, Language Detection, Bilingual Dictionary, Transliteration
      Price: ¥20,925 per month for up to 250M chars per month; overage ¥84 per million chars

    INSTANCE: S3
      Features: Text Translation, Language Detection, Bilingual Dictionary, Transliteration
      Price: ¥61,070 per month for up to 1B chars per month; overage ¥61 per million chars

    INSTANCE: S4
      Features: Text Translation, Language Detection, Bilingual Dictionary, Transliteration
      Price: ¥457,932 per month for up to 10B chars per month; overage ¥46 per million chars

    INSTANCE: D3 (variable cost plus fixed plus overage)
      Features: Document Translation
      Price: ¥61,817 per month; 675M chars per month included; overage ¥10.1124 per million chars
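To judge when a fixed-fee Text Translation tier becomes worthwhile, the Python sketch below computes the monthly cost per plan and the volume at which a plan's fixed fee equals what S1 would charge. It assumes that overage applies only to characters beyond the included volume, which the table suggests but does not state; `monthly_cost` and `break_even_vs_s1` are illustrative names.

```python
S1_RATE = 102.0  # CNY per million chars on S1 (pay per use)

# plan -> (monthly fee in CNY, included million chars, overage CNY per million chars)
FIXED_PLANS = {
    "S2": (20_925.0, 250, 84.0),
    "S3": (61_070.0, 1_000, 61.0),
    "S4": (457_932.0, 10_000, 46.0),
}


def monthly_cost(plan: str, chars_m: float) -> float:
    """Monthly Text Translation cost in CNY for chars_m million characters."""
    if plan == "S1":
        return chars_m * S1_RATE
    fee, included_m, overage_rate = FIXED_PLANS[plan]
    return fee + max(0.0, chars_m - included_m) * overage_rate


def break_even_vs_s1(plan: str) -> float:
    """Volume (million chars) at which the plan's fixed fee equals the S1 cost.

    For the listed plans this volume falls inside the included quota,
    so overage does not affect the result.
    """
    fee, _included_m, _overage_rate = FIXED_PLANS[plan]
    return fee / S1_RATE


if __name__ == "__main__":
    for plan in FIXED_PLANS:
        print(plan, round(break_even_vs_s1(plan), 1), "million chars")
```

Below the break-even volume S1 is cheaper; above it the fixed-fee plan costs less under the stated assumption.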


    Language Understanding

    Language Understanding (LUIS) offers a fast and effective way of adding language understanding to applications. With LUIS, you can use pre-existing, world-class, pre-built models whenever they suit your purposes. When you need specialized models, LUIS guides you through the process of quickly building them.

    *The following prices are tax-inclusive.
    INSTANCE: Free [2] - Web, 5 TPS [1]
      Text Requests: 10,000 transactions free per month *

    INSTANCE: Standard - Web, 50 TPS [1]
      Text Requests: ¥15.26 per 1,000 transactions per month *

    [1] TPS applies only to the web endpoint.

    [2] The Free tier includes only text as an input.

    * Dispatch will do two text transactions per request.

    Training

    Instance: Free - Web
      Feature: Conversational language understanding
      Standard training: free
      Advanced training: up to 1 hour free

    Instance: Standard (S) - Web
      Feature: Conversational language understanding
      Standard training: free
      Advanced training: ¥32.3 per hour

    FAQ

    Common

    Computer Vision

    • What operations can be completed with Computer Vision API?

      Tag - The Computer Vision API returns tags based on more than 2,000 recognizable objects, living beings, types of scenery, and actions. If tags are ambiguous or unusual, the API response will provide 'hints' to clarify the meaning of the tag.

      GetThumbnail - GetThumbnail generates high quality thumbnails after images are uploaded. The Computer Vision API algorithm analyzes objects within images, then crops images according to the requirements for the region of interest (ROI).

      Color - The Computer Vision algorithm extracts colors from an image. The colors are analyzed in three different contexts (foreground, background, and whole). They are grouped into 12 dominant accent colors.

      Image Type - The Computer Vision API can set a Boolean flag to indicate whether an image is black and white or color. It can use the same method to indicate whether an image is a line drawing. It can also indicate whether an image is clip art, along with its quality.

      OCR - Optical Character Recognition (OCR) technology detects text content in an image and extracts the identified text into a machine-readable character stream. You can use the results for searches and numerous other purposes, from medical records to security and banking. It automatically detects the language. OCR saves time and provides convenience for users by allowing them to simply take photos of text instead of transcribing it. Please refer to the Computer Vision Documentation page for supported languages.

      Adult - Apply adult/racy settings to automatically restrict adult content in images.

      Celebrities - Azure’s celebrity recognition model can recognize 200,000 celebrities from business, politics, sports, and entertainment around the world.

    Content Moderator

    Text Analytics

    • How does billing for the Text Analytics API work?

      The Text Analytics API can be purchased in units of the S0-S4 tier at a fixed price. Each unit of a tier comes with included quantities of API transactions. If the user exceeds the included quantities, overages are charged at the rate specified in the pricing table above. These overages are prorated and the service is billed on a monthly basis. The included quantities in a tier are reset each month. In the S tier, the service is billed for only the amount of Text Records submitted to the service.

    • What happens if I exceed the transaction limit on my free tier for Text Analytics?

      Usage is throttled if the transaction limit is reached on the Free tier. Customers cannot accrue overages on the free tier.

    • What constitutes a transaction in the S0-S4 tiers of the Text Analytics API?

      Any annotation to a document counts as a transaction. Batch scoring calls also take into account the number of documents scored in the call. For instance, if 1,000 documents are sent for sentiment analysis in a single API call, that counts as 1,000 transactions. If a call performs more than one annotation operation, each operation is counted. For example, an API call that performs both sentiment analysis and key-phrase extraction on 1,000 documents counts as 2,000 transactions (2 annotations × 1,000 documents).
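The counting rule above amounts to one multiplication, shown here as a small Python sketch. The function name and the operation labels are placeholders chosen for this example, not identifiers from the API.

```python
def count_transactions(num_documents: int, operations) -> int:
    """One transaction per annotation operation per document in the call."""
    distinct_operations = set(operations)  # each operation is counted once
    return num_documents * len(distinct_operations)


if __name__ == "__main__":
    # 1,000 documents, sentiment analysis only
    print(count_transactions(1000, ["sentiment"]))
    # 1,000 documents, sentiment analysis plus key-phrase extraction
    print(count_transactions(1000, ["sentiment", "key_phrases"]))
```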

    • What happens if I exceed the transaction limit on the S0-S4 tier?

      If the usage on the S0-S4 tier is exceeded, the account starts to accrue overages. These overages are billed on a monthly basis and are calculated at the rate specified for each tier.

    • Can I change the tier of service I subscribed to?

      You may upgrade to a higher tier at any time. Billing rate and included quantities corresponding to the higher tier will begin immediately.

    • What constitutes a Text Record in the S Tier?

      A text record in the S tier contains up to 1,000 characters as measured by String.Length. If an input document sent to the Text Analytics API is longer than 1,000 characters, it counts as one text record for each unit of 1,000 characters. For instance, an input document of 7,500 characters counts as 8 text records, and an input document of 500 characters counts as 1 text record. If two documents are submitted, one of 500 characters and one of 1,200 characters, the service bills three text records in total: one record for the 500-character document and two records for the 1,200-character document.
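The text record rule can be checked with a short Python sketch. `String.Length` in .NET counts UTF-16 code units, so the sketch measures length the same way rather than using Python's `len()`, which counts code points. The helper names are illustrative, and treating an empty document as zero records is an assumption the page does not address.

```python
RECORD_SIZE = 1000  # characters per text record


def utf16_length(text: str) -> int:
    """Length in UTF-16 code units, which is what .NET String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def text_records(text: str) -> int:
    """Number of billable text records: one per started unit of 1,000 characters."""
    length = utf16_length(text)
    return (length + RECORD_SIZE - 1) // RECORD_SIZE  # integer ceiling division


if __name__ == "__main__":
    documents = ["a" * 500, "a" * 1200]
    print(sum(text_records(doc) for doc in documents))  # the two-document example above
    print(text_records("a" * 7500))
```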

    Translator Text

    • How do I calculate monthly volume?

      For the Microsoft Translator Text API, the volume you are billed for is the number of characters in the input. Every Unicode code point counts as a character. Every character of the input counts. Each translation of a text to a new language counts as a separate translation. The number of queries, words, bytes, or sentences is irrelevant.

      To estimate your monthly volume, multiply the total number of characters to translate by the number of languages you want to translate into, then spread that total over the maximum number of hours or days you can wait for completion.

      More information on how we count characters for the Translator Text API can be found in our documentation.
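As a rough illustration of the volume estimate described above, the Python sketch below counts input characters per target language and derives the throughput needed to finish within a given time. Python's `len()` on a `str` counts Unicode code points, which matches the stated counting rule. The function names and the sample language codes are assumptions made for this example.

```python
def billed_characters(texts, target_languages) -> int:
    """Characters billed: every code point of every input, once per target language."""
    total_input = sum(len(text) for text in texts)  # len() counts code points
    return total_input * len(target_languages)


def required_hourly_rate(total_characters: int, hours_available: float) -> float:
    """Characters per hour needed to finish within the time you can wait."""
    if hours_available <= 0:
        raise ValueError("hours_available must be positive")
    return total_characters / hours_available


if __name__ == "__main__":
    texts = ["Hello", "你好"]
    total = billed_characters(texts, ["de", "fr"])
    print(total, required_hourly_rate(total, 7))
```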

    • What happens if I reach the limit of the free subscription plan?

      If you subscribe to the free subscription plan, the Microsoft Translator service will stop if you reach 2 million characters during a subscription month for the Text Translation API. The Microsoft Translator service will start again at the beginning of your next subscription month or when you change your subscription to a paid plan.

    • What languages does Microsoft Translator support?

      See the language list for text translation using the Microsoft Translator Text API.

      Developer-oriented language lists, including language codes, can be found in our documentation.

    • Can I customize my translations?

      Customization currently is not available with subscriptions on Azure.cn.

    Language Understanding

    Speech Services

    • How does billing work?

      For Speech Translation and Speech to Text: usage is billed in one-second increments.

      For Text to Speech: usage is billed per character.

      See the pricing note for how SSML tags and Chinese, Japanese, and Korean (CJK) characters are charged.
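The billing units above can be turned into a simple estimate, sketched here in Python. Two assumptions go beyond what the page states: partial seconds are rounded up to the next whole second, and every character is counted once, so the special handling of SSML tags and CJK characters mentioned in the pricing note is not modeled. The function names are illustrative.

```python
import math


def audio_cost(duration_seconds: float, rate_per_hour: float) -> float:
    """Cost in CNY for audio billed in one-second increments.

    Assumption: a partial second is rounded up to a full second.
    """
    billable_seconds = math.ceil(duration_seconds)
    return billable_seconds / 3600 * rate_per_hour


def text_to_speech_cost(text: str, rate_per_million_chars: float) -> float:
    """Cost in CNY for per-character billing (SSML and CJK rules not modeled)."""
    return len(text) / 1_000_000 * rate_per_million_chars


if __name__ == "__main__":
    # 90.2 seconds of Speech Translation at ¥10.176 per audio hour
    print(round(audio_cost(90.2, 10.176), 4))
    # 1,000 characters of Neural Text to Speech at ¥95.4 per 1M characters
    print(round(text_to_speech_cost("a" * 1000, 95.4), 4))
```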

    Support & SLA

    If you have any questions or need help, please visit Azure Support and select self-help service or any other method to contact us for support.

    We guarantee that Azure AI Services running at the Standard tier will be available at least 99.9% of the time. No SLA is provided for the Free tier. To learn more about the details of our service level agreement, please visit the Service Level Agreement page.