The following are some examples of VoiceFirst-related development concepts. Because development differs so much between platforms and is constantly evolving, we recommend using each platform's latest documentation as your primary guide.

Front End

The "front-end" of a voice app typically includes an invocation name, an interaction/dialog model, common utterances, intents, slots and more. These may differ between voice assistant platforms.

Alexa Skills

  • Custom Interaction Model
  • Smart Home Skills (Pre-built Model)
  • Flash Briefing Skills (Pre-built Model)
  • Video Skills (Pre-built Model)
  • Music Skills (Pre-built Model)
  • List Skills
A custom interaction model includes the following components:
  • Intents: An intent represents an action that fulfills a user's spoken request. Intents can optionally have arguments called slots. Intents are specified in a JSON structure called the intent schema.
  • Sample utterances: A set of likely spoken phrases mapped to the intents. This should include as many representative phrases as possible.
  • Custom slot types: A representative list of possible values for a slot. Custom slot types are used for lists of items that are not covered by one of Amazon's built-in slot types.
  • Dialog model (optional): A structure that identifies the steps for a multi-turn conversation between your skill and the user to collect all the information needed to fulfill each intent. This simplifies the code you need to write to ask the user for information.
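To make these components concrete, here is a minimal sketch of an interaction model written as a Python dictionary that can be serialized to JSON. The layout is loosely modeled on the JSON used for Alexa custom skills, but the skill itself (the `coffee helper` invocation name, `OrderCoffeeIntent`, the `roast` slot, and the `ROAST_TYPE` slot type) is hypothetical, and the exact schema should be checked against the platform's current documentation. The `undeclared_slots` helper is likewise an illustrative check, not part of any SDK.

```python
import re

# Hypothetical skill: all names, utterances, and slot values are illustrative.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "coffee helper",
            "intents": [
                {
                    # The intent: an action the user can request.
                    "name": "OrderCoffeeIntent",
                    # Slots: the intent's optional arguments.
                    "slots": [{"name": "roast", "type": "ROAST_TYPE"}],
                    # Sample utterances mapped to this intent.
                    "samples": [
                        "order a {roast} coffee",
                        "get me a {roast} roast",
                        "i would like a coffee",
                    ],
                },
            ],
            # Custom slot type: representative values for the slot.
            "types": [
                {
                    "name": "ROAST_TYPE",
                    "values": [
                        {"name": {"value": "light"}},
                        {"name": {"value": "medium"}},
                        {"name": {"value": "dark"}},
                    ],
                },
            ],
        }
    }
}


def undeclared_slots(model: dict) -> list[str]:
    """Return 'IntentName.slot' for each {slot} used in a sample but not declared."""
    problems = []
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        declared = {slot["name"] for slot in intent.get("slots", [])}
        for sample in intent.get("samples", []):
            for used in re.findall(r"\{(\w+)\}", sample):
                if used not in declared:
                    problems.append(f"{intent['name']}.{used}")
    return problems
```

Calling `json.dumps(interaction_model, indent=2)` (after `import json`) produces text that can be compared against the platform's JSON editor, and `undeclared_slots(interaction_model)` flags utterances that reference a slot the intent never declares.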
When designing and building a custom skill, you create the following:
  • A set of intents that represent actions that users can do with your skill. These intents represent the core functionality for your skill.
  • A set of sample utterances that specify the words and phrases users can say to invoke those intents. You map these utterances to your intents. This mapping forms the interaction model for the skill.
  • An invocation name that identifies the skill. The user includes this name when initiating a conversation with your skill.
  • If applicable, a set of images, audio files, and video files that you want to include in the skill. These must be stored on a publicly accessible site so that each item is accessible by a unique URL.
  • A cloud-based service that accepts these intents as structured requests and then acts upon them. This service must be accessible over the Internet. You provide an endpoint for your service when configuring the skill.
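The last item, the service that receives intents as structured requests, can be sketched as a plain Python function that inspects the request and returns a response dictionary. This is a simplified illustration, not the Alexa SDK: the field names follow the general shape of Alexa's JSON request and response format, while `OrderCoffeeIntent`, the `roast` slot, and the spoken text are hypothetical. A real skill would normally use the platform's SDK and handle more request types.

```python
def build_speech_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in a minimal response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event: dict) -> dict:
    """Route a structured request to the matching piece of skill logic."""
    request = event.get("request", {})
    request_type = request.get("type")

    # The user opened the skill by its invocation name without an intent.
    if request_type == "LaunchRequest":
        return build_speech_response(
            "Welcome. What would you like to order?", end_session=False
        )

    if request_type == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "OrderCoffeeIntent":  # hypothetical intent
            roast = intent.get("slots", {}).get("roast", {}).get("value")
            if roast is None:
                # Slot missing: ask a follow-up question and keep the session open.
                return build_speech_response(
                    "Which roast would you like?", end_session=False
                )
            return build_speech_response(f"Ordering a {roast} roast coffee.")

    # Anything this sketch does not recognize.
    return build_speech_response("Sorry, I did not understand that.")
```

Keeping the routing in a pure function like this makes it easy to test with hand-written request dictionaries before connecting it to a hosted endpoint.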
Alexa - Speech Synthesis Markup Language (SSML)
In some cases you may want additional control over how Alexa generates the speech from the text in your response. For example, you may want a longer pause within the speech, or you may want a string of digits read back as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support. The Alexa Skills Kit supports a subset of the tags defined in the SSML specification.
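The two cases mentioned above, a longer pause and digits read as a telephone number, might look like the following. This is a minimal sketch: `phone_number_ssml` and `ssml_output_speech` are hypothetical helper names, and while `break` and `say-as` are standard SSML tags, the supported tags and attribute values should be confirmed in the platform's SSML reference.

```python
from xml.sax.saxutils import escape


def phone_number_ssml(intro: str, phone_number: str, pause: str = "1s") -> str:
    """Build SSML that pauses, then reads the digits as a telephone number.

    `pause` is inserted as-is, so it is assumed to be a trusted value such as "1s".
    """
    return (
        "<speak>"
        f"{escape(intro)}"                      # escape &, <, > in spoken text
        f'<break time="{pause}"/>'              # explicit pause
        f'<say-as interpret-as="telephone">{escape(phone_number)}</say-as>'
        "</speak>"
    )


def ssml_output_speech(ssml: str) -> dict:
    """Wrap SSML in an output-speech object (Alexa-style shape, assumed here)."""
    return {"type": "SSML", "ssml": ssml}
```

For example, `ssml_output_speech(phone_number_ssml("You can reach us at", "2025550123"))` yields an object that can replace a plain-text output speech entry in a response.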

Google Actions

Google - Ways to build Actions
  • Enhance your web content to increase discoverability through Search and the Assistant
  • Extend your Android app to the Assistant
  • Build for the Assistant with vertical solutions
  • Build for the Assistant with templates
  • Build custom Assistant experiences
Google Actions - Speech Synthesis Markup Language (SSML)
When returning a response to the Google Assistant, you can use a subset of the Speech Synthesis Markup Language (SSML). By using SSML, you can make your agent's responses sound more like natural speech.


Back End

For most voice apps, you'll need a back end that receives POST requests when a user interacts with your voice app. The request body contains parameters that your service can use to perform logic and generate a JSON-formatted response.


Alexa Hosted
You can build, edit, and publish Alexa skills without leaving the Alexa Developer Console by using Alexa-hosted skills (beta). With an Alexa-hosted skill, there's no need to create your own AWS account. Instead, the Alexa-hosted skill automatically provisions and manages AWS cloud services for your skill’s back end. This lets you build skills faster and spend your time designing engaging experiences rather than managing cloud services. If you decide to have Alexa host your skill, you get access to a code editor that lets you deploy code directly to AWS Lambda from the developer console.
The easiest way to build the cloud-based service for a custom Alexa skill is to use AWS Lambda, an Amazon Web Services offering that runs your code only when it's needed and scales automatically, so there is no need to provision or continuously run servers. You upload the code for your Alexa skill to a Lambda function and Lambda does the rest, executing it in response to Alexa voice interactions and automatically managing the compute resources for you.
Provision Your Own
You can host your own HTTPS web service endpoint as long as the service meets the requirements of your front end. As with any back end, the service receives a POST request for each user interaction and returns a JSON-formatted response.
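As a rough illustration of a self-hosted endpoint, the sketch below uses only Python's standard library to accept a POST request, parse the JSON body, and reply with JSON. Everything specific here is an assumption: `respond_to`, `VoiceAppHandler`, and the `{"received": ...}` reply are placeholders rather than any platform's response format, and the server speaks plain HTTP on localhost for experimentation only. A production endpoint would need HTTPS and, on most platforms, verification that requests really come from the voice service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def respond_to(raw_body: bytes) -> tuple[int, bytes]:
    """Turn a raw POST body into (HTTP status, JSON response body)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, json.dumps({"error": "request body must be JSON"}).encode("utf-8")
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"}).encode("utf-8")

    # Placeholder logic: echo the request type back to the caller.
    request = payload.get("request")
    request_type = request.get("type") if isinstance(request, dict) else None
    return 200, json.dumps({"received": request_type}).encode("utf-8")


class VoiceAppHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the client says it sent.
        length = int(self.headers.get("Content-Length", 0))
        status, body = respond_to(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the sketch locally over plain HTTP (not suitable for production)."""
    HTTPServer(("127.0.0.1", port), VoiceAppHandler).serve_forever()
```

Separating `respond_to` from the HTTP handler keeps the request logic testable without opening a socket; calling `serve()` starts the local server.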