In Alexa Skills: Foundation (Part 1) we discussed the basics of Alexa and Amazon’s voice service in general, and explored a few devices that are readily available for everyday purchase.
In this post, we will discuss concepts that come together to power Amazon’s voice tech.
We will cover the following concepts in detail:
- Custom skill
- Alexa Skills Kit
- Components of a Custom skill
We will familiarise ourselves with the basic concepts here, and in the next post we will build an Alexa skill using all of them.
So, without wasting any time, let’s discuss the Custom Skill.
A Custom skill for Alexa (Amazon’s voice service) can be defined as a capability that enables developers to create personalised experiences for customers using an Amazon voice-activated device like the Echo Dot, Echo Spot, Echo Show, etc.
A custom skill is composed of two parts:
The Interaction model and
The Skill Service
Let’s discuss these components in detail.
The Interaction model
In the context of Alexa, an interaction model is somewhat analogous to a graphical user interface in a traditional app. Instead of clicking buttons and selecting options from dialog boxes, users make their requests and respond to questions by voice – Amazon
In other words, the interaction model comprises the various phrases a user can speak while interacting with Alexa.
The table below shows a quick comparison of user interactions via touch events versus voice commands.
| Action | Voice-activated devices (Interaction Model) | GUI interface |
|---|---|---|
| Check weather | User says, “Alexa, please tell me about the weather.” | User opens the weather app on their device and checks the weather using tap events (button clicks, gestures, etc.). |
| Collect more information from the user | Alexa replies, “For what city?” and then waits for a response. | App displays a dialog box and waits for the user to enter the name of the city. |
| Provide needed information | User replies, “Sydney.” | User keys in the city name and clicks OK. |
| User’s request is completed | Alexa speaks the requested information: “In Sydney, it is 13 degrees Celsius with cloudy skies…” | App displays the results of the request. |
When users speak questions and make requests, Alexa uses the interaction model to interpret and translate the words into a specific request that can be handled by a particular skill. The request is then sent to the skill.
You define your own interaction model when creating a custom skill. The Smart Home Skill API and the Flash Briefing Skill API provide built-in interaction models.
Examples of Interaction Models
Consider this phrase that a user can speak:
User: “Alexa, ask JP Morgan, what is the target price of eBay?”
- “JP Morgan” is the invocation name that identifies a particular skill. When invoking a custom skill, users must include this name.
- “What is the target price of eBay” is a phrase in JP Morgan’s interaction model. This phrase is mapped to a specific intent supported by this skill.
Alexa uses this custom interaction model to create a structured representation of the request, called an intent. Alexa sends the intent to the JP Morgan skill. The skill can then look up the price information and send back a response.
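Concretely, when the user speaks the phrase above, the skill service receives a JSON payload roughly like the following. This is a simplified sketch: the intent name `GetTargetPriceIntent` and the `company` slot are hypothetical names chosen for this example, and several fields of the real payload are omitted.

```json
{
  "version": "1.0",
  "request": {
    "type": "IntentRequest",
    "locale": "en-US",
    "intent": {
      "name": "GetTargetPriceIntent",
      "slots": {
        "company": { "name": "company", "value": "eBay" }
      }
    }
  }
}
```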
As discussed above, the other component of a Custom skill is the Skill service.
The Skill service
The skill service is the code that lives somewhere on the internet and answers questions or performs tasks on Alexa’s behalf.
The custom interaction model is what Alexa needs to route questions and tasks to the skill service.
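To make the routing idea concrete, here is a minimal sketch of a skill service in plain Node.js. In practice we will use the Alexa Skills Kit SDK for Node.js, which provides proper handler classes and response builders; this dependency-free version (with a made-up intent name and a hard-coded reply) only illustrates how a request is dispatched to the right piece of code.

```javascript
// Hypothetical handlers, keyed by request type or intent name.
// "GetTargetPriceIntent" and the reply text are made up for this example.
const handlers = {
  LaunchRequest: () => "Welcome! Ask me for a target price.",
  GetTargetPriceIntent: (slots) =>
    `The target price of ${slots.company.value} is 45 dollars.`,
};

// Route an incoming request (shaped like Alexa's JSON payload) to a handler.
function handleRequest(request) {
  const key =
    request.type === "IntentRequest" ? request.intent.name : request.type;
  const handler = handlers[key];
  if (!handler) return "Sorry, I didn't understand that.";
  const slots = request.intent ? request.intent.slots : {};
  return handler(slots);
}
```

For example, passing in an `IntentRequest` whose intent name is `GetTargetPriceIntent` with a `company` slot of “eBay” would return the hypothetical price reply, while an unknown intent falls through to the apology message.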
We will be using the Alexa Skills Kit to build our custom skills. We already know that to do so we need to create two components, i.e. the interaction model and the skill service, as discussed earlier.
Alexa Skills Kit
We can use the Alexa Skills Kit to build not only Custom skills but also Smart Home skills (to control home appliances like a thermostat or a switch) and Flash Briefing skills (where users can get quick information from news sources or other entertainment content).
The Alexa Skills Kit is made up of the following components:
- Software libraries
- Online Tools
Software libraries are programs used by developers to build applications for a given platform, like the Alexa voice service platform. The Alexa Skills Kit is currently available for Node.js, Java, C#, and Python. Since it’s easy to find tools for these languages, you have plenty of options for writing services that handle requests from Alexa.
For our demo project we will use the Alexa Skills Kit SDK for Node.js. Don’t worry about the details of the implementation; we will discuss everything you need later.
Amazon’s developer documentation comes in very handy when it comes to information about Alexa, but you might find it a bit intimidating at first since there is a lot of it. Don’t worry if that is the case; we will cover the concepts in a concise, to-the-point manner that makes it easier for you to get started, and once you are comfortable you can visit Amazon’s official documentation to learn more.
Online tools, in turn, are all the tools you will need to build your custom interaction models.
In order to construct a custom skill, you need to provide Alexa with some examples of the dialogues you expect it to respond to; in other words, you need to teach Alexa how to interpret what a user is asking. This is known as a custom interaction model.
Let’s break down the components of a custom skill and discuss the terminology used to describe how a user interacts with Alexa.
Components of a Custom Skill
A Custom skill in Alexa is made up of four components:
- Invocations
- Utterances
- Intents and
- Slots
Invocation is the act of beginning an interaction with a custom skill. For example, the phrase “Alexa, ask JP Morgan what is the target price of eBay?” tells Alexa to use the JP Morgan skill to get the price of eBay. The keyword JP Morgan is known as the invocation name.
The invocation name identifies the custom skill that a user wants to interact with. Normally, the invocation name should comprise two words, as single-word invocation names are not allowed unless the name is unique to your brand or IP.
The combination of the invocation name and the phrase following it allows the correct skill to respond to the user’s query. The phrase after the invocation name is an example of an Utterance.
Multiple utterances need to be configured for one skill so that Alexa can answer a user’s query correctly. For example, continuing the above example, a user might also ask: “Alexa, ask JP Morgan how much is eBay selling for?”
An intent represents an action that fulfills a user’s spoken request. When you create a new custom intent, you provide a name and a list of utterances that users would say to invoke this intent – Amazon
In order to extract more detailed or specific information from an action/intent, we use slots. For example: “Alexa, ask JP Morgan to send me the latest research report by Joyce Chang”. Here we can see that the user wants the latest research report, by a particular author, out of all the possible reports that might exist.
The sample utterances are a set of likely spoken phrases mapped to the intents. – Amazon
You don’t need to cover every single utterance, as Alexa uses natural language processing to automatically respond to requests that are very close to the ones already provided.
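Putting the invocation name, intents, utterances, and slots together, a custom interaction model is declared as JSON in the Alexa developer console. Here is a simplified sketch for our hypothetical JP Morgan skill; the intent and slot names are made up for this example, and `AMAZON.Corporation` is assumed here as one of Amazon’s built-in slot types for company names:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "j. p. morgan",
      "intents": [
        {
          "name": "GetTargetPriceIntent",
          "slots": [
            { "name": "company", "type": "AMAZON.Corporation" }
          ],
          "samples": [
            "what is the target price of {company}",
            "how much is {company} selling for"
          ]
        }
      ]
    }
  }
}
```

Note how both sample utterances map to the same intent, and how the `{company}` placeholder marks where the slot value (“eBay” in our examples) appears in the spoken phrase.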
In the upcoming posts we will start planning and creating an Alexa skill, and finally create the intent schema. Till then, enjoy reading and stay tuned!
For other updates you can follow me on Twitter: @NavRudraSambyal
Thanks for reading, please share it if you found it useful 🙂