Speech recognition in WordPress based on Alexa Skill

I would like to develop a WordPress plugin that will allow users to voice-interact with a WordPress website. I want it to be based on an Alexa Skill.
What would be the architecture for this task?

If you think your use case is relatively standard, you can take a look at VoiceWP, which was built to allow for management of an Alexa skill mostly from within WordPress.
If you need something more custom, you can use the WordPress REST API to provide Alexa with the data you need. With this architecture, your plugin on the WordPress side would just be setting up and managing all the REST API endpoints.
From the top down, the architecture is: the Alexa Skill sends the user's request to an AWS Lambda function, which calls the WordPress REST API and speaks back the result.
This leaves you with 3 pieces to build:
Set up the Alexa Skill
First, you have to set up the skill with the Alexa Skills Kit. This involves setting up things like the name of your skill, the icon, and most importantly, where the skill should look to get its functionality. In our example, we'll point the skill to an AWS Lambda function.
Set up the Lambda function to fulfill the Alexa input
Once the Skill knows to look to the Lambda function for its functionality, we actually need to code the Lambda function. This can be done in Node.js (JavaScript), Python, Java (Java 8 compatible), C# (.NET Core) or Go. What the Lambda function needs to do is parse the JSON that comes from Alexa and determine which endpoint to call and which parameters to pass to it. For an example of this in Python, you can check out my example on GitHub.
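For illustration, here is a minimal Python sketch of that routing idea (this is not the author's GitHub example; the intent name LatestHeadlineIntent, the example.com site URL, and the /myplugin/v1/headline endpoint are all invented):

# Minimal Lambda handler: parse the Alexa request, route the intent to a
# WordPress REST endpoint, and wrap the result in an Alexa response.
import json
import urllib.request

WP_API = "https://example.com/wp-json"  # your WordPress site (placeholder)

def lambda_handler(event, context):
    request = event.get("request", {})
    speech = "Welcome. Ask me for the latest headline."

    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {}).get("name")
        if intent == "LatestHeadlineIntent":
            # Route this intent to the matching WordPress endpoint.
            with urllib.request.urlopen(WP_API + "/myplugin/v1/headline") as resp:
                data = json.load(resp)
            speech = "The latest headline is " + data["headline"]
        else:
            speech = "Sorry, I don't know how to do that yet."

    # The envelope below is the standard Alexa Skills Kit response shape.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }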
Set up WordPress endpoints to provide data
Once you have the Lambda function parsing the user's intent and pushing the request to the specific endpoints, you need to write the code from within WordPress to make sure all the endpoints you need are available. This is the part that I'm able to give the least input on because the specific endpoints that you will need are based on your use case, which I don't really know at this point. But for an example of how we created a settings field and returned that value through a custom REST API endpoint, you can see this example on GitHub.
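As a concrete illustration of the data contract, WordPress core already ships REST endpoints such as /wp/v2/posts, and a custom endpoint registered with register_rest_route() on the PHP side is consumed the same way. A quick Python check of what such an endpoint returns (the site URL is a placeholder):

# Fetch the most recent post from the core WordPress REST API.
import json
import urllib.request

url = "https://example.com/wp-json/wp/v2/posts?per_page=1"
with urllib.request.urlopen(url) as resp:
    posts = json.load(resp)

# Each post carries rendered fields the Lambda function can turn into speech.
print(posts[0]["title"]["rendered"])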
Wrapping up and Extending it Further
So once the data is returned from WordPress, formatted by the Lambda function and returned to Alexa, the user will hear the results of their query.
This can be customized, and further functionality added, by creating more endpoints in WordPress and more routing in the Lambda function based on new Alexa voice inputs.
Further Reading/Watching
If you're interested in learning more, I've given a couple talks about this:
WP REST API as the Foundation of the Open Web (the voice content starts at 11:06)
Voice Is The New Keyboard: Voice Interfaces In 2018 And Beyond - This uses Google Home for the custom skill, but the ideas presented here are the same.

Related

HTTP POST from GOOGLE ASSISTANT to PRIVATE SERVER and convert response in voice

I want to use Google Assistant from my phone to send HTTP POST commands to my server. I have a simple WebNMS app running on it; this server supports a REST API, and now I want to use Google Assistant to send GET or POST commands to that server and read back the output.
Is this possible? I am not a full-time developer.
Yes, as @Prisoner says, it is possible. It is not what you asked, but have you seen these ways that Google provides to get skills published without requiring a lot of developer savvy?
https://developers.google.com/actions/content-actions/
https://developers.google.com/actions/templates/first-app
I don't speak for them, but in my opinion Google's target audience for Action building, beyond the options above, is developers who have at least some familiarity with the JavaScript language and its runtime, Node.
There is also this - which I haven't tried by the way.
https://www.techadvisor.co.uk/how-to/digital-home/easy-actions-google-assistant-3665372/
In case it is not obvious, Google Actions are essentially websites that interact with Google's Assistant running on a Home device or a smartphone, say. Think of the Assistant as a browser initiating requests and your Action as serving them. If you can build and deploy a server that handles POSTs over HTTPS on a publicly addressable URL, and if you can understand the JSON payload that the Assistant sends and respond with appropriate JSON to carry out your application's logic, then you are good to go.
Where you don't have a public IP address - e.g. in testing - you can use a tool like ngrok (https://ngrok.com/) to reverse-proxy requests emanating from the Assistant to your server.
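To make that concrete, here is a minimal sketch of such a server in Python using Flask (the framework choice and the /webhook path are assumptions; the reply uses Dialogflow's v2 fulfillment shape, one common way of fronting such a server):

# Minimal Assistant webhook: accept the POSTed JSON, reply with JSON.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    payload = request.get_json(force=True)
    # Inspect what the Assistant sent; the exact shape depends on whether
    # Dialogflow or the Actions SDK sits in front of your server.
    print(payload)
    return jsonify({"fulfillmentText": "Hello from my own server."})

if __name__ == "__main__":
    # For testing without a public IP: run `ngrok http 5000` and register
    # the HTTPS URL ngrok prints as your fulfillment URL.
    app.run(port=5000)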
I have slides for a presentation I did targeting fledgling developers who had never built an Action here
https://docs.google.com/presentation/d/1lGxmoMDZLFSievf5phoQVmlp85ofWZ2LDjNnH6wx7UY/edit?usp=sharing
and the code that goes with it here
https://github.com/unclewill/parrot
On the upside, the code is about as simple as it gets. On the downside, it does almost nothing. In particular, it doesn't try to understand language. As @Prisoner says, you'll likely need a tool like Dialogflow for that.
Yes, it is possible.
Your server will need to implement the Actions on Google API. This is a REST API which will accept JSON containing what the user is intending to do and specific information about what they have said. Your server will need to send back JSON indicating the reply, along with additional information about how to continue the conversation.
You will likely also want to use a tool such as Dialogflow to handle building the conversational script and converting a user's phrases into something that makes sense to you. You'll also need to use the Actions on Google console to manage your Action and provide additional details about how users contact your Action. All of this is explained in the Actions on Google documentation.
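To sketch that JSON exchange (assuming Dialogflow v2 in front; the intent name order.status and the order_id parameter are invented for illustration): the POST body carries a queryResult object with the matched intent and extracted parameters, and you answer with fulfillment JSON.

# Handle a Dialogflow v2 webhook body and build the reply JSON.
def handle(body):
    query_result = body.get("queryResult", {})
    intent = query_result.get("intent", {}).get("displayName")
    params = query_result.get("parameters", {})

    if intent == "order.status":
        reply = "Order {} is on its way.".format(params.get("order_id"))
    else:
        reply = "Sorry, I didn't understand that."

    # Dialogflow reads fulfillmentText back to the user.
    return {"fulfillmentText": reply}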
Simple Actions are fairly easy to develop, and can certainly be done by a developer as a hobby. Good Actions, however, take a lot more thought and planning. Google offers you the tools; it is up to you to take best advantage of them.
I've found the solution.
In the "Action" console https://console.actions.google.com/project/sandbox-csuite/scenes/Start
Go to menu "Webhook", click "Change fulfillment method", and then select "HTTPS endpoint"

Can I restrict Bing Autosuggest API to suggest only technical skills?

My application lets users capture and organize their online learning experiences. For this we ask for users' learning interests during onboarding. I just wanted to check whether I can make use of the Autosuggest API to provide suggestions to users. Here the intention of users is to enter something related to a learning interest, like Java, AWS, Oracle, geography, digital marketing, SEO, etc.
For example, if the user enters "ja", the application should show java and javascript. Currently I get the following responses: java, jacobsconnect, jamba juice, jack in the box, etc. I am using the API testing console for the Autosuggest API. It finally makes this HTTP request:
GET https://api.cognitive.microsoft.com/bing/v5.0/suggestions/?q=ja HTTP/1.1
Host: api.cognitive.microsoft.com
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••
I can build a curated list of skills and implement auto-fill, but I am just curious to know whether I can use the Autosuggest API instead. I couldn't find any useful information in the online documentation.
I got the following response from the Azure service team; unfortunately, the requested feature is not possible with the Autosuggest API. Their response:
Hi! We wanted to follow up and let you know that the service team informed us the functionality requested in your SO post is not part of the AutoSuggest API. They have, however, noted your interest in this type of behavior and appreciate the feedback.
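Given that, the curated-list approach already mentioned in the question is the practical route. A minimal Python sketch of prefix matching over such a list (the skill list is placeholder data):

# Suggest learning interests by prefix from a curated list.
SKILLS = ["java", "javascript", "aws", "oracle", "geography",
          "digital marketing", "seo"]

def suggest(prefix, limit=5):
    p = prefix.strip().lower()
    return [s for s in SKILLS if s.startswith(p)][:limit]

print(suggest("ja"))  # ['java', 'javascript']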

AWS Alexa - perform basic auth

I am trying to create a skill that will reach out to an application that uses Basic authentication for its APIs (although I know this is bad practice). I wanted to go down a route similar to account linking, but it seems they enforce the use of OAuth 2.0.
Is there an alternative, or am I forced to use OAuth 2.0 in order to make API requests to a third-party application?
My wanted workflow:
customer enables skill
Skill card request for username/pw combo
after setup, the skill can be utilized fully
Not sure if it's helpful, but I'm using Lambda to run my skill source code.
That is a terrible practice.
First of all, what if your user's password includes case-sensitive letters and numbers, and possibly other characters?
You can use literal slots, but they are not case-sensitive and probably won't return a number-word combination either. For example, if your user's password is Word123, a literal slot may return word one two three.
https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interaction-model-reference#literal-slot-type-reference
I am not sure whether you could force the user to spell out the password's characters and then try to detect the password that way... Again, this sounds like a terrible practice.
So, as you mentioned: "Users link their accounts using the Amazon Alexa app. Note that users must use the app. There is no support for establishing the link solely by voice."
I guess you have to do the linking the way Amazon requires:
https://developer.amazon.com/blogs/post/Tx3CX1ETRZZ2NPC/alexa-account-linking-5-steps-to-seamlessly-link-your-alexa-skill-with-login-with-amazon

developing and testing alexa skill (with authorization)

I am about to develop my first custom skill for Alexa. I do not have an Echo device.
What I did was create and test a basic skill with the Amazon developer console (Alexa Skill + Lambda).
Now I have some general (newbie) questions:
1) Is this really the way you have to develop and test your custom skills? I mean, the real user experience cannot be tested this way. You have to enter the text and analyse the JSON requests/responses. So no realistic end-to-end testing is possible?
2) What happens when you finish the development phase in the Amazon developer console? I'm currently at the Testing step, but I can see that the next steps are about publishing information (images, texts, etc.), and I can also see the button "Submit for Certification". So it seems that my custom skill gets published on some kind of market to other Alexa users? Is this correct? Is there a way to use this skill just for my personal usage, like an APK-file Android app?
3) I'm developing a custom skill that needs some kind of user authorization. I see there is a large article about it, and it seems some action is needed in the Alexa app on the smartphone. My question here is how to test this without having a real device. Is that actually possible?
1) I'd suggest you first test locally, then use the test console, and finally https://echosim.io, which gives you a test bed very close to what you get when interacting with the Echo (more precisely, the Echo Tap: you have to tap the button for it to listen).
2) If you just want the skill for yourself, forget about anything past the testing step. That extra information is only for the "store", as you guessed.
3) If you only need to identify individual users, then you DO NOT need to use the user-authentication stuff. There is a unique user identifier provided in every request (see the sketch below). If you want to authenticate users with a third-party OAuth-like scheme, then read that document.
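To make point 3 concrete, here is a minimal Python sketch of reading that identifier; the in-memory set standing in for real storage is an assumption:

# Telling users apart with the identifier present in every Alexa request.
SEEN = set()  # stand-in for real storage such as a DynamoDB table

def lambda_handler(event, context):
    user_id = event["session"]["user"]["userId"]  # stable per user, per skill
    text = "Welcome back." if user_id in SEEN else "Nice to meet you."
    SEEN.add(user_id)  # in-memory only; survives warm invocations at best
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }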
There's a pretty useful series by Big Nerd Ranch about developing skills locally using Node.js: https://www.bignerdranch.com/blog/developing-alexa-skills-locally-with-nodejs-setting-up-your-local-environment/. They use alexa-app, mocha, chai, and alexa-app-server.

Will Google block my access if I use their features without token?

I'm using this link https://www.google.com/reader/api/0/stream/contents/feed/FEEDHERE?output=json&n=20
to fetch feeds using Google's algorithm. As you can see, I'm not adding any other parameters, just fetching the returned data in JSON format. Hopefully my app will be heavily used; if I send a lot of requests to this link, will Google block my access?
Is there anything I can include, like a userip parameter or a URL for my app (so that if they have a problem they can just contact me), or something else?
The most basic answer to your question is that Google will change its Terms of Service whenever it likes, and you've got no say in the matter. So if it's allowed today, it might not be allowed tomorrow, at Google's whim.
On this issue, though, you seem fairly safe. From the Terms of Service (this is the general document, since Reader doesn't seem to have a specific one):
Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide.
Google provides RSS and Atom. They provide these feeds, so I assume they expect that they'll be used. They don't say that it's a misuse to point someone else at those feeds, so it looks OK for now, but they could add such a clause at any time.
All online services are subject to the terms and conditions of their providers. So, as others have said, they may be OK with your use today, but they can change their minds at any time down the line. I doubt including a URL or email or contact info will help, because when these services change, providers don't notify every user of the service; they just announce the change publicly. Usually they give several months' notice to give users a chance to adapt their applications, but this is not standardized or enforced, so there is no guarantee. One example is the fairly recent discontinuation of the Google Finance API (for which no replacement has been announced).
The safest approach would be to design your app so that the feature using Google's functionality is decoupled as much as possible from the rest of your app. Then, when or if the availability of the service changes (i.e., it's no longer available at all), you can adapt your app to use some other source for the feeds with minimal impact on the rest of the app. Design for change and plan for the worst.
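One way to realize that decoupling, sketched in Python (the class names are invented): hide the provider behind a small interface so that swapping Google's endpoint for another source touches exactly one class.

# Isolate the feed provider behind an interface; the rest of the app
# never touches the Google-specific URL directly.
import json
import urllib.request
from abc import ABC, abstractmethod
from urllib.parse import quote

class FeedSource(ABC):
    @abstractmethod
    def fetch(self, feed_url, count):
        ...

class GoogleReaderSource(FeedSource):
    BASE = "https://www.google.com/reader/api/0/stream/contents/feed/"

    def fetch(self, feed_url, count):
        url = self.BASE + quote(feed_url, safe="") + "?output=json&n=%d" % count
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

# If the service changes or disappears, add e.g. a DirectRssSource and
# change only this line; callers keep using source.fetch(...).
source = GoogleReaderSource()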
