Custom Wake Word with Speech SDK - microsoft-cognitive

I'm looking for a way to have an application just process voice commands every time I say a specific phrase. The behavior should be similar to Cortana, Alexa or others. The application will be deployed on a hands-free device, but the Speech device SDK cannot be used, because it runs solely on a PC.
Is there an addition to the Speech SDK with a custom wake word yet?

MS speech recognition uses RESTful APIs to cloud services, so you can use them from any device.
You can use the speech-to-text and voice command services on Android or iOS (https://azure.microsoft.com/en-us/services/cognitive-services/speech/?v=18.05); examples exist for both Java and Objective-C.
I think you also want to check out the preview of speaker recognition.
https://learn.microsoft.com/en-us/azure/cognitive-services/speaker-recognition/home
Does this help?
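To make the REST option concrete, here is a minimal sketch of a speech-to-text call against the short-audio REST endpoint. The key, region and audio file name are placeholders, and you should check the current docs for the exact endpoint and header requirements:

```csharp
// A minimal sketch of a speech-to-text call over REST (the short-audio endpoint).
// The key, region and audio file name are placeholders; check the current docs
// for the exact endpoint and header requirements.
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class SttRestDemo
{
    static async Task Main()
    {
        const string region = "westus";                       // your service region
        const string key = "<your-subscription-key>";
        var url = $"https://{region}.stt.speech.microsoft.com/speech/recognition/" +
                  "conversation/cognitiveservices/v1?language=en-US";

        using var http = new HttpClient();
        http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", key);

        // 16 kHz, 16-bit mono PCM WAV is the usual input format for this endpoint.
        using var audio = new StreamContent(File.OpenRead("command.wav"));
        audio.Headers.TryAddWithoutValidation(
            "Content-Type", "audio/wav; codecs=audio/pcm; samplerate=16000");

        var response = await http.PostAsync(url, audio);
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```

The JSON response should include fields like RecognitionStatus and DisplayText with the recognized text.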

Looks like you would need to go to a specific site to set a wake word, as it's still in preview mode. That will generate some files that you'd download for use with the SDK.
There's more on this documentation page.
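For reference, here is a rough sketch of how keyword spotting is wired up with the Speech SDK in C#, assuming you have already generated and downloaded a keyword model file from the preview portal (the file name, key and region below are placeholders):

```csharp
// A minimal sketch of wake-word (keyword) spotting with the Speech SDK in C#.
// Assumptions: you have already generated and downloaded a keyword model file
// from the preview portal (here called "hey_device.table"), and the key/region
// below are placeholders for your own subscription.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class WakeWordDemo
{
    static async Task Main()
    {
        var config = SpeechConfig.FromSubscription("<your-key>", "<your-region>");
        using var audio = AudioConfig.FromDefaultMicrophoneInput();
        using var recognizer = new SpeechRecognizer(config, audio);

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedKeyword)
                Console.WriteLine($"Wake word spotted: {e.Result.Text}");
            else if (e.Result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Command: {e.Result.Text}");
        };

        // Load the keyword model generated by the portal and start listening.
        var model = KeywordRecognitionModel.FromFile("hey_device.table");
        await recognizer.StartKeywordRecognitionAsync(model);

        Console.WriteLine("Listening for the wake word... press Enter to stop.");
        Console.ReadLine();
        await recognizer.StopKeywordRecognitionAsync();
    }
}
```

The idea is that the keyword is spotted locally from the downloaded model, and full recognition only kicks in once the wake word has been heard.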

Related

Can we use Botium CLI for testing a mobile voice app in SauceLabs?

Can we leverage Botium speech processing and Botium CLI to test a mobile voice-based app (built on Rasa)?
The idea is to test the app on SauceLabs or any cloud service. Most of the documentation is around Alexa Skills.
This article by Florian Treml is nice, but it uses Botium Box and covers a different use case than ours.
https://medium.com/swlh/beginners-guide-to-automated-voice-app-testing-4596dd9130fd
Is there a working example that we can refer to?
Upfront: most things that work in Botium Box also work in Botium Core and Botium CLI, but they require more setup effort.
When testing a chatbot, doing end-to-end tests with Appium at the very beginning is a bad idea - I wrote about the reasons here.
I recommend starting with tests at the Rasa API level.
And to answer your question: right now it is not possible out of the box with the Botium Webdriver connector to send and receive voice - this requires customization, and it depends heavily on the implementation details of the app under test.
UPDATE
For testing Rasa at the API level with audio input, I wrote a blog article on how to add voice capabilities to the Rasa processing pipeline. This scenario can be tested with the Botium Socket.io connector; there is even a Rasa sample available based on the mentioned article: https://github.com/codeforequity-at/botium-connector-simple-socketio/tree/master/samples/rasa
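As a rough illustration of what API-level testing with Botium CLI looks like, a project boils down to a botium.json plus convo files. The capability names below are from memory and should be checked against the botium-connector-rasa README; the endpoint URL is a placeholder:

```json
{
  "botium": {
    "Capabilities": {
      "PROJECTNAME": "Rasa API-level tests",
      "CONTAINERMODE": "rasa",
      "RASA_ENDPOINT_URL": "http://localhost:5005"
    }
  }
}
```

A convo file (e.g. spec/convo/order.convo.txt, a made-up example) scripts one conversation:

```
order pizza happy path

#me
I want to order a pizza

#bot
What toppings would you like?
```

Running botium-cli run against that config and convo directory then executes the conversations; the voice part would be layered on top via the Socket.io connector and the Rasa sample linked above.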

Will ASP.NET Form Get value from Barcode

ASP.NET form. If I'm running a form in a browser on a small (Android) device with a barcode scanner, will the scanned barcode go into the ASP.NET textbox? Or do I need to add something to the application?
Well, it's going to depend on which of the 150+ barcode scanner apps you decide to grab from Google Play.
However, the answer is yes or no: it depends on the kind of scanner.
Say you download just a scanning application (software-based, not a built-in scanner).
The problem is that Android (and even iOS) doesn't allow one application to set focus on, or grab data from, another application, nor the reverse. If that were possible, an app could also grab values from, say, your online banking application while it's running.
So Android doesn't let a scanning app push its result into whatever other application currently has focus. Now, if the scanner is factory-supplied software on the phone? Then yes, it works like a desktop keyboard "wedge": the receiving program can't tell whether you typed the input on the keyboard or it came from the scanner (hence the name keyboard wedge). These will work with a web form.
However, we're now seeing the rise of software-based keyboard wedges, where the scanner software installs itself on Android as a custom keyboard. In that case, once again, it will work in a web form.
So, for devices with a built-in scanner? Yes, that will work in all applications. For a software-only scanner (one that uses the built-in camera), it's also possible, provided the software in question works as a keyboard-wedge scanner.
If you're going to adopt Android scanning, then use a purpose-built Android scanner.
Another possibility if you want to use a software scanner: write a small Android application and have it talk to your web site. I think this is the best solution, but of course it means you have to adopt some Android dev tools.
So how this works depends on whether the Android device has a built-in scanner or is a software + camera based scanner. That said, even installable software-based scanners can in theory be made to work with any application, since they run and behave as a user-installed keyboard.
So you have to check the particular device. The answer is not "yes" in all cases; it depends on whether you're using an Android device with a built-in scanner or looking to use any Android phone as that scanner.
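For the keyboard-wedge case there is nothing special to code on the ASP.NET side; the scan arrives as if it were typed. A minimal Web Forms sketch follows, where the control names txtBarcode, btnScan and lblResult are made up for illustration, and the markup is assumed to set btnScan as the form's default button so the scanner's trailing Enter posts back:

```csharp
// A minimal Web Forms code-behind sketch for the keyboard-wedge case.
// Assumptions: the markup has a TextBox "txtBarcode", a Button "btnScan" set as
// the form's DefaultButton, and a Label "lblResult" - all names made up here.
// A wedge scanner types the barcode into the focused textbox and usually sends
// Enter, which triggers the default button's postback as if a user had typed it.
using System;
using System.Web.UI;

public partial class ScanPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!IsPostBack)
        {
            // Put the cursor in the barcode box so the wedge input lands there.
            txtBarcode.Focus();
        }
    }

    protected void btnScan_Click(object sender, EventArgs e)
    {
        string barcode = txtBarcode.Text.Trim();
        if (barcode.Length > 0)
        {
            lblResult.Text = "Scanned: " + barcode;  // look up the item, etc.
            txtBarcode.Text = string.Empty;
            txtBarcode.Focus();                      // ready for the next scan
        }
    }
}
```

The only real requirement is that the textbox has focus when the scan happens, which is why it is refocused after every postback.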

Android Things - OTA via bluetooth?

I haven't had much success searching for this. I'm developing an Android Things application that will connect to a user phone to do certain things. I want to use this for delivering app updates as well.
So far, my searches have only turned up discussions of OTA via the Console, and therefore the internet.
My gut says I could just build this: produce a new version of the APK, transfer it to the device via Bluetooth, and then have the device copy it over the old one and reboot. But I'm not sure. I was hoping there was an API for this and I'm just not finding it in my searches.

Possible to cross-platform develop Watch/Wearable applications?

Since I am new to the world of developing apps for watches, and given that the following frameworks exist for smartphones:
Xamarin
PhoneGap
appcelerator
kony
Cordova
...
I wonder whether similar frameworks exist for watch apps, so that you code once but run everywhere.
Thanks
Edit 1:
As of today (12.05.2015), based on the answer of a NativeScript maintainer here, I will go with NativeScript to start writing apps for wearables.
Cordova/PhoneGap apps don't run directly on wearable devices/watches. Cordova/PhoneGap is basically a JavaScript API that runs in WebKit/WebView on all the mobile OSs, but Android and Apple watches don't support WebKit, so apps developed with Cordova don't work directly on watch devices. If you want to extend some features of an existing Cordova app to a wearable app, you need to create the extension app in the native language, and that extension has to communicate with the paired app on the mobile device. The extension on the watch has only the UI; the business logic and so on run in the Cordova app on the mobile. It is possible to establish communication between these apps, which drives the display on the watch devices.
I am not sure how much the other frameworks you listed above support wearable devices.
As #kiran and #NRimer have mentioned, these cross-platform frameworks rely on WebKit/WebView, which is the almost universal layer supported on every mobile device. The apps don't run directly on the device; the device runs a WebKit platform that runs these cross-platform apps. So, comparing the capabilities of a native app with a cross-platform app, the native app can do more, because it has hands-on access to the device's hardware-related features. What's particular to smartwatches is that they mostly rely on another smartphone and use its communication protocols, which are hardware-specific, and WebKit has no access to that.
It depends on what you're looking to do with the framework. Watch apps build on data provided by their containing app. For example, if you want to provide custom notifications on the watch, the containing app (or a server, for remote notifications) constructs them. When your watch app needs information, it makes a request to the containing app. Say you have a group of apps that should offer the same notifications or functions in each of their watch apps: you could build a framework that handles those functions for the containing app. As for the watch portion, think of it as more of a display of the information provided. Unfortunately, I don't think there's a way to build frameworks for watch apps yet. If you're looking to put a lot of code inside the watch app this might be more difficult, but for simple display of information you should be all right.

Text to Speech in ASP.NET

I would like to do some Japanese text-to-speech on my dedicated Windows 2003 x64 server with the .NET Framework, using C#.
I found something on Google, but it requires installing a lot of files on the server... which I don't like, for stability reasons. Is there another option, like a linked DLL or something?
You can use Microsoft Speech SDK. It's a set of COM APIs containing TTS and SR engines. I'm not sure if it contains Japanese TTS though.
What you most likely want is Microsoft Speech Server, especially if your website is going to encounter any decent load or volume.
From the site:
"A speech platform, MSS contains all
the server components for deploying
telephony (voice-only) and multimodal
(voice/visual) applications. MSS
combines Web technologies,
speech-processing services, and
telephony capabilities into a single
system. "
There is also a dedicated Microsoft Speech community which will likely help you get started in this realm. Also, I'm not sure what the latest version is... 2004 R2?
This article has a decent diagram outlining the various components. Looks like a good fit for integration with an ASP Web Application.
Using SAPI in an ASP.NET website is impossible: the sound will be reproduced on the server :S
It seems that Microsoft Speech Server is needed
...
Or not? With ASP.NET it is possible to run a command-line exe on the server to save an mp3, then stream that mp3, right? (How to do that? I will try to figure it out.)
I will go this way, I'll let you know the result :)
Edit: this is how I solved it:
How to save text-to-speech as a wav with Microsoft SAPI?
I save the generated voice in a WAV file, then I embed it on the page, playing it in a Flash player.
COOL!!
Use the Microsoft Speech Library and see the article "Text to Speech with the Microsoft Speech Library and SDK version 5.1" on CodeProject. Also see "Giving Computers a Voice" on Coding4Fun.
The System.Speech.Synthesis namespace has been part of the framework since .NET 3.0. However, it has internal dependencies on the Speech SDK COM libraries (it chooses the correct version depending on the host OS), so I would recommend prototyping the work before you jump in.
The class you should probably look at first is System.Speech.Synthesis.SpeechSynthesizer (whitepaper and example code)
Warning: I have personally experienced issues using the speech APIs in an ASP.NET environment whereby the request that returned the audio data never returned. Despite heavy debugging I was never able to resolve the issue and the feature was dropped. I have had an unresolved support case with Microsoft for 12 months now.
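For the "generate a file, then serve it" route that worked for the question author, a minimal System.Speech sketch looks roughly like this (the output path and the commented-out voice selection are placeholders; whether a Japanese voice is available depends on what is installed on the server):

```csharp
// A minimal sketch of saving synthesized speech to a WAV file with System.Speech
// (requires a reference to System.Speech.dll). The output path is a placeholder,
// and the commented-out voice name is only an example of selecting an installed
// Japanese voice, if one is present on the server.
using System.Speech.Synthesis;

class TtsToWav
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // synth.SelectVoice("<name of an installed Japanese voice>");
            synth.SetOutputToWaveFile(@"C:\temp\greeting.wav");
            synth.Speak("こんにちは");
            synth.SetOutputToNull();   // flush and release the file
        }
    }
}
```

Running this in a small console exe that the site invokes (as the question author did) rather than inside the ASP.NET worker process may also sidestep the kind of hang described in the warning above.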
