I have an existing web app with search functionality. I want to implement voice search with Google's Speech-to-Text API. Currently, I'm using WebRTC and a WebSocket to receive the audio stream from the browser and send it to the server, but it seems too slow. My biggest concerns are speed and calling the API as few times as possible. Which option would be better in terms of user experience, and how do other companies implement voice search in the real world?
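For reference, a minimal server-side sketch of the streaming approach, assuming the browser sends raw LINEAR16 PCM chunks over the WebSocket (the `ws` package and the official `@google-cloud/speech` client are assumptions here, not part of the original setup):

```js
// Sketch: pipe browser audio from a WebSocket into a single Speech-to-Text
// streaming request, so the API is opened once per connection rather than
// called once per chunk. Assumes LINEAR16 PCM at 16 kHz from the client.
const WebSocket = require('ws');
const speech = require('@google-cloud/speech');

const client = new speech.SpeechClient();
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  // One long-lived streaming request per connection.
  const recognizeStream = client
    .streamingRecognize({
      config: {
        encoding: 'LINEAR16',
        sampleRateHertz: 16000,
        languageCode: 'en-US',
      },
      interimResults: true, // partial transcripts keep perceived latency low
    })
    .on('data', (data) => {
      const result = data.results[0];
      if (result) {
        // Send interim and final transcripts straight back to the browser.
        ws.send(JSON.stringify({
          transcript: result.alternatives[0].transcript,
          isFinal: result.isFinal,
        }));
      }
    })
    .on('error', console.error);

  ws.on('message', (chunk) => recognizeStream.write(chunk));
  ws.on('close', () => recognizeStream.end());
});
```

With `interimResults` on, the user sees partial text while still speaking, which usually matters more for perceived speed than the raw round-trip time.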
Related
I purchased a Nest Camera (Wired, Indoor) today and I want to stream it to YouTube so I can embed a live feed into my website and share the stream with my friends, without having to add them to the Google Home application and give them access to all my devices.
I found the API documentation on Google's developer site (https://developers.google.com/nest/device-access/api/camera-wired#extend_a_live_stream), but I do not understand how this works or how to complete it.
In my opinion, there should be a built-in feature to stream to a platform such as YouTube; it would make things a lot easier for many people.
So far I have found nothing explaining how to do this apart from the API documentation itself.
There is no simple "out of the box" way to do this. You would need to set up some kind of device as an intermediary. On one side it would connect to the SDM API, open a stream, and start receiving the data. On the other side it would connect to the YouTube API (or equivalent) and pass the data through. Unfortunately, you would need some degree of programming skill to engineer such a system.
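To make the shape of that intermediary concrete, here is a rough Node sketch (Node 18+ for the global fetch; PROJECT_ID, DEVICE_ID, ACCESS_TOKEN and YOUTUBE_STREAM_KEY are placeholders). It assumes an RTSP-capable camera plus ffmpeg and a YouTube stream key; note that the newer wired cameras expose WebRTC rather than RTSP, which makes the relay considerably harder, so treat this as the simplest possible shape rather than a recipe for that exact model:

```js
// Rough sketch of the intermediary: ask the SDM API for a live stream URL,
// then hand it to ffmpeg to relay into YouTube's RTMP ingest.
const { spawn } = require('child_process');

async function relayToYouTube() {
  // 1. Generate a stream URL via the SDM executeCommand endpoint.
  const res = await fetch(
    `https://smartdevicemanagement.googleapis.com/v1/enterprises/${process.env.PROJECT_ID}/devices/${process.env.DEVICE_ID}:executeCommand`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.ACCESS_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        command: 'sdm.devices.commands.CameraLiveStream.GenerateRtspStream',
        params: {},
      }),
    }
  );
  const { results } = await res.json();

  // 2. Pass the stream through to YouTube's RTMP ingest with ffmpeg.
  //    SDM stream tokens expire after a few minutes, so a real relay must
  //    also call the ExtendRtspStream command before the token runs out.
  spawn('ffmpeg', [
    '-i', results.streamUrls.rtspUrl,
    '-c:v', 'copy',  // pass video through untouched
    '-c:a', 'aac',   // YouTube's ingest expects AAC audio
    '-f', 'flv',
    `rtmp://a.rtmp.youtube.com/live2/${process.env.YOUTUBE_STREAM_KEY}`,
  ], { stdio: 'inherit' });
}

relayToYouTube().catch(console.error);
```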
I'm setting up a server using Firebase and want to load tweets dynamically/live from Twitter, filtering them by hashtag.
Basically, what I want to do is integrate a live feed of sorts from Twitter to load tweets about Bitcoin and other cryptocurrencies on my webpage. I'm using Vue CLI for the front end.
I've done extensive research on Twitter, signed up for a developer account, and made many attempts, but without any luck. I am really stuck, as there does not seem to be any way to fetch tweets and then display them live on my front end.
I actually do not have any code to show, as I do not even understand how it would be possible.
I've set up the backend successfully on Firebase and have no issues with CRUD operations, authentication, etc. What I need is to dynamically load (live) tweets from Twitter and then filter them by hashtag.
I haven't even been able to establish whether this is possible, so I haven't received any error messages. It seems to me that you can only let users sign in and then let them post tweets through an integrated API.
You have to open a Twitter stream in your backend and send the results to the frontend using sockets if you want it updated in real time; a minimal sketch follows the link below.
You can check this out to get you started:
Running a Node.JS background process for Twitter streaming API in Firebase
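As a rough illustration of that shape (the `twit` and `socket.io` packages and the v1.1 statuses/filter endpoint are assumptions; see the link above for the Firebase-specific caveats about long-running processes):

```js
// Sketch: open one filtered Twitter stream server-side and fan matching
// tweets out to every connected browser over Socket.IO.
const Twit = require('twit');
const { Server } = require('socket.io');

const io = new Server(3000, { cors: { origin: '*' } });

const T = new Twit({
  consumer_key: process.env.TWITTER_CONSUMER_KEY,
  consumer_secret: process.env.TWITTER_CONSUMER_SECRET,
  access_token: process.env.TWITTER_ACCESS_TOKEN,
  access_token_secret: process.env.TWITTER_ACCESS_TOKEN_SECRET,
});

// One stream for everyone, filtered by the hashtags you care about.
const stream = T.stream('statuses/filter', { track: '#bitcoin,#ethereum' });

stream.on('tweet', (tweet) => {
  // Push each matching tweet to all connected clients.
  io.emit('tweet', { user: tweet.user.screen_name, text: tweet.text });
});
```

On the Vue side, the client just connects with `socket.io-client` and listens for the `tweet` event.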
Adding to what Haris said, another alternative would be to use SSE (Server-Sent Events), since you only need one-way (server-to-client) communication rather than two-way communication. The exact setup, however, depends on the backend framework you are using; a minimal Express sketch follows the list below.
Feel free to consult whichever of the following links suits your framework on how to use SSE:
Server-Sent Events with Node
Real-Time Web Apps with Server-Sent Events (Express JS)
Server-Sent Events with Fastify (fastify-sse)
Server-Sent Events with Hapi (SuSiE)
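Here is the minimal Express sketch mentioned above (Express is an assumption; adapt the same pattern to whichever framework you use). The browser consumes the endpoint with a plain `EventSource`, which also reconnects automatically:

```js
// Sketch: an SSE endpoint that holds connections open and broadcasts
// tweets to every subscriber as they arrive.
const express = require('express');
const app = express();

const clients = new Set();

app.get('/tweets/stream', (req, res) => {
  // SSE handshake: keep the connection open, no caching or buffering.
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  res.flushHeaders();

  clients.add(res);
  req.on('close', () => clients.delete(res));
});

// Call this from wherever your Twitter stream delivers tweets.
function broadcastTweet(tweet) {
  for (const res of clients) {
    res.write(`data: ${JSON.stringify(tweet)}\n\n`);
  }
}

app.listen(3000);
```

In the browser: `new EventSource('/tweets/stream').onmessage = (e) => console.log(JSON.parse(e.data));`.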
After the user invokes an app on Alexa, is there a way to get the query as a voice stream/audio file? Through Alexa, I want to send the stream to a web service/Lambda that the invoked app will call, and analyze the intent there.
We have some proprietary code that we want to use for analyzing intent, hence we can't do it on the Alexa side.
Since I am sending the query after the user has invoked the app, and through the app, there are no privacy concerns (hopefully).
Thanks
No, that is not possible, and I don't think it will be.
Echo devices connect only to Amazon, and Amazon uses Lex (which is also available via AWS) to parse speech. As a skill developer, you will only receive the parsed results: intent, slots - and maybe, once Amazon implements user differentiation, an anonymous ID for the speaker.
There is no way to access the original speech audio in your skill. As every recording is also used by Amazon to train their speech recognition, I doubt they will open up their ecosystem accordingly.
The only option I see currently: build your own Echo with e.g. a Raspberry Pi; then you have full control. But you can't leverage the installed base of Echo devices.
Same applies to Google Home and Microsoft Cortana, so it's not just Amazon.
I would like to build a custom skill, but it would need direct access to the user's voice (or the output as recorded audio). Can/will Alexa relay the stream, rather than just sending the request invocations (launch/intent/session-end)?
I understand custom skills can send back MP3s as responses, but being able to access the actual voice requests, either as a stream or as an MP3, would be awesome.
Edit:
It seems that there is no MP3 provided in the request object: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interface-reference#LaunchRequest
Alexa does not provide this service.
Having an always-on device in a domestic setting that can hear everything said, plus background noise and side conversations, is a huge security concern. Amazon mitigates this concern by filtering the input, performing the difficult speech-to-text work, and providing only the resulting text (after further processing by your interaction model).
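For illustration, this is roughly the shape of what a custom skill actually receives - an excerpt with made-up values; note there is no audio field anywhere in the request:

```json
{
  "request": {
    "type": "IntentRequest",
    "requestId": "amzn1.echo-api.request.example",
    "intent": {
      "name": "SearchIntent",
      "slots": {
        "Query": { "name": "Query", "value": "weather in berlin" }
      }
    }
  }
}
```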
In short, no. I can't find this stated anywhere specific in the documentation, but I just created a Python library that encapsulates all the JSON structures, so I know you can't do this yet.
The only control over audio is on 'output', by embedding links in SSML (see the example after the link below).
https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/handling-requests-sent-by-alexa#Including%20Pre-Recorded%20Audio%20in%20your%20Response
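For completeness, that output-side control looks like this in a response (the MP3 URL is a placeholder; per the docs above, it must be an HTTPS-hosted MP3 meeting Alexa's audio specs):

```xml
<speak>
    Here is the recording you asked for.
    <audio src="https://example.com/recordings/clip.mp3" />
</speak>
```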
There is an application that should read tweets for every registered user, process them, and store the data for future use.
It can reach Twitter in two ways: either via the REST API (polling Twitter every x minutes), or via its Streaming API to have tweets delivered as they happen.
Beyond the completely different server-side implementations, what other server-side impacts are there?
Say the application has thousands of users. Is it better to build some kind of queue and poll Twitter for each user (the simplest scenario), or is it better to use the Streaming API and keep an HTTP connection open for each user? I'm a bit worried about the latter, as it would require keeping thousands of connections open all the time. Are there any drawbacks to that I'm not aware of? If I deployed my app on Heroku or on an EC2 instance, would that be OK, or are there limits?
How is this done in other apps that constantly need to fetch data for each user?
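One detail worth knowing before choosing: you do not need one streaming connection per user. A single statuses/filter connection accepts a `follow` list of up to 5,000 user IDs, so thousands of users fit on a handful of shared streams. A hedged sketch (the `twit` package is an assumption):

```js
// Sketch: one shared streaming connection tracking many users at once,
// instead of one connection (or one poll loop) per registered user.
const Twit = require('twit');

const T = new Twit({
  consumer_key: process.env.TWITTER_CONSUMER_KEY,
  consumer_secret: process.env.TWITTER_CONSUMER_SECRET,
  access_token: process.env.TWITTER_ACCESS_TOKEN,
  access_token_secret: process.env.TWITTER_ACCESS_TOKEN_SECRET,
});

// statuses/filter accepts up to 5,000 numeric user IDs per connection.
const userIds = ['12345', '67890']; // loaded from your user table
const stream = T.stream('statuses/filter', { follow: userIds.join(',') });

stream.on('tweet', (tweet) => {
  // Route the tweet to whichever registered users track this author,
  // then persist it for later processing.
  console.log(tweet.user.id_str, tweet.text);
});
```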