Digital Agent Not Playing All Pre-Recorded Audio Files - voice

I recently made a voice bot that uses pre-recorded audio files for its responses. While the logic and structure of my flows worked fine, there were times when the bot would stall and do one of the following, depending on the situation:
If an audio file was longer than about 20 seconds, the digital agent would not move forward in the flow. If other audio files were supposed to be played after it, the digital agent would not play them. If the flow required a response from the user, the digital agent would not recognize it. After enough time passed, the no-input response would trigger.
If I split the longer audio into several shorter files instead of one long one, the digital agent would stall just as above.
Eventually, I was able to get the digital agent working by separating the audio into as few under-20-second clips as possible and then moving some of those responses into their own flows. Strangely, sometimes I had to do the opposite: pull a response out of its own flow. It all seemed very arbitrary. The audio files were all encoded properly and stored in a Google Cloud bucket, according to the instructions in their documentation.
Is there a limit to how many audio files can be played in a row, regardless of whether they are under the total audio limit listed in Google's documentation? Also, is there a limit to how long an individual response can be?
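The workaround described above (cutting long audio into as few clips under about 20 seconds as possible) can be automated. The sketch below assumes uncompressed WAV files that Python's standard `wave` module can read; `split_wav` is a hypothetical helper, the 20-second threshold is the behavior observed here rather than a documented limit, and whether the agent accepts these clips depends on its supported audio formats.

```python
import io
import math
import wave

# Assumption: the ~20 s threshold observed above, not a documented limit.
MAX_CLIP_SECONDS = 20.0


def split_wav(data: bytes, max_seconds: float = MAX_CLIP_SECONDS) -> list[bytes]:
    """Split a WAV file into the fewest clips that are each at most max_seconds long."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_clip = int(params.framerate * max_seconds)
        total_frames = params.nframes
        n_clips = max(1, math.ceil(total_frames / frames_per_clip))
        # Spread frames evenly so no clip ends up as a tiny trailing fragment.
        base, extra = divmod(total_frames, n_clips)
        clips = []
        for i in range(n_clips):
            count = base + (1 if i < extra else 0)
            frames = src.readframes(count)
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setparams(params)  # frame count in the header is patched on close
                dst.writeframes(frames)
            clips.append(out.getvalue())
    return clips
```

Spreading the frames evenly turns a 45-second file into three 15-second clips rather than 20 + 20 + 5; each clip would still need to be uploaded to the bucket and referenced in the flow.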

Related

Best strategy for developing the back end of an app with a large user base, taking into account limits on bandwidth, concurrent connections, etc.?

I am developing an Android app which basically does this: the landing (home) page shows a couple of words, which need to be updated daily. Secondly, there is an 'experiences' tab that lists user experiences (around 500), each with a profile picture, description, etc.
The app is expected to have around 1 million daily users, each opening it at least once a day to see those couple of words. Many may occasionally open the experiences section.
Thirdly, the app needs to have a push notification feature.
I am planning to purchase managed WordPress hosting, set up a website, add a post each day with those couple of words, and use the JSON API to fetch those words and display them on the app's home page. Similarly, I will add each experience as a WordPress post and fetch them from the WordPress database. I am choosing WordPress because it has ready-made data-entry interfaces, which will save me time and effort.
But I am stuck on this: will the WordPress DB be able to handle such a large number of queries? With such a large user base and spiky traffic, I suspect I might exceed the maximum concurrent connections limit.
What's the best strategy in my case? Should I use WP, Firebase, or some other service? I also need to make sure the setup is cost-effective.
My app is basically very similar to this one:
https://play.google.com/store/apps/details?id=com.ekaum.ekaum
For push notifications, I am planning to use third party services.
Kindly suggest the best strategy for designing the back end of this app.
Thanks in advance to everyone willing to help me with this.
I have never used WordPress, so I don't know whether or how it could handle that load.
You can still use WP for data entry, and write a scheduled function that would use WP's JSON API to copy that data into Firebase.
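A minimal sketch of that scheduled copy, assuming the daily words are stored as the title of the newest WordPress post and using the WordPress REST API posts route (`/wp-json/wp/v2/posts`). `write_to_firebase` is a hypothetical hook; in practice it would wrap a Firestore or RTDB write from the Firebase Admin SDK, and the schedule itself (for example a daily scheduled Cloud Function) is not shown.

```python
import html
import json
import urllib.request
from typing import Callable


def fetch_latest_post(site_url: str) -> dict:
    """Fetch the newest published post; the posts route returns newest first by default."""
    url = f"{site_url.rstrip('/')}/wp-json/wp/v2/posts?per_page=1"
    with urllib.request.urlopen(url, timeout=10) as resp:
        posts = json.load(resp)
    if not posts:
        raise ValueError("no posts found")
    return posts[0]


def to_daily_doc(post: dict) -> dict:
    """Keep only what the home page needs (assumption: the words live in the post title)."""
    return {
        "postId": post["id"],
        "date": post["date"],
        # WordPress returns the title as rendered HTML, so decode entities like &amp;
        "words": html.unescape(post["title"]["rendered"]),
    }


def sync_daily_words(site_url: str, write_to_firebase: Callable[[dict], None]) -> dict:
    """Intended to run on a schedule: copy the latest WP post into Firebase."""
    doc = to_daily_doc(fetch_latest_post(site_url))
    write_to_firebase(doc)  # hypothetical hook, e.g. a Firestore document set()
    return doc
```

Because the app then reads only the small Firebase document, the WordPress database sees one request per sync instead of one per user.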
The RTDB-vs-Firestore scalability comparison states that RTDB can handle 200,000 concurrent connections and Firestore 1 million concurrent connections.
However, if I understand correctly, your app doesn't need to keep connections open (i.e. receive real-time updates). You can fetch your data once, then close the connection.
For RTDB, the Enabling Offline Capabilities on Android documentation states:
On Android, Firebase automatically manages connection state to reduce bandwidth and battery usage. When a client has no active listeners, no pending write or onDisconnect operations, and is not explicitly disconnected by the goOffline method, Firebase closes the connection after 60 seconds of inactivity.
So the connection should close by itself after 1 minute, if you remove your listeners, or you can force close it earlier using goOffline.
For Firestore, I don't know if it happens automatically, but you can do it manually.
According to Firebase Pricing, 100K Firestore document reads cost $0.06, so 1M reads (one per user for the two words) should cost about $0.60 plus some network traffic. RTDB is billed by the volume of data stored and downloaded rather than per read, so it requires some calculation, but it shouldn't be much. I am not familiar with the finer details of the pricing, so you should do some more research.
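The arithmetic behind that estimate, as a small helper. The price constant is the figure quoted above and may have changed; free-tier quotas and network egress are ignored.

```python
# Figure quoted above; check current Firebase pricing before relying on it.
FIRESTORE_USD_PER_100K_READS = 0.06


def daily_read_cost(users_per_day: int, reads_per_user: int = 1) -> float:
    """Estimated daily Firestore read cost in USD (ignores free tier and egress)."""
    total_reads = users_per_day * reads_per_user
    return total_reads / 100_000 * FIRESTORE_USD_PER_100K_READS


print(f"home page: ${daily_read_cost(1_000_000):.2f}/day")
print(f"experiences: ${daily_read_cost(100_000, reads_per_user=500):.2f}/day")
```

With the quoted price, one read per user per day comes to about $0.60 a day, roughly $18 over a 30-day month. The second line shows why caching matters: if a hypothetical 10% of users (100,000) opened the experiences tab and each load read all 500 experience documents, that alone would be 50 million reads, about $30 a day.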
In the app you mentioned, the experiences don't seem to change very often. You might want to build your own caching, and include version information in the daily data so the app knows when its cached experiences are out of date.
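One way to do this, sketched under assumptions: the small daily document gains a hypothetical `experiencesVersion` field that you bump whenever an experience is added or edited, and the app refetches the ~500 experiences only when that value changes. `ExperiencesCache` and `fetch_experiences` are illustrative names; the sketch keeps the cache in memory, whereas a real Android app would persist it so it survives restarts.

```python
from typing import Callable


class ExperiencesCache:
    """Refetch the experiences list only when the daily document's version changes."""

    def __init__(self, fetch_experiences: Callable[[], list]):
        self._fetch = fetch_experiences  # hypothetical: the expensive bulk read
        self._version = None
        self._loaded = False
        self._items: list = []

    def get(self, daily_doc: dict) -> list:
        # 'experiencesVersion' is a hypothetical field added to the daily data.
        version = daily_doc.get("experiencesVersion")
        if not self._loaded or version != self._version:
            self._items = self._fetch()
            self._version = version
            self._loaded = True
        return self._items
```

Every app open still reads the one daily document, but the bulk read happens only when the version moves.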
Edit:
It would possibly be more efficient and less costly to use Firebase Hosting instead of RTDB/Firestore directly. See Serve dynamic content and host microservices with Cloud Functions and Manage cache behavior.
In short, you create an HTTP function that reads your database and returns the data you need. You configure Hosting to call that function, and configure caching so that subsequent requests are served the cached result by Hosting (without extra function invocations).
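The caching part comes down to the `Cache-Control` header the function returns: Firebase Hosting's CDN uses `s-maxage` to decide how long to serve a cached copy, while `max-age` applies to the browser. Below is a minimal sketch using only Python's standard library so it can run locally; `daily_words_response` and `DailyWordsHandler` are hypothetical names, the cache durations are example values, and deploying it as a Cloud Function behind a Hosting rewrite is not shown.

```python
import json
from http.server import BaseHTTPRequestHandler

# Browser may cache for 5 minutes; the Hosting CDN may cache for 10 minutes.
CACHE_CONTROL = "public, max-age=300, s-maxage=600"


def daily_words_response(words: list[str]) -> tuple[int, dict[str, str], str]:
    """Build status, headers and JSON body for the daily-words endpoint."""
    body = json.dumps({"words": words})
    headers = {"Content-Type": "application/json", "Cache-Control": CACHE_CONTROL}
    return 200, headers, body


class DailyWordsHandler(BaseHTTPRequestHandler):
    # Placeholder data; a real function would read the database here.
    words = ["example", "words"]

    def do_GET(self):
        status, headers, body = daily_words_response(self.words)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))
```

To try it locally, import `HTTPServer` from `http.server`, run `HTTPServer(("", 8080), DailyWordsHandler).serve_forever()`, and inspect the response headers with any HTTP client.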

Throttle messaging in Firebase

We have 1M+ devices registered. Is there a way to limit how quickly messages get delivered? Obviously it's really hard to scale if 1M+ notifications arriving at the exact same time cause a massive spike of traffic to your backend. It would be great if, instead of all messages being delivered immediately to all devices, you could have it send only X messages per second.
The best way to control the delivery of those messages is to call FCM with the token IDs yourself, preferably using the batched delivery feature of the legacy API (look for the registration_ids parameter there). You can make as many calls to the API as you need to deliver your message to all devices, at whatever pace you choose.
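A sketch of that pacing in Python. `send_batch` is a hypothetical callable that would make one FCM request for a list of tokens; it is left abstract because the legacy API mentioned above has since been deprecated by Google, so check the current FCM documentation for how to send to many tokens. The batch size of 1000 matches the limit on `registration_ids`, and the loop schedules batches so the average rate stays at or below `max_per_second`.

```python
import time
from typing import Callable, Iterator

BATCH_SIZE = 1000  # registration_ids accepted at most 1000 tokens per request


def batches(tokens: list[str], size: int) -> Iterator[list[str]]:
    for start in range(0, len(tokens), size):
        yield tokens[start:start + size]


def send_throttled(
    tokens: list[str],
    send_batch: Callable[[list[str]], None],
    max_per_second: int,
    batch_size: int = BATCH_SIZE,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Send to all tokens while keeping the average rate at or below max_per_second."""
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    size = min(batch_size, max_per_second)
    interval = size / max_per_second  # seconds of rate budget each batch uses
    next_slot = clock()
    sent = 0
    for batch in batches(tokens, size):
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)
        send_batch(batch)  # hypothetical: one FCM request for this batch
        sent += len(batch)
        next_slot += interval
    return sent
```

The clock and sleep functions are injectable so the pacing can be tested without waiting; with 2,500 tokens and `max_per_second=1000`, the three batches go out at roughly 0 s, 1 s and 2 s.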
Using topics is also possible, but you lose control of the delivery performance since the fan-out happens in a process you don't control.
Alternatively, consider sending a data message that contains a timestamp indicating when it should be displayed. That way you separate the delivery time from the display time, taking delivery off the critical path (but of course introducing other considerations).
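A minimal sketch of the payload side and the client's wait calculation. The message shape follows the `message` object of the FCM HTTP v1 API (`token` plus `data`), whose data values must be strings; the `display_at` key and both helper names are hypothetical conventions your app would define. On the device, the app would schedule a local notification after the computed delay instead of showing it immediately.

```python
import time
from typing import Optional


def build_data_message(token: str, title: str, body: str, display_at: float) -> dict:
    """Data-only message; data values must be strings, so the timestamp is stringified."""
    return {
        "token": token,
        # 'display_at' is a hypothetical key the client app agrees to interpret.
        "data": {"title": title, "body": body, "display_at": str(int(display_at))},
    }


def seconds_until_display(data: dict, now: Optional[float] = None) -> float:
    """Client side: how long to wait before showing the notification (0 if already due)."""
    now = time.time() if now is None else now
    return max(0.0, float(data["display_at"]) - now)
```

Messages can then trickle out over minutes while every device still shows the notification at the intended moment.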

How often can I ping Google Calendar without getting banned?

We are writing a custom scheduling app for our website.
It has to request Google Calendar data to see when any of our 3 team members is available, and then it offers the visitor a set of available time slots.
Problem is, this takes too damn long to get the updated info.
I'm wondering if we could simply get all this data in the background and offer visitors to pick from data that is a few seconds old :)
So my question is: how often can we do this without getting banned by Google?
Here you go. The limit you're looking for depends on the type of Google account you're using.
https://developers.google.com/apps-script/guides/services/quotas
Also, you won't get banned; the script just won't run. If you're on a consumer account, you could ping it once every 18 seconds without it failing. That's as close as you can get to "Live Data".
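If availability is fetched on the server rather than per visitor, every visitor can be served the same recent snapshot. A minimal sketch, assuming a hypothetical `fetch` callable that wraps the actual Google Calendar request; `ThrottledCache` refetches at most once per interval, using the 18-second figure above as its default. At that rate the fetch runs at most 86,400 / 18 = 4,800 times a day, however many visitors arrive.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

# The consumer-account figure from the answer above; verify against current quotas.
MIN_INTERVAL_SECONDS = 18.0


class ThrottledCache(Generic[T]):
    """Serve the last fetched availability; refetch at most once per interval."""

    def __init__(self, fetch: Callable[[], T], min_interval: float = MIN_INTERVAL_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # hypothetical: calls the Google Calendar API
        self._min_interval = min_interval
        self._clock = clock
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> T:
        with self._lock:
            now = self._clock()
            if self._fetched_at is None or now - self._fetched_at >= self._min_interval:
                self._value = self._fetch()
                self._fetched_at = now
            return self._value
```

This variant refreshes lazily, so the first visitor after each interval waits for the fetch; a timer or background thread calling `get()` every interval would keep the data warm, as the question suggests.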

Can an Alexa custom skill get access to the voice stream/audio file of a user?

I would like to build a custom skill, but it would need direct access to the user's voice (or a recording of it). Can Alexa relay the audio stream rather than sending request invocations (launch/intent/session-end)?
I understand custom skills can send back MP3s as responses, but being able to access the actual voice requests, either as a stream or an MP3, would be awesome.
Edit:
It seems that there is not a provided mp3 in the request object: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interface-reference#LaunchRequest
Alexa does not provide this service.
Having an always-on device in a domestic setting, that can hear everything said, plus background noise, and side conversations, is a huge security concern. Amazon mitigates this concern by filtering the input, performing the difficult Speech-to-text work, and only providing the resulting text. (After further processing by your interaction model.)
In short, no. I can't find this stated explicitly anywhere in the documentation, but I just created a Python library that encapsulates all the JSON structures, so I know you can't do this yet.
The only control over audio is 'output' through embedding links in SSML.
https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/handling-requests-sent-by-alexa#Including%20Pre-Recorded%20Audio%20in%20your%20Response
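For reference, a minimal sketch of such a response, built as the JSON a custom skill returns. The `audio_response` helper is hypothetical; the `outputSpeech` object with type `SSML` and the `<audio src=...>` tag are part of the Alexa response format, and Amazon's documentation lists further requirements for the MP3 itself (such as being served over HTTPS and using specific bit and sample rates).

```python
from xml.sax.saxutils import quoteattr


def audio_response(mp3_url: str, end_session: bool = True) -> dict:
    """Build a custom-skill response that plays a pre-recorded MP3 via SSML <audio>."""
    # quoteattr wraps the URL in quotes and escapes characters such as & for XML.
    ssml = f"<speak><audio src={quoteattr(mp3_url)}/></speak>"
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }
```

Audio flows only outward here: the skill supplies a URL to play, while incoming speech still reaches the skill as text-derived intents.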

Sending notifications according to database value changes

I am working on a vendor portal. A shop owner will log in, and in the navigation bar (similar to Facebook) I would like the number of items sold to appear INSTANTLY, WITHOUT ANY REFRESH. On Facebook, new notifications pop up immediately. I am using SQL Azure as my database. Is it possible to detect a change in the database and INSTANTLY INFORM the user?
Part 2 of my project will be a mobile phone app for the vendor. In this app, too, I would like the same notification mechanism. In this case, would I be right to look into push notifications?
At the moment my main aim is to solve the problem in the first paragraph. I am able to retrieve the number of notifications, but how on earth can I show the changes INSTANTLY? Thank you very much.
First you need to define what INSTANT means to you. For some, it means within a second 90% of the time. For others, they would be happy to have a 10-20 second gap on average. And more importantly, you need to understand the implications of your requirements; in other words, is it worth it to have near zero wait time for your business? The more relaxed your requirements, the cheaper it will be to build and the easier it will be to maintain.
You should know that near-real-time notification can be very expensive in terms of computing and locking resources. The more often you refresh, the more web round trips are needed (even if each one is small in this case). Keeping data fresh to the second can also be costly for the database, because you are potentially creating a high volume of requests, which in turn could slow down otherwise well-performing requests. For example, if 1,000 users are logged on to your website, you may need 1,000 database requests per second (assuming that's your definition of INSTANT), which could in turn trigger throttling in SQL Azure if the system is not designed properly.
An approach I used in the past for a similar requirement (although the precision wasn't to the second; more like to the minute) was to load all records from a table into the website's local in-memory cache. A background thread locked and refreshed the in-memory data for all records in one shot. This reduced database traffic by a factor of a thousand, since the data presented on screen came from the local cache and only a single database connection per web server was needed to refresh it. Because we had multiple web servers, and we needed the data to be exactly the same on all of them within a second of each other, we synchronized all the web servers to refresh their caches every minute. Putting this together took many hours, but it allowed us to build a highly scalable system.
The above technique may not work for your requirements, but my point is that the higher the need for fresh data, the more design/engineering work you will need to make sure your system isn't too impacted by the freshness requirement.
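A rough sketch of the single-server part of that pattern, assuming a hypothetical `load_all` callable that runs one query returning items sold per vendor. It differs slightly from the description above: instead of holding a lock while the data is refreshed, it runs the query first and then swaps in the new snapshot under a short lock, so page requests never wait on the database. The cross-server synchronization is not shown.

```python
import logging
import threading
from typing import Callable, Dict


class SalesCountCache:
    """In-memory snapshot of per-vendor counts, refreshed by one background thread."""

    def __init__(self, load_all: Callable[[], Dict[str, int]], interval_seconds: float = 60.0):
        self._load_all = load_all  # hypothetical: one query returning {vendor_id: items_sold}
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._snapshot: Dict[str, int] = {}
        self._stop = threading.Event()

    def refresh(self) -> None:
        data = self._load_all()  # query outside the lock so readers never wait on the DB
        with self._lock:
            self._snapshot = data

    def get(self, vendor_id: str) -> int:
        # Page requests read only the snapshot; they never touch the database.
        with self._lock:
            return self._snapshot.get(vendor_id, 0)

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self.refresh()
            except Exception:
                logging.exception("cache refresh failed; keeping previous snapshot")
            self._stop.wait(self._interval)

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

However many users are logged on, each web server issues one query per interval; the page can then poll its own server cheaply for the cached count.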
Hope this helps.
