I am looking at using the Autocomplete API from HERE Maps, specifically the Suggestion.json endpoint. As a user keys in characters, the autocomplete API will be called on every key press, which will turn out to be quite expensive: if I type in "London", it will call the API 6 times. Is there a better way to do this? Also, is there an option to create a session token so that I get charged only once per session token, no matter how many characters I key in to generate the suggestion list?
There are a few things that you can do to reduce the number of calls:
Some places have very short names (see https://en.wikipedia.org/wiki/List_of_short_place_names), so you might have to support autosuggest even for single-character input. You may therefore consider building a cache of autosuggest keywords: say a user types L; you make one call to the autosuggest API and cache the result with L as the key. Repeat this for each subsequent key press, building up the data structure or data store to suit your requirements, so that the number of hits to the API gradually decreases.
Once you have built your cache, you can decide to refresh it after every 10 calls or so. This will greatly reduce your calls to the external API.
Look up the trie data structure; it might be helpful.
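A minimal sketch of that prefix cache built on a trie. It assumes a hypothetical `fetch` function standing in for the real Suggestion.json request (it is not HERE's client API); each prefix that has been queried once is answered from memory afterwards.

```python
from typing import Callable, Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Suggestions cached for the prefix ending at this node (None = never fetched).
        self.suggestions: Optional[List[str]] = None


class SuggestionCache:
    """Prefix trie that remembers the API response for every prefix already queried."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self.root = TrieNode()
        self.fetch = fetch      # hypothetical wrapper around the Suggestion.json request
        self.api_calls = 0

    def suggest(self, prefix: str) -> List[str]:
        key = prefix.lower()    # "L" and "l" share one cache entry
        node = self.root
        for ch in key:          # walk (and create) one node per character
            node = node.children.setdefault(ch, TrieNode())
        if node.suggestions is None:
            self.api_calls += 1                 # cache miss: exactly one API call
            node.suggestions = self.fetch(key)
        return node.suggestions


# Demo with a fake fetch function instead of the real API.
PLACES = ["London", "Londonderry", "Los Angeles", "Lyon"]


def fake_fetch(prefix: str) -> List[str]:
    return [p for p in PLACES if p.lower().startswith(prefix)]


cache = SuggestionCache(fake_fetch)
for user in range(2):                   # two users type "London" one key at a time
    for i in range(1, len("London") + 1):
        cache.suggest("London"[:i])
print(cache.api_calls)  # 6 (the second user is served entirely from the cache)
```

A real cache would also need an expiry or periodic refresh, as suggested above; that part is left out of the sketch.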
HERE bills you based on the number of requests you make to the backend services, so there is no option to be billed per session token. You can, however, talk to your account executive to see whether you can negotiate the cost of the offering.
You can do this pretty easily with the JavaScript API: here's an example to get started, and here's the documentation page for the autosuggest JavaScript API.
We have a table with 100M rows in Google Cloud Datastore. What is the most efficient way to check for the existence of a large number of keys (500K-1M)?
For context, a use case could be that we have a big content datastore (think of all web pages in a domain). This datastore contains pre-crawled content and metadata for each document. Each document, however, could be liked by many users. Now when a new user says they like documents {a1, a2, ..., an}, we want to tell whether all of these documents ak (k = 1..n) have already been crawled. That's why we want to do the lookup mentioned above. If there is a subset of documents that we don't have yet, we would start crawling them immediately. The ultimate goal is to retrieve the content of all these documents and use it to build the user profile.
My current thought is to issue a bunch of batch lookup requests. Each lookup request can contain up to 1,000 keys [1]. However, to check the existence of every key in a set of 1M, I would still need to issue 1,000 requests.
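The batching arithmetic as a sketch: with at most 1,000 keys per lookup [1], checking 1,000,000 keys takes 1,000,000 / 1,000 = 1,000 requests. `lookup_batch` is a hypothetical stand-in for the real Datastore call (with the Python client this would be something like `client.get_multi(keys, missing=...)`), replaced here by an in-memory set so the example runs without credentials.

```python
from typing import Callable, Iterator, List, Sequence, Set, Tuple

MAX_KEYS_PER_LOOKUP = 1000  # Datastore lookup limit cited in [1]


def chunks(keys: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def find_missing(keys: Sequence[str],
                 lookup_batch: Callable[[Sequence[str]], Set[str]]) -> Tuple[List[str], int]:
    """Return (keys that do not exist, number of lookup requests issued)."""
    missing: List[str] = []
    requests = 0
    for batch in chunks(keys, MAX_KEYS_PER_LOOKUP):
        requests += 1
        found = lookup_batch(batch)  # hypothetical: returns the subset of keys that exist
        missing.extend(k for k in batch if k not in found)
    return missing, requests


# Demo: an in-memory stand-in where every even-numbered document is already crawled.
crawled = {f"doc{i}" for i in range(0, 1_000_000, 2)}
wanted = [f"doc{i}" for i in range(1_000_000)]
missing, requests = find_missing(wanted, lambda batch: {k for k in batch if k in crawled})
print(requests, len(missing))  # 1000 500000
```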
An alternative is to use a custom middle layer that provides a quick lookup (for example, using a Bloom filter or something similar) to save the time spent on multiple requests. Assuming we never delete keys, every time we insert a key we add it through the middle layer. The Bloom filter keeps track of which keys we have (with a tolerable false-positive rate). Since this is a custom layer, we could provide a microservice without a limit; say, one that could respond to a request asking about the existence of 1M keys. However, this definitely increases our design and implementation complexity.
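To make the middle-layer idea concrete, here is a minimal Bloom filter under simple assumptions: standard sizing formulas, and k bit positions derived from one SHA-256 digest by double hashing. A real service would also need persistence and concurrency control. `might_contain` returning `False` means the key was definitely never added; `True` means "probably added", at the configured false-positive rate.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: k positions from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter(expected_items=100_000, fp_rate=0.01)
for i in range(100_000):
    bf.add(f"https://example.com/page/{i}")

print(bf.might_contain("https://example.com/page/42"))  # True: no false negatives
false_positives = sum(bf.might_contain(f"https://example.com/other/{i}") for i in range(10_000))
print(false_positives / 10_000)  # should be close to the configured 0.01
```

For the 100M keys in the question at a 1% false-positive rate, the same formula gives about 9.6 bits per key, i.e. roughly 120 MB of memory for the filter.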
Are there more efficient ways to do this? Maybe a better design? Thanks!
[1] https://cloud.google.com/datastore/docs/concepts/limits
I'd suggest breaking the problem down into a more scalable (and less costly) approach.
In the use case you mentioned, you can deal with one document at a time, each document having a corresponding entity in the datastore.
The webpage URL uniquely identifies the page, so you can use it to generate a unique key/identifier for the respective entity. With a single key lookup (strongly consistent) you can then determine if the entity exists or not, i.e. if the webpage has already been considered for crawling. If it hasn't then a new entity is created and a crawling job is launched for it.
The length of the entity key can be an issue; see "How long (max characters) can a datastore entity key_name be? Is it bad to haver very long key_names?". To avoid this, you can store the URL as a property of the webpage entity instead. You'll then have to query for the entity by the url property to determine whether the webpage has already been considered for crawling. This query is only eventually consistent, meaning it may take a while from when the document entity is created (and its crawling job launched) until it appears in the query results. That's not a big deal: it can be addressed with a bit of logic in the crawling job to prevent and/or remove duplicate documents.
I'd keep the "like" information as small entities mapping a document to a user, separate from the document and user entities, to avoid the drawbacks of maintaining possibly very long lists in a single entity; see Manage nested list of entities within entities in Google Cloud Datastore and Creating your own activity logging in GAE/P.
When a user likes a webpage with a particular URL, you just have to check whether the matching document entity exists:
1. If it does, just create the like mapping entity.
2. If it doesn't, and you used the above-mentioned unique key identifiers:
   a. create the document entity and launch its crawling job;
   b. create the like mapping entity.
3. Otherwise (it doesn't exist and you store the URL as a property):
   a. launch the crawling job, which creates the document entity and takes care of deduplication;
   b. launch a delayed job to create the mapping entity later, when the (unique) document entity becomes available, possibly chained off the crawling job. Some retry logic may be needed.
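A sketch of cases 1 and 2 above (the unique-key variant, with the URL itself as the key). In-memory containers stand in for the Datastore kinds and the crawl queue; `documents`, `likes`, `crawl_queue`, `handle_like` and `has_liked` are hypothetical names, not part of any Google API. The eventually consistent URL-property variant with delayed jobs is not modeled.

```python
from typing import Dict, List, Set, Tuple

# In-memory stand-ins for the Datastore kinds and the crawl queue (hypothetical names).
documents: Dict[str, dict] = {}      # document entities, keyed by URL (unique-key variant)
likes: Set[Tuple[str, str]] = set()  # like mapping entities: (user_id, url)
crawl_queue: List[str] = []          # URLs whose crawling job has been launched


def handle_like(user_id: str, url: str) -> None:
    if url not in documents:              # single key lookup
        # Unknown page: create the document entity and launch its crawling job.
        documents[url] = {"url": url, "crawled": False}
        crawl_queue.append(url)
    # Either way, record the like as its own small mapping entity.
    likes.add((user_id, url))


def has_liked(user_id: str, url: str) -> bool:
    # Equivalent of querying for one mapping entity.
    return (user_id, url) in likes


handle_like("alice", "https://example.com/a")
handle_like("bob", "https://example.com/a")   # already known: no second crawl
handle_like("bob", "https://example.com/b")
print(crawl_queue)  # ['https://example.com/a', 'https://example.com/b']
print(has_liked("alice", "https://example.com/b"))  # False
```

Because each like is its own small record, neither the user nor the document entity ever has to hold a growing list.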
Checking whether a user liked a particular document becomes a simple query for one such mapping entity (with a bit of care, as it's also eventually consistent).
With such a scheme in place you no longer have to make those massive lookups; you only do one at a time. That's OK: a user liking documents one at a time is IMHO more natural than providing a large list of liked documents.
I'm wondering whether it's possible to fetch disaggregated (session-level) data from Google Analytics using its APIs.
I'm already able to get quite a detailed segmentation by selecting ga:source, ga:dateHourMinute, ga:country and others, but of course these are still groups of sessions.
Thanks a lot!
Not by default: there are no session dimensions in the API, and not even the client ID is exposed via the API.
An easy way to obtain a session marker is to store a random number in a session-scoped custom dimension. Since a session-scoped dimension by definition stores only the last value in the session, this gives you a unique value per session (well, not technically unique, but unique enough). It can be used in conjunction with the client ID, which you'd need to store in another custom dimension.
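Once both custom dimensions are populated, they can be requested like any other dimension. Below is a sketch of a Reporting API v4 `batchGet` request body built in Python. The view ID and the dimension indexes (`ga:dimension1` for the random session marker, `ga:dimension2` for the client ID) are assumptions that depend on how the custom dimensions were set up in your property.

```python
import json

VIEW_ID = "123456789"  # hypothetical view ID

request_body = {
    "reportRequests": [
        {
            "viewId": VIEW_ID,
            "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
            "dimensions": [
                {"name": "ga:dimension1"},      # assumed: random session marker (session scope)
                {"name": "ga:dimension2"},      # assumed: client ID stored in a custom dimension
                {"name": "ga:source"},
                {"name": "ga:dateHourMinute"},
            ],
            "metrics": [{"expression": "ga:pageviews"}],
            "pageSize": 10000,                  # one row per session, so expect many pages
        }
    ]
}

print(json.dumps(request_body, indent=2))
```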
Of course, since this will give you a lot of single-session rows, you will run into API limits pretty soon.
In a GA360 account you could use BigQuery - the BQ export schema includes session identifiers.
I am setting up OAuth with Firebase for a Google Actions app.
I chose the Authorization Code Flow and I am following the steps in the docs here:
https://developers.google.com/actions/identity/oauth2-code-flow
In step 4 of "Handle user sign-in", there are two ways to create an authorization code.
I prefer the one that uses JSON to store the expiration date, which saves a database call in the next step.
Now, I would like to store all the authorization codes generated, and I am not sure what the best way to do so is. My auth codes are very long (170 characters), and I am not sure whether using them as keys in Firebase is a good idea.
Here is what my DB looks like:
I thought about using a hash to shorten them, but I am worried that hashes might not be unique.
What would be the cleanest way to store auth codes in Firebase?
Thanks!
Keys can be up to 768 bytes (UTF-8 encoded), so using the auth code as a key makes perfect sense.
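Before using a code directly as a key, it's worth checking it against the Realtime Database key rules: UTF-8, at most 768 bytes, none of `.`, `$`, `#`, `[`, `]`, `/`, and no ASCII control characters 0-31 or 127. A 170-character code is well within the length limit, but a code in standard base64 can contain `/`, which would be read as a path separator. A minimal check, assuming the code is held as a Python string:

```python
FORBIDDEN = set(".$#[]/")
MAX_KEY_BYTES = 768


def is_valid_rtdb_key(key: str) -> bool:
    """Check a candidate Realtime Database key against the documented key restrictions."""
    if not key or len(key.encode("utf-8")) > MAX_KEY_BYTES:
        return False
    for ch in key:
        if ch in FORBIDDEN or ord(ch) < 32 or ord(ch) == 127:
            return False
    return True


print(is_valid_rtdb_key("a" * 170))       # True
print(is_valid_rtdb_key("abc/def+ghi="))  # False: '/' is a path separator
```

If the generated codes can contain forbidden characters, URL-safe base64 (which uses `-` and `_` instead of `+` and `/`) avoids the `/` problem.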
Using a hash is reasonable, since a good hash has a very low chance of collision, but it doesn't provide much additional value in your case and will (slightly) increase computation time and program complexity.
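If you do go the hashing route, a SHA-256 hex digest gives a fixed 64-character key made only of `[0-9a-f]`, so it is always a legal key; with a 256-bit hash, accidental collisions are not a practical concern. A minimal sketch using Python's standard `hashlib` (the function name is hypothetical):

```python
import hashlib


def auth_code_key(auth_code: str) -> str:
    """Derive a fixed-length, key-safe identifier from an authorization code."""
    return hashlib.sha256(auth_code.encode("utf-8")).hexdigest()


key = auth_code_key("x" * 170)  # hypothetical 170-character code
print(len(key))  # 64
```

Since the hash is deterministic, looking a code up later just means hashing the incoming code again and reading that key.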
In the Firebase example (https://gist.github.com/anantn/4323981), to add a user to the game we attach the transaction method to playerListRef. Every time Firebase attempts to update the data, it calls the callback passed to the transaction method with the list of user IDs of all players. If my game lets thousands of users join at a time, then every time this method executes the entire user list will be downloaded and passed to the callback, which would be bad.
If this is true, what is the recommended way to assign users then?
This is specifically what Firebase was designed to handle. If your application needs to actually assign player numbers, this example is the way to go. Otherwise, if the players just need to be in the same "game" or "room" without any notion of ordering, you could remove the transaction code to speed things up a bit. The snippet as well as the backend have handled the number of concurrent connections you've mentioned. If you're seeing any specific problems with your code, or behavior with Firebase that appears to be a bug, please contact us at support@firebase.com and we can dig into it.
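For reference, the kind of logic a transaction update runs when assigning player numbers, written as a plain Python function so it can be tested without Firebase. This is a sketch along the lines of the gist, not its actual code or the Firebase SDK: the update receives the current player list (or `None` if nothing is stored yet), claims the first free slot, and returns `None` to abort without writing. `NUM_PLAYERS`, `claim_slot`, `VersionedValue` and `run_transaction` are hypothetical; the compare-and-set loop stands in for the retry Firebase performs when another client changed the data first.

```python
from typing import List, Optional

NUM_PLAYERS = 4  # hypothetical room size


def claim_slot(players: Optional[List[str]], user_id: str) -> Optional[List[str]]:
    """Transaction update: return the new list, or None to abort without writing."""
    players = list(players or [])  # work on a copy, since the update may be retried
    if user_id in players:
        return None                # already in the game: nothing to write
    if len(players) >= NUM_PLAYERS:
        return None                # game is full
    players.append(user_id)        # player number = index in the list
    return players


class VersionedValue:
    """Stand-in for a server value with compare-and-set semantics."""

    def __init__(self) -> None:
        self.value: Optional[List[str]] = None
        self.version = 0

    def compare_and_set(self, expected_version: int, new_value: List[str]) -> bool:
        if self.version != expected_version:
            return False
        self.value, self.version = new_value, self.version + 1
        return True


def run_transaction(store: VersionedValue, user_id: str) -> Optional[int]:
    """Retry the update until it commits; return the assigned player number or None."""
    while True:
        seen_version, seen_value = store.version, store.value
        updated = claim_slot(seen_value, user_id)
        if updated is None:
            if seen_value and user_id in seen_value:
                return seen_value.index(user_id)
            return None
        if store.compare_and_set(seen_version, updated):
            return updated.index(user_id)


room = VersionedValue()
print([run_transaction(room, u) for u in ["ann", "ben", "cid", "dee", "eve"]])
# [0, 1, 2, 3, None]
```

As the answer notes, if players don't need numbers, the transaction (and the read of the whole list it implies) can be dropped.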
I'm writing a simple WordPress plugin for work and am wondering whether using the Transients API is practical in this case, or whether I should look for another way.
The plugin's purpose is simple. I'm making a call to USZip Web Service (http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP) to retrieve data. Our sales team is using a Lead Intake sheet that the plugin will run on.
I wanted to reduce the number of API calls, so I thought of setting a transient for each zip code, with the zip code as the key, to store the incoming data (city and zip). If the data for a given zip code already exists, there's no need to make an API call.
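The per-zip-code transient idea is a cache-aside pattern with an expiry. The sketch below shows that logic in Python rather than PHP, so it illustrates the pattern, not WordPress's `get_transient`/`set_transient` API: a dict plays the role of the options table, the `lookup_zip` callable is a hypothetical stand-in for the USZip web service call, and the TTL is the 3 months from the question, approximated as 90 days.

```python
import time
from typing import Callable, Dict, Tuple

THREE_MONTHS = 90 * 24 * 60 * 60  # approximate TTL in seconds


class ZipCache:
    def __init__(self, lookup_zip: Callable[[str], dict],
                 ttl: float = THREE_MONTHS,
                 clock: Callable[[], float] = time.time) -> None:
        self._entries: Dict[str, Tuple[float, dict]] = {}  # zip -> (expires_at, data)
        self._lookup = lookup_zip
        self._ttl = ttl
        self._clock = clock
        self.api_calls = 0

    def get(self, zip_code: str) -> dict:
        now = self._clock()
        entry = self._entries.get(zip_code)
        if entry is not None and entry[0] > now:
            return entry[1]                      # fresh cached value: no API call
        self.api_calls += 1
        data = self._lookup(zip_code)            # miss or expired: call the web service
        self._entries[zip_code] = (now + self._ttl, data)
        return data


fake_now = [0.0]                                 # controllable clock for the demo
zips = ZipCache(lambda z: {"zip": z, "city": "Example City"}, clock=lambda: fake_now[0])
zips.get("10001")
zips.get("10001")
print(zips.api_calls)  # 1
fake_now[0] = THREE_MONTHS + 1                   # jump past the expiry
zips.get("10001")
print(zips.api_calls)  # 2
```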
Here are my concerns:
1. After a quick search, I realized that transient data is stored in the wp_options table, and storing this data would balloon that table in no time. Would this cause a significant performance issue if the DB becomes huge?
2. Is it bad practice to create this many transient keys? It could easily become thousands within a few months.
If using Transient is not the best way, could you please help point me in the right direction? Thanks!
P.S. I opted for the Transients API over the Options API. I know zip codes don't change often, but they sometimes do, so I set an expiration time of 3 months.
A less-inflated solution would be:
1. Store a single option called uszip with a serialized array inside it.
2. Grab the entire array each time and simply check whether the zip code exists.
3. If it doesn't exist, fetch the data and save the whole array back to the option.
You should make sure you don't hit the upper bound of a serialized array in this table (9,000 elements), considering that 43,000 zip codes exist in the US. However, you will most likely only need a fairly localized subset of zip codes.
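The single-option variant, again sketched in Python for illustration: one serialized blob (JSON here, where WordPress would serialize a PHP array) holds all cached zip codes; it is loaded whole, checked, and written back whole only when a new zip code is added. `options`, `load_option`, `save_option`, `get_zip_info` and `fake_fetch` are hypothetical stand-ins for the options table, its accessors, and the web service.

```python
import json
from typing import Callable, Dict, List

options: Dict[str, str] = {}  # stand-in for the wp_options table: name -> serialized value


def load_option(name: str) -> dict:
    raw = options.get(name)
    return json.loads(raw) if raw else {}


def save_option(name: str, value: dict) -> None:
    options[name] = json.dumps(value)


def get_zip_info(zip_code: str, fetch_zip: Callable[[str], dict]) -> dict:
    uszip = load_option("uszip")          # grab the entire array
    if zip_code not in uszip:             # only call the web service on a miss
        uszip[zip_code] = fetch_zip(zip_code)
        save_option("uszip", uszip)       # save the whole array again
    return uszip[zip_code]


calls: List[str] = []


def fake_fetch(zip_code: str) -> dict:
    calls.append(zip_code)
    return {"zip": zip_code, "city": "Example City"}


get_zip_info("10001", fake_fetch)
get_zip_info("10001", fake_fetch)
get_zip_info("94105", fake_fetch)
print(calls)                         # ['10001', '94105']
print(sorted(load_option("uszip")))  # ['10001', '94105']
```

Zip codes are kept as strings so that leading zeros survive. The trade-off versus per-key transients is one table row instead of thousands, at the cost of deserializing the whole array on every lookup.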