Update large datasets in Apigee BaaS

I have a scenario where a collection in the BaaS will have to be updated frequently. As I understand, to insert entities into a collection, I can do single HTTP POST requests with the payload containing an array of entities.
However, using HTTP PUT, I have to insert a single entity per request and I'm not sure about its performance.
What is the best / recommended way of updating a collection with a large number of entities?
Regards

You can do a batch update using HTTP PUT. See here: http://apigee.com/docs/app-services/content/updating-collections. Note that the ql clause uses the Usergrid query language to restrict which entities get updated.
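As a rough sketch of what that looks like in practice (assuming the batch-update behavior described in the linked docs; the org, app, collection, query, and token below are placeholders):

```python
import requests

# Placeholders: substitute your own org, app, collection, and access token.
BASE = "https://api.usergrid.com/my-org/my-app"
TOKEN = "my-access-token"

# One PUT updates every entity in the collection that matches the ql query;
# the JSON body holds the properties to change on each matching entity.
resp = requests.put(
    f"{BASE}/businesses",
    params={"ql": "select * where status = 'pending'"},
    json={"status": "processed"},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```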

As far as I know, that needs to be done one entity at a time. I apologize for any inconvenience this may cause.

Related

How to use multiple namespaces in DoctrineCacheBundle cacheDriver?

I know I can set up multiple namespaces for DoctrineCacheBundle in the config.yml file. But can I use one driver with multiple namespaces?
The case is that in my app I want to cache all queries for all of my entities. The problem is with flushing the cache on create/update actions: I want to flush only part of my cached queries. My app is used by multiple clients, so when a client updates something in his data, for instance in the Article entity, I want to clear the cache only for that client and only for Article. I could add proper IDs to each query and remove them manually, but the queries are built dynamically. In my API the mobile app sends a version number for which the DB should return data, so I don't know what kind of IDs will be used in the end.
Unfortunately I don't think what you want to do can be solved with some configuration magic. What you want is some sort of indexed cache, and for that you have to find a more powerful tool.
You can take a look at Doctrine's second-level cache. I don't know how good it is now (I tried it once when it was in beta and it did not make the cut for me).
Or you can build your own cache manager. If you do, I recommend using Redis. Its data structures will help you keep your indexes (this can be simulated with Memcached, but it requires more work). Here is what I mean by indexes (a rough sketch in code follows below).
You will have a key like client_1_articles, where 1 is the client id. In that key you store the ids of all the articles of client 1. For every article id you will have a key like article_x, where x is the id of the article. In this example, client_1_articles is a rudimentary index that lets you invalidate, at some point if you want, all the cached articles coming from client 1.
The abstract implementation for the above example will end up being a graph-like structure over your cache, with possibly:
- composed indexes, e.g. 'client_1:category_1' => {article_1, article_2}
- multiple indexes for one item, e.g. 'category_1' => {article_1, article_2, article_3}, 'client_1' => {article_1, article_3}
- etc.
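As a minimal sketch of that index idea in Python with the redis-py client (the key names, TTL, and helper functions are illustrative assumptions, not a prescribed API):

```python
import json
import redis  # assumes the redis-py client is installed

r = redis.Redis()

def cache_article(client_id, article, ttl=3600):
    """Cache one article and register it in the per-client index set."""
    key = f"article_{article['id']}"
    r.setex(key, ttl, json.dumps(article))
    r.sadd(f"client_{client_id}_articles", key)

def invalidate_client_articles(client_id):
    """Drop every cached article belonging to one client."""
    index_key = f"client_{client_id}_articles"
    keys = r.smembers(index_key)
    if keys:
        r.delete(*keys)
    r.delete(index_key)
```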
Hope this helps you in some way. At least that was my solution to a similar problem.
Good luck with your project,
Alexandru Cosoi

Slow Apigee query when using geolocation with wildcard search

We have a requirement to allow users to search for a business name and have the results sorted by proximity. A rather basic function. We are trying the following query, but it takes up to a minute to come back with a response. If we exclude the geolocation constraint, the response is instantaneous. Can someone let us know how we can optimize the query and/or the entity collection?
https://api.usergrid.com/org/app/businesses/?ql=select * where business_name contains 'subway*' OR business_name='subway*' AND location within 10000 of 49.3129366, -123.0795565&limit=10
Thank you in advance!
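For reference, here is a sketch of issuing that same query programmatically so the ql clause is URL-encoded correctly (the org, app, and token are placeholders, and the OR is parenthesized on the assumption that the location constraint should apply to both branches):

```python
import requests

# Placeholders: substitute your own org, app, and access token.
BASE = "https://api.usergrid.com/org/app"
TOKEN = "my-access-token"

ql = ("select * where (business_name contains 'subway*' "
      "or business_name = 'subway*') "
      "and location within 10000 of 49.3129366, -123.0795565")

# requests handles the URL encoding of the ql clause.
resp = requests.get(
    f"{BASE}/businesses/",
    params={"ql": ql, "limit": 10},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```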
We have made recent updates to the platform and queries are now significantly faster. Basically, instead of processing data from shards serially, it is now done in parallel.
Unfortunately, with the previous (1.0) version each predicate would make the query much slower. We have resolved that with the recent updates to 1.0 as well as in our 2.1 release.

entity_load or EntityFieldQuery to pull entity ids in Drupal

Should I be using entity_load or EntityFieldQuery to get entity ids from a custom entity?
I was going to use entity_load to pull all of the entities in question of a particular type, and grab their relevant information (but that seems like it could be inefficient).
EntityFieldQuery will only return an array of entity IDs. If that is all you need then EntityFieldQuery will be much faster.
If you need to get the field values you should do entity_load. It is slow but it is the Drupal way.
If it is a very large number of nodes, you may have timeout issues. To overcome this, use Drupal's Batch API, or use the Database API to write a custom query that pulls in exactly the data you need in one query. The latter is technically faster but requires more code and can break compatibility.

What is the proper RESTful HTTP method for update of unique "slug" for a given record?

Given an existing set of records tied to a given unique identifier (a.k.a. slug), which is a portion of a key for multiple related records, what HTTP method is appropriate for updating such records to a new unique identifier?
OPTIONS, GET, PUT, POST, HEAD, DELETE, TRACE, and CONNECT all seem irrelevant to the functionality in question.
I'm reluctant to create a separate set of URIs that represent the "update slug/identifier" functionality when the URI schema is well established.
Thoughts? Opinions?
"I'm reluctant to create a separate set of URIs that represent the "update slug/identifier" functionality when the URI schema is well established."
If I am interpreting you correctly, then you are going to find REST very difficult. In RESTful design, it is very common to have to create new resources to work around the limited set of methods.
To answer the question directly, I would consider something like
GET /recordsets/{oldslug}
to retrieve the items you wish to change the slug for and then
POST /recordsets/{newslug}
to assign the new slug to the recordsets passed in the body. If, for performance reasons, you do not want to round-trip the recordsets, you could do
POST /recordsets/{newslug}?source=/recordsets/{oldslug}
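A minimal sketch of that exchange from the client side (the host and slugs are hypothetical placeholders):

```python
import requests

BASE = "https://api.example.com"  # hypothetical host

# Fetch the recordset currently filed under the old slug.
old = requests.get(f"{BASE}/recordsets/old-slug")
old.raise_for_status()

# Assign the new slug by posting the recordset to the new URI.
created = requests.post(f"{BASE}/recordsets/new-slug", json=old.json())
created.raise_for_status()

# Variant that avoids round-tripping the payload: tell the server where
# to copy from instead of resending the records.
moved = requests.post(
    f"{BASE}/recordsets/new-slug",
    params={"source": "/recordsets/old-slug"},
)
moved.raise_for_status()
```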

How can I deal with HTTP GET query string length limitations and still want to be RESTful?

As stated in http://www.boutell.com/newfaq/misc/urllength.html, HTTP query strings have a limited length. The limit can come from the client (Firefox, IE, ...), the server (Apache, IIS, ...), or network equipment (an application firewall, ...).
Today I face this problem with a search form. We developed a search form with a lot of fields, and this form is sent to the server as a GET request, so I can bookmark the resulting page.
We have so many fields that our query string is 1100 bytes long, and we have a firewall that drops HTTP GET requests longer than 1024 bytes. Our system administrator recommends that we use POST instead, so there will be no limitation.
Sure, POST will work, but a search really feels like a GET, not a POST. So I think I will review our field names to make sure the query string is not too long, and if I can't, I will be pragmatic and use POST.
But is there a flaw in the design of RESTful services? If GET requests have a limited length, how can I send large objects to a RESTful web service? For example, if I have a program that makes calculations based on a file, and I want to provide a RESTful web service like this: http://compute.com?content=<base64 file>. This won't work, because the query string does not have unlimited length.
I'm a little puzzled...
The HTTP specification actually advises using POST when sending data to a resource for computation.
Your search looks like a computation, not a resource itself. What you could do if you still want your search results to be a resource is create a token to identify that specific search result and redirect the user agent to that resource.
You could then delete search results tokens after some amount of time.
Example
POST /search
query=something&category=c1&category=c2&...
201 Created
Location: /search/01543164876
then
GET /search/01543164876
200 OK
... your results here...
This way, browsers and proxies can still cache search results but you are submitting your query parameters using POST.
EDIT
For clarification, 01543164876 here represents a unique ID for the resource representing your search. Those 2 requests basically mean: create a new search object with these criteria, then retrieve the results associated with the created search object.
This ID can be a unique ID generated for each new request. This would mean that your server will leak "search" objects and you will have to clean them regularly with a caching strategy.
Or it can be a hash of all the search criteria actually representing the search asked by the user. This allows you to reuse IDs since recreating a search will return an existing ID that may (or may not) be already cached.
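A minimal server-side sketch of this pattern (using Flask with an in-memory store; the hashed-ID variant from the EDIT is shown, and all names are illustrative):

```python
import hashlib
import json
from flask import Flask, request, jsonify, url_for

app = Flask(__name__)
saved_searches = {}  # in-memory stand-in for a real store with expiry

@app.post("/search")
def create_search():
    # Hash the submitted criteria so identical searches map to the same resource.
    criteria = request.form.to_dict(flat=False)
    search_id = hashlib.sha1(
        json.dumps(criteria, sort_keys=True).encode()
    ).hexdigest()[:12]
    saved_searches[search_id] = run_search(criteria)

    # 201 Created with a Location header pointing at the new search resource.
    resp = jsonify({"id": search_id})
    resp.status_code = 201
    resp.headers["Location"] = url_for("get_search", search_id=search_id)
    return resp

@app.get("/search/<search_id>")
def get_search(search_id):
    if search_id not in saved_searches:
        return jsonify({"error": "unknown search"}), 404
    return jsonify(saved_searches[search_id])

def run_search(criteria):
    # Placeholder: run the real query here.
    return {"criteria": criteria, "results": []}
```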
Based on your description, IMHO you should use POST. POST is for putting data on the server and, in some cases, obtaining an answer. In your case, you do a search (send a query to the server) and get the result of that search (retrieve the query result).
The definition of GET says that it must be used to retrieve an already existing resource. By definition, POST is for creating a new resource. This is exactly what you are doing: creating a resource on the server and retrieving it! Even if you don't store the search result, you created an object on the server and retrieved it. As PeterMmm previously said, you could do this with a POST (create and store the query result) and then use a GET to retrieve the result, but it is more practical to do only a POST and retrieve the result directly.
Hope this helps! :)
REST is a manner of doing things, not a protocol. Even if you dislike using POST when it is really a GET, it will work.
If you will/must stay with the "standard" definitions of GET, POST, etc., then maybe consider POSTing the query, storing it on the server with a query id, and requesting the result later with a GET by id.
Regarding your example http://compute.com?content={base64file}: I would use POST, because you are uploading "something" to be computed. To me this "something" feels more like a resource than a simple parameter.
In contrast, for a usual search I would start by sticking with GET and parameters. You make it so much easier for API clients to test and play around with your API. Make read-only access (which in most cases is the majority of traffic) as simple as possible!
But the dilemma of large query strings is a valid limitation of GET. Here I would be pragmatic: as long as you don't hit the limit, go with GET and URL params. This will work in 98% of search cases. Only act when you hit the limit, and then introduce POST with a payload (with MIME type Content-Type: application/x-www-form-urlencoded).
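In client terms, that pragmatic switch looks something like this (the URL and fields are hypothetical):

```python
import requests

params = {"q": "pizza", "category": ["restaurants", "delivery"]}

# Normal case: GET with URL parameters (bookmarkable, cacheable).
resp = requests.get("https://api.example.com/search", params=params)

# Fallback once the query string outgrows the limit: POST the same fields
# as an application/x-www-form-urlencoded body.
resp = requests.post("https://api.example.com/search", data=params)
```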
Have you got more real-world examples?
The confusion around GET is a browser limitation. If you are creating a RESTful interface for an A2A or P2P application, then there is no limitation on the length of your GET.
Now, if you happen to want to use a browser to view your RESTful interface (say, during development/debugging), then you will run into this limit, but there are tools out there to get around it.
This is an easy one. Use POST. HTTP doesn't impose a limit on the URL length for GET but servers do. Be pragmatic and work around that with a POST.
You could also use a GET body (that is allowed), but that's a double whammy: it is not correct usage and it will probably run into server problems.
I think that if you are developing a business system and encounter this issue, you should consider whether the API design is reasonable: if your GET API takes a biz_ids parameter and it grows too long, think about the UI or the use case, and whether you could use some other_biz_id to look up the biz_ids and build the target response, instead of passing the biz_ids directly.
If the old API already has dependents, you can add a new API for this use case; if your module is well designed, adding this API may be quick.
As developers, I think we should use protocols in the standard way.
Hope this helps.
