How to retrieve resources based on different conditions using GET in a RESTful API?

As per the REST approach, we can access resources using the GET method, which is fine if I know the key of my resource. For example, to get a transaction, I pass the transaction_id and get the resource for that transaction. But when I want to access all transactions between two dates, how should I write my REST method using GET?
For getting the transaction with a given transaction_id: GET /transaction/id
For getting transactions between two dates: ???
Also, if there are other conditions, such as the latest 10 transactions or the oldest 10 transactions, how should I write my URL, which is the main key in REST?
I searched on Google but could not find an approach that is completely RESTful and answers my questions, so I am posting here. I have a clear understanding of POST and DELETE, but if I want to do an update using PUT for some resource based on a condition, how do I do it?

There are collection and item resources in REST.
If you want to get a representation of an item, you usually use a unique identifier:
/books/123
/books/isbn:32t4gf3e45e67 (not a valid isbn)
or, as a template:
/books/{id}
/books/isbn:{isbn}
If you want to get a representation of a collection, or a reduced collection you use the unique identifier of the collection and add some filters to it:
/books/since:{fromDate}/to:{toDate}/
/books/?since="{fromDate}"&to="{toDate}"
The filters can go into the path or into the query string part of the URL.
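As a rough sketch of the query string variant applied to the transactions from the question: the collection below is a hypothetical in-memory list, and the `sort` and `limit` parameter names (for "latest 10" or "oldest 10") are assumptions for illustration, not a standard.

```python
from datetime import date
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical in-memory collection standing in for the real data store.
TRANSACTIONS = [
    {"id": 1, "date": date(2024, 1, 5)},
    {"id": 2, "date": date(2024, 2, 10)},
    {"id": 3, "date": date(2024, 3, 15)},
    {"id": 4, "date": date(2024, 4, 20)},
]


def build_url(base, **filters):
    """Client side: append the filters to the collection URL as a query string."""
    return f"{base}?{urlencode(filters)}" if filters else base


def handle_get(url):
    """Server side: reduce the collection according to the query parameters."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    items = TRANSACTIONS
    if "since" in params:
        since = date.fromisoformat(params["since"])
        items = [t for t in items if t["date"] >= since]
    if "to" in params:
        to = date.fromisoformat(params["to"])
        items = [t for t in items if t["date"] <= to]
    # "sort" and "limit" are assumed parameter names for "latest N" / "oldest N".
    newest_first = params.get("sort", "date") == "-date"
    items = sorted(items, key=lambda t: t["date"], reverse=newest_first)
    if "limit" in params:
        items = items[: int(params["limit"])]
    return [t["id"] for t in items]


between = build_url("/transactions/", since="2024-02-01", to="2024-03-31")
latest = build_url("/transactions/", sort="-date", limit=2)
print(between, handle_get(between))
print(latest, handle_get(latest))
```

The same GET on the collection URL serves every condition; only the filter parameters change.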
In the response you should add links with these URLs (aka HATEOAS), which REST clients can follow. You should use link relations, for example IANA link relations, to describe those links, and linked data vocabularies, for example schema.org, to describe the data in your representation. There are other vocabs as well, for example GoodRelations, and of course you can write your own vocab for your application.

Related

REST URI - GET Resource batch using array of ID's

The title is probably poorly worded, but I'm trying my hand at creating a REST API with Symfony. I've studied a few public APIs to get a feel for it, and a common principle seems to be dealing with a single resource path at a time. However, the data I'm working with has a lot of levels (7-8), and each level is only guaranteed to be unique under its parent (the whole path makes a composite key).
In this structure, I'd like to get all children resources from all or several parents. I know about filtering data using the queryParam at the end of a URI, but it seems like specifying the parent id(s) as an array is better.
As an example, let's say I have companies in my database, which own routers, which delegate traffic for some number of devices. The REST URI to get all devices for a router might look like this:
/devices/company/:c_id/routers/:r_id/getdevices
but then the user has to crawl through all :r_id's to get all the devices for a company. The suggestions I've seen all involve moving the :r_id out of the path and into the query string:
/devices/company/:c_id/getdevices?router_id[]=1&router_id[]=2
I get it, but I wouldn't want to use it at that point.
Instead, what seems functionally better, yet philosophically questionable, is doing this:
/devices/company/:c_id/routers/:[r_ids]/getdevices
Where [r_ids] is a stringified array of ids that can be decoded into an array of integers/strings server-side. This also frees up the query-parameter string to focus on filtering devices by attributes (age, price, total traffic, status).
However, I am new to all of this and having trouble finding what is "standard". Is this a reasonable solution?
I'll add I've tested the array string out in Symfony and it works great. But I can't tell if it can become a vehicle for malicious queries since I intend on using Doctrine's DBAL - I'll take tips on that too (although it seems like a problem regardless for string id's)
However, I am new to all of this and having trouble finding what is "standard". Is this a reasonable solution?
TL;DR: yes, it's fine.
You would probably see an identifier like that described using a level 4 URI Template, with your list of identifiers encoded via a path segment expansion.
Your example template might look something like:
/devices/company{/c_id}/routers{/r_ids}/devices
And you would need to communicate to the template consumer that c_id is a company id, and r_ids is a list of router identifiers, or whatever.
You've seen simplified versions of this on the web: URI templates are generalizations of web forms that read information from input controls and encode the inputs into the query string.
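A minimal sketch of that expansion, assuming only the `{/name}` and `{/name*}` forms of RFC 6570 need to be supported (this is a hand-rolled helper, not a complete URI Template implementation). The function names are hypothetical. Casting each decoded id to an integer on the server side is one simple way to reject unexpected input before it reaches a query.

```python
import re
from urllib.parse import quote


def expand_path_segments(template, variables):
    """Expand only the `{/name}` and `{/name*}` forms of RFC 6570.

    A list value is joined with commas into one segment, or, with the
    explode modifier `*`, emitted as one segment per item.
    """

    def replace(match):
        name, explode = match.group(1), match.group(2) == "*"
        value = variables[name]
        items = value if isinstance(value, (list, tuple)) else [value]
        encoded = [quote(str(item), safe="") for item in items]
        return "/" + ("/" if explode else ",").join(encoded)

    return re.sub(r"\{/(\w+)(\*?)\}", replace, template)


def parse_router_ids(segment):
    """Server side: decode the comma-separated segment, rejecting non-integers."""
    return [int(part) for part in segment.split(",")]


template = "/devices/company{/c_id}/routers{/r_ids}/devices"
url = expand_path_segments(template, {"c_id": 42, "r_ids": [1, 2, 3]})
print(url)  # /devices/company/42/routers/1,2,3/devices
print(parse_router_ids(url.split("/")[5]))  # [1, 2, 3]
```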

Lookup the existence of a large number of keys (up to 1M) in datastore

We have a table with 100M rows in google cloud datastore. What is the most efficient way to look up the existence of a large number of keys (500K-1M)?
For context, a use case could be that we have a big content datastore (think of all webpages in a domain). This datastore contains pre-crawled content and metadata for each document. Each document, however, could be liked by many users. Now when we have a new user who says they like documents {a1, a2, ..., an}, we want to tell whether all of these documents ak (k from 1 to n) are already crawled. That's the reason we want to do the lookup mentioned above. If there is a subset of documents that we don't have yet, we would start to crawl them immediately. Yes, the ultimate goal is to retrieve the content of all these documents and use it to build the user profile.
My current thought is to issue a bunch of batch lookup requests. Each lookup request can contain up to 1K of keys [1]. However to get the existence of every key in a set of 1M, I still need to issue 1000 requests.
An alternative is to use a customized middle layer to provide a quick look up (for example, can use bloom filter or something similar) to save the time between multiple requests. Assuming we never delete keys, every time we insert a key, we add it through the middle layer. The bloom-filter keeps track of what keys we have (with a tolerable false positive rate). Since this is a custom layer, we could provide a micro-service without a limit. Say we could respond to a request asking for the existence of 1M keys. However, this definitely increases our design/implementation complexity.
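To make the middle-layer idea concrete, here is a minimal Bloom filter sketch. The sizing (`num_bits`, `num_hashes`), the double-hashing scheme over SHA-256, and all names are assumptions for illustration, not a tuned implementation. Keys the filter rejects are definitely not stored; keys it accepts may be false positives and would still need a real lookup.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def split_keys(bloom, keys):
    """Split keys into (maybe present, definitely missing)."""
    maybe_present, definitely_missing = [], []
    for key in keys:
        target = maybe_present if bloom.might_contain(key) else definitely_missing
        target.append(key)
    return maybe_present, definitely_missing


bloom = BloomFilter(num_bits=10_000, num_hashes=7)
for crawled in ("a1", "a2"):
    bloom.add(crawled)
maybe_present, definitely_missing = split_keys(bloom, ["a1", "a2", "a3"])
print(maybe_present, definitely_missing)
```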
Are there any more efficient ways to do that? Maybe a better design? Thanks!
[1] https://cloud.google.com/datastore/docs/concepts/limits
I'd suggest breaking down the problem in a more scalable (and less costly) approach.
In the use case you mentioned you can deal with one document at a time, each document having a corresponding entity in the datastore.
The webpage URL uniquely identifies the page, so you can use it to generate a unique key/identifier for the respective entity. With a single key lookup (strongly consistent) you can then determine if the entity exists or not, i.e. if the webpage has already been considered for crawling. If it hasn't then a new entity is created and a crawling job is launched for it.
The length of the entity key can be an issue, see "How long (max characters) can a datastore entity key_name be? Is it bad to haver very long key_names?". To avoid it you can have the URL stored as a property of the webpage entity. You'll then have to query for the entity by the url property to determine if the webpage has already been considered for crawling. This is just eventually consistent, meaning that it may take a while from when the document entity is created (and its crawling job launched) until it appears in the query result. Not a big deal, it can be addressed by a bit of logic in the crawling job to prevent and/or remove document duplicates.
I'd keep the "like" information as small entities mapping a document to a user, separated from the document and from the user entities, to prevent the drawbacks of maintaining possibly very long lists in a single entity, see "Manage nested list of entities within entities in Google Cloud Datastore" and "Creating your own activity logging in GAE/P".
When a user likes a webpage with a particular URL you just have to check if the matching document entity exists:
- if it does, just create the like mapping entity
- if it doesn't and you used the above-mentioned unique key identifiers:
  - create the document entity and launch its crawling job
  - create the like mapping entity
- otherwise:
  - launch the crawling job, which creates the document entity, taking care of deduplication
  - launch a delayed job to create the mapping entity later, when the (unique) document entity becomes available. Possibly chained off the crawling job. Some retry logic may be needed.
Checking if a user liked a particular document becomes a simple query for one such mapping entity (with a bit of care as it's also eventually consistent).
With such a scheme in place you no longer have to make those massive lookups, you only do one at a time, which is OK: a user liking documents one at a time is IMHO more natural than providing a large list of liked documents.
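The key-based branch of the flow above could be sketched roughly as follows. The dictionaries and the list are hypothetical in-memory stand-ins for the document entities, the like mapping entities, and a task queue; no real datastore calls are shown.

```python
documents = {}    # hypothetical stand-in for document entities, keyed by URL
likes = set()     # hypothetical stand-in for (user_id, url) mapping entities
crawl_queue = []  # hypothetical stand-in for a task queue


def like_document(user_id, url):
    """Record that user_id likes url, scheduling a crawl if the page is new."""
    if url not in documents:                 # single key lookup
        documents[url] = {"crawled": False}  # create the document entity
        crawl_queue.append(url)              # launch its crawling job
    likes.add((user_id, url))                # create the like mapping entity


def user_likes(user_id, url):
    """Check for one mapping entity."""
    return (user_id, url) in likes


like_document("u1", "http://example.com/a")
like_document("u2", "http://example.com/a")
print(crawl_queue)  # ['http://example.com/a']
```

The second like finds the existing document entity, so the page is queued for crawling only once.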

Should nested relationships be reflected in URLs for JSON API?

I'm trying to follow JSON API. I need to expose CRUD access to a nested resource: product reviews.
Prior to using JSON API, I'd expect a REST interface like this:
GET /products/:product_id/reviews - list reviews for a product
POST /products/:product_id/reviews - add a review for a product
PATCH /products/:product_id/reviews/:id - update a review for a product
DELETE /products/:product_id/reviews/:id - delete a review for a product
I see some mention of a nested structure like this in the spec:
For example, the URL for a photo’s comments will be:
/photos/1/comments
But I'm not sure whether this structure is intended for all actions.
On the one hand, POST /products/:product_id/reviews for creation seems redundant if I'm going to specify the product in the POST body, under the review data's relationships.
On the other hand, if it's useful to specify a product id when deleting a review (maybe it isn't), DELETE /products/:product_id/reviews/:id seems like the only sane way to do it; people argue about whether a request body is even allowed for DELETE requests.
I could nest for some requests and not others:
GET /products/:product_id/reviews - list reviews for a product
POST /products/:product_id/reviews - add a review for a product
PATCH /reviews/:id - update a review
DELETE /reviews/:id - delete a review
But that seems weirdly inconsistent.
I could never nest:
GET /reviews - list reviews for the product specified in params
POST /reviews - add a review for the product specified in params
PATCH /reviews/:id - update a review
DELETE /reviews/:id - delete a review
But that seems awkward, and doesn't seem to match the first quote I made from the docs.
Should nested resource relationships be reflected in the URL when using JSON API?
I really like your question, since I have been having the same thoughts. I'm puzzled that no one has left an answer yet.
I have been using JSON API a little over a year on a production system and I would like to give my two cents.
At first when I started the project that was going to use JSON API, I was in doubt of nested vs non-nested resources. I then ran into issues with nested resources that would have been avoided with non-nested resources.
To take the same paths as in your example, consider the GET /products/:product_id/reviews endpoint.
When this endpoint is first built, it makes a lot of sense to nest a review under a product, because we are initially showing reviews in the context of a product. Everything is good.
We then later want to build a page in the frontend that shows a user and all the reviews that user has authored.
Although we already have an endpoint for getting reviews, we will have to build a new one, e.g. GET /users/:id/reviews.
If we had just put the first endpoint on GET /reviews with a filter of ?filter[product_id]=:id, we could simply add a new filter to that endpoint, which makes much more sense IMO.
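A rough sketch of such a flat endpoint, using a hypothetical in-memory list of reviews: supporting the per-user page then only means allowing one more filter name, not building a new route. The `user_id` filter name is an assumption for illustration.

```python
from urllib.parse import parse_qs

# Hypothetical review records standing in for the real data store.
REVIEWS = [
    {"id": 1, "product_id": "10", "user_id": "7"},
    {"id": 2, "product_id": "10", "user_id": "8"},
    {"id": 3, "product_id": "11", "user_id": "7"},
]

# Adding a new way to slice reviews means adding a name here.
ALLOWED_FILTERS = {"product_id", "user_id"}


def list_reviews(query_string):
    """Handle GET /reviews?filter[...]=... by applying each recognised filter."""
    reviews = REVIEWS
    for key, values in parse_qs(query_string).items():
        if key.startswith("filter[") and key.endswith("]"):
            field = key[len("filter["):-1]
            if field not in ALLOWED_FILTERS:
                raise ValueError(f"unsupported filter: {field}")
            reviews = [r for r in reviews if r[field] in values]
    return [r["id"] for r in reviews]


print(list_reviews("filter[product_id]=10"))  # [1, 2]
print(list_reviews("filter[user_id]=7"))      # [1, 3]
```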
I do use nested resources, but only for singleton resources like GET /users/:id/email_settings and a few other special cases where it makes sense.
In my experience, it makes things easier in the future if each resource is thought of as independent from other resources. There exist resources and relationships between resources. No resource "owns" another resource in the context of the API (in the context of business logic it's another story).
I have worked with this strategy and it still surprises me how well it works when adding new functionality to existing endpoints and when adding new endpoints.
If you come from the CQRS camp, you will understand why designing a RESTful API sometimes feels awkward. It is awkward because query actions (GET) and mutation actions (POST, PATCH, DELETE) naturally speak two different languages.
Query actions are naturally relationship-oriented and data-rich, while mutation actions are not. So it feels easy to use nested URLs to traverse between related entities.
But for a mutation you should provide just enough information for the task. Sometimes the information is redundant, as in your POST example. Sometimes it is missing, as in your DELETE example. Sometimes a task involves many resources and you don't know where to put it.
You should check the Facebook Graph API or the Azure Graph API; they met the same problems and have some good solutions. The important thing is to follow a consistent design.
Some rules are:
DELETE and update requests always go to the direct resource.
Use POST with a nested resource if you want to create both the object and its main relationship. Secondary relationships should go in the body. If you have two equally important relationships, consider having both nested APIs.
Use POST against a fake resource for tasks that involve many resources:
POST /transferfund
Use POST against a fake relationship for tasks that do not fit any HTTP verb. For example, if you want a body for a delete action, use:
POST /resource/id/deleteItForMe
{ "reason": "I hate it" }

How to retrieve a particular set of information using the Topic API?

I'm new to the Freebase Topic API and am looking for a way to retrieve a specific set of data with it.
For example, if we request particular information using the following URL
https://www.googleapis.com/freebase/v1/topic/en/nicobar_scrubfowl?filter=/common/topic/description
we get a lot of information, such as "id", "property", and a "values" array containing "text", "lang", "value", etc. I don't want all of that information.
So how can I retrieve only a particular set of information using the Topic API (for example only "value" from the "values" array, or only "provider")?
Thanks
If you want that level of control, you should investigate the MQLRead API.
There's no way to filter out those parts of the Topic API response. Every property value will have at least text, lang, id, creator and timestamp.
Why is this a problem in your application? As long as you're parsing this data with a JSON parser you will be able to access any of the data you want while ignoring the rest. If you're worried about the size of the response you can ask for a GZip response.
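As a sketch of that client-side approach: the response body below is a hypothetical, trimmed sample whose shape is assumed from the field names quoted in the question ("id", "property", "values", "text", "lang", "value"), and the description text is a placeholder. The helper name is also an assumption.

```python
import json

# Hypothetical, trimmed response body; shape assumed from the question.
response_body = """
{
  "id": "/en/nicobar_scrubfowl",
  "property": {
    "/common/topic/description": {
      "values": [
        {"text": "placeholder...", "lang": "en", "value": "placeholder description"}
      ]
    }
  }
}
"""


def extract_values(body, property_id):
    """Keep only the "value" of each entry for one property; ignore the rest."""
    topic = json.loads(body)
    values = topic.get("property", {}).get(property_id, {}).get("values", [])
    return [entry["value"] for entry in values if "value" in entry]


print(extract_values(response_body, "/common/topic/description"))
```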

Bulk Collection Manipulation through a REST (RESTful) API

I'd like some advice on designing a REST API which will allow clients to add/remove large numbers of objects to a collection efficiently.
Via the API, clients need to be able to add items to the collection and remove items from it, as well as manipulating existing items. In many cases the client will want to make bulk updates to the collection, e.g. adding 1000 items and deleting 500 different items. It feels like the client should be able to do this in a single transaction with the server, rather than requiring 1000 separate POST requests and 500 DELETEs.
Does anyone have any info on the best practices or conventions for achieving this?
My current thinking is that one should be able to PUT an object representing the change to the collection URI, but this seems at odds with the HTTP 1.1 RFC, which seems to suggest that the data sent in a PUT request should be interpreted independently from the data already present at the URI. This implies that the client would have to send a complete description of the new state of the collection in one go, which may well be very much larger than the change, or even be more than the client would know when they make the request.
Obviously, I'd be happy to deviate from the RFC if necessary but would prefer to do this in a conventional way if such a convention exists.
You might want to think of the change task as a resource in itself. So you're really PUT-ing a single object, which is a Bulk Data Update object. Maybe it's got a name, owner, and big blob of CSV, XML, etc. that needs to be parsed and executed. In the case of CSV you might want to also identify what type of objects are represented in the CSV data.
List jobs, add a job, view the status of a job, update a job (probably in order to start/stop it), delete a job (stopping it if it's running) etc. Those operations map easily onto a REST API design.
Once you have this in place, you can easily add different data types that your bulk data updater can handle, maybe even mixed together in the same task. There's no need to have this same API duplicated all over your app for each type of thing you want to import, in other words.
This also lends itself very easily to a background-task implementation. In that case you probably want to add fields to the individual task objects that allow the API client to specify how they want to be notified (a URL they want you to GET when it's done, or send them an e-mail, etc.).
Yes, PUT creates/overwrites, but does not partially update.
If you need partial update semantics, use PATCH. See http://greenbytes.de/tech/webdav/draft-dusseault-http-patch-14.html (the draft was later published as RFC 5789).
You should use AtomPub. It is specifically designed for managing collections via HTTP. There might even be an implementation for your language of choice.
For the POSTs, at least, it seems like you should be able to POST to a list URL and have the body of the request contain a list of new resources instead of a single new resource.
As far as I understand it, REST means REpresentational State Transfer, so you should transfer the state from client to server.
If that means too much data going back and forth, perhaps you need to change your representation. A collectionChange structure would work, with a series of deletions (by id) and additions (with embedded full xml Representations), POSTed to a handling interface URL. The interface implementation can choose its own method for deletions and additions server-side.
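One way such a collectionChange structure might look, as a sketch: the class and field names are hypothetical, and a plain dict keyed by id stands in for the stored collection. The change is validated as a whole and applied to a copy, so a bad request leaves the collection untouched.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionChange:
    deletions: list = field(default_factory=list)  # ids of items to remove
    additions: list = field(default_factory=list)  # full representations to add


def apply_change(collection, change):
    """Return the updated collection, or raise without changing anything."""
    unknown = [i for i in change.deletions if i not in collection]
    if unknown:
        raise ValueError(f"cannot delete unknown ids: {unknown}")
    updated = {k: v for k, v in collection.items() if k not in change.deletions}
    for item in change.additions:
        if item["id"] in updated:
            raise ValueError(f"id already present: {item['id']}")
        updated[item["id"]] = item
    return updated


items = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
change = CollectionChange(deletions=[1], additions=[{"id": 3, "name": "c"}])
print(sorted(apply_change(items, change)))  # [2, 3]
```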
The purest version would probably be to define the items by URL, and the collection contain a series of URLs. The new collection can be PUT after changes by the client, followed by a series of PUTs of the items being added, and perhaps a series of deletions if you want to actually remove the items from the server rather than just remove them from that list.
You could introduce a meta-representation of existing collection elements that don't need their entire state transferred, so in some abstract code your update could look like this:
{existing elements 1-100}
{new element foo with values "bar", "baz"}
{existing element 105}
{new element foobar with values "bar", "foo"}
{existing elements 110-200}
Adding (and modifying) elements is done by defining their values, deleting elements is done by not mentioning them in the new collection, and reordering elements is done by specifying the new order (if order is stored at all).
This way you can easily represent the entire new collection without having to re-transmit the entire content. Using an If-Unmodified-Since header makes sure that your idea of the content indeed matches the server's idea (so that you don't accidentally remove elements that you simply didn't know about when the request was submitted).
The best way is:
1. Pass only an array of the IDs of the objects to delete from the front-end application to the Web API.
2. Then you have two options:
2.1. Web API way: find all collections/entities using the ID array and delete them in the API, taking care of dependent entities such as foreign-key related table data too.
2.2. Database way: pass the IDs to the database side, find all matching records in the foreign-key tables and primary-key tables, and delete them in that order, i.e. foreign-key table records first, then primary-key table records.

Resources