How to get all the Wikidata redirections from a Wikidata Json Dump? - wikidata

Wikidata has now redirections that make some concepts pointing to others that have the same meaning. This is great but my app works with Wikidata and old concepts in my database are now absent from the new dump, and the linking with the new concept is not possible.
For example this link is the item I get when requesting the item "Q3840848" on Wikidata.
However those redirections are absent in the Json Dump. I tried to find the previous item "Q3840848" with a simple grep and didn't get any results, furthermore I got no parse error when I parse the dump according to the doc, so how can I retrieve all the redirections from a Json Dump ?

The dump you mentioned in the comments is no longer maintained. If you really want to get the redirection you have to parse entirely the .ttl file.

We just ran into the same issue, you can query the redirects from the SPARQL endpoint:
#Redirect
SELECT ?primary ?secondary
WHERE
{ ?primary wdt:P31 wd:Q11173 . #Example: instance of chemical compoun
?secondary owl:sameAs ?primary. # redirect
} LIMIT 50

Related

Is it valid to combine a form POST with a query string?

I know that in most MVC frameworks, for example, both query string params and form params will be made available to the processing code, and usually merged into one set of params (often with POST taking precedence). However, is it a valid thing to do according to the HTTP specification? Say you were to POST to:
http://1.2.3.4/MyApplication/Books?bookCode=1234
... and submit some update like a change to the book name whose book code is 1234, you'd be wanting the processing code to take both the bookCode query string param into account, and the POSTed form params with the updated book information. Is this valid, and is it a good idea?
Is it valid according HTTP specifications ?
Yes.
Here is the general syntax of URL as defined in those specs
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
There is no additional constraints on the form of the http_URL. In particular, the http method (i.e. POST,GET,PUT,HEAD,...) used don't add any restriction on the http URL format.
When using the GET method : the server can consider that the request body is empty.
When using the POST method : the server must handle the request body.
Is it a good idea ?
It depends what you need to do. I suggest you this link explaining the ideas behind GET and POST.
I can think that in some situation it can be handy to always have some parameters like the user language in the query part of the url.
I know that in most MVC frameworks, for example, both query string params and form params will be made available to the processing code, and usually merged into one set of params (often with POST taking precedence).
Any competent framework should support this.
Is this valid
Yes. The POST method in HTTP does not impose any restrictions on the URI used.
is it a good idea?
Obviously not, if the framework you are going to use is still clue-challenged. Otherwise, it depends on what you want to accomplish. The major use case (redirection of a data subset to a new POST target) has been irretrievably broken by browser implementations (all mechanically following the broken lead of Mosaic/Netscape), so the considerations here are mostly theoretical.

Proper REST response for empty table?

Let's say you want to get list of users by calling GET to api/users, but currently the table was truncated so there are no users. What is the proper response for this scenario: 404 or 204?
I'd say, neither.
Why not 404 (Not Found) ?
The 404 status code should be reserved for situations, in which a resource is not found. In this case, your resource is a collection of users. This collection exists but it's currently empty. Personally, I'd be very confused as an author of a client for your application if I got a 200 one day and a 404 the next day just because someone happened to remove a couple of users. What am I supposed to do? Is my URL wrong? Did someone change the API and neglect to leave a redirection.
Why not 204 (No Content) ?
Here's an excerpt from the description of the 204 status code by w3c
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation.
While this may seem reasonable in this case, I think it would also confuse clients. A 204 is supposed to indicate that some operation was executed successfully and no data needs to be returned. This is perfect as a response to a DELETE request or perhaps firing some script that does not need to return data. In case of api/users, you usually expect to receive a representation of your collection of users. Sending a response body one time and not sending it the other time is inconsistent and potentially misleading.
Why I'd use a 200 (OK)
For reasons mentioned above (consistency), I would return a representation of an empty collection. Let's assume you're using XML. A normal response body for a non-empty collection of users could look like this:
<users>
<user>
<id>1</id>
<name>Tom</name>
</user>
<user>
<id>2</id>
<name>IMB</name>
</user>
</users>
and if the list is empty, you could just respond with something like this (while still using a 200):
<users/>
Either way, a client receives a response body that follows a certain, well-known format. There's no unnecessary confusion and status code checking. Also, no status code definition is violated. Everybody's happy.
You can do the same with JSON or HTML or whatever format you're using.
I'd answer one of two codes depending on runtime situation:
404 (Not Found)
This answer is pretty correct if you have no table. Not just empty table but NO USER TABLE. It confirms exact idea - no resource. Further options are to provide more details WHY your table is absent, there is couple of more detailed codes but 404 is pretty good to refer to situation where you really have no table.
200 (OK)
All cases where you have table but it is empty or your request processor filtered out all results. This means 'your request is correct, everything is OK but you do not match any data just because either we have no data or we have no data which matches your request. This should be different from security denial answer. I also vote to return 200 in situation where you have some data and in general you are allowed to access table but have no access to all data which match your request (data was filtered out because of object level security but in general you are allowed to request).
If you are expecting list of user object, the best solution is returning an empty list ([]) with 200 OK than using a 404 or a 204 response.
definitely returns 200.
404 means resource not found. But the resource exists. And also, if the response has 404 status. How can you know users list empty or filled?
'/users' if is empty should return '200'.
'/users/1' if the id is not found. should return 404.
It must 200 OK with empty list.
Why: Empty table means the table exists but does not have any records.
404 Not Found means requested end point does not exist.

Designing proper REST URIs

I have a Java component which scans through a set of folders (input/processing/output) and returns the list of files in JSON format.
The REST URL for the same is:
GET http://<baseurl>/files/<foldername>
Now, I need to perform certain actions on each of the files, like validate, process, delete, etc. I'm not sure of the best way to design the REST URLs for these actions.
Since its a direct file manipulation, I don't have any unique identifier for the files, except their paths. So I'm not sure if the following is a good URL:
POST http://<baseurl>/file/validate?path=<filepath>
Edit: I would have ideally liked to use something like /file/fileId/validate. But the only unique id for files is its path, and I don't think I can use that as part of the URL itself.
And finally, I'm not sure which HTTP verb to use for such custom actions like validate.
Thanks in advance!
Regards,
Anand
When you implement a route like http:///file/validate?path you encode the action in your resource that's not a desired effect when modelling a resource service.
You could do the following for read operations
GET http://api.example.com/files will return all files as URL reference such as
http://api.example.com/files/path/to/first
http://api.example.com/files/path/to/second
...
GET http://api.example.com/files/path/to/first will return validation results for the file (I'm using JSON for readability)
{
name : first,
valid : true
}
That was the simple read only part. Now to the write operations:
DELETE http://api.example.com/files/path/to/first will of course delete the file
Modelling the file processing is the hard part. But you could model that as top level resource. So that:
POST http://api.example.com/FileOperation?operation=somethingweird will create a virtual file processing resource and execute the operation given by the URL parameter 'operation'. Modelling these file operations as resources gives you the possibility to perform the operations asynchronous and return a result that gives additional information about the process of the operation and so on.
You can take a look at Amazon S3 REST API for additional examples and inspiration on how to model resources. I can highly recommend to read RESTful Web Services
Now, I need to perform certain actions on each of the files, like validate, process, delete, etc. I'm not sure of the best way to design the REST URLs for these actions. Since its a direct file manipulation, I don't have any unique identified for the files, except their paths. So I'm not sure if the following is a good URL: POST http:///file/validate?path=
It's not. /file/validate doesn't describe a resource, it describes an action. That means it is functional, not RESTful.
Edit: I would have ideally liked to use something like /file/fileId/validate. But the only unique id for files is its path, and I don't think I can use that as part of the URL itself.
Oh yes you can! And you should do exactly that. Except for that final validate part; that is not a resource in any way, and so should not be part of the path. Instead, clients should POST a message to the file resource asking it to validate itself. Luckily, POST allows you to send a message to the file as well as receive one back; it's ideal for this sort of thing (unless there's an existing verb to use instead, whether in standard HTTP or one of the extensions such as WebDAV).
And finally, I'm not sure which HTTP verb to use for such custom actions like validate.
POST, with the action to perform determined by the content of the message that was POSTed to the resource. Custom “do something non-standard” actions are always mapped to POST when they can't be mapped to GET, PUT or DELETE. (Alas, a clever POST is not hugely discoverable and so causes problems for the HATEOAS principle, but that's still better than violating basic REST principles.)
REST requires a uniform interface, which in HTTP means limiting yourself to GET, PUT, POST, DELETE, HEAD, etc.
One way you can check on each file's validity in a RESTful way is to think of the validity check not as an action to perform on the file, but as a resource in its own right:
GET /file/{file-id}/validity
This could return a simple True/False, or perhaps a list of the specific constraint violations. The file-id could be a file name, an integer file number, a URL-encoded path, or perhaps an unencoded path like:
GET /file/bob/dir1/dir2/somefile/validity
Another approach would be to ask for a list of the invalid files:
GET /file/invalid
And still another would be to prevent invalid files from being added to your service in the first place, ie, when your service processes a PUT request with bad data:
PUT /file/{file-id}
it rejects it with an HTTP 400 (Bad Request). The body of the 400 response could contain information on the specific error.
Update: To delete a file you would of course use the standard HTTP REST verb:
DELETE /file/{file-id}
To 'process' a file, does this create a new file (resource) from one that was uploaded? For example Flickr creates several different image files from each one you upload, each with a different size. In this case you could PUT an input file and then trigger the processing by GET-ing the corresponding output file:
PUT /file/input/{file-id}
GET /file/output/{file-id}
If the processing isn't near-instantaneous, you could generate the output files asynchronously: every time a new input file is PUT into the web service, the web service starts up an asynchronous activity that eventually results in the output file being created.

REST URL structure advice

I'm trying to finalise on a restful url structure for the wishlist section of a site I'm working on. It's a pretty simple model, a user can have many wishlists and each wishlist can contain many products.
Currently I have the obvious CRUD URLs to manipulate the wishlist itself :
GET account/wishlists.json
GET account/wishlists/{id}.json
POST account/wishlists.json?name=My%20Wishlist
POST account/wishlists/{id}.json?name=My%20New%20Name
DELETE account/wishlists/{id}.json
However, I don't think I know how to structure the URLs that would add/remove a product to a wishlist :(
Here are my current choices :
1) Have the product to add as part of the URL and use the HTTP verb to define my action
POST account/wishlist/{id}/product/{product_id}.json
DELETE account/wishlist/{id}/product/{product_id}.json
or
2) Have the action as part of the URL and the product id as part of the payload
POST account/wishlist/{id}/add.json?product_id={product_id}
POST account/wishlist/{id}/remove.json?product_id={product_id}
(1) is clean and, as far as I can tell it's pretty RESTful but doesn't allow things like adding multiple products easily etc.
I'm also a bit concerned about using the DELETE verb - I'm not deleting the product or the wishlist, I'm just removing one from the other.
(2) is more explicit but veers away from REST - I wouldn't be just referring to the resource in the url, I would be referring to an operation on that resource :(
Any advice on which of the above would be more correct would be very helpful! (If there's a third option that's better than mine, feel free to correct me!)
(1) is the only valid approach for REST, using HTTP verbs for actions.
(2) encodes method names into the URI which is more like RPC and of course not RESTful.
Concerning your shortcomings about the first approach:
The DELETE verb is fine, because your resource is the item inside in the wishlist, not the item itself.
You can support batch requests. For instance, you might want to allow to POST a list of items to a wishlist resource resulting in mutliple adds.
PS: Prefer HTTP content negotiation (Accept and Content-Type headers) over representation formats encoded in the URI.
I think your first option is more in line with the REST philosophy. If you want to manipulate multiple products, you could pass the ids as a list in the body, instead of using a query parameter.
As for the delete part, given that you are deleting a subresource of wishlist, I think the intention is clear (i.e. remove the connection from the wishlist to the product). If you wanted to globally remove a product, the URL should be something like
DELETE /products/{id}
As noted by other responses, the first option is clearly the RESTful approach. The approach to deleting products from the wishlist looks fine - after all you would be doing a DELETE on product/{product_id} to remove the product itself.
For adding products, you might wish to consider a POST to account/wishlist/{id}/product/ the body of which could contain a list of product IDs.
Here's a nice article on how to think about REST URLs

How to apply the PUT verb in a REST request?

I'm working on a REST server. I have an order RESOURCE.
From my understanding the PUT verb should create a new order based on the URL. My question is: How can this work if the resource is new and you don't know the ID of the new order?
I know the debate about POST vs PUT, but I'm quoting the w3 specs for PUT http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
"If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI"
In RESTful APIs, PUT is typically used to update a resource or create one if it doesn't exist at the specified URL (i.e. the client provides the id). If the server generates the id, RESTful APIs typically use a POST to create new resources. In the latter scenario, the generated id/url is usually returned or specified in a redirect.
Example: POST /orders/
According to W3C Both PUT and POST can be used for update and/or create.
The basic difference between them is how the server handles the Request-URI. PUT URI identifies the entity and the server should't try to map it to another URL, while POST URI can be a handler for that content. Examples:
It's OK to POST a new order to /order, but not a PUT. You can update order 1 with a PUT or POST to /order/1.
To put it simply POST is for creating and PUT is for updating. If you don't have an ID for an object because it isn't created yet, you should be using a POST. If an object DOES exist and you just don't have the ID for it, you're going to have to search for it using a GET of some kind.
The thing to remember is Idempotence. A PUT (and GET for that matter) is idempotent. Basically meaning, you can hit the same URL over and over and it shouldn't make a difference the 2nd or 3rd time (It edits the data once, and calling it again it doesn't make that change again). However a POST is not idempotent. Meaning, you hit the same URL 3 or 4 times in a row and it's going to keep changing data (creating more and more objects). This is why a browser will warn you if you click back to a POST url.
You say, "don't know the ID of the new order" therefore the following is not true "URI is capable of being defined as a new resource by the requesting user agent", therefore PUT is not appropriate in your scenario.
Where is the confusion? I am of course assuming the Id would be part of the URL.

Resources