Publishing URIs over HTTP

My graduation project is titled "Web Storytelling using Dynamic Semantic Publishing". The idea is to take a query from the user and present the result as a story containing text and images, with the data extracted from the links returned by a Google search for that query.
I want to use Linked Data techniques. What I did first was collect the data and put it in RDF, but now I am confused about what is called a URI.
My RDF has a lot of triples that describe some entities and some sentences. Is it necessary to give every subject in my RDF that describes a sentence a URI that is dereferenceable over HTTP?
Or can I mint a URI for each story or article that I generate at run time, so that when a user queries it again I have both an RDF and an HTML representation of that article?
So, to sum up: is it a must that every resource have a URI that is dereferenceable over HTTP, even resources that describe sentences? And if I have RDF describing entities, can I use the URI in owl:sameAs to get the representation from another site like DBpedia via a 303 response, instead of creating an RDF or HTML representation for every resource myself?
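If you do mint your own story URIs, here is a minimal sketch of the second option: one dereferenceable URI per generated story, plus an owl:sameAs link that defers an entity's description to DBpedia. The example.org base URL and the entity names are hypothetical placeholders.

```python
from urllib.parse import quote

# Hypothetical base; the server behind it would 303-redirect or
# content-negotiate between RDF and HTML representations.
BASE = "http://example.org/story/"

def story_uri(story_id):
    """Mint the HTTP URI that identifies one generated story."""
    return BASE + quote(str(story_id), safe="")

def same_as_triple(local_entity_uri, dbpedia_name):
    """One N-Triples line linking a local entity to its DBpedia counterpart."""
    return "<%s> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/%s> ." % (
        local_entity_uri, quote(dbpedia_name))

print(story_uri(42))
print(same_as_triple("http://example.org/entity/BobMarley", "Bob_Marley"))
```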

Trouble accessing rdf data from Sponger

I am currently working on a project that uses the Virtuoso Sponger. I have been having multiple issues, and I referenced a lot of material before asking these questions. Since I am new to Virtuoso, please be patient with me.
I cannot seem to access RDF data using this format, as given on the Sponger page — http://{virtuoso-host}/about/data/{format}/{URIscheme}/{authority}/{local-path}
I tried it both on linkeddata.uriburner.com and on a personal server I host with Virtuoso installed.
I wrote this in the address bar —
http://linkeddata.uriburner.com/about/data/xml/http://www.bbc.co.uk/music/artists/ed2ac1e9-d51d-4eff-a2c2-85e81abd6360%01artist
— and got this error —
Error HTTP/1.1 404 File not found
The requested URL was not found
URI = '/about/data/xml/http:/www.bbc.co.uk/music/artists/ed2ac1e9-d51d-4eff-a2c2-85e81abd6360artist'
When I try the HTML option of Browser Input —
http://{virtuoso-host}/about/html/{URIscheme}/{authority}/{local-path}
— I get much less data output from my server than from linkeddata.uriburner.com. How can I correct this?
My main objective is to get RDF data from social media and information sites and store it in a database to be searched locally. For example, BBC has info on Bob Marley; so has Wikipedia. I would get structured data from both of them, take out redundant data, and add new data so that a single object is created. I then wish to query this data from the database.
How would I store this data in the database when executing via the Browser Input method?
Also, let's say this data got stored under graphs (I saw the link in Virtuoso Conductor -> LinkedData -> Graphs); how do I then query it?
Shrivansh,
There are many issues here, so I am going to provide a broad answer.
The Sponger is going to transform a Web Resource into RDF-based Linked Data. The transformed data ends up in a Virtuoso-hosted RDF Document, which is identified by a Named Graph IRI.
Given a Web Resource URL —
http://www.slideshare.net/kleinerperkins/internet-trends-v1
— you could construct an extract, transform, and load (ETL) service URL as —
http://linkeddata.uriburner.com/about/html/http/www.slideshare.net/kleinerperkins/internet-trends-v1
The results of the above are as follows:
Basic Entity Description Page (note the alternative document type links in the page footer) — http://linkeddata.uriburner.com/about/html/http/www.slideshare.net/kleinerperkins/internet-trends-v1
Faceted Browsing-oriented Entity Description Page (again note the alternative document type links in the footer) — http://linkeddata.uriburner.com/c/9DH6GNQ6
Named Graph IRI — http://www.slideshare.net/kleinerperkins/internet-trends-v1
SPARQL Query Results Page — http://linkeddata.uriburner.com/c/9DJ563FL
SPARQL Query Definition (so you can see the query source code) — http://linkeddata.uriburner.com/c/9BL763CG
When using a local Virtuoso Sponger instance, note the following:
You have to install Sponger Cartridges for target data sources (e.g., Slideshare, LinkedIn, Facebook, Twitter, etc.).
The live URIBurner.com instance has many cartridges and meta cartridges installed and configured, so you will see more results there than you get locally (unless you also install and enable all the same cartridges on your local instance).
A list of available cartridges
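A likely cause of the 404 in the question is that the raw target URI was appended with its "http://" intact, which the server normalised to "http:/" (visible in the error output), whereas the documented pattern puts the scheme as a plain path segment with no "://". A sketch of rewriting a target URI into that pattern (the "%01artist" entity-marker suffix from the question is left out here; check the Virtuoso documentation for how to address a specific entity within the document):

```python
from urllib.parse import urlsplit

def sponger_url(service_host, fmt, target_uri):
    """Rewrite a target URI into the documented Sponger pattern
    /about/data/{format}/{URIscheme}/{authority}/{local-path}.
    The scheme appears WITHOUT '://', matching the slideshare example."""
    parts = urlsplit(target_uri)
    return "http://%s/about/data/%s/%s/%s%s" % (
        service_host, fmt, parts.scheme, parts.netloc, parts.path)

url = sponger_url("linkeddata.uriburner.com", "xml",
                  "http://www.bbc.co.uk/music/artists/ed2ac1e9-d51d-4eff-a2c2-85e81abd6360")
print(url)
```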

How to retrieve resources based on different conditions using GET in RESTful api?

As per the REST approach, we can access resources using the GET method, which is fine if I know the key of my resource. For example, to get a transaction I can pass a transaction_id and retrieve the resource for that transaction. But when I want to access all transactions between two dates, how should I write my REST method using GET?
For getting the transaction with a given transaction_id: GET /transaction/{id}
For getting the transactions between two dates: ???
Also, if there are other conditions I need to support, like the latest 10 transactions or the oldest 10 transactions, how should I write my URL, which is the main key in REST?
I tried searching on Google but was not able to find an approach that is completely RESTful and answers my questions, so I am posting my question here. I have a clear understanding of POST and DELETE, but if I want to do a similar conditional update of some resource using PUT, how would I do it?
There are collection and item resources in REST.
If you want to get the representation of an item, you usually use a unique identifier:
/books/123
/books/isbn:32t4gf3e45e67 (not a valid ISBN)
or, with templates:
/books/{id}
/books/isbn:{isbn}
If you want to get the representation of a collection, or of a filtered collection, you use the unique identifier of the collection and add some filters to it:
/books/since:{fromDate}/to:{toDate}/
/books/?since={fromDate}&to={toDate}
The filters can go into the path or into the query string part of the URL.
In the response you should add links with these URLs (a.k.a. HATEOAS), which REST clients can follow. You should use link relations, for example the IANA link relations, to describe those links, and a linked-data vocabulary, for example schema.org, to describe the data in your representation. There are other vocabularies as well, for example GoodRelations, and of course you can define your own vocabulary for your application.
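A minimal sketch of the collection-filtering idea on the server side, assuming a made-up in-memory transaction list and ISO-formatted date parameters:

```python
from datetime import date
from urllib.parse import parse_qs, urlparse

# Hypothetical data set for illustration only.
TRANSACTIONS = [
    {"id": 1, "date": date(2024, 1, 5), "amount": 40},
    {"id": 2, "date": date(2024, 2, 10), "amount": 75},
    {"id": 3, "date": date(2024, 3, 20), "amount": 12},
]

def filter_transactions(url):
    """Apply since/to/limit query-string filters to the collection."""
    params = parse_qs(urlparse(url).query)
    result = TRANSACTIONS
    if "since" in params:
        since = date.fromisoformat(params["since"][0])
        result = [t for t in result if t["date"] >= since]
    if "to" in params:
        to = date.fromisoformat(params["to"][0])
        result = [t for t in result if t["date"] <= to]
    if "limit" in params:  # e.g. "latest 10" becomes ?limit=10 on a sorted collection
        result = result[: int(params["limit"][0])]
    return result

print(filter_transactions("/transactions?since=2024-02-01&to=2024-03-01"))
```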

Scraping BRfares for train fares

I am looking for advice. The following website
http://brfares.com/#home
provides fares information for UK train lines. I would like to use it to build a database of travel costs for season tickets from different locations. I have never done this kind of thing before, but I have experience with Python/Bash scripting and some HTML.
Viewing the source code for a typical query, the actual fare information is not displayed in index.html. Can anyone provide a pointer as to how to go about scraping (a new word for me) the information?
This is the URL for the query — http://brfares.com/querysimple?orig=SUY&dest=0415&rlc= — and the response is a JSON object.
First you need to build a lookup table of all destination codes. You can use the following link to do that: http://brfares.com/ac_loc?term=. Do it for all the letters in the alphabet and then parse the results into a unique list.
Then you take the codes by the pair, execute the JSON query, parse the returned JSON, and feed the data into a database.
Now you can do whatever you want with that database.
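A sketch of the first two steps of that pipeline using only the standard library. The endpoints come from this thread; the "code" field name in the autocomplete JSON is an assumption, so inspect a real response before relying on it. The fetch function is injectable so the harvesting logic can be exercised without hitting the network:

```python
import json
import string
from urllib.parse import urlencode
from urllib.request import urlopen

def lookup_url(term):
    """Autocomplete endpoint used to harvest location codes."""
    return "http://brfares.com/ac_loc?" + urlencode({"term": term})

def fares_url(orig, dest):
    """Simple fares query for an origin/destination pair of codes."""
    return "http://brfares.com/querysimple?" + urlencode(
        {"orig": orig, "dest": dest, "rlc": ""})

def harvest_codes(fetch=lambda url: urlopen(url).read()):
    """Collect a unique set of codes across all one-letter prefixes."""
    codes = set()
    for letter in string.ascii_lowercase:
        for entry in json.loads(fetch(lookup_url(letter))):
            codes.add(entry.get("code"))  # field name is an assumption
    return codes
```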

REST best-practice for overlong URIs

I have REST services that need to receive really long queries via GET. Say, for example, I want to query a service with many geographical coordinates to find out something about all of these coordinates.
1) My first thought was to use long URIs and increase the max URI length of the servlet container.
It would look like this:
GET http://some.test/myresource?query={really big JSON object}
But it seems that URIs longer than 2 KB are not reliable due to old proxy servers (is that right?).
2) My workaround is to create a temporary resource via POST first and use the URI of this resource as parameter in the actual GET request. That would look like this:
POST http://some.test/temp
Request Body: {really big JSON object}
201 Created
Location: http://some.test/temp/12309871
GET http://some.test/myresource?query=http://some.test/temp/12309871
3) Use the body of the GET request. I've read the answers to the question of whether it is a good idea to use the body of a GET request for the query, and the consensus is: no. Even Roy Fielding says it is a bad idea.
4) Another approach could be to interpret POST as "create query result resource" and delete this resource after the request. But I consider that not RESTful, and a bad idea.
Is there a better way to handle big queries with GET requests?
Use PUT.
Why? For the following reasons:
Just because the verb PUT 'may update' a resource doesn't mean it will or must alter the underlying state of the resource.
No new resource identifier (URL) should be created by the API side of a PUT. Yes, technically a PUT with a client-specified identifier is possible, but in that case you're hitting an existing resource.
PUT is like GET in that it should be idempotent, meaning the results of the request will always be the same regardless of how often you call it, and it has no side effects.
PUT means you're putting resource data to an existing resource. In terms of an article or post in the document/blog-post worlds, it would be like uploading a new revision of some document to an existing resource URL. If you upload the same revision to the same URL, nothing should change in the resource you get back.
In your case, the geo data is some new resource data you're uploading and the result you get back should be the same every time you make the same request.
A more purist method to use the GET verb for the request might be:
Create an endpoint for a query resource type
POST the JSON set of query details to a query resource endpoint and get an identifier for the query resource (say it returns a query id of 123)
Submit the GET request with the query identifier: http://some.test/myresource?query_id=123
Delete the query resource 123
I see the purist method as much more overhead than using PUT with the query resource data in the body.
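One way to reconcile the "create a query resource" step with the idempotency argument above (a variant sketch, not something the answer itself prescribes) is to derive the query resource's identifier from a hash of the canonicalised body, so re-submitting the same query always addresses the same resource:

```python
import hashlib
import json

def query_id(query_body):
    """Deterministic id: the same query JSON always maps to the same id,
    giving PUT-like idempotency to query-resource creation."""
    canonical = json.dumps(query_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

a = query_id({"coords": [[51.5, -0.1], [48.9, 2.35]]})
b = query_id({"coords": [[51.5, -0.1], [48.9, 2.35]]})
```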
I thought that the whole point of REST was to work on "documents" (or something alike). The URI part of a request is there to uniquely identify the resource to work on. The body part, in contrast, is there for the "contents" part of the document.
Hence, use the "body" part of the request.
Also note that the semantics of a "GET" request aren't supposed to be used for "putting" or "posting" documents (a comment in relation to your "query" example above, which seems to "create" an object).
In any case, as you have pointed out, the URI part is limited (for good reason, I am sure).
If you are concerned with caching, then the use of ETag/Last-Modified fields (in conjunction with "conditional GET") helps for this purpose.
Here is a slight variation on your second option. Create yourself a processor resource called QueryMaker. POST your parameters to it and let it redirect you to a temporary query resource that will return your results.
POST /QueryMaker
Body: Big Json representation of parameters
303: See Other
Location: http://example.org/TemporaryQueries/123213
If you are using a GET request to send large objects, you are not using REST correctly.
GET should be used for retrieving resources (via some sort of unique identifier)
POST should be used for creating resources (with the contents in the body)
PUT should be used for updating a resource (with the contents in the body)
DELETE should be used for deleting a resource
If you follow these guidelines you will never have to have overly long URIs.
Some best practice REST guidelines are here: http://www.xml.com/pub/a/2004/08/11/rest.html
The biggest limitation on URL lengths on the open Web is actually IE, which constrains them to 2083 characters.
Some proxies (e.g., all but the most recent versions of Squid) will limit them to about 4k, although this is moving towards 8k slowly.
Your #2 workaround is a good approach, depending on your use case.
Sending bodies on GETs may be allowed by some implementations, and disallowed by others, so it's a bad idea for interoperability as well as theoretical reasons. Most importantly, how will a cache know what to use as a key?
Can't you just send the big JSON data in the GET request body, instead of creating the temp resource?
Although it's not 100% kosher, I've found it works nicely with Firefox and IE, and, IMO, the query string is inelegant and usually exposes implementation details that don't belong in the URI. Just make sure to add a cache-buster query string parameter if you need up-to-date data, because the cache will ignore the body when determining whether it can return a cached response.
See here for a discussion of pros and cons of stuffing data in the GET request body.

RESTifying URLs

At work here, we have a box serving XML feeds to business partners. Requests for our feeds are customized by specifying query string parameters and values. Some of these parameters are required, but many are not.
For example, we require all requests to specify a GUID to identify the partner, and a request can be either a "get latest" or a "search" action:
For a search: http://services.null.ext/?id=[GUID]&q=[Search Keywords]
Latest data in category: http://services.null.ext/?id=[GUID]&category=[ID]
Structuring a RESTful URL scheme for these parameters is easy:
Search: http://services.null.ext/[GUID]/search/[Keywords]
Latest: http://services.null.ext/[GUID]/latest/category/[ID]
But how should we handle the dozen or so optional parameters we have? Many of these are mutually exclusive, and many are required in combinations. Very quickly, the number of possible paths becomes overwhelmingly complex.
What are some recommended practices for how to map URLs with complex query strings to friendlier /REST/ful/paths?
(I'm interested in conventions, schemes, patterns, etc. Not specific technologies to implement URL-rewriting on a web server or in a framework.)
You should leave optional query parameters in the query string. There is no "rule" in REST that says there cannot be a query string; actually, it's quite the opposite. The query string should be used to alter the view of the representation you are transferring back to the client.
Stick to "Entities with Representable State" for your URL path components. Category seems OK, but what exactly is it that you are feeding over XML? Posts? Catalog Items? Parts?
I think a much better REST taxonomy would look like this (assuming the content of your XML feed is an "article"):
http://myhost.com/PARTNERGUID/articles/latest?criteria1=value1&criteria2=value2
http://myhost.com/PARTNERGUID/articles/search?criteria1=value1&criteria2=value2
If you're not thinking about the entities you are representing while building your REST structure, you're not doing REST. You're doing something else.
Take a look at this article on REST best practices. It's old, but it may help.
Parameters with values? One option is the query string. Using it is not inherently non-RESTful. Another option is to use the semicolon; Tim Berners-Lee talks about these matrix-style parameters, and they might just fit the bill, allowing the URL to make sense without massively long paths.
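The convention above, required identifiers in the path and optional parameters in the query string, can be sketched as follows (the host and parameter names are the hypothetical ones from this thread; absent options simply never reach the URL):

```python
from urllib.parse import urlencode

def article_search_url(partner_guid, **optional):
    """Required GUID goes in the path; only supplied options hit the query string."""
    base = "http://myhost.com/%s/articles/search" % partner_guid
    filtered = {k: v for k, v in optional.items() if v is not None}
    return base + ("?" + urlencode(sorted(filtered.items())) if filtered else "")

print(article_search_url("abc-123", q="bob marley", category=None))
```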
