Freebase MQL Query to fetch all instances and their Wiki description

I want to fetch and save Wikipedia descriptions of all instances from /sports/sport to my database.
It requires two API calls: one to fetch the MID and another to fetch the Wikipedia description. Is it possible to combine the two queries into a single query?
Thanks in advance.

The API isn't really designed for bulk downloads. There's a dump file available for these types of bulk operations. https://developers.google.com/freebase/data#freebase-rdf-dumps
If you were searching for a specific sport or sports, you could get both the description and MID using the Search API e.g. https://www.googleapis.com/freebase/v1/search?indent=true&filter=(all%20type:/sports/sport)&output=(description)
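If you do go the Search API route, a minimal Python sketch of that call might look like the following (the request parameters come from the URL above; the shape of the JSON response is an assumption based on the old documentation, and the service has since been retired):
import requests

# Parameters taken from the Search API URL above; 'output' asks for the
# description to be returned alongside each match.
params = {
    "filter": "(all type:/sports/sport)",
    "output": "(description)",
    "indent": "true",
}
resp = requests.get("https://www.googleapis.com/freebase/v1/search", params=params)
resp.raise_for_status()
for result in resp.json().get("result", []):
    # Each result is assumed to carry the MID plus the requested 'output'
    # block containing the description text.
    print(result.get("mid"), result.get("output"))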
Don't forget that you need to cite Wikipedia as well as Freebase, since you're using their text.

Related

Practical usage for linked data

I've been reading about linked data and I think I understand the basics of publishing linked data, but I'm trying to find real-world practical (and best-practice) usage for linked data. Many books and online tutorials talk a lot about RDF and SPARQL, but not about dealing with other people's data.
My question is, if I have a project with a bunch of data that I output as RDF, what is the best way to enhance (or correctly use) other people's data?
If I create an application for animals and I want to use data from the BBC wildlife page (http://www.bbc.co.uk/nature/life/Snow_Leopard), what should I do? Crawl the BBC wildlife page for RDF and save the contents to my own triplestore, query the BBC with SPARQL (I'm not sure that this is actually possible with the BBC), or take the URI for my animal (owl:sameAs) and curl the content from the BBC website?
This also raises the question: can you programmatically add linked data? I imagine you would have to crawl the BBC wildlife page unless they provide an index of all the content.
If I wanted to add extra information such as location for these animals (http://www.geonames.org/2950159/berlin.html), again, what is considered the best approach? owl:habitat (a fake predicate) pointing at Brazil, and curl the RDF for Brazil from the GeoNames site?
I imagine that linking to the original author is the best way, because your data can then be kept up to date, which, judging from these slides from a BBC presentation (http://www.slideshare.net/metade/building-linked-data-applications), is what the BBC does. But what if the author's website goes down or is too slow? And if you were to index the author's RDF, I imagine your owl:sameAs would point to a local copy of the RDF.
Here's one potential way of creating and consuming linked data.
If you are looking for an entity (i.e., a 'Resource' in Linked Data terminology) online, see if there is a Linked Data description of it. One easy place to find this is DBpedia. For Snow Leopard, one URI that you can use is http://dbpedia.org/page/Snow_leopard. As you can see from the page, there are several object and property descriptions. You can use them to create a rich information platform.
You can use SPARQL in two ways. Firstly, you can directly query a SPARQL endpoint on the web where there might be some data. BBC had one for music; I'm not sure if they do for other information. DBpedia can be queried using snorql. Secondly, you can retrieve the data you need from these endpoints and load it into your triple store using the INSERT and INSERT DATA features of SPARQL 1.1. To access the SPARQL endpoints from your triple store, you will need to use the SERVICE feature of SPARQL. The second approach protects you from being unable to execute your queries when a publicly available endpoint is down for maintenance.
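As a rough Python sketch of the first approach (assuming DBpedia's public endpoint at https://dbpedia.org/sparql and its JSON results format), you could fetch the statements about the Snow Leopard resource like this:
import requests

query = """
SELECT ?p ?o WHERE {
  <http://dbpedia.org/resource/Snow_leopard> ?p ?o .
} LIMIT 50
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
resp.raise_for_status()
for binding in resp.json()["results"]["bindings"]:
    # Each binding holds one predicate/object pair describing the resource.
    print(binding["p"]["value"], binding["o"]["value"])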
To programmatically add the data to your triplestore, you can use one of the predesigned libraries. In Python, RDFlib is useful for such applications.
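For example, here is a minimal RDFlib sketch (the /data/ URL and the Turtle extension are assumptions about how DBpedia serves per-resource RDF) that pulls a resource's description into a local graph:
from rdflib import Graph

g = Graph()
# Load the RDF description of the resource into a local graph, which can then
# act as (or be copied into) your own triplestore.
g.parse("http://dbpedia.org/data/Snow_leopard.ttl", format="turtle")
print(len(g), "triples loaded")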
To enrich your data with data sourced from elsewhere, there are again two approaches. The standard way of doing it is to use existing vocabularies. So you'd have to look for the habitat predicate and just insert this statement:
dbpedia:Snow_leopard prefix:habitat geonames:Berlin .
If no appropriate ontologies are found to contain the property (which is unlikely in this case), one needs to create a new ontology.
If you want to keep your information current, then it makes sense to run your queries periodically. Using something such as DBpedia Live is useful in this regard.

How to get a list of all films on Wikidata?

I was using Freebase to get all movies/films for my website, but it's getting shut down soon, so I was searching for another free database for movies and came across Wikidata. To be honest, it's too complicated for me to understand how to query all the movies.
So I thought you guys could help me to get all the movies in Wikidata. In the future I want to include TV shows and series as well.
Programming language doesn't matter; I want to use a web query with a link.
You can look for all the entities that are an instance of film, which in Wikidata is expressed as:
P31 (instance of) -> Q11424 (film)
For the moment, the best way to do this query is to use the wdq.wmflabs.org API, where this query translates to: http://wdq.wmflabs.org/api?q=claim[31:11424] (avoid opening this query in a browser, as the size of the result will probably make it crash). At the time of writing, this request returns 157038 items in the form of numeric ids (e.g. 125). To get the Wikidata ids, just add a leading Q -> Q125.
To get the labels and data from all those Wikidata ids, use the Wikidata API wbgetentities action: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q180736&format=json&languages=en.
Beware of the 50-entities-per-query limit, though.
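A hedged Python sketch of working within that limit (parameter names as in the URL above; error handling omitted) would chunk the Q-ids and call wbgetentities once per batch:
import requests

def fetch_labels(qids, batch_size=50):
    # Query wbgetentities in batches of at most 50 ids, collecting English labels.
    labels = {}
    for i in range(0, len(qids), batch_size):
        batch = qids[i:i + batch_size]
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbgetentities",
                "ids": "|".join(batch),
                "props": "labels",
                "languages": "en",
                "format": "json",
            },
        )
        resp.raise_for_status()
        for qid, entity in resp.json().get("entities", {}).items():
            labels[qid] = entity.get("labels", {}).get("en", {}).get("value")
    return labels

print(fetch_labels(["Q180736"]))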
[UPDATE] Wikidata now offers a SPARQL endpoint: the same query can be written in SPARQL, or even extended to also include subclasses of film.
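As a sketch, the same "instance of film" query against the Wikidata Query Service could be run from Python like this (the endpoint URL and the JSON output parameter are assumptions; add paging or drop the LIMIT for the full result set):
import requests

sparql = "SELECT ?film WHERE { ?film wdt:P31 wd:Q11424 } LIMIT 100"
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": sparql, "format": "json"},
    headers={"User-Agent": "films-example/0.1"},  # a descriptive User-Agent is recommended
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    # Prints entity URIs such as http://www.wikidata.org/entity/Q...
    print(row["film"]["value"])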

How to update metadata using content indexes in WebCenter Content

I need to create a program which can search a document (e.g. a candidate's resume) and fill in the metadata from the document, such as user experience, user skills, location, etc.
For this I would like to use Oracle's indexing mechanism (Oracle Text search), because it indexes all the data from the document. When it indexes the document, I would like to first update my metadata fields from the indexed data, and then have the content server update its indexes. Can anyone help me understand how the indexer works and which event I should trap to make my modifications for updating the metadata?
I need to update the metadata because the requirements are:
Extensive choices for search filter criteria (that search within resumes and not just form keywords):
- Boolean search between multiple parameters
- Search on skills, years of experience, particular company, education qualification, geo/location and submission date of the profile
- Search on who referred, name, team, BU, etc.
- Result window of adequate size for results and filters
- Predefined resume filter criteria to assist screening when a candidate applies on the job portal
You are looking at this problem from the wrong end. The indexer (Oracle Text Search) is a powerful and complex tool embedded inside the workings of the database. What you are suggesting is to interpret the results of text indexing and use this as metadata for your content - if I am not mistaken? Oracle Text generates huge amounts of data and literally "chops up" documents word for word. For you to make meaningful metadata from this would be a huge task.
Instead you should be looking at capturing the metadata as close to the source as possible. This could be done using Word VBScript (if you are using MS Office) when the user saves to the repository or filesystem. I believe you can fully manipulate the metadata in a document at save time.
You will of course need to install the Oracle WebCenter Content Desktop Integration suite.
Look into Oracle WebCenter Capture. WebCenter Capture can scan a document and allows metadata to be automatically tagged on the document. WebCenter Capture integrates with WebCenter Content (WCC) and allows you to directly checkin scanned documents to WebCenter Content.
http://www.oracle.com/technetwork/middleware/webcenter/content/index-090596.html

Scraping BRfares for train fares

I am looking for advice. The following website
http://brfares.com/#home
provides fare information for UK train lines. I would like to use it to build a database of travel costs for season tickets from different locations. I have never done this kind of thing before, but I have experience with Python/Bash scripting and some HTML.
Viewing the source code for a typical query, the actual fare information is not displayed in index.html. Can anyone provide a pointer as to how to go about scraping (a new word for me) the information?
This is the URL for the query: http://brfares.com/querysimple?orig=SUY&dest=0415&rlc=
The response is a JSON object.
First you need to build a lookup table of all destination codes. You can use the following link to do that: http://brfares.com/ac_loc?term=. Do it for all the letters in the alphabet and then parse the results into a unique list.
Then you take them in pairs, execute the JSON query, parse the returned JSON and feed the data into a database.
Now you can do whatever you want with that database.
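A rough Python sketch under those assumptions (the endpoint parameters come from the URLs above; the JSON field names are guesses, so inspect a real response and adjust):
import itertools
import string
import requests

# 1. Build a lookup table of station codes from the autocomplete endpoint.
codes = set()
for letter in string.ascii_lowercase:
    resp = requests.get("http://brfares.com/ac_loc", params={"term": letter})
    resp.raise_for_status()
    for entry in resp.json():
        code = entry.get("code")  # field name assumed
        if code:
            codes.add(code)

# 2. Query fares for each pair of codes and collect the JSON for your database.
fares = []
for orig, dest in itertools.combinations(sorted(codes), 2):
    resp = requests.get(
        "http://brfares.com/querysimple",
        params={"orig": orig, "dest": dest, "rlc": ""},
    )
    if resp.ok:
        fares.append(resp.json())  # insert into your database here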

Which Freebase API(s) return details for machine IDs (MIDs)?

I am attempting to develop a Freebase Explorer application. One part of the application allows a user to drill down through Freebase domains, then types, then type instances, and finally, using the Freebase Topic API, I display the selected type instance. However, many of the type instance lists have "null" for the name and machine IDs for the id.
What combination of Freebase API calls can I employ to return something of value/interest (human readable) using a Freebase MID?
Where should I look on the Freebase site/wiki for help?
A machine ID (MID) can be used anywhere any other ID is used in Freebase. There's no requirement that an object have a name. "Something of value/interest" will depend a lot on the context, but the types and property values of an object help show how it's connected to the rest of the graph.
You might also look at the existing Freebase Schema Explorer app for ideas and inspiration.
Tom's explanation regarding machine IDs is spot on; here is some additional information:
Domains and types are schema objects, and it's preferable that you use human-readable IDs for these. Items "of interest" are usually topics, and those are all objects that are typed with /common/topic.
You can use MQL to get a list of types and domains, and then as you say use the Topic API - which will also be available in the new APIs - to get all the data for a given topic.
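A heavily hedged Python sketch of what that could look like against the old endpoints (the mqlread and Topic API paths and the response shapes are assumptions based on the former documentation, the MID below is purely a placeholder, and the service has since been retired):
import json
import requests

# MQL read: list the types in a domain (here /sports, as in the first question).
mql = [{"id": None, "name": None, "type": "/type/type", "domain": "/sports"}]
resp = requests.get(
    "https://www.googleapis.com/freebase/v1/mqlread",
    params={"query": json.dumps(mql)},
)
types = resp.json().get("result", [])

# Topic API: fetch everything known about one instance by its MID.
mid = "/m/0xxxx"  # hypothetical placeholder MID
topic = requests.get("https://www.googleapis.com/freebase/v1/topic" + mid)
properties = topic.json().get("property", {})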

Resources