Querying a vector of strings to Wikidata using WikidataQueryServiceR - r

Given a vector of movie titles, I would like to look up their genres by querying Wikidata.
As an R user, I recently discovered WikidataQueryServiceR, whose documentation has exactly the example I was looking for:
library(WikidataQueryServiceR)
query_wikidata('SELECT DISTINCT
?genre ?genreLabel
WHERE {
?film wdt:P31 wd:Q11424.
?film rdfs:label "The Cabin in the Woods"@en.
?film wdt:P136 ?genre.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}')
## 5 rows were returned by WDQS
Unfortunately, this query uses a static string, so I would like to replace The Cabin in the Woods with a vector. To do so, I tried the following code:
library(WikidataQueryServiceR)
example <- "The Cabin in the Woods" # Single string for testing purposes.
query_wikidata(paste('SELECT DISTINCT ?human ?humanLabel ?sex_or_gender ?sex_or_genderLabel WHERE {
?human wdt:P31 wd:Q5.
?human rdfs:label', example, '@en.
?human wdt:P21 ?sex_or_gender.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
OPTIONAL { ?human wdt:P2561 ?name. }
}', sep = ""))
But that does not work as expected, as I get the following result:
Error in FUN(X[[i]], ...) : Bad Request (HTTP 400).
What am I doing wrong?

Have you tried printing your SPARQL query?
There is no space after rdfs:label.
There are no quotes around The Cabin in the Woods.
In your R code, instead of
?human rdfs:label', example, '@en.
the rdfs:label line should be:
?human rdfs:label "', example, '"@en.
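A minimal sketch of the corrected string assembly, applied to the genre query from the question. The helper name build_label_query() is hypothetical and not part of WikidataQueryServiceR. It only builds and prints the query so the quoting can be inspected first; the actual request is left commented out.

```r
# Hypothetical helper: build the genre query for one English film label.
build_label_query <- function(label) {
  # Escape backslashes and double quotes so the label stays a valid SPARQL literal.
  safe <- gsub('\\', '\\\\', label, fixed = TRUE)
  safe <- gsub('"', '\\"', safe, fixed = TRUE)
  paste0(
    'SELECT DISTINCT ?genre ?genreLabel WHERE {\n',
    '  ?film wdt:P31 wd:Q11424.\n',
    '  ?film rdfs:label "', safe, '"@en.\n',
    '  ?film wdt:P136 ?genre.\n',
    '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n',
    '}'
  )
}

query <- build_label_query("The Cabin in the Woods")
cat(query, "\n")  # inspect the query text before sending it

# library(WikidataQueryServiceR)
# result <- query_wikidata(query)
```

Printing the query first makes a missing space or missing quotes visible before the endpoint answers with a bare HTTP 400.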
Although query_wikidata() can accept a vector of query strings, I'd suggest using SPARQL 1.1 VALUES instead, to avoid sending too many requests.
library(WikidataQueryServiceR)
example <- c("John Lennon", "Paul McCartney")
values <- paste(sprintf("('%s'@en)", example), collapse=" ")
query <- paste(
'SELECT DISTINCT ?label ?human ?humanLabel ?sexLabel {
VALUES(?label) {', values,
'}
?human wdt:P31 wd:Q5.
?human rdfs:label ?label.
?human wdt:P21 ?sex.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}'
)
query_wikidata(query)
For a large number of VALUES, you probably need to use the development version of WikidataQueryServiceR: it seems that only the development version supports POST requests.
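Another way to keep each request small is to split the vector into batches and build one VALUES clause per batch. The sketch below is only an illustration under assumptions: the helper names values_clause() and batch_queries() and the batch size are hypothetical, and the network call is left commented out.

```r
# Hypothetical helper: turn labels into a SPARQL VALUES body such as
# ("John Lennon"@en) ("Paul McCartney"@en), escaping embedded double quotes.
values_clause <- function(labels, lang = "en") {
  safe <- gsub('"', '\\"', labels, fixed = TRUE)
  paste(sprintf('("%s"@%s)', safe, lang), collapse = " ")
}

# Hypothetical helper: one query string per batch of at most `size` labels.
batch_queries <- function(labels, size = 50) {
  groups <- split(labels, ceiling(seq_along(labels) / size))
  vapply(groups, function(batch) {
    paste0(
      'SELECT DISTINCT ?label ?human ?humanLabel ?sexLabel {\n',
      '  VALUES (?label) { ', values_clause(batch), ' }\n',
      '  ?human wdt:P31 wd:Q5.\n',
      '  ?human rdfs:label ?label.\n',
      '  ?human wdt:P21 ?sex.\n',
      '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n',
      '}'
    )
  }, character(1), USE.NAMES = FALSE)
}

queries <- batch_queries(
  c("John Lennon", "Paul McCartney", "George Harrison"),
  size = 2
)
length(queries)  # two batches: two labels, then one

# library(WikidataQueryServiceR)
# results <- lapply(queries, query_wikidata)
```

Using double-quoted literals with escaping also avoids breaking the query on names that contain an apostrophe.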

Related

Get the complete info of a Wikidata item

I'm using the following query to get the info of a specific Wikidata item.
For example, this one gets the info about the movie Titanic
SELECT ?wd ?wdLabel ?ps ?ps_Label ?wdpqLabel ?pq_Label {
VALUES (?film) {(wd:Q44578)}
?film ?p ?statement .
?statement ?ps ?ps_ .
?wd wikibase:claim ?p.
?wd wikibase:statementProperty ?ps.
OPTIONAL {
?statement ?pq ?pq_ .
?wdpq wikibase:qualifier ?pq .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ?wd ?statement ?ps_
It works well and I do get the info, but I want to show the item IDs (the "Q" numbers) next to the labels.
For example, if the genre is "romance film" I would like to get Q1054574 beside it. And if the actor is Leonardo DiCaprio I would like to get Q38111.
How can I achieve this in this kind of query?
You could add ?ps_ to the SELECT:
SELECT ?wd ?wdLabel ?ps ?ps_Label ?ps_ ?wdpqLabel ?pq_Label
Result: Screenshot
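With ?ps_ selected, item values come back as full entity URIs rather than bare Q numbers. If the results are processed in R, the ID can be cut from the URI afterwards. This is a small sketch, assuming the column arrives as a character vector; the helper name entity_id() and the sample vector (including the non-item date value) are made up for illustration.

```r
# Hypothetical helper: return the trailing Q number of a Wikidata entity URI,
# or NA for values that are not items (dates, strings, quantities).
entity_id <- function(x) {
  is_entity <- grepl("^http://www\\.wikidata\\.org/entity/Q[0-9]+$", x)
  ifelse(is_entity, sub("^.*/", "", x), NA_character_)
}

# Illustrative values as they might appear in the ?ps_ column.
ps_values <- c(
  "http://www.wikidata.org/entity/Q38111",
  "http://www.wikidata.org/entity/Q1054574",
  "1997-11-01T00:00:00Z"
)
entity_id(ps_values)
```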

Query WikiData using Wikipedia categories

I'm wondering if it's possible to write a Wikidata SPARQL query that can retrieve all entities under a category?
For example: the Wikipedia page of Barack Obama has a bunch of categories including: "African-American Christians", "African-American educators", "African-American feminists", "African-American lawyers"
I'm trying to find a way to select all "humans" that match those categories. The Wikidata page for Obama doesn't list any of those categories, so I'm not sure how to query this.
Thanks
SELECT * WHERE {
wd:Q76 wdt:P910 ?category .
?link schema:about ?category; schema:isPartOf <https://en.wikipedia.org/>; schema:name ?title .
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:endpoint "en.wikipedia.org";
wikibase:api "Generator";
mwapi:generator "categorymembers";
mwapi:gcmtitle ?title;
mwapi:gcmprop "ids|title|type";
mwapi:gcmlimit "max".
?member wikibase:apiOutput mwapi:title.
?ns wikibase:apiOutput "@ns".
?item wikibase:apiOutputItem mwapi:item.
}
}
adapted from here

SPARQL - recursive query to get a term and all of its multi-level children

I wrote a query for this thesaurus:
http://vocabs.ceh.ac.uk/evn/tbl/envthes.evn#http%3A%2F%2Fvocabs.lter-europe.net%2FEnvThes%2F10000
The Sparql endpoint is here:
http://vocabs.ceh.ac.uk/evn/tbl/swp?_viewClass=endpoint:HomePage
Just select "urn:x-evn-pub:envthes" as the default graph.
It returns every term that is sorted under the term "measure" (http://vocabs.lter-europe.net/EnvThes/10004). It works as desired, but the problem is that it is neither elegant nor easy to write like that.
Therefore I am looking for a better way to write the following query:
select distinct ?concept (str(?prefLab) as ?label) (str(?altlab) as ?code)
(str(?p) as ?parent) (str(?pl) as ?parlab) ("EnvThes" as ?source)
WHERE {
?concept <http://www.w3.org/2004/02/skos/core#broader> ?level2.
?concept <http://www.w3.org/2004/02/skos/core#prefLabel> ?prefLab.
OPTIONAL {?concept <http://www.w3.org/2004/02/skos/core#altLabel> ?altlab.
FILTER (lang(?altlab)='en').
}.
OPTIONAL {?concept <http://www.w3.org/2004/02/skos/core#broader> ?p.
?p <http://www.w3.org/2004/02/skos/core#prefLabel> ?pl}
OPTIONAL
{?level2 <http://www.w3.org/2004/02/skos/core#broader> ?level3.
OPTIONAL
{?level3 <http://www.w3.org/2004/02/skos/core#broader> ?level4.
OPTIONAL
{?level4 <http://www.w3.org/2004/02/skos/core#broader> ?level5.
OPTIONAL
{?level5 <http://www.w3.org/2004/02/skos/core#broader> ?level6.
OPTIONAL
{?level6 <http://www.w3.org/2004/02/skos/core#broader> ?level7.
OPTIONAL
{?level7 <http://www.w3.org/2004/02/skos/core#broader> ?level8.
OPTIONAL
{?level8 <http://www.w3.org/2004/02/skos/core#broader> ?level9.
OPTIONAL
{?level9 <http://www.w3.org/2004/02/skos/core#broader> ?level10.
}.}.}.}.}.}.}.}.
FILTER(
?level10 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level9 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level8 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level7 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level6 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level5 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level4 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level3 = <http://vocabs.lter-europe.net/EnvThes/10004> ||
?level2 = <http://vocabs.lter-europe.net/EnvThes/10004>).
FILTER(lang(?prefLab) = 'en').
}
Is there any way to make this recursive? I'm still very new to SPARQL and having a hard time writing queries that actually work in the first place.
Thanks.
EDIT: Wow. Thanks for the super helpful answer and comment. I was already able to rewrite the query. For the sake of completeness I'll put it here, but the credit belongs to Joshua Taylor.
Shortened query:
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix envthes: <http://vocabs.lter-europe.net/EnvThes/>
select * {
values ?category { envthes:10004 }
?concept skos:broader* ?category .
?concept skos:prefLabel ?prefLab .
filter langMatches(lang(?prefLab), 'en')
optional {
?concept skos:altLabel ?altlab
filter langMatches(lang(?altlab), 'en')
}.
OPTIONAL {?concept skos:broader ?parent.
?parent skos:prefLabel ?parLab .
}.
}
This isn't exactly the same as your query, and I may have reversed the direction of the link between the concept and the category (I never remember the exact semantics of skos:broader, and which way it's supposed to go). The main changes here, though, are to use a prefix to make the query more readable, and to use a property path (skos:broader*) to link ?category and ?concept. I also used a values block to bind ?category to the particular fixed value that you mentioned.
prefix skos: <http://www.w3.org/2004/02/skos/core#>
select * {
#-- specify the value for ?category (you can just
#-- use this inline, too, but defining it with
#-- values makes it easier to add others later, and
#-- can make the query easier to read)
values ?category { <http://vocabs.lter-europe.net/EnvThes/10004> }
#-- require that ?concept is related to ?category
#-- by a chain of skos:broader properties of length
#-- zero or more. (Zero means that ?concept can be
#-- ?category. Use skos:broader+ to require a path
#-- of length one or more.)
?concept skos:broader* ?category .
#-- get a preferred label in English (required)
?concept skos:prefLabel ?prefLab .
filter langMatches(lang(?prefLab), 'en')
#-- get an alternative label in English (optional)
optional {
?concept skos:altLabel ?altlab
filter langMatches(lang(?altlab), 'en')
}
}
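To make the difference between skos:broader* and skos:broader+ concrete, here is a toy illustration in R that walks a small hierarchy by hand. The concept names and the function narrower_closure() are invented for the example, and the loop assumes the hierarchy has no cycles; it mimics what the property path matches and is not how a SPARQL engine evaluates it.

```r
# Toy hierarchy (made up): each row says "child skos:broader parent".
broader <- data.frame(
  child  = c("temperature", "air temperature", "soil temperature", "pH"),
  parent = c("measure", "temperature", "temperature", "measure"),
  stringsAsFactors = FALSE
)

# Collect every concept that reaches `category` through broader links.
# include_self = TRUE mimics skos:broader* (zero or more steps);
# include_self = FALSE mimics skos:broader+ (one or more steps).
narrower_closure <- function(edges, category, include_self = TRUE) {
  found <- character(0)
  frontier <- category
  while (length(frontier) > 0) {
    # Concepts whose direct broader term is in the current frontier.
    children <- edges$child[edges$parent %in% frontier]
    # Keep only concepts not seen before, then descend one more level.
    frontier <- setdiff(children, c(found, category))
    found <- c(found, frontier)
  }
  if (include_self) c(category, found) else found
}

narrower_closure(broader, "measure")                        # like skos:broader*
narrower_closure(broader, "measure", include_self = FALSE)  # like skos:broader+
```

The first call includes "measure" itself because a path of length zero is allowed; the second returns only the four narrower concepts.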

Mongo query that matches field to any element of array

I am trying to query a MongoDB database through R (rmongodb package). I have a simple requirement:
Return records where the field "email" matches any of the emails in the vector usr$email. I think I am close, but I can't find the right syntax to make it work.
I saw this response to an earlier question (Mongo: If any array position matches single query) and am trying along the lines:
eids_l <- paste0("'", unique(usr$email), "'", collapse=", ")
eids_l1 <- sprintf("[ %s ]", eids_l)
q <- sprintf('{"email": {"$in": %s}}', eids_l1)
cursor <- mongo.find.all(mongo, namespace, buf)
I still get an error:
Error in mongo.bson.from.JSON(arg) :
Not a valid JSON content: {"email": {"$in": [ 'xx@gmail.com',
cursor <- mongo.find.all(mongo, "namespace", query='{ "email": {
"$in": ["xx@gmail.com", "yy@gmail.com", "zz@gmail.com" ] } }')
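Be careful with apostrophes (') and quotation marks ("): JSON strings must be delimited by double quotes, so wrap the R string in single quotes and the JSON values in double quotes.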
I always use the rmongodb Cheat sheet:
https://cran.r-project.org/web/packages/rmongodb/vignettes/rmongodb_cheat_sheet.pdf
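To build that query string from a vector instead of typing it out, the values can be wrapped in double quotes before they are joined. A minimal base-R sketch follows; the helper name in_query() is hypothetical, the sample addresses stand in for usr$email, and the mongo.find.all() call is commented out because it needs a live connection.

```r
# Hypothetical helper: build {"field": {"$in": [...]}} with JSON-style double quotes.
in_query <- function(field, values) {
  values <- unique(values)
  # Escape any double quote inside a value so the JSON stays valid.
  escaped <- gsub('"', '\\"', values, fixed = TRUE)
  items <- paste0('"', escaped, '"', collapse = ", ")
  sprintf('{"%s": {"$in": [%s]}}', field, items)
}

emails <- c("xx@gmail.com", "yy@gmail.com", "xx@gmail.com")  # stand-in for usr$email
q <- in_query("email", emails)
cat(q, "\n")

# cursor <- mongo.find.all(mongo, namespace, query = q)
```

Note that the question's snippet builds q but then passes an undefined buf to mongo.find.all(); the query string itself has to be passed as the query argument.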

virtuoso-opensource: trouble while (jena) querying data loaded using vload script?

I followed this article on "Installing and Managing Virtuoso SPARQL Endpoint" (http://logd.tw.rpi.edu/tutorial/installing_using_virtuoso_sparql_endpoint)
After loading data from an N-Triples file with the following command
sudo ./vload nt /path/to/data/file/data.nt http://www.soctrace.org/ontologies/st.owl
I successfully queried the data from the web interface of the SPARQL endpoint located at http://localhost:8890/sparql
SELECT ?s ?p ?o WHERE { ?s ?p ?o }
However, I'm interested in querying the data from Jena, so I ran the following Java code
public void queryVirtuoso( ) {
Model model = VirtModel.openDatabaseModel("http://www.soctrace.org/ontologies/st.owl", "jdbc:virtuoso://localhost:1111", "dba", "dba");
// Query string.
String queryString = "SELECT ?s ?p ?o WHERE {?s ?p ?o}" ;
System.out.println("Execute query=\n"+queryString) ;
System.out.println() ;
QueryExecution qexec = VirtuosoQueryExecutionFactory.create(queryString, model) ;
try {
ResultSet rs = qexec.execSelect() ;
System.out.println("Number of results founded " + rs.getRowNumber());
} finally {
qexec.close() ;
}
}
Unfortunately, the code returns no results.
It seems that the first parameter of openDatabaseModel in my code is not correct, but I don't know what the correct value is.
Does anyone have a pointer on how to query a Virtuoso graph from Jena, given that the data was imported using the vload script?
Best regards,
If you are not sure about the graph name, you can look it up in the LinkedData tab of your Virtuoso Conductor. It should also be possible to use VirtModel.openDatabaseModel without a graph name (connectionURL, username, password)...
