I'm trying to convert my current relational database to an RDF/OWL triple store. One thing that I'm running into is when I have a bridge/join table that has a composite/compound key of multiple values. For example, I have the following:
Equipment (EquipmentId)
EquipmentPoints (EquipmentId, PointId, CommodityId)
Point (PointId)
I'm unsure of how I would model the data in regards to saying Equipment :hasPoint ...? A particular point could be a different commodity depending on the type of equipment it is on.
Appreciate any help.
First, perhaps you could remodel things, getting rid of :EquipmentPoints. They are possibly just artefacts of relational modeling. RDF properties may have multiple values. See here for more details.
For the sake of clarity, I'll simplify your data model slightly:
Equipment (EquipmentId)
EquipmentPoints (EquipmentId, PointId)
Point (PointId)
RDF
RDF is schemaless; there are no constraints in RDF.
You could model things as shown in another answer:
@prefix : <https://stackoverflow.com/q/51974155/7879193#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
:equipment1 rdf:type :Equipment .
:equipment2 rdf:type :Equipment .
:point1 rdf:type :Point .
:point2 rdf:type :Point .
:equipmentPoint1 rdf:type :EquipmentPoint .
:equipmentPoint2 rdf:type :EquipmentPoint .
:equipmentPoint1 :hasEquipment :equipment1 ;
:hasPoint :point1 .
:equipmentPoint2 :hasEquipment :equipment1 ;
:hasPoint :point1 . # "constraint violation"
To define constraints, you could use a language like SHACL. Unfortunately, SHACL has no core constraint component for compound keys, so you have to use a SPARQL-based constraint:
:EquipmentPointShape a sh:NodeShape ;
sh:targetClass :EquipmentPoint ;
sh:sparql [
sh:message "Violation!" ;
sh:severity sh:Violation ;
sh:select """
SELECT ?this {
?point1 ^:hasPoint ?this, ?that .
?equipment ^:hasEquipment ?this, ?that .
FILTER (?this != ?that)
}
"""
] .
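The compound-key check that the shape expresses can also be sketched outside the triple store. This is a minimal Python sketch using only the standard library. It assumes the bridge nodes have been exported as hypothetical (node, equipment, point) tuples, and find_key_violations is an illustrative name, not part of SHACL or of any RDF library.

```python
from collections import defaultdict


def find_key_violations(rows):
    """Group bridge nodes by their (equipment, point) pair and return
    every pair that is claimed by more than one node."""
    by_key = defaultdict(list)
    for node, equipment, point in rows:
        by_key[(equipment, point)].append(node)
    return {key: nodes for key, nodes in by_key.items() if len(nodes) > 1}


# Hypothetical in-memory copy of the example data above.
rows = [
    ("equipmentPoint1", "equipment1", "point1"),
    ("equipmentPoint2", "equipment1", "point1"),  # same pair: violation
]

violations = find_key_violations(rows)
print(violations)
```

Like the SPARQL constraint, this reports both nodes that share a pair; it does not decide which one to keep.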
OWL
OWL was designed for inferencing, not for constraint checking. See this answer for more details. However, you could use OWL 2 keys.
First, add some ontological "boilerplate":
[] rdf:type owl:Ontology .
:Equipment rdf:type owl:Class .
:Point rdf:type owl:Class .
:EquipmentPoint rdf:type owl:Class .
:hasPoint rdf:type owl:ObjectProperty .
:hasEquipment rdf:type owl:ObjectProperty .
:equipment1 rdf:type owl:NamedIndividual .
:equipment2 rdf:type owl:NamedIndividual .
:point1 rdf:type owl:NamedIndividual .
:point2 rdf:type owl:NamedIndividual .
:equipmentPoint1 rdf:type owl:NamedIndividual .
:equipmentPoint2 rdf:type owl:NamedIndividual .
Now you have a correct Turtle-serialized ontology. Then add:
:EquipmentPoint owl:hasKey (:hasEquipment
:hasPoint
) .
[ rdf:type owl:AllDifferent ;
owl:distinctMembers (:equipmentPoint1
:equipmentPoint2
)
] .
A reasoner will infer that your ontology is inconsistent.
Note that OWL makes no Unique Name Assumption and operates under the Open World Assumption.
After removing
[ rdf:type owl:AllDifferent ;
owl:distinctMembers (:equipmentPoint1
:equipmentPoint2
)
] .
a reasoner will infer that
:equipmentPoint1 owl:sameAs :equipmentPoint2 .
You need to think of the Semantic Web as a web, or graph, of data more so than a set of tables linked to one another via foreign keys.
The entities in your dataset can link to each other precisely like a website can link to another website. In that sense, there is no such thing as a primary key, especially not a composite key. There are just resources linking to each other.
In your case, I would probably model it like in the following example:
@base <http://example.com/resource/> .
<Equipment1> <hasId> "1" .
<Equipment1> <name> "Equipment 1" .
<Point1> <hasId> "1" .
<Point1> <description> "This is Point 1" .
<Equipment1> <hasEquipmentPoint> <EquipmentPoint1> .
<Point1> <hasEquipmentPoint> <EquipmentPoint1> .
<EquipmentPoint1> <hasCommodity> <Commodity1> .
Alternatively you could try to model it closer to the table you presented, and make the equipmentPoint link to the point and equipment instead:
<EquipmentPoint1> <hasEquipment> <Equipment1> .
<EquipmentPoint1> <hasPoint> <Point1> .
<EquipmentPoint1> <hasCommodity> <Commodity1> .
Obviously revamp names, etc.
As you can see, there is no concept of keys; it is just a bunch of edges in a knowledge graph, each describing a link between two resources. Your table with composite keys can simply become a separate resource, as I described above. There is no primary key, but the data can still be searched.
I'm working on a DBPedia project to locate female singers who would have been active during the 1960s (approx).
Unfortunately when I try to select a range of singers who were active from 1955 - 1972 I miss out on singers who were active before 1955 (the results negate some singers, for instance Umm Kulthum who was active from 1925-1973).
My code is below, and shows where the filter only includes artists who were active exclusively within this date range. I want to create a filter that says "give me all singers who were musically active during this date range, including those who were already active before it and remained active during it". I don't want those who were only active before this date range.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT distinct ?name ?person ?thumbnail ?birthDate ?active
where {
?person foaf:name ?name .
?person dct:subject ?subject.
?person dbo:birthDate ?birthDate.
OPTIONAL {?person dbo:thumbnail ?thumbnail}
OPTIONAL {?person dbo:activeYearsStartYear ?active}
{ ?person a dbo:MusicalArtist .
filter exists {?person
dct:subject/skos:broader* dbc:Female_singers_by_nationality}}
filter (?active > '1955-04-18T22:29:33.667Z'^^xsd:dateTime && ?active <
'1974-01-01T21:37:37.708Z'^^xsd:dateTime)} order by ?active
One solution is to also check the reverse in your filter using a boolean or. Like this:
SELECT distinct ?name ?person ?thumbnail ?birthDate ?activeStart ?activeEnd
where {
?person foaf:name ?name .
?person dct:subject ?subject.
?person dbo:birthDate ?birthDate.
?person dbo:activeYearsStartYear ?activeStart.
?person dbo:activeYearsEndYear ?activeEnd
OPTIONAL {?person dbo:thumbnail ?thumbnail }
{ ?person a dbo:MusicalArtist .
filter exists {
?person dct:subject/skos:broader* dbc:Female_singers_by_nationality
}
}
BIND('1955-04-18T22:29:33.667Z'^^xsd:dateTime as ?startPeriod)
BIND('1974-01-01T21:37:37.708Z'^^xsd:dateTime as ?endPeriod)
filter ( (?activeStart > ?startPeriod && ?activeStart < ?endPeriod)
|| (?activeStart < ?startPeriod && ?activeEnd > ?startPeriod))
}
order by ?activeStart
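The condition in that filter is an interval-overlap test. The following is a minimal Python sketch of the same idea, assuming years are plain integers and that an artist's end year is never before the start year. active_during is a hypothetical helper name. It uses the compact form "starts no later than the period ends, and ends no earlier than the period starts", which differs from the filter above only in that it includes the boundary years.

```python
def active_during(active_start, active_end, period_start, period_end):
    """Return True when [active_start, active_end] overlaps
    [period_start, period_end], boundaries included."""
    return active_start <= period_end and active_end >= period_start


# Umm Kulthum's years come from the question; the other rows are
# made-up cases to show each outcome.
examples = {
    "Umm Kulthum": (1925, 1973),       # started earlier, still active: keep
    "only before": (1930, 1950),       # ended before the period: drop
    "started inside": (1960, 1980),    # started within the period: keep
    "only after": (1980, 1990),        # started after the period: drop
}

for label, (start, end) in examples.items():
    print(label, active_during(start, end, 1955, 1974))
```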
I'm creating an interface in SPARQL to query DBpedia.
For example, you can search for people who were born in Paris, or people who were born in 1966.
My query is generic, and the value changes according to your choice.
In the example above, variable1 = dbo:birthPlace or variable1 = dbo:birthDate.
SELECT *
WHERE {
?x a dbo:Person .
?x variable1 ?z.
}
I add a line to write the name of the place you want:
SELECT *
WHERE {
?x a dbo:Person .
?x variable1 ?z.
?z rdfs:label variable2.
}
But this only works if ?z is a URI, which is not the case for a date.
Does someone know a way to make both situations work?
I tried to add an if statement saying:
If ?z is a URI, add the line ?z rdfs:label variable2.
Otherwise check if ?z = variable2
But it seems that if statement works only to create a new parameter, in this example ?type.
BIND (IF(isURI(?z),"URI","Not")AS ?type).
While I would like something like :
BIND (IF(isURI(?z),?z rdfs:label ?nameobject,?nameobject)AS ?nameobject).
Sorry if my question is not asked correctly, I tried to do it as clear as I could ..
EDIT: Using OPTIONAL, thanks to Stanislav Kralin
I tried with optional, here is my code:
SELECT distinct *
WHERE {
?x a dbo:Person .
?x rdfs:label ?name .
?x dbp:birthName ?z .
OPTIONAL{ ?z rdfs:label ?nameobject .}
OPTIONAL{BIND(?z as ?nameobject) .}
BIND (concat("http://wikipedia.org/wiki/",replace(?name," ","_")) as ?wikilink) .
}
LIMIT 100
So if ?z is a URI, it gives the rdfs:label; if not (that is, a typed literal or a plain literal with a language tag), it should keep ?z.
It does the first optional but not the second one. However if I write this
OPTIONAL{BIND("Try" as ?nameobject) .}
it writes the "Try" statement. So I think I am not far from the solution, perhaps I'm not writing correctly the BIND.
Finally, here is the solution! :)
Here is the beginning of my code :
SELECT distinct *
WHERE {
?x a dbo:Activity .
?x rdfs:label ?name .
?x dbp:skills ?z .
}
ORDER BY ?x
LIMIT 100
My problem was that I needed to make 2 different queries according to the data type of my ?z variable.
I tried to do it with IF, but as explained here, in SPARQL IF is an operator and not a statement.
So I tried with OPTIONAL by saying :
OPTIONAL{ ?z rdfs:label ?nameobject .}
OPTIONAL{BIND( ?z as ?nameobject) .}
That means, if rdfs:label of ?z exists, put it in ?nameobject, otherwise, put ?z in ?nameobject.
But that didn't work, probably because of the different types of variables.
Finally my solution is to create 2 columns, to put the data in the same type, and then to put them in the same column:
SELECT distinct *
WHERE {
?x a dbo:Activity .
?x rdfs:label ?name .
?x dbp:skills ?z .
OPTIONAL{ ?z rdfs:label ?nameobjectURI .}
BIND( IF(isURI(?z),"",concat(?z," ")) as ?nameobjectOTH) .
BIND( IF(bound(?nameobjectURI),STR(?nameobjectURI),?nameobjectOTH) as ?nameobject) .
}
ORDER BY ?x
LIMIT 100
And that works! I hope it will help someone else :)
EDIT with COALESCE solution, from Stanislav Kralin
It is possible to simplify the code like this :
SELECT distinct *
WHERE {
?x a dbo:Activity .
?x rdfs:label ?name .
BIND(STR(?name) as ?namestr) .
?x dbp:skills ?z .
OPTIONAL{ ?z rdfs:label ?nameobjectURI .}
BIND (COALESCE(STR(?nameobjectURI),concat(?z," ")) as ?nameobject) .
}
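COALESCE returns the first argument that evaluates without an error, and applying STR to an unbound variable is an error, which is why the fallback is used when no label was found. The following Python sketch models only that control flow. It is a simplification under stated assumptions: every value is a plain Python string, an unbound variable is represented by None, and Unbound, coalesce, str_of and name_object are hypothetical names, not part of any SPARQL library.

```python
class Unbound(Exception):
    """Stands in for the SPARQL error raised by an unbound variable."""


def str_of(value):
    # Mimics STR(?var): fails when the variable is unbound (None here).
    if value is None:
        raise Unbound()
    return str(value)


def coalesce(*thunks):
    # Evaluate each argument lazily and return the first one that
    # does not raise; None models "result stays unbound".
    for thunk in thunks:
        try:
            return thunk()
        except Unbound:
            continue
    return None


def name_object(z, label):
    # Mirrors COALESCE(STR(?nameobjectURI), concat(?z, " ")).
    return coalesce(lambda: str_of(label), lambda: str_of(z) + " ")


print(repr(name_object("http://dbpedia.org/resource/Dance", "Dance")))
print(repr(name_object("Dancing", None)))
```

The sketch does not reproduce SPARQL's typing rules for concat; it only shows why the second argument is reached when the label is missing.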
I am trying to run an example of inferencing using subClassOf relationship.
For some reason, I am getting the select query results when I use xquery but not when I use sparql.
xquery
let $sq :=
'PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE { ?s rdf:type <http://www.smartlogic.com/geography#Europe> .
} '
let $rs := sem:ruleset-store("rdfs.rules", sem:store())
return sem:sparql($sq, (), (), $rs)
sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE { ?s rdf:type <http://www.smartlogic.com/geography#Europe> .
}
As of now (MarkLogic 8.0-3), the SPARQL interface does not provide a way to specify a set of inference rules to use. You can configure a default ruleset to use with the database, which will be used with all SPARQL queries.
As you've done, you can use sem:ruleset-store() (XQuery) or sem.rulesetStore() (JavaScript) to specify a ruleset to use.
I have created a simple turtle file containing the following contents-
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<uri:uuid#1> rdfs:label "Communication"^^xsd:string .
<uri:uuid#2> rdfs:label "Communication" .
Then I LOADED this turtle file in Big data.
After this I ran two select queries. The first one was-
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x ?name ?datatype
WHERE {
?x rdfs:label ?name .
FILTER (STRSTARTS(?name,"Comm"))
BIND(datatype(?name) as ?datatype)
}
This gave the following result-
x name datatype
<uri:uuid#1> Communication xsd:string
<uri:uuid#2> Communication xsd:string
But when I ran a bit different query using REGEX in the FILTER like this-
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x ?name ?datatype
WHERE {
?x rdfs:label ?name .
FILTER (regex(?name, "^Comm"))
BIND(datatype(?name) as ?datatype)
}
the result was -
x name datatype
<uri:uuid#2> Communication xsd:string
I was expecting the same result for both SELECT queries, as in both cases "Communication" is a string.
Can you please let me know why the results are different? Is it because of REGEX? If so, does that mean that in Big Data REGEX does not work when a string is "strongly typed" as xsd:string?
Any help will be much appreciated.
OK, I found a solution for this. In this store, regex apparently matches only simple, untyped literals (the SPARQL 1.0 signature of REGEX takes a simple literal as its text argument). To make the regex work, ?name needs to be wrapped in a str() call. So the query needs to be:
SELECT ?x ?name ?datatype
WHERE {
?x rdfs:label ?name .
FILTER (regex(str(?name), "^Comm"))
BIND(datatype(?name) as ?datatype)
}
This will bring back both the triples.
I have a triple store that contains mail archive data. So let's say I have a lot of persons (foaf:Person) that have sent (ex:hasSent) and received (ex:hasReceived) emails (ex:Email).
Example:
SELECT ?person ?email
WHERE {
?email rdf:type ex:Email.
?person rdf:type foaf:Person;
ex:hasSent ?email.
}
The same works for ex:hasReceived, of course. Now I would like to do some statistics and analytics, i.e. determine how many emails an individual has sent and received. Doing this for only one predicate is a simple aggregation:
SELECT ?person (COUNT(?email) AS ?count)
WHERE {
?email rdf:type ex:Email.
?person rdf:type foaf:Person;
ex:hasSent ?email.
}
GROUP BY ?person
However, I need the number of received emails as well, and I would like to do this without having to issue a separate query. So I tried the following:
SELECT ?person (COUNT(?email1) AS ?sent_emails) (COUNT(?email2) AS ?received_emails)
WHERE {
?person rdf:type foaf:Person.
?sent_email rdf:type ex:Email.
?person ex:hasSent ?sent_email.
?received_email rdf:type ex:Email.
?person ex:hasReceived ?received_email.
}
GROUP BY ?person
This did not seem to be right, as the numbers for the emails sent vs. received were exactly the same. I assume this is because my SPARQL statement results in a cross product of all mails a person has ever sent and received, right?
What do I need to do in order to get the statistics right on a per-individual basis?
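COUNT(?email1) isn't counting anything because ?email1 is never bound in the query; the pattern binds ?sent_email and ?received_email instead. Also, as you mention, there is a partial cross product of sent and received emails for each person. DISTINCT will help.
Try (COUNT(DISTINCT ?sent_email) AS ?sent_emails), and likewise (COUNT(DISTINCT ?received_email) AS ?received_emails).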
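To see why the plain counts come out equal, it can help to reproduce the join by hand. The following Python sketch uses made-up email identifiers for a single person with two sent and three received emails. It builds the rows the graph pattern would return for that person and compares plain counting with distinct counting.

```python
from itertools import product

# Hypothetical emails for one person.
sent = ["s1", "s2"]
received = ["r1", "r2", "r3"]

# The pattern matches every sent email together with every received
# email, so the solution rows form a cross product.
rows = list(product(sent, received))

plain_sent = len([s for s, _ in rows])      # COUNT(?sent_email)
plain_received = len([r for _, r in rows])  # COUNT(?received_email)
distinct_sent = len({s for s, _ in rows})      # COUNT(DISTINCT ?sent_email)
distinct_received = len({r for _, r in rows})  # COUNT(DISTINCT ?received_email)

print(plain_sent, plain_received)        # 6 6
print(distinct_sent, distinct_received)  # 2 3
```

Both plain counts equal the number of joined rows, while the distinct counts recover the per-predicate totals.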