SPARQL CONSTRUCT to generate RDF collection from GROUP BY bindings

I would like to generate an RDF collection (i.e. an rdf:first/rdf:rest linked list) using a SPARQL CONSTRUCT query, putting all grouped bindings for a variable into one collection.
So for the data
@prefix ex: <https://example.com/ns#> .
ex:example1 a ex:Example ;
ex:name "Example1" ;
ex:even false .
ex:example2 a ex:Example ;
ex:name "Example2" ;
ex:even true .
ex:example3 a ex:Example ;
ex:name "Example3" ;
ex:even false .
ex:example4 a ex:Example ;
ex:name "Example4" ;
ex:even true .
ex:example5 a ex:Example ;
ex:name "Example5" ;
ex:even false .
if the SELECT query
PREFIX ex: <https://example.com/ns#>
select (group_concat(?name) as ?names) where {
?a ex:even ?even;
ex:name ?name
} group by ?even
yields
names
Example1 Example3 Example5
Example2 Example4
what would a corresponding CONSTRUCT query look like that contains the bindings for ?names as an RDF collection, i.e. something like
( "Example1" "Example3" "Example5" )
( "Example2" "Example4")
(Assuming TTL interpretation of the above)
Background: I would like to generate SHACL shapes using SHACL-AF SPARQLRules, and one thing I am struggling with is to generate sh:in (...) where the list is generated as an aggregate over multiple solutions of the query.
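Not a full answer, but one relevant observation: a CONSTRUCT template is a fixed set of triple patterns, so standard SPARQL 1.1 cannot mint the variable number of rdf:first/rdf:rest nodes a per-group collection requires; producing real lists from aggregates generally needs a vendor-specific extension. What does work portably is serializing each group as a Turtle-style list string, e.g. (a sketch against the sample data above):

```sparql
PREFIX ex: <https://example.com/ns#>
SELECT (CONCAT("( \"", GROUP_CONCAT(?name; SEPARATOR="\" \""), "\" )") AS ?list)
WHERE {
  ?a ex:even ?even ;
     ex:name ?name .
}
GROUP BY ?even
```

This yields strings like ( "Example1" "Example3" "Example5" ) that can be spliced into Turtle in a post-processing step.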

Parsing an array of objects, do some math with their index

I have a large JSON file which is actually a concatenated array of objects from several configuration files. I would like to use them to bring up a menu in a bash script. To make the menu easier to read, the JSON array contains special objects that trigger a line break. In the end, the user picks the index of the array.
A simplified json looks like this:
[
{
"index" : 0,
"value" : "one a"
},
{
"index" : 3,
"value" : "two a"
},
{
"value" : ""
},
{
"index" : 2,
"value" : "three a"
},
{
"value" : ""
},
{
"index" : 1,
"value" : "one b"
},
{
"index" : 3,
"value" : "two b"
},
{
"index" : 2,
"value" : "three b"
}
]
All values ending in "a" come from the first file, all "b" values from the second file. The entries with an empty value are line breaks.
What I got so far, after hours of researching, is this:
jq --raw-output 'to_entries[] | "\(.key + 1). \(.value.value) (\(.value.index))"' test.json
Which produces this out of the above data:
1. one a (0)
2. two a (3)
3. (null)
4. three a (2)
5. (null)
6. one b (1)
7. two b (3)
8. three b (2)
Now the user would type 8 to work with "three b".
What I need, however, is this:
1. one a (0)
2. two a (3)
3. three a (2)
4. one b (1)
5. two b (3)
6. three b (2)
So the user would need to type 6 to do the same.
Any idea welcome!
Using foreach to count would be one way:
foreach .[] as {$index, $value} (0;
if $value != "" then . + 1 else . end;
if $value != "" then "\(.). \($value) (\($index))" else "" end
)
1. one a (0)
2. two a (3)
3. three a (2)
4. one b (1)
5. two b (3)
6. three b (2)
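For comparison, here is the same counting logic sketched in Python (the data literal is the simplified JSON from the question; separator entries become blank lines, matching the "" branch of the foreach answer):

```python
import json

# Simplified JSON from the question; entries with an empty "value" are separators.
data = json.loads("""
[ {"index": 0, "value": "one a"},   {"index": 3, "value": "two a"},
  {"value": ""},
  {"index": 2, "value": "three a"},
  {"value": ""},
  {"index": 1, "value": "one b"},   {"index": 3, "value": "two b"},
  {"index": 2, "value": "three b"} ]
""")

count = 0   # plays the role of the foreach state variable
menu = []
for entry in data:
    if entry["value"] != "":
        count += 1  # only real entries advance the user-visible number
        menu.append(f'{count}. {entry["value"]} ({entry["index"]})')
    else:
        menu.append("")  # separator entry: emit a blank line

print("\n".join(menu))
```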

SPARQL Query to create a merged graph from different graphs based on property value comparisons

I have three graph data models with nodes representing the same physical entity in different ways in the three graphs.
Graph G1 where Pump P1 is of type CentrifugalPumpType
Graph G2 where Pump P2 is of type PADIMType
Graph G3 where Pump P3 is of type PumpType
As you can see in the above three graphs, the same pump is being modelled in different ways. However, there is a way to find out whether they are indeed the same pump. Between the first graph (G1) and the second graph (G2), the comparison can be done between the value of the TagNameAssignmentClass property (from G1) and that of the SignalTag property (from G2); in this example they both have the value "P1612-A". Similarly, between G2 and the third graph (G3), the comparison can be done between the Manufacturer properties from G2 and G3 (in the example both have the value "XYZ") and the respective SerialNumber properties from G2 and G3 (in the example both have the value "1234"). All of these properties are direct or indirect properties of the node representing the same pump (P1, P2 and P3) in all three models. The aim of the merge would be to actually merge the node representing the pump across the three models. The merged graph would then look something like this:
I am a complete newbie to this way of thinking. I went through all the basic SPARQL tutorials that are out there; however, the query I am trying to write is too complex for my current level of understanding of SPARQL. It would be great if someone out there could help! The string literals above are just to explain what I mean: I do not want to mention the string literals in my query, rather I want to compare the properties I mentioned directly, without naming the literal values.
Edit 1: I was asked to create a minimal reproducible example, so here is a try after removing unnecessary properties and simplifying the aim further:
So the Graph G1 dataset is as follows:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg1: <http://www.myexample1.com> .
eg1:PumpP1
rdf:type eg1:CentrifugalPumpType ;
has_property eg1:DifferentialPressure ;
has_property eg1:TagNameAssignmentClass .
eg1:TagNameAssignmentClass
rdf:value "P1612-A" .
Graph G2 Dataset is as follows:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg2: <http://www.myexample2.com> .
eg2:PumpP2
rdf:type eg2:PADIMType ;
has_property eg2:SignalSet ;
has_property eg2:Manufacturer ;
has_property eg2:SerialNumber .
eg2:Manufacturer
rdf:value "XYZ" .
eg2:SerialNumber
rdf:value "1234" .
eg2:SignalSet
has_property eg2:SignalS1 .
eg2:SignalS1
has_property eg2:SignalTag .
eg2:SignalTag
rdf:value "P1612-A" .
Graph G3 Dataset may look like:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg3: <http://www.myexample3.com> .
eg3:PumpP3
rdf:type eg3:PumpType ;
has_property eg3:Identification ;
has_property eg3:Ports .
eg3:Identification
has_property eg3:Manufacturer ;
has_property eg3:SerialNumber .
eg3:Manufacturer
rdf:value "XYZ" .
eg3:SerialNumber
rdf:value "1234" .
Expected Graph after the merge could look like this:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg1: <http://www.myexample1.com> .
@prefix eg2: <http://www.myexample2.com> .
@prefix eg3: <http://www.myexample3.com> .
@prefix mg: <http://www.mymergeexample.com> .
mg:PumpP
rdf:type eg1:CentrifugalPumpType ;
rdf:type eg2:PADIMType ;
rdf:type eg3:PumpType ;
has_property eg1:DifferentialPressure ;
has_property eg1:TagNameAssignmentClass ;
has_property eg2:Manufacturer ;
has_property eg2:SerialNumber ;
has_property eg2:SignalSet ;
has_property eg3:Identification ;
has_property eg3:Ports .
eg1:TagNameAssignmentClass
rdf:value "P1612-A" .
eg2:Manufacturer
rdf:value "XYZ" .
eg2:SerialNumber
rdf:value "1234" .
eg2:SignalSet
has_property eg2:SignalS1 .
eg2:SignalS1
has_property eg2:SignalTag .
eg2:SignalTag
rdf:value "P1612-A" .
eg3:Identification
has_property eg3:Manufacturer ;
has_property eg3:SerialNumber .
eg3:Manufacturer
rdf:value "XYZ" .
eg3:SerialNumber
rdf:value "1234" .
Please forgive any syntax errors I may have made.
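A sketch of just the G1-G2 linking step, under two loud assumptions: has_property is a real IRI in some shared namespace (written ex:has_property below, which is my invention), and expressing the match as owl:sameAs links is acceptable (many stores can then merge the linked nodes). The G2-G3 step would add an analogous pattern joining on the Manufacturer and SerialNumber values:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX eg1: <http://www.myexample1.com>
PREFIX eg2: <http://www.myexample2.com>
PREFIX ex:  <http://www.sharedexample.com/>   # assumed home of has_property

CONSTRUCT { ?pump1 owl:sameAs ?pump2 }
WHERE {
  # G1: pump -> tag node -> literal tag value
  ?pump1 rdf:type eg1:CentrifugalPumpType ;
         ex:has_property ?tagNode .
  ?tagNode rdf:value ?tagValue .
  # G2: pump -> (one or more has_property hops) -> node carrying the same literal
  ?pump2 rdf:type eg2:PADIMType ;
         ex:has_property+ ?sigNode .
  ?sigNode rdf:value ?tagValue .
}
```

The ex:has_property+ property path follows the chain PumpP2 -> SignalSet -> SignalS1 -> SignalTag, and the join on ?tagValue ("P1612-A" in the sample data) is what pairs the two pump nodes.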

jq select elements with array not containing string

Now, this is somewhat similar to jq: select only an array which contains element A but not element B, but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[ {
"employeeType": "student",
"cn": "dc8aff1",
"uid": "dc8aff1",
"ou": [
"4210910",
"4210910 #Abg",
"4210910 Abgang",
"4240115",
"4240115 5",
"4240115 5\/5"
]
},
{
"employeeType": "student",
"cn": "160f656",
"uid": "160f656",
"ou": [
"4210910",
"4210910 3",
"4210910 3a"
] } ]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.
When it comes to inputs that may change, you should make the value a parameter to your filter rather than hardcoding it. Also, using contains might not work for you in general: it runs the filter recursively, so even substrings will match, which might not be what you want.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
is true
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json
This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than the .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then . else empty end
)
EDIT: I now realize that a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R, so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then . else empty end
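For comparison, the same keep-if-disjoint selection is easy to sketch in Python with set intersection, which also sidesteps the substring-matching pitfall of jq's contains (the exclude set mirrors the --argjson ex parameter):

```python
# The two records from the question, abbreviated to the fields that matter.
data = [
    {"cn": "dc8aff1",
     "ou": ["4210910", "4210910 #Abg", "4210910 Abgang",
            "4240115", "4240115 5", "4240115 5/5"]},
    {"cn": "160f656",
     "ou": ["4210910", "4210910 3", "4210910 3a"]},
]

exclude = {"4210910 3a"}  # analogous to --argjson ex '["4210910 3a"]'

# Keep entries whose ou shares no exact member with the exclusion set.
kept = [e for e in data if not exclude.intersection(e["ou"])]

print([e["cn"] for e in kept])  # only the first record survives
```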

Float Number mathematical operation

I have a .txt file, output from a program, that contains some values of interest. The problem is that in certain cases these values have a strange format and I'm not able to apply mathematical operations to them.
E.g., my file contains these numbers:
-2.55622-3
-0.31-2
-3.225-2
...
These numbers in a normal math format should be:
-2.55622e-03
-0.31e-02
-3.225e-02
Of course, if I try to sum these values, this is the error:
can't use non-numeric string as operand of "+"
How can I operate on my original values? I really have no idea.
Please remember that I can't change the format of the values in my .txt file.
Another approach:
% exec cat file
-2.55622-3
-0.31-2
-3.225-2
42
foo
% set fh [open "file" r]
% while {[gets $fh line] != -1} {
puts -nonewline "\"$line\"\t=> "
if {! [regsub {[-+]\d+$} [string trim $line] {e&} num]} {set num $line}
puts -nonewline "$num\t=> "
puts [expr {$num + 0}]
}
"-2.55622-3 " => -2.55622e-3 => -0.00255622
"-0.31-2" => -0.31e-2 => -0.0031
"-3.225-2" => -3.225e-2 => -0.03225
"42" => 42 => 42
"foo"	=> foo	=> can't use non-numeric string as operand of "+"
%
Try regsub with the mantissa pattern {(\d+\.\d+)}, appending an e right after the match:
set fp [open "somefile.txt" r]
set file_data [read $fp]
close $fp
set data [split $file_data "\n"]
foreach line $data {
    # insert "e" after the mantissa: "-2.55622-3" becomes "-2.55622e-3"
    regsub {(\d+\.\d+)} $line {\1e} number
}
That is: read the file and split it into lines, then for each line insert the missing e with:
regsub {(\d+\.\d+)} $line {\1e} number
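The same repair works outside Tcl too. Here is a minimal Python sketch (parse_broken is my own name for the helper), inserting the missing exponent marker much as the first answer's regsub {[-+]\d+$} ... {e&} does:

```python
import re

def parse_broken(s):
    """Turn '-2.55622-3' into the float -2.55622e-3; plain numbers pass through."""
    # A digit followed by a signed integer at end of string marks a missing 'e'.
    fixed = re.sub(r'(\d)([-+]\d+)$', r'\1e\2', s.strip())
    return float(fixed)

for raw in ["-2.55622-3", "-0.31-2", "-3.225-2", "42"]:
    print(raw, "->", parse_broken(raw))
```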

Computing custom histogram metrics to understand graph structure using SPARQL

I am looking to analyze the structure of a graph and one particular query I wanted to try out was to extract different combinations of subject type - edge type - object type in a graph.
This is a follow up from a couple of earlier questions of mine:
How to generate all triples that fit a particular node type or/and edge type using SPARQL query?
How to list and count the different types of node and edge entities in the graph data using SPARQL query?
For example: if there is a semantic graph with edge types (property/predicate types) such as
IsCapitalOf
IsCityOf
HasPopulation
etc etc etc
And if the node types are like:
Cities
Countries
Rivers
Mountains
etc
Then I should get:
City->IsCapitalOf->Country 4 tuples
City->IsCityOf->Country 21 tuples
River->IsPartOf->Country 3
River->PassesThrough->City 11
and so on...
Note: no literals in the object field, as I want only the unit subgraph patterns of the form (subject type, edge type, object type).
To summarize: I think the way I'd approach this would be:
a) Compute distinct subject types in graph
b) Compute distinct edge types in graph
c) Compute distinct object type in graph
(a/b/c have been answered in my previous questions)
Now d) generate all occurring combinations of subject type -> edge type -> object type (no literals) and counts (like a histogram) of such patterns.
Hope the question is articulated reasonably well.
Edit: Adding sample data [few rows from the entire dataset] It is the yago dataset which is available publicly
<Alabama> rdf:type <wordnet_country_108544813> .
<Abraham_Lincoln> rdf:type <wordnet_president_110467179> .
<Aristotle> rdf:type <wordnet_writer_110794014> .
<Academy_Award_for_Best_Art_Direction> rdf:type <wordnet_award_106696483> .
<Academy_Award> rdf:type <wordnet_award_106696483> .
<Actrius> rdf:type <wordnet_movie_106613686> .
<Animalia_(book)> rdf:type <wordnet_book_106410904> .
<Ayn_Rand> rdf:type <wordnet_novelist_110363573> .
<Allan_Dwan> rdf:type <wikicategory_American_film_directors> .
<Algeria> rdf:type <wordnet_country_108544813> .
<Andre_Agassi> rdf:type <wordnet_player_110439851> .
<Austro-Asiatic_languages> rdf:type <wordnet_language_106282651> .
<Afroasiatic_languages> rdf:type <wordnet_language_106282651> .
<Andorra> rdf:type <wordnet_country_108544813> .
<Animal_Farm> rdf:type <wordnet_novelette_106368962> .
<Alaska> rdf:type <wordnet_country_108544813> .
<Aldous_Huxley> rdf:type <wordnet_writer_110794014> .
<Andrei_Tarkovsky> rdf:type <wordnet_film_maker_110088390> .
Suppose you've got data like this:
@prefix : <http://stackoverflow.com/q/24313367/1281433/> .
:City1 a :City .
:City2 a :City .
:Country1 a :Country .
:Country2 a :Country .
:Country3 a :Country .
:River1 a :River .
:River2 a :River .
:River3 a :River .
:City1 :isCapitalOf :Country1 .
:River1 :isPartOf :Country1, :Country2 .
:River2 :isPartOf :Country2, :Country3 .
:River1 :passesThrough :City1, :City2 .
:River2 :passesThrough :City2 .
Then this query gives you the kind of results you want, I think:
prefix : <http://stackoverflow.com/q/24313367/1281433/>
select ?type1 ?p ?type2 (count(distinct *) as ?count) where {
[ a ?type1 ; ?p [ a ?type2 ] ]
}
group by ?type1 ?p ?type2
----------------------------------------------
| type1 | p | type2 | count |
==============================================
| :River | :passesThrough | :City | 3 |
| :City | :isCapitalOf | :Country | 1 |
| :River | :isPartOf | :Country | 4 |
----------------------------------------------
If you're not too comfortable with the [ … ] blank node syntax, it might help to see the expanded form:
SELECT ?type1 ?p ?type2 (count(distinct *) AS ?count)
WHERE
{ _:b0 rdf:type ?type1 .
_:b0 ?p _:b1 .
_:b1 rdf:type ?type2
}
GROUP BY ?type1 ?p ?type2
This only catches things that have types, though. If you want to include things that don't have rdf:types, you'd want to do
SELECT ?type1 ?p ?type2 (count(distinct *) AS ?count) {
?x ?p ?y
optional { ?x a ?type1 }
optional { ?y a ?type2 }
}
GROUP BY ?type1 ?p ?type2
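One possible refinement of that last query (a sketch, standard SPARQL 1.1 only): the question's "no literals in the object field" requirement can be enforced with FILTER(!isLiteral(?y)), and unbound types can be given a readable placeholder via COALESCE so untyped nodes group together visibly:

```sparql
SELECT (COALESCE(?t1, "untyped") AS ?type1) ?p
       (COALESCE(?t2, "untyped") AS ?type2)
       (COUNT(DISTINCT *) AS ?count)
WHERE {
  ?x ?p ?y .
  FILTER(!isLiteral(?y))
  OPTIONAL { ?x a ?t1 }
  OPTIONAL { ?y a ?t2 }
}
GROUP BY ?t1 ?p ?t2
ORDER BY DESC(?count)
```

ORDER BY DESC(?count) puts the most frequent (subject type, edge type, object type) patterns first, which is usually what you want for a histogram-style overview.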
