How does a triplestore decide whether to add "background" triples?

I use a few different triplestores, and code in R and Scala. I think I'm seeing some differences in whether the triplestores include triples other than the ones I explicitly loaded, and in the point at which these "background" triples might be added.
Are there any general rules for whether supporting vocabularies need to be added, independent of the implementation technology?
Using Jena in R, via rrdf, I usually only see what I loaded:
library(rrdf)

turtle.input.string <-
  "PREFIX prefix: <http://example.com/>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   prefix:subject rdf:type prefix:object ."

jena.model <-
  fromString.rdf(rdfContent = turtle.input.string, format = "TURTLE")

model.string <- asString.rdf(jena.model, format = "TURTLE")

cat(model.string)
This gives:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prefix: <http://example.com/> .
prefix:subject a prefix:object .
But sometimes triples from RDF and RDFS seem to appear when I add or remove triples afterwards. That's what "bothers" me the most, but I'm having trouble finding an example right now. If nobody knows what I mean, I'll dig something up later today.
When I use Blazegraph in Scala, via the OpenRDF Sesame library, I think I always get RDF, RDFS, and OWL "for free":
import java.util.Properties

import org.openrdf.query.QueryLanguage
import org.openrdf.rio._

import com.bigdata.journal._
import com.bigdata.rdf.sail._

object InjectionTest {
  val jnl_fn = "sparql_tests.jnl"

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(Options.BUFFER_MODE, BufferMode.DiskRW)
    props.put(Options.FILE, jnl_fn)

    val sail = new BigdataSail(props)
    val repo = new BigdataSailRepository(sail)
    repo.initialize()
    val cxn = repo.getConnection()

    val resultStream = new java.io.ByteArrayOutputStream
    val resultWriter = Rio.createWriter(RDFFormat.TURTLE, resultStream)
    val ConstructString = "construct {?s ?p ?o} where {?s ?p ?o}"
    cxn.prepareGraphQuery(QueryLanguage.SPARQL, ConstructString).evaluate(resultWriter)

    val resString = resultStream.toString()
    println(resString)
  }
}
Even without adding any triples, the construct output includes blocks like this:
rdfs:isDefinedBy rdfs:domain rdfs:Resource ;
rdfs:range rdfs:Resource ;
rdfs:subPropertyOf rdfs:isDefinedBy , rdfs:seeAlso .

Are there any general rules for whether supporting vocabularies need to be added, independent of the implementation technology?
That depends on what inferencing scheme your triplestore claims to support. For a pure RDF store (no inferencing), no additional triples should be added at all.
Judging from the fragment you showed, the Blazegraph store you used has at least RDFS inferencing (and possibly partial OWL reasoning as well?) enabled. Note that this is store-specific, not framework-specific, so it's not a Jena vs. Sesame thing: both frameworks support stores that either do or do not do reasoning. Of course, if you use either framework with the "exclude inferred triples" option that they offer, the backing store should respect that option and not include such inferred triples in the result.
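For example, with the Sesame API that option can be set per query via setIncludeInferred; here is a minimal sketch, reusing the cxn and resultWriter from the question's InjectionTest (untested against Blazegraph specifically):
// Sketch only: same setup and imports as the question's Scala code.
// setIncludeInferred(false) asks the store to leave entailed triples out of the result;
// whether entailments are also materialized at load time is a separate, store-level setting.
val explicitOnlyQuery =
  cxn.prepareGraphQuery(QueryLanguage.SPARQL, "construct { ?s ?p ?o } where { ?s ?p ?o }")
explicitOnlyQuery.setIncludeInferred(false)
explicitOnlyQuery.evaluate(resultWriter)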

Related

BertModel transformers outputs string instead of tensor

I'm following this tutorial that codes a sentiment analysis classifier using BERT with the huggingface library, and I'm seeing very odd behavior. When trying the BERT model with a sample text, I get a string instead of the hidden state. This is the code I'm using:
import transformers
from transformers import BertModel, BertTokenizer

print(transformers.__version__)

PRE_TRAINED_MODEL_NAME = 'bert-base-cased'
PATH_OF_CACHE = "/home/mwon/data-mwon/paperChega/src_classificador/data/hugingface"

tokenizer = BertTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME, cache_dir=PATH_OF_CACHE)

sample_txt = 'When was I last outside? I am stuck at home for 2 weeks.'

encoding_sample = tokenizer.encode_plus(
    sample_txt,
    max_length=32,
    add_special_tokens=True,  # Add '[CLS]' and '[SEP]'
    return_token_type_ids=False,
    padding=True,
    truncation=True,
    return_attention_mask=True,
    return_tensors='pt',  # Return PyTorch tensors
)

bert_model = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME, cache_dir=PATH_OF_CACHE)

last_hidden_state, pooled_output = bert_model(
    encoding_sample['input_ids'],
    encoding_sample['attention_mask']
)

print([last_hidden_state, pooled_output])
that outputs:
4.0.0
['last_hidden_state', 'pooler_output']
While the answer from Aakash provides a solution to the problem, it does not explain the issue. Since one of the 3.X releases of the transformers library, the models do not return tuples anymore but specific output objects:
o = bert_model(
    encoding_sample['input_ids'],
    encoding_sample['attention_mask']
)
print(type(o))
print(o.keys())
Output:
transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions
odict_keys(['last_hidden_state', 'pooler_output'])
You can return to the previous behavior by adding return_dict=False to get a tuple:
o = bert_model(
    encoding_sample['input_ids'],
    encoding_sample['attention_mask'],
    return_dict=False
)
print(type(o))
Output:
<class 'tuple'>
I do not recommend that, because then it is no longer possible to unambiguously select a specific part of the output without turning to the documentation, as shown in the example below:
o = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'], return_dict=False, output_attentions=True, output_hidden_states=True)
print('I am a tuple with {} elements. You do not know what each element represents without checking the documentation'.format(len(o)))
o = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'], output_attentions=True, output_hidden_states=True)
print('I am a cool object and you can access my elements with o.last_hidden_state, o["last_hidden_state"] or even o[0]. My keys are: {}'.format(o.keys()))
Output:
I am a tuple with 4 elements. You do not know what each element represents without checking the documentation
I am a cool object and you can access my elements with o.last_hidden_state, o["last_hidden_state"] or even o[0]. My keys are: odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states', 'attentions'])
I faced the same issue while learning how to implement BERT. I noticed that using
last_hidden_state, pooled_output = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'])
is the issue. Use:
outputs = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'])
and extract the last hidden state using
outputs[0]
You can refer to the documentation here, which tells you what is returned by BertModel.

How to access and mutate node property value by the property name string in Cypher?

My goal is to access and mutate a property of a node in a Cypher query where the name of the property to be accessed and mutated is an unknown string value.
For example, consider a command:
Find all nodes containing two properties such that the name of the first property is lower-case and the name of the latter is the upper-case representation of the former. Then, propagate the value of the property with the lower-case name to the value of the property with the upper-case name.
The particular case is easy:
MATCH ( node )
WHERE has(node.age) AND has(node.AGE) AND node.age <> node.AGE
SET node.AGE = node.age
RETURN node;
But I can't seem to find a way to implement the general case in a single request.
Specifically, I am unable to:
Access the property of the node with a string and a value
Mutate the property of the node with a string and a value
For the sake of clarity, I'll include my attempt to handle the general case. While I failed to modify the property of the node directly, I was able to generate the Cypher for a command that would accomplish my end goal if it were executed in a subsequent transaction.
MERGE ( justToMakeSureOneExists { age: 14, AGE : 140 } ) WITH justToMakeSureOneExists
MATCH (node)
WHERE ANY ( kx IN keys(node) WHERE kx = LOWER(kx) AND ANY ( ky in keys(node) WHERE ky = UPPER(kx) ) )
REMOVE node.name_conflicts // make sure results are current
FOREACH(kx in keys(node) |
SET node.name_conflicts
= COALESCE(node.name_conflicts,[])
+ CASE kx
WHEN lower(kx)
THEN []
+ CASE WHEN any ( ky in keys(node) WHERE ky = upper(kx) )
THEN ['match (node) where id(node) = ' + id(node)+ ' and node.' + upper(kx) + ' <> node.' + kx + ' set node.' + upper(kx) + ' = node.' + kx + ' return node;']
ELSE [] END
ELSE []
END )
RETURN node,keys(node)
Afterthought: It seems like the ability to mutate a node property by property name would be a pretty common requirement, but the lack of obvious support for the feature leads me to believe that the feature was omitted deliberately? If this feature is indeed unsupported is there any documentation to explain why and if there is some conflict between the approach and the recommended way of doing things in Neo/Cypher?
There is some discussion going on regarding improved support for dynamic property access in Cypher. I'm pretty confident that we will see support for this in the future, but I cannot comment on a target release nor on a date.
As a workaround, I'd recommend implementing that in an unmanaged extension.
It appears that the desired language feature was added to Cypher in Neo4j 2.3.0 under the name "dynamic property". The Cypher docs from version 2.3.0 onward list the following syntax group as a valid Cypher expression:
A dynamic property: n["prop"], rel[n.city + n.zip], map[coll[0]].
This feature is documented for 2.3.0 but is absent from the previous version (2.2.9).
Thank you Neo4j Team!

Filtering tab completion in input task implementation

I'm currently implementing an SBT plugin for Gatling.
One of its features will be to open the last generated report in a new browser tab from SBT.
As each run can have a different "simulation ID" (basically a simple string), I'd like to offer tab completion on simulation IDs.
An example:
Running the Gatling SBT plugin will produce several folders (named from simulationId + date of report generation) in target/gatling, for example mysim-20140204234534, myothersim-20140203124534 and yetanothersim-20140204234534.
Let's call the task lastReport.
If someone starts typing lastReport my, I'd like to filter the tab completion to only suggest mysim and myothersim.
Getting the simulation ID is a breeze, but how can I help the parser and filter out suggestions so that it only suggests an existing simulation ID?
To sum up, I'd like to do what testOnly does, in a way: I only want to suggest things that make sense in my context.
Thanks in advance for your answers,
Pierre
Edit: As I got a bit stuck after my latest tries, here is the code of my inputTask, in its current state:
package io.gatling.sbt

import sbt._
import sbt.complete.{ DefaultParsers, Parser }

import io.gatling.sbt.Utils._

object GatlingTasks {

  val lastReport = inputKey[Unit]("Open last report in browser")

  val allSimulationIds = taskKey[Set[String]]("List of simulation ids found in reports folder")

  val allReports = taskKey[List[Report]]("List of all reports by simulation id and timestamp")

  def findAllReports(reportsFolder: File): List[Report] = {
    val allDirectories = (reportsFolder ** DirectoryFilter.&&(new PatternFilter(reportFolderRegex.pattern))).get
    allDirectories.map(file => (file, reportFolderRegex.findFirstMatchIn(file.getPath).get)).map {
      case (file, regexMatch) => Report(file, regexMatch.group(1), regexMatch.group(2))
    }.toList
  }

  def findAllSimulationIds(allReports: Seq[Report]): Set[String] = allReports.map(_.simulationId).distinct.toSet

  def openLastReport(allReports: List[Report], allSimulationIds: Set[String]): Unit = {

    def simulationIdParser(allSimulationIds: Set[String]): Parser[Option[String]] =
      DefaultParsers.ID.examples(allSimulationIds, check = true).?

    def filterReportsIfSimulationIdSelected(allReports: List[Report], simulationId: Option[String]): List[Report] =
      simulationId match {
        case Some(id) => allReports.filter(_.simulationId == id)
        case None     => allReports
      }

    Def.inputTaskDyn {
      val selectedSimulationId = simulationIdParser(allSimulationIds).parsed
      val filteredReports = filterReportsIfSimulationIdSelected(allReports, selectedSimulationId)
      val reportsSortedByDate = filteredReports.sorted.map(_.path)
      Def.task(reportsSortedByDate.headOption.foreach(file => openInBrowser((file / "index.html").toURI)))
    }
  }
}
Of course, openLastReport is called using the results of the allReports and allSimulationIds tasks.
I think I'm close to a functioning input task but I'm still missing something...
Def.inputTaskDyn returns a value of type InputTask[T] and doesn't perform any side effects. The result needs to be bound to an InputKey, like lastReport. The return type of openLastReport is Unit, which means that openLastReport will construct a value that will be discarded, effectively doing nothing useful. Instead, have:
def openLastReport(...): InputTask[...] = ...
lastReport := openLastReport(...).evaluated
(Or, the implementation of openLastReport can be inlined into the right hand side of :=)
You probably don't need inputTaskDyn, but just inputTask. You only need inputTaskDyn if you need to return a task. Otherwise, use inputTask and drop the Def.task.
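As a generic illustration of the plain inputTask form (the key name and parser below are hypothetical, not the Gatling plugin's actual code), the body bound to an input key can parse its argument and perform the side effect directly, without a Def.task wrapper:
// Minimal build.sbt sketch; demoReport and the spaceDelimited parser are illustrative only.
import sbt.complete.DefaultParsers

val demoReport = inputKey[Unit]("Echo the simulation id typed after the task name")

demoReport := {
  // .parsed runs the parser against the user's input and drives tab completion
  val args: Seq[String] = DefaultParsers.spaceDelimited("<simulation-id>").parsed
  println("selected simulation id: " + args.mkString(" "))
}
For the real task, the examples(...) parser from the question would take the place of spaceDelimited.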

Transform bag of key-value tuples to map in Apache Pig

I am new to Pig and I want to convert a bag of tuples to a map, using a specific value in each tuple as the key. Basically I want to change:
{(id1, value1),(id2, value2), ...} into [id1#value1, id2#value2]
I've been looking around online for a while, but I can't seem to find a solution. I've tried:
bigQMap = FOREACH bigQFields GENERATE TOMAP(queryId, queryStart);
but I end up with a bag of maps (e.g. {[id1#value1], [id2#value2], ...}), which is not what I want. How can I build up a map out of a bag of key-value tuples?
Below is the specific script I'm trying to run, in case it's relevant
rawlines = LOAD '...' USING PigStorage('`');
bigQFields = FOREACH bigQLogs GENERATE GFV(*,'queryId')
as queryId, GFV(*, 'queryStart')
as queryStart;
bigQMap = ?? how to make a map with queryId as key and queryStart as value ?? ;
TOMAP takes a series of pairs and converts them into a map, so it is meant to be used like:
-- Schema: A:{foo:chararray, bar:int, bing:chararray, bang:int}
-- Data: (John, 27, Joe, 30)
B = FOREACH A GENERATE TOMAP(foo, bar, bing, bang) AS m ;
-- Schema: B:{m: map[]}
-- Data: (John#27,Joe#30)
So as you can see, the syntax does not support converting a bag to a map. As far as I know there is no way to convert a bag in the format you have to a map in pure Pig. However, you can definitely write a Java UDF to do this.
NOTE: I'm not too experienced with Java, so this UDF can easily be improved on (adding exception handling, deciding what happens if a key is added twice, etc.). However, it does accomplish what you need it to.
package myudfs;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class ConvertToMap extends EvalFunc<Map>
{
    public Map exec(Tuple input) throws IOException {
        DataBag values = (DataBag) input.get(0);
        Map<Object, Object> m = new HashMap<Object, Object>();
        for (Iterator<Tuple> it = values.iterator(); it.hasNext();) {
            Tuple t = it.next();
            m.put(t.get(0), t.get(1));
        }
        return m;
    }
}
Once you compile the class into a jar, it can be used like:
REGISTER myudfs.jar ;
-- A is loading some sample data I made
A = LOAD 'foo.in' AS (foo:{T:(id:chararray, value:chararray)}) ;
B = FOREACH A GENERATE myudfs.ConvertToMap(foo) AS bar;
Contents of foo.in:
{(open,apache),(apache,hadoop)}
{(foo,bar),(bar,foo),(open,what)}
Output from B:
([open#apache,apache#hadoop])
([bar#foo,open#what,foo#bar])
Another approach is to use python to create the UDF:
myudfs.py
#!/usr/bin/python

@outputSchema("foo:map[]")
def BagtoMap(bag):
    d = {}
    for key, value in bag:
        d[key] = value
    return d
Which is used like this:
Register 'myudfs.py' using jython as myfuncs;
-- A is still just loading some of my test data
A = LOAD 'foo.in' AS (foo:{T:(key:chararray, value:chararray)}) ;
B = FOREACH A GENERATE myfuncs.BagtoMap(foo) ;
And produces the same output as the Java UDF.
BONUS:
Since I don't like maps very much, here is a link explaining how the functionality of a map can be replicated with just key-value pairs. Since your key-value pairs are in a bag, you'll need to do the map-like operations in a nested FOREACH:
-- A is a schema that contains kv_pairs, a bag in the form {(id, value)}
B = FOREACH A {
    temp = FOREACH kv_pairs GENERATE (key=='foo'?value:NULL) ;
    -- Output is like: ({(),(thevalue),(),()})
    -- MAX will pull the maximum value from the filtered bag, which is
    -- value (the chararray) if the key matched. Otherwise it will return NULL.
    GENERATE MAX(temp) as kv_pairs_filtered ;
}
I ran into the same situation, so I submitted a patch that just got accepted: https://issues.apache.org/jira/browse/PIG-4638
This means that what you wanted is part of core Pig starting with Pig 0.16.

AX 2009: Adjusting User Group Length

We're looking into refining our User Groups in Dynamics AX 2009 into more precise and fine-tuned groupings, due to the wide range of variability between specific people within the same department. With this plan, it wouldn't be uncommon for the majority of our users to fall under 5+ user groups.
Part of this would involve expanding the default length of the User Group ID from 10 to 40 (as per Best Practice for naming conventions), since 10 characters don't give us enough room to adequately name each group as we would like (again, based on Best Practice Naming Conventions).
We have found that the main information seems to be obtained from the UserGroupInfo table, but that table isn't present under the Data Dictionary (it's under the System Documentation, so it's unavailable to be changed that way, by my understanding). We've also found the UserGroupName EDT, but that is already set at 40 characters. The form itself doesn't seem to be restricting the length of the field either. We've discussed changing the field in SQL directly, but again my understanding is that if we do a full synchronization it would overwrite this change.
Where can we go to change this particular setting, or is it possible to change?
The size of the user group ID is defined as a system extended data type (here \System Documentation\Types\userGroupId) and you cannot change any of its properties, including the size 10 length.
You should live with that; don't try to fake the system using direct SQL changes. Even if you did that, AX would still believe the length is 10.
You could change the SysUserInfo form to show the group name only. The groupId might as well be assigned by a number sequence in your context.
I wrote a job to change the string size via X++, and it works for EDTs, but it can't seem to find "userGroupId". From the general feel I get of AX, I'd be willing to guess that they just have it in a different location, but maybe not. I wonder if this could be tweaked to work:
static void Job9(Args _args)
{
    #AOT
    TreeNode treeNode;
    Struct propertiesExt;
    Map mapNewPropertyValues;

    void setTreeNodePropertyExt(
        Struct _propertiesExt,
        Map _newProperties
        )
    {
        Counter propertiesCount;
        Array propertyInfoArray;
        Struct propertyInfo;
        str propertyValue;
        int i;
        ;
        _newProperties.insert('IsDefault', '0');

        propertiesCount = _propertiesExt.value('Entries');
        propertyInfoArray = _propertiesExt.value('PropertyInfo');

        for (i = 1; i <= propertiesCount; i++)
        {
            propertyInfo = propertyInfoArray.value(i);

            if (_newProperties.exists(propertyInfo.value('Name')))
            {
                propertyValue = _newProperties.lookup(propertyInfo.value('Name'));
                propertyInfo.value('Value', propertyValue);
            }
        }
    }
    ;

    treeNode = TreeNode::findNode(#ExtendedDataTypesPath);
    // This doesn't seem to be able to find the system type
    //treeNode = treeNode.AOTfindChild('userGroupId');
    treeNode = treeNode.AOTfindChild('AccountCategory');

    propertiesExt = treeNode.AOTgetPropertiesExt();

    mapNewPropertyValues = new Map(Types::String, Types::String);
    mapNewPropertyValues.insert('StringSize', '30');

    setTreeNodePropertyExt(propertiesExt, mapNewPropertyValues);

    treeNode.AOTsetPropertiesExt(propertiesExt);
    treeNode.AOTsave();

    info("Done");
}
