NetworkMapCache is empty in MockNetwork tests - corda

I'm writing some lightweight flow tests with everything mocked, and I ran into an issue where, on all nodes, the NetworkMapService contains only the node itself. The identity service, on the other hand, contains all 3 nodes participating in the test.
net = MockNetwork()
issuer = net.createNode(legalName = CHARLIE.name)
alice = net.createNode(legalName = ALICE.name)
bob = net.createNode(legalName = BOB.name)
issuer.registerInitiatedFlow(IssueClaimFlow.Issuer::class.java)
alice.registerInitiatedFlow(VerifyClaimFlow.Prover::class.java)
MockServices.makeTestDatabaseAndMockServices(createIdentityService = { InMemoryIdentityService(listOf(ALICE_IDENTITY, BOB_IDENTITY, CHARLIE_IDENTITY), emptySet(), DEV_TRUST_ROOT) } )
net.registerIdentities()
net.runNetwork()
In this case the flow runs fine until the first sendAndReceive() call, where I get:
12:28:12.832 [Mock network] WARN net.corda.flow.[8f685c46-9ab6-4d64-b3f2-6b7476813c3b] - Terminated by unexpected exception
java.lang.IllegalArgumentException: Don't know about party C=ES,L=Madrid,O=Alice Corp
The funny thing is that the test still finishes green (though no useful work is done). But that's probably a topic for another question.
I can work around it by setting up the cache manually like this:
alice.services.networkMapCache.addNode(issuer.info)
bob.services.networkMapCache.addNode(alice.info)
But is this the correct way to go? I don't see anything like this in the samples or anywhere else.

If you look at the definition of MockNetwork.createNode, the networkMapAddress defaults to null.
Instead, you should use MockNetwork.createSomeNodes, which creates a network map node for you, then sets that node as the network map for every subsequent node it creates.
Here's an example from the CorDapp Example:
network = MockNetwork()
val nodes = network.createSomeNodes(2)
a = nodes.partyNodes[0]
b = nodes.partyNodes[1]
You can see the full example here: https://github.com/corda/cordapp-example/blob/release-V2/kotlin-source/src/test/kotlin/com/example/flow/IOUFlowTests.kt.

AzureML Dataset.File.from_files creation extremely slow even with 4 files

I have a few thousand video files in my Blob Storage, which I have set up as a datastore.
This blob storage receives new files every night, and I need to split the data and register each split as a new version of an AzureML Dataset.
This is how I do the data split, simply getting the blob paths and splitting them:
from pathlib import Path
from azure.storage.blob import ContainerClient

container_client = ContainerClient.from_connection_string(AZ_CONN_STR, 'keymoments-clips')
blobs = container_client.list_blobs('soccer')
blobs = map(lambda x: Path(x['name']), blobs)
train_set, test_set = get_train_test(blobs, 0.75, 3, class_subset={'goal', 'hitWoodwork', 'penalty', 'redCard', 'contentiousRefereeDecision'})
valid_set, test_set = split_data(test_set, 0.5, 3)
train_set, test_set, and valid_set are just n×2 numpy arrays containing the blob storage path and the class.
Here is when I try to create a new version of my Dataset:
from azureml.core import Datastore, Dataset

datastore = Datastore.get(workspace, 'clips_datastore')
dataset_train = Dataset.File.from_files([(datastore, b) for b, _ in train_set[:4]], validate=True, partition_format='**/{class_label}/*.mp4')
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)
How is it possible that the Dataset creation seems to hang for an indefinite time even with only 4 paths?
I saw in the docs that providing a list of Tuple[datastore, path] is perfectly fine. Do you know why this happens?
Thanks
Do you have your Azure Machine Learning Workspace and your Azure Storage Account in different Azure Regions? If that's true, latency may be a contributing factor with validate=True.
Another possibility may be slowness in the way datastore paths are resolved. This is an area where improvements are being worked on.
As an experiment, could you try creating the dataset using a URL instead of a datastore? Let us know if that makes a difference to performance, and whether it unblocks your current issue in the short term.
Something like this:
dataset_train = Dataset.File.from_files(path="https://bloburl/**/*.mp4?accesstoken", validate=True, partition_format='**/{class_label}/*.mp4')
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)
I'd be interested to see what happens if you run the dataset creation code twice in the same notebook/script. Is it faster the second time? I ask because it might be an issue with the .NET Core runtime startup (which would only happen the first time you run the code).
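A minimal sketch of that timing experiment, assuming the datastore and train_set objects from the question are already defined (the loop and the printed timings are purely illustrative):
import time
from azureml.core import Dataset

# Run the same creation call twice and compare wall-clock time; if the first
# run pays a one-off runtime startup cost, the second should be much faster.
for attempt in (1, 2):
    start = time.perf_counter()
    ds = Dataset.File.from_files(
        [(datastore, b) for b, _ in train_set[:4]],
        validate=True,
        partition_format='**/{class_label}/*.mp4',
    )
    print(f"attempt {attempt}: {time.perf_counter() - start:.1f} s")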
EDIT 9/16/20
While it doesn't seem to make sense that .NET Core is invoked when no data is moving, I suspect it is the validate=True parameter that requires all the data to be inspected (which can be computationally expensive). I'd be interested to see what happens if that param is False.
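A rough sketch of that experiment, reusing the inputs from the question (whether it helps depends on where the time is actually going):
from azureml.core import Dataset

# Skip the upfront existence checks on each path; validation then only
# happens when the data is first consumed.
dataset_train = Dataset.File.from_files(
    [(datastore, b) for b, _ in train_set[:4]],
    validate=False,
    partition_format='**/{class_label}/*.mp4',
)
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)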

Slackr: x Problem with `id` - Cannot send messages

I am not an admin, so I can't change the scopes. I can send slackr_bot messages to a channel I set up when creating the app in the UI, but doing the below does not work. Has anyone found a solution to this?
I created a txt file called test.txt.
Within that txt file it looks like this:
api_token: xxxxxxxxxxxx
channel: #channel_name
username: myusername
incoming_webhook_url: https://hooks.slack.com/services/xxxxxxxxxxx/xxxxxxxxxxxxx
Then I simply want to send a message, but eventually I would like to run the function
ggslackr(qplot(mpg, wt, data=mtcars))
slackr_setup(config_file = "test.txt")
my_message <- paste("I'm sending a Slack message at", Sys.time(), "from my R script.")
slackr_msg(my_message, channel = "#channel_name", as_user=F)
Here is the error message:
Error: Join columns must be present in data.
x Problem with `id`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In structure(vars, groups = group_vars, class = c("dplyr_sel_vars", :
Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
Consider 'structure(list(), *)' instead.
Edit #2:
Okay, I learned some things regarding packages. If I had to do this over, I'd have gone to their GitHub repo and read the issue tracker.
The reason is that it appears slackr has a few issues related to changes in Slack's API.
Also, since the major update of R (version 4.x), a lot of packages got broken.
My sense is that our issue is with a line of code inside a slackr function (slackr_util.r, iirc) that calls a dplyr join looking for a particular id that does not exist.
So, I'm going to watch the issue tracker and see what comes of it.
Edit: Try slackr_bot(my_message, channel = "#general")
That worked as advertised!
But ggslackr continues to fail.
I'm having the same issue. I found a debugging starting point in another thread:
`rlang::last_error()`
When I run that, I get:
Backtrace:
1. slackr::slackr_msg(my_message, channel = "#general")
5. slackr::slackr_chtrans(channel)
6. slackr::slackr_ims(api_token)
8. dplyr:::left_join.data.frame(users, ims, by = "id", copy = TRUE)
9. dplyr:::join_mutate(...)
10. dplyr:::join_cols(...)
11. dplyr:::standardise_join_by(by, x_names = x_names, y_names = y_names)
12. dplyr:::check_join_vars(by$y, y_names)
So, step 8 there is a join by id, which I suppose implies that 'id' is missing.
Yet if I run slackr::slackrSetup(echo=TRUE), as suggested in the GitHub issue tracker, I get the following:
{
"SLACK_CHANNEL": ["#general"],
"SLACK_USERNAME": ["slackr_brian"],
"SLACK_ICON_EMOJI": ["NA"],
"SLACK_INCOMING_URL_PREFIX": ["https://hooks.xxxxxxx"],
"SLACK_API_TOKEN": ["token secret"]
}
I'm not sure where to go from here, as the issue tracker conversation mentions confirming that webhooks go to the correct channel and becomes very user-specific.
So, that's as far as I have gotten.

Cannot use method 'updateProp' in RNeo4j package

I'm using the RNeo4j package together with igraph to calculate betweenness centrality and write it back to the Neo4j database.
The calculation works perfectly, and there is no problem connecting to Neo4j. After I got a vector named by node id and containing each node's betweenness centrality value, I tried to update just one node and ran into a problem with the 'updateProp' method.
The error I got is this:
Error in UseMethod("updateProp") :
no applicable method for 'updateProp' applied to an object of class "list"
And this is the part of my code that gets stuck:
...
bet <- betweenness(g)
alice = getLabeledNodes(neo4j, "User", id = as.integer(names(bet[1])))
# returned valid node
# following line got the mentioned error.
alice = updateProp(alice,betweenness_centrality = as.numeric(bet[[1]]))
I also tried another way, like this, without any luck.
(I also hardcoded the value to 0, but it didn't work either.)
newProp = list(betweenness_centrality = bet[[1]])
alice = updateProp(alice,newProp)
P.S. For reference: http://rpackages.ianhowson.com/cran/RNeo4j/man/updateProp.html
Thank you in advance.
updateProp expects the first argument to be a node. You're passing it a list. It should work if you access the first node of that list.
bet <- betweenness(g)
# getLabeledNodes returns a list of nodes
alice = getLabeledNodes(neo4j, "User", id = as.integer(names(bet[1])))
# take the first node from that list before updating it
alice = alice[[1]]
alice = updateProp(alice, betweenness_centrality = as.numeric(bet[[1]]))

Creating graph in titan from data in csv - example wiki.Vote gives error

I am new to Titan. I installed Titan and successfully ran the Graph of the Gods example, including the queries given. Next I went on to try bulk loading a csv file to create a graph, following the steps in Powers of Ten - Part 1: http://thinkaurelius.com/2014/05/29/powers-of-ten-part-i/
I am getting an error when loading wiki-Vote.txt:
gremlin> g = TitanFactory.open("/tmp/1m")
Backend shorthand unknown: /tmp/1m
I tried:
g = TitanFactory.open('conf/titan-berkeleydb-es.properties')
but I get an error at the next step in load-1m.groovy:
==>titangraph[berkeleyje:/titan-0.5.4-hadoop2/conf/../db/berkeley]
No signature of method: groovy.lang.MissingMethodException.makeKey() is applicable for argument types: () values: [] Possible solutions: every(), any()
Any hints on what to do next? I am using Groovy for the first time. What kind of Groovy expertise is needed for working with Gremlin?
That blog post is meant for Titan 0.4.x. The API shifted when Titan went to 0.5.x. The same principles discussed in the posts generally apply to data loading but the syntax is different in places. The intention is to update those posts in some form when Titan 1.0 comes out with full support of TinkerPop3. Until then, you will need to convert those code examples to the revised API.
For example, an easy way to create a berkeleydb database is with:
g = TitanFactory.build()
.set("storage.backend", "berkeleyje")
.set("storage.directory", "/tmp/1m")
.open();
Please see the docs here. Then most of the schema creation code (which is the biggest change) is now described here and here.
After much experimenting today, I finally figured it out. A lot of changes were needed:
Use makePropertyKey() instead of makeKey(), and makeEdgeLabel() instead of makeLabel()
Use cardinality(Cardinality.SINGLE) instead of unique()
Building the index is quite a bit more complicated. Use the management system instead of the graph both to make the keys and labels, as well as build the index (see https://groups.google.com/forum/#!topic/aureliusgraphs/lGA3Ye4RI5E)
For posterity, here's the modified script that should work (as of 0.5.4):
g = TitanFactory.build().set("storage.backend", "berkeleyje").set("storage.directory", "/tmp/1m").open()
m = g.getManagementSystem()
k = m.makePropertyKey('userId').dataType(String.class).cardinality(Cardinality.SINGLE).make()
m.buildIndex('byId', Vertex.class).addKey(k).buildCompositeIndex()
m.makeEdgeLabel('votesFor').make()
m.commit()
getOrCreate = { id ->
    def p = g.V('userId', id)
    if (p.hasNext()) {
        p.next()
    } else {
        g.addVertex([userId: id])
    }
}
new File('wiki-Vote.txt').eachLine {
    if (!it.startsWith("#")) {
        (fromVertex, toVertex) = it.split('\t').collect(getOrCreate)
        fromVertex.addEdge('votesFor', toVertex)
    }
}
g.commit()

Titan Graph Queries taking too long to execute

I have a problem with the execution speed of Titan queries.
To be more specific:
I created a properties file for my graph using BerkeleyJE, which looks like this:
storage.backend=berkeleyje
storage.directory=/finalGraph_script/graph
Afterwards, I opened gremlin.bat to open my graph.
I set up all the necessary index keys for my nodes:
m = g.getManagementSystem();
username = m.makePropertyKey('username').dataType(String.class).make()
m.buildIndex('byUsername',Vertex.class).addKey(username).unique().buildCompositeIndex()
m.commit()
g.commit()
(all other keys are created the same way...)
I imported a csv file containing about 100,000 lines; each line produces at least 2 nodes and some edges. All this is done via batch loading.
That works without a problem.
Then I execute a groupBy query which looks like this:
m = g.V.has("imageLink").groupBy{it.imageLink}{it.in("is_on_image").out("is_species")}{it._().species.groupCount().cap.next()}.cap.next()
With this query I want, for every node with the property key "imageLink", the number of distinct "species". "Species" are also nodes, and can be reached by going back along the "is_on_image" edge and then following the "is_species" edge.
This works like a charm for my current nodes; the query takes about 2 minutes on my local PC.
But now to the problem.
My whole dataset is a csv with 10 million entries. The structure is the same as above, and each line also creates at least 2 nodes and some edges.
On my local PC I can't even import this set; it causes a memory exception after 3 days of loading.
So I tried the same on a server with much more RAM. There the import works and takes about 1 day, but the groupBy fails after about 3 days.
I actually don't know whether the groupBy itself fails, or just the connection to the server after such a long time.
So my first question:
In my opinion, 15 million nodes shouldn't be that big a deal for a graph database, should it?
Second question:
Is it normal that it takes this long? Or is there any way to speed it up using indices? I configured the indices as listed above :(
I don't know exactly which information you need to help me, so please just tell me what you need in addition to the above.
Thanks a lot!
Best regards,
Ricardo
EDIT 1: This is how I'm loading the CSV into the graph.
I'm using this code; I deleted some unnecessary properties, which are also set as properties on some nodes and loaded the same way.
bg = new BatchGraph(g, VertexIDType.STRING, 10000)
new File("annotation_nodes_wNothing.csv").eachLine({ final String line ->def (annotationId,species,username,imageLink) = line.split('\t')*.trim();def userVertex = bg.getVertex(username) ?: bg.addVertex(username);def imageVertex = bg.getVertex(imageLink) ?: bg.addVertex(imageLink);def speciesVertex = bg.getVertex(species) ?: bg.addVertex(species);def annotationVertex = bg.getVertex(annotationId) ?: bg.addVertex(annotationId);userVertex.setProperty("username",username);imageVertex.setProperty("imageLink", imageLink);speciesVertex.setProperty("species",species);annotationVertex.setProperty("annotationId", annotationId);def classifies = bg.addEdge(null, userVertex, annotationVertex, "classifies");def is_on_image = bg.addEdge(null, annotationVertex, imageVertex, "is_on_image");def is_species = bg.addEdge(null, annotationVertex, speciesVertex, "is_species");})
bg.commit()
g.commit()
