Updating edge attributes of a large dense graph - graph

I have a large, dense graph whose edge attributes are updated using the following code. Briefly, I set the edge attributes based on some calculations on values fetched from other dictionaries (degdict, pifeadict, nodeneidict, etc.). My smallest graph has 15 million edges. When execution reaches this stage, CPU usage dips as low as 10% and memory usage climbs to 69%. For larger graphs, my process gets killed at 90% memory usage. I am not sure where things are going wrong.
In addition to fixing this memory problem, I also need to speed up this loop if possible, perhaps with a parallel solution to update the edge attributes. Please suggest solutions.
for fauth, sauth in Gcparam.edges_iter():
    first_deg = degdict[fauth]
    sec_deg = degdict[sauth]
    paval = float(first_deg*sec_deg)/float(currmaxdeg * \
                                           currmaxdeg)
    try:
        f2 = dmpdict[first_deg][sec_deg]
    except KeyError:
        f2 = 0.0
    try:
        pival = pifeadict[first_deg][sec_deg]
    except KeyError:
        pival = 0.0
    delDval = float(abs(first_deg - sec_deg))/(float(currmaxdeg)*delT)
    f5 = calc_comm_kws(fauth, sauth, kwsdict)
    avg_ndeg = getAvgNeiDeg(fauth, sauth, nodeneidict, currmaxdeg)/delT
    prop = getPropensity(fauth, sauth, nodeneidict, currmaxdeg, Gparam)/delT
    tempdict = {'years':[year], 'pa':[paval],\
                'dmp':[f2], 'pi':[pival], 'deld':[delDval],\
                'delndeg':[avg_ndeg], 'delprop' :[prop],\
                'ck' :[f5]
                }
    Gcparam[fauth][sauth].update(tempdict)

You can estimate the amount of storage you need for the data on each edge like this:
In [1]: from pympler.asizeof import asizeof
In [2]: tempdict = {'years':[1900], 'pa':[1.0],\
                    'dmp':[2.0], 'pi':[3.0], 'deld':[7],\
                    'delndeg':[3.4], 'delprop' :[7.5],\
                    'ck' :[22.0]
                    }
In [3]: asizeof(tempdict)
Out[3]: 1000
So it looks like 1000 bytes is a lower bound for what you are doing. Multiply that by the number of edges for the total.
NetworkX also has some overhead for the node and edge data structures which depends on what type of object you use for nodes. Integers are smallest.
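To put the per-edge figure in context, the estimate can be scaled to the whole graph. The following is a rough sketch that uses only the standard library instead of pympler and assumes Python 3. `deep_sizeof` is a hypothetical helper written for this illustration, the payload values are made up, and exact byte counts vary with Python version and platform. It also measures a payload of bare scalars instead of one-element lists, a layout the question does not use, purely for comparison.

```python
import sys

def deep_sizeof(obj, seen=None):
    """Roughly estimate the memory used by obj, following nested containers."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count an object reachable twice only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

# Made-up per-edge payloads that mirror the keys used in the question.
as_lists = {'years': [1900], 'pa': [1.0], 'dmp': [2.0], 'pi': [3.0],
            'deld': [7], 'delndeg': [3.4], 'delprop': [7.5], 'ck': [22.0]}
# Assumption for comparison only: the same values without the list wrappers.
as_scalars = {key: value[0] for key, value in as_lists.items()}

n_edges = 15_000_000  # the smallest graph mentioned in the question

for label, payload in (('lists', as_lists), ('scalars', as_scalars)):
    per_edge = deep_sizeof(payload)
    total_gib = per_edge * n_edges / 2**30
    print(f"{label}: ~{per_edge} bytes per edge, "
          f"~{total_gib:.1f} GiB for {n_edges} edges")
```

Treat the printed numbers as rough guides only. The NetworkX overhead for node and edge structures mentioned above comes on top of them.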

JanusGraph BulkLoad reports NoSuchElement Error, nodes loaded but no edges

I am exploring ways of loading CSV data into JanusGraph. I tried the grateful-dead example from the official documentation and it worked just fine. My approach is as follows:
hadoop-load-csv.properties
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.scriptInputFormat.script=./data/script-input-grateful-dead.groovy
gremlin.hadoop.inputLocation=./data/grateful-dead.txt
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
janusgraph-grateful.properties
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.hbase.table=grateful
storage.hostname=
run.groovy
outputGraphConfig = [path to janusgraph-grateful.properties]
:load ./data/grateful-dead-janusgraph-schema.groovy
graph = JanusGraphFactory.open(outputGraphConfig)
defineGratefulDeadSchema(graph)
graph.close()
readGraph = GraphFactory.open([path to hadoop-load-csv.properties])
blvp = BulkLoaderVertexProgram.build().writeGraph(outputGraphConfig).create(readGraph)
readGraph.compute(SparkGraphComputer).program(blvp).submit().get()
g = GraphFactory.open(outputGraphConfig).traversal()
g.V().count()
g.E().count()
After that I dropped the whole graph, subsampled the data and loaded it again, and it failed.
1,song,HEY BO DIDDLEY,cover,5 followedBy,2,1|followedBy,3,2|followedBy,4,1|followedBy,5,1|followedBy,6,1
2,song,IM A MAN,cover,1 followedBy,1,1
3,song,NOT FADE AWAY,cover,531 followedBy,5,572 followedBy,5,40|followedBy,1,2
4,song,BERTHA,original,394 followedBy,10,4 followedBy,1,1
5,song,GOING DOWN THE ROAD FEELING BAD,cover,293
6,song,MONA,cover,1 sungBy,3|writtenBy,5 followedBy,1,1|followedBy,2,1
7,song,WHERE HAVE THE HEROES GONE,,0 followedBy,8,1 followedBy,9,1
8,song,OH BOY,cover,2 followedBy,9,1|followedBy,3,1|followedBy,7,1|sungBy,5|writtenBy,4 followedBy,1,1|followedBy,7,1|followedBy,6,1
800,song,WINING BOY BLUES,cover,1 sungBy,5|writtenBy,4
9,song,HERE COMES SUNSHINE,original,65 followedBy,10,1 followedBy,6,2
10,song,HERE COMES SUNSHINE,original,65
I got a NoSuchElement Error and when I looked into the graph, g.V().count() returns 10 while g.E().count() returns 0.
Does anyone know what is happening? It would be very kind of you to give me some advice.

How to find the shortest path in a transport connection using a Cypher query (neo4j) over GTFS data?

I am new to neo4j. I created a graph following these steps, based on a data model from GTFS. I would like to find all the shortest indirect routes in the graph (with transfers).
The data model of the graph database contains 4 entities: Route, Trip, Stop, Stoptime. Here is a screenshot of db.schema().
Based on a query written by Bruggen, I modified it for my use:
MATCH
(from:Stop {code:'VBR'})--(st_from:Stoptime),
(to:Stop {code:'VIR'})--(st_to:Stoptime),
p1=((st_from)-[:PRECEDES*]->(st_midway_arr:Stoptime)),
(st_midway_arr)--(midway:Stop),
(midway)--(st_midway_dep:Stoptime),
p2=((st_midway_dep)-[:PRECEDES*]->(st_to))
WHERE
st_from.departure_time > '00:00'
AND st_from.departure_time < '23:00'
AND st_midway_arr.arrival_time > st_from.departure_time
AND st_midway_dep.departure_time > st_midway_arr.arrival_time
AND st_to.arrival_time > st_midway_dep.departure_time
RETURN
from,st_from,to,st_to,p1,p2,midway
order by (st_to.arrival_time_int-st_from.departure_time_int) ASC
limit 1;
This query does not use a shortest-path function and takes 30 s on average to find a path, but its output is correct.
So I tried to write another query with the allshortestpaths function, and it is really fast (0.3 s). But it also returns trips that run in the opposite direction (VIR -> VBR). Another problem is the timing of that connection.
Could you help me access the transfer node (Station) when I am using allshortestpaths? I want to write a condition on timing and stop_sequence to be sure the path runs in the right direction.
match (from:Stop {code:'VBR'}),(to:Stop {code:'VIR'})
with from,to
match p = allshortestpaths((from)-[*]-(to))
where NONE (x in relationships(p) where type(x)="OPERATES")
return p
limit 10;
match (from:Stop {code:'VBR'}),(to:Stop {code:'VIR'})
with from,to
match p = allshortestpaths((from)-[*]->(to)) // give the relationship a direction here so that paths run from 'VBR' to 'VIR'
where NONE (x in relationships(p) where type(x)="OPERATES")
return p
limit 10;
Next, if you want to inspect the nodes in the path, you can use nodes(p):
match (from:Stop {code:'VBR'}),(to:Stop {code:'VIR'})
with from,to
match p = allshortestpaths((from)-[*]->(to))
where NONE (x in relationships(p) where type(x)="OPERATES")
AND ALL(node in nodes(p) WHERE node = from OR node = to OR YOUR CONDITION ON TRANSFER NODE)
return p
limit 10;

Filter vertices on several properties - Julia

I am working in Julia with the MetaGraphs.jl library.
To set up an optimization problem, I would like to get the set/list of edges in the graph that point to a particular set of vertices sharing 2 specific property values.
My first idea was to get the set/list of vertices first. But I ran into an issue: the filter_vertices function doesn't seem to accept a filter on more than one property.
Below is an example of what I would like to do:
g = DiGraph(5)
mg = MetaDiGraph(g, 1.0)
add_vertex!(mg)
add_edge!(mg,1,2)
add_edge!(mg,1,3)
add_edge!(mg,1,4)
add_edge!(mg,2,5)
add_edge!(mg,3,5)
add_edge!(mg,5,6)
add_edge!(mg,4,6)
set_props!(mg,3,Dict(:prop1=>1,:prop2=>2))
set_props!(mg,1,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,2,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,4,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,5,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,6,Dict(:prop1=>0,:prop2=>0))
col=collect(filter_vertices(mg,:prop1,1,:prop2,2))
I want col to contain vertex 3 and no others.
But filter_vertices only accepts one property at a time, which makes this more costly: I would have to apply two filters and then compare the results to build the list of vertices that have both properties.
Given the size of my graph, I would like to avoid building this set with multiple costly loops. Does anyone have an idea of how to solve this in a simple, lightweight way?
I ended up making this to answer my own question:
fil3=Array{Int64,1}()
fil1=filter_vertices(mg,:prop1,1)
for f in fil1
    if get_prop(mg,f,:prop2)==2
        push!(fil3,f)
    end
end
println(fil3)
But tell me if you get anything more interesting
Thanks for your help!
Please provide a minimal working example in a way we can simply copy and paste, and start right away. Please also indicate where the problem occurs in the code. Below is an example for your scenario:
Pkg.add("MetaGraphs")
using LightGraphs, MetaGraphs
g = DiGraph(5)
mg = MetaDiGraph(g, 1.0)
add_vertex!(mg)
add_edge!(mg,1,2)
add_edge!(mg,1,3)
add_edge!(mg,1,4)
add_edge!(mg,2,5)
add_edge!(mg,3,5)
add_edge!(mg,5,6)
add_edge!(mg,4,6)
set_props!(mg,3,Dict(:prop1=>1,:prop2=>2))
set_props!(mg,1,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,2,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,4,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,5,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,6,Dict(:prop1=>0,:prop2=>0))
function my_vertex_filter(g::AbstractMetaGraph, v::Integer, prop1, prop2)
    return has_prop(g, v, :prop1) && get_prop(g, v, :prop1) == prop1 &&
           has_prop(g, v, :prop2) && get_prop(g, v, :prop2) == prop2
end
prop1 = 1
prop2 = 2
col = collect(filter_vertices(mg, (g,v)->my_vertex_filter(g,v,prop1,prop2)))
# returns Int[3]
Please check ?filter_vertices --- it gives you a hint on what/how to write to define your custom filter.
EDIT. For filtering the edges, you can have a look at ?filter_edges to see what you need to achieve the edge filtering. Append the below code excerpt to the solution above to get your results:
function my_edge_filter(g, e, prop1, prop2)
    v = dst(e) # get the edge's destination vertex
    return my_vertex_filter(g, v, prop1, prop2)
end
myedges = collect(filter_edges(mg, (g,e)->my_edge_filter(g,e,prop1,prop2)))
# returns [Edge 1 => 3]
I found this solution:
function filter_function1(g,prop1,prop2)
    fil1=filter_vertices(g,:prop1,prop1)
    fil2=filter_vertices(g,:prop2,prop2)
    filter=intersect(fil1,fil2)
    return filter
end
This seems to work and is quite easy to implement.
I just don't know whether the filter_vertices function is computationally expensive.
Otherwise, a simple loop like this also seems to work:
function filter_function2(g,prop1,prop2)
    filter=Set{Int64}()
    fil1=filter_vertices(g,:prop1,prop1)
    for f in fil1
        if get_prop(g,f,:prop2)==prop2
            push!(filter,f)
        end
    end
    return filter
end
I am open to any other answers if you have some more elegant ones.

math.random and corona incorrect syntax?

I've been doing this for 2 days and getting nowhere.
I want to select 4 balls from the array and drop them at random, and have the system remember them so it can ask for input later.
I'm stuck on the first bit:
local ballImages =
{
    display.newImage("ball1.png"),
    display.newImage("ball2.png"),
    display.newImage("ball3.png"),
    display.newImage("ball4.png"),
    display.newImage("ball5.png"),
    display.newImage("ball6.png"),
    display.newImage("ball7.png"),
    display.newImage("ball8.png"),
    display.newImage("ball9.png"),
    display.newImage("ball10.png"),
    display.newImage("ball11.png"),
    display.newImage("ball12.png"),
    display.newImage("ball13.png"),
    display.newImage("ball14.png"),
    display.newImage("ball15.png"),
    display.newImage("ball16.png"),
    display.newImage("ball17.png"),
    display.newImage("ball18.png"),
    display.newImage("ball19.png"),
    display.newImage("ball20.png")
}
function setup()
    math.randomseed(os.time())
end
setup()
local ballImages = ballImages[math.random(4,#ballImages)]
physics.addBody(ballImages)
I'm only getting 1 ball to randomly drop. Do I have the syntax in math.random wrong?
I've tried it several ways but I'm not sure where to go from here.
Thanks in advance for help!
Yes, the syntax is wrong. See http://docs.coronalabs.com/api/library/math/random.html :
When called with two integer numbers m and n, math.random returns a uniform pseudo-random integer in the range [m, n].
You should make 4 calls to math.random(#ballImages).
If you just want 4 balls, possibly more than once the same ball, you're done.
However if you want distinct balls, you may have to redraw, if you draw a number that corresponds to a ball selected previously. That is if two of your math.random(#ballImages) return the same number i, that would mean "dropping the ball" i twice. If that doesn't make sense, you can do something like the following :
drawn = {}
local drop=4 -- how many balls to draw
while drop > 0 do -- while we have balls left to draw
    local ball = math.random(#ballImages) -- draw a random ball
    if drawn[ball] == nil then -- if ball wasn't selected before
        drawn[ball] = 1 -- mark it as selected
        physics.addBody(ballImages[ball]) -- "drop" the ball
        drop = drop - 1 -- decrement how many more balls
    end
end
If your display and physics objects work, then so should this snippet. See here : http://ideone.com/GQC2C6
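For comparison, the same redraw-on-repeat idea can be sketched outside Corona. The Python version below is only an illustration of the technique, not Corona code. `draw_distinct` is a hypothetical name, and the ball count of 20 is taken from the question's table. Returning the picks as a list is one way to let the program remember which balls were dropped so it can ask about them later.

```python
import random

def draw_distinct(n_items, how_many, rng=random):
    """Draw how_many distinct 1-based indices from 1..n_items, redrawing on repeats."""
    if how_many > n_items:
        raise ValueError("cannot draw more distinct items than exist")
    drawn = set()   # indices already selected
    order = []      # selected indices, in the order they were drawn
    while len(order) < how_many:
        ball = rng.randint(1, n_items)  # inclusive on both ends
        if ball not in drawn:           # skip indices that were drawn before
            drawn.add(ball)
            order.append(ball)
    return order

picked = draw_distinct(20, 4)
print(picked)

# The standard library offers the same guarantee without an explicit redraw loop.
print(random.sample(range(1, 21), 4))
```

The redraw loop is cheap when few items are drawn from many, as with 4 out of 20. When most of the items are wanted, shuffling the whole list and taking a prefix avoids repeated redraws.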

how to increase the limit for max.print in R

I am using the Graph package in R for maxclique analysis of 5461 items.
The final output item which I get is very long, so I am getting the following warning:
reached getOption("max.print") -- omitted 475569 rows
Can somebody please give me pointers on how to increase the limit for max.print?
Use the options command, e.g. options(max.print=1000000).
See ?options:
‘max.print’: integer, defaulting to ‘99999’. ‘print’ or ‘show’
methods can make use of this option, to limit the amount of
information that is printed, to something in the order of
(and typically slightly less than) ‘max.print’ _entries_.
See ?options:
options(max.print=999999)
You can use the options command to change the max.print value for the value limit you want to reach. For example:
options(max.print = 1000000)
There you can change the value of the max.print in R.
Put options(max.print=10000) at the top of your program, since the option must be set before anything is printed. This works for me.
I fixed it just now, but it looks clunky. Can anyone make it simpler, please?
def list_by_tag_post(request):
    # get POST
    all_tag = request.POST.getlist('tag_list')
    arr_query = list(all_tag)
    for index in range(len(all_tag)):
        tag_result = Tag.objects.get(id=all_tag[index])
        all_english_text = tag_result.notes.all().values('english_text', 'id')
        arr_query[index] = all_english_text
    for index in range(len(arr_query)):
        all_english_text = all_english_text | arr_query[index]
    # Remove replicated items
    all_english_text = all_english_text.order_by('id').distinct()
    # render
    context = {'all_english_text': all_english_text, 'all_tag': all_tag}
    return render(request, 'list_by_tag.html', context)
