Vertex in Python Gremlin not updating

Using Python Gremlin on the Neptune workbench, I have two functions:
The first adds a vertex with a set of properties and returns a reference to the traversal.
The second adds further steps to that traversal.
For some reason, the first function's operations are getting persisted to the DB, but the second's are not. Why is this?
Here are the two functions:
def add_v(v_type, name):
    tmp_id = get_id(f"{v_type}-{name}")
    result = g.addV(v_type).property('id', tmp_id).property('name', name)
    result.iterate()
    return result

def process_records(features):
    for i in features:
        v_type = i[0]
        name = i[1]
        v = add_v(v_type, name)
        if len(i) > 2:
            %debug
            props = i[2]
            for r in props:
                v.property(r[0], r[1]).iterate()

Your add_v function has already iterated the traversal. If you want to return the traversal from add_v in a way that lets you keep adding to it, remove the iterate().
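One way to apply that advice (a sketch rather than a drop-in replacement; get_id, g, and the record layout are taken from the question) is to return the un-iterated traversal, append the extra property steps, and iterate once at the end:

def add_v(v_type, name):
    tmp_id = get_id(f"{v_type}-{name}")
    # Return the traversal without iterating it, so callers can keep adding steps.
    return g.addV(v_type).property('id', tmp_id).property('name', name)

def process_records(features):
    for i in features:
        v_type, name = i[0], i[1]
        t = add_v(v_type, name)
        if len(i) > 2:
            for r in i[2]:
                t = t.property(r[0], r[1])
        # Iterate exactly once, after all steps are attached, so the vertex
        # and all of its properties are persisted in a single traversal.
        t.iterate()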


fastAPI SQLmodel MultipleResultsFound: Multiple rows were found when exactly one was required

This is my delete function.
def delete_session(self, session_id: int, db):
    with Session(engine) as session:
        statement = select(db).where(db.session == session_id)
        results = session.exec(statement)
        sess = results.one()
        print("sess: ", sess)
        if not sess:
            raise HTTPException(status_code=404, detail="Session not found")
        session.delete(sess)
        session.commit()
        return {"Session Deleted": True}
I want to delete all records where session_id matches, but it throws the following error:
MultipleResultsFound: Multiple rows were found when exactly one was required
How can I delete multiple rows at once?
I tried using
sess = results.all()
but it says
sqlalchemy.orm.exc.UnmappedInstanceError: Class 'builtins.list' is not mapped
Thanks
Currently, you are trying to delete several rows, but session.delete() only takes a single mapped object, not a list of objects.
You are using results.one(), probably thinking that you can isolate your result and return only one row. However, the documentation explains that if the query returns multiple rows, one() throws a MultipleResultsFound exception, hence your error.
Indeed, your statement returns multiple matching rows.
To delete all of them, don't use one(); simply iterate over your results with a for loop and delete the rows one by one, as follows:
def delete_session(self, session_id: int, db):
    with Session(engine) as session:
        statement = select(db).where(db.session == session_id)
        results = session.exec(statement).all()
        for sess in results:
            session.delete(sess)
        session.commit()
        return {"Session Deleted": True}

Runtime error: dictionary changed size during iteration

I iterate through the items of a dictionary "var_dict".
As I iterate in a for loop, I need to update the dictionary.
I understand that is not possible, and that it triggers the runtime error I experienced.
My question is: do I need to create a different dictionary to store the data? As it is now, I am trying to use the same dictionary with different keys.
I know the problem is related to iterating through the keys and values of a dictionary while attempting to change it. I want to know whether the best option in this case is to create a separate dictionary.
for k, v in var_dict.items():
    match = str(match)
    match = match.strip("[]")
    match = match.strip("''")
    result = [index for index, value in enumerate(v) if match in value]
    result = str(result)
    result = result.strip("[]")
    result = result.strip("'")
    # ====> If I print(var_dict) at this point, I have no error *********
    if result == "0":
        # A match between the interface on the RP PSE2 model was found; the interface position is on the PSE2 architecture
        print(f'PSE-2 Line cards:{v} Interfaces on PSE2:{entry} Interface PortID:{port_id}')
        port_id = int(port_id)
        print(port_id)
        if port_id >= 19:
            #print(f'interface:{entry} portID={port_id} CPU_POS={port_cpu_pos} REPLICATION=YES')
            if_info = [entry, 'PSE2=YES', port_id, port_cpu_pos, 'REPLICATION=YES']
            var_dict['IF_PSE2'].append(if_info)
            # ===> *** This is the point where, if I attempt to print var_dict, I get the error: dictionary changed size during iteration
        else:
            #print(f'interface:{entry},portID={port_id} CPU_POS={port_cpu_pos} REPLICATION=NO')
            if_info = [entry, 'PSE2=YES', port_id, port_cpu_pos, 'REPLICATION=NO']
            var_dict['IF_PSE2'].append(if_info)
    else:
        # The interface is on a single PSE. No replication is applicable. Just check the threshold between incoming and outgoing rate.
        if_info = [entry, 'PSE2=NO', int(port_id), port_cpu_pos, 'REPLICATION=NO']
        var_dict['IF_PSE1'].append(if_info)
I did a shallow copy, and that allowed me to iterate over a copy of the dictionary while making modifications to the original dictionary. Problem solved. Thanks.
(...)
temp_var_dict = var_dict.copy()
for k, v in temp_var_dict.items():
    (...)
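A minimal, self-contained sketch of that pattern (the keys and values here are made up for illustration): iterating over a snapshot of the dictionary lets you add new keys to the original without tripping the error.

# Iterate over a copy (a snapshot of the items) while mutating the original dict.
var_dict = {'IF_A': ['eth0'], 'IF_B': ['eth1']}

for k, v in var_dict.copy().items():   # dict(var_dict) or list(var_dict.items()) also work
    # Adding a brand-new key to var_dict is safe here, because the loop
    # is driven by the copy, not by var_dict itself.
    var_dict.setdefault('IF_NEW', []).append(k)

print(var_dict)   # {'IF_A': ['eth0'], 'IF_B': ['eth1'], 'IF_NEW': ['IF_A', 'IF_B']}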

accumulator in pyspark with dict as global variable

Just for learning purposes, I tried to use a dictionary as a global variable through an accumulator. The add function works well, but when I run the code and update the dictionary inside the map function, it always comes back empty. Similar code that sets a list as the global variable works as expected. Here is the dictionary version:
import re
from pyspark import AccumulatorParam

class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)

if __name__ == "__main__":
    sc, sqlContext = init_spark("generate_score_summary", 40)
    rdd = sc.textFile('input')
    #print(rdd.take(5))
    dict1 = sc.accumulator({}, DictParam())

    def file_read(line):
        global dict1
        ls = re.split(',', line)
        dict1 += {ls[0]: ls[1]}
        return line

    rdd = rdd.map(lambda x: file_read(x)).cache()
    print(dict1)
For anyone who arrives at this thread looking for a dict accumulator for PySpark: the accepted solution does not solve the posed problem.
The issue is actually in the DictParam defined in the question: it never returns the updated dictionary. This works:
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1
The original code was missing the return value.
I believe that print(dict1) simply gets executed before the rdd.map() does.
In Spark, there are two types of operations:
transformations, which describe the future computation,
and actions, which actually trigger the execution.
Accumulators are updated only when some action is executed:
Accumulators do not change the lazy evaluation model of Spark. If they
are being updated within an operation on an RDD, their value is only
updated once that RDD is computed as part of an action.
If you check out the end of this section of the docs, there is an example exactly like yours:
accum = sc.accumulator(0)

def g(x):
    accum.add(x)
    return f(x)

data.map(g)
# Here, accum is still 0 because no actions have caused the `map` to be computed.
So you would need to add some action, for instance:
rdd = rdd.map(lambda x: file_read(x)).cache()  # transformation
foo = rdd.count()                              # action
print(dict1)
Please make sure to check the details of the various RDD functions and accumulator peculiarities, because this might affect the correctness of your result. (For instance, rdd.take(n) will by default only scan one partition, not the entire dataset.)
For accumulator updates performed inside actions only, their value is
only updated once that RDD is computed as part of an action
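Putting the two answers together, here is a minimal sketch of the working pattern (the 'input' path and comma-separated line format come from the question; the SparkContext setup is illustrative): addInPlace returns the merged dict, and an action such as count() forces the map to run before the accumulator is read.

import re
from pyspark import SparkContext, AccumulatorParam

class DictParam(AccumulatorParam):
    def zero(self, value=None):
        return {}

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)
        return acc1                       # the return the original code was missing

sc = SparkContext(appName="dict_accumulator_demo")
dict1 = sc.accumulator({}, DictParam())

def file_read(line):
    global dict1
    ls = re.split(',', line)
    dict1 += {ls[0]: ls[1]}
    return line

rdd = sc.textFile('input').map(file_read).cache()  # transformations only
rdd.count()                                        # action: triggers the map
print(dict1.value)                                 # now holds the accumulated dict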

DSE graph Batch write with ifnotexist on edges

I am using DSE Graph to load data from an Excel file, preparing addE Gremlin queries in Java code, and then executing them against DSE Graph.
For the current test I need to fire 400,000 addE Gremlin queries with two edge labels.
1) What is the best practice to finish this execution in a few minutes?
Right now I am sending the Gremlin queries in batches of 1000 to dseSession.executeGraph(new SimpleGraphStatement("")), which leads to the exception Method code too large! at groovyjarjarasm.asm.MethodWriter.
2) For the edge labels in this use case, my schema is defined with single cardinality.
I am also using custom vertex IDs for the vertices.
So if an edge already exists, should DSE just ignore it without any exception?
The query parameter should be a simple array that looks like this:
[[from1, to1, label1], [from2, to2, label2], ...]
Then your script should look like this:
for (def triple in arg) {
    def (id1, id2, lbl) = triple
    def v1 = graph.vertices(id1).next()
    def v2 = graph.vertices(id2).next()
    if (!g.V(v1).outE(lbl).filter(inV().is(v2)).hasNext()) {
        v1.addEdge(lbl, v2)
    }
}
Alternatively:
for (def triple in arg) {
    def (id1, id2, lbl) = triple
    def v1 = graph.vertices(id1).next()
    if (!g.V(v1).outE(lbl).filter(inV().hasId(id2)).hasNext()) {
        v1.addEdge(lbl, graph.vertices(id2).next())
    }
}
Try both variants; at least one of them should outperform any other solution.

Parallelize function on dictionary in IPython

Up till now, I have parallelized functions by mapping them onto lists that are distributed out to the various clusters using map_sync(function, list).
Now, I need to run a function on each entry of a dictionary.
map_sync does not seem to work on dictionaries. I have also tried to scatter the dictionary and use decorators to run the function in parallel. However, dictionaries don't seem to lend themselves to scattering either. Is there some other way to parallelize functions on dictionaries without having to convert them to lists?
These are my attempts thus far:
from IPython.parallel import Client

rc = Client()
dview = rc[:]

test_dict = {'43': "lion", '34': "tiger", '343': "duck"}
dview.scatter("test", test_dict)
dview["test"]
# this yields [['343'], ['43'], ['34'], []] on 4 clusters,
# which suggests that a dictionary can't be scattered?
Needless to say, when I run the function itself, I get an error:
@dview.parallel(block=True)
def run():
    for d, v in test.iteritems():
        print d, v

run()

AttributeError                            Traceback (most recent call last)
... in <module>()
... in run(dict)
AttributeError: 'str' object has no attribute 'iteritems'
I don't know if it's relevant, but I'm using an IPython Notebook connected to Amazon AWS clusters.
You can scatter a dict with:
def scatter_dict(view, name, d):
    """partition a dictionary across the engines of a view"""
    ntargets = len(view)
    keys = d.keys()  # list(d.keys()) in Python 3
    for i, target in enumerate(view.targets):
        subd = {}
        for key in keys[i::ntargets]:
            subd[key] = d[key]
        view.client[target][name] = subd

scatter_dict(dview, 'test', test_dict)
and then operate on it remotely, as you normally would.
You can also gather the remote dicts into one local one again with:
def gather_dict(view, name):
    """gather dictionaries from a DirectView"""
    merged = {}
    for d in view.pull(name):
        merged.update(d)
    return merged

gather_dict(dview, 'test')
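Once the dictionary is scattered, each engine holds only its shard under the name test, so you can transform it in place on every engine and then merge the shards back. A small sketch (the upper-casing step is just an arbitrary example operation):

# Run an arbitrary transformation on each engine's shard of the dict.
dview.execute("test = {k: v.upper() for k, v in test.items()}")

# Pull the shards back and merge them into one local dict.
merged = gather_dict(dview, 'test')
print(merged)   # e.g. {'43': 'LION', '34': 'TIGER', '343': 'DUCK'}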
An example notebook
