NebulaGraph Database: How to write data with spark-connector in pyspark?


In the example, I saw how to write data in Scala. Is there a way to write NebulaGraph data in Python?
/spark/bin/pyspark --driver-class-path nebula-spark-connector-3.0.0.jar --jars nebula-spark-connector-3.0.0.jar
df = spark.read.format(
"com.vesoft.nebula.connector.NebulaDataSource").option(
"type", "vertex").option(
"spaceName", "basketballplayer").option(
"label", "player").option(
"returnCols", "name,age").option(
"metaAddress", "metad0:9559").option(
"partitionNumber", 1).load()

It seems that pyspark is already supported by nebula-spark-connector.
The related request, GitHub issue #19, has been addressed and closed.
If you search for "pyspark" in the project's GitHub README, you'll see some examples.
Just make sure that you set the path to the spark-connector jar file in SparkConf before starting your Spark application.
An example taken from the README:
df.write.format("com.vesoft.nebula.connector.NebulaDataSource").option(
"type", "vertex").option(
"spaceName", "basketballplayer").option(
"label", "player").option(
"vidPolicy", "").option(
"vertexField", "_vertexId").option(
"batch", 1).option(
"metaAddress", "metad0:9559").option(
"graphAddress", "graphd1:9669").option(
"passwd", "nebula").option(
"user", "root").save()
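The jar setup mentioned above can also be done from a standalone Python script instead of pyspark command-line flags. A minimal sketch, under assumptions: the helper name connector_conf and the jar path are hypothetical, while spark.jars and spark.driver.extraClassPath are the standard Spark properties that correspond to the --jars and --driver-class-path flags used in the question.

```python
def connector_conf(jar_path):
    """Return Spark properties that put the connector jar on the classpath.

    jar_path is a placeholder; point it at your nebula-spark-connector jar.
    """
    return {
        "spark.jars": jar_path,                   # same role as --jars
        "spark.driver.extraClassPath": jar_path,  # same role as --driver-class-path
    }


conf = connector_conf("nebula-spark-connector-3.0.0.jar")

# With pyspark installed, the properties can be applied before the session starts:
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("nebula-write")
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

The properties have to be set before the session is created, because the classpath cannot be changed once the driver JVM is running.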

Related

How to append / add layers to geopackages in PyQGIS

For a project I am creating different layers which should all be written into one geopackage.
I am using QGIS 3.16.1 and the Python console inside QGIS, which runs on Python 3.7.
I tried many things but cannot figure out how to do this. This is what I used so far.
vl = QgsVectorLayer("Point", "points1", "memory")
vl2 = QgsVectorLayer("Point", "points2", "memory")
pr = vl.dataProvider()
pr.addAttributes([QgsField("DayID", QVariant.Int), QgsField("distance", QVariant.Double)])
vl.updateFields()
f = QgsFeature()
for x in range(len(tag_temp)):
    f.setGeometry(QgsGeometry.fromPointXY(QgsPointXY(lon[x],lat[x])))
    f.setAttributes([dayID[x], distance[x]])
    pr.addFeature(f)
vl.updateExtents()
# I'll do the same for vl2 but with other data
uri ="D:/Documents/QGIS/test.gpkg"
options = QgsVectorFileWriter.SaveVectorOptions()
context = QgsProject.instance().transformContext()
QgsVectorFileWriter.writeAsVectorFormatV2(vl,uri,context,options)
QgsVectorFileWriter.writeAsVectorFormatV2(vl2,uri,context,options)
The problem is that a layer called 'test' is created in 'test.gpkg', not 'points1' or 'points2'.
And the second QgsVectorFileWriter.writeAsVectorFormatV2() also overwrites the output of the first one instead of appending the layer into the existing geopackage.
I also tried to create separate geopackages and then use the 'Package Layers' processing tool (processing.run("native:package")) to merge all layers into one geopackage, but then the attribute types are all converted into strings, unfortunately.
Any help is much appreciated. Many thanks in advance.

You need to change the SaveVectorOptions, in particular the actionOnExistingFile mode, after creating the gpkg file:
options = QgsVectorFileWriter.SaveVectorOptions()
# options.driverName = "GPKG"
# name the layer after the memory layer instead of the file name
options.layerName = vl.name()
QgsVectorFileWriter.writeAsVectorFormatV2(vl, uri, context, options)
# switch mode to add the layer to the existing file instead of overwriting the file
options.actionOnExistingFile = QgsVectorFileWriter.CreateOrOverwriteLayer
options.layerName = vl2.name()
QgsVectorFileWriter.writeAsVectorFormatV2(vl2, uri, context, options)
The documentation is here : SaveVectorOptions
Regarding the 'Package Layers' processing tool (native:package) converting the attribute types into strings: using that tool is definitely the recommended way, so please consider reporting the bug.

rpmUtils.miscutils in python3.6

I am refactoring code from Python 2 (RHEL 7.6) to Python 3 (RHEL 8.2) and I have a problem with a library that is missing in Python 3.6.
Problem:
from rpmUtils.miscutils import splitFilename
ModuleNotFoundError: No module named 'rpmUtils'
I've tried installing the python3-dnf and python3-rpm packages on RHEL 8, but it still doesn't work. Is there a way to use this library with Python 3.6 on RHEL 8, or should I write a custom function myself?
Thank you for your answer.

This library was indeed removed, but you have several other options you can use.
Please note that these other functions expect to receive a string in the NEVRA (name, epoch, version, release, architecture) format as an input, not a filename. Thus you must remove the '.rpm' extension of the filename, in order to get a NVRA string (epoch normally is not included in the filename of the RPM package).
So basically you have 2 options:
to use dnf, as suggested in e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1364504
to use hawkey, e.g.:
import os
import hawkey
rpm_base_filename = os.path.basename(rpm_file)
nevra = hawkey.split_nevra(rpm_base_filename[:-len(".rpm")])
name = nevra.name
version = nevra.version
release = str(nevra.release)
epoch = str(nevra.epoch)
arch = nevra.arch
For example here is a patch for such modification that I made for one of the tools we use as part of the oVirt release process:
https://github.com/oVirt/releng-tools/commit/823405e6b261f7ff27ddbba0e8fa2b86dd2a8698
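As for the custom-function route raised in the question, a minimal pure-Python sketch is shown below. It assumes the usual name-version-release.arch.rpm filename layout with an optional leading "epoch:" prefix; the function name split_rpm_filename is hypothetical, and the return order (name, version, release, epoch, arch) is chosen to mirror the old splitFilename. It is not a drop-in replacement validated against every package name.

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its parts.

    Returns (name, version, release, epoch, arch); epoch is '' when absent.
    Expects a base filename, not a full path.
    """
    if filename.endswith(".rpm"):
        filename = filename[:-len(".rpm")]
    # Peel the fields off from the right, since the name itself may contain '-'.
    rest, _, arch = filename.rpartition(".")
    rest, _, release = rest.rpartition("-")
    rest, _, version = rest.rpartition("-")
    # An optional 'epoch:' prefix precedes the name.
    epoch, _, name = rest.rpartition(":")
    return name, version, release, epoch, arch


parts = split_rpm_filename("foo-1.0-1.i386.rpm")
with_epoch = split_rpm_filename("1:bar-9-123a.ia64.rpm")
print(parts)
print(with_epoch)
```

Splitting from the right matters because package names often contain hyphens, whereas the version and release fields do not.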

gremlin python clone traversal

I'm using gremlin-python to connect to gremlin-server and I'm trying to build up a query incrementally but I'm getting stuck. I have an initial part of my query like the following:
query = g.V().hasLabel('<some_label>')
Now I would like to do multiple things with this query, firstly I just want a count:
query.count().next()
Now if I do anything else using the query variable, the count step is still on the traversal, so something like the following doesn't work:
query.out('<some_edge_label>').valueMap().toList()
Looking at the docs it seems like I need to clone the traversal so I replaced the above with:
query = g.V().hasLabel('<some_label>')
count_query = query.clone()
count_query.count().next()
But query still has the count() step on it, when I print the bytecode even though I cloned it. Is this the expected behaviour for gremlin-python? Here is a complete example of what I'm talking about, printing the bytecode at each step:
query = g.V().hasLabel('alabel')
print(query)
q_count = query.clone()
print(q_count.count())
print(query)
[['V'], ['hasLabel', 'alabel']]
[['V'], ['hasLabel', 'alabel'], ['count']]
[['V'], ['hasLabel', 'alabel'], ['count']]
What do I do to clone/copy the start of the traversal so I can reuse it in gremlin-python?

There were some fixes in the area of deep cloning traversals in the 3.4.7 (3.3.11) [1] [2] Apache TinkerPop release (June 2020). Installing one of those drivers should help.
[1] https://github.com/apache/tinkerpop/blob/master/CHANGELOG.asciidoc
[2] https://issues.apache.org/jira/browse/TINKERPOP-2350

It looks like this issue was a bug in gremlin-python and has been fixed in version 3.4.7. Updating the version solved my issue.
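To illustrate the kind of problem those deep-cloning fixes address, here is a toy model in plain Python. It is not gremlin-python's implementation: the ToyTraversal class and its shallow_clone and deep_clone methods are hypothetical names invented for this sketch. It only shows how a clone that shares a mutable step list with its source would leak the count step back into the original, as in the output from the question, while a deep copy would not.

```python
import copy


class ToyTraversal:
    """Toy stand-in for a traversal (hypothetical; not the gremlin-python class)."""

    def __init__(self, steps=None):
        self.steps = steps if steps is not None else []

    def step(self, name, *args):
        # Append a step in place and return self so calls can be chained.
        self.steps.append([name, *args])
        return self

    def shallow_clone(self):
        # The clone shares the very same list object as the original.
        return ToyTraversal(self.steps)

    def deep_clone(self):
        # The clone gets its own independent copy of the steps.
        return ToyTraversal(copy.deepcopy(self.steps))


query = ToyTraversal().step("V").step("hasLabel", "alabel")

counted = query.deep_clone().step("count")
after_deep = copy.deepcopy(query.steps)      # snapshot: count did not leak

leaky = query.shallow_clone().step("count")
after_shallow = copy.deepcopy(query.steps)   # snapshot: count leaked into query

print(after_deep)
print(after_shallow)
```

On a driver version without the fix, one way to sidestep cloning entirely is to rebuild the starting traversal from g each time it is needed, for example inside a small function.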

Where can I find all command functions of Atom editor?

I can't find it anywhere on the internet. I made some custom key bindings using stuff like:
editor = @getModel()
bufferRow = editor.bufferPositionForScreenPosition(editor.getCursorScreenPosition()).row
if editor.isFoldedAtBufferRow(bufferRow)
  editor.unfoldBufferRow(bufferRow)
else
  editor.foldBufferRow(bufferRow)
and
atom.workspace.getActiveTextEditor()?.selectAll()
atom.workspace.getActiveTextEditor()?.pasteText()
atom.workspace.getActiveTextEditor()?.save()
But I had to search so much. Is there no list of all the functions I can use?
PS: The closest thing to it that I found was: https://gist.github.com/philipmadeley/1fb35efdf5ab639c12c6

Atom has an abundance of documentation over at https://atom.io/docs/
More specifically, this would be what you were looking for:
https://atom.io/docs/api/v1.16.0/TextEditor

Creating graph in titan from data in csv - example wiki.Vote gives error

I am new to Titan - I loaded titan and successfully ran GraphOfTheGods example including queries given. Next I went on to try bulk loading csv file to create graph and followed steps in Powers of ten - Part 1 http://thinkaurelius.com/2014/05/29/powers-of-ten-part-i/
I am getting an error in loading wiki-Vote.txt
gremlin> g = TitanFactory.open("/tmp/1m")
Backend shorthand unknown: /tmp/1m
I tried:
g = TitanFactory.open('conf/titan-berkeleydb-es.properties')
but get an error in the next step in load-1m.groovy
==>titangraph[berkeleyje:/titan-0.5.4-hadoop2/conf/../db/berkeley]
No signature of method: groovy.lang.MissingMethodException.makeKey() is applicable for argument types: () values: [] Possible solutions: every(), any()
Any hints on what to do next? I am using Groovy for the first time. What kind of Groovy expertise is needed for working with Gremlin?

That blog post is meant for Titan 0.4.x. The API shifted when Titan went to 0.5.x. The same principles discussed in the posts generally apply to data loading but the syntax is different in places. The intention is to update those posts in some form when Titan 1.0 comes out with full support of TinkerPop3. Until then, you will need to convert those code examples to the revised API.
For example, an easy way to create a berkeleydb database is with:
g = TitanFactory.build()
.set("storage.backend", "berkeleyje")
.set("storage.directory", "/tmp/1m")
.open();
Please see the docs here. Then most of the schema creation code (which is the biggest change) is now described here and here.

After much experimenting today, I finally figured it out. A lot of changes were needed:
Use makePropertyKey() instead of makeKey(), and makeEdgeLabel() instead of makeLabel()
Use cardinality(Cardinality.SINGLE) instead of unique()
Building the index is quite a bit more complicated. Use the management system instead of the graph both to make the keys and labels, as well as build the index (see https://groups.google.com/forum/#!topic/aureliusgraphs/lGA3Ye4RI5E)
For posterity, here's the modified script that should work (as of 0.5.4):
g = TitanFactory.build().set("storage.backend", "berkeleyje").set("storage.directory", "/tmp/1m").open()
m = g.getManagementSystem()
k = m.makePropertyKey('userId').dataType(String.class).cardinality(Cardinality.SINGLE).make()
m.buildIndex('byId', Vertex.class).addKey(k).buildCompositeIndex()
m.makeEdgeLabel('votesFor').make()
m.commit()
getOrCreate = { id ->
    def p = g.V('userId', id)
    if (p.hasNext()) {
        p.next()
    } else {
        g.addVertex([userId:id])
    }
}
new File('wiki-Vote.txt').eachLine {
    if (!it.startsWith("#")){
        (fromVertex, toVertex) = it.split('\t').collect(getOrCreate)
        fromVertex.addEdge('votesFor', toVertex)
    }
}
g.commit()
