We want to perform bulk (millions of low-cost) query/read operations on a graph that we have in Cosmos DB using the Gremlin API in .NET C#. What is the best way to do so?
We know about the BulkExecutor library, which only supports Bulk Import and Bulk Update operations. Is there something similar for Bulk Query/Read operations?
Currently I am using the GremlinClient from Gremlin.Net.Driver (https://learn.microsoft.com/en-us/azure/cosmos-db/create-graph-dotnet) to fire queries. However, this driver is very slow compared to the BulkExecutor library's BulkImport operations.
Currently the only Bulk support is in the V3 SDK, and only for point read operations (not queries).
So if you know the Id and Partition key value of your documents, you can leverage Bulk mode with ReadItem operations.
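For illustration, here is a rough sketch of that pattern. It uses the Azure Cosmos DB Java SDK v4 async client rather than the .NET SDK the question mentions, and the account, database, container and key values are made up; the idea (concurrent point reads by known id and partition key) is the same one the Bulk mode ReadItem approach relies on.

import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosAsyncContainer;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.models.PartitionKey;
import com.fasterxml.jackson.databind.node.ObjectNode;
import reactor.core.publisher.Flux;

import java.util.List;

public class BulkPointReads {
    public static void main(String[] args) {
        CosmosAsyncClient client = new CosmosClientBuilder()
                .endpoint("https://my-account.documents.azure.com:443/")   // hypothetical account
                .key("<primary-key>")
                .buildAsyncClient();
        CosmosAsyncContainer container = client.getDatabase("graphdb").getContainer("graph");

        // id / partition-key pairs known up front (hypothetical values)
        List<String[]> keys = List.of(
                new String[]{"vertex-1", "pk-1"},
                new String[]{"vertex-2", "pk-2"});

        // Fire point reads with bounded concurrency; a point read is the cheapest
        // way to fetch a document when both id and partition key are known.
        Flux.fromIterable(keys)
                .flatMap(k -> container.readItem(k[0], new PartitionKey(k[1]), ObjectNode.class), 50)
                .doOnNext(response -> System.out.println(response.getItem()))
                .blockLast();

        client.close();
    }
}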
I'm thinking about learning JanusGraph for use in my new big project, but I can't understand some things.
JanusGraph can be used like any database and supports "insert", "update" and "delete" operations, so JanusGraph will write data into Cassandra or another database to store it, right?
Where does JanusGraph store the nodes, edges, attributes, etc.? It will write these into the database, right?
Does this data have to be loaded into memory by JanusGraph, or will it be read from Cassandra all the time?
Does the data that JanusGraph reads have to be loaded into JanusGraph on every query, or will it do selects against the database to retrieve only the data I need?
Is the data retrieved from the database only what I need, or will JanusGraph read all records in the database every time?
Should I use JanusGraph in my project in production, or should I wait until it becomes production-ready?
I'm developing a kind of social network that needs to store friendships, posts, comments and user blocks, and also do some Elasticsearch; in this case, what database backend should I use?
JanusGraph will write data into Cassandra or another database to store it, right?
Where does JanusGraph store the nodes, edges, attributes, etc.? It will write these into the database, right?
JanusGraph will write the data into whatever storage backend you configure it to use. This includes Cassandra. It writes this data into the underlying database using the data model roughly outlined here.
Does this data have to be loaded into memory by JanusGraph, or will it be read from Cassandra all the time?
Is the data retrieved from the database only what I need, or will JanusGraph read all records in the database every time?
JanusGraph only loads into memory the vertices and edges you touch during a query/traversal. So if you do something like:
graph.traversal().V().hasLabel("My Amazing Label").toList();
Janus will read and load into memory only the vertices with that label. So you don't need to worry about initializing a graph connection and then waiting for the entire graph to be serialised into memory before you can query. Janus is a lazy reader.
Should I use Janus in my project in production or should I wait until it becomes production ready?
That is entirely up to you and your use case. Janus is being used in production already, as can be seen here at the bottom of the page. Janus was forked from, and improved on, TitanDB, which is also used in several production use cases. So if you're wondering "is it ready?", then I would say yes, it's clearly ready given its existing uses.
what database backend should I use?
Again, that's entirely up to you. I use Cassandra because it can scale horizontally and I find it easier to work with. It also seems to suit all different sizes of data.
I have toyed with Google Bigtable and that seems very powerful as well. However, it's only really suited for VERY big data, and it's also only available in the cloud, whereas Cassandra can be hosted locally very easily.
I have not used Janus with HBase or BerkeleyDB so I can't comment there.
It's very simple to change between backends though (all you need to do is adjust some configs and check your dependencies are in place), so during development feel free to play around with the backends. You only really need to commit to a backend when you go to production or are more sure of each backend.
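To give a feel for how little changes, here is a minimal sketch with an embedded JanusGraph instance (the label and property are made up). Swapping storage.backend between "inmemory", "cql" (Cassandra), "hbase" or "berkeleyje" is essentially the whole switch at the code level:

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class BackendSwitchDemo {
    public static void main(String[] args) throws Exception {
        // "inmemory" for experiments; switch to "cql", "hbase" or "berkeleyje"
        // later by changing only this configuration.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")
                .open();

        GraphTraversalSource g = graph.traversal();
        g.addV("My Amazing Label").property("name", "demo").iterate();
        graph.tx().commit();

        // Only vertices carrying this label are read back into memory.
        long count = g.V().hasLabel("My Amazing Label").count().next();
        System.out.println("Loaded " + count + " vertices");

        graph.close();
    }
}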
When considering what storage backend to use for a new project, it's important to consider what tradeoffs you'd like to make. In my personal projects, I've enjoyed using NoSQL graph databases due to the following advantages over relational DBs:
Not needing to migrate schemas increases productivity when rapidly iterating on a new project
Traversing a heavily normalized data-model is not as expensive as with JOINs in an RDBMS
Most include in-memory configurations which are great for experimenting & testing.
Support for multi-machine clusters and Partition Tolerance.
Here are sample JanusGraph and Neo4j backends written in Kotlin:
https://github.com/pm-dev/janusgraph-exploration
https://github.com/pm-dev/neo4j-exploration
The main advantage with JanusGraph is the flexibility of plugging in whichever storage backend you'd like.
We will be moving from Oracle to MarkLogic 8 as our datastore and will be using MarkLogic's Java API to work with the data.
I am looking for a UI tool (like SQL Developer for Oracle) that can be used with MarkLogic. I found that MarkLogic's Query Manager can be used for accessing data. But I see multiple options with respect to language:
SQL
SPARQL
XQuery
JavaScript
We need to perform CRUD operations and search for data, and our testing team is familiar with SQL (from Oracle), so I am confused about which route I should follow and on what basis I should decide which one or two will be better to explore. We are most likely to use the JSON document type.
Any help/suggestions would be helpful.
You already mention you will be using the MarkLogic Java Client API; that should provide most of the common needs you could have, including search, CRUD, facets, lexicon values, and also custom extensions through REST extensions, as the Client API leverages the MarkLogic REST API. It saves you from having to code inside MarkLogic to a large extent.
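For example, here is a minimal sketch with the Java Client API (assuming a recent 4.x client; host, port, credentials, document URI and search term are made up, and older client versions pass credentials slightly differently):

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.StringQueryDefinition;

public class MarkLogicCrudAndSearch {
    public static void main(String[] args) {
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("myuser", "mypassword"));

        // CRUD on a JSON document.
        JSONDocumentManager docMgr = client.newJSONDocumentManager();
        String uri = "/orders/order-1.json";
        docMgr.write(uri, new StringHandle("{\"status\":\"new\"}").withFormat(Format.JSON));
        String doc = docMgr.read(uri, new StringHandle()).get();
        System.out.println(doc);
        docMgr.delete(uri);

        // Simple string (word) search using the default query options.
        QueryManager queryMgr = client.newQueryManager();
        StringQueryDefinition query = queryMgr.newStringDefinition();
        query.setCriteria("new");
        SearchHandle results = queryMgr.search(query, new SearchHandle());
        System.out.println("Matches: " + results.getTotalResults());

        client.release();
    }
}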
Apart from that, you can run ad hoc commands from the Query Console, using one of the above-mentioned languages. SQL will require the presence of a so-called SQL view (see also your earlier question Using SQL in Query Manager in MarkLogic). SPARQL will require enabling the triple index, and ingestion of RDF data.
That leaves XQuery and JavaScript, which have pretty much identical expressive power and performance. If you are unfamiliar with XQuery and XML languages in general, JavaScript might be more appealing.
HTH!
We are currently testing out Riak. We have a huge bucket in Riak with millions of keys, and I need to query all the keys and save them to a file.
We are using Java as the API.
Is there any way to get the result of the query paged?
Listing millions of keys in Riak is not recommended in production environments as it is a very expensive operation. If you still need to do it, it is best to use the list keys function as this allows Riak to stream results back to the client and will work for any backend.
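For reference, a rough sketch with the Riak Java client 2.x (node address, bucket and file name are made up); the response is iterable, so keys can be written out to the file as they come back:

import com.basho.riak.client.api.RiakClient;
import com.basho.riak.client.api.commands.buckets.ListKeys;
import com.basho.riak.client.core.query.Location;
import com.basho.riak.client.core.query.Namespace;

import java.io.PrintWriter;

public class DumpKeys {
    public static void main(String[] args) throws Exception {
        RiakClient client = RiakClient.newClient("127.0.0.1");
        Namespace bucket = new Namespace("my_huge_bucket");

        ListKeys listKeys = new ListKeys.Builder(bucket).build();
        ListKeys.Response response = client.execute(listKeys);

        try (PrintWriter out = new PrintWriter("keys.txt")) {
            // Iterate over the returned locations and write each key to the file.
            for (Location location : response) {
                out.println(location.getKeyAsString());
            }
        }

        client.shutdown();
    }
}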
While it is possible to perform paging for secondary index queries if you are using LevelDB or the memory backend in version 1.4 onwards, this requires sorting on the server side and is therefore not recommended for such large result sets.
I was wondering how SQLite behaves when it's given multiple databases to insert/update/delete in at the same time. Does it spawn multiple processes, which can in theory give better concurrency than using a single database/single process, or does it use the same process for each?
Searching through the documentation didn't provide me with a definitive answer. I am aware that SQLite isn't the most ideal environment for multiple writes, as the database resides in a single file. But does that mean that multiple files = different write processes?
databaseOne = connectToSqlite('databaseOne');
databaseTwo = connectToSqlite('databaseTwo');
function write() {
    queryDatabaseOne("INSERT INTO some_table VALUES (some_values)");
    queryDatabaseTwo("INSERT INTO some_table VALUES (some_values)");
}
So, two different SQLite databases, and two inserts executed in parallel against tables in the two databases.
Thanks
Normally, database queries are blocking: they do not return until they are complete. This helps secure the integrity of the database. The SQLite API is blocking.
Of course, if you have multiple databases, then you can write a multi-threaded application with non-blocking routines that call the SQLite API, and then code overlapping, parallel inserts to the multiple databases. You will have to be careful about all the usual things in a multithreaded application (the SQLite API will neither help nor hinder), with the added complication of ensuring that there is no possibility of overlapping accesses to the SAME database.
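A minimal sketch of that idea in Java (using the xerial sqlite-jdbc driver; the database file names, table and values are made up), with one thread and one connection per database file:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ParallelSqliteWrites {

    // Each worker owns its own connection to its own database file, so the
    // two inserts do not contend for the same file lock.
    static void insertInto(String dbFile) {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbFile);
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS some_table (value TEXT)");
            stmt.executeUpdate("INSERT INTO some_table (value) VALUES ('some value')");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread one = new Thread(() -> insertInto("databaseOne.db"));
        Thread two = new Thread(() -> insertInto("databaseTwo.db"));
        one.start();
        two.start();
        one.join();
        two.join();
    }
}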
I have a problem with my iPad app. I have to convert a PostgreSQL database to an SQLite database. Luckily I managed that, but I don't know how to convert the PostgreSQL functions. Does anyone know how to do it? Or how can I get the same result in SQLite?
Thanks in advance, Luca
As you already discovered, SQLite has no stored functions. So you will probably have to move some logic from the function and the single query out of your DB into your app. This will probably mean executing a query, looking at the result and, based on that, executing some more queries, so you can do the logic of the function in your app.
While this is not recommended in PostgreSQL, as performance suffers from many small queries, the performance loss in SQLite is much smaller: it is an embedded database (no communication overhead), and its planner is more straightforward, which makes the planning overhead of separate queries smaller.
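As an illustration of that pattern (sketched in Java/JDBC here; the same query-inspect-query flow applies with the SQLite C API on iOS, and the orders table and the shipped/archived rule are made-up stand-ins for whatever your PostgreSQL function did):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AppSideFunctionLogic {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db")) {
            int orderId = 42;

            // Step 1: run a plain query and look at the result.
            String status = null;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT status FROM orders WHERE id = ?")) {
                ps.setInt(1, orderId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        status = rs.getString("status");
                    }
                }
            }

            // Step 2: the branching the stored function used to perform
            // now happens in application code, followed by another query.
            if ("shipped".equals(status)) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPDATE orders SET archived = 1 WHERE id = ?")) {
                    ps.setInt(1, orderId);
                    ps.executeUpdate();
                }
            }
        }
    }
}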