What is the difference between TinkerPop and TitanDB? - bigdata

As a beginner, I want to know the difference between TinkerPop and Titan.

TitanDB is a graph database engine with pluggable backend storage (such as Cassandra) and an optional query index (such as Elasticsearch). In other words, it builds property graph data models, stores them in one of the supported backend stores, and can optionally rely on a product such as Elasticsearch for indexing to speed up queries.
TinkerPop is a framework that sits on top of TitanDB. It also supports other graph databases, such as Neo4j. One of the many features implemented in TinkerPop is the Gremlin graph query language (analogous to SQL for relational databases), which interacts with graph databases like Titan to help the user create and query graph data.
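To make the layering concrete, here is a minimal Python sketch. It is not Titan or TinkerPop code: GraphStore, InMemoryStore, and Traversal are hypothetical names invented for illustration. The store stands in for the database engine, and the traversal class stands in for the query layer that depends only on the store's interface.

```python
from typing import Dict, Iterable, List, Protocol


class GraphStore(Protocol):
    """Hypothetical storage interface: the role a database engine such as Titan plays."""

    def vertices(self) -> Iterable[Dict[str, object]]: ...


class InMemoryStore:
    """Toy backend that keeps vertices in a list (a stand-in for a real storage layer)."""

    def __init__(self) -> None:
        self._vertices: List[Dict[str, object]] = []

    def add_vertex(self, label: str, **props: object) -> None:
        self._vertices.append({"label": label, **props})

    def vertices(self) -> Iterable[Dict[str, object]]:
        return list(self._vertices)


class Traversal:
    """Toy query layer: the role a framework such as TinkerPop plays.

    It depends only on the GraphStore interface, so another backend could be swapped in.
    """

    def __init__(self, store: GraphStore) -> None:
        self._items = list(store.vertices())

    def has(self, label: str, key: str, value: object) -> "Traversal":
        # Keep only vertices with the given label and property value.
        self._items = [
            v for v in self._items
            if v["label"] == label and v.get(key) == value
        ]
        return self

    def to_list(self) -> List[Dict[str, object]]:
        return self._items


store = InMemoryStore()
store.add_vertex("person", name="Diego")
store.add_vertex("person", name="Ana")
store.add_vertex("city", name="Diego")

result = Traversal(store).has("person", "name", "Diego").to_list()
print(result)  # [{'label': 'person', 'name': 'Diego'}]
```

Because the query layer only sees the interface, the same traversal code would work against any store that implements vertices(), which is the sense in which one framework can sit on top of several graph databases.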

Related

Is there a way to see what a Cosmos DB Gremlin API call looks like under the hood?

If I write: g.V().has("person","name","Diego");
What atom-record-sequence calls are used to make this query work?
Is there a way to directly query the atom-record-sequence in a Cosmos DB?
According to this Microsoft doc, the core type system of Azure Cosmos DB's database engine is atom-record-sequence (ARS) based. Atoms consist of a small set of primitive types (e.g. string, bool, number), records are structs, and sequences are arrays consisting of atoms, records, or sequences. The database engine of Azure Cosmos DB is capable of efficiently translating and projecting the data models onto the ARS-based data model.
The engine is agnostic to the concept of a schema, blurring the boundary between the structure and instance values of records. Cosmos DB achieves full schema agnosticism by automatically indexing everything upon ingestion in an efficient manner, which allows users to query their globally distributed data without having to deal with schema or index management.
The article below explains this pretty well:
https://learn.microsoft.com/en-us/azure/cosmos-db/global-dist-under-the-hood
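As a rough illustration of the atom/record/sequence idea only, the Python sketch below classifies a nested value into those three kinds. The mapping from Python types to ARS kinds, the function names, and the sample vertex are assumptions made for this example; this is not how Cosmos DB stores a Gremlin vertex internally, and it is not an API for querying ARS directly.

```python
from typing import Any, List, Tuple


def ars_kind(value: Any) -> str:
    """Classify a Python value as an atom, record, or sequence (assumed mapping)."""
    if isinstance(value, dict):
        return "record"
    if isinstance(value, (list, tuple)):
        return "sequence"
    if isinstance(value, (str, bool, int, float)):
        return "atom"
    raise TypeError(f"no ARS mapping assumed for {type(value).__name__}")


def describe(value: Any, path: str = "$") -> List[Tuple[str, str]]:
    """Walk a nested value and list a (path, kind) pair for every node."""
    kind = ars_kind(value)
    out = [(path, kind)]
    if kind == "record":
        for key, child in value.items():
            out.extend(describe(child, f"{path}.{key}"))
    elif kind == "sequence":
        for i, child in enumerate(value):
            out.extend(describe(child, f"{path}[{i}]"))
    return out


# Hypothetical vertex, loosely modelled on g.V().has("person","name","Diego").
vertex = {"label": "person", "name": "Diego", "tags": ["a", "b"]}
for path, kind in describe(vertex):
    print(path, kind)
```

Every value ends up addressable by a path, which gives some intuition for how an engine could index everything on ingestion without being told a schema.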

Weaviate Search Graph Vs. GA of IBM Graph

The IBM Graph service is only described in terms of how it can add and store properties as key/value pairs associated with the data, for both vertices (nodes) and the edges connecting them, rather than in the more traditional tabular form of rows and columns. However, how does the GA of IBM Graph compare to a knowledge graph with Weaviate Search Graph (GraphQL - RESTful - Semantic Search - Semantic Classification - Knowledge Representation)?
The answer is based on the assumption that you mean JanusGraph because of this article.
JanusGraph is a Graph DBMS optimized for distributed clusters.
Weaviate is a GraphQL based semantic search engine.
The main difference is that JanusGraph focusses on large graphs whereas Weaviate focusses on search where results are represented in graph format.
You might pick JanusGraph if you want to store and analyze a large graph.
You might pick Weaviate if you build a search engine and/or store data in graph format.
Something Weaviate can do that other search engines (regardless of whether they are graph-based or not) cannot is index your data semantically; e.g., you can find a data object about the publication Vogue by searching for fashion.
Weaviate query example:
Disclaimer:
I'm part of the team working on Weaviate.

Does Gremlin support Namespace?

I'd like to create different graphs for different domains, so I need some kind of namespace or schema, just like the "schema" concept in an RDBMS. Does Gremlin support namespaces or something similar?
Thanks
There is no notion of a schema name in the Gremlin language that is exactly like what you typically have in SQL. Your Gremlin query is bound to the graph to which you connect. If you have two or more domains then you either:
Create one graph per domain in which you can't traverse across those domains (you'd have to combine results after traversals - without explicit edges, i.e. joins, to connect the domains Gremlin has no way to do those sorts of queries), or
Create one large graph to house both domains and then constrain your traversal to the domain (in TinkerPop this is sometimes accomplished with PartitionStrategy)
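The second option can be sketched in plain Python. This only mimics the idea of tagging every vertex with a partition and filtering on read; it is not TinkerPop's PartitionStrategy API, and PartitionedGraph, the "_partition" key, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PartitionedGraph:
    """One shared graph; every vertex carries a hypothetical '_partition' tag."""

    vertices: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)

    def add_vertex(self, vid: str, partition: str, **props: str) -> None:
        self.vertices[vid] = {"_partition": partition, **props}
        self.edges.setdefault(vid, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def out(self, vid: str, read_partitions: Set[str]) -> List[str]:
        """Follow outgoing edges, keeping only neighbours in the allowed partitions."""
        return [
            dst for dst in self.edges[vid]
            if self.vertices[dst]["_partition"] in read_partitions
        ]


g = PartitionedGraph()
g.add_vertex("u1", "sales", name="Ana")
g.add_vertex("o1", "sales", name="order-1")
g.add_vertex("t1", "support", name="ticket-1")
g.add_edge("u1", "o1")
g.add_edge("u1", "t1")

print(g.out("u1", {"sales"}))             # ['o1']
print(g.out("u1", {"sales", "support"}))  # ['o1', 't1']
```

Restricting the read set keeps a traversal inside one domain, while widening it allows cross-domain traversals, which is not possible when each domain lives in its own graph.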

Performance: SQL Server 2017 Graph vs Neo4j

I am researching graph databases. I stumbled upon SQL Server 2017 and learned that it added the option to use a graph database, but I have some uncertainties about the performance. I have gone through several YouTube videos, tutorials, and papers about SQL Server 2017 Graph. For example this page.
With the image above in mind, when I try to find a node, is it true that the time complexity is O(n)? And is the performance in other graph databases like Neo4j similar? I am only talking about node lookup, not shortest path algorithms etc.
I also have a feeling that the graph functionality in SQL Server is just a relational database in disguise. Is this correct?
Thanks in advance.
There is a big difference between a graph database and a relational database with graph capabilities, in the sense of how the data is stored.
To summarise simply, when a triple (i.e. two nodes connected by a relationship) is stored, the difference in the underlying database is:
Neo4j: the triple is stored as a graph on disk. Nodes have pointers to their relationships, so retrieval is just pointer chasing from the nodes.
SQL-like: one node is stored in one table and the other node in another table. You can still query the data as a graph, but the operation is really performing a JOIN.
Based on those two facts, we can say that in native graphs the join is performed at write time compared to having joins at query time in non-native graphs.
Be very careful when you hear about distributed graphs, partitions, planet scale and the like. If you start having relationships that have to be traversed over the network, you will always suffer performance issues. Most distributed graph platforms also note that for maximum performance you have to store everything on one partition (which defeats the purpose of partitioning).
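The storage difference described above can be sketched in Python. This is a toy model under stated assumptions, not how Neo4j or SQL Server are implemented: the table contents, the Node class, and both function names are made up, and the join here is a naive scan where a real engine would use indexes.

```python
from typing import Dict, List, Tuple

# Relational-style layout: nodes and edges live in separate "tables".
people: List[Tuple[int, str]] = [(1, "Ana"), (2, "Ben"), (3, "Cy")]
knows: List[Tuple[int, int]] = [(1, 2), (1, 3), (2, 3)]


def friends_via_join(person_id: int) -> List[str]:
    """Find neighbours at query time by matching edge rows against node rows."""
    names = []
    for src, dst in knows:
        if src == person_id:
            for pid, name in people:
                if pid == dst:
                    names.append(name)
    return names


class Node:
    """Native-style layout: each node holds direct references to its neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.knows: List["Node"] = []


# The "join" is resolved once, at write time, when the references are stored.
nodes: Dict[int, Node] = {pid: Node(name) for pid, name in people}
for src, dst in knows:
    nodes[src].knows.append(nodes[dst])


def friends_via_pointers(start: Node) -> List[str]:
    """Find neighbours by following stored references; no table lookup is needed."""
    return [n.name for n in start.knows]


print(friends_via_join(1))             # ['Ben', 'Cy']
print(friends_via_pointers(nodes[1]))  # ['Ben', 'Cy']
```

Both functions return the same answer; the difference is where the matching work happens, at query time in the first case and at write time in the second.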

Should CosmosDB be modeled like a document database or a graph database?

I see that CosmosDb can support both graph queries and more traditional SQL-like queries; however, I'm a bit confused about what kind of underlying schema is best at the collections level. If I were to model something in MongoDb, SQL Server, or Neo4j, I would have very different schemas. Also, it seems like I can query using more traditional SQL-like syntax, which makes it confusing to know what's right or efficient underneath. Making something easy to query does not mean that one should assume it's an efficient query.
Is CosmosDb at its heart a document database that I should model accordingly, or is it a very different beast?
Example use case
Here's an example- let's say I have:
a user profile
multiple post types (photo, blog, question)
users can like photos
users can comment on photos, blogs, questions
With a sql database I would have tables:
profiles
photos
blogs
questions
and join tables with referential integrity to support the actions:
photoLikes
blogComments
photoComments
questionComments
With a graph database
I would just have the same core tables
profiles
photos
blogs
questions
and just create graph relationship types for like and comment - relying on the code business logic to enforce the rule that you can't like blogs, etc..
With a document db like MongoDb
Again, I might have the same core tables
profiles
photos
blogs
questions
Comments would be sub collections under each - and there would be a question of whether we want to keep the likes as an embedded collection under each profile, or under photos.. and we would have to increment and sync a like count to the other collection (depending on the use case we might create a like collection as well). Comments would be tucked under each photo, blog or question as an embedded collection and not have their own top-level collection.
So my question is this:
How do we model this schema in CosmosDB? Should we model it like a traditional Document Database like MongoDb, or does having access to a graph query allow us additional freedoms like not having to denormalize fields for actions such as "like?"
Azure Cosmos DB database engine is designed to be fully schema-agnostic.
A container (which can be a graph, a collection of documents, or a table) is a schema-agnostic container of arbitrary user-generated content, which gets automatically indexed upon ingest. To better understand the details, I suggest reading "Schema-Agnostic Indexing with Azure DocumentDB" - http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf, which applies equally to Cosmos DB.
How do we model this schema in CosmosDB? Should we model it like a traditional Document Database like MongoDb, or does having access to a graph query allow us additional freedoms like not having to denormalize fields for actions such as "like?"
When you start modeling data in Azure Cosmos DB, you need to consider: 1. Is your application read heavy or write heavy? 2. How is your application going to query and update data? etc. Normally, denormalized data models can provide better read performance, while normalized models can provide better write performance.
This article explains, with examples, how to model document data for NoSQL databases and covers scenarios for using embedded data models, normalized data models, and hybrid data models, which should be helpful.
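The read/write trade-off can be sketched with plain Python dictionaries standing in for documents. The field names (likeCount, photoId, and so on) and helper functions are hypothetical, chosen to match the photo and like example from the question; this is not Cosmos DB SDK code.

```python
from typing import Dict, List

# Denormalized (embedded): comments live inside the photo document and a like
# count is kept alongside, so one read returns everything needed to render it.
photo_embedded: Dict[str, object] = {
    "id": "photo-1",
    "ownerId": "user-1",
    "likeCount": 2,
    "comments": [
        {"userId": "user-2", "text": "Nice shot"},
        {"userId": "user-3", "text": "Great colours"},
    ],
}

# Normalized (referenced): likes are separate documents pointing at the photo.
likes: List[Dict[str, str]] = [
    {"userId": "user-2", "photoId": "photo-1"},
    {"userId": "user-3", "photoId": "photo-1"},
]


def like_count_embedded(photo: Dict[str, object]) -> int:
    """Read path for the embedded model: the count is already in the document."""
    return int(photo["likeCount"])


def like_count_normalized(photo_id: str, all_likes: List[Dict[str, str]]) -> int:
    """Read path for the normalized model: count the matching like documents."""
    return sum(1 for like in all_likes if like["photoId"] == photo_id)


def add_like_normalized(user_id: str, photo_id: str, all_likes: List[Dict[str, str]]) -> None:
    """Write path for the normalized model: append one small document."""
    all_likes.append({"userId": user_id, "photoId": photo_id})


add_like_normalized("user-4", "photo-1", likes)
print(like_count_embedded(photo_embedded))      # 2 (stale until the count is synced)
print(like_count_normalized("photo-1", likes))  # 3
```

The embedded count is cheap to read but goes stale unless every write also updates it, while the normalized write is a single small append and the read has to aggregate.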

Resources