Some generic questions about Neo4j and servlets

I'm new to non-PHP web applications and to NoSQL databases. I was looking for a good fit for my application's requirements, and I was very pleasantly surprised to learn that graph-based databases exist. Neo4j looks very nice and very suitable for my application, but as I've already written, I'm new to this and have some gaps in understanding how it works. I hope you can help me learn.
If I embed Neo4j in a servlet program, then the database instance I create is shared among the different threads of that servlet, right? So I need to put database creation in the init() method and the shutdown in destroy(), right? And it will be thread-safe, right? But what if I want to create a database shared among the whole application?
I heard that graph databases in general rely on a relational layer underneath. Is that true for Neo4j? If it is, then what I see is a high-level interface to the real persistence layer, so what is a Connection in this case? Are there techniques like connection pooling, or are all of these low-level things managed by Neo4j?
In my application I need to join some objects to users and to various classification data. Each of these objects has a unique id (a String). If someone asks to view something about the object with id=QW, I need to load the vertex associated with object QW. Is this an easy operation for graph databases?
I also need to manage authentication: I receive the pair (usr, pwd) and need to check whether this pair exists in my graph. Is this the same problem as before, or is there a better approach for managing authentication?
thanks

If you're coming from the PHP world, in most cases you're better off running Neo4j in server mode and accessing it either via REST directly or via a client driver like https://github.com/jadell/neo4jphp. If you still want to embed Neo4j in a servlet environment, the GraphDatabaseService is a shared component, typically stored in the ServletContext. On a per-request (and therefore per-thread) basis you start and commit transactions.
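A minimal sketch of that setup, assuming the Neo4j 3.x embedded Java API (the listener class name and database path are made up for illustration):

import java.io.File;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// One GraphDatabaseService for the whole web application, created at deploy
// time and shut down when the application is undeployed.
public class Neo4jContextListener implements ServletContextListener {

    public static final String DB_ATTRIBUTE = "neo4j.db";

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase(new File("/var/data/neo4j"));
        sce.getServletContext().setAttribute(DB_ATTRIBUTE, db);
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        GraphDatabaseService db =
                (GraphDatabaseService) sce.getServletContext().getAttribute(DB_ATTRIBUTE);
        if (db != null) {
            db.shutdown();
        }
    }
}

// Inside a servlet's doGet/doPost, per request (and therefore per thread):
//   GraphDatabaseService db =
//       (GraphDatabaseService) getServletContext().getAttribute(Neo4jContextListener.DB_ATTRIBUTE);
//   try (Transaction tx = db.beginTx()) {
//       // ... read/write nodes and relationships ...
//       tx.success();
//   }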
Neo4j is a native graph database. The bare metal persistence layer is optimized for navigating from one node to its neighbors as fast as possible and written by the Neo4j devteam themselves. There are other graph databases out there reusing other persistence technologies for their underlying persistence.
Best thing is to run the Neo4j online course at http://www.neo4j.org/learn/online_course.
For the authentication part, see SecurityRules.
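For the lookup-by-unique-id part of the question (and the (usr, pwd) check, which is the same kind of lookup), here is a hedged sketch against the embedded Java API using Cypher; the :User label, property names, and index are assumptions for illustration only:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;
import org.neo4j.graphdb.Transaction;

public class UserLookup {

    private final GraphDatabaseService db;

    public UserLookup(GraphDatabaseService db) {
        this.db = db;
    }

    // Assumes a :User label with an indexed "uid" property,
    // created once with: CREATE INDEX ON :User(uid)
    public boolean credentialsMatch(String uid, String pwdHash) {
        Map<String, Object> params = new HashMap<>();
        params.put("uid", uid);
        params.put("pwdHash", pwdHash);
        try (Transaction tx = db.beginTx()) {
            Result result = db.execute(
                    "MATCH (u:User {uid: $uid, pwdHash: $pwdHash}) RETURN u LIMIT 1", params);
            boolean found = result.hasNext();
            tx.success();
            return found;
        }
    }
}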

Since Neo4j is a NoSQL graph database, you have to handle generation of unique IDs yourself, for example with a GUID (with 3.x, auto-incremented properties are also supported for a particular label).
Neo4j's default generated id is unique, but it can be reallocated to another object once the first assigned object is deleted.
I am a .NET developer, and in my project I used the Neo4j REST API; it works well and I would suggest you go with that. It is implemented using the async/await programming pattern, so you can hand long-running operations off to the database and use your web server's resources more efficiently.
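To make the "manage your own GUID instead of relying on the internal node id" advice concrete, a small sketch with the Neo4j 3.x embedded Java API (the label and property names are just examples):

import java.util.UUID;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

public class ObjectCreator {

    private static final Label OBJECT_LABEL = Label.label("Object");

    // Stores an application-level UUID on the node instead of relying on the
    // internal node id, which Neo4j may reuse after the node is deleted.
    public static String createObject(GraphDatabaseService db, String name) {
        String uid = UUID.randomUUID().toString();
        try (Transaction tx = db.beginTx()) {
            Node node = db.createNode(OBJECT_LABEL);
            node.setProperty("uid", uid);
            node.setProperty("name", name);
            tx.success();
        }
        return uid;
    }
}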


Multiple micro-services backed by the same DynamoDB table?

Is it an anti-pattern if multiple micro-services read/write to/from the same DynamoDB table?
Please note that:
We have a fixed schema which will not change anytime soon.
All reads/writes will go through REST services via API Gateway + Lambda.
Micro-services are deployed on Kubernetes or Lambda.
When writing microservices, it is advised not to share databases. The advice exists because separate databases make scaling easy and give each service its own say over its set of data. It also lets each service decide how its data is kept and change that at will. This gives services flexibility.
Even if your schema is not changing, separate databases mean you can be sure that one service being throttled will not impact the others.
If all your CRUD calls are channeled through a REST service, you do have a service layer in front of the database, and yes, you can do that within microservices guidelines.
I still would not call the approach an anti-pattern. It's just that collective experience says it's better not to have everything talk to one database, for some of the reasons I mentioned above.
I have a different opinion here and I'll call this approach an anti-pattern. A few of the key principles being violated here:
Each microservice should have its own bounded context; if all of them share the same data set, their business boundaries are blurred.
The DB is a single point of failure: if the DB goes down, all your services go down.
If you try to scale a single service and spawn multiple instances, it will impact DB performance and, eventually, the performance of the other microservices.
My suggested solution:
First analyze whether you really have a case for MSA; if your data is very tightly coupled and interdependent, you don't need this architecture.
Explore the CQRS pattern: you might want different databases for reads and writes and synchronize them via events (a minimal sketch follows below).
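For illustration only, a deliberately minimal, framework-free Java sketch of that read/write split; all class names are hypothetical, and in a real system the event would travel over a broker (Kafka, SNS/SQS, etc.) rather than an in-memory subscriber list:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Write side: owns its own store and publishes an event after each change.
class OrderWriteService {
    private final Map<String, Double> orders = new HashMap<>();   // the service's own "write DB"
    private final List<Consumer<OrderCreated>> subscribers = new ArrayList<>();

    void subscribe(Consumer<OrderCreated> s) { subscribers.add(s); }

    void createOrder(String id, double total) {
        orders.put(id, total);                                     // write to the write store
        OrderCreated event = new OrderCreated(id, total);
        subscribers.forEach(s -> s.accept(event));                 // publish, fire-and-forget
    }
}

// Read side: keeps its own denormalised view, updated only through events.
class OrderReadService {
    private final Map<String, Double> view = new HashMap<>();      // the read model / "read DB"

    void on(OrderCreated e) { view.put(e.id, e.total); }

    Double totalFor(String id) { return view.get(id); }
}

class OrderCreated {
    final String id;
    final double total;
    OrderCreated(String id, double total) { this.id = id; this.total = total; }
}

public class CqrsSketch {
    public static void main(String[] args) {
        OrderWriteService writes = new OrderWriteService();
        OrderReadService reads = new OrderReadService();
        writes.subscribe(reads::on);

        writes.createOrder("o-1", 42.0);
        System.out.println(reads.totalFor("o-1"));                 // 42.0, served from the read model
    }
}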
Hope this helps!!

Should I use JanusGraph as main database to store all my data for a new project?

I'm thinking about learning JanusGraph to use in my new big project, but I can't understand some things.
Janus can be used like any database and supports "insert", "update" and "delete" operations, so JanusGraph will write data into Cassandra or another database to store it, right?
Where does JanusGraph store the nodes, edges, attributes, etc.? It writes these into the database, right?
Should this data be loaded into memory by Janus, or will it be read from Cassandra all the time?
Does the data that JanusGraph reads have to be loaded into JanusGraph on every query, or will it do selects in the database to retrieve only the data I need?
Is the data retrieved from the database only what I need, or will Janus read all records in the database all the time?
Should I use JanusGraph in my project in production, or should I wait until it becomes production ready?
I'm developing some kind of social network that needs to store friendships, posts, comments, user blocks, and also do some Elasticsearch queries. In this case, which database backend should I use?
Janus will write data into Cassandra or another database to store it, right?
Where does Janus store the nodes, edges, attributes, etc.? It writes these into the database, right?
JanusGraph will write the data into whatever storage backend you configure it to use. This includes Cassandra. It writes this data into the underlying database using the data model roughly outlined here.
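As a sketch of what that looks like in code (assuming the JanusGraph Java API with the CQL adapter for Cassandra; the host name is a placeholder):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class CassandraBackedGraph {
    public static void main(String[] args) {
        // JanusGraph translates graph elements into rows in the configured backend.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")          // Cassandra via the CQL driver
                .set("storage.hostname", "127.0.0.1")   // placeholder host
                .open();

        GraphTraversalSource g = graph.traversal();
        g.addV("person").property("name", "alice").next();  // persisted to Cassandra on commit
        g.tx().commit();

        graph.close();
    }
}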
Should this data be loaded into memory by Janus, or will it be read from Cassandra all the time?
Is the data retrieved from the database only what I need, or will Janus read all records in the database all the time?
Janus Graph will only load into memory vertices and edges which you touch during a query/traversal. So if you do something like:
graph.traversal().V().hasLabel("My Amazing Label");
Janus will read and load into memory only the vertices with that label. So you don't need to worry about initializing a graph connection and then waiting for the entire graph to be serialised into memory before you can query. Janus is a lazy reader.
Should I use Janus in my project in production or should I wait until it becomes production ready?
That is entirely up to you and your use case. Janus is already being used in production, as can be seen here at the bottom of the page. Janus was forked from, and improves on, TitanDB, which is also used in several production deployments. So if you're wondering "is it ready?", I would say yes, it's clearly ready given its existing uses.
what database backend should I use?
Again, that's entirely up to you. I use Cassandra because it can scale horizontally and I find it easier to work with. It also seems to suit all different sizes of data.
I have toyed with Google Big Table and that seems very powerful as well. However, it's only really suited for VERY big data, and it's also only available in the cloud, whereas Cassandra can be hosted locally very easily.
I have not used Janus with HBase or BerkeleyDB so I can't comment there.
It's very simple to change between backends, though (all you need to do is adjust some configs and check your dependencies are in place), so during development feel free to play around with the backends. You only really need to commit to one when you go to production or are more confident about each backend.
When considering what storage backend to use for a new project, it's important to consider what tradeoffs you'd like to make. In my personal projects, I've enjoyed using NoSQL graph databases due to the following advantages over relational DBs:
Not needing to migrate schemas increases productivity when rapidly iterating on a new project
Traversing a heavily normalized data-model is not as expensive as with JOINs in an RDBMS
Most include in-memory configurations which are great for experimenting and testing (see the sketch after this list).
Support for multi-machine clusters and Partition Tolerance.
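As a concrete example of the in-memory point above, JanusGraph ships an "inmemory" storage backend that is handy for experiments and tests (a sketch; all data is discarded on close):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class InMemoryGraphTest {
    public static void main(String[] args) {
        // Everything lives in the JVM heap; useful for experiments and unit tests.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")
                .open();

        GraphTraversalSource g = graph.traversal();
        g.addV("post").property("title", "hello world").next();
        long count = g.V().hasLabel("post").count().next();
        System.out.println(count);  // 1

        graph.close();  // all data is discarded
    }
}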
Here are sample JanusGraph and Neo4j backends written in Kotlin:
https://github.com/pm-dev/janusgraph-exploration
https://github.com/pm-dev/neo4j-exploration
The main advantage of JanusGraph is the flexibility of plugging in whichever storage backend you'd like.

Is there a way to expire a value in Neo4j and change it to a default one?

We are building a small social network with friend connections and presences. We are planning to implement friend connections using Neo4j, and we already have infrastructure for presence management, implemented using expiring keys in Redis (if there is no key, the user is offline; otherwise he is considered online).
Most likely, after fetching friend connections, a user will also fetch presences, so the question is:
Is there a way to implement an automatic value change in a Neo4j node?
In our case the purpose is to replace the Redis presence store with data stored in Neo4j, to minimize requests to the server and extra querying.
Right now there is no such mechanism, but it should be pretty simple to implement in a server extension or kernel extension.
You would use a transaction event listener to watch property changes and add those keys to a timeout/expiry map (or index) which is then checked regularly (rough sketch below).
But it would require some Java coding, or reaching out to the Neo4j community for help or to Neo Technology for consulting.
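As a rough illustration of that idea (assuming the Neo4j 3.x embedded API, where such handlers are registered via registerTransactionEventHandler; the "presence" property and the timeout logic are made up for this example):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.event.PropertyEntry;
import org.neo4j.graphdb.event.TransactionData;
import org.neo4j.graphdb.event.TransactionEventHandler;

// Tracks when a hypothetical "presence" property was last written, so a
// background job can reset it to a default value after a timeout.
public class PresenceExpiryHandler extends TransactionEventHandler.Adapter<Void> {

    // nodeId -> timestamp of the last presence update
    private final Map<Long, Long> lastUpdated = new ConcurrentHashMap<>();

    @Override
    public void afterCommit(TransactionData data, Void state) {
        for (PropertyEntry<Node> entry : data.assignedNodeProperties()) {
            if ("presence".equals(entry.key())) {
                lastUpdated.put(entry.entity().getId(), System.currentTimeMillis());
            }
        }
    }

    // Called periodically (e.g. from a ScheduledExecutorService) to expire stale entries.
    public void expireOlderThan(GraphDatabaseService db, long timeoutMillis) {
        long cutoff = System.currentTimeMillis() - timeoutMillis;
        lastUpdated.entrySet().removeIf(e -> {
            if (e.getValue() >= cutoff) {
                return false;
            }
            try (Transaction tx = db.beginTx()) {
                db.getNodeById(e.getKey()).setProperty("presence", "offline");
                tx.success();
            }
            return true;
        });
    }
}

// Registration, e.g. at startup:
//   db.registerTransactionEventHandler(new PresenceExpiryHandler());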

Multi-Database Transactional System & ASP.NET MVC

So I have a challenge: build a site that people online can use to interact with organizations, an ASP.NET MVC customer application.
One of the requirements is financial processing and accounting.
I'm very comfortable using SQL Transactions and stored procedures to do this; i.e. CreateCustomer also creates an entity, and an account record. We have a stored procedure to do this, that does a begin transaction, creates some setup records we need, then does a commit. I'm not seeing a good way to do this with an ORM, and after reading some great blog articles I'm starting to wonder if I'm going down the wrong path.
Part of the complexity here is the data itself:
I'm querying x databases (one per existing customer) to get some of my data, though my app has its own data store as well. I need to query the x databases, run stored procedures on the x databases, and also query my own datastore.
I'm not seeing strong ORM support for things like stored procedures (and thereby transactions), though some support does seem to be present.
Maybe I'm just trying to make my app a nail here, cause the MVC hammer is sooo shiny. I'm plenty comfortable with raw ADO.NET of course, but I'm in love with the expressive feel to writing Linq code in C# and I'd rather not give up on it.
Down to the question:
Is this a bad idea? Should I try to use Linq / Entity Framework, or something like nHibernate... and stick with the ORM pattern or should I trash it and use raw ADO.NET data access?
Edit: a note on scale; from a queries per second standpoint this app is not "huge". But, from a data complexity perspective, it does need to query against 50+ databases (all identical, or close to it) to read data from an external application and publish data back to that application. ORM feels right when dealing with "my" data store, but feels very wrong for accessing the data from the external application.
From a certain size (number of databases) up, you have to change the paradigm. Are you at that size?
When you deploy what is ultimately a distributed application and yet try to control it as an ordinary local application, you are going to run into a set of fundamental issues around availability, scalability and correctness. If you reach for concepts like 'distributed transactions', 'linked servers' and 'ORM', you are down the wrong path. True distributed applications use terms like 'message', 'queue' and 'service'. Terms like Linq, EF and nHibernate are all fine and good, but none will bring you anything beyond what a simple Transact-SQL SELECT statement brings. In other words, if a SELECT solves your issues, then the various client-side ORMs will work. If not, they won't add any miraculous value.
I recommend you go over the slides on the SQLCAT: High Performance Distributed Applications in Real World Deployments which explain how a site like MySpace manages to read and write into a store of nearly 500 servers and thousands of databases.
Ultimately what you need to internalize is this: one database can have 95% availability (uptime and acceptable service response time). Availabilities multiply, so a system consisting of 10 databases with 95% availability each has 0.95^10 ≈ 59% availability; a system of 100 databases each with 99.5% availability has 0.995^100 ≈ 60% availability; and 1000 databases with 99.95% availability each (5 minutes of downtime per week) have about 60% availability. And this is the ideal situation. In reality there is always a snowball effect caused by resource consumption (e.g. threads blocked trying to access an unavailable or slow resource) that makes things far worse.
This means that one cannot write a large distributed system relying on synchronous, tightly coupled operations and transactions. It is simply impossible. You always end up relying on asynchronous operations (usually messaging and queues), which is something completely different from your run-of-the-mill database application.
Use the TransactionScope object available in System.Transactions.
What I have chosen is to use Entity Framework to allow access to the application's main data store, and create a custom DAL for access to external application data and for access to stored procedures within the application.
Here's hoping Entity Framework 4.0 fixes the issue. For now, I'm using the concept listed here.
http://social.msdn.microsoft.com/forums/en-US/adodotnetentityframework/thread/44a0a7c2-7c1b-43bc-98e0-4d072b94b2ab/

How to implement locking across a server farm?

Are there well-known best practices for synchronizing tasks across a server farm? For example, if I have a forum-based website running on a server farm, two moderators try to perform an action which requires writing to multiple tables in the database, and their requests are handled by different servers in the farm, how can one implement locking to ensure that they can't take that action on the same item at the same time?
So far, I'm thinking about using a table in the database to sync on, e.g. check for the id of the item in the table; if it doesn't exist, insert it and proceed, otherwise return. A shared cache could probably also be used for this, but I'm not using one at the moment.
Any other way?
By the way, I'm using MySQL as my database back-end.
Your question implies data level concurrency control -- in that case, use the RDBMS's concurrency control mechanisms.
That will not help you if later you wish to control application level actions which do not necessarily map one to one to a data entity (e.g. table record access). The general solution there is a reverse-proxy server that understands application level semantics and serializes accordingly if necessary. (That will negatively impact availability.)
It probably wouldn't hurt to read up on CAP theorem, as well!
You may want to investigate a distributed locking service such as Zookeeper. It's a reimplementation of a Google service that provides very high speed distributed resource locking coordination for applications. I don't know how easy it would be to incorporate into a web app, though.
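For a sense of what that looks like from application code, here is a hedged sketch using Apache Curator's InterProcessMutex recipe on top of ZooKeeper (the connection string and lock path are placeholders):

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ModeratorActionLock {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",            // placeholder ZooKeeper ensemble
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One znode path per item; whichever web server acquires it first wins.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/forum-item-42");
        if (lock.acquire(5, TimeUnit.SECONDS)) {
            try {
                // perform the multi-table moderator action here
            } finally {
                lock.release();
            }
        } else {
            // another farm node holds the lock; report "item busy" to the moderator
        }

        client.close();
    }
}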
If all the state is in the (central) database then the database transactions should take care of that for you.
See http://en.wikipedia.org/wiki/Transaction_(database)
It may be irrelevant for you because the question is old, but it may still be useful for others, so I'll post it anyway.
You can use a "SELECT FOR UPDATE" DB query on a locking object, so you actually use the DB to achieve the lock mechanism.
If you use an ORM, you can also do that. For example, in NHibernate you can do:
session.Lock(Member, LockMode.Upgrade);
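Outside of an ORM, the same idea with plain JDBC against MySQL might look like this (table and column names are made up for illustration; the row lock is held until the transaction ends):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RowLockExample {

    // Locks the forum item row, does the multi-table work, then commits.
    public static void moderate(Connection conn, long itemId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement lock = conn.prepareStatement(
                "SELECT id FROM forum_items WHERE id = ? FOR UPDATE")) {
            lock.setLong(1, itemId);
            try (ResultSet rs = lock.executeQuery()) {
                if (!rs.next()) {
                    conn.rollback();
                    return;  // item no longer exists
                }
            }
            // ... update the other tables here; concurrent moderators block on the SELECT above ...
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}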
Having a table of locks is an OK way to do it; it is simple and works.
You could also have the code as a service on a single server, more of an SOA approach.
You could also use a timestamp field with transactions: if the timestamp has changed since you last read the data, you can revert the transaction. So whoever gets in first has priority (a JDBC sketch follows below).
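A hedged sketch of that timestamp (optimistic) approach with plain JDBC; the table and last_modified column names are assumptions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class OptimisticUpdateExample {

    // Returns true if the update won; false if someone else changed the row first.
    public static boolean updateTitle(Connection conn, long itemId, String newTitle,
                                      Timestamp lastSeenModified) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE forum_items SET title = ?, last_modified = NOW() " +
                "WHERE id = ? AND last_modified = ?")) {
            ps.setString(1, newTitle);
            ps.setLong(2, itemId);
            ps.setTimestamp(3, lastSeenModified);
            // 0 rows updated means the timestamp changed: the first writer keeps priority
            return ps.executeUpdate() == 1;
        }
    }
}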
