Setting up an open-source decentralized social network - encryption

I am trying to build an open-source, decentralized social network (Facebook-like), created and supported by the community.
Using Datastax Enterprise/Cassandra it is possible to set up a working DHT (distributed hash table) to store a large amount of data within a cluster owned by a single 'business' or 'company'.
This way all the data (such as user/profile data, posts, likes, etc.) is stored under the control of this company within its own cluster, so that the data is more or less "safe".
However, in my case, other people (from the community) should be able to set up their own node within the cluster to support the cluster and its load balancing. This could be anyone (good or evil)...
This means that not only the traffic between the nodes should be encrypted (via SSL/TLS), but the data ITSELF that is stored on the nodes should be encrypted as well!
So, my question before continuing using the Datastax software is:
Is it possible to store all the data encrypted somehow on every
node, so that the cluster can be run by a given individual/random person?
Thank you!
Kind regards,
Melroy van den Berg

I think it's safe to say that current database technology is based on the concept of controlled access to database servers themselves and that "random persons" (or computer programs!) can only access the database remotely by a well-controlled API.
That said, you could always create your own application layer which mediates between said random users and DSE itself, providing limited administrative access to DSE based on use cases programmed into the application layer itself.
DSE does support transparent data encryption (TDE), but once again that is oriented towards very controlled access to the database servers. You could use it, but the suggested application layer may obviate the need for encryption on the database server(s).

Related

Is there a reason why not model social network with SQL API?

In one of our apps we need to introduce a feature of users being able to choose friends, see friends activity, etc.
So far we use a CosmosDb container with the SQL API (for the things the app does besides this social network aspect).
I am wondering: is there a reason not to model it with the SQL API and to go strictly with Gremlin instead?
I’ve seen examples on the Microsoft site of a basic social network modeled with the ordinary SQL API, but I am not sure whether I am missing something that would bite me down the road if I don't go with Gremlin.
You should be safe in choosing either. From docs:
Each API operates independently, except the Gremlin and SQL API, which
are interoperable.
The stored data is JSON in both cases.
More on choosing an API.

Corda for Digital Identites?

Hi, is Corda a recommended platform for digital identity, for a use case of account-based certification? (I.e., I as a user store my certificates/identity on the ledger and access them via a password/key by going through a node, while allowing a specified certificate to be seen only by a specified party. The control is at the user/account level and not at the node level, which means I could specify which certificates/identities I want to allow another organisation to access.)
For blockchain technologies, I understand that the data is duplicated across all nodes, so as long as the user has the key, the user can access his own data even if the node has only just joined the network.
As I understand it, Corda also doesn't support multiple identities on a single node, as identity is on a per-node basis. What would be the approach for this case using the Corda platform?
First of all, Corda is not like Ethereum, Fabric, or any other blockchain where all nodes store the same common state. In a Corda network, nodes store only the transactions and states they participated in or observed. So it's more peer-to-peer than broadcast.
Check here for more details:
https://docs.corda.net/key-concepts-ledger.html
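To make the difference concrete, here is a toy Python model of the two storage strategies. It is only an illustration under simplifying assumptions: `Node`, `Transaction`, `record_broadcast` and `record_need_to_know` are hypothetical names invented for this sketch and have nothing to do with the actual Corda API.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    participants: FrozenSet[str]  # names of the nodes involved in the deal


@dataclass
class Node:
    name: str
    stored: List[Transaction] = field(default_factory=list)


def record_broadcast(nodes: List[Node], tx: Transaction) -> None:
    """Broadcast-style ledger: every node keeps every transaction."""
    for node in nodes:
        node.stored.append(tx)


def record_need_to_know(nodes: List[Node], tx: Transaction) -> None:
    """Peer-to-peer style: only the participating nodes keep the transaction."""
    for node in nodes:
        if node.name in tx.participants:
            node.stored.append(tx)


tx = Transaction("tx-1", frozenset({"A", "B"}))

broadcast_nodes = [Node("A"), Node("B"), Node("C")]
record_broadcast(broadcast_nodes, tx)

p2p_nodes = [Node("A"), Node("B"), Node("C")]
record_need_to_know(p2p_nodes, tx)

# Number of transactions each node ends up holding under each strategy.
broadcast_counts = {n.name: len(n.stored) for n in broadcast_nodes}
p2p_counts = {n.name: len(n.stored) for n in p2p_nodes}
```

In the second case node C never sees `tx-1`, which is why a user cannot simply show up at an arbitrary node with a key and expect to find their data there.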
From this perspective, Corda is probably not the best candidate for a public identity network.
For a self-sovereign identity management solution, I would recommend having a look at something like Sovrin (Indy). You can use it to build an app on top of the platform. Or just learn from their design ;)
Corda may make sense in an identity context if there are different organisations that exchange their members' identity info for some reason. Then a node will be an identity manager and store info about the people who gave it their credentials of any kind. So an identity will be a mere state here, I think. Corda itself will play the transport and storage role. That is not a blockchain-style decentralized approach at all, but it may be useful in some cases.

Can one single Corda node support multiple parties/accounts?

Just wonder if a single Corda node can support multiple accounts like Bitcoin does.
A related open question I found on discourse https://discourse.corda.net/t/can-you-have-a-corda-identity-adress-without-running-a-node/1298
From Peter Froystad:
Does Corda support multiple people having accounts/addresses on the network without running a Corda node?
Similar to how Bitcoin allows people to own coins with a private key, but without running a full node?
In the financial world, this would correspond to people having an account in the bank, and they would share facts on a 1-1 basis with their bank regarding their dealings. These customers wouldn't run a peer node, however. But they would want a service similar to a Bitcoin wallet that would allow them to access their dealings with the bank.
Corda is designed for varied institutions which are not all banks, so it doesn't have a direct concept of "account" like Bitcoin does. If you want to implement customer accounts on top of Corda you need to track the balances yourself and use Corda for inter-institutional transfers. Corda's ability to easily integrate with SQL databases and MQ makes that kind of integration quite easy though.
If you're asking about multiple identities on a single node, so one machine can do both legal entity A and B at once, the answer is: we're working on it.
It is now possible with Corda 4.3 and the new Accounts SDK
https://github.com/corda/accounts
However, it is not a simple drop-in replacement for Party: it requires changes to an existing CorDapp and implementing several parts of the business logic of what you might consider an "account" or "wallet".
According to the blog post by Mike Hearn: https://discourse.corda.net/t/mobile-consumer-payment-experiences-with-corda-on-ledger-cash/966
Note that your balance in this scheme is simply your bank balance. There are no separate wallets.
It looks to me like running multiple accounts/parties on a single node is not supported at the moment. Yet, we may expect support to arrive in phase two, with the Bitcoin SPV-style wallet mode:
In phase two this is extended to support a model more like Bitcoin SPV, whereby the sending device manages its own private keys and transaction data. It thus becomes a true wallet app.

Encrypting data in SQL Server Azure database with separate key for each user's data

I'm trying to create a service based on an Azure SQL Database backend.
The service will be multi-tenant, and would contain highly sensitive information from multiple "clients" (potentially hundreds of thousands) that must be strictly isolated from one another and secured heavily against data leaks "by design".
Using so many individual databases would not be feasible, as there will be a lot of clients with very little information per client.
I have looked into the transparent encryption offered by Azure, but this would essentially encrypt the whole database as one. In other words, it would not protect against leaks between clients, or to someone else, caused by development errors or hostile attacks, and it's very critical that one "client's" information never comes into anyone else's hands.
So what I would really like to achieve is to encrypt each client's data in the database with a different key, so that you would have to obtain the key from each client (from their "physical" location) to decrypt any data you might manage to extract from the database for that particular client, which would be virtually impossible for anyone to do.
Is it clear what I mean?
Do you guys have any suggestions for me on how to manage this problem, or know of any third-party solution that allows for this functionality? Any other advice?
You're looking at protecting/isolating the tenants "by design" in a single table, so why not check out Row Level Security? You could configure it to serve up only the applicable rows to a specific tenant.
This doesn't directly address your initial question about encrypting the data with a separate key for each tenant. If you have a separate table for each tenant, then you could do this via Always Encrypted, but this would seem to have some complexity in key management if you're trying to handle 200k keys.
AFAIK, there isn't native SQL Server functionality to encrypt each set of rows that belongs to a tenant with a distinct key, but there may be some elegant solutions that I haven't seen yet. Of course, you could do this on the app side and store the result in SQL, and there would be no issues; the trick would be the same as with the AE-based solution above: managing a large number of keys.
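One possible way to ease the key-management burden on the app side is to derive each tenant's key from a single master key instead of storing 200k separate keys. The Python sketch below is only an illustration under assumptions of my own: `derive_tenant_key`, the label `tenant-data-key:` and the in-memory `master_key` are all hypothetical, and HMAC-SHA256 is used here as a simple stand-in for a full key-derivation function such as HKDF.

```python
import hashlib
import hmac
import secrets


def derive_tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a 32-byte key for one tenant from a single master key.

    HMAC-SHA256 acts as a pseudorandom function keyed by the master key,
    so no per-tenant key has to be stored anywhere.
    """
    info = b"tenant-data-key:" + tenant_id.encode("utf-8")
    return hmac.new(master_key, info, hashlib.sha256).digest()


# Hypothetical master key; in practice it would live in a key vault or HSM,
# not in application memory like this.
master_key = secrets.token_bytes(32)

key_a = derive_tenant_key(master_key, "tenant-a")
key_b = derive_tenant_key(master_key, "tenant-b")
```

The derived key would then be fed to an authenticated cipher of your choice before the data is written to SQL. Note the trade-off: whoever holds the master key can re-derive every tenant's key, so this does not give you the "key only exists at the client's location" property asked for in the question. For that, each client would have to generate and keep their own key.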

Is Graph Database a good use case for a messaging system?

I am diving in the universe of Graph Databases and I'm simply amazed by how powerful it is. I chose OrientDB to start my first use case but I'm not certain if my domain applies to this specific section of my App.
A User follows another User.
A User can be part of a Conversation.
A Message can be sent (with a timestamp) to a Conversation.
A Message can be read (with a timestamp) by a User.
I'm worried to end up with millions (even billions) of Message nodes and sent or read edges thus affecting the overall performance of the system. The messaging section is not the main concept of the app, it is just a small portion of it.
Would it be a problem for OrientDB to handle? Is it a good application for a Graph Database?
Thank you all for your patience,
Vinicius
I don't think a graph database is the best candidate for a messaging system. Messaging systems are relational in nature and suit the likes of MySQL.
You wouldn't be surprised to hear though that Facebook uses document-oriented databases for their messaging system.
Facebook is currently the largest installation of Cassandra, which is excellent for scalability. We already know that from Facebook. Plus, it's great for storing messages due to its distributed nature.
Take a look at the suggested way to use OrientDB with a similar use case:
http://orientdb.com/docs/last/Chat-use-case.html
The choice of a graph database ultimately depends on what you are going to do with the data.
In your case, do you plan to use any graph-processing algorithms, or graph traversals?
An edge in graph theory represents a relationship between nodes (objects). In the case of a timestamp for read and sent, it does not really fit and you will end up with billions of edges, killing the performance of the system.
The follower concept fits a graph database perfectly. As for the Conversation, it could be an attribute of the node. Do you need to create an edge to represent ownership just to query the Conversation ID?
If the messaging is just a small part of your application, I suggest using the best tool for each need: either combine a column-oriented database (Cassandra) for the messages with Orient-DB for the relationships, or use Orient-DB as in the Chat use case (Thanks #Lvca)
What we suggest is to avoid using Edges or Vertices connected with
edges for messages. The best way is using the document API by creating
one class per chat room, with no index, to have super fast access to
last X messages.
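The idea in that quote can be sketched in plain Python, independent of any database. This is only an in-memory stand-in to show the access pattern, not OrientDB's document API; `ChatStore`, `Message`, `send` and `last` are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    sender: str
    text: str
    sent_at: float  # Unix timestamp


class ChatStore:
    """One append-only message list per chat room, with no edges and no index."""

    def __init__(self) -> None:
        self._rooms: Dict[str, List[Message]] = {}

    def send(self, room: str, message: Message) -> None:
        # Messages are appended in arrival order to the room's own collection.
        self._rooms.setdefault(room, []).append(message)

    def last(self, room: str, count: int) -> List[Message]:
        # Reading the tail of one collection needs no graph traversal.
        if count <= 0:
            return []
        return self._rooms.get(room, [])[-count:]


store = ChatStore()
for i in range(5):
    store.send("room-1", Message("alice", f"msg {i}", float(i)))

recent = store.last("room-1", 2)
```

The follower relationships can still live in the graph; only the high-volume messages are kept out of it.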
I'm also wondering about this topic, but I think any RDBMS will be better for this task.
Also, chat is kind of a log. So ElasticSearch (and similar) can be a perfect match for storing terabytes of chat data.
A lot of dissonant answers here.. Speaking from experience, I've built a few messaging systems on plain MongoDB instances with no issues whatsoever handling hundreds/thousands of concurrent users (with chat groups).
I'd say go with either Cassandra, as it's a battle-tested database, if you're very worried about scalability (it's got that practically built in), or one of the newcomers like MongoDB, which is constantly being upgraded and on top of which you can then relatively easily add search via ElasticSearch. MongoDB supports scaling via sharding and can therefore scale horizontally to your needs.
Just be sure not to bottleneck your speed on your backend service: implement as many asynchronous operations as possible.
Now, you can even go as far as to implement a streaming platform like Kafka which is excellent for CDC (change data capture) and will persist your message log until it is read by a service that actually writes messages to your database of choice, adding to your resiliency factor.
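To illustrate that decoupling, here is a minimal in-process Python sketch. A `queue.Queue` stands in for the Kafka topic and a plain list stands in for the database; both are assumptions made for the example, and a real setup would use a Kafka client and a durable log instead.

```python
import queue
import threading

message_log = queue.Queue()  # stand-in for a Kafka topic
database = []                # stand-in for the database of your choice
_STOP = object()             # sentinel telling the writer to shut down


def writer() -> None:
    """Drain the log and persist each message, independently of the senders."""
    while True:
        item = message_log.get()
        if item is _STOP:
            break
        database.append(item)


worker = threading.Thread(target=writer)
worker.start()

# Producers only enqueue; they never wait for the database write itself.
for i in range(3):
    message_log.put({"room": "room-1", "text": f"msg {i}"})

message_log.put(_STOP)
worker.join()
```

If the writer is slow or briefly down, messages simply wait in the log, which is the resiliency benefit described above.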
