What kind of data storage does SparkFun's phant use? Does it use a key-value or relational data store in the background, or a document-based one?
Okay, I guess I found an answer to this on my own. According to www.prokarma.com's blog, they use
a repository for storing key value pairs being pushed from devices
connected to the internet.
It's being proposed that we store data about a relationship between two vertices on the edge between them. The idea is that these two vertices are related and there are user-level pieces of information that we want to store in the graph. The best example I can think of would be a Book and a Reader, where the Reader can store cliff notes on the edges for retrieval later on.
Is this common practice? It seems to me that we should minimize the amount of data living in edges and that the vast majority of GraphDB data should be derived data, rather than using the graph as an actual data store. Given that it's in memory, what happens when it goes down? (We're using Neptune, so technically there are backups.)
Sorry if the question is a bit vague, but I'm not sure how else to ask. I've googled around looking for best practices, and it's all pretty generic material about the concepts and theory of graph databases.
An additional question: is it common practice to expose the Gremlin API directly to users, or should there always be a GraphQL (or other) API in front of it?
Without more detail it is hard to provide exact modeling advice, but in general one of the advantages of using a graph database is that edges are first-class citizens and can carry properties. A common use case for this would be something like Person - purchases -> Product, where you might have a purchase_date on the purchases edge to represent the date of the purchase, as someone might buy the same thing multiple times.
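To make the purchases example concrete, here is a minimal sketch in plain JavaScript. It is only an illustration of the idea of properties on edges: the `vertices` and `edges` structures and the `purchaseDates` helper are hypothetical and are not the Gremlin or Neptune API, and the sample data is made up.

```javascript
// Hypothetical in-memory graph built from plain objects.
// This is NOT the Gremlin or Neptune API, just an illustration.
const vertices = new Map([
  ["p1", { label: "Person", name: "Alice" }],
  ["b1", { label: "Product", title: "Some book" }],
]);

// Each edge carries its own properties, so two purchases of the same
// product by the same person are simply two distinct edges.
const edges = [
  { from: "p1", to: "b1", label: "purchases", properties: { purchase_date: "2020-01-15" } },
  { from: "p1", to: "b1", label: "purchases", properties: { purchase_date: "2020-06-02" } },
];

// Collect the purchase dates stored on the edges between a person and a product.
function purchaseDates(personId, productId) {
  return edges
    .filter((e) => e.label === "purchases" && e.from === personId && e.to === productId)
    .map((e) => e.properties.purchase_date);
}
```

The point of the sketch is that the date belongs to the relationship, not to either vertex, which is why it sits on the edge.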
I am not sure exactly what you mean by "a vast majority of GraphDB data be derived data". You can use graphs to derive and infer data and relationships based on the connections, but they fully support storing data as well.
"Given that it's in memory, what happens when it goes down?" - Amazon Neptune (like most other databases) uses a buffer cache to keep some data in memory, but that data is also persisted to disk, so if the instance goes down, it can be recovered from durable storage.
"An additional question: is it common practice to expose the Gremlin API directly to users, or should there always be a GraphQL (or other) API in front of it?" - Just as with any database, I would not recommend exposing the Gremlin API directly to consumers, as doing so comes with a whole host of potential security risks. Generally, the underlying data store of any application should be transparent to the users. They should be interacting with an interface like REST/GraphQL that is designed to answer business-related questions, and not really know or care that there is a graph database backing those requests.
I have a 50-100 MB dataset that users need to have access to. It's static, so it doesn't make sense to host a server for it. There are two kinds of operations I'll perform on the data:
Reading objects by unique ObjectId. Each object is ~3 KB.
Full text search through ~300,000 strings. Each string is 4-60 characters.
I'm considering storing the data as JSON files. The 300k strings will be stored separately. I'll use https://github.com/nextapps-de/flexsearch or something similar to perform search over it. I did something similar before with a ~10 MB dataset back in 2016. I used just regex search and it worked flawlessly.
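As a rough sketch of the two operations over plain JSON: the sample records, the `objectId` and `name` field names, and the `search` helper below are assumptions made up for illustration. The substring scan is a naive stand-in and is not the FlexSearch API.

```javascript
// Hypothetical sample standing in for the parsed JSON files.
const objects = [
  { objectId: "obj-001", name: "Red apple" },
  { objectId: "obj-002", name: "Green pear" },
  { objectId: "obj-003", name: "Apple pie" },
];

// Index by ObjectId once, so each lookup afterwards is a single Map access.
const byId = new Map(objects.map((o) => [o.objectId, o]));

// Naive case-insensitive substring scan over all strings.
// A real full-text index would replace this function.
function search(strings, query) {
  const needle = query.toLowerCase();
  return strings.filter((s) => s.toLowerCase().includes(needle));
}

// The searchable strings, kept separately from the full objects.
const names = objects.map((o) => o.name);
```

With this shape, `byId.get("obj-002")` returns the full object and `search(names, "apple")` returns the matching strings. Whether a linear scan is fast enough for ~300k strings would need to be measured on the target devices.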
Are there reasons to use RealmDB, SQLite, PouchDB or something else instead of just JSON?
I wish I had asked this question a year ago...
In the office where I currently work, we tried creating an app using PouchDB and React Native. We saw PouchDB as an advantage because it wouldn't require our API to send all the data over and over again on every refresh triggered by the user; it would only send the data that had changed since the client's checkpoint. As the data on the server was quite heavy (around 6k entries with more than 200 attributes each), we tried at all costs to go easy on the client's data plan.
Months after this implementation was in place, we implemented a search feature with many different options for sorting and filtering. Not only did we have to throw away our entire PouchDB implementation, we had to start from scratch, replacing all its logic with indexed JSON values. PouchDB performance was extremely slow: it was taking 5 seconds or more to retrieve results, and we just couldn't afford that delay in our project.
In the end we managed to get very fast search by running flexsearch over our indexed JSON. Don't make the same mistake we did. PouchDB cost us too much budget and precious time. It was a terrible choice.
Unfortunately I cannot offer proof or more details from a reputable source. I can only share my own terrible experience: we thought we were reaching the end of a project and had to start from scratch. It was a mess.
Oh boy, a bountied, opinion based question!
I have about 5 years of experience with pouchDB specifically, and a little with SQLite. I have only cursory experience with RealmDB - I tried it out and decided it was not a good fit for my hybrid/mobile needs.
pouchDB excels in one area hands down - synchronization/replication, just like its big brother CouchDB. Providing interaction with an offline database that synchronizes with a remote database is huge for many mobile apps. pouchDB is schemaless, leveraging JSON documents. With pouchDB one may choose among several data stores via adapters. As there can be quota headaches [1] for your data size, the right choice may well be the SQLite adapter. pouchDB does not support full text search.
SQLite is what its name implies - a relational database, requiring a schema. An advantage to SQLite is platform support and the size of the database is not subject to quota headaches like web storage (e.g. IndexedDB). SQLite supports full text search, and apps can deploy with a canned database.
Between pouchDB and SQLite lies RealmDB - it is a schema based object database that supports synchronization/replication. Like pouchDB, it does not support full text search.
Now your requirements
Looking up object by id
300k static text
full-text search
I read 'static' to mean immutable.
Since your data does not change and full-text search is required, pouchDB and RealmDB would not be good choices. If there is a requirement to enhance, remove or add to the data, either would make sense as changes to data on a single server would replicate changes to the local database, practically in a seamless fashion.
SQLite might be a reasonable choice since it supports search and it is possible to deploy a canned database with the app. However, SQLite can be slow in hybrid apps.
So,
pouchDB and RealmDB would be massive overkill and not a good fit.
SQLite would add a fair bit of complexity.
For your specific requirements I'd stay on your path, though I have one concern: it appears flexsearch loads its index into memory. If that carries a performance penalty, then SQLite, with its ability to deploy a canned database and its built-in search facility, may prove a reasonable trade-off against the added complexity.
Good luck!
[1] Quota Headaches
I would say it really just depends on whether you want and NEED to leverage the power of relational queries. Because your data never changes, I would use JSON unless you are trying to perform complex comparisons across your data. In your case it sounds like you are just going to be searching for a particular ObjectId, so JSON is your best bet, especially because you say you won't need to change the data later.
If you organize your JSON so that the ObjectIds are in sorted order, you will be able to look them up quickly, for example with a binary search.
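A minimal sketch of that lookup, assuming the records are already sorted by a string `objectId` field (the field name, the sample records and the `findById` helper are hypothetical):

```javascript
// Hypothetical records, already sorted by objectId (plain string comparison).
const sortedObjects = [
  { objectId: "a10", value: 1 },
  { objectId: "b20", value: 2 },
  { objectId: "c30", value: 3 },
  { objectId: "d40", value: 4 },
];

// Binary search: halve the candidate range on every step.
// Returns the matching record, or undefined if the id is not present.
function findById(sorted, id) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const current = sorted[mid].objectId;
    if (current === id) return sorted[mid];
    if (current < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}
```

If the whole dataset fits in memory anyway, building a `Map` keyed by ObjectId is a simpler alternative; the sorted layout mainly helps when you want to avoid building a separate index.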
When creating a CosmosDB instance, we can choose the API that we will use to communicate with the instance (e.g. SQL, MongoDB, Cassandra, etc.)
What is not clear to me is: does this selection dictate how the data is stored, or only the way we communicate with the instance? For example, if we choose MongoDB, does it mean that CosmosDB will store data in a MongoDB fashion?
The choice of API does not change how the data is stored. Cosmos DB always stores data using something called atom-record-sequence (ARS), which is essentially a set of primitive types, structs and arrays. The database engine translates the native ARS format into the data structures used by the various APIs (e.g. JSON documents, table rows, etc.).
So the answer to your question is that the choice of API only impacts how you communicate with the databases for that Cosmos DB account.
As David Makogon points out in his comment on another answer, while the way the data is stored is the same regardless of the API used, the content of the data will be different, because each API requires its own metadata so that the underlying data can be projected into the format expected by that API.
Here is a good technical overview of how Cosmos works under the hood.
https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/
Data is always stored in the same fashion (as a bunch of JSON documents); only the way you interact with the data changes.
https://learn.microsoft.com/en-us/azure/cosmos-db/introduction#develop-applications-on-cosmos-db-using-popular-open-source-software-oss-apis
Edit: After posting the question, I thought I could also make this post a quick reference for those of you who need a quick look at some of the differences between these two technologies, which might help you decide on one of them eventually. I will be editing this question and adding more info as I learn more.
I have decided to use Firebase for the backend of my project. Firestore is described as "the next generation of the realtime database". Now I am trying to decide which way to go: Realtime Database or Cloud Firestore?
Billing:
At first glance, it looks like Firestore charges per number of results returned, number of reads, number of writes/updates, etc. The real-time database charges based on the data transmitted; the number of read-write operations is irrelevant. Both also charge for the data stored on Google's servers (I think Firestore is the cheaper one in this respect). Why am I mentioning the price point? Because from my point of view, although it might carry less weight, it is also a point to consider when choosing one over the other.
Scaling:
Cloud Firestore seems to scale horizontally seamlessly. I think this is not possible with the real-time database.
Edit:
In the real-time database, you need to shard your data yourself using multiple databases. And you can only do this if you are on the Blaze pricing plan.
ref: https://firebase.google.com/docs/database/usage/sharding
Performance & Indexing:
Another thing is that the data structure is different in the two. The real-time database stores the data as a JSON object in any way we structure it. Firestore structures the data as collections and documents. Hence the querying also differs between the two.
I think Firestore does auto-indexing, which greatly improves read performance too (at the cost of slower writes). I am not sure if this is also the case with the real-time database.
Edit:
The real-time database does not automatically index your data. You need to do it yourself after a solid inspection of your data and your needs.
ref: https://firebase.google.com/docs/database/security/indexing-data
What other differences can you think of?
What would be (or has been) your choice for different types of projects?
Do you still go with the real-time database or have you migrated from that to the firestore? If so why?
And one last thing. How would you compare the SDKs of these two?
Thanks a lot!
What other differences can you think of?
Here is what I think. I have about 6 months of experience with the Realtime Database, and the difference I see is that Firestore makes sorting data easy. For example, say I want to retrieve user names ordered by timestamp:
Query firstQuery = firestore.collection("Names").orderBy("timestamp", Query.Direction.DESCENDING).limit(10); // load 10 names
What would be (or has been) your choice for different types of
projects?
For me, the Realtime Database is for data streaming, for example when I work with an Arduino and want to store a drone's speed.
And Firestore is for smart-office use cases, like air conditioners or room lights, and for enterprise use cases like inventory quantities, etc.
Do you still go with the real-time database or have you migrated from
that to the firestore? If so why?
I still go with the Realtime Database, because I need a tree for displaying the streaming data structure rather than querying a table-like structure as in Firestore.
I'm currently developing a mobile application which uses AJAX requests to get data from a server.
To enable offline navigation in my application, I need to store all data collected.
My application is quite powerful because there's a section where the user can see charts (powered by highcharts).
I'm asking myself about the best solution to cache the data collected in the JSON format.
Is it light or efficient enough to JSON.stringify the data array into local storage like:
localStorage.setItem("graph_1_datas", JSON.stringify(json_data_array));
Or would it be better to create a database, and a table like that:
TABLE
-----
id
graphId
blockId
x
y
I have 3 graphIds by blockId, and about 10 blockIds...
Storing the JSON strings in local storage should be fairly fast and efficient. Just store a separate entry for each request; that gives you clear, simple code for getting the data either from local storage or from the web service.
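A minimal sketch of that approach, with one entry per graph. The helper names (`createMemoryStorage`, `saveGraph`, `loadGraph`) and the key format are assumptions for illustration. The in-memory storage is only a stand-in with the same getItem/setItem shape as the browser's localStorage, so the sketch also runs outside a browser; in the app you would pass `window.localStorage` instead.

```javascript
// In-memory stand-in exposing the same getItem/setItem shape as
// window.localStorage (hypothetical helper, for running outside a browser).
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => {
      items.set(key, String(value));
    },
  };
}

// Store one entry per graph, serialized as a JSON string.
function saveGraph(storage, graphId, dataArray) {
  storage.setItem("graph_" + graphId + "_datas", JSON.stringify(dataArray));
}

// Read the entry back; returns null when nothing has been cached yet,
// which is the signal to fall back to the web service.
function loadGraph(storage, graphId) {
  const raw = storage.getItem("graph_" + graphId + "_datas");
  return raw === null ? null : JSON.parse(raw);
}
```

The calling code would try `loadGraph` first and only issue the AJAX request (followed by `saveGraph`) when it returns null or when fresh data is wanted while online.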
If you are likely to want to edit the data offline then you may wish to consider an SQLite database as it will make it easier/more efficient to add code to track changes.
You may also want to consider an SQLite database if your object graph gets more complicated and fits a relational database model.