Model Firebase realtime database in JSON schema - firebase

The Firebase database uses a subset of JSON. Thus it seems obvious to use JSON schema to describe the data model. This would allow to make use of tools which generate HTML forms or typescript models from it or generate random test data.
My question: How would one model key-value pairs in JSON schema, where the key is an id?
Example: (borrowed from firebase spec)
{
"users": {
"mchen": {
"name": "Mary Chen",
// index Mary's groups in her profile
"groups": {
// the value here doesn't matter, just that the key exists
"alpha": true,
"charlie": true
}
},
...
The group name here is used as an group id. In this reference (groups object) as well as in the group object itself, the id is used as the property name.
JSON schema for above example is:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"users": {
"type": "object",
"properties": {
"mchen": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"groups": {
"type": "object",
"properties": {
"alpha": {
"type": "boolean"
},
"charlie": {
"type": "boolean"
}
}
}
}
}
}
}
}
}
What I would need for the example is something like the following, where NAME is a placeholder for the property name and NAME_TYPE defines it's type.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"users": {
"type": "object",
"properties": {
NAME: {
"type": "object",
NAME_TYPE: "string",
"properties": {
"name": {
"type": "string"
},
"groups": {
"type": "object",
"properties": {
NAME: {
NAME_TYPE: "string"
"type": "boolean"
}
}
}
}
}
}
}
}
}
(Maybe I am on the completely wrong path here or maybe JSON schema isn't able to model the required structure.)

There are certainly arrays in Firebase but they are situational and should be used only in certain use cases and should generally be avoided.
The Firebase structure you posted is very common and there are key:value pairs in your structure so the question is a tad unclear but I'll give it a shot.
'groups' is the parent key and the values are the children key:value pairs of group1:value, group2:value.
The group1, group2 keys you listed are essentially the same as the id's listed in the first example, other than it's not an array. i.e. array's have sequential, hard coded-indexes (0th, 1st, 2nd etc) whereas the keys in firebase are open-ended and can generally be set to any alphanumeric value - they are used more to refer to a specific node than the enforce an particular order (i'm speaking generally here)
In the Firebase structure, those keys could be id0, id1, id2... or a,b,c... or a timestamp... or an auto-generated Firebase id (childByAutoId) that would also make them 'sequential'.
However, you can get into trouble assigning your own with id0, id1 etc.
id0
id1
id2
.
id9
id10
id11
The reality here is that the actual order will be
id0
id1
id10
id11
id2
The 'key' is that if you are using the keys to read data in sequentially, set them up as such. You may also want to consider generating your keys with childByAutoId (see docs for language specifics) and orderBy one of the child values such as a timestamp or index.
'groups': {
'auto-generated id': {
'name': 'alpha',
'index': 0,
'timestamp': '20160814074146'
...
},
'auto-generated id': {
'name': 'charlie',
'index': 1,
'timestamp': '20160814073600'
...
},
...
}
in the above case, I can orderBy name, index or timestamp.
Name and index will read the nodes in the order they are listed, if we order by timestamp, then the charlie node will be loaded first. Leveraging the child values to orderBy is very flexible.
Also, you can limit the set of data you are loading in with startingAt and endingAt. So for example, you want to load nodes starting at node 10 and ending at node 14. Easily done with non-array JSON data but not easily done if it's stored in an array as the entire array must be read in.

Related

Non Recursive Index CosmosDb

As a CosmosDB (SQL API) user I would like to index all non object or array properties inside of an object.
By default the index in cosmos /* will index every property, our data set is getting extremely large (expensive) and this strategy is no longer optimal. We store our metadata at the root and our customer data wrapped inside of an object property data.
Our platform restricts queries on the data path to be value type properties, this means that for us to index objects and arrays nested under the data path is just slowing down writes and costing RUs to store but never getting used.
I have tried several iterations of index policies but cannot find one that fits. Example:
{
"partitionKey": "f402a704-19bb-4f4d-93e6-801c50280cf6",
"id": "4a7a11e5-00b5-4def-8e80-132a8c083f24",
"data": {
"country": "Belgium",
"employee": 250,
"teammates": [
{ "name": "Jake", "id": 123 ...},
{ "name": "kyle", "id": 3252352 ...}
],
"user": {
"name": "Brian",
"addresses": [{ "city": "Moscow" ...}, { "city": "Moscow" ...}]
}
}
}
In this case I want to only index the root properties as well as /data/employee and /data/country.
Policies like /data/* will not work because it would then index /data/teammates/name ... and so on.
/data/? => assumes data is a value type which it never will be so this doesn't work.
/data/ and /data/*/? and /data/*? are not accepted by cosmos as valid policies.
Additionally I can't simply exclude /data/teammates/ and /data/user/ because what is inside of data is completely dynamic so while that might cover this use case there are several 100k others that it would not.
I have tried many iterations but it seems that options don't work for various reasons, is there a way to support what I am trying to do?
This indexing policy will index the properties you are asking for.
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/partitionKey/?"
},
{
"path": "/data/country/?"
},
{
"path": "/data/employee/?"
}
],
"excludedPaths": [
{
"path": "/*"
}
]
}

JSON API automatic reverse links

Is it a violation of the JSON-API spec to allow reverse relationships to be created automatically?
I need to create resources that, when I link A to B in a relationship, automatically links B to A. In this way I can traverse A to find all of its Bs and can find the parent A from a B. However, I don't want to POST/PATCH to 2 relationships to get this right. I want to establish the relationship once.
Now I know that it is an implementation detail as to how the server maintains link/references as well as how the behaviour is established but I want to build the API in such a way that it doesn't violate the spec.
Assuming I have resources Books and Authors. Books have Authors and Authors have Books. The question is, once I relate an Author to a Book, I need the reverse relationship to be created as well. Is it a violation of the spec in any way to assume that this reverse relationship can be automatically created by simply doing one POST to the Books resource's relationship?
By way of example, starting with the book.
{
"data": {
"type": "books", "id": 123, "attributes": ...,
"links": { "self": "/books/123" },
"relationships": {
"self": "/books/123/relationships/authors",
"related": "/books/123/authors"
}
}
}
And the author
{
"data": {
"type": "authors", "id": 456, "attributes": ...,
"links": { "self": "/authors/456" },
"relationships": {
"self": "/authors/456/relationships/books",
"related": "/authors/456/books"
}
}
}
If I establish the link from a book to an author with a POST to /books/123/relationships/authors
{
"data": [{ "data": "authors", "id": "456" }]
}
Do I need to explicitly do the same for the Author 456 as a POST to /authors/456/relationships/books?
{
"data": [{ "data": "books", "id": "123" }]
}
Or can I let the server build the relationship for me so that I can avoid the second POST and just see the automatic reverse relationship at GET /authors/456/relationships/books?
From the perspective of the spec this is only one relationship represented from two different sides. author and book have a many-to-many relationship. This relationship could be represented in author's resource object as well as book's resource object and of course also via there relationship links. Actually it would be a violation of the spirit of the specification if representations wouldn't match. Having one-sided relationships is another story but in that case one side wouldn't know about the relationships at all (e.g. a book is associated with an author but the author model does not know which books are associated with it).
A post to either one side of that relationship creates the relationship between the two records. It shouldn't matter which side is used to create that relationship and if it's created as part of a creation / update to a resource via it's resource object or via a relationship link representing that relationship. The same applies to deletion of that relationship.
Maybe an example would make that even more clear. Let's assume a book is created with a POST to /books?include=author having these payload:
{
"data": {
"type": "books",
"relationships": {
"author": {
"data": {
"type": "authors",
"id": "1"
}
}
}
}
}
The response may look like this:
{
"data": {
"type": "books",
"id": "7",
"relationships": {
"author": {
"data": { "type": "authors", "id": "1" }
}
}
},
"included": [
{
"type": "authors",
"id": "1",
"relationships": {
"books": {
"data": [
{ "type": "books", "id": "7" }
]
}
}
}
]
}

Access definitions on ref property with newtonsoft.json.schema

I want to have access to the definitions in the schema in order to get the naming of the definition. I am using newtonsoft.json v11.01
I am building a c# converter for jsonschema to make a syntaxtree and compile it in order to get a typed version of the object at runtime.
{
"$id": "https://example.com/arrays.schema.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "xml remarks",
"type": "object",
"properties": {
"fruits": {
"type": "array",
"items": {
"type": "object",
"title": "fruit",
"required": ["naam"],
"properties": {
"naam": {
"type": "string",
"description": "The name of the fruit."
}
}
}
},
"vegetables": {
"type": "array",
"items": { "$ref": "#/definitions/veggie" }
}
},
"definitions": {
"veggie": {
"type": "object",
"required": [ "veggieName", "veggieLike" ],
"properties": {
"veggieName": {
"type": "string",
"description": "The name of the vegetable."
},
"veggieLike": {
"type": "boolean",
"description": "Do I like this vegetable?"
}
}
}
}
}
in the schema a reference is created with the name veggie. This is used in the property vegetable with a reference.
Json schema contains a definition on the root object but is doesn't have it on the property element. On the property element there is nothing identifiable to point to the right definition.
how do i find the right definition for the property?
In general, in order to resolve a json-pointer (a $ref is a "uri-reference", and the part after the # is a "json-pointer"), you need to have access to the root of the json document.
So if you currently have a function that only gets an argument that points to the "properties" section, then you need to give that function a second argument that points to the root of the document.
(It gets more complicated when you're using a schema made up of more than one file; then you need a second argument that points to the roots of all the schema documents)
It's one of the more difficult parts of writing software that interprets json-schema files, especially if your language/library doesn't have support for json-pointers built-in - you'll need to write it yourself in that case.

Redux + Normalizr : Adding and deleting normalized entities in Redux state

I have an API response which has a lot of nested entities. I use normalizr to keep the redux state as flat as possible. For eg. the api response looks like below:
{
"id": 1,
"docs": [
{
"id": 1,
"name": "IMG_0289.JPG"
},
{
"id": 2,
"name": "IMG_0223.JPG"
}
],
"tags": [
{
"id": "1",
"name": "tag1"
},
{
"id": "2",
"name": "tag2"
}
]
}
This response is normalized using normalizr using the schema given below:
const OpeningSchema = new schema.Entity('openings', {
tags: [new schema.Entity('tags')],
docs: [new schema.Entity('docs')]
});
and below is how it looks then:
{
result: "1",
entities: {
"openings": {
"1": {
"id": 1,
"docs": [1,2],
"tags": [1,2]
}
},
"docs": {
"1": {
id: "1",
"name": "IMG_0289.JPG"
},
"2": {
id: "2",
"name": "IMG_0223.JPG"
}
},
"tags": {
"1": {
"id": 1,
"name": "tag1"
},
"2": {
"id": 2,
"name": "tag2"
}
}
}
}
The redux state now looks something like below:
state = {
"opening" : {
id: 1,
tags: [1,2],
docs: [1,2]
},
"tags": [
{
"id":1,
"name": "tag1"
},
{
"id":2,
"name": "tag2"
}
],
"docs": [
{
"id":1,
"name": "IMG_0289.JPG"
},
{
"id":2,
"name": "IMG_0223.JPG"
}
]
}
Now if I dispatch an action to add a tag, then it adds a tag object to state.tags but it doesn't update state.opening.tags array. Same behavior while deleting a tag also.
I keep opening, tags and docs in three different reducers.
This is an inconsistency in the state. I can think of the following ways to keep the state consistent:
I dispatch an action to update tags and listen to it in both tags reducer and opening reducer and update tags subsequently at both places.
The patch request to update opening with tags returns the opening response. I can again dispatch the action which normalizes the response and set tags, opening etc with proper consistency.
What is the right way to do this. Shouldn't the entities be observing for changes to the related entities and make the changes itself. Or there are any other patterns that could be followed any such action.
First to summarise how normalizr works: normalizr flattens nested API response to entities defined by your schemas. So, when you made your initial GET openings API request, normalizr flattened the response and created your Redux entities and the flattened objects: openings, docs, tags.
Your suggestions are viable, but I find normalizr's real benefit in separating API data from UI state; so I don't update the data in Redux store myself... All my API data are kept in entities and they are not altered by me; they are vanilla back-end data... All I do is to do a GET upon state changing API operations, and normalise the GET response. There is a small exception for DELETE case that I'll expand on later on... A middleware will deal with such cases, so you should use one if you haven't been using. I created my own middleware, but I know redux-promise-middleware is quite popular.
In your data set above; when you add a new tag, I assume you are making an API POST to do so, which in turn updates the back-end. Then, you should do another GET openings which will update the entities for openings and all its nested schemas.
When you delete a tag, e.g. tag[2], upon sending the DELETE request to the back-end, you should nullify the deleted object in your entities state, ie. entities.tags[2] = null before making the GET openings again to update your normalizr entities.

How to form inner SubQuery in Gremlin Server (Titan 1.0)?

I'm using Following Query :
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
).select('notificationInfo','postInfo')
it is giving following result :
{
"requestId": "9846447c-4217-4103-ac2e-de3536a3c62a",
"status": {
"message": "",
"code": ​200,
"attributes": { }
},
"result": {
"data": [
{
"notificationInfo": {
"id": "c0zs-fw3k-347p-g2g0",
"label": "Notification",
"type": "edge",
"inVLabel": "Comment",
"outVLabel": "User",
"inV": ​749664,
"outV": ​741440,
"properties": {
"ParentPostId": "823488",
"PostedDate": "2016-05-26T02:35:52.3889982Z",
"PostedDateLong": ​635998269523889982,
"Type": "CommentedOnPostNotification",
"NotificationInitiatedByVertexId": "1540312"
}
},
"postInfo": {
"id": ​749664,
"label": "Comment",
"type": "vertex",
"properties": {
"PostImage": [
{
"id": "amto-g2g0-2wat",
"value": ""
}
],
"PostedByUser": [
{
"id": "am18-g2g0-2txh",
"value": "orbitpage#gmail.com"
}
],
"PostedTime": [
{
"id": "amfg-g2g0-2upx",
"value": "2016-05-26T02:35:39.1489483Z"
}
],
"PostMessage": [
{
"id": "aln0-g2g0-2t51",
"value": "hi"
}
]
}
}
}
],
"meta": { }
}
}
I want to get information of Vertex "NotificationInitiatedByVertexId" (Edge Property ) in the response as well.
For that i tried following query :
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,2).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
g.V(1540312).next().as('notificationByUser')
).select('notificationInfo','postInfo','notificationByUser')
Note : I tried directly with vertex Id in subquery as I wasn't aware how to dynamically get value from edge property in query itself.
It is giving error. I tried a lot but am not able to find any solution.
I'm assuming that you are storing a Titan generated identifier in that edge property called NotificationInitiatedByVertexId. If so, please consider the following even though this first part doesn't really answer your question. I don't think you should store a vertex identifier on the edge. Your graph model should explicitly track the relationship of NotificationInitiatedBy with an edge and by storing the identifier of the vertex on the edge itself you are bypassing that. Also, if you ever have to migrate your data in some way, the ids won't be preserved (Titan will generate new ones) and trying to sort that out will be a mess.
Even if that is not a Titan generated identifier and a logical one you created, I still think I would look to adjust your graph schema and promote that Notification to a vertex. Then your Gremlin traversals would flow more easily.
Now, assuming you don't change that, then I don't see a reason to not just issue two queries in the same request and then combine the results to one data structure. You just need to do a lookup with the vertex id which is going to be pretty fast and inexpensive:
edgeStuff = g.V(741440).outE('Notification').
order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').
... // whatever logic you have
select('notificationInfo','postInfo').next()
vertexStuff = g.V(edgeStuff.get('notificationInfo').value('NotificationInitiatedByVertexId')).next()
[notificationInitiatedBy: vertexStuff, notification: edgeStuff]

Resources