Build File Tree of Cloud Storage in Firebase Firestore [duplicate] - firebase

What is a clean/efficient method for storing the directory Hierarchy/tree in a Key-Value database (in my case MongoDB but any of them)?
For example a tree structure
- Cars
+ Audi
+ BMW
- M5
+ Ford
- Color
+ Red
- Apple
- Cherry
+ Purple
- Funny
The method I am using now, each object links to it's parent
{
dir: "red"
parent-dir: "color"
}
This makes it very efficient/fast to insert and reorder any aspect of the tree (for example if I want to move Red and all it's children to the Cars directory).
But this method sucks when I want to all subdirectories and their children for a given directory recursively. To make it efficient to parse I can have a structure for example
{
dir: "red"
children: "audi, bmw, ford"
}
{
dir: "bmw"
children: "m5"
}
But if I want to modify the tree, a whole bunch of objects need to touched and modified.
Are there any other methods to storing a directory structure in a KV store?

The method you currently use now is called adjacency list model.
Another model to store hierarchical data in a (relational) database is the nested set model. Its implementation in SQL databases is well known. Also see this article for the modified preorder tree traversal algorithm.
A very simple method: you could store a path per object - with those it should be easy to query trees in NOSQL databases:
{ path: "Color", ... }
{ path: "Color.Red", ... }
{ path: "Color.Red.Apple", ... }
{ path: "Color.Red.Cherry", ... }
When nodes will be removed or renamed some paths must be updated. But in general, this method looks promising. You just have to reserve a special character as separator. The storage space overhead should be negligible.
edit: this method is called materialized path
Finally, here is a comparison of different methods for hierarchical data in NOSQL databases.

I don't have a huge amount of NOSQL experience, so this isn't a definitive answer, but here's how I'd approach it:
I would likely use your first approach, where you have:
{
dir: 'dir_name',
parent_dir: 'parent_dir_name'
}
And then set up a map-reduce to quickly query the children of a directory. MongoDB's map-reduce functionality is still only available in the development branch and I haven't worked with it yet, but in CouchDB (and I assume, with a few modification, in MongoDB) you could do something like:
map:
function(doc) {
emit( doc.parent_dir, doc.dir );
}
reduce:
function(key, values) {
return( values );
}
Which would give you the list of sub-directories for each parent directory.

I suggest storing a heap to the the id's of the data items.
I think this is the best plan. If you need lots and lots of stuff any heap element could be an index to another heap.
eg
{ "id:xxx", "id:yyy", "sub-heap-id:zzz"....}
If this is not clear post a comment and I will explain more when I get home.

Make an index!
http://www.mongodb.org/display/DOCS/Indexes

Related

How to normalizing data from the server in easy way in ngrx

I'm using ngrx/redux pattern in my app.
In Normalizing State Shape article, is it written that I should create a "table" for each object and link between them by an id.
for example:
posts = [{ id, author, comments: ["commentId1", "commentId2"....] }]
comments = [{ id: 'commentId1', comment: '..' } … ]
From my server side I get the object nested within,
posts: [ { id, author, comments: [ { id, comment } ] } ..]
So I need to write a code to refactor the object that match the Normalized State? for each arrays properties in my objects?
Is sound a big work to do. First, am I right I need to do that? Second, If so, there is a easy way to handle this?
I recently faced the same problem. I ended up using NGRX Entity with different states. In your case, one state for posts and one for comments. One could go further and normalize everything much more, but as you said it is a lot of work and I am not sure if it is worth it.
I found Todd Motto's tutorials to be really good: https://www.youtube.com/watch?v=al0LNgH3I4A
One way or another, you will still need a mapper that maps your server object into models you can use in your app. Different selectors can than help you to easily get the right comments for a given post.

Firestore Update fields in nested objects with dynamic key

I need to update a field in a nested object with a dynamic key.
the path could look like this: level1.level2.DYNAMIC_KEY : updatedValue
The update-method deletes everything else on level1 instead of only updating the field in the nested object. The update() acts more like a set(). What am I doing wrong?
I tried the following already:
I read the documentation https://firebase.google.com/docs/firestore/manage-data/add-data#update-data
but that way it is a) static and b) still deletes the other fields.
Update fields in nested objects
If your document contains nested objects, you can use "dot notation" to reference nested fields within the document when you call update()
This would be static and result in
update({
'level1.level2.STATIC_KEY' : 'updatedValue'
});
Then I found this answer https://stackoverflow.com/a/47296152/5552695
which helped me to make the updatepath dynamic.
The desired solution after this could look like
field[`level1.level2.${DYNAMIC_KEY}`] = updateValue;
update(field);
But still: it'll delete the other fields in this path.
UPDATE:
The Structure of my Doc is as follows:
So inside this structure i want to update only complexArray > 0 > innerObject > age
Writing the above path into the update() method will delete everything else on the complexArray-level.
A simple update on first-level-fields works fine and lets the other first-level-fields untouched.
Is it possible, that firestore functions like update() can only act on the lowest field-level on an document. And as soon as i put complex objects into an document its not possible to select such inner fields?
I know there would be the solution to extract those "complex" objects into separate collections + documents and put these into my current lowest document level. I think this would be a more accurate way to stick to the FireStore principles. But on Application side it is easier to work with complex object than to always dig deeper in firestore collection + document structure.
So my current solution is to send the whole complex object into the update() method even though I just changed only one field on application side.
Have you tried using the { merge: true } option in your request?
db
.collection("myCollection")
.doc("myDoc")
.set(
{
level1: { level2: { myField: "myValue" } }
},
{ merge: true }
)

Sails.js - Is there intended support for a "one-way-many" association

I'm interested in a one-way-many association. To explain:
// Dog.js
module.exports = {
attributes: {
name: {
type: 'string'
},
favorateFoods: {
collection: 'food',
dominant: true
}
}
};
and
// Food.js
module.exports = {
attributes: {
name: {
type: 'string'
},
cost: {
type: 'integer'
}
}
};
In other words, I want a Dog to be associated w/ many Food entries, but as for Food, I don't care which Dog is associated.
If I actually implement the above, believe it or not it works. However, the table for the association is named in a very confusing manner - even more confusing than normal ;)
dog_favoritefoods__food_favoritefoods_food, with id, dog_favoritefoods, and food_favoritefoods_food.
REST blueprints function with the Dog model just fine, I don't see anything that "looks bad" except for the funky table name.
So, the question is, is it supposed to work this way, and does anyone see something that might potentially go haywire?
I think you should be ok.
However, there does not really seem any reason to not complete the association for a Many to Many. The reason would be because everything is already being created for that single collection. The join table and its attributes are already there. The only thing missing in this equation is the reference back on food.
I could understand if putting the association on food were to create another table or create another weird join, but that has already been done. There really is no overhead to creating the other association.
So in theory you might as well create it, thus avoiding any potential conflicts unless you have a really compelling reason not to?
Edited: Based on the comments below we should note that one could experience overhead in lift based the blueprints and dynamic finders created.

MeteorJS collection vs. array

I am creating an application using MeteorJS that allows users to create items (e.g, text, images) and to collaboratively spatially organize these on a canvas. Multiple canvases can be created. In the future, items may be reused in or copied between (I am unsure yet) multiple canvases. I have never designed a collaborative (or even database driven) application before.
I could not find the possibility to create nested MeteorJS collections, and I am unsure about the (dis)advantages (e.g. considering scalability, speed) of using multiple collections vs. using an array of objects inside a collection, so I wonder what a good design pattern would be:
A:
Collection Canvases {
Canvas {
Array Items;
}
Canvas {
Array Items;
}
}
B:
Collection Items {
Item {
_id
}
Item {
_id
}
}
Collection Canvases {
Canvas {
Array ItemIDs;
}
Canvas {
Array ItemIDs;
}
}
Or perhaps something different?
Since Meteor "identifies changes based on the fields of the MongoDB document. But ... does not support nested fields and arrays", I'd use some data structure like you suggested in your proposal B: two collections. This ensures that only new/ updated items get pushed out to clients and not all items of a canvas.
Then make the relation between Canvas and Items as saimeunt pointed out in his comment above: Canvas{_id:"xxx"} Item{_id:"xxx",canvasId:"xxx"}. (I'm using a similar approach in my project minutocash which works fine.)
Furthermore, you can publish all related items of a canvas with the publish-with-relations package as David Weldon pointed out in this answer to a question of mine, about an issue which you might run into later with this data structure.

What solutions are there for circular references?

When using reference counting, what are possible solutions/techniques to deal with circular references?
The most well-known solution is using weak references, however many articles about the subject imply that there are other methods as well, but keep repeating the weak-referencing example. Which makes me wonder, what are these other methods?
I am not asking what are alternatives to reference counting, rather what are solutions to circular references when using reference counting.
This question isn't about any specific problem/implementation/language rather a general question.
I've looked at the problem a dozen different ways over the years, and the only solution I've found that works every time is to re-architect my solution to not use a circular reference.
Edit:
Can you expand? For example, how would you deal with a parent-child relation when the child needs to know about/access the parent? – OB OB
As I said, the only good solution is to avoid such constructs unless you are using a runtime that can deal with them safely.
That said, if you must have a tree / parent-child data structure where the child knows about the parent, you're going to have to implement your own, manually called teardown sequence (i.e. external to any destructors you might implement) that starts at the root (or at the branch you want to prune) and does a depth-first search of the tree to remove references from the leaves.
It gets complex and cumbersome, so IMO the only solution is to avoid it entirely.
Here is a solution I've seen:
Add a method to each object to tell it to release its references to the other objects, say call it Teardown().
Then you have to know who 'owns' each object, and the owner of an object must call Teardown() on it when they're done with it.
If there is a circular reference, say A <-> B, and C owns A, then when C's Teardown() is called, it calls A's Teardown, which calls Teardown on B, B then releases its reference to A, A then releases its reference to B (destroying B), and then C releases its reference to A (destroying A).
I guess another method, used by garbage collectors, is "mark and sweep":
Set a flag in every object instance
Traverse the graph of every instance that's reachable, clearing that flag
Every remaining instance which still has the flag set is unreachable, even if some of those instances have circular references to each other.
I'd like to suggest a slightly different method that occured to me, I don't know if it has any official name:
Objects by themeselves don't have a reference counter. Instead, groups of one or more objects have a single reference counter for the entire group, which defines the lifetime of all the objects in the group.
In a similiar fashion, references share groups with objects, or belong to a null group.
A reference to an object affects the reference count of the (object's) group only if it's (the reference) external to the group.
If two objects form a circular reference, they should be made a part of the same group. If two groups create a circular reference, they should be united into a single group.
Bigger groups allow more reference-freedom, but objects of the group have more potential of staying alive while not needed.
Put things into a hierarchy
Having weak references is one solution. The only other solution I know of is to avoid circular owning references all together. If you have shared pointers to objects, then this means semantically that you own that object in a shared manner. If you use shared pointers only in this way, then you can hardly get cyclic references. It does not occur very often that objects own each other in a cyclic manner, instead objects are usually connected through a hierarchical tree-like structure. This is the case I'll describe next.
Dealing with trees
If you have a tree with objects having a parent-child relationship, then the child does not need an owning reference to its parent, since the parent will outlive the child anyways. Hence a non-owning raw back pointer will do. This also applies to elements pointing to a container in which they are situated. The container should, if possible, use unique pointers or values instead of shared pointers anyways, if possible.
Emulating garbage collection
If you have a bunch of objects that can wildly point to each other and you want to clean up as soon as some objects are not reachable, then you might want to build a container for them and an array of root references in order to do garbage collection manually.
Use unique pointers, raw pointers and values
In the real world I have found that the actual use cases of shared pointers are very limited and they should be avoided in favor of unique pointers, raw pointers, or -- even better -- just value types. Shared pointers are usually used when you have multiple references pointing to a shared variable. Sharing causes friction and contention and should be avoided in the first place, if possible. Unique pointers and non-owning raw pointers and/or values are much easier to reason about. However, sometimes shared pointers are needed. Shared pointers are also used in order to extend the lifetime of an object. This does usually not lead to cyclic references.
Bottom line
Use shared pointers sparingly. Prefer unique pointers and non-owning raw pointers or plain values. Shared pointers indicate shared ownership. Use them in this way. Order your objects in a hierarchy. Child objects or objects on the same level in a hierarchy should not use owning shared references to each other or to their parent, but they should use non-owning raw pointers instead.
No one has mentioned that there is a whole class of algorithms that collect cycles, not by doing mark and sweep looking for non-collectable data, but only by scanning a smaller set of possibly circular data, detecting cycles in them and collecting them without a full sweep.
To add more detail, one idea for making a set of possible nodes for scanning would be ones whose reference count was decremented but which didn't go to zero on the decrement. Only nodes to which this has happened can be the point at which a loop was cut off from the root set.
Python has a collector that does that, as does PHP.
I'm still trying to get my head around the algorithm because there are advanced versions that claim to be able to do this in parallel without stopping the program...
In any case it's not simple, it requires multiple scans, an extra set of reference counters, and decrementing elements (in the extra counter) in a "trial" to see if the self referential data ends up being collectable.
Some papers:
"Down for the Count? Getting Reference Counting Back in the Ring" Rifat Shahriyar, Stephen M. Blackburn and Daniel Frampton
http://users.cecs.anu.edu.au/~steveb/downloads/pdf/rc-ismm-2012.pdf
"A Unified Theory of Garbage Collection" by David F. Bacon, Perry Cheng and V.T. Rajan
http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf
There are lots more topics in reference counting such as exotic ways of reducing or getting rid of interlocked instructions in reference counting. I can think of 3 ways, 2 of which have been written up.
I have always redesigned to avoid the issue. One of the common cases where this comes up is the parent child relationship where the child needs to know about the parent. There are 2 solutions to this
Convert the parent to a service, the parent then does not know about the children and the parent dies when there are no more children or the main program drops the parent reference.
If the parent must have access to the children, then have a register method on the parent which accepts a pointer that is not reference counted, such as an object pointer, and a corresponding unregister method. The child will need to call the register and unregister method. When the parent needs to access a child then it type casts the object pointer to the reference counted interface.
When using reference counting, what are possible solutions/techniques to deal with circular references?
Three solutions:
Augment naive reference counting with a cycle detector: counts decremented to non-zero values are considered to be potential sources of cycles and the heap topology around them is searched for cycles.
Augment naive reference counting with a conventional garbage collector like mark-sweep.
Constrain the language such that its programs can only ever produce acyclic (aka unidirectional) heaps. Erlang and Mathematica do this.
Replace references with dictionary lookups and then implement your own garbage collector that can collect cycles.
i too am looking for a good solution to the circularly reference counted problem.
i was stealing borrowing an API from World of Warcraft dealing with achievements. i was implicitely translating it into interfaces when i realized i had circular references.
Note: You can replace the word achievements with orders if you don't like achievements. But who doesn't like achievements?
There's the achievement itself:
IAchievement = interface(IUnknown)
function GetName: string;
function GetDescription: string;
function GetPoints: Integer;
function GetCompleted: Boolean;
function GetCriteriaCount: Integer;
function GetCriteria(Index: Integer): IAchievementCriteria;
end;
And then there's the list of criteria of the achievement:
IAchievementCriteria = interface(IUnknown)
function GetDescription: string;
function GetCompleted: Boolean;
function GetQuantity: Integer;
function GetRequiredQuantity: Integer;
end;
All achievements register themselves with a central IAchievementController:
IAchievementController = interface
{
procedure RegisterAchievement(Achievement: IAchievement);
procedure UnregisterAchievement(Achievement: IAchievement);
}
And the controller can then be used to get a list of all the achievements:
IAchievementController = interface
{
procedure RegisterAchievement(Achievement: IAchievement);
procedure UnregisterAchievement(Achievement: IAchievement);
function GetAchievementCount(): Integer;
function GetAchievement(Index: Integer): IAchievement;
}
The idea was going to be that as something interesting happened, the system would call the IAchievementController and notify them that something interesting happend:
IAchievementController = interface
{
...
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
}
And when an event happens, the controller will iterate through each child and notify them of the event through their own Notify method:
IAchievement = interface(IUnknown)
function GetName: string;
...
function GetCriteriaCount: Integer;
function GetCriteria(Index: Integer): IAchievementCriteria;
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
end;
If the Achievement object decides the event is something it would be interested in it will notify its child criteria:
IAchievementCriteria = interface(IUnknown)
function GetDescription: string;
...
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
end;
Up until now the dependancy graph has always been top-down:
IAchievementController --> IAchievement --> IAchievementCriteria
But what happens when the achievement's criteria have been met? The Criteria object was going to have to notify its parent `Achievement:
IAchievementController --> IAchievement --> IAchievementCriteria
^ |
| |
+----------------------+
Meaning that the Criteria will need a reference to its parent; the who are now referencing each other - memory leak.
And when an achievement is finally completed, it is going to have to notify its parent controller, so it can update views:
IAchievementController --> IAchievement --> IAchievementCriteria
^ | ^ |
| | | |
+----------------------+ +----------------------+
Now the Controller and its child Achievements circularly reference each other - more memory leaks.
i thought that perhaps the Criteria object could instead notify the Controller, removing the reference to its parent. But we still have a circular reference, it just takes longer:
IAchievementController --> IAchievement --> IAchievementCriteria
^ | |
| | |
+<---------------------+ |
| |
+-------------------------------------------------+
The World of Warcraft solution
Now the World of Warcraft api is not object-oriented friendly. But it does solve any circular references:
Do not pass references to the Controller. Have a single, global, singleton, Controller class. That way an achievement doesn't have to reference the controller, just use it.
Cons: Makes testing, and mocking, impossible - because you have to have a known global variable.
An achievement doesn't know its list of criteria. If you want the Criteria for an Achievement you ask the Controller for them:
IAchievementController = interface(IUnknown)
function GetAchievementCriteriaCount(AchievementGUID: TGUID): Integer;
function GetAchievementCriteria(Index: Integer): IAchievementCriteria;
end;
Cons: An Achievement can no longer decide to pass notifications to it's Criteria, because it doesn't have any criteria. You now have to register Criteria with the Controller
When a Criteria is completed, it notifies the Controller, who notifies the Achievement:
IAchievementController-->IAchievement IAchievementCriteria
^ |
| |
+----------------------------------------------+
Cons: Makes my head hurt.
i'm sure a Teardown method is much more desirable that re-architecting an entire system into a horribly messy API.
But, like you wonder, perhaps there's a better way.
If you need to store the cyclic data, for a snapShot into a string,
I attach a cyclic boolean, to any object that may be cyclic.
Step 1:
When parsing the data to a JSON string, I push any object.is_cyclic that hasn't been used into an array and save the index to the string. (Any used objects are replaced with the existing index).
Step 2: I traverse the array of objects, setting any children.is_cyclic to the specified index, or pushing any new objects to the array. Then parsing the array into a JSON string.
NOTE: By pushing new cyclic objects to the end of the array, will force recursion until all cyclic references are removed..
Step 3: Last I parse both JSON strings into a single String;
Here is a javascript fiddle...
https://jsfiddle.net/7uondjhe/5/
function my_json(item) {
var parse_key = 'restore_reference',
stringify_key = 'is_cyclic';
var referenced_array = [];
var json_replacer = function(key,value) {
if(typeof value == 'object' && value[stringify_key]) {
var index = referenced_array.indexOf(value);
if(index == -1) {
index = referenced_array.length;
referenced_array.push(value);
};
return {
[parse_key]: index
}
}
return value;
}
var json_reviver = function(key, value) {
if(typeof value == 'object' && value[parse_key] >= 0) {
return referenced_array[value[parse_key]];
}
return value;
}
var unflatten_recursive = function(item, level) {
if(!level) level = 1;
for(var key in item) {
if(!item.hasOwnProperty(key)) continue;
var value = item[key];
if(typeof value !== 'object') continue;
if(level < 2 || !value.hasOwnProperty(parse_key)) {
unflatten_recursive(value, level+1);
continue;
}
var index = value[parse_key];
item[key] = referenced_array[index];
}
};
var flatten_recursive = function(item, level) {
if(!level) level = 1;
for(var key in item) {
if(!item.hasOwnProperty(key)) continue;
var value = item[key];
if(typeof value !== 'object') continue;
if(level < 2 || !value[stringify_key]) {
flatten_recursive(value, level+1);
continue;
}
var index = referenced_array.indexOf(value);
if(index == -1) (item[key] = {})[parse_key] = referenced_array.push(value)-1;
else (item[key] = {})[parse_key] = index;
}
};
return {
clone: function(){
return JSON.parse(JSON.stringify(item,json_replacer),json_reviver)
},
parse: function() {
var object_of_json_strings = JSON.parse(item);
referenced_array = JSON.parse(object_of_json_strings.references);
unflatten_recursive(referenced_array);
return JSON.parse(object_of_json_strings.data,json_reviver);
},
stringify: function() {
var data = JSON.stringify(item,json_replacer);
flatten_recursive(referenced_array);
return JSON.stringify({
data: data,
references: JSON.stringify(referenced_array)
});
}
}
}
Here are some techniques described in Algorithms for Dynamic Memory Management by R. Jones and R. Lins. Both suggest that you can look at the cyclic structures as a whole in some way or another.
Friedman and Wise
The first way is one suggested by Friedman and Wise. It is for handling cyclic references when implementing recursive calls in functional programming languages. They suggest that you can observe the cycle as a single entity with a root object. To do that, you should be able to:
Create the cyclic structure all at once
Access the cyclic structure only through a single node - a root, and mark which reference closes the cycle
If you need to reuse just a part of the cyclic structure, you shouldn't add external references but instead make copies of what you need
That way you should be able to count the structure as a single entity and whenever the RC to the root falls to zero - collect it.
This should be the paper for it if anyone is interested in more details - https://www.academia.edu/49293107/Reference_counting_can_manage_the_circular_invironments_of_mutual_recursion
Bobrow's technique
Here you could have more than one reference to the cyclic structure. We just distinguish between internal and external references for it.
Overall, all allocated objects are assigned by the programmer to a group. And all groups are reference counted. We distinguish between internal and external references for the group and of course - objects can be moved between groups. In this case, intra-group cycles are easily reclaimed whenever there are no more external references to the group. However, inter-group cyclic structures should still be an issue.
If I'm not mistaken, this should be the original paper - https://dl.acm.org/doi/pdf/10.1145/357103.357104
I'm in the search for other general purpose alternatives besides those algorithms and using weak pointers.
Y Combinator
I wrote a programming language once in which every object was immutable. As such, an object could only contain pointers to objects that were created earlier. Since all reference pointed backwards in time, there could not possibly be any cyclic references, and reference counting was a perfectly viable way of managing memory.
The question then is, "How do you create self-referencing data structures without circular references?" Well, functional programming has been doing this trick forever using the fixed-point/Y combinator to write recursive functions when you can only refer to previously-written functions. See: https://en.wikipedia.org/wiki/Fixed-point_combinator
That Wikipedia page is complicated, but the concept is really not that complicated. How it works in terms of data structures is like this:
Let's say you want to make a Map<String,Object>, where the values in the map can refer to other values in the map. Well, instead of actually storing those objects, you store functions that generate those objects on-demand given a pointer to the map.
So you might say object = map.get(key), and what this will do is call a private method like _getDefinition(key) that returns this function and calls it:
Object get(String key) {
var definition = _getDefinition(key);
return definition == null ? null : definition(this);
}
So the map has a reference to the function that defines the value. This function doesn't need to have a reference to the map, because it will be passed the map as an argument.
When the definition called, it returns a new object that does have a pointer to the map in it somewhere, but the map it self does not have a pointer to this object, so there are no circular references.
There are couple of ways I know of for walking around this:
The first (and preferred one) is simply extracting the common code into third assembly, and make both references use that one
The second one is adding the reference as "File reference" (dll) instead of "Project reference"
Hope this helps

Resources