Is there (or has there been considered) anything like 'merge' or 'batch' setting in Firebase? - firebase

In doing a bit more programming with Firebase today, I found myself wishing for a couple of features:
1) Merge set:
Say I have a firebase ref that has the value {a:1,b:2,c:3}.
If I do something like ref.set({a:-1,b:-2}) the new value will (unsurprisingly) be {a:-1,b:-2}.
Instead, imagine ref.mergeSet({a:-1,b:-2}) which would have a result in the value of the ref being {a:-1,b:-2,c:3}.
Now, I realize that I could do something like ref.child("a").set(-1) and ref.child("b").set(-2) to achieve this result, but in at least some cases, I'd prefer to get only a single call to my .on() handler.
This segues into my second idea.
2) Batch set:
In my application I'd like a way to force an arbitrary number of calls to .set to only result in one call to .on in other clients. Something like:
ref.startBatch()
ref.child("a").set(1)
ref.child("b").set(2)
....
ref.endBatch()
In batch mode, .set wouldn't result in a call to .on, instead, the minimal number of calls to .on would all result from calling .endBatch.
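To make the proposed semantics concrete, here is a toy in-memory model. It is not Firebase code: the ToyRef class and all of its methods are hypothetical, and startBatch/endBatch only mirror the names suggested above. It just shows buffered child writes producing a single listener call.

```javascript
// Toy in-memory model of the proposed batch semantics (hypothetical, not the Firebase API).
class ToyRef {
  constructor(initialValue) {
    this.value = { ...initialValue };
    this.listeners = [];
    this.pending = null; // non-null while a batch is open
  }

  // Register a listener, loosely analogous to an .on() handler.
  on(listener) {
    this.listeners.push(listener);
  }

  // Write one child; inside a batch the write is buffered instead of applied.
  setChild(key, childValue) {
    if (this.pending !== null) {
      this.pending[key] = childValue;
      return;
    }
    this.value[key] = childValue;
    this.notify();
  }

  startBatch() {
    this.pending = {};
  }

  // Apply all buffered writes at once and notify listeners a single time.
  endBatch() {
    Object.assign(this.value, this.pending);
    this.pending = null;
    this.notify();
  }

  notify() {
    for (const listener of this.listeners) {
      listener({ ...this.value });
    }
  }
}

const ref = new ToyRef({ a: 0, b: 0, c: 3 });
const seen = [];
ref.on((value) => seen.push(value));

ref.startBatch();
ref.setChild("a", 1);
ref.setChild("b", 2);
ref.endBatch();

console.log(seen.length); // 1
console.log(seen[0]);     // { a: 1, b: 2, c: 3 }
```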
I readily admit that these ideas are pretty nascent, and I wouldn't be surprised if they conflict with existing architectural features of Firebase, but I thought I'd share them anyway. I find that I'm having to spend more time ensuring consistency across clients when using Firebase than I expected to.
Thanks again, and keep up the great work.

UPDATE: We've added a new update() method to the Firebase web client and PATCH support to the REST API, which allow you to atomically modify multiple siblings at a particular location, while leaving the other siblings unmodified. This is what you described as "mergeSet" and can be used as follows:
ref.update({a: -1, b: -2});
which will update 'a' and 'b', but leave 'c' unmodified.
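The difference between set() and update() can be modelled on plain objects. This is only a sketch of the semantics described above, not the Firebase client; the helper names applySet and applyUpdate are made up for illustration.

```javascript
// Plain-object model of replace vs. merge semantics (not the Firebase client itself).

// set(): the stored value is replaced wholesale, so the old value is ignored.
function applySet(current, next) {
  return { ...next };
}

// update(): only the listed children change; siblings are left as they were.
function applyUpdate(current, changes) {
  return { ...current, ...changes };
}

const stored = { a: 1, b: 2, c: 3 };

console.log(applySet(stored, { a: -1, b: -2 }));    // { a: -1, b: -2 }
console.log(applyUpdate(stored, { a: -1, b: -2 })); // { a: -1, b: -2, c: 3 }
```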
OLD ANSWER
Thanks for the detailed feature request! We'd love to hear more about your use case and how these primitives would help you. If you're willing to share more details, email support@firebase.com and we can dig into your scenario.
To answer your question though, the primary reason we don't have these features is related to our architecture and the performance / consistency guarantees that we're trying to maintain. Not to go too deep, but if you imagine that your Firebase data is spread across many servers, it's easier for us to have stronger guarantees (atomicity, ordering, etc.) when modifying data that's close in the tree than when modifying data that's far away. So by limiting these guarantees to data that you can replace with a single set() call, we push you in a direction that will perform well with the Firebase architecture.
In some cases, you may be able to get roughly what you want by just reorganizing your tree. For instance, if you know you always want to set 'a' and 'b' together, you could put them under a common 'ab' parent and do ref.child('ab').set({a:-1, b:-2});, which won't affect the 'c' child.
Like I said, we'd love to hear more about your scenario. We're in beta so that we can learn from developers about how they're using the API and where it's falling short! support@firebase.com :-)

Related

How to find which kinds are not being used in Google Datastore

Is there any way to list the kinds that are not being used in Google's Datastore by our App Engine app without having to look into our code and/or logic? : )
I'm not talking about indexes, which I can list by issuing
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried to check datastore kinds statistics and other metadata but I could not find anything useful to help me on this matter.
Should I give up on getting useful stats from Datastore itself and instead code something that keeps collecting Datastore statistics (like data size) over a long period, so that I at least have a clue which kinds are not being used, and only after that research look into our app code to see whether the kind's Model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating it for a period. If the Kind's byte size does not change, then the kind is probably not being used anymore.
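The comparison step of that idea could look roughly like the sketch below. It assumes the per-kind byte counts have already been collected into plain objects at two points in time; the snapshot shape, the function name, and the numbers are all made up for illustration. An unchanged size is only a hint, since a kind that is still read but never written would look the same.

```javascript
// Compare two snapshots of per-kind byte counts and list kinds whose size did not change.
// The snapshot shape ({ kindName: bytes }) is an assumption for illustration.
function unchangedKinds(earlier, later) {
  return Object.keys(earlier)
    .filter((kind) => kind in later && later[kind] === earlier[kind])
    .sort();
}

// Made-up example data, not real statistics.
const january = { Invoice: 52000, LegacyImport: 1800, Session: 940000 };
const june = { Invoice: 61000, LegacyImport: 1800, Session: 1250000 };

console.log(unchangedKinds(january, june)); // [ 'LegacyImport' ]
```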
The idea is to do some cleaning in datastore.
I would like to find which kinds are not being used anymore, maybe for a long time, or were created manually to be used once... You know, like a table in Oracle that no one knows what it is used for, and then if we look into the statistics of that table we see that it was only used once 5 years ago. I'm trying to achieve the same in Datastore: I want to know which kinds are not being used anymore or were last used a while ago, then ask around and back up/delete them if no owner is found.
It's an interesting question.
I think you would be best-placed to audit your code and instill organizational practice that requires this documentation to be performed in future as a business|technical pre-prod requirement.
IIRC, Datastore doesn't automatically timestamp Entities and keys (rightly) aren't incremental. So there appears no intrinsic mechanism to track changes short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive and inconclusive).
One challenge with identifying a Kind that appears to be non-changing is that it could be referenced (rarely) by another Kind and so, while it does not change, it is required.
Auditing your code and documenting it for posterity should not only provide you with a definitive answer (and identify owners); it also pays off a significant technical debt that has been incurred and helps avoid this and probably future problems (e.g. GDPR-like requirements) that will arise.
Assuming you are referring to records being created/updated, then I can think of the following options
Via the Cloud Console (Datastore > Dashboard) - This lists all your 'Kinds' and the number of records in each Kind. Theoretically, you can take a screen shot and compare the counts so that you know which one has experienced an increase or not.
Use of Created/LastModified Date columns - I usually add these 2 columns to most of my datastore tables. If you have them, then you can have a stored function that queries them. For example, you run a query to sort all of your Kinds in descending order of creation (or last modified date) and you only pull the first record from each one. This tells you the last time a record was created or modified.
I would write a function as part of my App, put it behind a page which requires admin privilege (only app creator can run it) and then just clicking a link on my App would give me the information.

3 column query in DynamoDB using DynamooseJs

My table is (device, type, value, timestamp), where (device,type,timestamp) makes a unique combination ( a candidate for composite key in non-DynamoDB DBMS).
My queries can range between any of these three attributes, such as
GET (value)s from (device) with (type) having (timestamp) greater than <some-timestamp>
I'm using dynamoosejs/dynamoose. And from most of the searches, I believe I'm supposed to use a combination of the three fields (as a single field ; device-type-timestamp) as id. However the set: function of Schema doesn't let me use the object properties (such as this.device) and due to some reasons, I cannot do it externally.
The closest I got (id:uuidv4:hashKey, device:string:GlobalSecIndex, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and
(id:uuidv4:rangeKey, device:string:hashKey, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and so on..
However, while using a Query, it becomes difficult to fetch results of particular device,type as the id, (hashKey or rangeKey) keeps missing from the scene.
So the question. How would you do it for such kind of table?
And point to be noted, this table is meant to gather content from IoT devices, which is generated every 5 mins by each device on an average.
I'm curious why you are choosing DynamoDB for this task. Advanced queries like this seem to be much better suited for a SQL based database as opposed to a NoSQL database. Due to the advanced nature of SQL queries, this task in my experience is a lot easier in SQL databases. So I would encourage you to think about if DynamoDB is truly the right system for what you are trying to do here.
If you determine it is, you might have to restructure your data a little bit. You could do something like having a property that is device-type and that will be the device and type values combined. Then set that as an index, and query based on that and sort by the timestamp, and filter out the results that are not greater than the value you want.
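The restructuring can be sketched in plain JavaScript on an in-memory array. No DynamoDB or Dynamoose calls are made here; the function names, the "#" separator, and the sample readings are assumptions for illustration. The separator is assumed not to occur in device or type names, and it keeps pairs like ("ab", "c") and ("a", "bc") from producing the same key.

```javascript
// In-memory sketch of a combined "device-type" attribute plus a timestamp range.
// Plain JavaScript only; this does not talk to DynamoDB.

// Build the combined key; the separator avoids ambiguous concatenations.
function deviceTypeKey(device, type) {
  return `${device}#${type}`;
}

// Made-up readings for illustration.
const readings = [
  { device: "d1", type: "temp", value: 21.5, timestamp: 1000 },
  { device: "d1", type: "temp", value: 22.0, timestamp: 2000 },
  { device: "d1", type: "humidity", value: 40, timestamp: 2000 },
  { device: "d2", type: "temp", value: 19.0, timestamp: 3000 },
].map((r) => ({ ...r, deviceType: deviceTypeKey(r.device, r.type) }));

// Equality on the combined key, then a range condition on the timestamp.
function valuesSince(items, device, type, afterTimestamp) {
  const key = deviceTypeKey(device, type);
  return items
    .filter((item) => item.deviceType === key && item.timestamp > afterTimestamp)
    .sort((x, y) => x.timestamp - y.timestamp)
    .map((item) => item.value);
}

console.log(valuesSince(readings, "d1", "temp", 1500)); // [ 22 ]
```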
You are correct that currently, Dynamoose does not pass in the entire object into the set function. This is something that personally I'm open to exploring. I'm a member on the GitHub project, and if you would like to submit a PR adding that feature I would be more than happy to help explore that option with you and get that into the codebase.
The other thing you might want to explore is having a DynamoDB stream, that will set that device-type property whenever it gets added to your DynamoDB table. That would abstract that logic out of DynamoDB and your application. I'm not sure if it's necessary for what you are doing to decouple it to that level, but it might be something you want to explore.
Finally, depending on your setup, you could figure out which item will be more unique, device or type, and set up an index on that property. Then just query based on that, and filter out the results of the other property that you don't want. I'm not sure if that is what you are looking for. It will of course work, but I'm not sure how many items you will have in your table, and there get to be questions about scalability at a certain level. One way to solve some of those scalability questions might be to set the TTL of your items if you know that the timestamp you are querying for is constant, or predictable ahead of time.
Overall there are a lot of ways to achieve what you are looking to do. Without more detail about how many items, what exactly those properties will be doing, the amount of scalability you require, which of those properties will be most unique, etc. it's hard to give a good solution. I would highly encourage you to think about if NoSQL is truly the best way to go. That query you are looking to do seems a LOT more like a SQL query. Not saying it's impossible in DynamoDB, but it will require some thought about how you want to structure your data model, and such.
Considering the opinion of @charlie-fish, I decided to jump into Dynamoose and change the code to pass the model to the set function of the attribute. However, I discovered that the model is already being passed to the default parameter of the attribute. So I changed my Schema to the following:
id:hashKey;default: function(model){ return model.device + "" + model.type; }
timestamp:rangeKey
For anyone landing here on this answer, please note that the default & set functions can access the attribute options & schema instance using this. However, both of those functions should be regular functions rather than arrow functions.
Keeping this here as an answer, but I won't accept it as the answer to my question for some time, as I want to wait for someone else to come up with a better approach.
I also want to make sure that if a value is passed for id field, it shouldn't be set. For this I can use set to ignore the actual incoming value, which I don't know how, as of yet.

Firebase data structure and url to use

I'm really new to firebase, want to try out a simple mix-client app on it - android, js. I have a users table and a tasks table. The very first question that comes to my mind is, how to store them (and thus how the url to be)? For example, based on the tasks table, should I use:
/tasks/{userid}/task1, /tasks/{userid}/task2, ...
Or
/{userid}/tasks/task1, /{userid}/tasks/task2, ...
The next question, based on the answer to the first one: why use one version over the other?
In my opinion, the first version is good because domains are separated.
The second approach is good because data is stored per-user which may make some of the operations easier.
Any ideas/suggestions?
Update: For the current case, let's say there are following features:
show list of tasks for each user
add new task to the list
edit/delete a task by user.
Simple operations.
This answer might come in late, but here's how I feel about the question after a year's experience with Firebase.
For your very first question, it totally depends on which data your application will mostly read, and how and in which order (kind of like sorting) you expect to read it.
1. Your first proposed data structure, that is "/tasks/{userid}/task1", "/tasks/{userid}/task2"..., is good if the application will often read the tasks per user, with the added advantage of being able to sort the data by any task "attribute", if I may call it that.
Say each task has a priority attribute; then:
// get all of a user's tasks with a priority of 25.
var userTasksRef = firebase.database().ref(`tasks/${auth.uid}`); // backticks are required for ${...} interpolation
userTasksRef.orderByChild("priority").equalTo(25).on(
"desired_event",
(snapshot) => {
//do something important here.
});
2. I'd strongly advise against the second approach. Generally most, if not all, of the data associated with a user will end up under the "/{userid}/" node, and because of how Firebase reads work, whenever you need data at that path level you get it together with all the other data stored under that user's node (tasks and any other data included). I wouldn't want that behavior in my database. This approach does still let you store the tasks per user, or make multiple RESTful requests and collect the required data datum by datum, but if you run into this situation I'd suggest fanning out the data structure instead. It is a perfectly valid structure only if the application never needs a single first-level datum on its own and always wants the whole block of data at that path, including everything at the deeper (2nd, 3rd, ...) levels.
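The trade-off can be seen on a toy JSON tree, where reading a path returns the whole subtree beneath it. This is a plain-JavaScript model and not the Firebase client; the readPath helper, the user id u1, and the sample data are made up for illustration.

```javascript
// Toy model of reading a path from a JSON tree: a read returns the whole subtree.
function readPath(tree, path) {
  return path
    .split("/")
    .filter((segment) => segment.length > 0)
    .reduce((node, segment) => (node == null ? undefined : node[segment]), tree);
}

// Layout 1: /tasks/{userid}/...
const byDomain = {
  tasks: { u1: { task1: { title: "Buy milk", priority: 25 } } },
  users: { u1: { name: "Ann" } },
};

// Layout 2: /{userid}/tasks/... with everything else about the user alongside.
const byUser = {
  u1: {
    name: "Ann",
    tasks: { task1: { title: "Buy milk", priority: 25 } },
  },
};

// Reading the user's node in layout 1 yields only tasks.
console.log(Object.keys(readPath(byDomain, "tasks/u1"))); // [ 'task1' ]
// Reading the user's node in layout 2 pulls in every child stored under the user.
console.log(Object.keys(readPath(byUser, "u1")));         // [ 'name', 'tasks' ]
```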
As per the use cases you've described, and if the database structure you've given is exhaustive of your database structure, I'll say it isn't enough to cover your use cases.
I suggest reading the docs here. Their documentation is great and exhaustive.
Overall, the first approach is the better way to model this use case in NoSQL, and more specifically in Firebase's NoSQL database.

Symfony framework best practice

I'm currently developing in Symfony 3 and I'm wondering what's (if there is) the best practice in the following case :
Suppose I have client and order entities, each order being linked to one client.
If I want to calculate the sum of the orders by client, what's the best way?
a function in the client class that parse the client's orders to sum them and return the result
a function in the order's repository taking a client as parameter and returning a scalar result (... SUM(order.value) WHERE order.client = :client) ...)
a function in the repository that returns all the orders of a client and then summing the values in the controller
Thanks for the help and have a nice day
In spite of the popularity of the phrase, there is no such thing as an objective "best practice". It's always subjective and always depends on your specific problem. Which is why these sorts of questions tend to be frowned upon, down voted and closed.
a function in the client class that parse the client's orders to sum them and return the result
From a domain driven design point of view, this would be ideal. Business logic in your business entity. The problem is how many orders do you think a client will accumulate because this approach requires loading in all the orders for a given client. A few dozen orders? Probably fine. A few thousand orders? Might start to slow things up. Maybe you could start with this approach and then refactor as the system grows?
a function in the order's repository taking a client as parameter and returning a scalar result (... SUM(order.value) WHERE order.client = :client) ...)
This seems like it would be very efficient. On the other hand, it starts to leak some of the client functionality into the order domain. It might become hard to maintain if you end up with many of these special functions scattered about. But if this is the only one then it's probably fine.
a function in the repository that returns all the orders of a client and then summing the values in the controller
Grabbing all the orders has the same scaling problems as your first solution. Do you really need the complete order just to sum it? What the heck does summing mean anyways? Putting business functionality inside a controller is generally not a good thing. What if you need to calculate the sum in some other place as well? On the other hand, it is Symfony, and quite a few Symfony apps do have rather fat controllers with plenty of business logic and work just fine. So this might be the best approach if it fits in with the rest of your application.
And while you did not mention it, creating a ClientOrder service is also a possibility.
But at the end of the day it is really your decision. No magical best practices are going to help. I'd suggest writing a few tests just to make possible refactoring easier and then pick an approach that meets your current needs and move on.
Please note that the following is just my opinion, because this is not mentioned in the official doctrine best practices.
a function in the client class that parse the client's orders to sum them and return the result
This would result in loading all the related orders from the database unless you mark the collection as "extra lazy" (Extra Lazy Associations). But in my opinion you should not do that, as long as you're not working with the whole collection anyways.
a function in the order's repository taking a client as parameter and returning a scalar result (... SUM(order.value) WHERE order.client = :client) ...)
This should be the way to go. You put all the "heavy lifting" into the Repository class and your application does not need to know how the repository gets the order total. Additionally you don't clutter your entity class with additional functionality (imo entities should only contain the data, and more specific requests should be handled by the repositories).
a function in the repository that returns all the orders of a client and then summing the values in the controller
This would load all the entities similar to variant 1, so don't do that unless you want to work with all the entities anyway.

Firebase security for things like indexes

I'm looking at using firebase for a small project, but one stumbling block I can't find an answer to is that of security as it relates to things like indexes for a purely client side application.
For example, if I need an index for articles -- that is, not using priority -- for alternate sorting, how would I secure this?
The client would need access to the list that contains the article ids sorted appropriately, which as far as I can tell also means the client can then be malicious and completely reorder or delete that index, not just the article it posted.
For that matter, the same goes for setting priority, or really any kind of auxiliary data that is automatic and not user entered - a change date for example.
Am I missing something? Or are you forced to have a server component to accomplish that level of data security/integrity?
Edit: The simplest case of this I can think of, is something like a date created field on an article - What prevents the client from just setting that maliciously?
