Is it possible to subscribe to changes in Azure DocumentDb? - azure-cosmosdb

Is there some way to subscribe to changes in an Azure DocumentDb? For example, something similar to SQL Server SQL Dependency. If there is nothing "built-in", is there a recommended approach to solving this problem?
Update: As of May 2017 Change Feed is available. See more here:
https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed

There is no way to subscribe to changes but it's a frequently requested feature (see votes for it here which also shows it as "under review"). I heard a while back that Azure Functions also wants this for their DocumentDB connection so maybe that will help get it from "under review" to "in progress". Go vote it up to help.
Until then, most people poll the collection using either the _ts field or with their own time series sequential field. However, there is no guarantee that a document with an earlier _ts won't show up later with default eventual or even session consistency so you have to work around that usually by going back in time and then checking for duplicates (idempotency).

Related

How to find which kinds are not being used in Google Datastore

There's any way to list the kinds that are not being used in google's datastore by our app engine app without having to look into our code and/or logic? : )
I'm not talking about indexes, which I can list by issuing an
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried to check datastore kinds statistics and other metadata but I could not find anything useful to help me on this matter.
Should I give up to find ways of datastore providing me useful stats and code something to keep collecting datastore statistics(like data size), during a huge period to have at least a clue of which kinds are not being used and then, only after this research, take a look into our app code to see if the kind Model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating for a period. If the Kind bytes size does not change than probably the kind is not being used anymore.
The idea is to do some cleaning in datastore.
I would like to find which kinds are not being used anymore, maybe for a long time or were created manually to be used once... You know, like a table in oracle that no one knows what is used for and then if we look into the statistics of that table we would see that this table was only used once 5 years ago. I'm trying to achieve the same in datastore, I want to know which kinds are not being used anymore or were used a while ago, then ask around and backup/delete it if no owner was found.
It's an interesting question.
I think you would be best-placed to audit your code and instill organizational practice that requires this documentation to be performed in future as a business|technical pre-prod requirement.
IIRC, Datastore doesn't automatically timestamp Entities and keys (rightly) aren't incremental. So there appears no intrinsic mechanism to track changes short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive and inconclusive).
One challenge with identifying a Kind that appears to be non-changing is that it could be referenced (rarely) by another Kind and so, while it does not change, it is required.
Auditing your code and documenting it for posterity should not only provide you with a definitive answer (and identify owners) but it pays off a significant technical debt that has been incurred and avoids this and probably future problems (e.g. GDPR-like) requirements that will arise in the future.
Assuming you are referring to records being created/updated, then I can think of the following options
Via the Cloud Console (Datastore > Dashboard) - This lists all your 'Kinds' and the number of records in each Kind. Theoretically, you can take a screen shot and compare the counts so that you know which one has experienced an increase or not.
Use of Created/LastModified Date columns - I usually add these 2 columns to most of my datastore tables. If you have them, then you can have a stored function that queries them. For example, you run a query to sort all of your Kinds in descending order of creation (or last modified date) and you only pull the first record from each one. This tells you the last time a record was created or modified.
I would write a function as part of my App, put it behind a page which requires admin privilege (only app creator can run it) and then just clicking a link on my App would give me the information.

Updating database at certain time

I'm looking to make my Firebase Database update at a particular time.
The way it should work is that, for a group, the leader sets a deadline time. The group votes on some stuff. At the deadline time, I would like the database to automatically tabulate the votes and store the response within.
I'm not sure how to set these types of rules for the database without doing a check whenever a member of the group is online and refreshes their feed. Also, this would allow any member to write to the vote-result field, which seems bad when I want it to just be automatic. It seems like there should be an easier way than this, but I just can't find anything.
It seems like the other option would be to set up a separate server that counts through all the time-frames and sends an update request when the time has allotted. But it seems like Firebase should have this built in. I'm sure I'm missing something. Thank you in advance.
EDIT: Here is a more comprehensive look at my usecase. I am looking into cron stuff now, as I think it will solve my problem, but I don't know.
1) Leader creates a group and invites friends to it. Event is created is firebase database. Group is created with a specific deadline.
2) Before deadline, leader and friends can vote on certain options. Basically they submit a dictionary to database with their votes.
3) On deadline, either just need to change the state of the group (from voting to closed) or calculate the vote response. Same problem, which is that I don't know to do do it at a certain time w/o using user clients.

Keep track of server synchronization state

I'm creating an indicator which notifies the user about whether or not the local state is in sync with the latest fetched server state. I can think of several ways to keep track of this:
storing an additional 'pristine' state next to the normal 'soiled' state upon fetching data and diffing those states in my indicator component
toggling a up-to-date-flag upon state changes and fetching data back and forth
But those solutions seem over-engineered and error-prone to me. I think middleware is probably the cleanest solution here, but thusfar I haven't came across a viable out-of-the-box solution. If anyone could hook me up, because I probably lack the right idiom to use in my search terms, that would be awesome. On a sidenote: I'm not allowed to store my data in the localstorage.
I would recommend calculating a hash (this works as the signature of your state) of your data on the server, and sending it along with your data to local clients. Your clients can then use this hash to know if it matches with the latest data.
You might also want to take a look at Etag for a more standardized approach.
https://en.wikipedia.org/wiki/HTTP_ETag

Solution for previewing user changes and allowing rollback/commit over a period of time

I have asked a few questions today as I try to think through to the solution of a problem.
We have a complex data structure where all of the various entities are tightly interconnected, with almost all entities heavily reliant/dependant upon entities of other types.
The project is a website (MVC3, .NET 4), and all of the logic is implemented using LINQ-to-SQL (2008) in the business layer.
What we need to do is have a user "lock" the system while they make their changes (there are other reasons for this which I won't go into here that are not database related). While this user is making their changes we want to be able to show them the original state of entities which they are updating, as well as a "preview" of the changes they have made. When finished, they need to be able to rollback/commit.
We have considered these options:
Holding open a transaction for the length of time a user takes to make multiple changes stinks, so that's out.
Holding a copy of all the data in memory (or cached to disk) is an option but there is heck of a lot of it, so seems unreasonable.
Maintaining a set of secondary tables, or attempting to use session state to store changes, but this is complex and difficult to maintain.
Using two databases, flipping between them by connection string, and using T-SQL to manage replication, putting them back in sync after commit/rollback. I.e. switching on/off, forcing snapshot, reversing direction etc.
We're a bit stumped for a solution that is relatively easy to maintain. Any suggestions?
Our solution to a similar problem is to use a locking table that holds locks per entity type in our system. When the client application wants to edit an entity, we do a "GetWithLock" which gets the client the most up-to-date version of the entity's data as well as obtaining a lock (a GUID that is stored in the lock table along with the entity type and the entity ID). This prevents other users from editing the same entity. When you commit your changes with an update, you release the lock by deleting the lock record from the lock table. Since stored procedures are the api we use for interacting with the database, this allows a very straight forward way to lock/unlock access to specific entities.
On the client side, we implement IEditableObject on the UI model classes. Our model classes hold a reference to the instance of the service entity that was retrieved on the service call. This allows the UI to do a Begin/End/Cancel Edit and do the commit or rollback as necessary. By holding the instance of the original service entity, we are able to see the original and current data, which would allow the user to get that "preview" you're looking for.
While our solution does not implement LINQ, I don't believe there's anything unique in our approach that would prevent you from using LINQ as well.
HTH
Consider this:
Long transactions makes system less scalable. If you do UPDATE command, update locks last until commit/rollback, preventing other transaction to proceed.
Second tables/database can be modified by concurent transactions, so you cannot rely on data in tables. Only way is to lock it => see no1.
Serializable transaction in some data engines uses versions of data in your tables. So after first cmd is executed, transaction can see exact data available in cmd execution time. This might help you to show changes made by user, but you have no guarantee to save them back into storage.
DataSets contains old/new version of data. But that is unfortunatelly out of your technology aim.
Use a set of secondary tables.
The problem is that your connection should see two versions of data while the other connections should see only one (or two, one of them being their own).
While it is possible theoretically and is implemented in Oracle using flashbacks, SQL Server does not support it natively, since it has no means to query previous versions of the records.
You can issue a query like this:
SELECT *
FROM mytable
AS OF TIMESTAMP
TO_TIMESTAMP('2010-01-17')
in Oracle but not in SQL Server.
This means that you need to implement this functionality yourself (placing the new versions of rows into your own tables).
Sounds like an ugly problem, and raises a whole lot of questions you won't be able to go into on SO. I got the following idea while reading your problem, and while it "smells" as bad as the others you list, it may help you work up an eventual solution.
First, have some kind of locking system, as described by #user580122, to flag/record the fact that one of these transactions is going on. (Be sure to include some kind of periodic automated check, to test for lost or abandoned transactions!)
Next, for every change you make to the database, log it somehow, either in the application or in a dedicated table somewhere. The idea is, given a copy of the database at state X, you could re-run the steps submitted by the user at any time.
Next up is figuring out how to use database snapshots. Read up on these in BOL; the general idea is you create a point-in-time snapshot of the database, do whatever you want with it, and eventually throw it away. (Only available in SQL 2005 and up, Enterprise edition only.)
So:
A user comes along and initiates one of these meta-transactions.
A flag is marked in the database showing what is going on. A new transaction cannot be started if one is already in process. (Again, check for lost transactions now and then!)
Every change made to the database is tracked and recorded in such a fashion that it could be repeated.
If the user decides to cancel the transaction, you just drop the snapshot, and nothing is changed.
If the user decides to keep the transaction, you drop the snapshot, and then immediately re-apply the logged changes to the "real" database. This should work, since your requirements imply that, while someone is working on one of these, no one else can touch the related parts of the database.
Yep, this sure smells, and it may not apply to well to your problem. Hopefully the ideas here help you work something out.

How to handle concurrency control in ASP.NET Dynamic Data?

I've been quite impressed with dynamic data and how easy and quick it is to get a simple site up and running. I'm planning on using it for a simple internal HR admin site for registering people's skills/degrees/etc.
I've been watching the intro videos at www.asp.net/dynamicdata and one thing they never mention is how to handle concurrency control.
It seems that DD does not handle it right out of the box (unless there is some setting I haven't seen) as I manually generated a change conflict exception and the app failed without any user friendly message.
Anybody know if DD handles it out of the box? Or do you have to somehow build it into the site?
Concurrency is not handled out the of the box by DD.
One approach would be to implement this on the database side, by adding a "last updated" timestamp column (or other unique stamp, such as a GUID) to each table.
You then create an update trigger for each table. For each row being updated, is the "last updated" stamp passed in the same as the one on the row in the database?
If so, update the row, but give it a new "last updated" stamp.
If not, raise a specific "Data is out of date" exception.
On the client side, for each row you update, you'd need to refresh the "last updated" stamp.
In the client code you watch for the "Data is out of date" exception and display a helpful message to the user, asking them to refresh the data and re-submit their change.
Hope this helps.
All depends on the definition, what do you mean under "out of the box". Of cause you have to create a lot of code to handle concurrency, but some features help us to implement it.
My favorite model is "optimistic concurrency" based on rowversion datatype of SQL Server. It is like "last updated" timestamp, but you need not use any update trigger for each table. All updates of the corresponding "timestamp" column in your tables will be made automatically by SQL server at every update of data in the table row. I describes it in my old answer Concurrency handling of Sql transactrion. I hope it will be helpful for you.
I was of the impression the Dynamic data does the update on the underlying data source. Maybe you can specify the concurrency model (pessimistic/optimistic) on the data meta model that gets registered on the App_Init section. But you would probably get unable to save changes error, so by default would be pessimistic, last in loses....
Sorry to replay late. Yes DD is too strong when it come to fast development of project. Not only that it is base for .Net 4.0. DD is more enhance and have been included in .Net 4.0.
DD mostly work on Linq to sql. I will suggest you to have a look on that part.
In linq to SQl when you go to property of table you will find a property there which specify wheater to check the old value before updating new value. If you set that true I think your proble will get handle.
wish you best luck.
Let's learn from each other.
The solution given by Binary Worrier works and it's widely used on platforms providing a GUI to merge the changes (e.g. source control programs, wiki engines, etc). That way none of the users lose their changes. In the other hand, it requires much code or using external components or DLLs.
If you are not happy with that, another approach is just to lock the record that is being edited. Nobody else will be able to edit that record until the user commit the changes or his session expires. It has pros and cons but requires little code compared with the first option.

Resources