Issue with objectify save - objectify

I have a usecase wherein I save an entity for the first time and a second after saving it I fetch it, update it a bit and save it in batch along with two other entites (different 'kinds'). In a few cases (10 out of 50K), the update to datastore is ignored.
I mean, it's there in the objectify cache but the change didn't happen in datastore.
How I am able to justify the above, is because after the save, I fetch it again after a second and I'm able to see it.
PS: I also use .now() while saving. This shouldn't happen when now() is used right?

Sounds like you are seeing eventual consistency in the datastore. There is quite a bit of Google documentation available, but this looks to be the most comprehensive:
https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/
There are a lot of ways to deal with eventual consistency, either by avoiding it (using get-by-key operations), changing the structure of your data (using #Parent relationships), or masking it with UI behavior (say, add the new thing to the list in UI code instead of just refreshing the whole list).

Related

To speed up data refresh should I create a view or use query folding?

I have an SQLite database of 5GB which gets updated few times a day and is used to refresh a PowerBI dashboard. While below 1GB I could refresh the dashboard in under a minute, but now takes around 20 minutes.
Should I create views so merges, joins, etc. are made in the view instead of loading the table itself and using Power Query to perform data manipulation?
Should I use incremental refresh? Is it possible in SQLite?
I would implement views on the database side in this case. This is in my eyes the benefit of having control of the DB yourself - pushing these data transforms to the DB instead of tinkering in Power Query is a major benefit.
When the views are set up, work on incremental refresh if the SQLITE3 connector supports it, but if it doesn't use the "regular" SQL Database connector it may not do so, according to the official documentation.
5GB is tiny in size and your refreshes should not be overly long. How complicated are your queries and what kind of transformations are you doing?
I would first look at making sure you're not breaking query folding. Seemingly simple steps do not always support folding but by rearranging them you can ensure folding happens and have a drastic effect on refresh times. Check at what stage folding is breaking and see if you can move things around.
Next, I'd look at moving your transformations upstream to a view (Roche's maxim).
Incremental refresh will also work so it is up to you really to pick from the available options.

How to find which kinds are not being used in Google Datastore

There's any way to list the kinds that are not being used in google's datastore by our app engine app without having to look into our code and/or logic? : )
I'm not talking about indexes, which I can list by issuing an
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried to check datastore kinds statistics and other metadata but I could not find anything useful to help me on this matter.
Should I give up to find ways of datastore providing me useful stats and code something to keep collecting datastore statistics(like data size), during a huge period to have at least a clue of which kinds are not being used and then, only after this research, take a look into our app code to see if the kind Model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating for a period. If the Kind bytes size does not change than probably the kind is not being used anymore.
The idea is to do some cleaning in datastore.
I would like to find which kinds are not being used anymore, maybe for a long time or were created manually to be used once... You know, like a table in oracle that no one knows what is used for and then if we look into the statistics of that table we would see that this table was only used once 5 years ago. I'm trying to achieve the same in datastore, I want to know which kinds are not being used anymore or were used a while ago, then ask around and backup/delete it if no owner was found.
It's an interesting question.
I think you would be best-placed to audit your code and instill organizational practice that requires this documentation to be performed in future as a business|technical pre-prod requirement.
IIRC, Datastore doesn't automatically timestamp Entities and keys (rightly) aren't incremental. So there appears no intrinsic mechanism to track changes short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive and inconclusive).
One challenge with identifying a Kind that appears to be non-changing is that it could be referenced (rarely) by another Kind and so, while it does not change, it is required.
Auditing your code and documenting it for posterity should not only provide you with a definitive answer (and identify owners) but it pays off a significant technical debt that has been incurred and avoids this and probably future problems (e.g. GDPR-like) requirements that will arise in the future.
Assuming you are referring to records being created/updated, then I can think of the following options
Via the Cloud Console (Datastore > Dashboard) - This lists all your 'Kinds' and the number of records in each Kind. Theoretically, you can take a screen shot and compare the counts so that you know which one has experienced an increase or not.
Use of Created/LastModified Date columns - I usually add these 2 columns to most of my datastore tables. If you have them, then you can have a stored function that queries them. For example, you run a query to sort all of your Kinds in descending order of creation (or last modified date) and you only pull the first record from each one. This tells you the last time a record was created or modified.
I would write a function as part of my App, put it behind a page which requires admin privilege (only app creator can run it) and then just clicking a link on my App would give me the information.

How to define multiple flow executors for differents flow and disable continuation snapshot for some flows?

I'm working on a huge project and we would like to have a different management of continuation for some flows.
We want to be able to use the continuation snapshots (those that permit the use of the back button) for most of our flows but we also want to be able to totally disable continuation snapshots for some of our flows that use huge quantity of memory and that we don't want to serialize.
Is it possible ? And how ?
Thank you very much.
Big caveat that I haven't tried to do any of this. But, here's a potential approach.
First of all, you need your own implementation of FlowExecutionSnapshotFactory. This will allow you to manage the creation and restoration of snapshots. You'll probably want to wrap SerializedFlowExecutionSnapshotFactory, but only allow the snapshot to be created in certain circumstances. Even better, you might want to allow the snapshot to be created, but to omit some of the data from it.
Now the problem is getting Webflow to use your new SnapshotFactory. The factory is created in FlowExecutorFactoryBean.createFlowExecutionSnapshotFactory(). So you need to get this created. You can specify your own FlowExecutorFactoryBean in your application-context.xml file. There's instructions on how to do that at http://forum.springsource.org/showthread.php?54714-SWF-2-0-Backtracking-and-exception-catching - scroll down to angrysoul's post at the bottom.
Now you just need to make sure you provide your own own instance of FlowExecutorImpl, that contains your own snapshot factory.

Solution for previewing user changes and allowing rollback/commit over a period of time

I have asked a few questions today as I try to think through to the solution of a problem.
We have a complex data structure where all of the various entities are tightly interconnected, with almost all entities heavily reliant/dependant upon entities of other types.
The project is a website (MVC3, .NET 4), and all of the logic is implemented using LINQ-to-SQL (2008) in the business layer.
What we need to do is have a user "lock" the system while they make their changes (there are other reasons for this which I won't go into here that are not database related). While this user is making their changes we want to be able to show them the original state of entities which they are updating, as well as a "preview" of the changes they have made. When finished, they need to be able to rollback/commit.
We have considered these options:
Holding open a transaction for the length of time a user takes to make multiple changes stinks, so that's out.
Holding a copy of all the data in memory (or cached to disk) is an option but there is heck of a lot of it, so seems unreasonable.
Maintaining a set of secondary tables, or attempting to use session state to store changes, but this is complex and difficult to maintain.
Using two databases, flipping between them by connection string, and using T-SQL to manage replication, putting them back in sync after commit/rollback. I.e. switching on/off, forcing snapshot, reversing direction etc.
We're a bit stumped for a solution that is relatively easy to maintain. Any suggestions?
Our solution to a similar problem is to use a locking table that holds locks per entity type in our system. When the client application wants to edit an entity, we do a "GetWithLock" which gets the client the most up-to-date version of the entity's data as well as obtaining a lock (a GUID that is stored in the lock table along with the entity type and the entity ID). This prevents other users from editing the same entity. When you commit your changes with an update, you release the lock by deleting the lock record from the lock table. Since stored procedures are the api we use for interacting with the database, this allows a very straight forward way to lock/unlock access to specific entities.
On the client side, we implement IEditableObject on the UI model classes. Our model classes hold a reference to the instance of the service entity that was retrieved on the service call. This allows the UI to do a Begin/End/Cancel Edit and do the commit or rollback as necessary. By holding the instance of the original service entity, we are able to see the original and current data, which would allow the user to get that "preview" you're looking for.
While our solution does not implement LINQ, I don't believe there's anything unique in our approach that would prevent you from using LINQ as well.
HTH
Consider this:
Long transactions makes system less scalable. If you do UPDATE command, update locks last until commit/rollback, preventing other transaction to proceed.
Second tables/database can be modified by concurent transactions, so you cannot rely on data in tables. Only way is to lock it => see no1.
Serializable transaction in some data engines uses versions of data in your tables. So after first cmd is executed, transaction can see exact data available in cmd execution time. This might help you to show changes made by user, but you have no guarantee to save them back into storage.
DataSets contains old/new version of data. But that is unfortunatelly out of your technology aim.
Use a set of secondary tables.
The problem is that your connection should see two versions of data while the other connections should see only one (or two, one of them being their own).
While it is possible theoretically and is implemented in Oracle using flashbacks, SQL Server does not support it natively, since it has no means to query previous versions of the records.
You can issue a query like this:
SELECT *
FROM mytable
AS OF TIMESTAMP
TO_TIMESTAMP('2010-01-17')
in Oracle but not in SQL Server.
This means that you need to implement this functionality yourself (placing the new versions of rows into your own tables).
Sounds like an ugly problem, and raises a whole lot of questions you won't be able to go into on SO. I got the following idea while reading your problem, and while it "smells" as bad as the others you list, it may help you work up an eventual solution.
First, have some kind of locking system, as described by #user580122, to flag/record the fact that one of these transactions is going on. (Be sure to include some kind of periodic automated check, to test for lost or abandoned transactions!)
Next, for every change you make to the database, log it somehow, either in the application or in a dedicated table somewhere. The idea is, given a copy of the database at state X, you could re-run the steps submitted by the user at any time.
Next up is figuring out how to use database snapshots. Read up on these in BOL; the general idea is you create a point-in-time snapshot of the database, do whatever you want with it, and eventually throw it away. (Only available in SQL 2005 and up, Enterprise edition only.)
So:
A user comes along and initiates one of these meta-transactions.
A flag is marked in the database showing what is going on. A new transaction cannot be started if one is already in process. (Again, check for lost transactions now and then!)
Every change made to the database is tracked and recorded in such a fashion that it could be repeated.
If the user decides to cancel the transaction, you just drop the snapshot, and nothing is changed.
If the user decides to keep the transaction, you drop the snapshot, and then immediately re-apply the logged changes to the "real" database. This should work, since your requirements imply that, while someone is working on one of these, no one else can touch the related parts of the database.
Yep, this sure smells, and it may not apply to well to your problem. Hopefully the ideas here help you work something out.

Caching ListView data a viable option?

Here's my scenario:
1) User runs search to retrieve values for display in ListView via LinqDataSource.
2) They click on one of the items which takes them to another page where the details can be examined, further drill-down can happen, etc.
3) User wants to go back to the original ListView results to select another item for inspection.
I can see it's possible to pass the querystring params around, allowing the querying to be duplicated each time the user comes back to the ListView, but it seems like there ought to be a way to cache the results.
Since I'm using the LinqDataSource, though, I believe the actual results are fetched each time the query is run. I'm currently feeding a "select new {blah, blah}" type of IEnumerable to the e.Results, which can't be turned into a List since it's populated with anonymous types.
In short:
1) Does it make sense to try to place potentially large query results in the users session?
2) If it does, is a List the reasonable data structure?
3) Do I need to resort to something like creating a class with the correct properties to hold the anonymous data, enumerate the query return, populate the List?
4) Is there a better option than the LinqDataSource for this type goal?
5) Or, does it just make more sense to run the query each time they hit the ListView?
I apologize if this wasn't clear. I would really appreciate it if someone can set me straight before I nuke a bunch of my free time headed down the wrong path :)
First, I would suggest that you look into the caching mechanism that comes with ASP.NET, unless the data is private for a certain user.
Second, I would suggest that you design your application in a way so that you create natural points where you could try to get data from a cache before querying the database (and insert data into the cache, with expiration rules), but don't start putting stuff into the cache until you have verified that it will actually make a difference.
Measure how much time that is actually spent on retrieving data and use caching in the cases where it makes a difference.
I'm not sure if resurrecting threads from the dead is cool on SO, but here is what I found to answer this question:
http://weblogs.asp.net/pwelter34/archive/2007/08.aspx

Resources