I'm creating an indicator which notifies the user about whether or not the local state is in sync with the latest fetched server state. I can think of several ways to keep track of this:
storing an additional 'pristine' state next to the normal 'soiled' state upon fetching data and diffing those states in my indicator component
toggling an up-to-date flag back and forth upon state changes and data fetches
But those solutions seem over-engineered and error-prone to me. I think middleware is probably the cleanest solution here, but so far I haven't come across a viable out-of-the-box solution. If anyone could hook me up, that would be awesome, because I probably lack the right idiom to use in my search terms. On a side note: I'm not allowed to store my data in localStorage.
I would recommend calculating a hash of your data on the server (this works as the signature of your state) and sending it along with your data to the clients. Your clients can then use this hash to know whether their state matches the latest data.
You might also want to take a look at ETags for a more standardized approach.
https://en.wikipedia.org/wiki/HTTP_ETag
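A minimal sketch of that idea, assuming the server responds with the hash alongside the payload (the md5 package and the names here are illustrative):
const md5 = require('md5'); // e.g. the 'md5' npm package; any stable hash works

let serverHash = null; // signature of the latest fetched server state
let localState = null;

function onFetched(response) {
  // the server sends { data, hash }, where hash was computed over the serialized data
  serverHash = response.hash;
  localState = response.data;
}

function isInSync() {
  // assumes the client serializes the state exactly the way the server did before hashing
  return md5(JSON.stringify(localState)) === serverHash;
}
The indicator component then only needs isInSync(), so no second 'pristine' copy of the state has to be kept around.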
I've been struggling with the following issue in Meteor + iron router:
I have a page (route) that has a subscription to a mongo collection
on that page, I have some logic which relies on a cursor querying the collection, also utilizing an observeChanges handler (namely, I'm running a search on that collection)
the problem in this case is the collection is being preserved in the client throughout route changes, which causes 2 unwanted effects:
1) the collection isn't necessarily needed outside that route, meaning I'm wasting client RAM (the collection, or even a subset of it, is likely to be quite big)
2) whenever I go back to that route, I want to start off with an empty subset for the observeChanges handler to work properly.
Any advice on how to clear the mirrored collection? (using the Collection._collection.remove({}) hack is bad practice, and doesn't even solve the problem)
Thanks!
Solved this by storing the subscription handles and using them to stop the subscriptions (i.e. subscription_handle.stop()) in template.destroyed().
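For anyone hitting the same thing, a minimal sketch of that pattern (the template and publication names are illustrative):
Template.search.created = function () {
  // keep the handle so it can be stopped when the route/template goes away
  this.subscriptionHandle = Meteor.subscribe('searchResults');
};

Template.search.destroyed = function () {
  // stopping the subscription removes its documents from the client-side
  // mirror of the collection, freeing the RAM and resetting the subset
  this.subscriptionHandle.stop();
};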
Where should I store the ETag for a given resource?
Approach A: compute on the fly
Get the resource and compute the ETag on the fly upon each request:
$resource = $repository->findByPK($id); // query
// Compute the ETag (assuming getUpdatedAt() returns a DateTime, format it before hashing)
$etag = md5($resource->getUpdatedAt()->format('Y-m-d H:i:s'));
$response = new Response();
$response->setETag($etag);
$response->setLastModified($resource->getUpdatedAt());
if ($response->isNotModified($this->getRequest())) {
    return $response; // 304
}
Approach B: storing at database level
Saving a bit of CPU time while making INSERT and UPDATE statements a bit slower (we use triggers to keep the ETag updated):
$resource = $repository->findByPK($id); // query
$response = new Response();
$response->setETag($resource->getETag());
$response->setLastModified($resource->getUpdatedAt());
if ($response->isNotModified($this->getRequest())) {
    return $response; // 304
}
Approach C: caching the ETag
This is like approach B, but the ETag is stored in some caching middleware instead of the database.
I suppose it depends on the cost of obtaining the items that go into the ETag itself.
I mean, the user sends along a request for a given resource; this should trigger a retrieval operation on the database (or some other operation).
If the retrieval is something simple, such as fetching a file, then inquiring on the file stats is fast, and there's no need to store anything anywhere: an MD5 of the file path plus its update time is enough.
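A minimal sketch of the file case (the path is illustrative):
$path = '/var/data/reports/summary.pdf'; // the resource is a plain file
$etag = md5($path . filemtime($path));   // path + update time is enough
header('ETag: "' . $etag . '"');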
If the retrieval implies querying a database, then it depends on whether you can decompose the query without losing performance (e.g., the user requests an article by ID: you might retrieve the relevant data from the article table only, so a cache "hit" entails a single SELECT on a primary key, while a cache "miss" means you have to query the database again, wasting the first query - or not - depending on your model).
If the query (or sequence of queries) is well decomposable (and the resulting code maintainable), then I'd go with the dynamic ETag again.
If it is not, then it mostly depends on the query cost and the overall cost of maintaining a stored-ETag solution. If the query is costly (or the output is bulky) and INSERTs/UPDATEs are few, then (and, I think, only then) it will be advantageous to store a secondary column (or table) with the ETag.
As for the caching middleware, I don't know. If I had a framework keeping track of everything for me, I might say 'go for it' -- the middleware is supposed to take care of implementing the points above. Should the middleware be implementation-agnostic (unlikely, unless it's a cut-and-paste slap-on ... which is not unheard of), then there would be either the risk of it "screening" updates to the resource, or maybe an excessive awkwardness in invoking some cache-clearing API upon updates. Both factors would need to be evaluated against the load improvement offered by ETag support.
I don't think that in this case a 'silver bullet' exists.
Edit: in your case there is little - or even no - difference between cases A and B. To be able to implement getUpdatedAt(), you would need to store the update time in the model.
In this specific case I think that the dynamic, explicit calculation of the ETag (case A) would be simpler and more maintainable. The retrieval cost is incurred in any case, and the explicit calculation cost is that of an MD5 calculation, which is really fast and completely CPU-bound. The advantages in maintainability and simplicity are, in my opinion, overwhelming.
On a semi-related note, it occurs to me that in some cases (infrequent updates to the database and much more frequent queries to the same) it might be advantageous and almost transparent to implement a global Last-Modified time for the whole database. If the database has not changed, then there is no way that any query to the database can return varied resources, no matter what the query is. In such a situation, one would only need to store the Last-Modified global flag in some easy and quick to retrieve place (not necessarily the database). For example
function dbModified() {
    touch('.last-update'); // creates the file, or updates its modification time
}
in any UPDATE/DELETE code. The resource would then add a header
function sendModified() {
    $tsstring = gmdate('D, d M Y H:i:s ', filemtime('.last-update')) . 'GMT';
    header('Last-Modified: ' . $tsstring);
}
to inform the browser of that resource's modification time.
Then, any request for a resource that includes If-Modified-Since could be bounced back with a 304 without ever accessing the persistence layer (or at least sparing most accesses to persistent resources). No update time at record level would (have to) be needed:
function ifNotModified() {
    // Check your timezone settings. The GMT helps, but it's not always the ticket
    $ims = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        : -1; // this ensures the test will FAIL when the header is absent
    if (filemtime('.last-update') <= $ims) {
        // The database was never updated after the resource retrieval.
        // There's no way the resource may have changed.
        header('HTTP/1.1 304 Not Modified');
        exit;
    }
}
One would put the ifNotModified() call as early as possible in the resource supply route, the sendModified() call as early as possible in the resource output code, and the dbModified() call wherever the database gets modified significantly as far as resources are concerned (i.e., you can and probably should avoid it when logging access statistics to the database, as long as they do not influence the resources' content).
In my opinion persisting ETags is a BAD IDEA unless your business logic is ABOUT persisting ETags. Like when you write an application to track users based on ETags and this is a business feature :).
Potential savings in execution time will be small or non-existent. The downsides of this solution are certain and grow as your application grows.
According to the specification, the same version of a resource may give different ETags depending on the endpoint from which it has been obtained.
From http://en.wikipedia.org/wiki/HTTP_ETag:
"Comparing ETags only makes sense with respect to one URL—ETags for resources obtained from different URLs may or may not be equal, so no meaning can be inferred from their comparison."
From this you may conclude that you should persist not just the ETag but also its endpoint, and store as many ETags as you have endpoints. Sounds crazy?
Even if you want to ignore the HTTP specification and just provide one ETag per entity without any metadata about its endpoints, you are still binding at least two layers (caching and business logic) that ideally should not be mixed. The idea behind having an entity (versus some loose data) is to keep the business logic in it separated and decoupled, and not to pollute it with networking, view-layer, or... caching concerns.
IMHO, this depends on how often resources are updated vs how often resources are read.
If each ETag is read 1 or 2 times between modifications, then just calculate them on the fly.
If your resources are read far more often than they're updated, then you'd better cache them, calculating the ETag every time the resource is modified (so you don't have to bother with out-of-date cached ETags).
If ETags are modified almost as often as they're read, then I'd still cache them, especially since it seems your resources are stored on a database.
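For the "recompute on write, read from cache" case, a minimal sketch could look like this (APCu is just one possible cache; the key scheme and getUpdatedAt() are placeholders):
// call this from the code path that INSERTs/UPDATEs the resource
function refreshEtag($resourceId, $resource) {
    apcu_store("etag:$resourceId", md5($resource->getUpdatedAt()->format('c')));
}

// call this when serving the resource; falls back to recomputing on a cache miss
function currentEtag($resourceId, $resource) {
    $etag = apcu_fetch("etag:$resourceId");
    return $etag !== false ? $etag : md5($resource->getUpdatedAt()->format('c'));
}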
I have a method in my BLL that interacts with the database and retrieves data based on the defined criteria.
The returned data is a collection of FAQ objects which is defined as follows:
FAQID,
FAQContent,
AnswerContent
I would like to cache the returned data to minimize the DB interaction.
Now, based on the user selected option, I have to return either of the below:
ShowAll: all data.
ShowAnsweredOnly: faqList.Where(AnswerContent != null)
ShowUnansweredOnly: faqList.Where(AnswerContent == null)
My Question:
Should I only cache all data returned from DB (e.g. FAQ_ALL) and filter other faqList modes from cache (= interacting with DB just once and filter the data from the cache item for the other two modes)? Or should I have 3 cache items: FAQ_ALL, FAQ_ANSWERED and FAQ_UNANSWERED (=interacting with database for each mode [3 times]) and return the cache item for each mode?
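For concreteness, the first option (cache once and filter from the cached list) would look roughly like this - MemoryCache and the exact names are just illustrative:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;

public enum FaqMode { ShowAll, ShowAnsweredOnly, ShowUnansweredOnly }

public class FaqService
{
    private readonly ObjectCache _cache = MemoryCache.Default;

    public IEnumerable<FAQ> GetFaqs(FaqMode mode)
    {
        var all = _cache.Get("FAQ_ALL") as List<FAQ>;
        if (all == null)
        {
            all = LoadAllFromDatabase(); // the single DB interaction
            _cache.Set("FAQ_ALL", all, DateTimeOffset.Now.AddMinutes(10));
        }

        switch (mode)
        {
            case FaqMode.ShowAnsweredOnly:   return all.Where(f => f.AnswerContent != null);
            case FaqMode.ShowUnansweredOnly: return all.Where(f => f.AnswerContent == null);
            default:                         return all;
        }
    }

    private List<FAQ> LoadAllFromDatabase()
    {
        // existing BLL/DB call goes here
        return new List<FAQ>();
    }
}

public class FAQ
{
    public int FAQID { get; set; }
    public string FAQContent { get; set; }
    public string AnswerContent { get; set; }
}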
I'd be pleased if anyone tells me about pros/cons of each approach.
Food for thought.
How many records are you caching, how big are the tables?
How much mid-tier resources can be reserved for caching?
How many of each type data exists?
How fast filtering on the client side will be?
How often does the data change?
how often is it changed by the same application instance?
how often is it changed by other applications or server side jobs?
What is your cache invalidation policy?
What happens if you return stale data?
Can you/Should you leverage active cache invalidation, like SqlDependency or LinqToCache?
If the dataset is large then filtering on the client side will be slow and you'll need to cache two separate results (no need for a third if ALL is the union of the other two). If the data changes often then caching will return stale items frequently without proactive cache invalidation in place. Active cache invalidation is achievable in the mid-tier if you control all the update paths and there is only one mid-tier application instance, but becomes really hard if one of those prerequisites is not satisfied.
It basically depends how volatile the data is, how much of it there is, and how often it's accessed.
For example, if the answered data didn't change much, then you'd be safe caching it for a while; but if the unanswered data changed a lot (and more often), then your caching needs might be different. If that's the case, it's unlikely that caching it all as one dataset will be the best option.
It's not all bad though - if the discrepancy isn't too huge, then you might be OK caching the lot.
The other point to think about is how the data is related. If the FAQ items toggle between answered and unanswered then it'd make sense to cache the base data as one - otherwise the items would be split where you wanted it together.
Alternatively, work with the data in-memory and treat the database as an add-on...
What do I mean? Well, typically the user will hit "save" this will invoke code which saves to the DB; when the next user comes along they will invoke a call which gets the data out of the DB. In terms of design the DB is a first class citizen, everything has to go through it before anyone else gets a look in. The alternative is to base the design around data which is held in-memory (by the BLL) and then saved (perhaps asynchronously) to the DB. This removes the DB as a bottleneck but gives you a new set of problems - like what happens if the database connection goes down or the server dies with data only in-memory?
Pros and Cons
Getting all the data in one call might be faster (by making less calls).
Getting all the data at once if it's related makes sense.
Granularity: data that is related and has a similar "cachability" can be cached together, otherwise you might want to keep them in separate cache partitions.
For the sake of argument assume that I have a webform that allows a user to edit order details. User can perform the following functions:
Change shipping/payment details (all simple text/dropdowns)
Add/Remove/Edit products in the order - this is done with a grid
Add/Remove attachments
Products and attachments are stored in separate DB tables with foreign key to the order.
Entity Framework (4.0) is used as ORM.
I want to allow the users to make whatever changes they want to the order and only when they hit 'Save' do I want to commit the changes to the database. This is not a problem with textboxes/checkboxes etc. as I can just rely on ViewState to get the required information. However the grid is presenting a much larger problem for me as I can't figure out a nice and easy way to persist the changes the user made without committing the changes to the database. Storing the Order object tree in Session/ViewState is not really an option I'd like to go with as the objects could get very large.
So the question is - how can I go about preserving the changes the user made until ready to 'Save'.
Quick note - I have searched SO to try to find a solution, however all I found were suggestions to use Session and/or ViewState - both of which I would rather not use due to potential size of my object trees
If you have control over the schema of the database and the other applications that utilize order data, you could add a flag or status column to the orders table that differentiates between temporary and finalized orders. Then, you can simply store your intermediate changes to the database. There are other benefits as well; for example, a user that had a browser crash could return to the application and be able to resume the order process.
I think sticking to the database for storing data is the only reliable way to persist data, even temporary data. Using session state, control state, cookies, temporary files, etc., can introduce a lot of things that can go wrong, especially if your application resides in a web farm.
If using the Session is not your preferred solution, which is probably wise, the best possible solution would be to create your own temporary database tables (or as others have mentioned, add a temporary flag to your existing database tables) and persist the data there, storing a single identifier in the Session (or in a cookie) for later retrieval.
First, you may want to segregate your specific state management implementation into its own class so that you don't have to replicate it throughout your systems.
Second, you may want to consider a hybrid approach - use session state (or cache) for a short time to avoid unnecessary trips to a DB or other external store. After some amount of inactivity, write the cached state out to disk or DB. The simplest way to do this is to serialize your objects to text (using either serialization or a library like Protocol Buffers). This helps you avoid creating redundant or duplicate data structures to capture the in-progress data relationally. If you don't need to query the content of this data, it's a reasonable approach.
As an aside, in the database world, the problem you describe is called a long running transaction. You essentially want to avoid making changes to the data until you reach a user-defined commit point. There are techniques you can use in the database layer, like hypothetical views and instead-of triggers to encapsulate the behavior that you aren't actually committing the change. The data is in the DB (in the real tables), but is only visible to the user operating on it. This is probably a more complicated implementation than you may be willing to undertake, and requires intrusive changes to your persistence layer and data model - but allows the application to be ignorant of the issue.
Have you considered storing the information in a JavaScript object and then sending that information to your server once the user hits save?
Use domain events to capture the user's actions and then replay those actions over a snapshot of the order model (effectively the current state of the order before the user started changing it).
Store each change as a series of events e.g. UserChangedShippingAddress, UserAlteredLineItem, UserDeletedLineItem, UserAddedLineItem.
These events can be saved after each postback and only need a link to the related order. Rebuilding the current state of the order is then as simple as replaying the events over the currently stored order objects.
When the user clicks save, you can replay the events and persist the updated order model to the database.
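A minimal sketch of the replay part (the event classes and order members below are illustrative, not an existing API):
using System.Collections.Generic;

// illustrative domain types; the real order/line items already exist in the app
public class OrderLine { public int ProductId; public int Quantity; }

public class Order
{
    public string ShippingAddress;
    public List<OrderLine> Lines = new List<OrderLine>();
}

// one small class per user action
public interface IOrderEvent { void Apply(Order order); }

public class UserChangedShippingAddress : IOrderEvent
{
    public string NewAddress;
    public void Apply(Order order) { order.ShippingAddress = NewAddress; }
}

public class UserAddedLineItem : IOrderEvent
{
    public int ProductId;
    public int Quantity;
    public void Apply(Order order)
    {
        order.Lines.Add(new OrderLine { ProductId = ProductId, Quantity = Quantity });
    }
}

public static class OrderReplay
{
    // snapshot = the order as currently stored; events = everything the user did
    // since, persisted after each postback and keyed by the order id
    public static Order RebuildCurrentState(Order snapshot, IEnumerable<IOrderEvent> events)
    {
        foreach (var e in events)
            e.Apply(snapshot);
        return snapshot;
    }
}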
You are using the database - no session or viewstate is required - therefore you can significantly reduce page weight and server memory load at the expense of some page performance (if you choose to rebuild the model on each postback).
Maintenance is incredibly simple: because domain events are so easy to implement, automated testing can readily ensure the system behaves as you expect (while also documenting your intentions for other developers).
Because you are leveraging the database, the solution scales well across multiple web servers.
Using this approach does not require any alterations to your existing domain model, therefore the impact on existing code is minimal. Biggest downside is getting your head around the concept of domain events and how they are used and abused =)
This is effectively the same approach as described by Freddy Rios, with a little more detail about how and some nice keyword for you to search with =)
http://jasondentler.com/blog/2009/11/simple-domain-events/ and http://www.udidahan.com/2009/06/14/domain-events-salvation/ are some good background reading about domain events. You may also want to read up on event sourcing as this is essentially what you would be doing ( snapshot object, record events, replay events, snapshot object again).
How about serializing your domain object (the contents of your grid/shopping cart) to JSON and storing it in a hidden variable? ScottGu has a nice article on how to serialize objects to JSON. It is scalable across a server farm, and I guess it would not add much payload to your page. Maybe you can write your own JSON serializer to do a "compact serialization" (you would not need product name, product ID, SKU id, etc.; maybe you can just "serialize" productID and quantity).
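A possible sketch of that compact serialization on the server side (JavaScriptSerializer ships with ASP.NET; the OrderLine type and the id/qty names are illustrative):
using System.Collections.Generic;
using System.Linq;
using System.Web.Script.Serialization;

public class OrderLine { public int ProductId { get; set; } public int Quantity { get; set; } }

// the trimmed-down shape that actually travels in the hidden field
public class CompactLine { public int id { get; set; } public int qty { get; set; } }

public static class CartState
{
    public static string ToJson(IEnumerable<OrderLine> lines)
    {
        var compact = lines.Select(l => new CompactLine { id = l.ProductId, qty = l.Quantity }).ToList();
        return new JavaScriptSerializer().Serialize(compact); // goes into the hidden field
    }

    public static List<OrderLine> FromJson(string json)
    {
        return new JavaScriptSerializer().Deserialize<List<CompactLine>>(json)
            .Select(c => new OrderLine { ProductId = c.id, Quantity = c.qty })
            .ToList();
    }
}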
Have you considered using a User Profile? .Net comes with SqlProfileProvider right out of the box. This would allow you to, for each user, grab their profile and save the temporary data as a variable off in the profile. Unfortunately, I think this does require your "Order" to be serializable, but I believe all of the options except Session thus far would require the same.
The advantage of this is it would persist through crashes, sessions, server down time, etc and it's fairly easy to set up. Here's a site that runs through an example. Once you set it up, you may also find it useful for storing other user information such as preferences, favorites, watched items, etc.
You should be able to create a temp file and serialize the object to that, then save only the temp file name to the viewstate. Once they successfully save the record back to the database then you could remove the temp file.
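A minimal sketch of that (it assumes the order object graph is marked [Serializable]; only the returned file name would go into the viewstate):
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class DraftStore
{
    public static string Save(object draft)
    {
        string path = Path.GetTempFileName();
        using (var stream = File.Create(path))
            new BinaryFormatter().Serialize(stream, draft);
        return path; // store this string in ViewState instead of the object tree
    }

    public static object Load(string path)
    {
        using (var stream = File.OpenRead(path))
            return new BinaryFormatter().Deserialize(stream);
    }

    public static void Discard(string path)
    {
        File.Delete(path); // call after the record is saved to the database
    }
}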
Single server: serialize to the filesystem. This also allows you to let the user resume later.
Multiple server: serialize it but store the serialized value in the db.
This is something that's for that specific user, so when you persist it to the db you don't really need all the relational stuff for it.
Alternatively, if the set of data is very large and the amount of changes is usually small, you can store the history of changes done by the user instead. With this you can also show the change history and support undo.
2 approaches - create a complex AJAX application that stores everything on the client and only submits the entire package of changes to the server. I did this once a few years ago with moderate success. The application is not something I would want to maintain though. You have a hard time syncing your client code with your server code, and passing fields that are added/deleted/changed is nightmarish.
2nd approach is to store changes in the database in a temp table or "pending" mode. The advantage is your code is more maintainable. The disadvantage is you have to have a way to clean up abandoned changes due to session timeouts, power failures, and other crashes. I would take this approach for any new development. You can have separate tables for "pending" and "committed" changes, which opens up a whole new level of features you can add. What if? What changed? etc.
I would go for viewstate, regardless of what you've said before. If you only store the stuff you need, like { id: XX, numberOfProducts: 3 }, and ditch every item that is not selected by the user at this point; the viewstate size will hardly be an issue as long as you aren't storing the whole object tree.
When storing attachments, put them in a temporary storage location and reference the filename in your viewstate. You can have a scheduled task that cleans the temp folder, removing every file that was last saved over a day ago or something like that.
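The cleanup task itself can stay tiny; a sketch, with the folder path and the one-day threshold as placeholders:
using System;
using System.IO;

public static class TempAttachmentCleaner
{
    public static void Run(string folder)
    {
        foreach (var file in Directory.GetFiles(folder))
        {
            // anything untouched for over a day is treated as an abandoned draft
            if (File.GetLastWriteTimeUtc(file) < DateTime.UtcNow.AddDays(-1))
                File.Delete(file);
        }
    }
}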
This is basically the approach we use for storing information when users are adding floorplan information and attachments in our backend.
Are the end-users internal or external clients? If your clients are internal users, it may be worthwhile to look at an alternate set of technologies. Instead of webforms, consider using a platform like Silverlight and implementing a rich GUI there.
You could then store complex business objects within the applet, provide persistent "in progress" edit tracking across multiple sessions via offline storage, and easily integrate with back-end services that provide saving/processing of the finalised order. All whilst maintaining access via the web (albeit closing out most *nix clients).
Alternatives include Adobe Flex or AJAX, depending on resources and needs.
How large do you consider large? If you are talking session state (so it doesn't go back and forth to the actual user, like view-state), then session state is often a pretty good option. Everything except the in-process state provider uses serialization, but you can influence how it is serialized. For example, I would tend to create a local model that represents just the state I care about (plus any id/rowversion information) for that operation (rather than the full domain entities, which may have extra overhead).
To reduce the serialization overhead further, I would consider using something like protobuf-net; this can be used as the implementation for ISerializable, allowing very light-weight serialized objects (generally much smaller than BinaryFormatter, XmlSerializer, etc), that are cheap to reconstruct at page requests.
When the page is finally saved, I would update my domain entities from the local model and submit the changes.
For info, to use a protobuf-net attributed object with the state serializers (typically BinaryFormatter), you can use:
// a simple, session-state friendly, light-weight UI model object
using System;
using System.Runtime.Serialization;
using ProtoBuf;

[Serializable, ProtoContract]
public class MyType : ISerializable
{
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Name { get; set; }
    [ProtoMember(3)]
    public double Value { get; set; }
    // etc

    public MyType() { } // default constructor

    // the state serializer (BinaryFormatter) calls these two members;
    // the actual work is delegated to protobuf-net for a compact payload
    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
        Serializer.Serialize(info, this);
    }

    protected MyType(SerializationInfo info, StreamingContext context)
    {
        Serializer.Merge(info, this);
    }
}