Where should I store a million-record array? - asp.net

I have a question in the field of optimization and application design.
I am building a web application using asp.net and sql server.
In one of my screens I must perform an action that generates a random set of user IDs. I present the viewer with some statistics about the selected users. If the viewer likes the statistics, I want to save them.
So basically I need to save the temporary random data and, if the user likes it, keep it.
Should I store the generated IDs in the database, or should I store them in the session?

Well, since you are generating random ids, you are using some kind of pseudorandom generator. Have you considered the possibility of just storing the seed for that generator? I've recently had a similar issue. In fact, very similar. Have a look:
My Post about Random(int seed)
EDIT: In comments you suggest you want to do much of this on the SQL server. Have a look at the following post. You may also want to consider the special case of a new user being added while your admin (or whoever) ponders whether to save the selection or not. In that case you'd need to additionally store the number of users at the time the request was made, and adjust your random selection function accordingly. In the even more special case of a removed user, this approach is, admittedly, useless.
Seeding SQL
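For illustration, here is a minimal sketch of the seed idea in C#. Everything here (SelectUserIds, sampleSize, the selection logic) is an invented example, not the asker's actual code; the point is only that the same seed always reproduces the same selection, so you never need to store the IDs themselves.

using System;
using System.Collections.Generic;

static class SeededSelection
{
    // Persist only (seed, userCount, sampleSize) instead of the ids.
    static List<int> SelectUserIds(int seed, int userCount, int sampleSize)
    {
        var rng = new Random(seed);              // same seed => same sequence
        var ids = new List<int>(sampleSize);
        for (int i = 0; i < sampleSize; i++)
            ids.Add(rng.Next(1, userCount + 1)); // duplicates possible; dedupe if needed
        return ids;
    }
}

// Show statistics for SelectUserIds(seed, userCount, n); if the viewer
// clicks save, call it again with the same arguments and persist the result.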

It depends on your case and requirements. Storing the data in the session is fine, but you will lose the data once the session is abandoned or ends. If that is acceptable, meaning you don't need to keep the data forever, then storing it in the session (temporarily) is better. But you will need to watch for performance issues, especially if multiple users do the same thing at once, since that will degrade performance.
If you choose to store it in a database, that will also work, but you will need to decide whether to enable or disable ViewState, again for the sake of performance.

If I had to do this, I would store the data in a temporary SQL table that is dropped and recreated on every page load, then show the data in a grid from which selected user IDs can be deleted or saved.
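As a rough sketch of that approach (the table name, connection string, and helper are all invented for illustration):

using System.Data.SqlClient;

// Sketch: drop and recreate a staging table on page load, as the
// answer describes. With multiple concurrent users you would want a
// per-session key column rather than a shared table that gets dropped.
void RebuildStagingTable(string connectionString)
{
    const string sql = @"
        IF OBJECT_ID('dbo.SelectedUserStaging') IS NOT NULL
            DROP TABLE dbo.SelectedUserStaging;
        CREATE TABLE dbo.SelectedUserStaging (UserId INT NOT NULL);";
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}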

Related

Storing large object to InProc session rather than reloading on every page

This is my first post/question so please let me know if/how I can improve it. I found similar questions but nothing quite covered this.
When you store to InProc session you're just storing a reference to the data. So, if I have a public property foo, and I store it in Session("foo") = foo, then I haven't really taken up any additional memory (aside from the 32/64 bits used by the pointer)?
In my case, we are currently reloading foo on every page of our website, so if I were to instead store it in session, it should take the same amount of space but not need to be reloaded on every page. I see a lot of people say not to store large objects in session, but if that large object already exists, what difference does it make to have a pointer to it? Of course I would remove the object from session the moment it was no longer needed.
The data we are trying to store is an object specific to the user's current work, but not user data. As an analogy, say the user is a car dealer looking at all the data for a particular customer. We have multiple pages for this customer, and we want to keep all the customer info loaded on each page. All the customer data is stored in a single XML data column in a SQL table, which we parse on every page.
We have tried binary serialization instead of parsing xml, so we could store with session in state server mode, but we found the performance to actually be worse.
We are running on a single web server.
First off, no. When you store something in session state, all the data required to store that object is consumed by the website process(es). Just because .NET treats variables like references doesn't mean it actually uses less memory than a non-GC language; it just means that copying that variable around is done efficiently, without using reference operators or pointers.
Your question is a bit vague, but you have a few options for persisting data:
1) Send the data to the client as JSON and store it on the browser if it should be per-user and is needed more on the client side than the server side. You can then send pieces of the data with different requests if you need to (put it in hidden fields if you have to use ASPX web forms).
2) Store it in the session state if it is a small bit of per user data.
3) Store it in the ASP.NET cache if it is large and common to all users, see here (https://msdn.microsoft.com/en-us/library/6hbbsfk6.aspx).
4) If it is large and user-specific and used primarily on the server, then you have more of a performance problem. See if you can break out the user-specific parts from the static parts. If you do that and it's still large, then a database may not be a bad solution. If you are already using DB calls in your application, then looking up this data on every request won't cause too much overhead, and you won't have to regenerate it from scratch. (Only do this if the data takes considerable time to generate; a DB call could be slower than just regenerating the data itself.) I recommend writing some sort of middleware (HttpModule or OwinMiddleware) that uses whatever user identity you use for auth to look up the data and then set it on the HttpContext.Current.Items collection. This way the data is usable for the entire request, and you can add logic in the middleware to figure out when to set it. A sketch of this is shown below.
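Here is a minimal sketch of that middleware idea as an IHttpModule; LoadUserData and the "UserData" key are hypothetical names for your own lookup.

using System;
using System.Web;

// Sketch: load per-user data once per request and park it in
// HttpContext.Items so the rest of the request can use it.
// Register the module in web.config under <modules>.
public class UserDataModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        // runs after authentication, so the user identity is known
        app.PostAuthenticateRequest += (sender, e) =>
        {
            var ctx = ((HttpApplication)sender).Context;
            if (ctx.User != null && ctx.User.Identity.IsAuthenticated)
            {
                // LoadUserData is a hypothetical lookup (DB, cache, ...)
                ctx.Items["UserData"] = LoadUserData(ctx.User.Identity.Name);
            }
        };
    }

    public void Dispose() { }

    private object LoadUserData(string userName)
    {
        return null; // placeholder for the real per-user lookup
    }
}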
I would think that having a large chunk of user-specific data would be a red flag as user data should just be a list of what the user can/can't do and what their preferences are.
If this is static data then it's super simple: the application cache is what you want. The only complication would be if you have multiple servers that need synced data.

Should I use Session/Cache to store the DataSet or should I fetch fresh from the Database each time?

The amount of coding that goes into the making of a DataSet is often significant. I'm not sure what the industry standard or best practice is when dealing with data requests from multiple ASP.NET pages. Should I use the cache/session to pass the DataSet from page to page, or should I fetch it directly from the database for each page?
What's the most common approach here?
Here are my thoughts:
It depends on the database and the type of data you're trying to get, as well as what may modify the data. Do you have backend processes that run concurrently and touch the data you're going to want? Is this data only updated by the current page, or does it update at all? How many people are going to use said page?
I personally almost always call the database, simply because there are so many what-ifs when it comes to this kind of thing. At any time the data can change; it's never as static as people think it is. I would personally trade performance for correct data any day.
But that's just me personally. This question is so open-ended that it's impossible to take every single thing into consideration, since I don't know your database structure, nor how expensive it is to retrieve the data, nor what you're using it for.
Sorry I couldn't be of more help.
It depends upon your needs. If the data is very large then don't save it in Session or Cache, because both are stored in server memory. Session is user-specific and will store a copy of the data for each user on the server, so avoid it for large data. I think you should fetch the data each time you need it rather than saving it in session. If the data is very small/limited then you can save it in session (for example UserName or UserId). If you are using a GridView to show data, use paging and fetch the data from the database on each page request.
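If you do decide to cache, here is a minimal sketch using the ASP.NET cache with an absolute expiration; the "ReportData" key and LoadDataSetFromDatabase are invented names, and five minutes is an arbitrary choice.

using System;
using System.Data;
using System.Web;
using System.Web.Caching;

// Sketch: cache a shared DataSet and reload on a miss.
DataSet GetReportData()
{
    var cache = HttpContext.Current.Cache;
    var ds = cache["ReportData"] as DataSet;
    if (ds == null)
    {
        ds = LoadDataSetFromDatabase(); // hypothetical data-access call
        cache.Insert("ReportData", ds,
            null,                          // no cache dependency
            DateTime.UtcNow.AddMinutes(5), // absolute expiration
            Cache.NoSlidingExpiration);
    }
    return ds;
}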

Should I avoid the session with a complex object in asp.net?

Here's my issue, we have a large patient object that is used on multiple screens throughout the admin. Each screen contains different information about the same patient. It can't all be on one screen.
The only time I want to persist the patient is when the user clicks save. I need to have an in-memory patient somewhere. A user may be in the admin, change patient information on various screens, run validation, and decide not to save that patient. This is typical use.
Is it ok to store this patient in the session? Or, is there a better approach to do this? At most this admin would have 20 users with access.
Opinions may vary on this. Session is tricky, especially if you use something other than in-memory session. Distributed session will break a non-serializable object. If this object is a simple POCO or object you control, try your best to make it play with serialization. If it does you're set. For an admin tool without much load I'd say you'd be fine.
Hey I found this - know nothing about the site, but illustrates my point:
https://www.fortify.com/vulncat/en/vulncat/dotnet/asp_dotnet_bad_practices_non_serializable_object_stored_in_session.html
I had a similar situation with similar amount of users. I did it and it worked great.
My situation was about scheduling events.
Someone would create an event and through multiple web pages would modify and configure this event. When they were all done it would save all the details to SQL. In the end, I was surprised just how well it worked.
Session should be fine here. You have what appears to be a light user load... but you might want to check exactly how much memory the object takes up, multiply that by the maximum number of users, and see where you are.
If you want to avoid the session altogether, you could use System.Web.Caching to store the object instead, and key the stored object using the user's identifier plus some constant string.
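A sketch of that keying scheme; the "Patient_" prefix and GetCurrentUserId are made-up names for illustration.

using System.Web;

// Sketch: one cache entry per user so working copies don't collide.
void CachePatient(Patient patient)
{
    string key = "Patient_" + GetCurrentUserId(); // hypothetical helper
    HttpContext.Current.Cache[key] = patient;
}

Patient GetCachedPatient()
{
    return (Patient)HttpContext.Current.Cache["Patient_" + GetCurrentUserId()];
}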
In either case, you'll want to be aware of how many web servers are running the application. If it's just one web server, no worries. If you have multiple web servers, you'll want to make sure they are "sticky" - then the user is guaranteed to have all requests processed by the same server. How this is done is entirely dependent on your flavor of load balancing... normally the "IT folks" handle this for you.

Is there any other way to store the data source other than view state?

I have a page on which a data table is created programmatically if the data is not in the database tables. I have to use this data table in many events during postbacks. The data table may contain hundreds of records, and there may be multiple users accessing the same page (of course with a different data source for each user). I am storing the data table in view state, but I am afraid this practice will make the page heavier. Is there any other way to preserve the data table across postbacks? The code is very long so I cannot copy and paste it here.
Using session will again make the whole application heavier... so is it a better choice than viewstate?
You should use Session. It's also possible to use Application or Cache, but you'll have to generate and store a unique key on your page to prevent possible interference between requests from different users.
In your case the view state can get very big and will hurt page load performance. IMHO the best thing would be to revise the way you are handling the postback events.
Use caching if more than one user needs the same data.
Use the Session if the data is specific to each user, but keep in mind that it has some pitfalls in a clustered environment.
Or load the data from the database each time the user posts back to the server: no state handling needs to be done on the server, but you lose performance on the network round trip.
For a quick fix I usually store the View State on the server. Refer to this page to read about it... http://aspguy.wordpress.com/2008/07/09/reducing-the-page-size-by-storing-viewstate-on-server/
Maybe you need to store the data in the Cache object.
If the data table is different for each user, you should use Session, or you could use Cache, assuming you create a different cache entry for each user.
But if the data table is very big, it is probably not a good idea to store it in memory instead of accessing the database directly.
If the data is user-specific then you can use Session. However, with out-of-process session state you may have issues, because all that data needs to be marshaled back and forth from the session store.
Otherwise Cache is a good option, but you need to choose the cache period and expiration policy based on typical usage scenarios (and also handle the cache-expiry scenario gracefully).
Yet another option is to push the data into a temp file; however, in that case you need to manage file cleanup etc.

Ways to store an object across multiple postbacks

For the sake of argument assume that I have a webform that allows a user to edit order details. User can perform the following functions:
Change shipping/payment details (all simple text/dropdowns)
Add/Remove/Edit products in the order - this is done with a grid
Add/Remove attachments
Products and attachments are stored in separate DB tables with foreign key to the order.
Entity Framework (4.0) is used as ORM.
I want to allow the users to make whatever changes they want to the order and only when they hit 'Save' do I want to commit the changes to the database. This is not a problem with textboxes/checkboxes etc. as I can just rely on ViewState to get the required information. However the grid is presenting a much larger problem for me as I can't figure out a nice and easy way to persist the changes the user made without committing the changes to the database. Storing the Order object tree in Session/ViewState is not really an option I'd like to go with as the objects could get very large.
So the question is - how can I go about preserving the changes the user made until ready to 'Save'.
Quick note - I have searched SO to try to find a solution, however all I found were suggestions to use Session and/or ViewState - both of which I would rather not use due to potential size of my object trees
If you have control over the schema of the database and the other applications that utilize order data, you could add a flag or status column to the orders table that differentiates between temporary and finalized orders. Then, you can simply store your intermediate changes to the database. There are other benefits as well; for example, a user that had a browser crash could return to the application and be able to resume the order process.
I think sticking to the database for storing data is the only reliable way to persist data, even temporary data. Using session state, control state, cookies, temporary files, etc., can introduce a lot of things that can go wrong, especially if your application resides in a web farm.
If using the Session is not your preferred solution, which is probably wise, the best possible solution would be to create your own temporary database tables (or as others have mentioned, add a temporary flag to your existing database tables) and persist the data there, storing a single identifier in the Session (or in a cookie) for later retrieval.
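As a sketch of that draft-flag idea (the Orders table, Status column, and values are assumptions about your schema, not anything from the question), finalizing could be a single status flip:

using System.Data.SqlClient;

// Sketch: in-progress edits are rows with Status = 'Draft';
// clicking Save just promotes them.
void FinalizeOrder(string connectionString, int orderId)
{
    const string sql =
        "UPDATE Orders SET Status = 'Finalized' " +
        "WHERE OrderId = @id AND Status = 'Draft'";
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@id", orderId);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}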
First, you may want to segregate your specific state-management implementation into its own class so that you don't have to replicate it throughout your systems.
Second, you may want to consider a hybrid approach: use session state (or cache) for a short time to avoid unnecessary trips to a DB or other external store. After some amount of inactivity, write the cached state out to disk or the DB. The simplest way to do this is to serialize your objects to text (using either built-in serialization or a library like Protocol Buffers). This helps you avoid creating redundant or duplicate data structures to capture the in-progress data relationally. If you don't need to query the content of this data, it's a reasonable approach.
As an aside, in the database world, the problem you describe is called a long running transaction. You essentially want to avoid making changes to the data until you reach a user-defined commit point. There are techniques you can use in the database layer, like hypothetical views and instead-of triggers to encapsulate the behavior that you aren't actually committing the change. The data is in the DB (in the real tables), but is only visible to the user operating on it. This is probably a more complicated implementation than you may be willing to undertake, and requires intrusive changes to your persistence layer and data model - but allows the application to be ignorant of the issue.
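A rough outline of segregating that logic into its own class; every name here (DraftStore, OrderDraft, the fallback) is hypothetical.

using System.Web;

// One class owns the decision of where draft state lives.
public class DraftStore
{
    public OrderDraft Load()
    {
        // fast path: the in-memory copy from this session
        var draft = HttpContext.Current.Session["OrderDraft"] as OrderDraft;
        return draft ?? LoadFromDatabase(); // fall back to the persisted copy
    }

    public void Save(OrderDraft draft)
    {
        HttpContext.Current.Session["OrderDraft"] = draft;
        // a background sweep could later flush idle drafts to the DB
    }

    private OrderDraft LoadFromDatabase() { return null; /* placeholder */ }
}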
Have you considered storing the information in a JavaScript object and then sending that information to your server once the user hits save?
Use domain events to capture the user's actions, and then replay those actions over a snapshot of the order model (effectively the current state of the order before the user started changing it).
Store each change as a series of events e.g. UserChangedShippingAddress, UserAlteredLineItem, UserDeletedLineItem, UserAddedLineItem.
These events can be saved after each postback and only need a link to the related order. Rebuilding the current state of the order is then as simple as replaying the events over the currently stored order objects.
When the user clicks save, you can replay the events and persist the updated order model to the database.
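A bare-bones sketch of what those events and the replay might look like; every type and member name here is invented for illustration, and Order stands in for your existing entity.

using System.Collections.Generic;

// Each user action is captured as a small, serializable event.
public abstract class OrderEvent
{
    public abstract void Apply(Order order); // mutate the snapshot
}

public class UserAlteredLineItem : OrderEvent
{
    public int ProductId { get; set; }
    public int NewQuantity { get; set; }

    public override void Apply(Order order)
    {
        order.SetQuantity(ProductId, NewQuantity); // hypothetical method
    }
}

public static class OrderReplay
{
    // Load the stored order, then replay the saved events over it.
    public static Order Replay(Order snapshot, IEnumerable<OrderEvent> events)
    {
        foreach (var e in events)
            e.Apply(snapshot);
        return snapshot;
    }
}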
You are using the database, so no session or viewstate is required; you can therefore significantly reduce page weight and server memory load, at the expense of some page performance (if you choose to rebuild the model on each postback).
Maintenance is incredibly simple: because domain events are so easy to implement, automated testing can readily be used to ensure the system behaves as you expect (while also documenting your intentions for other developers).
Because you are leveraging the database, the solution scales well across multiple web servers.
Using this approach does not require any alterations to your existing domain model, therefore the impact on existing code is minimal. Biggest downside is getting your head around the concept of domain events and how they are used and abused =)
This is effectively the same approach as described by Freddy Rios, with a little more detail about how, and some nice keywords for you to search with =)
http://jasondentler.com/blog/2009/11/simple-domain-events/ and http://www.udidahan.com/2009/06/14/domain-events-salvation/ are some good background reading about domain events. You may also want to read up on event sourcing as this is essentially what you would be doing ( snapshot object, record events, replay events, snapshot object again).
How about serializing your domain object (the contents of your grid/shopping cart) to JSON and storing it in a hidden field? ScottGu has a nice article on how to serialize objects to JSON. It's scalable across a server farm, and I'd guess it would not add much payload to your page. Maybe you can write your own JSON serializer to do a "compact serialization" (you would not need the product name, SKU ID, etc.; maybe you can just serialize the product ID and quantity).
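A sketch of that compact-JSON idea using JavaScriptSerializer; CartLine and hidCart are invented names for illustration.

using System.Collections.Generic;
using System.Web.Script.Serialization;

// Serialize only the fields needed to rebuild the grid.
public class CartLine
{
    public int ProductId { get; set; }
    public int Quantity { get; set; }
}

// On render (hidCart is an <asp:HiddenField>):
//   hidCart.Value = new JavaScriptSerializer().Serialize(lines);
// On postback:
//   var lines = new JavaScriptSerializer()
//       .Deserialize<List<CartLine>>(hidCart.Value);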
Have you considered using a User Profile? .Net comes with SqlProfileProvider right out of the box. This would allow you to, for each user, grab their profile and save the temporary data as a variable off in the profile. Unfortunately, I think this does require your "Order" to be serializable, but I believe all of the options except Session thus far would require the same.
The advantage of this is it would persist through crashes, sessions, server down time, etc and it's fairly easy to set up. Here's a site that runs through an example. Once you set it up, you may also find it useful for storing other user information such as preferences, favorites, watched items, etc.
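A minimal sketch of the profile approach; it assumes a serializable "PendingOrder" property has been defined under <profile> in web.config and backed by the SqlProfileProvider.

using System.Web;

// Stash the in-progress order in the user's profile.
void StashOrder(Order order)
{
    var profile = HttpContext.Current.Profile;
    profile.SetPropertyValue("PendingOrder", order); // order must be serializable
    profile.Save(); // persists across sessions, crashes, restarts
}

Order RestoreOrder()
{
    return (Order)HttpContext.Current.Profile.GetPropertyValue("PendingOrder");
}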
You should be able to create a temp file and serialize the object to that, then save only the temp file name to the viewstate. Once they successfully save the record back to the database then you could remove the temp file.
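A sketch of that temp-file round trip with BinaryFormatter; it assumes Order is marked [Serializable], and the names are invented.

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Serialize the draft to a temp file; remember only the path
// (e.g. in ViewState["DraftPath"]).
string SaveDraft(Order order)
{
    string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".draft");
    using (var fs = File.Create(path))
        new BinaryFormatter().Serialize(fs, order);
    return path;
}

Order LoadDraft(string path)
{
    // call File.Delete(path) once the record is saved to the database
    using (var fs = File.OpenRead(path))
        return (Order)new BinaryFormatter().Deserialize(fs);
}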
Single server: serialize to the filesystem. This also allows you to let the user resume later.
Multiple server: serialize it but store the serialized value in the db.
This is something that's for that specific user, so when you persist it to the db you don't really need all the relational stuff for it.
Alternatively, if the set of data is very large and the amount of changes is usually small, you can store the history of changes made by the user instead. With this you can also show the change history and support undo.
Two approaches. The first is to create a complex AJAX application that stores everything on the client and only submits the entire package of changes to the server. I did this once a few years ago with moderate success, though the application is not something I would want to maintain: you have a hard time syncing your client code with your server code, and passing fields that are added/deleted/changed is nightmarish.
The second approach is to store changes in the database in a temp table or in a "pending" mode. The advantage is that your code is more maintainable. The disadvantage is that you have to have a way to clean up abandoned changes due to session timeouts, power failures, and other crashes. I would take this approach for any new development. You can have separate tables for "pending" and "committed" changes, which opens up a whole new level of features you can add. What if? What changed? etc.
I would go for viewstate, regardless of what you've said before. If you only store the stuff you need, like { id: XX, numberOfProducts: 3 }, and ditch every item that is not selected by the user at this point; the viewstate size will hardly be an issue as long as you aren't storing the whole object tree.
When storing attachments, put them in a temporary storing location, and reference the filename in your viewstate. You can have a scheduled task that cleans the temp folder for every file that was last saved over 1 day ago or something.
This is basically the approach we use for storing information when users are adding floorplan information and attachments in our backend.
Are the end-users internal or external clients? If your clients are internal users, it may be worthwhile to look at an alternate set of technologies. Instead of webforms, consider using a platform like Silverlight and implementing a rich GUI there.
You could then store complex business objects within the application, provide persistent "in progress" edit tracking across multiple sessions via offline storage, and easily integrate with back-end services that provide saving/processing of the finalised order. All whilst maintaining access via the web (albeit closing out most *nix clients).
Alternatives include Adobe Flex or AJAX, depending on resources and needs.
How large do you consider large? If you are talking session-state (so it doesn't go back and forth to the actual user, like view-state does), then session-state is often a pretty good option. Everything except the in-process state provider uses serialization, but you can influence how it is serialized. For example, I would tend to create a local model that represents just the state I care about (plus any id/rowversion information) for that operation, rather than the full domain entities, which may have extra overhead.
To reduce the serialization overhead further, I would consider using something like protobuf-net; this can be used as the implementation for ISerializable, allowing very light-weight serialized objects (generally much smaller than BinaryFormatter, XmlSerializer, etc), that are cheap to reconstruct at page requests.
When the page is finally saved, I would update my domain entities from the local model and submit the changes.
For info, to use a protobuf-net attributed object with the state serializers (typically BinaryFormatter), you can use:
using System;
using System.Runtime.Serialization;
using ProtoBuf;

// a simple, session-state friendly, light-weight UI model object
[Serializable, ProtoContract]
public class MyType : ISerializable
{
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Name { get; set; }
    [ProtoMember(3)]
    public double Value { get; set; }
    // etc

    public MyType() { } // default constructor

    // called by BinaryFormatter when serializing; hands the real
    // work to protobuf-net for a much smaller payload
    void ISerializable.GetObjectData(
        SerializationInfo info, StreamingContext context)
    {
        Serializer.Serialize(info, this);
    }

    // deserialization constructor used by BinaryFormatter
    protected MyType(SerializationInfo info, StreamingContext context)
    {
        Serializer.Merge(info, this);
    }
}
