Uploading multiple/large files - asp.net

I have this page where a user can upload documents (multiple documents, size limit 10MB each). It is a two-step process. Step 1 has the input form. Step 2 is the preview page with a submit button.
How should I handle the scenario where the user closes the browser while on the preview page, without submitting the form? Should I save the files in a temp location after step 1? Is this a decent solution?
And what are the best practices in general for uploading (reasonably) large files?
Thanks.

Take a look at this:
http://www.codeproject.com/Articles/68374/Upload-Multiple-Files-in-ASP-NET-using-jQuery
One way or another, you'll probably end up looking at a jQuery/AJAX control to do this.

You can use a temporary folder to save the files and copy the files to their final location only on submission of the form.
In any case, it is better to implement a cleanup job (a "garbage collector") that empties the temporary folder every night. If you have a way to identify files that were never submitted (for example, a row is added to a database upon submission), you can instead put the files in their final location from the beginning and let the nightly job remove the orphaned ones.
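A minimal sketch of such a nightly cleanup, written in Python as a stand-in for whatever scheduled task you use on the server. The function name, the age threshold, and the idea of judging staleness by modification time are assumptions for illustration, not part of the answer above:

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_stale_uploads(temp_dir: Path, max_age_seconds: float,
                        now: Optional[float] = None) -> List[str]:
    """Delete files in temp_dir last modified more than max_age_seconds ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in temp_dir.iterdir():
        # Only plain files are considered; subfolders are left alone.
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Using an age threshold rather than emptying the folder outright avoids deleting files that belong to a user who is still on the preview page when the job runs.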
Large files can be uploaded using a jQuery plugin such as Uploadify: http://www.uploadify.com/.
Note that it uses Flash, which on the one hand works well for uploading large files, but on the other hand prevents your application from supporting Apple devices such as the iPad.

If the user leaves, then let them start over. More than likely they left for a good reason. If there was a crash, leave the responsibility on their end. If you choose to store their data without them submitting this could allow malicious users to exploit your storage.
You can also look into a process called chunking.
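As a rough illustration of the idea, chunking means the client splits the file into numbered pieces, sends each piece in its own request, and the server puts them back together in order. The sketch below shows only the split and reassemble steps in Python; the chunk size and function names are assumptions, and the HTTP transport is left out:

```python
from typing import Iterable, Iterator, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk; an arbitrary choice for this sketch


def split_into_chunks(data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, piece) pairs covering data in order."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        yield index, data[offset:offset + chunk_size]


def reassemble(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the original bytes, even if the chunks arrived out of order."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return b"".join(piece for _, piece in ordered)
```

Because each piece carries its index, a failed piece can be retried on its own instead of restarting the whole upload.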
For a more in-depth discussion of file uploads in MVC 3, see this SO post: MVC 3 file upload and model binding

Related

Serving Lazy Thumbnail Images from Azure Blob Storage - What is the overhead of Exists?

I have a website where users upload images. These images are shown on various sections of the site with various thumbnail dimensions. Since the site is still under rapid development, I don't yet want to commit to a set number of thumb sizes. Thus I believe I should be generating thumbnails on a lazy basis.
Of the two options, which is the most performant way to do this:
1. When I go to serve the thumbnail, convert the dimensions into a canonical filename (like "bighouse-thumb-160x120"). Check if the file exists in blob storage using client.GetContainerReference(containerName).GetBlockBlobReference(key).Exists(); If it does not exist, generate it and save it.
2. When I go to serve the thumbnail, query my SQL database to see if the thumbnail exists. If it exists, get the blob URI from the DB and emit that as HTML. If it does not exist, generate it and update the SQL database.
I've used #2 in the past, but design-wise it is duplicating state which is bad. If querying azure for the existence of blobs is scalable, I'd rather do that. I don't really understand the threading model in Asp.Net. If I have 200 users requesting thumbs, will my azure Exists calls all happen in parallel? Even if they do, two round trips seem like a lot of overhead. I assume roundtripping the database is faster and lends itself more easily to generic caching solutions.
What is the right answer?
Regardless of the overhead, I would pre-generate thumbnails when you upload/store the image. This way you move the burden of generating thumbnails from something that is done many times (retrieving an image) to one that is much less often executed (storing an image).
Consider the following scenario, when you lazily generate thumbnails on the first view:
1. Check for an existing thumbnail (is false, first view remember ;))
2. Generate a thumbnail
3. Store the thumbnail
4. Send the thumbnail to the client
With pre-generated thumbnails the process is much shorter:
Send the thumbnail to the client
Done.
With 'lazy generating' the check for an existing thumbnail can be expensive due to network overhead (on every hit!), generating the thumbnail can be hugely expensive memory- and CPU-wise, and then you have to store it, with network overhead again. You can even offload generating the thumbnail(s) to a separate process, possibly triggered by queue messages, to take the burden of generating the images even further away from your web servers.
However, this brings up the question of what you should do when you introduce a new thumbnail/image size. When you pre-generate the thumbnails you can write a simple tool to create the new sizes and store them, and if you went the separate process route it's even simpler. Just upgrade the separate process, generate a queue message for every existing image and just let it do its work.
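The pre-generation and backfill steps can be sketched as follows. This is Python with a plain dictionary standing in for blob storage and a caller-supplied `resize` function standing in for real image processing; both are assumptions made to keep the sketch self-contained, and only the key format ("bighouse-thumb-160x120") comes from the question:

```python
from typing import Callable, Dict, Iterable, List, Tuple

Size = Tuple[int, int]


def thumb_key(image_name: str, size: Size) -> str:
    """Build the canonical thumbnail name, e.g. 'bighouse-thumb-160x120'."""
    width, height = size
    return f"{image_name}-thumb-{width}x{height}"


def generate_thumbnails(store: Dict[str, bytes], image_name: str,
                        original: bytes, sizes: Iterable[Size],
                        resize: Callable[[bytes, Size], bytes]) -> List[str]:
    """Create every missing thumbnail for one image; return the keys created.

    Call it once at upload time, and again for each existing image when a
    new size is introduced (the backfill case).
    """
    created = []
    for size in sizes:
        key = thumb_key(image_name, size)
        if key not in store:  # already generated: nothing to do
            store[key] = resize(original, size)
            created.append(key)
    return created
```

Since existing keys are skipped, the same function serves both the upload path and the backfill run: re-running it with an extended size list only does the new work.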

Using VB.NET to Detect Changes in a Web Page

Again I come to you guys for your expertise and advice on an issue that I am having. I was wondering if any of you would know how to detect if a web page has been modified using VB.NET. I need to be able to set up a task which periodically (like once a week) scans the user inputted web pages and if the web page content has changed, I need to fire off an email to an individual that it has changed (not the exact location on the page itself). I'll be storing the HTTP status and of course the page data itself as well as the date of when it was last modified. Of course this needs to be very fault tolerant since it could be another week before the check runs again. Any help would be great. Thank you.
EDIT
New twist on this question sorry. I had more time to think about what we wanted. So... Detecting ANY change on a web page would be kind of silly since time dependent elements of the page would change every so often. Instead, what I would like to do is be able to detect the documents in the page. For instance if there are excel, word docs, or pdfs that get changed on that page. So, I'd run the hash on these documents then on some sort of schedule do a check to see if new documents have been added or if the old documents have been modified. Any suggestions on how to detect the documents embedded on the page and running the hash? Thanks again!
As I mentioned in a comment, this sort of job is what checksums (also known as hash functions) were designed for.
Your code will look something like this:
- for each web page of interest
  - pull the web page
  - calculate the checksum of its contents
  - is the current checksum different from the last checksum?
    - if yes, send an email
  - store the new checksum and other appropriate data
The .NET Framework has a number of checksums available. The two most popular are MD5 and SHA-1.
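The steps above can be sketched as follows. This is Python rather than VB.NET, fetching the page and sending the email are left to the caller, and treating the first sighting of a page as "not changed" is an assumption of this sketch:

```python
import hashlib
from typing import Dict


def content_checksum(content: bytes) -> str:
    """Return the SHA-1 digest of the page contents as a hex string."""
    return hashlib.sha1(content).hexdigest()


def has_changed(url: str, content: bytes, last_checksums: Dict[str, str]) -> bool:
    """Compare the page against its stored checksum, then store the new one.

    last_checksums maps URL -> checksum from the previous run. A URL seen
    for the first time only establishes a baseline and returns False.
    """
    new_checksum = content_checksum(content)
    previous = last_checksums.get(url)
    last_checksums[url] = new_checksum
    return previous is not None and previous != new_checksum
```

In a real job the dictionary would be a database table holding the checksum alongside the HTTP status and the last-checked date mentioned in the question.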
In addition to the checksum option, there are also various diff functions that achieve this and provide much more information than changed=true/false. This question has more info:
How to tell when a web page has changed by x% in VB.net?
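For the document-level check described in the question's edit, one possible approach is to collect the links on the page that point at documents, hash each document, and compare the result with the previous run. A hedged Python sketch follows; the extension list, class name, and function names are assumptions, and it only finds documents linked through `<a href>`:

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple
from urllib.parse import urljoin, urlparse

DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx")


class DocumentLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that point at documents."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Check only the path part so query strings do not hide the extension.
        if href and urlparse(href).path.lower().endswith(DOCUMENT_EXTENSIONS):
            self.links.append(urljoin(self.base_url, href))


def find_document_links(html: str, base_url: str) -> List[str]:
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    return parser.links


def compare_documents(old: Dict[str, str],
                      new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Given URL -> hash maps from two runs, return (added, modified) URLs."""
    added = sorted(url for url in new if url not in old)
    modified = sorted(url for url in new if url in old and old[url] != new[url])
    return added, modified
```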

I have multiple users; can I lock the web page so that only one user at a time can update a record?

Can anyone help or provide me with some suggestions for the below query.
I have a web form (Minutes of Meeting) and 8 users that need to access this web page and update their area. A user may have more than one area to update, and essentially I would like to somehow lock down the web page, if possible, while a user is using it so that no other user can update it until Joe Bloggs has finished with it.
I have an Active Directory security group set up to restrict the site to that group of users only, but I need to think of a solution to the above.
Is there a way I can do this via a web control or via SQL?
There must be better ways to do it. However, is it possible for you to introduce a SQL table column such as "UpdateInProgress" (bit)? Any update process checks that column. If it is 0, the process sets it to 1, saves its changes, and then sets it back to 0 so that the form is available for others to update. If the process sees 1, it cannot update the web form because an update is in progress.
I also suggest introducing another column named "UpdateInProgressBy" to record who has opened it for editing.
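A minimal sketch of the flag approach, using Python with SQLite as a stand-in for SQL Server. The table name `Minutes` and the function names are hypothetical; only the two column names come from the answer. The check and the set happen in a single UPDATE so that two users cannot both see 0 and both proceed:

```python
import sqlite3


def try_acquire(conn: sqlite3.Connection, record_id: int, user: str) -> bool:
    """Claim the record for editing; return False if someone else holds it."""
    cursor = conn.execute(
        "UPDATE Minutes SET UpdateInProgress = 1, UpdateInProgressBy = ? "
        "WHERE Id = ? AND UpdateInProgress = 0",
        (user, record_id),
    )
    conn.commit()
    # Exactly one row changes only if the flag was 0 at the moment of update.
    return cursor.rowcount == 1


def release(conn: sqlite3.Connection, record_id: int, user: str) -> None:
    """Clear the flag, but only if this user is the one holding it."""
    conn.execute(
        "UPDATE Minutes SET UpdateInProgress = 0, UpdateInProgressBy = NULL "
        "WHERE Id = ? AND UpdateInProgressBy = ?",
        (record_id, user),
    )
    conn.commit()
```

This sketch does not handle a user who closes the browser without releasing; a timestamp column with an expiry would be one way to cover that case.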
First of all, note that a long time passes between the moment the user reads the data into a page, changes it, and tries to write it back. So we are not talking about locks in SQL, nor any other lock that lasts milliseconds and helps synchronize threads; here we must synchronize people and what they write.
There is also a problem if the user leaves the page for any reason: this can leave the data locked forever.
This problem can be solved with two approaches:
- The easy one: when a user tries to save data, check whether the same data has been changed in the meantime, and warn them, show a merge dialog, merge programmatically, or something similar, depending on what you want.
- The difficult one: constantly monitor the page that reads and changes the data, and keep the monitoring results in a common table in the database. If a user is on the page, the other users get a warning and read-only data until that user leaves.
This monitoring must be done with JavaScript and must detect when a user abandons the page.
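The easy approach is commonly implemented with a version number that is read together with the data and checked again on save. A hedged sketch in Python with SQLite follows; the `Minutes` table and the `Body` and `Version` columns are hypothetical names, not something the answer specifies:

```python
import sqlite3


def save_if_unchanged(conn: sqlite3.Connection, record_id: int,
                      new_body: str, version_read: int) -> bool:
    """Write new_body only if nobody saved since version_read was loaded.

    Returns False when the record changed in the meantime, in which case
    the caller can warn the user or offer a merge.
    """
    cursor = conn.execute(
        "UPDATE Minutes SET Body = ?, Version = Version + 1 "
        "WHERE Id = ? AND Version = ?",
        (new_body, record_id, version_read),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Nothing is locked while the user is editing, so an abandoned page cannot block anyone; the cost is that the second person to save has to resolve the conflict.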
Use SET TRANSACTION ISOLATION LEVEL SERIALIZABLE.
For more information, check this link:
http://msdn.microsoft.com/en-us/library/ms173763.aspx

Need an application design advice

I'm developing a web application which processes invoices (the functionality is not limited to invoices, but that doesn't matter). One part of the workflow is printing an invoice after it has been published. This means that the website user is able to select 10-20 (or more) invoices and print them at once. There may also be several invoice templates, which may be customizable (this is one of the key requirements).
I should also mention that we decided to generate PDFs from the HTML and then print them. Because PDF creation may take some time to complete, we decided to use a Windows service for invoice printing.
So, summarizing we have the following requirements:
- There should be customizable invoice templates;
- The website user should be able to specify which template he wants to use with the invoice item specified;
- There should be a possibility to print one or several selected invoices in one click.
Our first idea was to use user controls as invoice templates. The user control will be responsible for invoice layout. This also means there will be a base class for these user controls through which we will be able to define a data source for the controls.
In this case we may even allow users to modify the .ascx file (or something similar) to edit basic captions if necessary.
The problem begins in the Windows service, where we are unable to generate output for user controls. So the other solution is to use an HTTP handler or web service to generate the user controls' output and transfer it to the Windows service. But this complicates the solution (e.g. we need to add authentication for this, and similar problems).
Maybe there is a much simpler way to do it?
Thanks in advance.
In response to your comment, I suggest you have the website generate the HTML and save it into a 'GeneratedInvoice' field in your DB, which the service then processes (i.e. converts to PDF however your pdf conversion software does it). It's appropriate because you have a 'saved' copy of the generated invoice; i.e. if your invoice processing routine changes (different styles, etc) your old invoices aren't affected, and yet you can regenerate a given bunch in a possibly 'new' format if required.
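A rough sketch of that hand-off, in Python with SQLite standing in for the real database. Only the `GeneratedInvoice` field comes from the answer; the `Invoices` table, the `Pdf` column, the function names, and the caller-supplied `html_to_pdf` converter are all assumptions for illustration:

```python
import sqlite3
from typing import Callable


def queue_invoice(conn: sqlite3.Connection, invoice_id: int, html: str) -> None:
    """Website side: store the rendered HTML; Pdf stays NULL until processed."""
    conn.execute(
        "INSERT INTO Invoices (Id, GeneratedInvoice, Pdf) VALUES (?, ?, NULL)",
        (invoice_id, html),
    )
    conn.commit()


def process_pending(conn: sqlite3.Connection,
                    html_to_pdf: Callable[[str], bytes]) -> int:
    """Service side: convert every invoice that has no PDF yet; return the count."""
    rows = conn.execute(
        "SELECT Id, GeneratedInvoice FROM Invoices WHERE Pdf IS NULL"
    ).fetchall()
    for invoice_id, html in rows:
        conn.execute(
            "UPDATE Invoices SET Pdf = ? WHERE Id = ?",
            (html_to_pdf(html), invoice_id),
        )
    conn.commit()
    return len(rows)
```

With this split the service never needs to render user controls or authenticate against the website; it only reads finished HTML from the table.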

Where can I view roundtrip information in my ASP.NET application?

I'm playing around with storing application settings in my database, but I think I may have created a situation where superfluous roundtrips are being made. Is there an easy way to view roundtrips made to an MS Access (I know, I know) backend?
I guess while I'm here, I should ask for advice on the best way to handle this project. I'm building an app that generates links based on file names (files are numbered ints, 0-5000). The files are stored on network shares, arranged by name, and the paths change frequently as files are bulk transferred to create space, etc.
Example:
Files 1000 - 2000 go to /path/1000s
Files 2001 - 3000 go to /path/2000s
Files 3001 - 4000 go to /path/3000s
etc
I'm sure by now you can see where I'm going with this. Ultimately, I'm trying to avoid making a roundtrip to get the paths for every single file as they are displayed in a gridview.
I'm open to the notion that I've gone about this all wrong and that my idea might be rubbish. I've toyed around with the notion of just creating a flat file, but if I do that, do I still run into the problem of having that file opened and closed for every file displayed in a gridview?
1) Set a breakpoint on the first line of the Page_Load method by clicking in the leftmost bar (a thick, dim line down the left side). You should then see a round red mark there.
2) Run the app in debug mode in Visual Studio (hit F5).
3) Switch back to Visual Studio after the app has started and step through the program line by line by pressing F8. Great fun.
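Separately from finding the roundtrips in the debugger, the per-file lookup described in the question could be avoided by loading the range-to-path mapping once and resolving each file number in memory. The sketch below is Python rather than ASP.NET code, the class name is hypothetical, and the ranges are the ones from the question's example:

```python
import bisect
from typing import List, Tuple


class PathLookup:
    """Resolve file numbers to share paths from a mapping loaded once."""

    def __init__(self, ranges: List[Tuple[int, int, str]]) -> None:
        # Each entry is (first_file, last_file, path); sort by first_file.
        self._ranges = sorted(ranges)
        self._starts = [start for start, _, _ in self._ranges]

    def path_for(self, file_number: int) -> str:
        # Find the last range whose start is <= file_number.
        index = bisect.bisect_right(self._starts, file_number) - 1
        if index >= 0:
            _, end, path = self._ranges[index]
            if file_number <= end:
                return path
        raise KeyError(file_number)


lookup = PathLookup([
    (1000, 2000, "/path/1000s"),
    (2001, 3000, "/path/2000s"),
    (3001, 4000, "/path/3000s"),
])
```

The mapping would be read from the database (or a flat file) a single time, for example when the page loads or when the paths are changed, so the gridview rows only touch memory.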
