Export large amounts of data to client in ASP.NET

I need to export a large amount of data (~100 MB) from a SQL table to a user via the web. What would be the best solution for doing so? One thought was to export the data to a folder on the DB server, compress it (by some means) and then provide a download link for the user. Are there any other methods for doing so? Also, can we compress data from within SQL Server?
Any approaches are welcome.

I wouldn't tie up the database waiting for the user to download 100 MB, even for a high-speed user. When the user requests the file, have them specify an email address. Then start an asynchronous process that pulls the data, writes it to a temp file (you don't want more than 100 MB in memory, after all), zips the temp file to a storage location, and sends the user an email with a link to download the file.
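The temp-file-then-zip step described above might look roughly like the following. This is a minimal sketch, not the answerer's code: the class name ExportJob, the method name, and the idea of passing rows as pre-formatted CSV lines are all assumptions, and the asynchronous scheduling and the email step are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper; names are illustrative only.
public static class ExportJob
{
    // Writes the lines to a temp file one at a time, then zips that file
    // into zipPath. Only one line is held in memory at any moment.
    public static void ExportToZip(IEnumerable<string> csvLines, string zipPath, string entryName)
    {
        string tempCsv = Path.GetTempFileName();
        try
        {
            using (var writer = new StreamWriter(tempCsv))
            {
                foreach (string line in csvLines)
                    writer.WriteLine(line);
            }

            // Copy the temp file into a single zip entry, streaming chunk by chunk.
            using (var zipStream = new FileStream(zipPath, FileMode.Create))
            using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
            using (var source = File.OpenRead(tempCsv))
            using (var entryStream = archive.CreateEntry(entryName, CompressionLevel.Optimal).Open())
            {
                source.CopyTo(entryStream);
            }
        }
        finally
        {
            // Remove the uncompressed temp file whether or not zipping succeeded.
            File.Delete(tempCsv);
        }
    }
}
```

In a real job the lines would come from a data reader that is enumerated lazily, so the database connection is held only while the temp file is being written, not while the user downloads.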

You can respond to a page request with a file:
Response.AddHeader("Content-Disposition",
"attachment; filename=yourfile.csv");
Response.ContentType = "text/plain";
Be sure to turn buffering off, so IIS can start sending the first part of the file while you are building the second:
Response.BufferOutput = false;
After that, you can start writing the file like:
Response.Write("field1,field2,field3\r\n");
When the file is completely written, end the response, so ASP.NET doesn't append a web page to your file:
Response.End();
This way, you don't have to write files on your web servers; you generate the file on the fly and stream it to your users as it is built.
If compression is required, you can write a ZIP file in the same way. This is a nice free library to create ZIP files.
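To go beyond a single hard-coded Response.Write line, the rows can be written through a small helper that also quotes fields containing commas or quotes. This is a sketch under assumptions: CsvStreamer and its methods are hypothetical names, and it writes to any TextWriter so it can be tried outside a web page.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; not part of ASP.NET.
public static class CsvStreamer
{
    // Quotes a field only when it contains a comma, a quote, or a line break.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Writes each row as soon as it is enumerated, so nothing accumulates in memory.
    public static void WriteRows(TextWriter output, IEnumerable<string[]> rows)
    {
        foreach (string[] row in rows)
        {
            for (int i = 0; i < row.Length; i++)
            {
                if (i > 0) output.Write(',');
                output.Write(Escape(row[i]));
            }
            output.Write("\r\n");
        }
    }
}
```

Inside a page, passing Response.Output as the writer (after setting the headers and turning buffering off as shown above) would send each row to the client as it is produced.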

Your approach works fine. SSIS + 7zip might be useful for automating the process if you need to do it more than a couple times.

If XML is OK, one approach would be to select the data "FOR XML" like this:
http://www.sqljunkies.ddj.com/Article/296D1B56-8BDD-4236-808F-E62CC1908C4E.scuk
And then spit out the raw XML directly to the browser as content-type: text/xml. Also be sure to set up Gzip compression on your web server for files with XML extensions. http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/502ef631-3695-4616-b268-cbe7cf1351ce.mspx?mfr=true
This will shrink the XML file down to 1/3 or maybe 1/4 the size as it's transferred. This wouldn't be the highest performance option because of the inherent wasted space in XML files, but a lot depends on what format you're looking for in the end.
Another option would be to use the free SharpZipLib to compress the XML (or whatever format you want) into a zip file that the user would download. Along those lines, if this is something that will be used frequently, you might want to look into caching and storing the zip file on the web server with some sort of expiration so it's not regenerated for every single request.
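The caching-with-expiration idea can be sketched with nothing more than the file's last-write time. Everything here is an assumption for illustration: the ExportCache name, the delegate that generates the file, and the choice to ignore concurrent requests (a real implementation would need a lock or a write-then-rename step).

```csharp
using System;
using System.IO;

// Hypothetical helper; names are illustrative only.
public static class ExportCache
{
    // True when the cached file exists and is younger than maxAge.
    public static bool IsFresh(string path, TimeSpan maxAge, DateTime utcNow)
    {
        if (!File.Exists(path)) return false;
        return utcNow - File.GetLastWriteTimeUtc(path) < maxAge;
    }

    // Calls generate(path) only when the cached copy is missing or stale.
    public static string GetOrCreate(string path, TimeSpan maxAge, Action<string> generate)
    {
        if (!IsFresh(path, maxAge, DateTime.UtcNow))
            generate(path);
        return path;
    }
}
```

A request handler would call GetOrCreate with the zip path and, say, a one-hour maxAge, then serve whatever file comes back.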

The download link is a perfectly valid and reasonable solution. Another would be to automatically redirect the user to that file so they didn't need to click a link. It really depends on your workflow and UI experience.
I would suggest against implementing compression in the SQL Server engine. Instead, look at the DotNetZip library (or System.IO.Compression if you think your users are able to uncompress gzip archives) and implement the compression within the web application.
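For the System.IO.Compression route, gzip compression inside the web application can be a thin wrapper around GZipStream. This is a sketch with hypothetical names (GzipHelper, Compress, Decompress); it works on any pair of streams and leaves the target stream open so the caller decides when to close it.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper built on the framework's GZipStream.
public static class GzipHelper
{
    // Copies input into output through a GZipStream; data is compressed
    // in chunks as it is copied rather than loaded whole into memory.
    public static void Compress(Stream input, Stream output)
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            input.CopyTo(gzip);
        }
    }

    // Reverses Compress: reads gzip data from input and writes the original bytes to output.
    public static void Decompress(Stream input, Stream output)
    {
        using (var gzip = new GZipStream(input, CompressionMode.Decompress, true))
        {
            gzip.CopyTo(output);
        }
    }
}
```

The same Compress call could target a file stream for a stored download or the response's output stream for an on-the-fly download.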

Related

Right way to transfer a CSV file to a BI application?

We are building a BI application, and our customers send us data files daily. We exchange data using CSV files, because our customers are used to viewing data in Excel, and they are not yet ready to use an API on their systems (maybe in a few years we will be able to use an XML/JSON web service, we hope).
Currently the data transfer is done over FTP (SFTP, in fact). Our customers upload files automatically to an FTP server, and we have a cron task that checks whether a file has arrived.
But there are many disadvantages with that:
We cannot reliably tell whether an upload is done or still in progress (we asked customers to upload the file under a temporary name and rename it afterwards, but many of them still don't do that)
So we can try to guess, and consider the upload done once enough time has passed. But the FTP protocol doesn't provide a way to get the server time, and clocks can be out of sync. We could upload an empty file and read its date to learn the server's time, but we need write permission to do that...
The FTP protocol allows uploads to be paused...
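The temporary-name convention from the first point can be illustrated on a local folder. This is only a sketch of the idea: the ".part" suffix and the AtomicDrop name are assumptions, and it relies on a rename within the same volume being far quicker than the upload itself, so a watcher that skips ".part" files should not see a half-written file.

```csharp
using System;
using System.IO;

// Hypothetical illustration of "upload under a temporary name, then rename".
public static class AtomicDrop
{
    // Sender side: write under a temporary name, then rename when complete.
    // File.Move fails if finalPath already exists, which also guards against overwrites.
    public static void Publish(string finalPath, byte[] content)
    {
        string partPath = finalPath + ".part";
        File.WriteAllBytes(partPath, content);
        File.Move(partPath, finalPath);
    }

    // Watcher side: treat anything still carrying the temporary suffix as in progress.
    public static bool IsReady(string path)
    {
        return !path.EndsWith(".part", StringComparison.OrdinalIgnoreCase);
    }
}
```

The weakness described above remains: the scheme only works when every sender actually follows the convention.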
So we are considering asking our customers to upload files directly to our application over HTTPS. This is more reliable, but less convenient:
Our customer cannot check the content of the file after upload
We have to be careful with upload size and timeout on our server
Files can be quite large (up to 300 MB), so it's better to zip them before upload (this can reduce the size to 10%).
This is more work for us than just an FTP server (we need to create UI, upload progress, list files to download them back, ...)
Are there other solutions? How do BI applications usually share data? Is HTTPS a good solution for us?
We found a solution: a WebDAV server. We are using Nextcloud, which provides an online interface and script access via the WebDAV protocol.
It's more reliable than FTP, because a file appears only when its upload is done.
And it's better than HTTP upload to our application, because we don't have to handle file uploads, create interfaces, ...

Render HTML from a ZIP stream on client side

This is both a strategy and a technical question. I'm building a web posting mechanism and I will need to store a lot of HTML posts (discussions, comments, etc.).
I'm thinking about saving all my HTML posts in the database as ZIP-compressed streams (instead of plain text or XML) in order to save space, and increasing security by encrypting those ZIP data streams, so each post is stored compressed (hopefully close to 90% smaller) and secure. (It does not need to be searchable; I'm going to create the search index myself from the content of each post.)
I want to deliver the ZIP object to the web page/cache and then have the client side unzip the stream and render the HTML that it represents.
This is a Microsoft-based MVC web site (C#).
I'm trying to figure out reasons not to do it... other than performance, can anyone pinpoint any other issues with doing something like that?
Also, are there any recommended or built-in libraries that I should use for better performance, and that both server side and client side can understand (zip and unzip with an encryption key/password)?
Thanks in advance.
In normal operation, HTTP allows the HTML to be sent as a gzipped stream. The web server compresses the data and sets the corresponding header. The client then unzips it transparently.
You simply have to make sure to set the correct header and not have the web server compress the already compressed stream a second time.
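The header in question is Content-Encoding: gzip, which should only be sent when the request's Accept-Encoding header lists gzip. A simplified check might look like this; the EncodingNegotiation name is hypothetical, and the sketch deliberately ignores the "*" wildcard and any quality value other than zero.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; a simplified reading of the Accept-Encoding header.
public static class EncodingNegotiation
{
    // True when the header lists "gzip" and does not refuse it with q=0.
    public static bool AcceptsGzip(string acceptEncoding)
    {
        if (string.IsNullOrEmpty(acceptEncoding)) return false;

        foreach (string part in acceptEncoding.Split(','))
        {
            string[] pieces = part.Split(';');
            if (!pieces[0].Trim().Equals("gzip", StringComparison.OrdinalIgnoreCase))
                continue;

            // "gzip;q=0" means the client explicitly refuses gzip.
            for (int i = 1; i < pieces.Length; i++)
            {
                string p = pieces[i].Trim();
                double q;
                if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase)
                    && double.TryParse(p.Substring(2), NumberStyles.Float, CultureInfo.InvariantCulture, out q)
                    && q == 0)
                {
                    return false;
                }
            }
            return true;
        }
        return false;
    }
}
```

When the check fails, data stored pre-compressed would have to be decompressed on the server before it is sent.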
I see one major drawback:
You cannot alter the data. That means you cannot add the code for your template nor link between the pages.
I don't think this is a good approach. Store your data as you like and decompress it on the server.

How do download managers work?

For a long time I have had a doubt. When we download something over HTTP, the download starts from the first byte of the file. For example, if there is a 2 MB file on a site and we click its link, it starts downloading from the first byte. But download managers work differently: if we pause after a few bytes have been downloaded, they stop, and when we resume they continue from where they stopped (not from the beginning). How is this possible?
The answer is the server setting. If a server allows the client to read the file from somewhere after the first byte, the client can specify the number of bytes to skip and the server will start sending the file from that position in the file. If the server doesn't allow then the client is forced to start reading the file from the beginning, whether any download manager is used or not.
For example, 4shared.com only allows downloads to start from the beginning.
Note: In such cases using any download manager provides no gains.
It really depends on whether the server hosting the file allows byte-seeking. In other words, if a file hosting service offers a "streaming" feature rather than just a "download" feature, applications like download managers will be able to pull a file in pieces and combine them after all the pieces have been downloaded.
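In HTTP, the byte-seeking described in these answers is done with the Range request header (for example "Range: bytes=500-"), which a supporting server answers with status 206 Partial Content and a Content-Range header. The following sketch shows how a server might interpret a single range; the RangeHeader class is a hypothetical name, and suffix ranges ("bytes=-500") and multiple ranges are intentionally not handled.

```csharp
using System;
using System.Globalization;

// Hypothetical parser for a single "bytes=start-end" range.
public static class RangeHeader
{
    // On success, start and end are inclusive byte offsets within the file.
    // Returns false when the header is missing, malformed, or outside the file.
    public static bool TryParse(string header, long fileLength, out long start, out long end)
    {
        start = 0;
        end = fileLength - 1;
        const string prefix = "bytes=";
        if (string.IsNullOrEmpty(header) || !header.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return false;

        string[] parts = header.Substring(prefix.Length).Split('-');
        if (parts.Length != 2) return false; // multiple ranges are not handled here

        if (!long.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out start))
            return false; // suffix ranges are not handled here

        if (parts[1].Length > 0)
        {
            long requestedEnd;
            if (!long.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out requestedEnd))
                return false;
            end = Math.Min(requestedEnd, fileLength - 1); // clamp to the last byte
        }
        return start <= end;
    }
}
```

A download manager that had already saved 500 bytes would send "bytes=500-"; for a 2000-byte file the server would then send bytes 500 through 1999 with "Content-Range: bytes 500-1999/2000". A server that ignores the header simply returns the whole file with status 200.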

Response.TransmitFile vs. Direct Link

I am using an Azure cloud storage solution, and as such, each file has its own URL. I need to allow users to download files such as PDFs from our ASP.NET website. Currently, we are using Response.TransmitFile to send the file to the client's browser, but this requires that we fetch the data from the cloud storage and then send it to the client (which seems like an inefficient way to do it).
I'm wondering whether we could just have a direct link to the file instead, and if so, how this would differ from the Response.TransmitFile method. That is, without the TransmitFile method, we cannot set the Content-Type header, etc. How does that affect anything?
Thanks
Usually I stay away from using Response.TransmitFile as it does require that you fetch the file and then stream it down to the client.
The only times I have used it was to protect files and only serve them to users that had permission to access them instead of just linking directly to the file.
If the files you are serving are public, then I would recommend just linking to them. If all you need is to set the Content-Type header, then simply make sure the .pdf extension is mapped to the correct MIME type (application/pdf).

FileUpload virus protection of server

My ASP.NET application has a FileUpload control. My server doesn't have any antivirus program. If I add a byte to the binary content of the file before saving it, can my server still be affected by a virus in the file? When displaying the file, I will remove the extra byte from the content.
Thanks for replies.
A virus will only cause you problems if it is run on the server (i.e. the file is opened). You can get around this by renaming all uploaded files with a .resources extension. All requests for this type of file are sent by IIS to ASP.NET, which rejects them. So effectively, the files store the data but can't be opened/run at all. Then you can still serve them back by reading their content in an ASP.NET page/module, and returning the data as a file with the correct extension.
Transforming the data as you suggest will also provide a level of protection, though I'd probably do more than add a byte to the end. Perhaps run the whole stream through a reversible algorithm (e.g. a fast encryption or something).
Of course, this doesn't protect the client from any virus.
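The "reversible algorithm" suggested above could be as simple as XOR-ing the stream with a repeating key, since applying the same operation twice restores the original bytes. This is a sketch only: ReversibleTransform is a hypothetical name, and XOR with a fixed key is obfuscation that keeps the stored file from being directly runnable, not real encryption.

```csharp
using System;
using System.IO;

// Hypothetical helper; XOR obfuscation, not cryptographic protection.
public static class ReversibleTransform
{
    // XORs every byte of input with a repeating key and writes the result to output.
    // Running the output through the same method with the same key restores the input.
    public static void Xor(Stream input, Stream output, byte[] key)
    {
        if (key == null || key.Length == 0)
            throw new ArgumentException("key must not be empty", "key");

        byte[] buffer = new byte[81920];
        long position = 0; // absolute offset, so the key stays aligned across chunks
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                buffer[i] ^= key[(int)((position + i) % key.Length)];
            output.Write(buffer, 0, read);
            position += read;
        }
    }
}
```

The upload handler would pass the posted file's stream through Xor before saving, and the page that serves the file would apply it again before writing to the response.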

Resources