How download managers work? - http

Hi from long time i have doubt. when we use "http" protocol to download something the download starts from the first byte of the file. I mean if there is a file of 2MB on the site and when we click it, it starts downloading from the first byte. But when we give the link of the file to the download managers they work differently. I mean after downloading few bytes if we pause they stop downloading and when we resume they start from where they have stopped(not from the beginning). how is this possible?

The answer is the server setting. If a server allows the client to read the file from somewhere after the first byte, the client can specify the number of bytes to skip and the server will start sending the file from that position in the file. If the server doesn't allow then the client is forced to start reading the file from the beginning, whether any download manager is used or not.
For example 4shared.com always allows to start from beginning.
Note: In such cases using any download manager provides no gains.

It really depends on the server where file is hosted if it allows the byte-seeking. In other words, if a file hosting service has "streaming" feature than just "download" feature, applications like download managers will be able to pull a file in pieces & combine them after all the pieces have been downloaded.

Related

Right way to transfer a CSV file to a BI application?

We are doing a BI application, and our customers send us data files daily. We are doing data exchange using CSV files, because our customers are used to watch data with Excel, and they are not ready yet to use an API on their system (maybe in few years we will be able to use XML/JSON webservice, we hope).
Currently the data transfer is made with FTP (SFTP in fact). Our customers upload file automatically on an FTP server, and we have a CRON task that watches if a file has been sent.
But there are many disadvantages with that:
We cannot know with reliability if the upload is done, or still in progress (we asked them to upload a file with a temporary name, and move it after, but many of them still don't do that)
So, we can try to guess, and consider upload is done if enough time has passed. But FTP protocol doesn't allow to get server time, and time can be desynced. So we can upload an empty file and read it's date to know the time of the server. But we need write permission to do that...
FTP protocol allow to pause upload...
Then, we are considering to transfer files by asking our customer to upload them directly on our application, using HTTPS. This is more reliable, but less convenient:
Our customer cannot check the content of the file after upload
We have to be careful with upload size and timeout on our server
Files can be quite large (up to 300Mo), so it's better to zip them before upload (can reduce size to 10%).
This is more work for us than just an FTP server (we need to create UI, upload progress, list files to download them back, ...)
There is other solutions? How usually BI applications share data? Is HTTPS a good solutions for us?
We found a solution which is a webdav server. We are using Nextcloud, it provides an online interface, and script access with webdav protocol.
It's more reliable than FTP, because the file appear only when upload is done.
And it's better than HTTP upload on our application. We don't have to handle file upload, create interfaces, ...

Uploading file in web applications

This may be immature question but...
When we use html input file control to upload a file, OS encrypts! the full path of the file due to security. i.e.: C:\falsepath\XXXXXX.txt
why security has to be enforced, since the client is the one uploading the file, he obviously knows the location, why can't it just provide full path (client script)
But how does server gets stream of bytes from client?
Can somebody explain me what is happening behind the screen?
OS-windows environment , Browsers -all
Server does not to know what is local path, browser sends to him stream of bytes. Local path is for good looking for user, nothing else.
If you ask: how does BROWSER know where the file is, this is good question, but you didn't write what is your OS.
You should know, that the server is completely separated from the client.
The client application sends to the server a message, which contains the content of the file and a file name (just the name of the file, not the directory. The change of the actual name to the C:\falsepath* is made only to prevent scripts on client's side to know anything about the original location, which may contain sensitive information you don't want to publish.

FileUpload virus protection of server

My ASP.NET Application has an FileUpload control. My server doesn't have any antivirus program. If I add a byte to binary content of the file before saving file, does my server affect from virus? When displaying file, I will remove extra byte from the content.
Thanks for replies.
A virus will only cause you problems if it is run on the server (i.e. the file is opened). You can get around this by renaming all uploaded files with a .resources extension. All requests for this type of file are sent by IIS to ASP.NET, which rejects them. So effectively, the files store the data but can't be opened/run at all. Then you can still serve them back by reading their content in an ASP.NET page/module, and returning the data as a file with the correct extension.
Transforming the data as you suggest will also provide a level of protection, though I'd probably do more than add a byte to the end. Perhaps run the whole stream through a reversible algorithm (e.g. a fast encryption or something).
Of course, this doesn't protect the client from any virus.

Need to check uptime on a large file being hosted

I have a dynamically generated rss feed that is about 150M in size (don't ask)
The problem is that it keeps crapping out sporadically and there is no way to monitor it without downloading the entire feed to get a 200 status. Pingdom times out on it and returns a 'down' error.
So my question is, how do I check that this thing is up and running
What type of web server, and server side coding platform are you using (if any)? Is any of the content coming from a backend system/database to the web tier?
Are you sure the problem is not with the client code accessing the file? Most clients have timeouts and downloading large files over the internet can be a problem depending on how the server behaves. That is why file download utilities track progress and download in chunks.
It is also possible that other load on the web server or the number of users is impacting server. If you have little memory available and certain servers then it may not be able to server that size of file to many users. You should review how the server is sending the file and make sure it is chunking it up.
I would recommend that you do a HEAD request to check that the URL is accessible and that the server is responding at minimum. The next step might be to setup your download test inside or very close to the data center hosting the file to monitor further. This may reduce cost and is going to reduce interference.
Found an online tool that does what I needed
http://wasitup.com uses head requests so it doesn't time out waiting to download the whole 150MB file.
Thanks for the help BrianLy!
Looks like pingdom does not support the head request. I've put in a feature request, but who knows.
I hacked this capability into mon for now (mon is a nice compromise between paying someone else to monitor and doing everything yourself). I have switched entirely to https so I modified the https monitor to do it. The did it the dead-simple way: copied the https.monitor file, called it https.head.monitor. In the new monitor file I changed the line that says (you might also want to update the function name and the place where that's called):
get_https to head_https
Now in mon.cf you can call a head request:
monitor https.head.monitor -u /path/to/file

Export large amounts of data to client in asp.net

I need to export a large amount of data (~100mb) from a sql table to a user via the web. What would be the best solution for doing so? One thought was to export the data to a folder on the db server, compress it (by some means) and then provide a download link for the user. Any other methods for doing so? Also, can we compress data from within sql server?
Any approaches are welcome.
I wouldn't tie up the database waiting for the user to download 100Mb, even for a high speed user. When the user requests the file have them specify an email address. Then call an asynch process to pull the data, write it to a temp file (don't want > 100mb in memory after all), then zip the temp file to a storage location, then send the user an email with a link to download the file.
You can respond to a page request with a file:
Response.AddHeader("Content-Disposition",
"attachment; filename=yourfile.csv");
Response.ContentType = "text/plain";
Be sure to turn buffering off, so IIS can start sending the first part of the file while you are building the second:
Response.BufferOutput = false;
After that, you can start writing the file like:
Response.Write("field1,field2,field3\r\n");
When the file is completely written, end the response, so ASP.NET doesn't append a web page to your file:
Response.End();
This way, you don't have to write files on your web servers, you just create the files in memory and send them to your users.
If compression is required, you can write a ZIP file in the same way. This is a nice free library to create ZIP files.
Your approach works fine. SSIS + 7zip might be useful for automating the process if you need to do it more than a couple times.
If XML is OK, one approach would be to select the data "FOR XML" like this:
http://www.sqljunkies.ddj.com/Article/296D1B56-8BDD-4236-808F-E62CC1908C4E.scuk
And then spit out the raw XML directly to the browser as content-type: text/xml. Also be sure to set up Gzip compression on your web server for files with XML extensions. http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/502ef631-3695-4616-b268-cbe7cf1351ce.mspx?mfr=true
This will shrink the XML file down to 1/3 or maybe 1/4 the size as it's transferred. This wouldn't be the highest performance option because of the inherent wasted space in XML files, but a lot depends on what format you're looking for in the end.
Another option would be to use the free CSharpZipLib to compress the XML (or whatever format you want) into a zip file that the user would download. Along those lines, if this is something that will be used frequently you might want to look into caching and storing the zip file on the web server with some sort of expiration so it's not regenerated for every single request.
The download link is a perfectly valid and reasonable solution. Another would be to automatically redirect the user to that file so they didn't need to click a link. It really depends on your workflow and UI experience.
I would suggest against implementing compression in the SQL Server engine. Instead look at the DotNetZip library (Or System.IO.Conpression if you think your users have the capability of uncompressing gzip archives) and implement the compression within the web application.

Resources