FTP vs SFTP, this time- stability - unix

I have read a great post regarding the FTP/SFTP speeds: Why when I transfer a file through SFTP, it takes longer than FTP?
But did anyone check, or have a proof or documents,tests showing which of these two is more 'stable' when transferring files (thousands of small files, and couple huge files)?
Which protocol (SFTP, FTP) using native client/server (Unix) is more/less likely to drop the connections or be able to reconnect or even detect stalled connection, and is less probable to mess up the files

Related

How to handle 20k concurrent listeners on an Icecast server

I want to know how to handle more than 20k listeners concurrently on an Icecast server. I am using liquidsoap as the audio stream generator (Only one audio stream is distributed through the Icecast server ). The server is configured on AWS. Further, I want to know whether I need to use LB and CDN to handle this much traffic.
Your main concern is bandwidth. Nothing else, bandwidth. You always run out of bandwidth first. Really.
You'll likely want to spread the load across multiple servers and e.g. do simple DNS round-robin. Also because multiple servers means more bandwidth available.
Feeding in a Primary+Relays (master/slave) topology is typical and documented. For more details I'd recommend the Icecast documentation and searching the Icecast mailing list archives.
There are some minor things like making sure that your ulimit for file descriptors is high enough for the Icecast process.
PS: In theory you can squeeze around 20k concurrent connections out of Icecast, but most of the time you won't have enough actual bandwidth to feed those anyway.

What is the difference between in FTP and HTTP?

HTTP is used to display the info and also can be used to transfer files from one host to another host.
FTP is used to transfer files from one host to another.
So I come to this point that FTP and HTTP both are almost doing the same work. Then what is the exact benefit of using FTP while I can do this with the HTTP?
Correct me if I am wrong.
Thanks
FTP is a File Transfer Protocol, for transferring files.
FTP is significantly older, it is a protocol designed to enable the transfer of files over a long-running session. There are a wide array of commands and the intent is to allow you to navigate and browse a remote file system and retrieve files (originally over a separate data connection).
FTP still sees a lot of use, but many files are actually transferred over HTTP instead.
HTTP
The HyperText Transfer Protocol was originally designed to transfer hypertext documents and the various assets needed to render them. In practice, this is the way information is transferred on the web -- html, css, images, data are all transferred between web servers and web browsers, as well as between one server and another this way.
HTTP was designed to retrieve a resource from a URL that may or may not match the remote file system (in many web apps, the structure of the URLs has very little to do with the file locations). There is often only a single request in a single http connection and the data uses the same connection as the request.
So I come to this point that FTP and HTTP both are almost doing the same work.
Not really. FTP can be used for file transfer and not really much more. HTTP is way more flexible since it not only transfers byte streams but also meta data (what kind of data is this), supports implicit compression, client specific responses (like based on supported languages), has more flexible ways for authentication, is tuned for less overhead (i.e. can be faster) ...
Then what is the exact benefit of using FTP while I can do this with the HTTP?
There is no real benefit of FTP today. In contrary, in contrast to alternatives like HTTP the design of FTP leads to lots of problems in today's infrastructure where NAT is heavily used (i.e. multiple internal systems behind a single router with public IP address).
FTP remains mostly in places where clients or servers don't support more modern ways for file exchange. A typical example is cheap web hosting where access to the server to update files is often done by FTP since lots of tools have FTP builtin and it is easy to setup on the server too. Alternatives like WebDAV (HTTP based) or SFTP (SSH based) are less used here since they have less support in clients and servers even though they would offer more security and more flexibility and less problems.

What are the options for transferring 60GB+ files over the a network?

I'm about to start developing an application to transfer very large files without any rush but with need of reliability. I would like people that had worked coding such a particular case give me an insight of what I'm about to get into.
The environment will be intranet ftp server> so far using active ftp normal ports windows systems. I might need to also zip up the files before sending and I remember working with a library once that would zip in memory and there was a limit on the size... ideas on this would also be appreciated.
Let me know if I need to clarify something else. I'm asking for general/higher level gotchas if any not really detail help. I've done apps with normal sizes (up to 1GB) before but this one seems I'd need to limit the speed so I don't kill the network or things like that.
Thanks for any help.
I think you can get some inspiration from torrents.
Torrents generally break up the file in manageable pieces and calculate a hash of them. Later they transfer them piece by piece. Each piece is verified against hashes and accepted only if matched. This is very effective mechanism and let the transfer happen from multiple sources and also let is restart any number of time without worrying about corrupted data.
For transfer from a server to single client, I would suggest that you create a header which includes the metadata about the file so the receiver always knows what to expect and also knows how much has been received and can also check the received data against hashes.
I have practically implemented this idea on a client server application but the data size was much smaller, say 1500k but reliability and redundancy were important factors. This way, you can also effectively control the amount of traffic you want to allow through your application.
I think the way to go is to use the rsync utility as an external process to Python -
Quoting from here:
the pieces, using checksums, to possibly existing files in the target
site, and transports only those pieces that are not found from the
target site. In practice this means that if an older or partial
version of a file to be copied already exists in the target site,
rsync transports only the missing parts of the file. In many cases
this makes the data update process much faster as all the files are
not copied each time the source and target site get synchronized.
And you can use the -z switch to have compression on the fly for the data transfer transparently, no need to boottle up either end compressing the whole file.
Also, check the answers here:
https://serverfault.com/questions/154254/for-large-files-compress-first-then-transfer-or-rsync-z-which-would-be-fastest
And from rsync's man page, this might be of interest:
--partial
By default, rsync will delete any partially transferred
file if the transfer is interrupted. In some circumstances
it is more desirable to keep partially transferred files.
Using the --partial option tells rsync to keep the partial
file which should make a subsequent transfer of the rest of
the file much faster

Efficient reliable incremental HTTP multi-file (or whole directory) upload software

Imagine you have a web site that you want to send a lot of data. Say 40 files totaling the equivalence of 2 hours of upload bandwidth. You expect to have 3 connection losses along the way (think: mobile data connection, WLAN vs. microwave). You can't be bothered to retry again and again. This should be automated. Interruptions should not cause more data loss than neccessary. Retrying a complete file is a waste of time and bandwidth.
So here is the question: Is there a software package or framework that
synchronizes a local directory (contents) to the server via HTTP,
is multi-platform (Win XP/Vista/7, MacOS X, Linux),
can be delivered as one self-contained executable,
recovers partially uploades files after interrupted network connections or client restarts,
can be generated on a server to include authentication tokens and upload target,
can be made super simple to use
or what would be a good way to build one?
Options I have found until now:
Neat packaging of rsync. This requires an rsync (server) instance on the server side that is aware of a privilege system.
A custom Flash program. As I understand, Flash 10 is able to read a local file as a bytearray (indicated here) and is obviously able to speak HTTP to the originating server. Seen in question 1344131 ("Upload image thumbnail to server, without uploading whole image").
A custom native application for each platform.
Thanks for any hints!
Related work:
HTML5 will allow multiple files to be uploaded or at least selected for upload "at once". See here for example. This is agnostic to the local files and does not feature recovery of a failed upload.
Efficient way to implement a client multiple file upload service basically asks for SWFUpload or YUIUpload (Flash-based multi-file uploaders, otherwise "stupid")
A comment in question 997253 suggests JUpload - I think using a Java applet will at least require the user to grant additional rights so it can access local files
GearsUploader seems great but requires Google Gears - that is going away soon

Comparing HTTP and FTP for transferring files

What are the advantages (or limitations) of one over the other for transferring files over the Internet?
(I am aware of secure forms of both protocols. I'd like to hear comparisons through personal experiences in terms of performance, reliability, file size limitations etc.)
Here's a performance comparison of the two. HTTP is more responsive for request-response of small files, but FTP may be better for large files if tuned properly. FTP used to be generally considered faster. FTP requires a control channel and state be maintained besides the TCP state but HTTP does not. There are 6 packet transfers before data starts transferring in FTP but only 4 in HTTP.
I think a properly tuned TCP layer would have more effect on speed than the difference between application layer protocols. The Sun Blueprint Understanding Tuning TCP has details.
Heres another good comparison of individual characteristics of each protocol.
I just benchmarked a file transfer over both FTP and HTTP :
over two very good server connections
using the same 1GB .zip file
under the same network conditions (tested one after the other)
The result:
using FTP: 6 minutes
using HTTP: 4 minutes
using a concurrent http downloader software (fdm): 1 minute
So, basically under a "real life" situation:
1) HTTP is faster than FTP when downloading one big file.
2) HTTP can use parallel chunk download which makes it 6x times faster than FTP depending on the network conditions.
Many firewalls drop outbound connections which are not to ports 80 or 443 (http & https); some even drop connections to those ports that are not HTTP(S). FTP may or may not be allowed, not to speak of the active/PASV modes.
Also, HTTP/1.1 allows for much better partial requests ("only send from byte 123456 to the end of file"), conditional requests and caching ("only send if content changed/if last-modified-date changed") and content compression (gzip).
HTTP is much easier to use through a proxy.
From my anecdotal evidence, HTTP is easier to make work with dropped/slow/flaky connections; e.g. it is not needed to (re)establish a login session before (re)initiating transfer.
OTOH, HTTP is stateless, so you'd have to do authentication and building a trail of "who did what when" yourself.
The only difference in speed I've noticed is transferring lots of small files: HTTP with pipelining is faster (reduces round-trips, esp. noticeable on high-latency networks).
Note that HTTP/2 offers even more optimizations, whereas the FTP protocol has not seen any updates for decades (and even extensions to FTP have insignificant uptake by users). So, unless you are transferring files through a time machine, HTTP seems to have won.
(Tangentially: there are protocols that are better suited for file transfer, such as rsync or BitTorrent, but those don't have as much mindshare, whereas HTTP is Everywhereâ„¢)
One consideration is that FTP can use non-standard ports, which can make getting though firewalls difficult (especially if you're using SSL). HTTP is typically on a known port, so this is rarely a problem.
If you do decide to use FTP, make sure you read about Active and Passive FTP.
In terms of performance, at the end of the day they're both spewing files directly down TCP connections so should be about the same.
One advantage of FTP is that there is a standard way to list files using dir or ls. Because of this, ftp plays nice with tools such as rsync. Granted, rsync is usually done over ssh, but the option is there.
Both of them uses TCP as a transport protocol, but HTTP uses a persistent connection, which makes the performance of the TCP better.

Resources