Efficient reliable incremental HTTP multi-file (or whole directory) upload software

Imagine you have a web site to which you want to send a lot of data. Say 40 files totaling the equivalent of 2 hours of upload bandwidth. You expect to have 3 connection losses along the way (think: mobile data connection, WLAN vs. microwave). You can't be bothered to retry again and again; this should be automated. Interruptions should not cause more data loss than necessary, and re-uploading a complete file is a waste of time and bandwidth.
So here is the question: Is there a software package or framework that
synchronizes a local directory (contents) to the server via HTTP,
is multi-platform (Win XP/Vista/7, MacOS X, Linux),
can be delivered as one self-contained executable,
recovers partially uploaded files after interrupted network connections or client restarts,
can be generated on a server to include authentication tokens and upload target,
can be made super simple to use
or, if nothing suitable exists, what would be a good way to build one? (A rough sketch of the resumable-upload part follows the options below.)
Options I have found until now:
Neat packaging of rsync. This requires an rsync (server) instance on the server side that is aware of a privilege system.
A custom Flash program. As I understand it, Flash 10 can read a local file as a ByteArray (indicated here) and can obviously speak HTTP to the originating server. Seen in question 1344131 ("Upload image thumbnail to server, without uploading whole image").
A custom native application for each platform.
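To sketch the build-it-yourself route referenced above: the core of the resumable-upload requirement is splitting each file into chunks and asking the server how far a previous attempt got before continuing. The following is only a rough C# sketch; the /offset and /chunk endpoints, the token parameter and the chunk size are hypothetical, not an existing protocol.
// Rough sketch only: a chunked, resumable HTTP upload client.
// The /offset and /chunk endpoints, the token parameter and the chunk size
// are all hypothetical -- a real implementation would define its own protocol.
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class ResumableUploader
{
    const int ChunkSize = 256 * 1024;                 // 256 KB per request
    static readonly HttpClient Http = new HttpClient();

    public static async Task UploadAsync(string path, string baseUrl, string token)
    {
        string name = Path.GetFileName(path);

        // Ask the server how much of this file it already holds (the resume point).
        string offsetUrl = baseUrl + "/offset?file=" + Uri.EscapeDataString(name) + "&token=" + token;
        long offset = long.Parse(await Http.GetStringAsync(offsetUrl));

        using (FileStream fs = File.OpenRead(path))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            byte[] buffer = new byte[ChunkSize];
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Each chunk carries its offset, so an interrupted transfer can be
                // resumed from the last byte the server acknowledged.
                string chunkUrl = baseUrl + "/chunk?file=" + Uri.EscapeDataString(name)
                                + "&offset=" + offset + "&token=" + token;
                HttpResponseMessage response = await Http.PostAsync(chunkUrl, new ByteArrayContent(buffer, 0, read));
                response.EnsureSuccessStatusCode();
                offset += read;
            }
        }
    }
}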
Thanks for any hints!
Related work:
HTML5 will allow multiple files to be uploaded, or at least selected for upload, "at once". See here for an example. It is agnostic to the local files and does not offer recovery of a failed upload.
Efficient way to implement a client multiple file upload service basically asks for SWFUpload or YUIUpload (Flash-based multi-file uploaders, otherwise "stupid")
A comment in question 997253 suggests JUpload - I think using a Java applet will at least require the user to grant additional rights so it can access local files
GearsUploader seems great but requires Google Gears - that is going away soon

Related

Polling SFTP Server for new files

We have a requirement where one application (AppA) pushes files (sometimes 15K - 20K files in one go) to an SFTP folder. Another application (AppB) polls this folder location for the files and, upon reading each one, pushes an Ack file to a different folder on the same SFTP server. There have been issues with lost files and mismatches between files sent and Acks received. We now need to develop a mechanism (as an auditor) to monitor the SFTP location. This SFTP server is on Windows, and a single file will be less than 1 MB.
We are planning to adopt one of the following approaches:
Write an external utility in Java that keeps polling the SFTP location, downloads each file locally, reads its content and stores it in a local DB for reconciliation (see the sketch after this question).
Pro:
The utility is standalone, with no dependency on the SFTP server as such (apart from reading the files)
Con:
In addition to AppB, this utility will also connect to the SFTP server and download files; this may overload the SFTP server and hamper the regular functioning of AppA and AppB
Write a Java utility/script and deploy it on the SFTP server itself, either as a scheduled task or configured to listen to the respective folder. Upon reading a file locally on the SFTP server, this utility calls an external API to post the file's content, which is then stored in a DB for reconciliation.
Pro:
No extra connection and download load is placed on the SFTP server
File reading will be faster and almost real-time (if a listener is used)
Con:
Java needs to be installed on the SFTP server
The utility calls the external API for each file, and with 15K - 20K files this will slow down the process of capturing the data and storing it in the DB
We are currently in the design process and need your suggestions, along with any insight from anyone who has implemented a similar mechanism.
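For the first approach (external poller plus local DB), a rough sketch of the polling and reconciliation loop is below. It is shown in C# with the SSH.NET library purely for illustration; a Java utility would follow the same shape with JSch. The host, folder paths and reconciliation store are placeholders.
// Illustration only: poll the SFTP folders, record every data file seen and
// flag those with no matching Ack. A real utility would persist this in a
// database rather than an in-memory set; the paths below are placeholders.
using System;
using System.Collections.Generic;
using System.IO;
using Renci.SshNet;

class SftpReconciliationPoller
{
    static readonly HashSet<string> SeenFiles = new HashSet<string>();

    public static void PollOnce(string host, string user, string password)
    {
        using (var sftp = new SftpClient(host, user, password))
        {
            sftp.Connect();

            foreach (var entry in sftp.ListDirectory("/appA/outbound"))
            {
                if (entry.IsDirectory || SeenFiles.Contains(entry.Name))
                    continue;

                // Download the content so it can be recorded for reconciliation.
                using (var ms = new MemoryStream())
                {
                    sftp.DownloadFile(entry.FullName, ms);
                    SeenFiles.Add(entry.Name);        // store name + content in the DB here
                }
            }

            // Flag any data file for which AppB has not yet written an Ack.
            foreach (string name in SeenFiles)
            {
                if (!sftp.Exists("/appB/ack/" + name + ".ack"))
                    Console.WriteLine("Missing Ack for " + name);
            }

            sftp.Disconnect();
        }
    }
}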

Does BizTalk Server support exchanging large files over Azure File Shares when 3rd Party system is using the REST API?

"Starting with BizTalk Server 2016, you can connect to an Azure file
share using the File adapter. The Azure storage account must be
mounted on your BizTalk Server."
source: https://learn.microsoft.com/en-us/biztalk/core/configure-the-file-adapter
So at first glance, this would appear to be a supported thing to do. And until recently, we have been using Azure File Shares with BizTalk Server with no problems. However, we are now looking to exchange larger files (approx 2 MB). BizTalk Server is consuming the files without any errors but the file contains only NUL bytes. (The message in the tracking database is the correct size but is filled with NUL bytes).
The systems writing the files (Azure Logic Apps, Azure Storage Explorer) are seeing the following error:
{
"status": 409,
"message": "The specified resource may be in use by an SMB client.\r\nclientRequestId: 4e0085f6-4464-41b5-b529-6373fg9affb0",
}
If we try uploading the file to the mounted drive using Windows Explorer (thus using the SMB protocol), the file is picked up without problems by BizTalk Server.
As such, I suspect the BizTalk Server File adapter is not supported when the system writing or consuming the file is using the REST API rather than the SMB protocol.
So my questions are:
Is this a caveat to BizTalk Server support of Azure File Share that is documented somewhere?
Is there anything we can do to make this work?
Or do we just have to use a different way of exchanging files?
We have unsuccessfully investigated/tried the following:
I cannot see any settings in the Azure File Storage connector (as used by Logic Apps) that would ensure files are locked till they are fully written.
Tried using the File adapter advanced adapter property “rename files while reading”, this did not solve the problem.
Look at the SFTP-SSH connector. It supports message chunking for a total file size of 1 GB or smaller, and it provides the Rename file action, which renames a file on the SFTP server.
With an ISE environment you could potentially leverage a total file size of 5 GB.
Here is the solution we have implemented with justifications for this choice.
Chosen Option: We stuck with Azure File Shares and implemented the signal file pattern
The Logic App of the integrated system writes a signal file to the same folder where the message file is created. The signal file has the same filename but with a .done extension, e.g. myfile.json.done.
In the BizTalk solution, a custom pipeline component has been written to retrieve the related message file for the signal file (the filename mapping is sketched below).
Note: one concern is that the Azure Files connector is still in preview.
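The essential piece of that pipeline component is just the mapping from signal file to message file. A bare sketch of that mapping (not the full BizTalk component) follows; the paths are illustrative.
// Sketch of the signal-file mapping only, not the actual pipeline component:
// BizTalk is configured to pick up "*.done" files, and the related message file
// is found by stripping the ".done" extension (myfile.json.done -> myfile.json).
using System;
using System.IO;

static class SignalFilePattern
{
    public static string ResolveMessageFile(string signalFilePath)
    {
        if (!signalFilePath.EndsWith(".done", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Not a signal file: " + signalFilePath);

        string messageFilePath = signalFilePath.Substring(0, signalFilePath.Length - ".done".Length);

        if (!File.Exists(messageFilePath))
            throw new FileNotFoundException("Signal file present but message file missing", messageFilePath);

        return messageFilePath;
    }
}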
Discounted Option 1: Logic Apps use the BizTalk Server connector
Whilst this would work, I was keen to keep a layer of separation between the system and BizTalk. This allows BizTalk applications to be deployed without downtime of the endpoints exposed to the system.
It also restricts the load-levelling (throttling) capabilities of BizTalk Server. Note: we have a custom file adapter to restrict the rate at which files are picked up.
This option also requires setup of the “On-Premise Data Gateway”.
Discounted Option 2: Use of File System connector
The Logic App writes the file in chunks of 2 MB, releasing the lock on the file between chunks. This enables BizTalk to pick the file up instantly, before it is complete. When the connector tries to write the next 2 MB chunk, the file is no longer available, so the write fails with a 400 status error: "The requested action could not be completed. Check your request parameters to make sure the path '//test.json' exists on your file system."
File sizes are limited to 20MB.
Requires setup of the On-Premise Data Gateway. Note: we also considered this a good time to introduce an Integration Service Environment (ISE) to host Logic Apps within the vNet. The thinking is that this would keep file exchanges between the system and BizTalk within the network. However, there is currently no ISE-specific connector for the File System.
Discounted Option 3: Use of SFTP connector
Our expectation is that Logic Apps using SFTP would experience similar chunking issues while writing files.
The Azure SFTP connector has no rename action.
We were keen to avoid use of this ageing protocol.
We were keen to avoid extra infrastructure and software needed to support SFTP.
Discounted Option 4: Logic Apps Renames the File once written
There is no rename action in the File Storage REST API or the File connector, only a Copy action. Our concern with Copy is that the copied file still needs time to be written, so the same chunking problem remains.
Discounted Option 5: Logic Apps use of Service Bus Connector
The maximum size of a message is 1MB.
Discounted Option 6: Using Azure File Sync to Mirror files to another location.
The file sync only happens once every 24 hours, so it was not suitable for our integration needs. Microsoft is planning to build change notifications into Azure File Shares to address this.
Microsoft have just announced "Azure Service Bus premium tier namespaces now support sending and receiving message payloads up to 100 MB (up from 1MB previously)."
https://azure.microsoft.com/en-us/updates/azure-service-bus-large-message-support-reaches-general-availability/
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-premium-messaging#large-messages-support

Can uploads be too fast?

I'm not sure if this is the right place to ask this, but I'll do it anyway.
I have developed an uploader for the Umbraco CMS that lets people upload a queue of files in one go. This uses a simple Flash app that just calls a .NET ashx handler to upload the files one at a time. When one is done, the next one starts.
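For context, the handler side is essentially a plain ashx that saves the single file posted by each request. A minimal sketch of that kind of handler is below; the target folder and response are illustrative, not the actual Umbraco handler.
// Minimal sketch of an ashx-style upload handler: the Flash client posts one
// file per request and this handler saves it. Folder and response are placeholders.
using System.IO;
using System.Web;

public class UploadHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        HttpPostedFile file = context.Request.Files[0];
        string target = Path.Combine(context.Server.MapPath("~/App_Data/uploads"),
                                     Path.GetFileName(file.FileName));
        file.SaveAs(target);
        context.Response.Write("OK");
    }

    public bool IsReusable { get { return true; } }
}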
Recently I've had a user hit a problem where 1 or 2 uploads will go up fine, but then the rest fail. This happens for him and for a client of his. After some debugging, he thinks he's found the problem, but it seems weird, so I was wondering if anyone else has had this problem.
Both he and his client are on fibre-optic broadband connections, so they have really fast upload speeds. When it was tested on a slower broadband connection, all the files uploaded without a problem. According to one of his developer friends, they had apparently come across this before and had to put a slight delay in the upload script to make it work.
Does this sound possible? Had anyone else hit this problem? Is there a known workaround to prevent the uploads from failing?
I have not struck this precise problem before, but I have done a lot of DSL and broadband troubleshooting, so I will do my best to answer this.
There are 2 possible causes for this particular symptom, both generally outside of your network control (I would have thought).
1) Packet loss
Where some links carry a very high volume of traffic, they can choose to simply drop data (e.g. everything over the link's configured maximum), but TCP/IP should be controlling for that and expects the occasional drop, so this seems less likely.
2) The receiving server
There may be HTTP bottlenecks into that server, or the receiving server's CPU, RAM, etc. may be at capacity.
From a troubleshooting perspective, even if these symptoms shouldn't (in theory) exist, the fact that they do, and that you have a specific reproducible case, means they are worth chasing down.
The next step, if you really need to understand how it is all working, might be to get some sort of packet sniffer (like Wireshark) and work out at the packet level exactly what is happening.
Socket programming can also work directly against the TCP/IP sockets, so you would be processing at the lower network layers and could see the responses, timeouts, etc.
Also, if you control the receiving server, you can do the same from that end, or at least review the error logs to see what is being reported as a problem.
A really basic method would be to run pathping against the receiving server, if possible; that might highlight slow nodes on the way to the server, or packet loss between your local machine and the end server.
The upshot? Put a slow-down function in the upload code (see the sketch below), and that should at least make it work.
Get in touch if you need any analysis of the WireShark stuff.
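For what it's worth, the suggested slow-down amounts to something like the loop below: pause briefly between files and retry a failed upload once before giving up. The real client is Flash, so this C# version only illustrates the pattern; the URL and delays are placeholders.
// Sketch of a throttled upload loop: a short pause between files and a single
// retry on failure. The handler URL and delay values are placeholders.
using System;
using System.Net;
using System.Threading;

static class ThrottledUploader
{
    public static void UploadAll(string[] files, string handlerUrl)
    {
        using (var client = new WebClient())
        {
            foreach (string file in files)
            {
                for (int attempt = 1; attempt <= 2; attempt++)
                {
                    try
                    {
                        client.UploadFile(handlerUrl, file);
                        break;                      // success, move on to the next file
                    }
                    catch (WebException) when (attempt == 1)
                    {
                        Thread.Sleep(2000);         // back off before retrying once
                    }
                }

                Thread.Sleep(500);                  // slight delay between files
            }
        }
    }
}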
I have run into a similar problem with an MVC2 website using Flash uploader and Firefox. The servers were load balanced with a Big-IP load balancer. What we found out in debugging this is that Flash, in Firefox, did not send the session ID on continuation requests and the load balancer would send continuation requests off to another server. Because the user had no session on the new server, the request failed.
If a file could be sent in one chunk, it would upload fine. If it required a second chunk, it failed. Because of this the upload would fail after an undetermined number of files being uploaded.
To fix it, I wrote a Silverlight uploader.

Secure data transfer over HTTP when HTTPS is not an option

I would like to write an application to manage files, directories and processes on hundreds of remote PCs. There are measurement programs running on these machines, which are currently managed manually using TightVNC / RealVNC. Since the number of machines is large (and increasing) there is a need for automatic management. The plan is that our operators would get a scriptable client application, from which they could send queries and commands to server applications running on each remote PC.
For the communication, I would like to use a TCP-based custom protocol, but that is administratively complicated and it would take a very long time to open pinholes in every firewall along the way. Fortunately, there is a program with a built-in TinyWeb-based custom web server running on every remote PC, and port 80 is open in every firewall. These web servers serve requests coming from a central server by starting a CGI program, which loads and sends back parts of the log files of the measurement programs.
So the plan is to write a CGI program and communicate with it from the clients through HTTP (using GET and POST). Although (most of) the remote PCs are inside the corporate intranet, they are scattered all over the country, so I would like to secure the communication. It would not be wise to send commands that manipulate files and processes in plain text. Unfortunately, the program which contains the web server cannot be touched, so I cannot simply prepare it for HTTPS. I can only implement the security layer in the client and in the CGI program. What should I do?
I have read all similar questions in SO, but I am still not sure what to do in this specific situation. Thank you for your help.
There are several web shells, but as far as I can see (http://www-personal.umich.edu/~mressl/webshell/features.html) they run on top of an existing SSL/TLS layer.
There is also S-HTTP.
There are several ways of authenticating to a server (username/password) in a protected way without SSL, e.g. http://www.switchonthecode.com/tutorials/secure-authentication-without-ssl-using-javascript. But these solutions focus only on sending a username/password to the server.
Would it be possible to implement something like message-level security in SOAP/WS-Security? I realise this might be a bit heavy duty and complicated to implement, but at least it is
standardised
definitely secure
possibly supported by some libraries or frameworks you could use
suitable for HTTP
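Not as an implementation of WS-Security, but just to illustrate the message-level idea in its barest form: encrypt the command payload itself before it travels over plain HTTP. The sketch below assumes a pre-shared AES key and deliberately ignores key distribution, client authentication and replay protection, which is exactly what WS-Security or a comparable framework would add on top.
// Bare illustration of message-level protection over plain HTTP: the payload
// is AES-encrypted with a pre-shared key before being POSTed to the CGI URL.
// Key management, authentication and replay protection are intentionally omitted.
using System;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

static class EncryptedCommandClient
{
    public static async Task SendAsync(string cgiUrl, string command, byte[] key)
    {
        byte[] payload;
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            using (var encryptor = aes.CreateEncryptor())
            {
                byte[] plain = Encoding.UTF8.GetBytes(command);
                byte[] cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);

                // Prepend the IV so the CGI program can decrypt with the shared key.
                payload = new byte[aes.IV.Length + cipher.Length];
                Buffer.BlockCopy(aes.IV, 0, payload, 0, aes.IV.Length);
                Buffer.BlockCopy(cipher, 0, payload, aes.IV.Length, cipher.Length);
            }
        }

        using (var http = new HttpClient())
        {
            HttpResponseMessage response = await http.PostAsync(cgiUrl, new ByteArrayContent(payload));
            response.EnsureSuccessStatusCode();
        }
    }
}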

Programmable, secure FTP replacement

We need to move off traditional FTP for security purposes (it transmits its passwords unencrypted). I am hearing SSH touted as the obvious alternative. However, I have been driving FTP from an ASP.NET program interface to automate my web-site development, which is now quite a highly web-enabled process.
Can anyone recommend a secure way to transfer files around which has a program interface that I can drive from ASP.NET?
sharpssh implements sending files via scp.
The question has three subquestions:
1) Choosing the secure transfer protocol
A secure version of old FTP exists; it's called FTP/SSL (plain old FTP over an SSL-encrypted channel). Maybe you can still use your old deployment infrastructure - just check whether it supports FTPS or FTP/SSL.
You can check details about FTP, FTP/SSL and SFTP differences at http://www.rebex.net/secure-ftp.net/ page.
2) SFTP or FTP/SSL server for Windows
When you choose whether to use SFTP or FTPS, you have to deploy the proper server. For FTP/SSL we use Gene6 (http://www.g6ftpserver.com/) on several servers without problems. There are plenty of FTP/SSL Windows servers, so use whatever you want. The situation is a bit more complicated with SFTP servers for Windows - there are only a few working implementations. Bitvise WinSSHD looks quite promising (http://www.bitvise.com/winsshd).
3) Internet File Transfer Component for ASP.NET
The last part of the solution is secure file transfer from ASP.NET. There are several components on the market. I would recommend the Rebex File Transfer Pack - it supports both FTP (and FTP/SSL) and SFTP (SSH File Transfer Protocol).
The following code shows how to upload a file to the server via SFTP. The code is taken from our Rebex SFTP tutorial page.
// create client, connect and log in
Sftp client = new Sftp();
client.Connect(hostname);
client.Login(username, password);
// upload the 'test.zip' file to the current directory at the server
client.PutFile(#"c:\data\test.zip", "test.zip");
// upload the 'index.html' file to the specified directory at the server
client.PutFile(#"c:\data\index.html", "/wwwroot/index.html");
// download the 'test.zip' file from the current directory at the server
client.GetFile("test.zip", #"c:\data\test.zip");
// download the 'index.html' file from the specified directory at the server
client.GetFile("/wwwroot/index.html", #"c:\data\index.html");
// upload a text using a MemoryStream
string message = "Hello from Rebex SFTP for .NET!";
byte[] data = System.Text.Encoding.Default.GetBytes(message);
System.IO.MemoryStream ms = new System.IO.MemoryStream(data);
client.PutFile(ms, "message.txt");
Martin
We have used a variation of this solution in the past which uses the SSH Factory for .NET
The traditional secure replacement for FTP is SFTP, but if you have enough control over both endpoints, you might consider rsync instead: it is highly configurable, secure just by telling it to use ssh, and far more efficient for keeping two locations in sync.
G'day,
You might like to look at ProFTPD.
Heavily customisable. Based on Apache module structure.
From their web site:
ProFTPD grew out of the desire to have a secure and configurable FTP server, and out of a significant admiration of the Apache web server.
We use our adapted version for large scale transfer of web content. Typically 300,000 updates per day.
HTH
cheers,
Rob
