Polling SFTP Server for new files

We have a requirement where one application (AppA) pushes files (sometimes 15K-20K files in one go) to an SFTP folder. Another application (AppB) polls this folder location to pick up the files and, upon reading each one, pushes an Ack file to a different folder on the same SFTP server. There have been issues with files being lost and mismatches between the files sent and the Acks received. We now need to develop a mechanism (an Auditor) to monitor the SFTP location. This SFTP server runs on Windows, and a single file will be smaller than 1 MB.
We are planning to adopt one of the following approaches:
Option 1: Write an external Java utility that keeps polling the SFTP location, downloads each file locally, reads its content, and stores it in a local DB for reconciliation (a minimal sketch follows the pros and cons below).
Pro:
The utility is standalone, with no dependency on the SFTP server itself (apart from reading the files).
Con:
In addition to AppB, this utility will also connect to the SFTP server and download files, which may overload the server and hamper the regular functioning of AppA and AppB.
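A minimal sketch of Option 1's polling loop using the JSch library; the host, credentials, and folder paths are placeholder assumptions:

```java
import com.jcraft.jsch.*;
import java.util.Vector;

public class SftpAuditorPoller {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        // Placeholder host and credentials -- replace with real values
        Session session = jsch.getSession("auditor", "sftp.example.com", 22);
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no"); // use a known_hosts file in production
        session.connect();

        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();

        // List the inbound folder and download each regular file for reconciliation
        Vector entries = sftp.ls("/inbound");
        for (Object obj : entries) {
            ChannelSftp.LsEntry entry = (ChannelSftp.LsEntry) obj;
            if (!entry.getAttrs().isDir()) {
                sftp.get("/inbound/" + entry.getFilename(),
                         "C:/audit/staging/" + entry.getFilename());
                // TODO: parse the local copy and insert a row into the reconciliation DB
            }
        }

        sftp.disconnect();
        session.disconnect();
    }
}
```

Reusing one long-lived session with a pause between listing passes keeps the connection overhead on the SFTP server low, which partly mitigates the con above.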
Option 2: Write a Java utility/script and deploy it on the SFTP server itself, either as a scheduled job or configured to listen to the respective folder. Upon reading each file locally on the server, the utility calls an external API to post the file content and store it in the DB for reconciliation (a minimal sketch follows the pros and cons below).
Pro:
There is no connection or file-download overhead on the SFTP server.
File reading will be faster and near real-time (if a listener is used).
Con:
Java needs to be installed on the SFTP server.
The utility calls an external API, and with 15K-20K files this may slow down capturing the data and storing it in the DB.
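If the utility is deployed on the SFTP server itself, Java's built-in WatchService can act as the folder listener for Option 2. A minimal sketch; the watched path and the reconciliation API endpoint are placeholder assumptions (the HTTP client requires Java 11+):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.*;

public class SftpFolderListener {
    public static void main(String[] args) throws Exception {
        Path folder = Paths.get("C:/sftp/inbound"); // placeholder path on the SFTP server
        WatchService watcher = FileSystems.getDefault().newWatchService();
        folder.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        HttpClient http = HttpClient.newHttpClient();
        while (true) {
            WatchKey key = watcher.take(); // blocks until new files appear
            for (WatchEvent<?> event : key.pollEvents()) {
                Path file = folder.resolve((Path) event.context());
                // Caution: ENTRY_CREATE fires as soon as the file appears; a production
                // version must wait until the writer has finished (e.g. size stabilises).
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("https://audit.example.com/api/files")) // hypothetical API
                        .POST(HttpRequest.BodyPublishers.ofFile(file))
                        .build();
                http.send(request, HttpResponse.BodyHandlers.discarding());
            }
            key.reset(); // re-arm the key, otherwise no further events are delivered
        }
    }
}
```

For the 15K-20K bursts, batching several files per API call or posting from a thread pool would mitigate the slowdown noted in the con above.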
We are currently in the design process and need your suggestions, plus any insight from anyone who has implemented a similar mechanism.

Related

Does BizTalk Server support exchanging large files over Azure File Shares when 3rd Party system is using the REST API?

"Starting with BizTalk Server 2016, you can connect to an Azure file
share using the File adapter. The Azure storage account must be
mounted on your BizTalk Server."
source: https://learn.microsoft.com/en-us/biztalk/core/configure-the-file-adapter
So at first glance, this would appear to be a supported thing to do. And until recently, we have been using Azure File Shares with BizTalk Server with no problems. However, we are now looking to exchange larger files (approx 2 MB). BizTalk Server is consuming the files without any errors but the file contains only NUL bytes. (The message in the tracking database is the correct size but is filled with NUL bytes).
The systems writing the files (Azure Logic Apps, Azure Storage Explorer) are seeing the following error:
{
  "status": 409,
  "message": "The specified resource may be in use by an SMB client.\r\nclientRequestId: 4e0085f6-4464-41b5-b529-6373fg9affb0"
}
If we try uploading the file to the mounted drive using Windows Explorer (thus using the SMB protocol), the file is picked up without problems by BizTalk Server.
As such, I suspect the BizTalk Server File adapter is not supported when the system writing or consuming the file is using the REST API rather than the SMB protocol.
So my questions are:
Is this a caveat to BizTalk Server support of Azure File Share that is documented somewhere?
Is there anything we can do to make this work?
Or do we just have to use a different way of exchanging files?
We have unsuccessfully investigated/tried the following:
I cannot see any settings in the Azure File Storage connector (as used by Logic Apps) that would ensure files are locked until they are fully written.
Tried using the File adapter's advanced property “rename files while reading”; this did not solve the problem.
Look at the SFTP-SSH connector. It does message chunking for a total file size of 1 GB or smaller, and it provides the Rename file action, which renames a file on the SFTP server.
With an ISE environment you could potentially handle a total file size of up to 5 GB.
Here is the solution we have implemented with justifications for this choice.
Chosen Option: We stuck with Azure File Shares and implemented the signal file pattern
The integrated system's Logic App writes a signal file to the same folder where the message file is created. The signal file has the same filename but with a .done extension, e.g. myfile.json.done.
In the BizTalk solution, a custom pipeline component has been written to retrieve the related message file for the signal file.
Note: Concern that the Azure Files connector is still in preview.
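The pattern itself is language-agnostic (in our case the ordering check lives in the BizTalk pipeline component); a minimal consumer-side sketch in Java, with placeholder paths:

```java
import java.nio.file.*;

public class SignalFileConsumer {
    // Writer contract: write myfile.json completely first, then create myfile.json.done.
    public static byte[] readWhenSignalled(Path folder, String name) throws Exception {
        Path signal = folder.resolve(name + ".done");
        while (!Files.exists(signal)) {
            Thread.sleep(500); // poll until the writer signals completion
        }
        // Safe to read now: the writer only creates the signal after the data file is complete
        byte[] content = Files.readAllBytes(folder.resolve(name));
        Files.delete(signal); // consume the signal so the file is processed only once
        return content;
    }
}
```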
Discounted Option 1: Logic Apps use the BizTalk Server connector
Whilst this would work, I was keen to keep a layer of separation between the system and BizTalk. This allows BizTalk applications to be deployed without downtime of the endpoints exposed to the system.
Restricts the load-levelling (throttling) capabilities of BizTalk Server. Note: we have a custom file adapter to restrict the rate at which files are picked up.
This option also requires setup of the “On-Premises Data Gateway”.
Discounted Option 2: Use of File System connector
Logic Apps writes the file in chunks of 2 MB and then releases the lock on the file. This enables BizTalk to pick up the file instantly. When the connector tries to write the next chunk of 2 MB, the file is no longer available and the write fails with a 400 status error: "The requested action could not be completed. Check your request parameters to make sure the path '//test.json' exists on your file system."
File sizes are limited to 20MB.
Required setup of the On-Premises Data Gateway. Note: We also considered this a good time to introduce use of the Integration Service Environment (ISE) to host Logic Apps within the vNET, the thinking being that this would keep file exchanges between the system and BizTalk within the network. However, there is currently no ISE-specific connector for the File System.
Discounted Option 3: Use of SFTP connector
Our expectation is that Logic Apps using SFTP would experience similar chunking issues while writing files.
The Azure SFTP connector has no rename action.
We were keen to avoid use of this ageing protocol.
We were keen to avoid extra infrastructure and software needed to support SFTP.
Discounted Option 4: Logic Apps Renames the File once written
There is no rename action in the File Storage REST API or File connector, only a Copy action. Our concern with Copy is that the copied file still needs time to be written, so the same chunking problem remains.
Discounted Option 5: Logic Apps use of Service Bus Connector
The maximum size of a message is 1MB.
Discounted Option 6: Using Azure File Sync to Mirror files to another location.
The File Sync only happens once every 24 hours, so it was not suitable for our integration needs. Microsoft is planning to build change notifications into Azure File Shares to address this.
Microsoft have just announced "Azure Service Bus premium tier namespaces now support sending and receiving message payloads up to 100 MB (up from 1MB previously)."
https://azure.microsoft.com/en-us/updates/azure-service-bus-large-message-support-reaches-general-availability/
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-premium-messaging#large-messages-support

BizTalk SFTP receive port not picking files larger than 1GB

The BizTalk SFTP receive port is not picking up files larger than 1 GB (in my case I can receive up to 5 GB files). Even when it does pick up the file it is very slow, and before the whole file has been dropped into the folder the orchestration starts unzipping the zip file and throws the error: cannot unzip as the file is being used by another process. Any help?
What you are seeing is not a problem with BizTalk Server or the SFTP Adapter.
This is happening because the SFTP Server is allowing the download to start before the file is completely written. This can be because the SFTP Server is not honoring a write lock or the actual source app is doing multiple open-write-close cycles while emitting the data.
So, this is actually not your problem and not a problem you can solve as the client.
The SFTP Server either needs to block the download or a temporary location/filename must be used until the file is complete.
This is not an uncommon problem, but must be fixed on the server side.
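If you control the uploading side, the usual fix looks like this: upload under a temporary name, then rename once the transfer is complete. A sketch using the JSch library, with hypothetical paths:

```java
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.SftpException;

public class AtomicSftpUpload {
    // Downloaders that ignore *.part files never see a half-written file,
    // because the rename only happens after the upload has finished.
    public static void upload(ChannelSftp sftp, String localPath, String remotePath)
            throws SftpException {
        String temp = remotePath + ".part";
        sftp.put(localPath, temp);      // the slow part: file is still invisible to consumers
        sftp.rename(temp, remotePath);  // effectively instantaneous
    }
}
```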

How to perform checksums during a SFTP file transfer for data integrity?

I have a requirement to perform a checksum (for data integrity) on SFTP transfers. I was hoping this could be done during the SFTP file transfer - I realize this could be product-dependent (FYI: using CLEO VLTrader), but was wondering whether this is customary?
I am also looking for alternative data integrity checking options that are as good (or better) than using a checksum algorithm. Thanks!
With SFTP, running over an encrypted SSH session, there's a negligible chance the file contents could get corrupted while transferring. SSH itself does data integrity verification.
So unless the contents get corrupted when reading the local file or writing the remote file, you can be pretty sure that the file was uploaded correctly if no error is reported. That implies the risk of data corruption is about the same as if you were copying the files between two local drives.
If you would not consider it necessary to verify data integrity after copying files from one local drive to another, then I do not think you need to verify integrity after an SFTP transfer, and vice versa.
If you want to test explicitly anyway:
While there's the check-file extension to the SFTP protocol for calculating a remote file's checksum, it's not widely supported. In particular, it's not supported by the most widespread SFTP server implementation, OpenSSH. See What SFTP server implementations support check-file extension.
Not many clients/client libraries support it either. You didn't specify what client/library you are using, so I cannot provide more details.
For details about some implementations, see:
Python Paramiko: How to check if Paramiko successfully uploaded a file to an SFTP server?
.NET WinSCP: Verify checksum of a remote file against a local file over SFTP/FTP protocol
What SFTP server implementations support check-file extension
Other than that, your only option is to download the file back (if uploading) and compare locally.
If you have a shell access to the server, you can of course try to run some shell checksum command (e.g. sha256sum) over a separate shell/SSH connection (or the "exec" channel) and parse the results. But that's not an SFTP solution anymore.
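A sketch of the download-back-and-compare option in Java, using the JSch library for the download and MessageDigest for the hashing; paths are placeholders. Note that this doubles the transfer cost, which is why it is only worth doing when explicit verification is required:

```java
import com.jcraft.jsch.ChannelSftp;
import java.nio.file.*;
import java.security.MessageDigest;
import java.util.Arrays;

public class SftpUploadVerifier {
    // After uploading localPath to remotePath, download the remote copy
    // and compare SHA-256 hashes of both files.
    public static boolean verify(ChannelSftp sftp, String localPath, String remotePath)
            throws Exception {
        Path copy = Files.createTempFile("verify", ".tmp");
        try {
            sftp.get(remotePath, copy.toString());
            return Arrays.equals(sha256(Paths.get(localPath)), sha256(copy));
        } finally {
            Files.delete(copy);
        }
    }

    private static byte[] sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return md.digest(Files.readAllBytes(file));
    }
}
```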
Examples:
Calculate hash of file with Renci SSH.NET in VB.NET
Comparing MD5 of downloaded files against files on an SFTP server in Python

Efficient reliable incremental HTTP multi-file (or whole directory) upload software

Imagine you have a web site to which you want to send a lot of data, say 40 files totalling the equivalent of 2 hours of upload bandwidth. You expect to have 3 connection losses along the way (think: mobile data connection, WLAN vs. microwave). You can't be bothered to retry again and again; this should be automated. Interruptions should not cause more data loss than necessary, and retrying a complete file is a waste of time and bandwidth.
So here is the question: Is there a software package or framework that
synchronizes a local directory (contents) to the server via HTTP,
is multi-platform (Win XP/Vista/7, MacOS X, Linux),
can be delivered as one self-contained executable,
recovers partially uploaded files after interrupted network connections or client restarts,
can be generated on a server to include authentication tokens and upload target,
can be made super simple to use
or what would be a good way to build one?
Options I have found until now:
Neat packaging of rsync. This requires an rsync (server) instance on the server side that is aware of a privilege system.
A custom Flash program. As I understand, Flash 10 is able to read a local file as a bytearray (indicated here) and is obviously able to speak HTTP to the originating server. Seen in question 1344131 ("Upload image thumbnail to server, without uploading whole image").
A custom native application for each platform.
Thanks for any hints!
Related work:
HTML5 will allow multiple files to be uploaded or at least selected for upload "at once". See here for example. This is agnostic to the local files and does not feature recovery of a failed upload.
Efficient way to implement a client multiple file upload service basically asks for SWFUpload or YUIUpload (Flash-based multi-file uploaders, otherwise "stupid")
A comment in question 997253 suggests JUpload - I think using a Java applet will at least require the user to grant additional rights so it can access local files
GearsUploader seems great but requires Google Gears - that is going away soon
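As a sketch of the recovery logic such a tool needs, in Java: the client first asks the server how many bytes of a file it has already received, then streams only the remainder. The /offset and /upload endpoints and their plain-number response are hypothetical, not an existing framework:

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResumableUploader {
    public static void upload(File file, String baseUrl) throws IOException {
        // 1. Ask the (hypothetical) server how much of this file it already has
        HttpURLConnection ask = (HttpURLConnection)
                new URL(baseUrl + "/offset?name=" + file.getName()).openConnection();
        long offset;
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(ask.getInputStream()))) {
            offset = Long.parseLong(in.readLine()); // assumed: server replies with a byte count
        }

        // 2. Stream the remaining bytes, starting at that offset
        HttpURLConnection put = (HttpURLConnection)
                new URL(baseUrl + "/upload?name=" + file.getName()
                        + "&offset=" + offset).openConnection();
        put.setRequestMethod("PUT");
        put.setDoOutput(true);
        put.setChunkedStreamingMode(8192); // avoid buffering the whole body in memory
        try (RandomAccessFile src = new RandomAccessFile(file, "r");
             OutputStream out = put.getOutputStream()) {
            src.seek(offset);
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        put.getResponseCode(); // complete the exchange; retry loop omitted for brevity
    }
}
```

After a connection loss, re-running upload() resumes from wherever the server got to, so at most one in-flight chunk is retransmitted.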

Programmable, secure FTP replacement

We need to move off traditional FTP for security purposes (it transmits its passwords unencrypted). I am hearing SSH touted as the obvious alternative. However, I have been driving FTP from an ASP.NET program interface to automate my web-site development, which is now quite a highly web-enabled process.
Can anyone recommend a secure way to transfer files around which has a program interface that I can drive from ASP.NET?
sharpssh implements sending files via scp.
The question has three subquestions:
1) choosing the secure transfer protocol
The secure version of old FTP exists - it's called FTP/SSL (plain old FTP over an SSL-encrypted channel). Maybe you can still use your old deployment infrastructure - just check whether it supports FTPS or FTP/SSL.
You can check details about FTP, FTP/SSL and SFTP differences at http://www.rebex.net/secure-ftp.net/ page.
2) SFTP or FTP/SSL server for Windows
When you choose whether to use SFTP or FTPS you have to deploy the proper server. For FTP/SSL we use Gene6 (http://www.g6ftpserver.com/) on several servers without problems. There are plenty of FTP/SSL Windows servers, so use whatever you want. The situation is a bit more complicated with SFTP servers for Windows - there are only a few working implementations. Bitvise WinSSHD looks quite promising (http://www.bitvise.com/winsshd).
3) Internet File Transfer Component for ASP.NET
The last part of the solution is the secure file transfer from ASP.NET. There are several components on the market. I would recommend the Rebex File Transfer Pack - it supports both FTP (and FTP/SSL) and SFTP (SSH File Transfer Protocol).
The following code shows how to upload a file to the server via SFTP. The code is taken from our Rebex SFTP tutorial page.
// create client, connect and log in
Sftp client = new Sftp();
client.Connect(hostname);
client.Login(username, password);
// upload the 'test.zip' file to the current directory at the server
client.PutFile(@"c:\data\test.zip", "test.zip");
// upload the 'index.html' file to the specified directory at the server
client.PutFile(@"c:\data\index.html", "/wwwroot/index.html");
// download the 'test.zip' file from the current directory at the server
client.GetFile("test.zip", #"c:\data\test.zip");
// download the 'index.html' file from the specified directory at the server
client.GetFile("/wwwroot/index.html", #"c:\data\index.html");
// upload a text using a MemoryStream
string message = "Hello from Rebex SFTP for .NET!";
byte[] data = System.Text.Encoding.Default.GetBytes(message);
System.IO.MemoryStream ms = new System.IO.MemoryStream(data);
client.PutFile(ms, "message.txt");
Martin
We have used a variation of this solution in the past which uses the SSH Factory for .NET
The traditional secure replacement for FTP is SFTP, but if you have enough control over both endpoints, you might consider rsync instead: it is highly configurable, secure just by telling it to use ssh, and far more efficient for keeping two locations in sync.
G'day,
You might like to look at ProFTPD.
Heavily customisable. Based on Apache module structure.
From their web site:
ProFTPD grew out of the desire to have a secure and configurable FTP server, and out of a significant admiration of the Apache web server.
We use our adapted version for large scale transfer of web content. Typically 300,000 updates per day.
HTH
cheers,
Rob
