BizTalk SFTP receive port not picking files larger than 1GB - sftp

BizTalk SFTP receive port is not picking up files larger than 1 GB (in my case I can receive up to 5 GB files). Even when it does pick up the file, it is very slow, and before the whole file has been dropped into the folder, the orchestration starts unzipping the zip file and throws the error: "cannot unzip as the file is being used by another process". Any help?

What you are seeing is not a problem with BizTalk Server or the SFTP adapter.
It is happening because the SFTP server allows the download to start before the file has been completely written. That can be because the SFTP server is not honoring a write lock, or because the source application performs multiple open-write-close cycles while emitting the data.
So this is actually not your problem, and not a problem you can solve on the client side.
The SFTP server either needs to block the download, or a temporary location/filename must be used until the file is complete.
This is not an uncommon problem, but it must be fixed on the server side.

Related

Polling SFTP Server for new files

We have a requirement where one application (AppA) will push files (sometimes 15K - 20K files in one go) to an SFTP folder. Another application (AppB) polls this folder location and, upon reading a file, pushes an Ack file to a different folder on the same SFTP server. There have been issues with lost files and mismatches between files sent and Acks received. Now we need to develop a mechanism (as an auditor) to monitor the SFTP location. This SFTP server runs on Windows, and a single file will be less than 1 MB in size.
We are planning to adopt one of the following approaches:

Option 1: Write an external utility in Java which keeps polling the SFTP location, downloads each file locally, reads its content, and stores it in a local DB for reconciliation.
Pro:
The utility is standalone, with no dependency on the SFTP server as such (apart from reading the files).
Con:
In addition to AppB, this utility will also connect to the SFTP server and download files, which may overload the SFTP server and hamper the regular functioning of AppA and AppB.

Option 2: Write a Java utility/script and deploy it on the SFTP server itself, either as a scheduled task or configured to listen on the respective folder. Upon reading a file locally on the SFTP server, this utility will call an external API to post the file's content and store it in a DB for reconciliation.
Pro:
There will be no connection and file-download overhead on the SFTP server.
File reading will be faster and almost real-time (if a listener is used).
Con:
Java needs to be installed on the SFTP server.
This utility will call the external API, and with 15K - 20K files this may slow down the process of capturing the data and storing it in the DB.

We are currently in the design process and need your suggestions, as well as any insight from anyone who has implemented a similar mechanism.

syslog-ng processing all messages after restart

I'm running syslog-ng inside Docker, collecting logs from local files, processing them, and then writing them to another logfile or sending them to Slack.
I noticed that whenever I need to update the syslog-ng config and restart the container, syslog-ng re-reads all messages from the source logfiles, which causes duplicates in the destination files and the Slack channel.
Is there an option to tell syslog-ng that only new messages should be processed after a restart, or maybe to process only logfiles at most one hour old?
I tried to Google it and check the documentation, but without luck; I'm probably not asking the question correctly, because I would assume such an option exists.
Thanks
syslog-ng, by default, persists positions for sources where the concept of "bookmarking" or "position-tracking" is applicable.
This is true for regular file sources as well.
All you have to do is keep the syslog-ng persist file intact (syslog-ng.persist under the /var folder).
It sounds like you might be losing the persist file MrAnno mentioned. Try putting it (and the log files) on a persisted volume so that syslog-ng can look up where it left off and continue from there.
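One way to keep that persist file intact in Docker is to mount the directory containing it as a named volume; a docker-compose sketch (the image name, host paths, and volume name are placeholders for your setup, and the persist-file location may differ per image):

```yaml
services:
  syslog:
    image: my-syslog-ng-image
    volumes:
      - syslog-data:/var/lib/syslog-ng   # holds syslog-ng.persist across restarts
      - ./logs:/var/log/app:ro           # the source logfiles being tailed
volumes:
  syslog-data:
```

With the persist directory on a volume, a config change plus restart resumes from the recorded positions instead of re-reading the files from the beginning.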

Does BizTalk Server support exchanging large files over Azure File Shares when 3rd Party system is using the REST API?

"Starting with BizTalk Server 2016, you can connect to an Azure file
share using the File adapter. The Azure storage account must be
mounted on your BizTalk Server."
source: https://learn.microsoft.com/en-us/biztalk/core/configure-the-file-adapter
So at first glance, this would appear to be a supported thing to do. And until recently, we have been using Azure File Shares with BizTalk Server with no problems. However, we are now looking to exchange larger files (approx 2 MB). BizTalk Server is consuming the files without any errors but the file contains only NUL bytes. (The message in the tracking database is the correct size but is filled with NUL bytes).
The systems writing the files (Azure Logic Apps, Azure Storage Explorer) are seeing the following error:
{
"status": 409,
"message": "The specified resource may be in use by an SMB client.\r\nclientRequestId: 4e0085f6-4464-41b5-b529-6373fg9affb0",
}
If we try uploading the file to the mounted drive using Windows Explorer (thus using the SMB protocol), the file is picked up without problems by BizTalk Server.
As such, I suspect the BizTalk Server File adapter is not supported when the system writing or consuming the file is using the REST API rather than the SMB protocol.
So my questions are:
Is this a caveat to BizTalk Server support of Azure File Share that is documented somewhere?
Is there anything we can do to make this work?
Or do we just have to use a different way of exchanging files?
We have unsuccessfully investigated/tried the following:
I cannot see any settings in the Azure File Storage connector (as used by Logic Apps) that would ensure files are locked until they are fully written.
Tried using the File adapter's advanced property "rename files while reading"; this did not solve the problem.
Look at the SFTP-SSH connector. It does message chunking for files with a total size of 1 GB or smaller, and it provides a Rename file action, which renames a file on the SFTP server.
With an ISE environment you could potentially leverage a total file size of up to 5 GB.
Here is the solution we have implemented with justifications for this choice.
Chosen Option: We stuck with Azure File Shares and implemented the signal file pattern
The Logic Apps of the integrated system writes a signal file to the same folder where the message file is created. The signal file has the same filename but with a .done extension. e.g. myfile.json.done.
In the BizTalk solution, a custom pipeline component has been written to retrieve the related message file for each signal file.
Note: one concern is that the Azure Files connector is still in preview.
Discounted Option 1: Logic Apps use the BizTalk Server connector
Whilst this would work, I was keen to keep a layer of separation between the system and BizTalk. This allows BizTalk applications to be deployed without downtime of the endpoints exposed to the system.
It also restricts the load-levelling (throttling) capabilities of BizTalk Server. Note: we have a custom file adapter to restrict the rate at which files are picked up.
This option also requires setup of the “On-Premise Data Gateway”.
Discounted Option 2: Use of File System connector
Logic Apps writes the file in chunks of 2MB and then releases the lock on the file. This enables BizTalk to pick up the file instantly. When the connector tries to write the next chunk of 2MB, the file is not available anymore and hence fails with a 400 status error "The requested action could not be completed. Check your request parameters to make sure the path //test.json' exists on your file system.”
File sizes are limited to 20MB.
Required setup of On-Premise Data Gateway. Note: We also considered this to be a good time to also introduce use of Integration Service Environment (ISE) to host Logic Apps within the vNET. The thinking is that this would keep File exchanges between the system and BizTalk within the network. However, currently there is no ISE specific connector for the File System.
Discounted Option 3: Use of SFTP connector
Our expectation is that Logic Apps using the SFTP connector will experience similar chunking issues while writing files.
The Azure SFTP connector has no rename action.
We were keen to avoid use of this ageing protocol.
We were keen to avoid extra infrastructure and software needed to support SFTP.
Discounted Option 4: Logic Apps Renames the File once written
There is no rename action in the File Storage REST API or File connector, only a Copy action. Our concern with Copy is that the file still needs time to be written, so the same chunking problem remains.
Discounted Option 5: Logic Apps use of Service Bus Connector
The maximum size of a message is 1MB.
Discounted Option 6: Using Azure File Sync to Mirror files to another location.
The file sync only happens once every 24 hours, so it was not suitable for our integration needs. Microsoft are planning to build change notifications into Azure File Shares to address this.
Microsoft have just announced "Azure Service Bus premium tier namespaces now support sending and receiving message payloads up to 100 MB (up from 1MB previously)."
https://azure.microsoft.com/en-us/updates/azure-service-bus-large-message-support-reaches-general-availability/
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-premium-messaging#large-messages-support

How to perform checksums during a SFTP file transfer for data integrity?

I have a requirement to perform checksum (for data integrity) for SFTP. I was hoping this could be done during the SFTP file transfer - I realize this could be product dependent (FYI: using CLEO VLTrader), but was wondering if this is customary?
I am also looking for alternative data integrity checking options that are as good (or better) than using a checksum algorithm. Thanks!
With SFTP running over an encrypted SSH session, there's a negligible chance that the file contents could get corrupted while transferring. SSH itself does data integrity verification.
So unless the contents get corrupted while reading the local file or writing the remote file, you can be pretty sure the file was uploaded correctly if no error is reported. That implies the risk of data corruption is about the same as if you were copying the files between two local drives.
If you would not consider it necessary to verify data integrity after copying files from one local drive to another, then I do not think you need to verify integrity after an SFTP transfer, and vice versa.
If you want to test explicitly anyway:
While there's the check-file extension to the SFTP protocol for calculating a remote file's checksum, it's not widely supported. In particular, it's not supported by the most widespread SFTP server implementation, OpenSSH. See What SFTP server implementations support check-file extension.
Not many clients/client libraries support it either. You didn't specify what client/library you are using, so I cannot provide more details.
For details about some implementations, see:
Python Paramiko: How to check if Paramiko successfully uploaded a file to an SFTP server?
.NET WinSCP: Verify checksum of a remote file against a local file over SFTP/FTP protocol
What SFTP server implementations support check-file extension
Other than that, your only option is to download the file back (if uploading) and compare locally.
If you have a shell access to the server, you can of course try to run some shell checksum command (e.g. sha256sum) over a separate shell/SSH connection (or the "exec" channel) and parse the results. But that's not an SFTP solution anymore.
Examples:
Calculate hash of file with Renci SSH.NET in VB.NET
Comparing MD5 of downloaded files against files on an SFTP server in Python

Does deleting a file on a webserver cancel its transmission?

I have a huge file on a server, e.g. a movie. Someone starts to download that file. The download is not immediate, because the network has a certain maximum transmission rate. While the server is in the process of sending the file, I enter the command to delete the file.
What is the expected behavior?
Is the transmission cancelled?
Is the transmission completed first?
And if it is completed first, what if another request to download that file comes in before the delete command is carried out? Is that request queued behind the delete command, or is it carried out in parallel with other commands, so that it begins before the delete takes effect, effectively continuing to block it?
On my desktop computer I cannot delete a file that is in use. Do web servers differ?
If the platform is Windows, you can't delete the file while it is open.
If the platform is Unix- or Linux-based, you can delete the file; however, it remains in existence while it is open, which includes while it is being transmitted.
I'm not aware of any operating system where you are notified that a file you have open has been deleted, which is the only mechanism that could possibly cause transmission to be cancelled.
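The Unix behaviour described above is easy to demonstrate: delete a file while a handle to it is open, and the data stays readable through that handle. A small Python sketch (POSIX-only; the function name is illustrative):

```python
import os
import tempfile

def read_after_unlink() -> bytes:
    """Demonstrate POSIX unlink semantics: an open file survives deletion.

    The directory entry goes away immediately, but the contents remain
    readable through any handle open at the time, e.g. an in-progress
    transfer. (Requires a POSIX filesystem; Windows behaves differently.)
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as f:
        f.write(b"movie bytes")
        f.flush()
        os.unlink(path)                   # "delete" the file mid-use
        assert not os.path.exists(path)   # the name is gone...
        f.seek(0)
        return f.read()                   # ...but the contents are not
```

The disk space is only reclaimed once the last open handle is closed, which is why a deleted file can keep "blocking" space while a long download is still running.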
