Chunking in Azure Logic Apps SFTP-SSH Create action doesn't work

What I want to achieve:
Send a >50MB file via HTTP to a Logic App
The Logic App to save the file to an SFTP server
An error I am getting in the SFTP-SSH 'Create file' action:
The provided file size '64065320' for the create or update operation
exceeded the maximum allowed file size '52428800' using non-chunked
transfer mode. Please enable chunked transfer mode to create or update
large files.
Chunking on the SFTP-SSH 'Create file' action is enabled. Overriding chunk size doesn't help. Using the body of the 'Compose' action as an input for 'Create file' also doesn't help.
[Screenshots not preserved: the current workflow; the SFTP-SSH 'Create file' action parameters; the SFTP-SSH 'Create file' action settings; the error.]
Any ideas about the cause of the error?
P.S. To clarify, the issue concerns one specific workflow: a large file is sent to a Logic App via HTTP (the 'When a HTTP request is received' trigger) and needs to be saved to an SFTP server. Not transformed, just saved as it is. I know that chunking works fine when a large file is collected (pulled) from elsewhere (SFTP, blob, etc.) and saved to SFTP. In this scenario (pushing the file to the Logic App) it doesn't. The article Handle large messages with chunking in Azure Logic Apps first says that "Logic App triggers don't support chunking" and that "for actions that support and are enabled for chunking you can't use trigger bodies", but then gives a workaround: "Instead, use the Compose action. Specifically, you must create a body field by using the Compose action to store the data output from the trigger body. Then, to reference the data, in the chunking action, use @body('Compose')". This workaround didn't work for me, as the screenshots show. I'd appreciate it if someone could clarify how to overcome this issue.

According to the documentation, when the endpoint you exchange data with supports chunking, the HTTP connector can transfer large content as a series of partial messages. To comply with a connector's limit, Logic Apps splits any message larger than 30 MB into smaller chunks. This lets you split up large content downloads and uploads over HTTP, so that your logic app and an endpoint can exchange large messages.
You can also refer to the linked discussion, which covers the same topic.
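To make the arithmetic of chunking concrete, here is a minimal Python sketch that partitions the 64065320-byte file from the error message into byte ranges, printed in the style of an HTTP Content-Range value. The 15 MiB chunk size and the function name chunk_ranges are assumptions chosen for illustration, not the connector's documented behaviour.

```python
def chunk_ranges(total_size, chunk_size):
    """Return inclusive (start, end) byte offsets that cover total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    start = 0
    while start < total_size:
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


FILE_SIZE = 64065320            # size reported in the error message
CHUNK_SIZE = 15 * 1024 * 1024   # assumed chunk size, for illustration only

for start, end in chunk_ranges(FILE_SIZE, CHUNK_SIZE):
    print(f"bytes {start}-{end}/{FILE_SIZE}")
```

With these assumed values the file splits into five ranges, each well below the 52428800-byte limit quoted in the error.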

Related

upload file api with uploadtask in symfony 2.8

We realize that a multipart request containing a 15GB video file cannot be built in memory: most devices have only 2 or 3GB of RAM, so allocating a buffer of that size is impossible.
It is therefore absolutely necessary to switch to the uploadTask method, which pushes the contents of the file to the server block by block, each block being at most the size allowed by the IP packets sent to the server.
This is a POST method. However, it does not contain parameters such as the folder id or the file name. So you need a way to transmit these parameters. The best way is to code them in the URL.
I proposed an encoding format in the form of a path behind the endpoint of the API, but we can also very well encode these two parameters in a classic way in the URL, eg:
/api/upload?id=123&filename=video.mp4
From what I read on Stack Overflow, it's trivial with Symfony to retrieve id and filename. All the data received in the body of the POST request can then be written raw, directly into a file, without passing through a buffer in server-side memory.
User data must always be streamed, whether on the mobile side or the server side, and whether uploading or downloading. Loading user content into memory is also very dangerous in terms of security.
In Symfony, how can I do that?
This goes way beyond Symfony and depends on the web server you are using.
By default, with Apache/nginx and PHP you will receive an already buffered request, so you cannot stream it to a file.
However, there are solutions, for example with Apache you can stream requests, see http://hc.apache.org/httpclient-3.x/performance.html#Request_Response_entity_streaming
Probably nginx also has options for it, but I don't know about those.
Another option might be websockets, see http://en.wikipedia.org/wiki/WebSocket
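The question is about Symfony, but the idea it describes (read the parameters from the URL, then copy the request body to disk block by block) is language-independent. Below is a minimal Python sketch of that idea only. The helper names parse_upload_params and stream_to_file are hypothetical, the body is assumed to be any file-like object that the server exposes without buffering the whole upload, and the 64 KiB block size is arbitrary.

```python
import io
import os
import tempfile
from urllib.parse import urlparse, parse_qs


def parse_upload_params(url):
    """Extract the id and filename query parameters from the upload URL."""
    query = parse_qs(urlparse(url).query)
    return query["id"][0], query["filename"][0]


def stream_to_file(body, dest_path, block_size=64 * 1024):
    """Copy a file-like request body to dest_path one block at a time.

    Only block_size bytes are held in memory at once, whatever the
    total size of the upload.
    """
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            block = body.read(block_size)
            if not block:
                break
            out.write(block)
            written += len(block)
    return written


if __name__ == "__main__":
    folder_id, filename = parse_upload_params("/api/upload?id=123&filename=video.mp4")
    # Keep only the final path component so a crafted filename cannot escape the folder.
    safe_name = os.path.basename(filename)
    dest = os.path.join(tempfile.mkdtemp(), safe_name)
    fake_body = io.BytesIO(b"\x00" * 1_000_000)  # stands in for the real request stream
    print(folder_id, stream_to_file(fake_body, dest))
```

Whether the web server actually hands the application an unbuffered stream is a separate matter, as the answer above points out.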

Symfony: call command in controller for lengthy action

I have an application in which I can generate a raw export in xls.
The problem is that the xls generation can be very long, more than the timeout duration.
I've checked and my query isn't the culprit (takes <2s for a regular query), but the xls generation is very long (for several thousand lines, I put different colors in cells, conditionally display data...).
I was thinking about the command, which runs in CLI, without timeout problem.
I can't use it directly, because the data to be generated has to be called by users (without cli access).
So I thought about calling the command in my controller
The user would choose the parameters in a form, send the form, and then in the controller, the parameters would be passed to the command that would do the heavy lifting.
My question is: in this case, is the command called in the CLI context (with CLI timeout = 0) or in the application (web) context (with timeout <50s)? In the latter case this would be useless, and I would be grateful for advice on an alternative way to solve my problem.
This is a textbook case for a message queue.
RabbitMQ is recommended, and easy to use with Symfony.
You will have a producer, which will generate a message and put it in a queue. This will be done in your controller.
The db query and the sheet generation should be placed in the consumer (the command running in the background, picking messages from the queue and processing them).
When the sheet is ready, save it as a file, and perhaps log it in the database with a unique ID.
This might sound difficult, but it is very simple, and you should learn it anyway :)
A problem is showing the result to the user. The simplest way is to refresh the browser every X seconds. Other choices include polling with ajax, and websocket based notifications from the server.
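The producer/consumer split described above can be sketched in a few lines. This is an in-process Python illustration of the pattern only, using the standard library's queue and threading modules in place of RabbitMQ. All names (enqueue_export, build_sheet, worker, job_status) are hypothetical, and build_sheet is a placeholder for the slow database query and sheet generation.

```python
import queue
import threading
import uuid

jobs = queue.Queue()          # stands in for the message queue
results = {}                  # stands in for the database table of exports
results_lock = threading.Lock()


def enqueue_export(params):
    """Producer: what the controller does. Returns at once with a job ID."""
    job_id = str(uuid.uuid4())
    with results_lock:
        results[job_id] = {"status": "pending", "file": None}
    jobs.put((job_id, params))
    return job_id


def build_sheet(params):
    """Placeholder for the slow query and xls generation."""
    return f"export-{params['year']}.xls"


def worker():
    """Consumer: the background command that picks jobs from the queue."""
    while True:
        item = jobs.get()
        if item is None:      # sentinel value: stop the worker
            jobs.task_done()
            break
        job_id, params = item
        path = build_sheet(params)
        with results_lock:
            results[job_id] = {"status": "done", "file": path}
        jobs.task_done()


def job_status(job_id):
    """What a polling request from the browser would read."""
    with results_lock:
        return dict(results[job_id])


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    job = enqueue_export({"year": 2020})
    jobs.join()               # a real controller would not wait; the browser polls instead
    print(job_status(job))
    jobs.put(None)
    thread.join()
```

The point of the design is that the web request only enqueues and returns, so its timeout never applies to the sheet generation.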

Efficiently handling HTTP uploads of many large files in Go

There is probably an answer within reach, but most of the search results are either "handling large file uploads", where the user does not know what they're doing, or "handling many uploads", where the answer is consistently just an explanation of how to work with multipart requests and/or Flash uploader widgets.
I haven't had time to sift through Go's HTTP implementation, yet, but when does the application have the first chance to see the incoming body? Not until it has been completely received?
If I were to [poorly] decide to use HTTP to transfer a large amount of data and posted a single request with several 10-gigabyte parts, would I have to wait for the whole thing to be received before processing it or does the io.Reader with the body iteratively process it?
This is only tangentially related, but I also haven't been able to get a clear answer about whether I can choose to forcibly close the connection in the middle; whether or not, even if I close it, it will just keep receiving it on the port.
Thanks so much.
An application's handler is called after the headers are parsed and before the request body is read. The handler can read the request body as soon as the handler is called. The server does not buffer the entire request body.
An application can read file uploads without buffering the entire request by getting a multipart reader and iterating through the parts.
An application can replace the request body with a MaxBytesReader to force close the connection after a specified limit is breached.
The above comments are about the net/http server included in the standard library. The comments may not apply to other servers.
While I haven't done this with GB-size files, my strategy for file processing (mostly stuff I read from and write to S3) is to use https://golang.org/pkg/os/exec/ with a command-line utility that handles chunking in a way you like. Then read and process by tailing the file, as explained here: Reading log files as they're updated in Go
In my situations, network utilities can download the data far faster than my code can process it, so it makes sense to send it to disk and pick it up as fast as I can, that way I'm not holding some connection open while I process.

Can a web server begin responding before the client has sent the full request?

I am writing a web application for an academic research group. The researchers need to be able to upload large data sets (100MB - 1GB) in CSV format. I've written the server to process the data as it comes in. This means that if there is an error in the first row of the CSV, we can return an error straight away.
However, when this happens, the browser reports that "The connection was reset" or similar. Clearly, my web server is responding in a way that doesn't make sense.
If I explicitly close the HTTP request stream (this is Kotlin on the JVM by the way) before returning the error to the browser, then the problem goes away. However, it turns out that the close implementation of the request stream first goes and reads the whole stream to its end. So at that point the user still has to wait 30mins+ to find out that there is an error in the first row of their CSV.
Is what I am trying to do possible? Does the HTTP protocol permit a web server, in any circumstances, to begin responding before the full request body has been sent? If not, can you suggest a workaround that would allow me to deliver a user experience where the user doesn't have to wait for the whole file to be uploaded before finding out if there are any problems?
The answer is yes. According to the HTTP spec, servers should be able to send responses early, and the client should then stop sending the request body. Most browsers, however, don't implement this correctly.
In theory, your http server needs to return a 4xx error code with a response body, then reset the connection to prevent the upload continuing in the background. See the answers below for a more detailed description of the issue. There are a couple of browser versions that do support this, so if you're doing this in lab conditions where you can control the client being used the links below will help.
https://stackoverflow.com/a/14483857/2274303
https://stackoverflow.com/a/18370751/2274303
[edit]
To answer your question about using a workaround: chunking the uploads using JavaScript is a good way to mitigate internet connectivity issues, but if you want to parse the data in real time it's not as simple as arbitrarily breaking up the file into pieces. You need to make sure you're not splitting the file in the middle of a line, otherwise it will fail even if the data is valid. That brings up the issue of parsing a 1GB file in JavaScript, which isn't a good idea imo.
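The line-boundary constraint can be illustrated with a short sketch. It is written in Python rather than JavaScript and works on an in-memory byte string purely to show the cutting logic. The function name split_on_lines is hypothetical, and the sketch assumes that no CSV field contains an embedded newline.

```python
def split_on_lines(data, max_chunk):
    """Yield chunks of at most max_chunk bytes that never split a line.

    Every chunk except possibly the last ends with a newline, so each
    chunk can be parsed as complete CSV rows on its own.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    start = 0
    total = len(data)
    while start < total:
        end = min(start + max_chunk, total)
        if end < total:
            # Move the cut back to just after the last newline in the window.
            cut = data.rfind(b"\n", start, end)
            if cut == -1:
                raise ValueError("a single row is longer than max_chunk")
            end = cut + 1
        yield data[start:end]
        start = end


if __name__ == "__main__":
    rows = b"a,1\nb,2\nc,3\n"
    for chunk in split_on_lines(rows, 9):
        print(chunk)
```

A real uploader would apply the same rule to windows read from the file rather than loading it whole, which is exactly the difficulty with a 1GB file mentioned above.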
If you want to use JavaScript, continue uploading the entire file at once via an ajax request, so you can get the response outside of the main DOM and force a redirect or cancel the upload. Depending on which JS libraries you're using, there are different ways of doing this.
None of this solves the reverse scenario. What if the file is 95% uploaded before there's an error? The researcher will need to either upload the whole thing again or edit the file to only include the rows from the error going forward. That means your application needs to support partial uploads and know to pick up where it left off. All these things are possible, but you're probably not going to find a simple workaround to get this working well.
Without understanding the dataset and what kind of validation you are doing it's hard to come up with a full solution. If parsing each row doesn't depend on the previous rows being valid, you could always upload the whole file, then display the rows with errors at the end and ask them to upload a second file with just the corrections.
The normal process of an HTTP web server goes like this:
Server listens for request
Client creates request
Client sends request to server
Server processes request
Server creates response
Server sends response to client
Client processes response
The client starts the connection for communication and the server is able to respond on that connection, however if you close the connection the server will need to send a response on another connection. The browser may not allow the server to start a new connection that the client didn't request.
You may be able to respond by reading the first line and creating an error quickly, but the client will not read the response until it is done sending the request.
By sending the file in chunks or asynchronously sending lines of the file, you will be able to give feedback more immediately. You will be sending many smaller requests with the ability to respond in between.
The question was about the HTTP protocol. I feel this would be allowed by the protocol if you wrote both a custom client app and the web app. If you are using browsers, however, you must use HTTP as the browser vendors have implemented it. In a custom app you could check for interruptions, but most browsers will probably send the full request before listening for a response, which is also one reason AJAX took off 20 years ago.

Does Firebase guarantee that data set using updateValues or setValue is available in the backend as one atomic unit?

We have an application that uses base64 encoded content to transmit attachments to backend. Backend then moves the content to Storage after some manipulation. This way we can enjoy world class offline support and sync and at the same time use the much cheaper Storage to store the files in the end.
Initially we used updateChildren to set the content in one go. This works fairly well, but then users started to upload bigger and more files at the same time, resulting in silent freezing of the database in the end user devices.
We then changed the code to write the files one by one using FirebaseDatabase.getInstance().getReference("/full/uri").setValue(base64stuff), and then using updateChildren to only set the metadata.
This allowed seemingly endless amount of files (provided that it is chopped to max 9 meg chunks), but now we're facing another problem.
Our backend uses Firebase listener to start working once new content is available. The trigger waits for the metadata and then starts to process the attachments. It seems that even though the client device writes the files before we set the metadata, the backend usually receives the metadata before the content from the files is available. This forced us to change backend code to stop processing and check later again if the attachment base64 data is available.
This works, but is not elegant and wastes cpu cycles and increases latencies.
I haven't found anything in the docs on whether Firebase guarantees anything about the order in which the data is received by the backend. It seems that everything written in one go (using setValue or updateChildren) is available in the backend as one atomic unit.
Is this correct? Can I depend on that as a fact that will not change in the future?
The way I'm going to go about this (if the assumptions are correct above) is to write metadata first using updateChildren in the client like this
"/uri/of/metadata/uid/attachments/attachment_uid1" = "per attachment metadata"
"/uri/of/metadata/uid/attachments/attachment_uid2" = "per attachment metadata"
and then each base64 chunk using updateChildren with following payload:
"/uri/of/metadata/uid/uploaded_attachments/attachment_uid2" = true
"/uri/of/base64/content/attachment_uid" = "base64content"
I can't use setValue for any data, to prevent accidental overwrites depending on the order in which the writes happen in the end.
This would allow me to listen to /uri/of/base64/content and try to start the handling of the metadata package every time a new attachment completes the load. The only thing needed to determine whether all files have already been uploaded is to grab the metadata and check that all attachment uids found under /attachments/ are also present under /uploaded_attachments/.
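The completeness check described in the last paragraph amounts to a set comparison. Here is a minimal Python sketch, assuming the metadata node has been read into a plain dictionary with the attachments and uploaded_attachments children shown above. The function name all_attachments_uploaded is hypothetical and no Firebase API is involved.

```python
def all_attachments_uploaded(metadata):
    """Return True when every uid under attachments is flagged as uploaded."""
    expected = set(metadata.get("attachments", {}))
    uploaded = {
        uid
        for uid, done in metadata.get("uploaded_attachments", {}).items()
        if done is True
    }
    return expected <= uploaded


if __name__ == "__main__":
    snapshot = {
        "attachments": {
            "attachment_uid1": "per attachment metadata",
            "attachment_uid2": "per attachment metadata",
        },
        "uploaded_attachments": {"attachment_uid2": True},
    }
    print(all_attachments_uploaded(snapshot))  # attachment_uid1 is still missing
```

The backend listener would run this check each time a new base64 chunk arrives and start processing only once it returns True.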
Writes from a single Firebase Database client are delivered to the server in the same order as they are executed on the client. They are also broadcast out to any listening clients in the same order.
There is no chance that another client will see the results of write B without seeing the results of write A (unless A was rejected by security rules).
