HTTP Content-Length absence and keep-alive

I use Nginx with the Lua module and the body_filter_by_lua directive.
The nginx-lua docs say:
When the Lua code may change the length of the response body, then it is required to always clear out the Content-Length response header (if any) in a header filter to enforce streaming output.
ngx.header.content_length = nil
Could it break keep-alive connections?
Could it break requests on problematic channels?
How will the client know that all the data has been read from the server?
Why doesn't Nginx force Transfer-Encoding: chunked for these responses?
Update.
As a temporary solution, I convert the response to chunked encoding via
ngx.header['Content-Type'] = "text/html"
ngx.header['Content-Length'] = nil
ngx.header['Transfer-Encoding'] = 'chunked'
and in the content-rewrite phase
-- Length of the current chunk, in hex.
local hexlen = string.format("%x", #ngx.arg[1])
ngx.arg[1] = hexlen .. "\r\n" .. ngx.arg[1] .. "\r\n"
-- Last chunk: append the final zero-length chunk sequence.
if ngx.arg[2] then
    ngx.arg[1] = ngx.arg[1] .. "0\r\n\r\n"
end
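For reference, a small Python sketch (an illustration only, not part of the nginx setup) of the same framing, showing how a client can detect the end of a chunked response even without a Content-Length header:

```python
def frame_chunk(data: bytes) -> bytes:
    # Each chunk: hex length, CRLF, payload, CRLF -- the same framing
    # the Lua code above builds by hand.
    return b"%x\r\n" % len(data) + data + b"\r\n"

# A chunked body ends with a zero-length chunk ("0\r\n\r\n").
body = frame_chunk(b"hello ") + frame_chunk(b"world") + b"0\r\n\r\n"

def unchunk(stream: bytes) -> bytes:
    # A client reads chunks until it sees the zero-length terminator,
    # so it knows the response is complete with no Content-Length.
    out, pos = b"", 0
    while True:
        eol = stream.index(b"\r\n", pos)
        size = int(stream[pos:eol], 16)
        if size == 0:
            return out
        out += stream[eol + 2 : eol + 2 + size]
        pos = eol + 2 + size + 2  # skip payload and its trailing CRLF

print(unchunk(body))  # b'hello world'
```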
Update 2.
Use ngx.location.capture!

R how do I switch off JSON outgoing/incoming header messages?

I have an issue using an R script as a data source in Microsoft PowerBi. I think this is fundamentally an issue with PowerBi, but in the short term I'll need to find a solution in R.
Essentially, PowerBi doesn't appear to be able to handle the messages that would be sent to the console if I was using R Studio.
Within the R script I'm using a REST API to request data from a URL. The JSON message that is received is converted into an R data frame. When using the script as a data source in PowerBi, this only works if I set the verbose settings to FALSE, i.e. so that, if I were using RStudio, no messages (in particular incoming data) would be sent to the console.
response <- GET(<url>,
                body = list(),
                add_headers(.headers = c('<identity token>' = ID_to_use)),
                verbose(data_out = FALSE,
                        data_in = FALSE,
                        info = FALSE,
                        ssl = FALSE),
                encode = "json")
However, I do not have the option to switch off the incoming/outgoing JSON header messages (which is going to come back to bite me!):
<< {"identity":" <token>"}
* Connection #54 to <host> left intact
No encoding supplied: defaulting to UTF-8.
-> GET <URL request> HTTP/1.1
-> Host: <host>
-> User-Agent: libcurl/7.64.1 r-curl/4.3 httr/1.4.1
-> Accept-Encoding: deflate, gzip
-> Accept: application/json, text/xml, application/xml, */*
-> <Identity>: <Identity>
->
<- HTTP/1.1 200 OK
<- X-Session-Expiry: 3599
<- Content-Type: application/json
<- Transfer-Encoding: chunked
<- Date: Thu, 06 Aug 2020 16:14:26 GMT
<- Server: <Server>
<-
No encoding supplied: defaulting to UTF-8.
No encoding supplied: defaulting to UTF-8.
No encoding supplied: defaulting to UTF-8.
From the R help (excerpt):
verbose() uses the following prefixes to distinguish between different components of the http messages:
* informative curl messages
-> headers sent (out)
>> data sent (out)
*> ssl data sent (out)
<- headers received (in)
<< data received (in)
<* ssl data received (in)
Switching the verbose settings to FALSE works for a single request. However, I need to put the request into a loop and keep requesting more data until the API gateway indicates there is no more data to be received. PowerBi appears to fail when five or more request/replies are sent/received in the script.
Just from observation, I assume this is to do with the JSON header messages piling up.
I've tried a number of approaches, but nothing seems to work: sink('NUL'), invisible(), capture.output().
Any help would be appreciated.
I found a hacky solution, which at least solved the problem I had in R, but not in PowerBi.
I wrote a "wrapper" R script (see below) that calls my main script THE_SCRIPT.R via a shell command. THE_SCRIPT.R dumps out a CSV file, which I then read in the wrapper script:
# Required by PowerBi
library(mice)
# Set the directory; between R and the shell it's a pain to deal with
# spaces in directory names and quotes
setwd("C:/Program Files/R/R-3.6.2/bin/")
system("Rscript.exe C:\\Users\\<USER>\\Documents\\THE_SCRIPT.R > Nul 2>&1")
A_DATA_TABLE <- read.csv("C:\\Users\\<USER>\\Documents\\THE_FILE.csv")
However, this still didn't resolve the issue when running it in PowerBi.
Note: I tried sink('Nul 2>&1') in R; it didn't work.
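The same trick shown in Python terms (hypothetical paths, illustration only): run the child process with both output streams discarded, then read the file it wrote, so none of the child's console chatter reaches the caller:

```python
import pathlib
import subprocess
import sys
import tempfile

# The child writes its real result to a CSV file and also prints noise;
# the parent discards the child's stdout and stderr entirely.
csv_path = pathlib.Path(tempfile.gettempdir()) / "the_file.csv"
child_code = (
    "import pathlib; "
    f"pathlib.Path({str(csv_path)!r}).write_text('a,b\\n1,2\\n'); "
    "print('verbose noise')"
)
subprocess.run([sys.executable, "-c", child_code],
               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
               check=True)

print(csv_path.read_text().splitlines()[0])  # a,b
```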

Lua Socket Send 200 OK close connection

What is the minimal HTTP "200 OK, Connection: close" response for Nginx/Lua/OpenResty? I have:
local sock, err = ngx.req.socket(true)
sock:send("HTTP/1.1 200 OK\\r\\nConnection: close\\r\\n\\r\\n")
and curl says:
curl: (52) Empty reply from server
In the case of no response body, you should probably use the 204 No Content response code; "201 Created" may be an option as well for requests that create resources.
Also: replace each double backslash with a single one, as you don't need to escape the backslash to generate the CR LF sequence.
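Python string literals escape backslashes the same way Lua's do, so the bug can be illustrated like this (a sketch of the escaping problem, not OpenResty code):

```python
# What the doubled escapes actually produce: the literal four characters
# backslash, 'r', backslash, 'n' -- no real CR LF anywhere, so curl never
# sees a complete status line and reports an empty reply.
wrong = "HTTP/1.1 200 OK\\r\\nConnection: close\\r\\n\\r\\n"

# With single backslashes the string contains real CR LF pairs, so the
# status line and headers are properly terminated.
right = "HTTP/1.1 200 OK\r\nConnection: close\r\n\r\n"

print("\r\n" in wrong)             # False
print(right.endswith("\r\n\r\n"))  # True
```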

nginx map of header from upstream goes to default

I want to set a conditional response header based on a header I get from the upstream.
For some reason it always gets mapped to the default value.
Configuration:
The upstream service decides whether a header called x-no-iframe-protection should exist.
Main nginx config:
map $http_x_no_iframe_protection $x_frame_options {
    yes     "";
    default "SAMEORIGIN";
}
server {
    ...
    add_header X-Frame-Options $x_frame_options;
    ...
}
No matter what I try - I get both headers:
$ curl -v myhost
...
< x-no-iframe-protection: yes
< x-frame-options: SAMEORIGIN
...
Just to clarify: I use x-no-iframe-protection only as a trick to remove x-frame-options in specific cases. I'm OK with it staying (although it is not needed once parsed by nginx).
Anyway, how can I make the header get caught so that the header value is replaced?
An HTTP transaction contains request headers and response headers. From the context of your question, you are setting the value of a response header based on the value of another response header (which was received from the upstream).
Nginx stores request headers in variables with names beginning with $http_ and response headers in variables with names beginning with $sent_http_.
In addition, response headers received from the upstream may also be available in variables with names beginning with $upstream_http_.
In your configuration you use the variable $http_x_no_iframe_protection, whereas you should be using either $sent_http_x_no_iframe_protection or perhaps $upstream_http_x_no_iframe_protection.
All of the Nginx variables are documented here.
Try using $upstream_http_x_no_iframe_protection to access the upstream response header.
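Putting the two answers together, the map might look like this (a sketch, untested; it relies on the documented behavior that add_header omits the header entirely when its value is an empty string):

```nginx
map $upstream_http_x_no_iframe_protection $x_frame_options {
    yes     "";
    default "SAMEORIGIN";
}
server {
    ...
    # Empty $x_frame_options means no X-Frame-Options header is added.
    add_header X-Frame-Options $x_frame_options;
    ...
}
```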

How to change Content-length in body_filter_by_lua* in openresty

I am using OpenResty as a proxy server, which may change the response from the upstream. The directive header_filter_by_lua* is executed before body_filter_by_lua*, but I change the Content-Length in body_filter_by_lua*, and the headers have already been sent at that time.
So how do I set the correct Content-Length when the response from the upstream is changed in body_filter_by_lua*?
Thank you!
From https://github.com/openresty/lua-nginx-module#body_filter_by_lua:
When the Lua code may change the length of the response body, then it is required to always clear out the Content-Length response header (if any) in a header filter to enforce streaming output, as in
location /foo {
    # fastcgi_pass/proxy_pass/...
    header_filter_by_lua_block { ngx.header.content_length = nil }
    body_filter_by_lua 'ngx.arg[1] = string.len(ngx.arg[1]) .. "\\n"';
}
I expect that nginx would use chunked transfer coding (http://greenbytes.de/tech/webdav/rfc2616.html#chunked.transfer.encoding) in this case (I didn't test it).

Must the Access-Control-Allow-Origin header include scheme?

I'm having some problems with CORS definitions, and I have a question (not about CORS in general, which I'm fine with, just about the official specification and usage):
According to the IETF, if the Origin header is passed and it is a URL, that URL must be fully serialized and must include the scheme and host (and optionally the port). From https://www.rfc-editor.org/rfc/rfc6454#section-7.1:
The Origin header field has the following syntax:
origin = "Origin:" OWS origin-list-or-null OWS
origin-list-or-null = %x6E %x75 %x6C %x6C / origin-list
origin-list = serialized-origin *( SP serialized-origin )
serialized-origin = scheme "://" host [ ":" port ]
; <scheme>, <host>, <port> from RFC 3986
At least, I think I have understood that correctly.
The W3C CORS specification also says that the Access-Control-Allow-Origin header must follow the same format. From http://www.w3.org/TR/cors/#access-control-allow-origin-response-header:
Access-Control-Allow-Origin = "Access-Control-Allow-Origin" ":" origin-list-or-null | "*"
and links to the Origin header page.
However, I have seen numerous examples (both here on SO and elsewhere) which show ACAO headers without the scheme (i.e. not an exact 'mirror' of the Origin header), e.g. they show this being passed in the request:
Origin: http://www.example.com
and this as the 'correct' response:
Access-Control-Allow-Origin: www.example.com
So is that ACAO header valid? I thought that the ACAO header had to be an exact mirror of the Origin header value (or '*' or 'null').
If I respond with an ACAO header which doesn't include the scheme, should the User Agent accept it? Or is it on a UA-by-UA basis? What if the Origin includes a port number - do I need to include that in the ACAO response header, with or without the scheme?
As you mentioned, RFC 6454 defines the syntax of an origin without ambiguity:
origin = "Origin:" OWS origin-list-or-null OWS
origin-list-or-null = %x6E %x75 %x6C %x6C / origin-list
origin-list = serialized-origin *( SP serialized-origin )
serialized-origin = scheme "://" host [ ":" port ]
and the CORS W3C recommendation explicitly refers to the same definition:
Access-Control-Allow-Origin = "Access-Control-Allow-Origin" ":" origin-list-or-null | "*"
So the following header is not valid:
Access-Control-Allow-Origin: www.example.com
and must not be accepted by the user agent. From RFC 6454:
When generating an Origin header field, the user agent MUST meet the following requirements:
Each of the serialized-origin productions in the grammar MUST be the ascii-serialization of an origin.
This is particularly important because of the same-origin policy:
The same-origin policy is one of the cornerstones of security for
many user agents, including web browsers.
Concerning the second part of the question, about the port number, the ASCII serialization of an origin algorithm states:
If the port part of the origin triple is different from the default port for the protocol given by the scheme part of the origin triple:
Append a U+003A COLON code point (":") and the given port, in base ten, to result.
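As a sketch of what "valid" means here (a hypothetical helper with a deliberately simplified host grammar; per the CORS spec, "*" and "null" are also allowed):

```python
import re

# Rough approximation of RFC 6454's serialized-origin:
#   scheme "://" host [ ":" port ]
# The host pattern is simplified for illustration.
SERIALIZED_ORIGIN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://[^/:]+(:\d+)?$")

def acao_is_valid(value: str) -> bool:
    """Check an Access-Control-Allow-Origin value against the grammar."""
    if value in ("*", "null"):
        return True
    # origin-list = serialized-origin *( SP serialized-origin )
    return all(SERIALIZED_ORIGIN.match(o) for o in value.split(" "))

print(acao_is_valid("http://www.example.com"))    # True
print(acao_is_valid("www.example.com"))           # False: scheme missing
print(acao_is_valid("https://example.com:8443"))  # True: explicit port is fine
```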