Documentation for Airflow HTTP Operator/Sensor Extra Options? - airflow

I'm trying to read into the extra_options setting for Airflow to see what properties are possible to set (mainly interested in http timeout). I can't find any supporting documentation for this specific param anywhere: https://airflow.readthedocs.io/en/1.9.0/code.html?highlight=requests#airflow.operators.SimpleHttpOperator.
Has anyone worked with this before and is able to help?

According to the source code (airflow.hooks.http_hook.HttpHook.run_and_check), extra_options is read for these parameters:
response = session.send(
    prepped_request,
    stream=extra_options.get("stream", False),
    verify=extra_options.get("verify", False),
    proxies=extra_options.get("proxies", {}),
    cert=extra_options.get("cert"),
    timeout=extra_options.get("timeout"),
    allow_redirects=extra_options.get("allow_redirects", True))
You can read more about them in the requests library docs:
stream – (optional) whether to immediately download the response content. Defaults to False.
verify – (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use. Defaults to True.
proxies – (optional) Dictionary mapping protocol or protocol and hostname to the URL of the proxy.
cert – (optional) if String, path to ssl client cert file (.pem). If Tuple, (‘cert’, ‘key’) pair.
timeout (float or tuple) – (optional) How long to wait for the server to send data before giving up, as a float, or a (connect timeout, read timeout) tuple.
allow_redirects (bool) – (optional) Set to True by default.
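A minimal sketch of how these lookups resolve, written without importing Airflow or requests. The helper name resolve_send_kwargs is hypothetical; it only mirrors the extra_options.get(...) calls quoted above, so the defaults are those of that snippet and may differ in other Airflow versions.

```python
def resolve_send_kwargs(extra_options=None):
    """Mirror the extra_options.get(...) lookups from the snippet above.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    extra_options = extra_options or {}
    return {
        "stream": extra_options.get("stream", False),
        "verify": extra_options.get("verify", False),
        "proxies": extra_options.get("proxies", {}),
        "cert": extra_options.get("cert"),
        # A float, or a (connect timeout, read timeout) tuple.
        # When the key is absent this is None, i.e. no timeout is applied.
        "timeout": extra_options.get("timeout"),
        "allow_redirects": extra_options.get("allow_redirects", True),
    }


# Only the keys you set are overridden; the rest keep the snippet's defaults.
kwargs = resolve_send_kwargs({"timeout": (3.05, 30), "verify": True})
print(kwargs["timeout"], kwargs["verify"], kwargs["allow_redirects"])
```

So for an HTTP timeout, an extra_options dict such as {"timeout": 30} (or a tuple) is what would end up in the session.send call.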

By following this trail of links in Airflow's source code, you can determine everything that can be passed to SimpleHttpOperator or, more specifically, in the extra field of an Http Connection. Here is the chain of calls in Airflow's source that I used to trace the usage of extra_options:
extra_options are passed to the run() method of HttpHook
the run() method of HttpHook passes those extra_options to the run_and_check() method
the run_and_check() method extracts a variety of settings from extra_options, as shown in the source-code snippet below
try:
    response = session.send(
        prepped_request,
        stream=extra_options.get("stream", False),
        verify=extra_options.get("verify", False),
        proxies=extra_options.get("proxies", {}),
        cert=extra_options.get("cert"),
        timeout=extra_options.get("timeout"),
        allow_redirects=extra_options.get("allow_redirects", True))
    if extra_options.get('check_response', True):
        self.check_response(response)
    return response
except requests.exceptions.ConnectionError as ex:
    self.log.warning(str(ex) + ' Tenacity will retry to execute the operation')
    raise ex

Related

Mocked request doing actual requests call, not mock

I am mocking a get request in my unittest code using requests-mock, but when I run the code during testing, it still tries to hit the actual URL instead of returning the mocked data.
This is my code:
try:
    response = requests.get(api_url, auth=requests.auth.HTTPBasicAuth(username, password))
    response.raise_for_status()
except requests.ConnectionError as e:
    raise dke.CLIError(f"Could not connect to Artifactory server to get NPM auth information: {str(e)}")
This is my test code
class MyTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        m = requests_mock.Mocker()
        m.get('https://artifactory.apps.openshift-sandbox.example.com/artifactory/api/npm/auth',
              text=("_auth = base64string==\n"
                    "always-auth = true\n"
                    "email = shareduser@fake.com"))
The api_url in my code matches the URL I pass to m.get(). However, when I run the test, I do not get the value of "text" but instead I get a 401 Client Error: Unauthorized and a response from the server indicating "Bad credentials" which tells me it actually tried to contact the server instead of returning the text I had requested in the Mock.
I'm lost. As far as I can tell, I'm using this exactly as the docs indicate. Any ideas?
So, it seems you can't use requests_mock.Mocker that way. It has to be used as a decorator or context manager.
I mean the context manager/decorator is just a pattern around normal code. I haven't tested it, but I think you just have to call m.start()/m.stop().
It's generally used as a context manager or decorator because if you instantiate it once like that then your request history is going to include all requests across all unit tests which is very hard to make assertions about.
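The underlying point, that creating a mock object installs nothing until it is started, can be sketched with the standard library alone. This uses unittest.mock.patch as a stand-in and says nothing about requests-mock internals; fetch_text, demo and the example.invalid URL are hypothetical names for illustration.

```python
import urllib.request
from unittest import mock


def fetch_text(url):
    # Hypothetical code under test: performs a real HTTP GET unless patched.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


def demo():
    real_urlopen = urllib.request.urlopen
    patcher = mock.patch("urllib.request.urlopen")

    # Creating the patcher alone does not replace anything yet.
    installed_before_start = urllib.request.urlopen is not real_urlopen

    fake = patcher.start()  # the replacement is active from here on
    try:
        # Configure what the fake response returns when read.
        fake.return_value.__enter__.return_value.read.return_value = b"always-auth = true"
        text = fetch_text("https://example.invalid/auth")
    finally:
        patcher.stop()  # always undo the patch

    restored = urllib.request.urlopen is real_urlopen
    return installed_before_start, text, restored


print(demo())
```

In a test class, the start() call would typically go in setUp and the stop() call in tearDown (or be registered with addCleanup), so each test starts with a clean request history.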

Http request command line tool

I am developing a command-line tool in Swift 3, and I have this code:
let url = "www.google.com"
var request = URLRequest(url:url)
request.httpMethod = "GET"
let task = session.dataTask(with: a){ (data,response,error) in
    print("REACHED")
    handler(response,data)
}
task.resume()
I cannot reach the task if I use "http://" or "https://" as a prefix in any URL. I am wondering if I need an App Transport Security plist; I already tried creating a simple plist. Does anyone know whether something specific is required for this problem?
When specifying a URL, you need the "scheme" (e.g. http:// or https://).
The url is a URL, not a String, so it should be:
let url = URL(string: "http://www.google.com")!
Yes, you need an Info.plist entry if you want to use http://. E.g. https://stackoverflow.com/a/37552442/1271826 or "Transport security has blocked a cleartext HTTP", or just search Stack Overflow for "[ios] http info.plist" or with [osx].
Note, in Xcode 8.1, console apps don't necessarily have an Info.plist file, so if you don't have one, you may have to add one by pressing command+n, update your target settings to specify the plist, and then add the appropriate settings.
I assume where you have dataTask(with: a) you meant dataTask(with: request).
You'll need something to keep the command app alive while you're performing the request (e.g. a semaphore or something like that).
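The keep-alive idea can be sketched as follows. This is a Python illustration of the pattern, not Swift code: the main thread blocks on a semaphore that the background work releases when it finishes (in Swift the analogous type would be DispatchSemaphore). The helper name run_and_wait and the timeout value are assumptions.

```python
import threading


def run_and_wait(task, timeout=5.0):
    """Run task() on a background thread and block until it signals completion."""
    done = threading.Semaphore(0)  # starts at 0, so acquire() blocks until release()
    result = {}

    def worker():
        try:
            result["value"] = task()
        finally:
            done.release()  # wake the waiting main thread, even if task() raised

    threading.Thread(target=worker, daemon=True).start()

    # Without this wait, a command-line program could exit before the work ran.
    if not done.acquire(timeout=timeout):
        raise TimeoutError("background task did not finish in time")
    return result.get("value")


print(run_and_wait(lambda: "REACHED"))
```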

How to get client ip address in HttpHandler (Julia Language)?

I need to get client's IP address in regular HttpHandler like this:
http = HttpHandler() do req::Request, res::Response
    Response( ismatch(r"^/hello/",req.resource) ? string("Hello ", split(req.resource,'/')[3], "!") : 404 )
end
Neither req nor http.sock contain this information.
The Approach
This can be done if you know a little about Julia's internals. It turns out that Julia uses the libuv library for low-level system processing, and that library has a function called uv_tcp_getpeername. This function is not exported by Base, but you can gain access to it via ccall. Also, the HttpServer module lets you define callbacks for various events, including the connect event.
Example Module
module HTTPUtil
export get_server
using HttpServer
function handle_connect(client)
    try
        buffer = Array(Uint8,32)
        bufflen::Int64 = 32
        ccall(:uv_tcp_getpeername,Int64,(Ptr{Void},Ptr{Uint8},Ptr{Int64}),client.sock.handle,buffer,&bufflen)
        peername::IPv4 = IPv4(buffer[5:8]...)
        task_local_storage(:ip,peername)
    catch e
        println("Error ... $e")
    end
end
function get_server()
    http = HttpHandler() do req::Request, res::Response
        ip = task_local_storage(:ip)
        println("Connection received from from $ip")
        Response(ismatch(r"^/hello/",req.resource)?string("Hello ",split(req.resource,'/')[3], " $(ip)!") : 404 )
    end
    http.events["connect"]=(client)->handle_connect(client)
    server = Server(http)
end
end
Explained
Each time a connection request is made, the server creates a peer socket and calls the connect handler, which is defined to be handle_connect. It takes one parameter, client, of type Client. A Client has a field called sock of type TcpSocket, and a TcpSocket has a field handle, which is used by libuv. The idea, then, is that on each connection request the connect handler calls uv_tcp_getpeername with the handle from the TcpSocket. A byte array is declared to act as a buffer, and the address bytes in it are then converted to a Base.IPv4. The HttpServer module creates exactly one task for each client using @async, so the IP address can be stored local to the client using task_local_storage; thus there is no race condition.
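To make the buffer[5:8] slice concrete, here is a small Python sketch of the same decoding. It assumes the buffer holds an IPv4 sockaddr_in laid out as 2 bytes of address-family information, a 2-byte port in network byte order, and then the 4 address bytes (Julia's 1-based 5:8 is Python's 0-based 4:8). The function name and the sample bytes are made up for illustration.

```python
import ipaddress
import struct


def parse_sockaddr_in(buffer):
    """Extract (address, port) from a raw IPv4 sockaddr_in byte buffer."""
    # Bytes 2-3: port, big-endian (network byte order).
    (port,) = struct.unpack_from("!H", buffer, 2)
    # Bytes 4-7: the four IPv4 address octets.
    address = ipaddress.IPv4Address(bytes(buffer[4:8]))
    return address, port


# Hand-built sample buffer: family bytes, port 8000, 192.168.0.23, zero padding.
sample = bytes([2, 0]) + struct.pack("!H", 8000) + bytes([192, 168, 0, 23]) + bytes(24)
print(parse_sockaddr_in(sample))
```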
Using it
julia> using HTTPUtil
julia> server = get_server()
Server(HttpHandler((anonymous function),TcpServer(init),Dict{ASCIIString,Function} with 3 entries:
  "error"   => (anonymous function)
  "listen"  => (anonymous function)
  "connect" => (anonymous function)),nothing)
julia> @async run(server,8000)
Listening on 8000...
Task (queued) @0x000000000767e7a0
julia> Connection received from from 192.168.0.23
Connection received from from 192.168.0.22
... etc
Notes
For illustration, the output is modified so that the server will respond to each browser "Hello ipaddr"
This should be included in Base and/or HttpServer, but currently is not, so you'll need to use this workaround until it is.
The typical looping structure is used in get_server to illustrate there is no requirement for it to change, except to add in the ip address.
Assumes IPv4, but can be improved to allow both IPv4 and IPv6 straightforwardly as libuv supports both.
Thanks to waTeim's answer, but it's from 2014 and things have changed in Julia. This works nicely in Julia 0.6 and probably in later versions:
function ip(socket::TCPSocket)
    buffer = Array{UInt8}(32)
    bufflen::Int64 = 32
    ccall(:uv_tcp_getpeername,Int64,(Ptr{Void},Ptr{UInt8},Ptr{Int64}), socket.handle, buffer, &bufflen)
    peername::IPv4 = IPv4(buffer[5:8]...)
end
Building on the excellent answer from waTeim I simplified things a little to work with IPv6, and also for SSL connections:
using MbedTLS
function handle_connect(client)
    ip, port = getsockname(isa(client.sock, MbedTLS.SSLContext) ? client.sock.bio : client.sock)
    task_local_storage(:ip, ip)
end
(Would have added this as a comment to Josh Bode's answer, but I don't have the necessary reputation.)
Note that it is necessary to use getpeername() instead of getsockname() as of Julia v0.7.
https://github.com/JuliaLang/julia/pull/21825
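The difference between the two calls can be seen with plain sockets. The following Python sketch (not Julia) opens a loopback connection and compares what each call reports on the server side of the connection; the function name is hypothetical and it assumes the loopback interface is available.

```python
import socket


def peer_vs_sock():
    """Open a loopback TCP connection and report both address views."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    try:
        return {
            # The server's own end of the accepted connection.
            "server_sockname": conn.getsockname(),
            # The remote end, i.e. the client's address: what the question asks for.
            "server_peername": conn.getpeername(),
            "client_sockname": client.getsockname(),
        }
    finally:
        conn.close()
        client.close()
        server.close()


info = peer_vs_sock()
print(info["server_peername"] == info["client_sockname"])
```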
Depending on your situation, you may be able to run a Linux command:
userIP = strip(readstring(`hostname -i`), ['\n'])

How to switch off logout encryption in simplesamlphp

All my logout responses from simplesamlphp IdP come encrypted. I looked in simplesamlphp docs but cannot find any option to switch off encryption.
(I have logout signing on; but signing should be independent of encryption, and use Redirect binding)
Is it possible to send logout responses via the Redirect binding unencrypted? Or is encryption always on by default for some reason?
Parameter 'assertion.encryption' defined in the IdP remote metadata
Whether assertions received from this IdP must be encrypted. The default value is FALSE. If this option is set to TRUE, assertions from the IdP must be encrypted. Unencrypted assertions will be rejected.
Note that this option overrides the option with the same name in the SP configuration.
Reference: http://simplesamlphp.org/docs/stable/simplesamlphp-reference-idp-remote
Parameter 'assertion.encryption' in saml20-idp-hosted.php
Whether assertions sent from this IdP should be encrypted. The default value is FALSE.
Note that this option can be set for each SP in the SP-remote metadata.
Reference: http://simplesamlphp.org/docs/stable/simplesamlphp-reference-idp-hosted
Edited to add an explanation:
simpleSAMLphp uses the function encryptAssertion (modules/saml/lib/IdP/SAML2.php) to decide whether to encrypt the assertions it handles. This function checks the value of 'assertion.encryption' in the SP metadata and falls back to the IdP metadata (if the parameter is defined in neither, the assertion is not encrypted):
private static function encryptAssertion(SimpleSAML_Configuration $idpMetadata,
    SimpleSAML_Configuration $spMetadata, SAML2_Assertion $assertion) {
    $encryptAssertion = $spMetadata->getBoolean('assertion.encryption', NULL);
    if ($encryptAssertion === NULL) {
        $encryptAssertion = $idpMetadata->getBoolean('assertion.encryption', FALSE);
    }
    if (!$encryptAssertion) {
        /* We are _not_ encrypting this assertion, and are therefore done. */
        return $assertion;
    }
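The lookup order in the excerpt above can be sketched in a few lines. This is a Python illustration using plain dicts in place of the SimpleSAML_Configuration objects; the function name is hypothetical.

```python
def should_encrypt_assertion(sp_metadata, idp_metadata):
    """Mirror the fallback order of the PHP excerpt above."""
    # 1. The SP metadata wins whenever it sets the option at all.
    encrypt = sp_metadata.get("assertion.encryption")
    if encrypt is None:
        # 2. Otherwise fall back to the IdP metadata, defaulting to False.
        encrypt = idp_metadata.get("assertion.encryption", False)
    return bool(encrypt)


print(should_encrypt_assertion({}, {}))
print(should_encrypt_assertion({"assertion.encryption": False},
                               {"assertion.encryption": True}))
```

Both calls report that no encryption is applied: the first because neither side sets the option, the second because the SP setting overrides the IdP one.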
The issue was with something else. I just reused the code that processed POST binding to also process Redirect binding; but with Redirect binding, the payload is deflated, so the code for POST cannot be reused directly.

How do I log asynchronous thin+sinatra+rack requests?

I'm writing my first Sinatra-based web app as a frontend to another TCP-based service, using EventMachine and async_sinatra to process incoming HTTP requests asynchronously. When I'm testing my app, all requests to synchronous routes are logged to stdout in common log format, but asynchronous requests are not.
I've read through bits of the source code to async_sinatra, Sinatra, Thin, and Rack, and it looks like logging of synchronous requests is done through CommonLogger#call. However, I can't find anywhere in the asynchronous code in async_sinatra or Thin that seems to pass asynchronous requests through the logging middleware (I'm looking at Sinatra::Helpers#body in async_sinatra and at Thin::Connection.post_process which is written into env['async.callback'] in Thin's connection.rb:68 and request.rb:132).
I'm experienced with C but relatively new to Ruby, so if I've used some terminology or notation incorrectly, please correct me. Thanks in advance.
Edit: this also affects error handling. If an exception is raised in an asynchronous request, the request is never finished and the error is never logged.
I eventually found that using rack-async with async_sinatra was causing problems with 404 pages, exception handling, and logging:
!! Unexpected error while processing request: undefined method `bytesize' for nil:NilClass
Instead I used the following wrapper around aroute for logging:
module Sinatra::Async
  alias :oldaroute :aroute
  def aroute verb, path, opts = {}, &block
    # Based on aroute from async_sinatra
    run_method = :"RunA#{verb} #{path} #{opts.hash}"
    define_method run_method, &block
    log_method = :"LogA#{verb} #{path} #{opts.hash}"
    define_method(log_method) { |*a|
      puts "#{request.ip} - #{status} #{verb} #{path}"
    }
    oldaroute verb, path, opts do |*a|
      oldcb = request.env['async.callback']
      request.env['async.callback'] = proc { |*args|
        async_runner(log_method, *a)
        oldcb[*args]
      }
      async_runner(run_method, *a)
    end
  end
end
This is for the same versions of async_sinatra, Thin, and Rack that I was using when I asked this question last year; newer versions may allow the use of common Rack middleware for logging.
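The core trick in that wrapper is swapping the completion callback for one that logs first and then delegates to the original. A minimal Python sketch of that idea, with a plain dict standing in for the Rack env and made-up names throughout:

```python
def with_logging(callback, log):
    """Return a callback that logs its arguments, then calls the original."""
    def wrapped(*args):
        log(*args)
        return callback(*args)
    return wrapped


# Hypothetical stand-ins for the Rack env and the server's completion callback.
sent = []
logged = []
env = {"async.callback": sent.append}

# Replace the callback in place, keeping a reference to the old one inside.
env["async.callback"] = with_logging(
    env["async.callback"],
    lambda response: logged.append(f"status={response[0]}"),
)

# When the asynchronous route finishes, it calls the (now wrapped) callback.
env["async.callback"]((200, {}, ["hello"]))
print(logged, sent)
```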
I am running on sinatra-synchrony and therefore I have a slightly different core than you.
But basically I solved the same problem.
Here is a summary of the solution:
I am not using Rack::CommonLogger, I use my own Logger
You need to buffer log output in an async aware storage
The buffered log output must be flushed at the end of the request
In my sinatra-synchrony application I am running the following middleware for logging:
# in app.rb I register Logger::Middleware as the first middleware
use Logger::Middleware
# in logger.rb
module Logger
  attr_accessor :messages
  def log(message)
    stack << message
  end
  def stack
    # This is the important async awareness
    # It stores messages for each fiber separately
    messages[Fiber.current.object_id] ||= []
  end
  def flush
    STDERR.puts stack.join("\n") unless stack.empty?
    messages.delete Fiber.current.object_id
  end
  extend self
  class Middleware
    def initialize(app)
      @app = app
    end
    def call(env)
      # before the request
      Logger.log "#{env['REQUEST_METHOD']} #{env['REQUEST_URI']}"
      result = @app.call(env)
      # after the request
      Logger.flush
      result
    end
  end
end
Logger.messages = {} # initialize the message storage
Everywhere in the application I am able to use Logger.log("message") for logging.
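The same buffer-then-flush pattern, sketched in Python under a few assumptions: the current thread id stands in for Fiber.current.object_id, the middleware follows the WSGI calling convention rather than Rack's, and all class and function names are hypothetical. Unlike the Ruby version, this sketch flushes in a finally block so the buffer is also written when the app raises.

```python
import io
import sys
import threading


class BufferedLogger:
    """Collect log lines per thread and write them out in one piece on flush()."""

    def __init__(self, stream=sys.stderr):
        self.stream = stream
        self.messages = {}

    def _stack(self):
        # One buffer per thread, so concurrent requests do not interleave.
        return self.messages.setdefault(threading.get_ident(), [])

    def log(self, message):
        self._stack().append(message)

    def flush(self):
        lines = self.messages.pop(threading.get_ident(), [])
        if lines:
            self.stream.write("\n".join(lines) + "\n")


class LoggingMiddleware:
    """WSGI-style wrapper: log the request line first, flush when the app is done."""

    def __init__(self, app, logger):
        self.app = app
        self.logger = logger

    def __call__(self, environ, start_response):
        self.logger.log(f"{environ.get('REQUEST_METHOD')} {environ.get('PATH_INFO')}")
        try:
            return self.app(environ, start_response)
        finally:
            self.logger.flush()


def demo():
    out = io.StringIO()
    logger = BufferedLogger(stream=out)

    def app(environ, start_response):
        logger.log("handling request")
        return [b"ok"]

    wrapped = LoggingMiddleware(app, logger)
    body = wrapped({"REQUEST_METHOD": "GET", "PATH_INFO": "/hello"}, None)
    return body, out.getvalue()


print(demo())
```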
