In Airflow, cannot check for equality using config variable passed - airflow

Our DAG
import func1
import func2
all_tables = [
'tablea',
'tableb',
'tablec',
'tabled'
]
with TABLE_MONGO_LOAD:
# Pass "table" parameter
this_table = "{{ dag_run.conf['table'] }}"
this_function = func1 if this_table in all_tables else func2
get_local_json = PythonOperator(
task_id=f'bq_to_local',
python_callable=this_function,
op_kwargs={ 'table_name': this_table, 'file_name': this_table
)
...
We trigger this DAG manually, and pass { "table": "tablea" }. We want this_function = func1 if this_table in all_tables else func2 to lead to this_function equalling func, because tablea is in fact in all_tables, however everytime we run this DAG, func2 is returned. No matter what we pass to the config, this_function = func1 if this_table in all_tables else func2 always returns the else case.
Is this_table = "{{ dag_run.conf['table'] }}" not returning a string? Even when I do a super basic check:
if this_table == 'tablea':
print('a')
else:
print('b')
this always prints b even if the config passed is { "table": "tablea" }. What's going on here?

the line "this_function = func1 if this_table in all_tables else func2" is running before the dag is running while airflow is interpreting that the dag is valid.
in order to check while running the DAG you should use BranchPythonOperator

Related

Run testthat test in separate R session (how to combine the outcomes)

I need to test package loading operations (for my multiversion package) and know that unloading namespaces and stuff is dangerous work. So I want to run every test in a fresh R session. Running my tests in parallel does not meet this demand since it will reuse slaves, and these get dirty.
So I thought callr::r would help me out. Unfortunately I am again stuck with the minimally documented reporters it seems.
The following is a minimal example. Placed in file test-mytest.R.
test_that('test 1', {
expect_equal(2+2, 5)
})
reporter_in <- testthat::get_reporter()
# -- 1 --
reporter_out <- callr::r(
function(reporter) {
reporter <- testthat::with_reporter(reporter, {
testthat::test_that("test inside", {
testthat::expect_equal('this', 'wont match')
})
})
},
args = list(reporter = reporter_in),
show = TRUE
)
# -- 2 --
testthat::set_reporter(reporter_out)
# -- 3 --
test_that('test 2', {
expect_equal(2+2, 8)
})
I called this test file using:
# to be able to check the outcome, work with a specific reporter
summary <- testthat::SummaryReporter$new()
testthat::test_file('./tests/testthat/test-mytest.R', reporter = summary)
Which seems to do what I want, but when looking at the results...
> summary$end_reporter()
== Failed ===============================================================================================
-- 1. Failure (test-load_b_pick_last_true.R:5:5): test 1 ------------------------------------------------
2 + 2 (`actual`) not equal to 5 (`expected`).
`actual`: 4
`expected`: 5
== DONE =================================================================================================
...it is only the first test that is returned.
How it works:
An ordinary test is executed.
The reporter, currently in use, is obtained (-- 1 --)
callr::r is used to call a testthat block including a test.
Within the call, I tried using set_reporter, but with_reporter is practically identical.
The callr::r call returns the reporter (tried it with get_reporter(), but with_reporter also returns the reporter (invisibly))
Now the returned reporter seems fine, but when setting it as the actual reporter with set_reporter, it seems that it is not overwriting the actual reporter.
Note that at -- 2 --, the reporter_out contains both test outcomes.
Question
I am not really sure what I expect it to do, but in the end I want the results to be added to the original reporter ((summary or) reporter_in that is, if that is not some kind of copy).
One workaround I can think of would be to move the actual test execution outside of the callr::r call, but gather the testcases inside.
I think it is neat, as long as you can place these helper functions (see the elaborate example) in your package, you can write tests with little overhead.
It doesn't answer how to work with the 'reporter' object though...
Simple example:
test_outcome <- callr::r(
function() {
# devtools::load_all()
list(
check1 = mypackage::sum(5,5), # some imaginary exported functions sum and name.
check2 = mypackage::name()
)
}
)
test_that('My test case', {
expect_equal(test_outcome$check1, 10)
expect_equal(test_outcome$check2, 'Siete')
})
Elaborate example
Note that from .add_test to .exp_true are only function definitions which can better be included in your package so they will be available when being loaded with devtools::load_all(). load_all also loads not-exported functions by default.
test_outcome <- callr::r(
function() {
# devtools::load_all()
# Defining helper functions
tst <- list(desc = 'My first test', tests = list())
.add_test <- function(type, A, B) {
# To show at least something about what is actually tested when returning the result, we can add the actual `.exp_...` call to the test.
call <- as.character(sys.call(-1))
tst$tests[[length(tst$tests) + 1]] <<- list(
type = type, a = A, b = B,
# (I couldn't find a better way to create a nice call string)
call = paste0(call[1], '(', paste0(collapse = ', ', call[2:length(call)]), ')'))
}
.exp_error <- function(expr, exp_msg) {
err_msg <- ''
tryCatch({expr}, error = function(err) {
err_msg <<- err$message
})
.add_test('error', err_msg, exp_msg)
}
.exp_match <- function(expr, regex) {
.add_test('match', expr, regex)
}
.exp_equal <- function(expr, ref) {
.add_test('equal', expr, ref)
}
.exp_false <- function(expr) {
.add_test('false', expr, FALSE)
}
.exp_true <- function(expr) {
.add_test('true', expr, TRUE)
}
# Performing the tests
.exp_match('My name is Siete', 'My name is .*')
.exp_equal(mypackage::sum(5,5), 10) # some imaginary exported functions sum and name.
.exp_match(mypackage::name(), 'Siete')
.exp_false('package:testthat' %in% search())
return(tst)
},
show = TRUE)
# Performing the actual testthat tests:
.run_test_batch <- function(test_outcome) {
test_that(test_outcome$desc, {
for (test in test_outcome$tests) {
# 'test' is a list with the fields 'type', 'a', 'b' and 'call'.
# Where 'type' can contain 'match', 'error', 'true', 'false' or 'equal'.
if (test$type == 'equal') {
with(test, expect_equal(a, b, label = call))
} else if (test$type == 'true') {
expect_true( test$a, label = test$call)
} else if (test$type == 'false') {
expect_false(test$a, label = test$call)
} else if (test$type %in% c('match', 'error')) {
with(test, expect_match(a, b, label = call))
}
}
})
}
.run_test_batch(test_outcome)
When moving the functions to your package you would need the following initialize function too.
tst <- new.env(parent = emptyenv())
tst$desc = ''
tst$tests = list()
.initialize_test <- function(desc) {
tst$desc = desc
tst$tests = list()
}
It works as follows:
An empty list is created: tst
By calling .exp_... functions, tests are added to that list
The list with tests is returned by the function in callr::r
Then we loop over the list and execute every test

How to return event$data in rstudio/websocket

I am trying to extend websocket::Websocket with a method that sends some data and returns the message, so that I can assign it to an object. My question is pretty much identical to https://community.rstudio.com/t/capture-streaming-json-over-websocket/16986. Unfortunately, the user there never revealed how they solved it themselves. My idea was to have the onMessage method return the event$data, i.e. something like:
my_websocket <- R6::R6Class("My websocket",
inherit = websocket::WebSocket,
public = list(
foo = function(x) {
msg <- super$send(paste("x"))
return(msg)
} )
)
load_websocket <- function(){
ws <- my_websocket$new("ws://foo.local")
ws$onMessage(function(event) {
return(event$data)
})
return(ws)
}
my_ws <- load_websocket()
my_ws$foo("hello") # returns NULL
but after spending a good hour on the Websocket source code, I am still completely in the dark as to where exactly the callback happens, "R environment wise".
You need to use super assignment operator <<-. <<- is most useful in conjunction with closures to maintain state. Unlike the usual single arrow assignment (<-) that always works on the current level, the double arrow operator can modify variables in parent levels.
my_websocket <- R6::R6Class("My websocket",
inherit = websocket::WebSocket,
public = list(
foo = function(x) {
msg <<- super$send(paste("x"))
return(msg)
} )
)
load_websocket <- function(){
ws <- my_websocket$new("ws://foo.local")
ws$onMessage(function(event) {
return(event$data)
})
return(ws)
}
my_ws <- load_websocket()
my_ws$foo("hello")

tryCatch block in R, change value of outer variable

Here is my code. It produces infinite loop, because value of something variable does not change within captured error. Is it supposed to be this way? How can I fix it so that value of something changes to FALSE?
something <- TRUE
counter <- 1
while(something){
print(counter)
tryTest = tryCatch(
{
arima(rep(1,3), order = c(1,0,0))
},
warning = function(w) {
print('this is warning')
print(w)
},
error = function(e) {
something <- FALSE
print('this is error')
print(e)
},
finally = {}
)
counter <- (counter +1)
}
This happens because the environment of something in the outer code is different from the environment of something in your lambda:
function(e) {
something <- FALSE
print('this is error')
print(e)
}
So setting something <- FALSE in your lambda actually sets a different something then in the outer code. To fix this you can either 1) make something a global variable or 2) create an environment to use something in.
1)
assign("something", TRUE, env=globalenv())
to set the variable and
get("something", env=globalenv())
to access the variable.
You can also set something inside your lambda in the same way:
assign("something", FALSE, env=globalenv())
or
2)
First create a new variable:
env=new.env()
Then set and access your variable in a similar way as before:
assign("something", TRUE, env=env)
get("something", env=env)
You can assign something inside your lambda with:
assign("something", FALSE, env=env)
Using env is possible because R copies variables to child environments. However if you set a variable in a child environment (like when you did `somethi

R get the environment created by a function at the call

I would like to get the environment created by a function when it is runned WITHOUT modifying the function source (ie from outside of the function), is it possible ?
fn=function()
{#Here a new environment is created at each call, how to get it ?
#This environment can be access with environment() but only (to what I know)
#from inside the function
...
}
I would like something like this:
env=some_function(fn())
where env is the environment id created by fn at the call.
You could trace the function to bind the call environment to a symbol in the global environment:
fn <- function() {x <- 2; 1}
trace(fn, quote(efn <<- environment()), at = 1)
fn()
#Tracing fn() step 1
#[1] 1
untrace(fn)
efn$x
#[1] 2

Is there a way to run an expression on.exit() but only if completes normally, not on error?

I'm aware of the function on.exit in R, which is great. It runs the expression when the calling function exits, either normally or as the result of an error.
What I'd like is for the expression only to be run if the calling function returns normally, but not in the case of an error. I have multiple points where the function could return normally, and multiple points where it could fail. Is there a way to do this?
myfunction = function() {
...
on.exit( if (just exited normally without error) <something> )
...
if (...) then return( point 1 )
...
if (...) then return( point 2 )
...
if (...) then return( point 3 )
...
return ( point 4 )
}
The whole point of on.exit() is exactly to be run regardless of the exit status. Hence it disregards any error signal. This is afaik equivalent to the finally statement of the tryCatch function.
If you want to run code only on normal exit, simply put it at the end of your code. Yes, you'll have to restructure it a bit using else statements and by creating only 1 exit point, but that's considered good coding practice by some.
Using your example, that would be:
myfunction = function() {
...
if (...) then out <- point 1
...
else if (...) then out <- point 2
...
else if (...) then out <- point 3
...
else out <- point 4
WhateverNeedsToRunBeforeReturning
return(out)
}
Or see the answer of Charles for a nice implementation of this idea using local().
If you insist on using on.exit(), you can gamble on the working of the traceback mechanism to do something like this :
test <- function(x){
x + 12
}
myFun <- function(y){
on.exit({
err <- if( exists(".Traceback")){
nt <- length(.Traceback)
.Traceback[[nt]] == sys.calls()[[1]]
} else {FALSE}
if(!err) print("test")
})
test(y)
}
.Traceback contains the last call stack resulting in an error. You have to check whether the top call in that stack is equal to the current call, and in that case your call very likely threw the last error. So based on that condition you can try to hack yourself a solution I'd never use myself.
Just wrap the args of all your return function calls with the code that you want done. So your example becomes:
foo = function(thing){do something; return(thing)}
myfunction = function() {
...
if (...) then return( foo(point 1) )
...
if (...) then return( foo(point 2) )
...
if (...) then return( foo(point 3) )
...
return ( foo(point 4) )
}
Or just make each then clause into two statements. Using on.exit to lever some code into a number of places is going to cause spooky action-at-a-distance problems and make the baby Dijkstra cry (read Dijkstra's "GOTO considered harmful" paper).
Bit more readable version of my comment on #Joris' answer:
f = function() {
ret = local({
myvar = 42
if (runif(1) < 0.5)
return(2)
stop('oh noes')
}, environment())
# code to run on success...
print(sprintf('myvar is %d', myvar))
ret
}
I guess there is not a clean way yet. I usually create an OK variable at the beginning as FALSE and turn it to TRUE at the end. I prefer on.exit over isolating all my code into a tryCatch.
myfun = function() {
OK=F # the flag "OK" will be FALSE until the function ends OK
conn = my.db.connection.function()
dbBegin(conn)
on.exit({
if(OK) dbCommit(conn) else dbRollback(conn)
dbDisconnect(conn)
})
# ... Your code. You can edit database as a transaction.
# if anything fails in R or in the database a rollback will occur
OK=T # only if the code came to the end everything went ok, so we set the flag OK as TRUE
return(NULL)
}

Resources