airflow ValueError: Not a valid timetable

airflow ValueError: Not a valid timetable - airflow

I am relatively new to airflow & currently running 2.2.3. When trying to set up a costume timetable I was getting the error :
airflow ValueError: Not a valid timetable
Following the discussion here: https://github.com/apache/airflow/issues/19578
I changed a line in the my version of the airflow/models/dag.py that you get on installing airflow.
I change the infer_automated_data_interval function as so
def infer_automated_data_interval(self, logical_date: datetime) -> DataInterval:
"""Infer a data interval for a run against this DAG.
This method is used to bridge runs created prior to AIP-39
implementation, which do not have an explicit data interval. Therefore,
this method only considers ``schedule_interval`` values valid prior to
Airflow 2.2.
DO NOT use this method is there is a known data interval.
"""
timetable_type = type(self.timetable)
if issubclass(timetable_type, (NullTimetable, OnceTimetable)):
return DataInterval.exact(timezone.coerce_datetime(logical_date))
start = timezone.coerce_datetime(logical_date)
if issubclass(timetable_type, CronDataIntervalTimetable):
end = cast(CronDataIntervalTimetable, self.timetable)._get_next(start)
elif issubclass(timetable_type, DeltaDataIntervalTimetable):
end = cast(DeltaDataIntervalTimetable, self.timetable)._get_next(start)
else:
end = logical_date # I added this to avoid raising the value error
#raise ValueError(f"Not a valid timetable: {self.timetable!r}") # I commented this out
return DataInterval(start, end)
As far as I can see the jobs are now running on the expected schedule - but this feels like a very hacky and dangerous solution. Has anyone else had this issue?
What is this function supposed to be checking and why it's failing?
Thanks!

Related

Reticulate AWS Cogntito

This is my Python code (that I've checked and it works):
from warrant.aws_srp import AWSSRP
def auth(USERNAME,PASSWORD):
client = boto3.client('cognito-idp',region_name=region_name)
aws = AWSSRP(username=USERNAME, password=PASSWORD, pool_id=POOL_ID,
client_id=CLIENT_ID,client=client)
try:
tokens = aws.authenticate_user()
return(tokens)
except Exception as e:
return(e)
I'm working with R in order to create a visual interface for doing some operation (including this one) and it is a requirement.
I use the reticulare R package to execute Python code. I tested it with some dummy code in order to check the correct functioning (and it is okay).
When i execute the above function by running:
reticulate::source_python(FILE_PATH)
py$auth(USERNAME,PASSWORD)
i get the following error:
An error occurred (InvalidParameterException) when calling the RespondToAuthChallenge operation: TIMESTAMP format should be EEE MMM d HH:mm:ss z yyyy in english.
I tried to search a lot but I found nothing, I suppose that can exist a sort of wrapper or formatter. Maybe someone as already face this problem...
Thank a lot of any help.

Pytest failing on file open command string assert - what's the best way to test this?

I am constructing a command to pass to requests library to Post an attachment - as in
files= attachment = {"attachment": ("image.png", open("C:\tmp\sensor.png", "rb"), "image/png")}
The code is working but I cannot get PyTest to test it as -is because of the open command which is executed when evaluated. Here is simplified code of the problem
import pytest
def openfile():
cmd = {"cmd": open(r"C:\tmp\sensor.png")}
return cmd
def test_openfile():
cmd = openfile()
#assert str(cmd) == str({"cmd": open(r"C:\tmp\sensor.png")}) # this works
assert cmd == {"cmd": open(r"C:\tmp\sensor.png")} # this does not
PyTest complains that the two side are different but then confirms they are the same in the diff panel!
Expected :{'cmd': <_io.TextIOWrapper name='C:\tmp\sensor.png' mode='r' encoding='cp1252'>}
Actual :{'cmd': <_io.TextIOWrapper name='C:\tmp\sensor.png' mode='r' encoding='cp1252'>}
'Click to see difference' - Opening diff panel reports 'Contents are identical'!
I can just stick with comparing the generated string with expected string but am wondering if there is a better way to do this.
Ideas?

You need to test the properties of the actual file buffer that is returned by the open call, instead of the references to that buffer, for example:
def test_openfile():
cmd = openfile()
expected_filename = r"C:\tmp\sensor.png"
assert "cmd" in cmd
file_cmd = cmd["cmd"]
assert file_cmd.name == expected_filename
with open(expected_filename) as f:
contents = f.read()
assert file_cmd.read() == contents
Note that in a test you may not have the file contents, or have them in another place like a fixture, so testing the file contents may have to be adapted, or may not be needed, depending on what you want to test.

After talking this through with a friend I think my original approach is perfectly valid. For anyone that trips over this question here's why:
I am trying to pytest building of an executable parameter to pass to another library for execution. The execution of the parameter is not relevant, just that it is correctly formatted. The test is to compare what is generated with the expected parameter ( as if I typed it) .
Therefore casting to string or json and comparing is appropriate since that is what a human does to manually check the code!

Airflow Get Retry Number

In my Airflow DAG I have a task that needs to know if it's the first time it's ran or if it's a retry run. I need to adjust my logic in the task if it's a retry attempt.
I have a few ideas on how I could store the number of retries for the task but I'm not sure if any of them are legitimate or if there's an easier built in way to get this information within the task.
I'm wondering if I can just have an integer variable inside the dag that I append every time the task runs. Then if the task if reran I could check the value of the variable to see that it's greater than 1 and hence would be a retry run. But I'm not sure if mutable global variables work that way in Airflow since there can be multiple workers for different tasks (I'm not sure on this though).
Write it in an XCOM variable?

The retry number is available from the task instance, which is available via the macro {{ task_instance }}. https://airflow.apache.org/code.html#default-variables
If you are using the python operator simply add provide_context=True, to your operator kwargs, and then in the callable do kwargs['task_instance'].try_number
Otherwise you can do something like:
t = BashOperator(
task_id='try_number_test',
bash_command='echo "{{ task_instance.try_number }}"',
dag=dag)
Edit:
When the task instance is cleared, it will set the max_retry number to be the current try_number + retry value. So you could do something like:
ti = # whatever method you do to get the task_instance object
is_first = ti.max_tries - ti.task.retries + 1 == ti.try_number
Airflow will increments the try_number by 1 when running, so I imagine you'd need the + 1 when subtracting the max_tries from the configured retry value. But I didn't test that to confirm

#cwurtz answer was spot on. I was able to use it like this:
def _get_actual_try_number(self, context):
'''
Returns the real try_number that you also see in task details or logs.
'''
return context['task_instance'].try_number
def _get_relative_try_number(self, context):
'''
When a task is cleared, the try_numbers continue to increment.
This returns the try number relative to the last clearing.
'''
ti = context['task_instance']
actual_try_number = self._get_actual_try_number(context)
# When the task instance is cleared, it will set the max_retry
# number to be the current try_number + retry value.
# From https://stackoverflow.com/a/51757521
relative_first_try = ti.max_tries - ti.task.retries + 1
return actual_try_number - relative_first_try + 1

RED Robot Editor - One TestCase Dependent To Other [duplicate]

I have a large number of test cases, in which several test cases are interdependent. Is it possible that while a later test case is getting executed you can find out the status of a previously executed test case?
In my case, the 99th test case depends on the status of some prior test cases and thus, if either the 24th or the 38th fails I would like the 99th test case NOT to get executed at all and thus save me a lot of time.
Kindly, explain with some example if possible. Thanks in advance!

Robot is very extensible, and a feature that was introduced in version 2.8.5 makes it easy to write a keyword that will fail if another test has failed. This feature is the ability for a library to act as a listener. With this, a library can keep track of the pass/fail status of each test. With that knowledge, you can create a keyword that fails immediately if some other test fails.
The basic idea is, cache the pass/fail status as each test finishes (via the special _end_test method). Then, use this value to determine whether to fail immediately or not.
Here's an example of how to use such a keyword:
*** Settings ***
Library /path/to/DependencyLibrary.py
*** Test Cases ***
Example of a failing test
fail this test has failed
Example of a dependent test
[Setup] | Require test case | Example of a failing test
log | hello, world
Here is the library definition:
from robot.libraries.BuiltIn import BuiltIn
class DependencyLibrary(object):
ROBOT_LISTENER_API_VERSION = 2
ROBOT_LIBRARY_SCOPE = "GLOBAL"
def __init__(self):
self.ROBOT_LIBRARY_LISTENER = self
self.test_status = {}
def require_test_case(self, name):
key = name.lower()
if (key not in self.test_status):
BuiltIn().fail("required test case can't be found: '%s'" % name)
if (self.test_status[key] != "PASS"):
BuiltIn().fail("required test case failed: '%s'" % name)
return True
def _end_test(self, name, attrs):
self.test_status[name.lower()] = attrs["status"]

To solve this problem I'm using something like this:
Run Keyword if '${PREV TEST STATUS}'=='PASSED' myKeyword
so maybe this will be usable also for you.

R - doRedis - Overwrite getTask to control the order of execution in parallel foreach loops

Problem: I need to control the order of execution in which tasks are processed in parallel by a foreach loop. Unfortunately, this is not supported by foreach.
Solution in mind: Using doRedis to use the database to hold all tasks, that are executed in the foreach loop. To control the order I want to overwrite getTask by setGetTask to get the tasks based on pre-specified order. Though I could not find to much documentation on how to do this.
Additional Information:
There is a small paragraph on setGetTask with an example in the redis documentation.
getTask <- function ( queue , job_id , ...)
{
key <- sprintf("
redisEval("local x=redis.call('hkeys',KEYS[1])[1];
if x==nil then return nil end;
local ans=redis.call('hget',KEYS[1],x);
redis.call('hdel',KEYS[1],x);i
return ans",key)
}
setGetTask(getTask)
I though think the code in the documentation is syntactically not correct (missing imho a " and a closing bracket ")"). I thought this is not possible on CRAN, as the code for the documentation is executed on submission.
Changing the getTask function does not change anything in regard of the workers getting tasks (even if introducing obvious non-sense into the redisEval like changing it to redisEval("dddddddddd(((")
I only had access to the setGetTask function after installing the package from source (which I downloaded from the official CRAN package page of version 1.1.1 (which imho should make no difference than installing it directly from CRAN)
Data: The Dataframe of tasks to execute looks the following:
taskName;taskQueuePosition;parameter1;paramterN
taskT;1;val1;10
taskK;2;val2;8
taskP;3;val3;7
taskA;4;val4;7
I want to use 'taskQueuePosition' to control the order, tasks with lower numbers should be executed first.
Questions:
Does anybody know any sources where I can get more information on doing this with doRedis or on setGetTask?
Does anybody know how I need to change getTask to achieve the above described?
Any other smart ideas to control the order of execution in a foreach loop? Preferably so that at some point I can use doRedis as parallel back end (changing this would mean a major change in the processing due to complicated technical infrastructure reasons).
Code (for easy reproduction):
The following assumes that the redis-server is started on the local machine.
Redis DB Filling:
library(doRedis)
library(foreach)
options('redis:num'=TRUE) # needed for proper execution
REDIS_JOB_QUEUE = "jobs"
registerDoRedis(REDIS_JOB_QUEUE)
# filling up the data frame
taskDF = data.frame(taskName=c("taskT","taskK","taskP","taskA"),
taskQueuePosition=c(1,2,3,4),
parameter1=c("val1","val2","val3","val4"),
parameterN=c(10,8,7,7))
foreach(currTask=iter(taskDF, by='row'),
.verbose = T
) %dopar% {
print(paste("Executing task: ",currTask$taskName))
Sys.sleep(currTask$parameterN)
}
removeQueue(REDIS_JOB_QUEUE)
Worker:
library(doRedis)
REDIS_JOB_QUEUE = "jobs"
startLocalWorkers(n=1, queue=REDIS_JOB_QUEUE)

I could solve the problem and now can control the order of task execution.
Additional information:
1. There seems to be a typo in the documentation, that renders the getTask example not working. By considering the form of the default_getTask function from the file task.R in the package, it should look probably something like:
getTaskDefault <- function ( queue , job_id , ...)
{
key <- sprintf("%s:%s",queue, job_id)
return(redisEval("local x=redis.call('hkeys',KEYS[1])[1];
if x==nil then return nil end;
local ans=redis.call('hget',KEYS[1],x);
redis.call('set', KEYS[1] .. '.start.' .. x, x);
redis.call('hdel',KEYS[1],x);
return ans",key))
}
It seems that the letters behind first percent sign in the first line of the function got lost. This would explain the uneven number of brackets and quotes.
2) setGetTask still does not have any effect for me. When I set the getTask function though through .option while the DB is filled (like it is described in the vignette of the package) it is successfully called.
3) The information on 2) means that I do not need the getTask function, so I can use the package from CRAN.
----- Questions -----
1) The doRedis vignette describes how a custom getTask can be successfully set.
2 and 3) When the LUA script in getTask function is modified like below, the tasks are drawn from the database in the way they are submitted. This is not exactly what I was asking for, but due to time restraints and the fact I have (or better had) not the first idea about LUA script, it is imho a satisfying solution to control the order of submission by the taskQueuePosition column.
getTaskInOrder <- function ( queue , job_id , ...)
{
key <- sprintf("%s:%s",queue, job_id)
return(redisEval("
local tasks=redis.call('hkeys',KEYS[1]); -- get all tasks
local x=tasks[1]; -- get first task available task
if x==nil then -- if there are no tasks left, stop processing
return nil
end;
local xMin = 65535; -- if we have more tasks than 65535, getting the
-- task with the lowest taskID is not guaranteed to be the first one
local i = 1;
-- local iMinFound = -1;
while (x ~= nil) do -- search the array until there are no tasks left
-- print('x: ',x)
local xNum = tonumber(x);
if(xNum<xMin) then
xMin = xNum;
-- iMinFound = i;
end
i=i+1;
-- print('i is now: ',i);
x=tasks[i];
end
-- print('Minimum is task number',xMin,' found at i ', iMinFound)
x=tostring(xMin) -- convert it back to a string (maybe it would
-- be better to keep the original string somewhere,
-- in case we loose some information whilst converting to number)
-- print('x is now:',x);
-- print(KEYS[1] .. '.start.' .. x, x);
-- print('');
local ans=redis.call('hget',KEYS[1],x);
redis.call('set', KEYS[1] .. '.start.' .. x, x);
redis.call('hdel',KEYS[1],x);
return ans",key))
}
Important note: I noticed that if a task is aborted, the order is screwed up and the resubmitted task (even though the task number remains the same), will be executed after the originally submitted tasks. This is okay for me.
------ Code (for easy reproduction):------
This leads to the following code example (with 12 entries in the task data frame, instead the original 4):
Redis DB Filling:
library(doRedis)
library(foreach)
options('redis:num'=TRUE) # needed for proper execution
REDIS_JOB_QUEUE = "jobs"
getTaskInOrder <- function ( queue , job_id , ...)
{
...like above
}
registerDoRedis(REDIS_JOB_QUEUE)
# filling up the data frame already in order of tasks to be executed
# otherwise the dataframe has to be sorted by taskQueuePosition
taskDF = data.frame(taskName=c("taskA","taskB","taskC","taskD","taskE","taskF","taskG","taskH","taskI","taskJ","taskK","taskL"),
taskQueuePosition=c(1,2,3,4,5,6,7,8,9,10,11,12),
parameter1=c("val1","val2","val3","val4","val1","val2","val3","val4","val1","val2","val3","val4"),
parameterN=c(5,5,5,4,4,4,4,3,3,3,2,2))
foreach(currTask=iter(taskDF, by='row'),
.verbose = T,
.options.redis = list(getTask = getTaskInOrder
) %dopar% {
print(paste("Executing task: ",currTask$taskName))
Sys.sleep(currTask$parameterN)
}
removeQueue(REDIS_JOB_QUEUE)
Worker:
library(doRedis)
REDIS_JOB_QUEUE = "jobs"
startLocalWorkers(n=1, queue=REDIS_JOB_QUEUE)
Another note: just in case you are processing long jobs, as I do, please notice a bug in redis 1.1.1 (the current version on CRAN), which leads to tasks being resubmitted (due to a timeout) despite the workers still working on them.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

airflow ValueError: Not a valid timetable - airflow

Related

Reticulate AWS Cogntito

Pytest failing on file open command string assert - what's the best way to test this?

Airflow Get Retry Number

RED Robot Editor - One TestCase Dependent To Other [duplicate]

R - doRedis - Overwrite getTask to control the order of execution in parallel foreach loops

Categories

Resources