I have an Airflow DAG that uses the following Jinja template: "{{ execution_date.astimezone('Etc/GMT+6').subtract(days=1).strftime('%Y-%m-%dT00:00:00') }}"
This template works in other DAGs, and it also works when the DAG's schedule_interval is set to timedelta(hours=1). However, when we set the schedule interval to 0 8 * * *, it throws the following traceback at runtime:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 1426, in _run_raw_task
self.render_templates()
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 1790, in render_templates
rendered_content = rt(attr, content, jinja_context)
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 2538, in render_template
return self.render_template_from_field(attr, content, context, jinja_env)
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 2520, in render_template_from_field
for k, v in list(content.items())}
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 2520, in <dictcomp>
for k, v in list(content.items())}
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 2538, in render_template
return self.render_template_from_field(attr, content, context, jinja_env)
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 2514, in render_template_from_field
result = jinja_env.from_string(content).render(**context)
File "/usr/lib64/python2.7/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/usr/lib64/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "<template>", line 1, in top-level template code
TypeError: astimezone() argument 1 must be datetime.tzinfo, not str
It appears the execution date being passed in is a string, not a datetime object, but I can only hit this error in this specific DAG and no others. I've tried deleting the DAG entirely and recreating it, with no luck.
It looks like the astimezone(..) call is the problem: it expects a datetime.tzinfo, but you are passing it a str argument ('Etc/GMT+6'):
TypeError: astimezone() argument 1 must be datetime.tzinfo, not str
I couldn't make the exact expression work, but I believe the following achieves much the same effect:
{{ execution_date.in_timezone("US/Eastern") - macros.timedelta(days=1) }}
Recall that:
the execution_date macro is a Pendulum object;
in_timezone(..) converts it to the given timezone (the result is still a Pendulum object, which subclasses datetime.datetime);
then we just subtract a datetime.timedelta(days=1) from it.
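For comparison, here is a stdlib-only sketch (not Airflow's rendering code) of what the original template computes. It assumes a fixed UTC-06:00 offset can stand in for "Etc/GMT+6"; that zone has no DST, so the substitution is exact. It shows that plain datetime.astimezone needs a tzinfo object rather than a zone name, whereas Pendulum's in_timezone accepts the name as a string. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# "Etc/GMT+6" is a fixed UTC-06:00 zone with no DST (the tz database inverts
# the sign in Etc/GMT names), so a fixed-offset tzinfo is an exact stand-in.
ETC_GMT_PLUS_6 = timezone(timedelta(hours=-6), "Etc/GMT+6")


def previous_local_day(execution_date: datetime) -> str:
    """Mirror of the template: shift to UTC-6, go back one day, keep only the date."""
    local = execution_date.astimezone(ETC_GMT_PLUS_6)  # must be a tzinfo object
    return (local - timedelta(days=1)).strftime("%Y-%m-%dT00:00:00")


def astimezone_accepts_string() -> bool:
    """Plain datetime.astimezone rejects a zone *name*, as in the traceback."""
    try:
        datetime(2021, 5, 25, tzinfo=timezone.utc).astimezone("Etc/GMT+6")
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    run_at = datetime(2021, 5, 25, 8, 0, tzinfo=timezone.utc)
    print(previous_local_day(run_at))
    print(astimezone_accepts_string())
```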
Say I have the following DAG (stuff omitted for clarity)
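#dag.py
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def main():
    print("Task 1")
    # some code
    print("Task 2")
    # some more code
    print("Done")
    return 0


with DAG("test_dag", start_date=datetime(2021, 5, 1), schedule_interval=None) as dag:
    t1 = PythonOperator(task_id="clean_data", python_callable=main)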
Say the program fails at "# some more code" due to, e.g., RAM issues. I just get an error like this in my log:
[2021-05-25 12:49:54,211] {process_utils.py:137} INFO - Output:
[2021-05-25 12:52:44,605] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 493, in execute
super().execute(context=serializable_context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 117, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 531, in execute_callable
string_args_filename,
File "/usr/local/lib/python3.6/site-packages/airflow/utils/process_utils.py", line 145, in execute_in_subprocess
raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['/tmp/venv2wbjnabi/bin/python', '/tmp/venv2wbjnabi/script.py', '/tmp/venv2wbjnabi/script.in', '/tmp/venv2wbjnabi/script.out', '/tmp/venv2wbjnabi/string_args.txt']' died with <Signals.SIGKILL: 9>.
[2021-05-25 13:00:55,733] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=test_dag, task_id=clean_data, execution_date=20210525T105621, start_date=20210525T105732, end_date=20210525T110055
[2021-05-25 13:00:56,555] {local_task_job.py:146} INFO - Task exited with return code 1
but none of the print statements appear in the log, so I can't tell where the program failed (I only know now from debugging).
Because of that, I assume Airflow doesn't flush the output before the task is marked as "success". Is there a way to make Airflow flush, or print, at runtime?
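The SIGKILL in the traceback is a likely explanation. Assuming the callable runs in a child process whose stdout is a pipe (the execute_in_subprocess frame suggests it does), Python block-buffers that stdout, and a process killed with SIGKILL never gets to flush its buffer. The stdlib-only sketch below reproduces this outside Airflow; it is POSIX-only because it uses signal.SIGKILL, and captured_output is a hypothetical helper. Passing flush=True to print (or setting PYTHONUNBUFFERED=1 in the child's environment) makes each line leave the process immediately.

```python
import os
import subprocess
import sys

# Child program: prints two lines, then kills itself with SIGKILL, which is
# what the kernel's OOM killer sends to a process that exhausts memory.
CHILD_TEMPLATE = """
import os, signal
print("Task 1", flush={flush})
print("Task 2", flush={flush})
os.kill(os.getpid(), signal.SIGKILL)
print("Done", flush={flush})
"""


def captured_output(flush: bool) -> str:
    """Run the child with stdout on a pipe (like a log-capturing parent) and return what arrived."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # make sure default buffering applies
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_TEMPLATE.format(flush=flush)],
        stdout=subprocess.PIPE,
        env=env,
        universal_newlines=True,
    )
    return proc.stdout


if __name__ == "__main__":
    print(repr(captured_output(flush=False)))  # buffered lines are lost
    print(repr(captured_output(flush=True)))   # flushed lines survive
```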
In PHP, I'm currently working on a parser that makes a small preview of a page from a URL given by the user.
I'd like to retrieve only the title of the page and a little chunk of information (a bit of text).
The project: build a list of metadata for popular WordPress plugins, gathering the first 50 URLs, i.e. 50 plugins of interest. The challenge: I want to fetch the metadata of all the existing plugins. After the fetch, I want to filter out the plugins with the newest timestamps, i.e. those updated most recently. It is all about being up to date...
https://wordpress.org/plugins/wp-job-manager
https://wordpress.org/plugins/ninja-forms
import requests
from bs4 import BeautifulSoup
from concurrent.futures.thread import ThreadPoolExecutor

url = "https://wordpress.org/plugins/browse/popular/{}"


def main(url, num):
    with requests.Session() as req:
        print(f"Collecting Page# {num}")
        r = req.get(url.format(num))
        soup = BeautifulSoup(r.content, 'html.parser')
        link = [item.get("href")
                for item in soup.findAll("a", rel="bookmark")]
        return set(link)


with ThreadPoolExecutor(max_workers=20) as executor:
    futures = [executor.submit(main, url, num)
               for num in [""]+[f"page/{x}/" for x in range(2, 50)]]

allin = []
for future in futures:
    allin.extend(future.result())


def parser(url):
    with requests.Session() as req:
        print(f"Extracting {url}")
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        target = [item.get_text(strip=True, separator=" ") for item in soup.find(
            "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
        head = [soup.find("h1", class_="plugin-title").text]
        new = [x for x in target if x.startswith(
            ("V", "Las", "Ac", "W", "T", "P"))]
        return head + new


with ThreadPoolExecutor(max_workers=50) as executor1:
    futures1 = [executor1.submit(parser, url) for url in allin]

for future in futures1:
    print(future.result())
See the results:
Extracting https://wordpress.org/plugins/tuxedo-big-file-uploads/Extracting https://wordpress.org/plugins/cherry-sidebars/
Extracting https://wordpress.org/plugins/meks-smart-author-widget/
Extracting https://wordpress.org/plugins/wp-limit-login-attempts/
Extracting https://wordpress.org/plugins/automatic-translator-addon-for-loco-translate/
Extracting https://wordpress.org/plugins/event-organiser/
Traceback (most recent call last):
File "/home/martin/unbenannt0.py", line 45, in <module>
print(future.result())
File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/martin/unbenannt0.py", line 34, in parser
"h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
AttributeError: 'NoneType' object has no attribute 'find_next'
Well, I have a serious error:
AttributeError: 'NoneType' object has no attribute 'find_next'
It looks like soup.find("h3", class_="screen-reader-text") has not found anything, so it returned None.
We could either break this line up and only call find_next if there was a result, or use a try/except that catches the AttributeError.
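The first option (break the line up and check for None) could look like the sketch below. It assumes soup is a bs4.BeautifulSoup object and only calls find(), find_next(), find_all() and get_text() on it; parse_plugin_soup and WANTED_PREFIXES are hypothetical names, and returning None for a page without the expected markup is a design choice, not something the original code does.

```python
from typing import List, Optional

# Prefixes the original parser() keeps from the plugin's meta list.
WANTED_PREFIXES = ("V", "Las", "Ac", "W", "T", "P")


def parse_plugin_soup(soup, url: str, limit: int = 8) -> Optional[List[str]]:
    """Return [title, *selected meta lines], or None when the expected markup is missing.

    `soup` is assumed to be a bs4.BeautifulSoup object.
    """
    title_tag = soup.find("h1", class_="plugin-title")
    heading = soup.find("h3", class_="screen-reader-text")
    # Only call find_next() if find() actually returned a tag.
    meta_list = heading.find_next("ul") if heading is not None else None
    if title_tag is None or meta_list is None:
        print(f"Skipping {url}: expected plugin markup not found")
        return None
    items = [li.get_text(strip=True, separator=" ")
             for li in meta_list.find_all("li")[:limit]]
    return [title_tag.text] + [x for x in items if x.startswith(WANTED_PREFIXES)]
```

Inside parser(), everything after `soup = BeautifulSoup(r.content, 'html.parser')` could then become `return parse_plugin_soup(soup, url)`, and the printing loop can skip None results.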
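At the moment I don't know how to fix the whole thing, only that we can surround the offending code with:
try:
    target = [item.get_text(strip=True, separator=" ") for item in soup.find(
        "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
except AttributeError:
    # soup.find() returned None, so there is nothing to call find_next() on
    print(f"AttributeError on {url}: <h3 class='screen-reader-text'> not found")
    target = []  # or whatever other action makes sense here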
By the way, besides fixing this error, I want to add an option that gives the results back. Below is the complete and unaltered error traceback; it contains valuable call stack information.
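One reading of "an option that gives the results back" is to collect every successful result and, separately, every failure with its full traceback, instead of letting the first bare future.result() abort the loop. The sketch below does that with the standard library only; run_all is a hypothetical helper, and keying by URL means duplicate URLs are processed once.

```python
import traceback
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def run_all(func: Callable, urls: Iterable[str], max_workers: int = 50) -> Tuple[Dict, Dict]:
    """Run func(url) concurrently; return (results, errors) keyed by URL.

    A failure on one URL is recorded with its complete traceback text
    instead of stopping the whole loop.
    """
    results: Dict[str, object] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {url: executor.submit(func, url) for url in urls}
    # Leaving the with-block waits for all submitted work to finish.
    for url, future in futures.items():
        try:
            results[url] = future.result()
        except Exception as exc:  # keep going; store the full traceback
            errors[url] = "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__))
    return results, errors
```

In the question's script, `results, errors = run_all(parser, allin)` could replace the second ThreadPoolExecutor block; the errors dict can then be printed or written to a file.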
Extracting https://wordpress.org/plugins/automatic-translator-addon-for-loco-translate/
Extracting https://wordpress.org/plugins/wpforo/Extracting https://wordpress.org/plugins/accesspress-social-share/
Extracting https://wordpress.org/plugins/mailoptin/
Extracting https://wordpress.org/plugins/tuxedo-big-file-uploads/
Extracting https://wordpress.org/plugins/post-snippets/
Extracting https://wordpress.org/plugins/woocommerce-payfast-gateway/Extracting https://wordpress.org/plugins/woocommerce-grid-list-toggle/
Extracting https://wordpress.org/plugins/goodbye-captcha/
Extracting https://wordpress.org/plugins/gravity-forms-google-analytics-event-tracking/
Traceback (most recent call last):
File "/home/martin/dev/wordpress_plugin.py", line 44, in <module>
print(future.result())
File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/martin/dev/wordpress_plugin.py", line 33, in parser
"h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
AttributeError: 'NoneType' object has no attribute 'find_next'
I hope this wasn't too long or complex. Thank you for the help!
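For the original goal of keeping the most recently updated plugins, a sketch could sort the parsed rows by their "Last updated" line. It assumes that line reads like "Last updated: 2 days ago" (the "Las" prefix filter suggests it exists, but the exact wording is an assumption), treats a month as 30 days and a year as 365 days, and ignores rows it cannot parse. The function names are hypothetical.

```python
import re
from typing import List, Optional, Sequence

# Rough seconds per unit; "month" and "year" are approximations (30 and 365 days).
SECONDS_PER_UNIT = {"minute": 60, "hour": 3600, "day": 86400,
                    "week": 7 * 86400, "month": 30 * 86400, "year": 365 * 86400}
# Assumed text format of the meta line, e.g. "Last updated: 2 days ago".
LAST_UPDATED = re.compile(
    r"Last updated:?\s*(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago", re.IGNORECASE)


def age_in_seconds(rows: Sequence[str]) -> Optional[int]:
    """Approximate age of the last update, or None if no matching line is found."""
    for row in rows:
        match = LAST_UPDATED.search(row)
        if match:
            return int(match.group(1)) * SECONDS_PER_UNIT[match.group(2).lower()]
    return None


def most_recently_updated(plugins: Sequence[Optional[List[str]]], top_n: int = 10) -> List[List[str]]:
    """Keep plugins with a parsable 'Last updated' line, newest first."""
    dated = []
    for plugin in plugins:
        if plugin is None:  # e.g. pages the parser skipped
            continue
        age = age_in_seconds(plugin)
        if age is not None:
            dated.append((age, plugin))
    dated.sort(key=lambda pair: pair[0])
    return [plugin for _, plugin in dated[:top_n]]
```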
I am new to Airflow. Using the TaskFlow API, I am trying to change the flow of a DAG dynamically: if a condition is met, the two-step workflow should be executed a second time.
After defining two functions/tasks, if I fix the DAG sequence as below, everything works fine.
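from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

default_args = {"owner": "airflow"}  # placeholder; the real DAG defines its own


@dag(default_args=default_args, schedule_interval=None, start_date=days_ago(2))
def genesis(**kwargs):
    @task()
    def extract():
        print("X")

    @task()
    def add_timeframe(extracted_data):
        print("Y")

    extracted_data = extract()
    timeframe_data = add_timeframe(extracted_data)


genesis_dag = genesis()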
However, when I write any conditional logic to trigger the second run (either inside the DAG or after the function/task definitions), I get the error below. The error seems to be about setting upstream tasks, but the older task.set_upstream(task2) commands don't work with TaskFlow in Airflow 2.0.
All the conditional-branching examples I could find use the non-TaskFlow API. Please help.
@dag(default_args=default_args, schedule_interval=None, start_date=days_ago(2))
def genesis(**kwargs):
    @task()
    def extract():
        print("X")
        if <condition>:
            extracted_data2 = extract()
            timeframe_data2 = add_timeframe(extracted_data2)

    @task()
    def add_timeframe():
        print("Y")

    extracted_data = extract()
    timeframe_data = add_timeframe(extracted_data)
ERROR - Tried to create relationships between tasks that don't have DAGs yet. Set the DAG for at least one task and try again: [<Task(_PythonDecoratedOperator): add_timeframe>, <Task(_PythonDecoratedOperator): extract>]
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
result = task_copy.execute(context=context)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/operators/python.py", line 233, in execute
return_value = self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/airflow/dags/genesis.py", line 77, in extract
timeframe_data = add_timeframe(extracted_data)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/operators/python.py", line 294, in factory
**kwargs,
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 91, in __call__
obj.set_xcomargs_dependencies()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 722, in set_xcomargs_dependencies
apply_set_upstream(arg)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 711, in apply_set_upstream
apply_set_upstream(elem)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 708, in apply_set_upstream
self.set_upstream(arg.operator)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1239, in set_upstream
self._set_relatives(task_or_task_list, upstream=True)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1205, in _set_relatives
"task and try again: {}".format([self] + task_list)
After migrating Airflow from v1.10.2 to v1.10.10, one of our DAGs has a task of the dagrun_operator type (TriggerDagRunOperator).
The task's code looks roughly like the snippet below. Please assume that the DAG dag_process_pos exists.
The DAG being triggered by the TriggerDagRunOperator is dag_process_pos, which starts with a task of type dummy_operator (just a hint, in case that is the troublemaker).
import os
from datetime import datetime

from airflow.operators.dagrun_operator import TriggerDagRunOperator


def set_up_dag_run_preprocessing(context, dag_run_obj):
    ti = context['ti']
    dag_name = context['ti'].task.trigger_dag_id
    dag_run = context['dag_run']
    trans_id = dag_run.conf['transaction_id']
    routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info")
    new_file_path = routing_info['file_location']
    new_file_name = os.path.basename(routing_info['new_file_name'])
    file_path = os.path.join(new_file_path, new_file_name)
    batch_id = "123-AD-FF"
    dag_run_obj.payload = {'inputfilepath': file_path,
                           'transaction_id': trans_id,
                           'Id': batch_id}
    # In Airflow 1.10.x a DAG run is only created if the callable returns dag_run_obj
    return dag_run_obj


# The callable must be defined before it is referenced here
task_trigger_dag_positional = TriggerDagRunOperator(
    trigger_dag_id="dag_process_pos",
    python_callable=set_up_dag_run_preprocessing,
    task_id="trigger_preprocess_dag",
    on_failure_callback=log_failure,  # defined elsewhere in the DAG file
    execution_date=datetime.now(),
    provide_context=False,
    owner='airflow')
The DAG runs fine. In fact, the task's python callable runs through to its last line; then the task errors out:
[2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one()
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute
replace_microseconds=False)
File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
replace_microseconds=replace_microseconds,
File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag
external_trigger=True,
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun
run.refresh_from_db()
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db
DR.run_id == self.run_id
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one
raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()
After that, the task's on_failure_callback is executed, and all the code in that callback runs as expected. My question is: why did the dagrun_operator fail after the python callable completed?
P.S.: The DAG being triggered by the TriggerDagRunOperator, in this case dag_process_pos, starts with a task of type dummy_operator.
If, in my Diazo control panel under 'Parameter expressions', I put
have_left_portlets = python:context and context.restrictedTraverse('@@plone').have_portlets('plone.leftcolumn',context)
I get an error, but only when I'm on the portal homepage:
2012-06-26 16:51:42 ERROR plone.transformchain Unexpected error whilst trying to apply transform chain
Traceback (most recent call last):
File "/Users/vito/.buildout/eggs/plone.transformchain-1.0.2-py2.6.egg/plone/transformchain/transformer.py", line 48, in __call__
newResult = handler.transformIterable(result, encoding)
File "/Users/vito/.buildout/eggs/plone.app.theming-1.0-py2.6.egg/plone/app/theming/transform.py", line 257, in transformIterable
params[name] = quote_param(expression(expressionContext))
File "/Users/vito/.buildout/eggs/Zope2-2.13.13-py2.6.egg/Products/PageTemplates/ZRPythonExpr.py", line 48, in __call__
return eval(self._code, vars, {})
File "PythonExpr", line 1, in <expression>
File "/Users/vito/.buildout/eggs/AccessControl-2.13.7-py2.6-macosx-10.6-x86_64.egg/AccessControl/ImplPython.py", line 675, in guarded_getattr
v = getattr(inst, name)
AttributeError: 'FilesystemResourceDirectory' object has no attribute 'restrictedTraverse'
How can I solve this?
I suspect this is a bug in plone.app.theming: the context isn't set correctly. Strange, though.
Just confirming that the issue exists:
I get about the same traceback. The site itself looks fine, but for every click inside the site I get the following traceback in my instance's foreground (fg) output:
2012-08-10 15:05:05 ERROR plone.transformchain Unexpected error whilst trying to apply transform chain
Traceback (most recent call last):
File "/opt/etc/buildout/eggs/plone.transformchain-1.0.2-py2.6.egg/plone/transformchain/transformer.py", line 48, in __call__
newResult = handler.transformIterable(result, encoding)
File "/opt/etc/buildout/eggs/plone.app.theming-1.0-py2.6.egg/plone/app/theming/transform.py", line 257, in transformIterable
params[name] = quote_param(expression(expressionContext))
File "/opt/etc/buildout/eggs/Zope2-2.13.10-py2.6.egg/Products/PageTemplates/ZRPythonExpr.py", line 48, in __call__
return eval(self._code, vars, {})
File "PythonExpr", line 1, in <expression>
AttributeError: 'FilesystemResourceDirectory' object has no attribute 'Language'
This is because I have the following line in my manifest.cfg (which is about the same as the parameter line in the Plone control panel):
lang = python: context.Language()
In a way this is sort of logical in my case, since not all content objects have a Language() method.
But is the 'context' here apparently referring to the 'FilesystemResourceDirectory' and not to the piece of content you are on?
I'll try with pdb if I can find some more info...