I want to run a Python script which needs a datetime parameter and post its output to Slack, but I don't know how to get the Airflow template variable ds.
Let's say I have the code below:
def make_txt():
    # get ds
    ds = get_ds()
    ds = ds * 3 + 4 / 5  # do something with ds
    return ds

slack_task = SlackAPIPostOperator(
    text=make_txt(),
    token='xoxp-xxxxxxx',
)
Because I will run with airflow backfill dag_id -s 2016-10-01, the ds (here 2016-10-01) should be passed into the Slack text.
I tried writing the Python script's output to a file, then reading it and passing it to the Slack text directly, but I don't think that's a good solution.
The text field of SlackAPIPostOperator is templated, so if you add {{ ds }} somewhere within the text it will be inserted by Jinja.
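For example, a minimal sketch (the task_id and message wording here are placeholders, not from the original question):

slack_task = SlackAPIPostOperator(
    task_id='post_to_slack',  # hypothetical task id
    token='xoxp-xxxxxxx',
    # Jinja renders the execution date at runtime, e.g. 2016-10-01
    text='Report for {{ ds }}',
)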
I am really new to Airflow, so please forgive me if this is a dim question. I searched unsuccessfully on Stack Overflow for a similar question.
I have a download stream task that waits for a file to download. I'd like to abstract away the hardcoded filepath and retrieve the path stored in an XCom.
t2 = FileSensor(
    task_id='waiting_for_file_download',
    poke_interval=60 * 5,
    timeout=60 * 10,
    mode='reschedule',
    filepath={{ ti.xcom_pull(task_ids = 'downloaded_file', key = 'file_path') }} + 'transformed' + '_new.csv.gz'
)
Is this possible?
Reading the official FileSensor documentation did not really help me as a newcomer.
I see there are two additional fields: 1) template_fields and 2) fs_conn_id.
UPDATE
Reading the docs I can see an XCom.get_one(); however, this is not working either:
filepath = XCom.get_one(
    execution_date=date.today(),
    dag_id='My_DAG',
    task_id='downloaded_file',
    key='file_path'
)
I see that other users use this in conjunction with **context; however, I do not know how you can use the context within the FileSensor.
Since filepath is declared as a templated field in the FileSensor class, it's possible to use Jinja templating and perform the xcom_pull() during runtime.
I think you were only missing the fact that the Jinja syntax goes within a string; try this:
filepath = "{{ ti.xcom_pull(task_ids = 'downloaded_file', key = 'file_path') }}" + 'transformed' + '_new.csv.gz'
Let me know if that worked for you.
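Put together, the corrected sensor from the question might look like this (a sketch; it assumes the path pushed by the downloaded_file task is meant to be extended with the literal suffix):

t2 = FileSensor(
    task_id='waiting_for_file_download',
    poke_interval=60 * 5,
    timeout=60 * 10,
    mode='reschedule',
    # the Jinja part is rendered at runtime; the plain strings are joined
    # when the DAG file is parsed
    filepath="{{ ti.xcom_pull(task_ids='downloaded_file', key='file_path') }}" + 'transformed' + '_new.csv.gz'
)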
I have the following code:
def chunck_import(**kwargs):
    ...
    for i in range(1, num_pages + 1):
        start = lower + chunks * i
        end = start + chunks
        if i > 1:
            start = start + 1
        logging.info('%s %s', start, end)
        if end > max_current:
            end = max_current
        where = 'where orders_id between {0} and {1}'.format(start, end)
        logging.info(where)

        import_orders_products_op = MySqlToGoogleCloudStorageOperator(
            task_id='import_orders_and_upload_to_storage_orders_products_{}'.format(i),
            mysql_conn_id='mysql_con',
            google_cloud_storage_conn_id='gcp_con',
            provide_context=True,
            approx_max_file_size_bytes=100000000,  # 100MB per file
            sql='import_orders.sql',
            params={'WHERE': where},
            bucket=GCS_BUCKET_ID,
            filename=file_name_orders_products,
            dag=dag)
start_task_op = DummyOperator(task_id='start_task', dag=dag)

chunck_import_op = PythonOperator(
    task_id='chunck_import',
    provide_context=True,
    python_callable=chunck_import,
    dag=dag)

start_task_op >> chunck_import_op
This code uses a PythonOperator to calculate how many runs of the MySqlToGoogleCloudStorageOperator I need and to create the WHERE clause of the SQL; then it needs to execute it.
The problem is that the MySqlToGoogleCloudStorageOperator isn't being executed.
I can't actually do
chunck_import_op >> import_orders_products_op
How can I make the MySqlToGoogleCloudStorageOperator be executed inside the PythonOperator?
I think at the end of your for loop you'll want to call import_orders_products_op.execute(context=kwargs), possibly preceded by import_orders_products_op.pre_execute(context=kwargs). This is a bit complicated in that it skips the render_templates() call of the task instance. Alternatively, if you made a TaskInstance to put each of these tasks in, you could call run or _raw_run_task instead, but both of these require information from the DagRun (which you can get in the Python callable's context, e.g. kwargs['dag_run']).
Looking at what you've passed to the operators, it looks like you'll need the templating step to load the import_orders.sql file and fill in the WHERE parameter. Alternatively, it's okay within the callable itself to load the file into a string, replace the {{ params.WHERE }} part (and any others) manually without Jinja2 (or you could spend time figuring out the right Jinja2 calls), and then set import_orders_products_op.sql = the_string_you_loaded before calling import_orders_products_op.pre_execute(context=kwargs) and import_orders_products_op.execute(context=kwargs).
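A minimal sketch of that second approach, placed at the end of each loop iteration in chunck_import (the template-filling line is an assumption based on the question's sql='import_orders.sql' and params={'WHERE': where}):

        # fill in the template by hand, since calling execute() directly
        # skips Airflow's Jinja rendering step
        with open('import_orders.sql') as f:
            rendered_sql = f.read().replace('{{ params.WHERE }}', where)
        import_orders_products_op.sql = rendered_sql
        import_orders_products_op.pre_execute(context=kwargs)
        import_orders_products_op.execute(context=kwargs)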
This is my operator:
bigquery_check_op = BigQueryOperator(
    task_id='bigquery_check',
    bql=SQL_QUERY,
    use_legacy_sql=False,
    bigquery_conn_id=CONNECTION_ID,
    trigger_rule='all_success',
    xcom_push=True,
    dag=dag
)
When I check the Rendered page in the UI, nothing appears there.
When I run the SQL in the console it returns the value 1400, which is correct.
Why doesn't the operator push the XCom?
I can't use BigQueryValueCheckOperator: that operator is designed to fail when the value check doesn't pass, and I don't want anything to fail. I simply want to branch the code based on the return value of the query.
Here is how you might be able to accomplish this with the BigQueryHook and the BranchPythonOperator:
from airflow.operators.python_operator import BranchPythonOperator
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

def big_query_check(**context):
    sql = context['templates_dict']['sql']
    bq = BigQueryHook(bigquery_conn_id='default_gcp_connection_id',
                      use_legacy_sql=False)
    conn = bq.get_conn()
    cursor = conn.cursor()
    cursor.execute(sql)
    results = cursor.fetchone()
    # Do something with results, return the task_id to branch to
    if results[0] == 0:
        return "task_a"
    else:
        return "task_b"

sql = "SELECT COUNT(*) FROM sales"

branching = BranchPythonOperator(
    task_id='branching',
    python_callable=big_query_check,
    provide_context=True,
    templates_dict={"sql": sql},
    dag=dag,
)
First we create a Python callable that we can use to execute the query and select which task_id to branch to. Second, we create the BranchPythonOperator.
The simplest answer is that xcom_push is not one of the params of BigQueryOperator, BaseOperator, or LoggingMixin.
The BigQueryGetDataOperator does return (and thus push) some data, but it works by table and column name. You could chain this behavior by making the query you run write its output to a uniquely named table (maybe use {{ ds_nodash }} in the name), then using that table as the source for this operator, and then branching with the BranchPythonOperator.
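A sketch of that chaining (the dataset and table names are made-up placeholders, and it assumes a BigQueryOperator version that supports destination_dataset_table):

write_check_result = BigQueryOperator(
    task_id='bigquery_check',
    bql=SQL_QUERY,
    use_legacy_sql=False,
    # hypothetical dataset/table; {{ ds_nodash }} keeps the name unique per run
    destination_dataset_table='my_dataset.check_result_{{ ds_nodash }}',
    bigquery_conn_id=CONNECTION_ID,
    dag=dag,
)

get_check_result = BigQueryGetDataOperator(
    task_id='get_check_result',
    dataset_id='my_dataset',
    table_id='check_result_{{ ds_nodash }}',
    bigquery_conn_id=CONNECTION_ID,
    dag=dag,
)

The rows pushed by BigQueryGetDataOperator can then be pulled with xcom_pull inside the BranchPythonOperator's callable.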
You might instead try to use the BigQueryHook's get_conn().cursor() to run the query and work with some data inside the BranchPythonOperator.
Elsewhere we chatted and came up with something along the lines of this for putting in the callable of a BranchPythonOperator:
cursor = BigQueryHook(bigquery_conn_id='connection_name').get_conn().cursor()
# one of these two:
cursor.execute(SQL_QUERY)  # if non-legacy
cursor.job_id = cursor.run_query(bql=SQL_QUERY, use_legacy_sql=False)  # if legacy
result = cursor.fetchone()
return "task_one" if result[0] == 1400 else "task_two"  # depends on results format
I call a Groovy script in a Jenkins pipeline.
def start_time = new Date()
def sdf = new SimpleDateFormat("yyyyMMddHH:mm:ss")
println sdf.format(start_time)
But I get "201608171708:34:35", the day has been output twice.
So I tested it on my local machine with Groovy and got the same result.
Is there anything I missed?
I believe there are non-ASCII/Unicode characters in the format string. (They became visible when I pasted the code into Vim.) I have removed them, and this works fine:
import java.text.*
def start_time = new Date()
def sdf = new SimpleDateFormat("yyyyMMddHH:mm:ss")
println sdf.format(start_time)
Michael is right; there is a problem with the text provided in the question.
By the way, in Groovy you can format a Date directly, without using SimpleDateFormat, and it does the same thing:
println new Date().format('yyyyMMddHH:mm:ss')
Output
2016081711:04:17
I am doing a project in Python with Django REST Framework, and I am using Haystack's SearchQuerySet. My code is here:
from haystack import indexes
from Medications.models import Salt

class Salt_Index(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='name', null=True)
    slug = indexes.CharField(model_attr='slug', null=True)
    if_i_forget = indexes.CharField(model_attr='if_i_forget', null=True)
    other_information = indexes.CharField(model_attr='other_information', null=True)
    precautions = indexes.CharField(model_attr='precautions', null=True)
    special_dietary = indexes.CharField(model_attr='special_dietary', null=True)
    brand = indexes.CharField(model_attr='brand', null=True)
    why = indexes.CharField(model_attr='why', null=True)
    storage_conditions = indexes.CharField(model_attr='storage_conditions', null=True)
    side_effects = indexes.CharField(model_attr='side_effects', null=True)

    def get_model(self):
        return Salt

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
and my views.py file is:
from django.http import HttpResponse
from django.views.generic import View
from haystack.query import SearchQuerySet
from django.core import serializers

class Medication_Search_View(View):
    def get(self, request, format=None):
        try:
            get_data = SearchQuerySet().all()
            print get_data
            serialized = serializers.serialize("json", [data.object for data in get_data])
            return HttpResponse(serialized)
        except Exception, e:
            print e
My python manage.py rebuild_index is working fine (it shows 'Indexing 2959 salts'), but in my views.py file SearchQuerySet() is returning an empty queryset...
I am very worried about this. Please help me if you know the reason behind getting an empty queryset while I have data in my Salt model.
You should check the app name; it is case sensitive. Try writing the app name in lowercase letters.
My problem is solved now. The problem was that I had written the app name with capital letters while the database tables were created in lowercase (myapp_Student), so it was causing a problem on the database lookup.