Branch Airflow tasks based on schedule - airflow

Is there any way to branch Airflow schedules?
For example:
If the day of the month is between 1 and 7, branch X should execute.
If the day of the month is between 8 and the end of the month, branch Y should execute.
i.e.:
     Task_1
       |
     Task_2
      /  \
   X_1    Y_1
    |      |
   X_2    Y_2
This would avoid maintaining a duplicate version of the DAG and save maintenance cost.
The same idea could be extended to regular days vs. month end, weekdays vs. weekends, etc.

If the tasks are in one DAG, you can read execution_date inside the Python callable of a BranchPythonOperator and return the task_id of the branch to follow. E.g.:
def _branch_operator_func(**kwargs):
    # execution_date is deprecated in favour of logical_date since Airflow 2.2
    execution_date = kwargs['execution_date']
    if 1 <= execution_date.day <= 7:
        return 'X_1'  # first week of the month
    return 'Y_1'  # day 8 to the end of the month
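The same date test extends to the other cases mentioned in the question (month end, weekday vs. weekend). Below is a minimal, Airflow-free sketch of that decision logic, assuming the task IDs `X_1` and `Y_1` from the diagram; the helper names are hypothetical and not part of Airflow.

```python
import calendar
from datetime import date


def choose_branch(run_date: date) -> str:
    """Return the task_id to follow: 'X_1' for days 1-7, 'Y_1' otherwise.

    'X_1' and 'Y_1' are the task ids from the question's diagram.
    Also accepts datetime objects, since datetime subclasses date.
    """
    return "X_1" if 1 <= run_date.day <= 7 else "Y_1"


def is_month_end(run_date: date) -> bool:
    """True on the last calendar day of the month (leap years included)."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    _, last_day = calendar.monthrange(run_date.year, run_date.month)
    return run_date.day == last_day


def is_weekend(run_date: date) -> bool:
    """True on Saturday or Sunday (date.weekday(): Monday == 0 ... Sunday == 6)."""
    return run_date.weekday() >= 5
```

In Airflow 2.x, one of these helpers could be called from the `python_callable` of a `BranchPythonOperator` (for example `return choose_branch(kwargs['logical_date'])`), and the tasks whose IDs are not returned are skipped. Treat that wiring as an assumption to check against your Airflow version.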

Related

Datetime column from Table1 is not matching the DateTime column from Table 2

Hello, I have an issue matching two different datetime columns.
I need to compare them (and their data), but when I put them in the same table (using a datetime relationship) I do not get the match I need:
What I need:
| Datetime_1 | Datetime_2 |
| ---------- | ---------- |
| 01/01/2023 08:00:00 AM | |
| ... | ... |
| 01/11/2023 12:00:00 AM | 01/11/2023 12:00:00 AM |
| 01/11/2023 01:00:00 AM | 01/11/2023 01:00:00 AM |
| ... | ... |
| 31/01/2023 12:00:00 PM | 31/01/2023 12:00:00 PM |
What I get:
Datetime_1 goes from 01/01/2023 12:00:00 AM to 01/31/2023 11:00:00 PM (in 1-hour steps) and Datetime_2 goes from 01/11/2023 8:15:00 PM to 02/06/2023 7:45:00 PM (in 30-minute steps).
I created a relationship between the two of them and didn't receive any error.
I have already set both columns to the Date/Time type in Power Query and in the Data pane.
However, I noticed that my main datetime column doesn't have the hierarchy icon in the Fields pane, while the secondary datetime columns have it (but without the hour level).
Also, as I mentioned before, my list has a range between January and February. I do not understand why this range continues and matches some dates on my main datetime list.
Troubleshooting
Part of the difficulty in troubleshooting this is that the two columns are formatted differently. For now, make sure both are formatted as Long Date Time. When comparing the relationship, do not drag the hierarchy (for the column that has it) into the table, but rather the date column itself. When you do, you will see the full timestamp for both columns and the issue will become clearer.
Power BI & Relationships on DateTime
Power BI will only match related rows if the date and time match exactly, so 4/15/2023 12:00:00 AM will not match 4/15/2023 12:00:01 AM. You mentioned that one side of the relationship has 30-minute steps while the other has 1-hour steps. Power BI is not going to match a 1:30 AM value to a 1:00 AM value for you. If you want the 1:30 value to match 1:00, create another column that truncates the minutes and build your relationship on the truncated column.
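The exact-match rule and the truncation fix can be illustrated outside Power BI. The Python sketch below is a conceptual illustration only (not DAX or Power Query), using made-up series shaped like the ones in the question: hourly on one side, half-hourly at :15/:45 on the other.

```python
from datetime import datetime, timedelta


def make_series(start: datetime, end: datetime, step: timedelta) -> list:
    """Timestamps from start (inclusive) to end (exclusive) at a fixed step."""
    out = []
    ts = start
    while ts < end:
        out.append(ts)
        ts += step
    return out


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds, e.g. 20:45 -> 20:00."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Illustrative data only: hourly values vs. half-hourly values at :15 and :45
hourly = make_series(datetime(2023, 1, 11, 0, 0), datetime(2023, 1, 12, 0, 0), timedelta(hours=1))
half_hourly = make_series(datetime(2023, 1, 11, 20, 15), datetime(2023, 1, 12, 0, 0), timedelta(minutes=30))

# Exact matching (what the relationship does): no timestamp appears on both sides
exact_matches = set(hourly) & set(half_hourly)

# Matching on a truncated key: 20:15 and 20:45 both relate to 20:00, and so on
truncated_matches = set(hourly) & {truncate_to_hour(ts) for ts in half_hourly}
```

Here `exact_matches` is empty while `truncated_matches` holds the four hours 20:00 to 23:00, which is the effect the truncated column is meant to achieve.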
Time Dimension
I'm not sure of your application, so I don't know if this will work, but when dealing with time, I try to split date and time into separate columns and have both a Date and a Time dimension. Below is my time dimension DAX; it can generate any minute-precise interval. Notice the last defined column, "timekey". I create a column in my fact table to relate to this key.
DimTime =
var every_n_minutes = 15 /* between 1 and 60; remainders in last hourly slice */
/* DO NOT CHANGE BELOW THIS LINE */
var slice_per_hour = TRUNC(DIVIDE(60, every_n_minutes), 0)
var rtn =
    ADDCOLUMNS(
        SELECTCOLUMNS(
            GENERATESERIES(0, 24 * slice_per_hour - 1, 1),
            "hour24", TRUNC(DIVIDE([Value], slice_per_hour), 0),
            "mins", MOD([Value], slice_per_hour) * every_n_minutes
        ),
        "hour12", MOD([hour24] + 11, 12) + 1,
        "asTime", TIME([hour24], [mins], 0),
        "timekey", [hour24] * 100 + [mins]
    )
return rtn
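To see what the DAX table contains, and how a matching fact-table key might be computed, here is a Python sketch of the same arithmetic. `dim_time` mirrors the DAX columns; `timekey_for` is a hypothetical helper (not from the answer) that floors a timestamp to its slice so that it lines up with a `timekey` row.

```python
from datetime import datetime, time


def dim_time(every_n_minutes: int = 15) -> list:
    """Rows equivalent to the DimTime DAX table for a given slice width."""
    if not 1 <= every_n_minutes <= 60:
        raise ValueError("every_n_minutes must be between 1 and 60")
    slice_per_hour = 60 // every_n_minutes  # TRUNC(DIVIDE(60, every_n_minutes), 0)
    rows = []
    for value in range(24 * slice_per_hour):  # GENERATESERIES(0, 24*slice_per_hour - 1, 1)
        hour24 = value // slice_per_hour
        mins = (value % slice_per_hour) * every_n_minutes
        rows.append({
            "hour24": hour24,
            "mins": mins,
            "hour12": (hour24 + 11) % 12 + 1,
            "asTime": time(hour24, mins),
            "timekey": hour24 * 100 + mins,
        })
    return rows


def timekey_for(ts: datetime, every_n_minutes: int = 15) -> int:
    """Hypothetical fact-table key: floor ts to its slice, then encode as HHMM."""
    slice_per_hour = 60 // every_n_minutes
    # Minutes past the last full slice fall into the hour's last slice,
    # matching the "remainders in last hourly slice" note in the DAX comment.
    slot = min(ts.minute // every_n_minutes, slice_per_hour - 1)
    return ts.hour * 100 + slot * every_n_minutes
```

With 15-minute slices, 20:52 maps to key 2045, the same key as the DimTime row for 20:45, so the relationship can be built on exact key equality.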
As requested, turning this into an answer. The reason you're getting these results is that your timestamps will never line up. Yes, Power BI let you create the relationship, but my guess is that it did so only because both fields have the same formatting. Also, it is best practice to split dates and times into separate date and time dimensions, then relate them via a fact table.

PromQL: Counting samples of a time series

For a 2-minute time window, this vector has the following results (I am using Grafana Explore with a 2-minute time range selected):
instana_metrics{aggregation="max", endpoint="mutation addProduct"}
t1 - 3051
t2 - 5347
t3 - 5347
t4 - 4224
t5 - 4224
I need something equivalent to
SELECT Count(*)
FROM instana_metrics
with a result of 5.
The best I was able to come up with is this:
count( instana_metrics{aggregation="max", endpoint="mutation addProduct"} )
t1 | 1
t2 | 1
t3 | 1
t4 | 1
t5 | 1
My interpretation is that every point in time has a count of 1 sample value. But the result itself is a time series, and I am expecting one scalar.
By the way, I understand that I could use a Grafana transformation for this, but unfortunately I need a PromQL-only solution.
Just use the count_over_time function. For example, the following query returns the number of raw samples over the last 2 minutes for each time series named instana_metrics:
count_over_time(instana_metrics[2m])
Note that Prometheus evaluates the query independently at each point on the graph, i.e. each value on the graph shows the number of raw samples for each matching time series over a 2-minute lookbehind window ending at that point.
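To make the per-point evaluation concrete, here is a small Python model of what `count_over_time` computes for a single series. It is not Prometheus code: the sample timestamps are made up, and the left-open window `(t - window, t]` is an assumption about boundary handling, which has differed between Prometheus versions.

```python
def count_over_time(sample_times, eval_times, window):
    """Count samples in the lookbehind window (t - window, t] at each evaluation time t.

    Conceptual model only; the left-open interval is an assumption.
    """
    return {
        t: sum(1 for ts in sample_times if t - window < ts <= t)
        for t in eval_times
    }


# Hypothetical sample timestamps (in seconds) for one series, one every 20 s
samples = [20, 40, 60, 80, 100]
# Evaluate a 2-minute (120 s) window at two points on the graph
result = count_over_time(samples, eval_times=[60, 120], window=120)
```

Each evaluation point gets its own count (3 at t=60, 5 at t=120 here), which is why a range query returns a series of values rather than one number, while an instant query with a window covering the whole range returns one value per series.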
If you need just a single value for each matching series over the selected time range in Grafana, use the following instant query:
count_over_time(instana_metrics[$__range])
See the Grafana documentation on the $__range variable and on the Instant query type.

How to make a task run n times, every 5 minutes, within a specific time interval in a DAG

Is there any way to create a DAG in which a task runs multiple iterations, every 10 minutes, within a time frame?
We have two tasks: t1 and t2.
t1 should run 20 times a day with a 5-minute gap between runs, and once all 20 runs are complete it should trigger task 2 (t2).
I tried creating two different DAGs and it worked, but is there any way to do it in a single DAG?
Any suggestions, please?
task_1 (run 20 times, with a 5-minute gap between runs) >> task_2
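What if you try something like this? It is a sketch rather than executable code: SomeOperator is a placeholder for your own operator, and the tasks must be created inside a `with DAG(...)` block.
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

previous = None
for i in range(1, 21):  # 20 executions of task 1
    task1 = SomeOperator(task_id=f"task1_execution{i}")
    if previous is not None:  # 5-minute (300 s) gap between consecutive executions
        previous >> BashOperator(task_id=f"sleep_{i}", bash_command="sleep 300") >> task1
    previous = task1
previous >> SomeOperator(task_id="task2_execution")  # task 2 runs after the last execution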
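The dependency wiring is the part that is easy to get wrong: each run of task 1 must wait for the previous run plus a sleep, and only the last run should feed task 2. Here is a dependency-free Python sketch of the edges such a loop should produce; the task IDs follow the naming pattern of the loop above and are placeholders, and nothing here calls Airflow.

```python
def chain_edges(n_runs: int = 20) -> list:
    """(upstream, downstream) task_id pairs: n_runs of task 1 separated by sleeps, then task 2.

    The task ids are placeholders mirroring the loop above.
    """
    edges = []
    previous = None
    for i in range(1, n_runs + 1):
        current = f"task1_execution{i}"
        if previous is not None:
            sleep_id = f"sleep_{i}"  # e.g. a BashOperator running "sleep 300"
            edges.append((previous, sleep_id))
            edges.append((sleep_id, current))
        previous = current
    edges.append((previous, "task2_execution"))  # only the last run feeds task 2
    return edges
```

With 20 runs this yields 19 sleeps, so the last run of task 1 starts roughly 95 minutes (19 × 5) after the first, plus however long each run takes.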

How to render average duration of events falling on the same time instead of their summary in App Insights?

I have telemetry for builds where some builds were started at the same time. Here is the query and the resulting graph:
AllBuilds
| project buildDef, startTime, result, runSeconds, runDuration, buildNumber
| where buildDef == 'dayforce-PR-AzureTest'
    and startTime >= todatetime('2021-11-27 04:27')
    and startTime < todatetime('2021-12-03 21:34')
    and result == 'succeeded'
| project startTime, minutes = runSeconds / 60, runDuration, buildNumber
| render columnchart
The spikes represent builds that started at the same time: their build times are summed, which is not what I want. I would like to average them instead. Ideally I would keep both values, but since they fall at exactly the same time, I am not sure this is possible.
I tried adding with (accumulate=false), but it does not seem to have any effect.
If I understand correctly, you could try using the avg() aggregation function:
AllBuilds
| where buildDef == 'dayforce-PR-AzureTest'
    and startTime >= todatetime('2021-11-27 04:27')
    and startTime < todatetime('2021-12-03 21:34')
    and result == 'succeeded'
| summarize avg(runDuration) by bin(startTime, 1h)
| render timechart
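For intuition about what `summarize avg(...) by bin(startTime, 1h)` does, here is a Python sketch of the same grouping: each start time is floored to a 1-hour bucket and the durations in each bucket are averaged. It is an analogy, not Kusto code; the rows are made-up examples and naive (timezone-free) datetimes are assumed.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Floor ts to a multiple of size counted from EPOCH (similar in spirit to Kusto's bin())."""
    return EPOCH + ((ts - EPOCH) // size) * size


def avg_by_bin(rows, size: timedelta) -> dict:
    """Average the values of (start_time, value) rows that fall into the same bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start, value in rows:
        key = bin_time(start, size)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical builds (start time, minutes): two started at the same moment, one later
builds = [
    (datetime(2021, 11, 28, 10, 5), 40.0),
    (datetime(2021, 11, 28, 10, 5), 50.0),
    (datetime(2021, 11, 28, 12, 30), 30.0),
]
averages = avg_by_bin(builds, timedelta(hours=1))
```

The two builds that started together contribute one averaged point (45.0) instead of a summed spike (90.0).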

Dates subtraction: has the event occurred or not?

If I have a daily event datetime, how can I find out whether the event has already occurred by subtracting it from datetime.now()?
Say we have a meeting every day at 15:35. Today John came early, at 12:45, but Alex was 2 h 5 min late (he came at 17:40).
from datetime import datetime

meet_dt = datetime(year=2015, month=8, day=19, hour=15, minute=35)
john_dt = datetime(year=2015, month=8, day=19, hour=12, minute=45)
alex_dt = datetime(year=2015, month=8, day=19, hour=17, minute=40)
print(meet_dt - john_dt)  # came early: 2:50:00
print(meet_dt - alex_dt)  # came late: -1 day, 21:55:00
If I subtract the earlier datetime from the later one, everything is fine, but the other way round I receive -1 day, 21:55:00. Why not -2:05:00, and what is the minus day?
Because timedeltas are normalized.
All parts of a timedelta other than the days field are always nonnegative, as described in the documentation, so -2:05:00 is stored as -1 day plus 21:55:00.
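A quick way to see the normalization, and to print a signed duration the way the question expected, is sketched below; `signed_str` is a hypothetical helper, not part of the `datetime` module.

```python
from datetime import datetime, timedelta

meet_dt = datetime(2015, 8, 19, 15, 35)
john_dt = datetime(2015, 8, 19, 12, 45)
alex_dt = datetime(2015, 8, 19, 17, 40)

late = meet_dt - alex_dt
# Normalized form: days may be negative, while 0 <= seconds < 86400
print(late.days, late.seconds)        # -1 78900
print(late.total_seconds())           # -7500.0


def signed_str(delta: timedelta) -> str:
    """Format as '-H:MM:SS' instead of the normalized '-1 day, H:MM:SS'."""
    if delta < timedelta(0):
        return "-" + str(-delta)
    return str(delta)


print(signed_str(late))               # -2:05:00
print(signed_str(meet_dt - john_dt))  # 2:50:00
```

`total_seconds()` is often the simplest signed quantity to work with, since it is a single negative number rather than a negative day plus positive seconds.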
Incidentally, if you want to see what happened first, don't do this subtraction. Just compare directly with <:
import datetime
then = datetime.datetime(2015, 8, 19, 15, 35)  # the event time to check
if then < datetime.datetime.now():
    print("then is in the past")
