Daily_schedule triggered runs and backfill runs have different date partition

I have a @daily_schedule triggered daily at 3 minutes past 12 AM.
When triggered by the scheduled tick at '2021-02-16 00:03:00',
the date input shows '2021-02-15 00:00:00' and the partition is tagged as '2021-02-15'.
If triggered via backfill for partition '2021-02-16',
the date input shows '2021-02-16 00:00:00' and the partition is tagged as '2021-02-16'.
Why does the scheduled tick fill the partition for the day before? Is there an option to use the datetime of execution instead (without using a cron @schedule)? This discrepancy is confusing when I query by timestamp for exact dates.
P.S. I have verified that the scheduled run and the backfill run use the same timezone.
from datetime import time

from dagster import daily_schedule, pipeline, solid

# START_DATE and END_DATE are defined elsewhere in the project.


@solid()
def test_solid(_, date):
    _.log.info(f"Input date: {date}")


@pipeline()
def test_pipeline():
    test_solid()


@daily_schedule(
    pipeline_name="test_pipeline",
    execution_timezone="Asia/Singapore",
    start_date=START_DATE,
    end_date=END_DATE,
    execution_time=time(0, 3),  # 00:03; a literal such as 03 is a syntax error in Python 3
    # should_execute=four_hourly_fitler
)
def test_schedule_daily(date):
    timestamp = date.strftime("%Y-%m-%d %X")
    return {
        "solids": {
            "test_solid": {
                "inputs": {
                    "date": {
                        "value": timestamp
                    }
                }
            }
        }
    }

Sorry for the trouble here - the underlying assumption that the system is making here is that for schedules on pipelines that are partitioned by date, you don't fill in the partition for a day until that day has finished (i.e. the job filling in the data for 2/15 wouldn't run until the next day on 2/16). This is a common pattern in scheduled ETL jobs, but you're completely right that it's not a given that all schedules will want this behavior, and this is good feedback that we should make this use case easier.
It is possible to make a schedule for a partition in the way that you want, but it's more cumbersome. It would look something like this:
from dagster import PartitionSetDefinition, date_partition_range, create_offset_partition_selector


def partition_run_config(date):
    timestamp = date.strftime("%Y-%m-%d %X")
    return {
        "solids": {
            "test_solid": {
                "inputs": {
                    "date": {
                        "value": timestamp
                    }
                }
            }
        }
    }


test_partition_set = PartitionSetDefinition(
    name="test_partition_set",
    pipeline_name="test_pipeline",
    partition_fn=date_partition_range(start=START_DATE, end=END_DATE, inclusive=True, timezone="Asia/Singapore"),
    run_config_fn_for_partition=partition_run_config,
)

test_schedule_daily = (
    test_partition_set.create_schedule_definition(
        "test_schedule_daily",
        "3 0 * * *",
        execution_timezone="Asia/Singapore",
        partition_selector=create_offset_partition_selector(lambda d: d.subtract(minutes=3)),
    )
)
This is pretty similar to @daily_schedule's implementation; it just uses a different function for mapping the schedule execution time to a partition (subtracting 3 minutes instead of 3 minutes and 1 day, which is what the create_offset_partition_selector part does).
I'll file an issue for an option to customize the mapping for the partitioned schedule decorators, but something like that may unblock you in the meantime. Thanks for the feedback!

Just an update on this: We added a 'partition_days_offset' parameter to the 'daily_schedule' decorator (and a similar parameter to the other schedule decorators) that lets you customize this behavior. The default is still to go back 1 day, but setting partition_days_offset=0 will give you the behavior you were hoping for where the execution day is the same as the partition day. This should be live in our next weekly release on 2/18.
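To make the arithmetic concrete, here is a minimal standard-library sketch of the mapping described above: strip the execution time from the tick, then go back a number of days. It is only a model of the behavior, not Dagster's implementation, and the function name partition_for_tick is made up for illustration.

```python
from datetime import datetime, time, timedelta


def partition_for_tick(tick: datetime, execution_time: time, partition_days_offset: int = 1) -> datetime:
    """Model of the mapping: remove the execution time, then go back N days."""
    # Hypothetical helper for illustration only; this is not a Dagster API.
    midnight = tick - timedelta(hours=execution_time.hour, minutes=execution_time.minute)
    return midnight - timedelta(days=partition_days_offset)


tick = datetime(2021, 2, 16, 0, 3)  # the scheduled tick from the question

default_partition = partition_for_tick(tick, time(0, 3))
same_day_partition = partition_for_tick(tick, time(0, 3), partition_days_offset=0)

print(default_partition.strftime("%Y-%m-%d"))   # 2021-02-15
print(same_day_partition.strftime("%Y-%m-%d"))  # 2021-02-16
```

With the default offset of 1 the tick at 00:03 on 2/16 maps to the 2/15 partition, which is what the scheduled run showed; an offset of 0 maps it to 2/16, matching the backfill.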

Related

Correct way to get accurate time in Rust?

I'm trying to get accurate time with:
use chrono::{DateTime, Local, Utc};
use std::time::SystemTime;

fn main() {
    println!(
        "Local.now() {}",
        Local::now().format("%H:%m:%S").to_string()
    );
    println!("Utc.now() {}", Utc::now().format("%H:%m:%S").to_string());
    let system_time = SystemTime::now();
    let stime: DateTime<Utc> = system_time.into();
    println!("SystemTime.now() {}", stime.format("%H:%m:%S"));
}
However, if I run it:
$ date && target/debug/mybin
Sun Jan 15 04:08:19 PM CET 2023
Local.now() 16:01:19
Utc.now() 15:01:19
SystemTime.now() 15:01:19
I don't know where the shift comes from. What is the correct way to get the right time?

The %m token inserts the current month's number, which is 1 because it is January. You probably want %M instead, which inserts the minute number. So you are correctly obtaining the current time, but are incorrectly displaying it by using the month number in the place where you'd expect to see the minute number.
See chrono's strftime documentation for a complete list of formatting codes.
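chrono's codes follow the strftime convention, so the same mix-up can be reproduced outside Rust. A small Python sketch, using a fixed timestamp that matches the `date` output from the question:

```python
from datetime import datetime

# Fixed moment matching the question: 15 January 2023, 16:08:19.
moment = datetime(2023, 1, 15, 16, 8, 19)

with_month = moment.strftime("%H:%m:%S")   # %m is the month number
with_minute = moment.strftime("%H:%M:%S")  # %M is the minute

print(with_month)   # 16:01:19
print(with_minute)  # 16:08:19
```

The first line reproduces the "shifted" value from the question; only the format string differs between the two.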

Groovy: Date and Time comparisons with a slight delay

So I have the following script:
import groovy.time.TimeCategory

def dueDate = context.expand( '${Test 4 - create user task#Response#$[\'_embedded\'][\'userTaskDtoList\'][0][\'dueDate\']}' )
def date = new Date(messageExchange.getTimestamp())
use(groovy.time.TimeCategory) {
    after24Hours = (date + 24.hours).format("yyyy-MM-dd'T'HH:mm:ss'Z'", TimeZone.getTimeZone('UTC'))
}
assert dueDate == after24Hours
What I'm trying to do with this is take the date and time from a REST request (dueDate - which comes in UTC format and with a 24h delay) and create a new date and time from the timestamp of the moment when that request has been sent, which is registered from my system. I then convert that time to UTC to accommodate the format from dueDate and add 24h to it. At the end I verify that the date and time from dueDate and after24Hours is the same.
The output does return the same time, but in certain cases, if there is a delay between the time the request is sent and the time it is received, the assertion fails. This depends on the server. Usually the difference is around 1 millisecond, but if the server is slower at some point the difference will definitely be bigger.
What could I do to allow some margin of error in the assertion, maybe like a few seconds or even a couple of minutes?
Ok, so I managed to do this:
import groovy.time.*

def dueDate = context.expand( '${Test 4 - create user task#Response#$[\'_embedded\'][\'userTaskDtoList\'][0][\'dueDate\']}' )
def date = new Date(messageExchange.getTimestamp())
use(groovy.time.TimeCategory) {
    after24Hours = (date + 24.hours).format("yyyy-MM-dd'T'HH:mm:ss'Z'", TimeZone.getTimeZone('UTC'))
    def date1 = Date.parse("yyyy-MM-dd'T'HH:mm:ss'Z'", dueDate)
    def date2 = Date.parse("yyyy-MM-dd'T'HH:mm:ss'Z'", after24Hours)
    TimeDuration difference = TimeCategory.minus(date2, date1)
    log.info date1
    log.info date2
    assert difference < 2.minutes
}
The script seems to work and it does return an error only if the time is longer than the one I've set in the assertion.
Unfortunately I have another issue now.
For some reason, my date output looks like this:
Fri Oct 01 16:24:10 EEST 2021: INFO: Sat Oct 02 13:24:10 EEST 2021
Which is not the correct format. That date should appear in the Zulu format, after all when I parsed the dates that was the format that I used.
Am I missing something?

"What could I do to allow some margin of error in the assertion, maybe like a few seconds or even a couple of minutes?"
Instead of asserting that they are equal, you could assert that the difference between them is less than a threshold that you get to define.
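As a rough illustration of the threshold idea, here is a Python sketch that parses two Zulu-formatted strings and compares the absolute difference against a tolerance. The helper name within_tolerance and the sample timestamps are made up for the example.

```python
from datetime import datetime, timedelta

ZULU_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def within_tolerance(first: str, second: str, tolerance: timedelta) -> bool:
    """Return True when two Zulu-formatted timestamps differ by at most `tolerance`."""
    a = datetime.strptime(first, ZULU_FORMAT)
    b = datetime.strptime(second, ZULU_FORMAT)
    # abs() makes the check symmetric, so it does not matter which value is later.
    return abs(a - b) <= tolerance


due_date = "2021-10-02T10:24:10Z"        # made-up sample value
after_24_hours = "2021-10-02T10:24:11Z"  # made-up sample value

assert within_tolerance(due_date, after_24_hours, timedelta(minutes=2))
```

Taking the absolute value matters: a plain "difference < threshold" check also passes when the difference is a large negative number.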

If you use something like AssertJ, and I'd recommend you do, then you can do something like the following:
assertThat(dueDate).isCloseTo(after24Hours, within(1, ChronoUnit.MINUTES));
This will give a small margin to the comparison of the dates, and should fix your issue.

How to check if the device's time is between two times in Flutter from Firebase/Firestore?

In the Firestore project, I have documents in a collection containing data for shops, with fields like shopName, shopAddress, startTime (e.g. 10 AM) and closeTime (e.g. 10 PM), all strings for now.
When the user is browsing the app, I retrieve the data from Firestore for the shops displayed in the app. Now I want to show that a shop is closed when the device's time is not between the shop's startTime and closeTime. How do I achieve this?
So far I can detect the device's current time using dart package intl using this code:
print("${DateFormat('j').format(DateTime.now())}");
It gives output as follows:
I/flutter (14877): 6 PM
This comes from DateFormat, and the values stored in Firestore are strings. I don't know how to compare them. Do let me know if I have to change the data types in Firestore too.
Thank You

I think if you use 24-hour time and convert startTime, closeTime and actualTime to int (or double, if the shop closes at 20:30/8:30 PM), then you can easily compare them with an if statement. Storing them as strings on your Firebase server is fine.
For example, you can make a map, iterate over it, and check whether actualTime is greater than or equal to startTime and lower than closeTime.
I have never tried this code, but I think it is going to work.
Map map = {'1am': 1, '2am': 2, '3am': 3, ... , '11pm': 23};
map.entries.forEach((e) {
  if (e.key == actualTime) {
    if (e.value >= startTime && e.value < closeTime) {
      print('Open');
    } else {
      print('Closed');
    }
  }
});
By the way, I think you should use UTC, because if you change the time zone on your device, your app will show that the shop is closed when it is in fact open; you are just in a different time zone. You can easily implement this with this code:
var now = DateTime.now().toUtc();
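The conversion-and-compare idea can be sketched in Python as follows. It assumes the stored strings look like "10 AM" and "10 PM" as in the question; the helper names parse_hour and is_open are made up, and the overnight branch is an addition that the answer above does not cover.

```python
from datetime import datetime


def parse_hour(label: str) -> int:
    """Convert a label such as '10 AM' or '10 PM' to an hour in the range 0..23."""
    return datetime.strptime(label, "%I %p").hour


def is_open(start_label: str, close_label: str, current_hour: int) -> bool:
    """Check whether `current_hour` falls inside the opening hours."""
    start, close = parse_hour(start_label), parse_hour(close_label)
    if start <= close:
        return start <= current_hour < close
    # Opening hours wrap past midnight, e.g. 6 PM to 2 AM.
    return current_hour >= start or current_hour < close


print(is_open("10 AM", "10 PM", 18))  # True
print(is_open("10 AM", "10 PM", 23))  # False
```

Comparing whole hours keeps the sketch short; times such as 8:30 PM would need minutes as well, for example by comparing minutes since midnight.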

Maybe you can create a list like this:
hashMap=['12 AM', '1 AM', '2 AM', ... , '11 PM', '12 AM'];
After that you can get the positions of startTime, closeTime and actualTime, and check whether the position of actualTime is between the positions of the start and close times.
Let me know if you want me to give you a code example.

Time subtraction in Aurelia

I would like to print the duration of an event that occurs between 'startDateTime' and 'endDateTime', expressed in minutes or seconds (if less than 1 minute).
In other words, ${startDateTime | dateFormat:"YYYY-MM-DD HH:mm"} is 2018-09-07 11:57 and ${endDateTime | dateFormat:"YYYY-MM-DD HH:mm"} is 2018-09-07 13:00.
What I would like to print is 63 minutes.
In PHP, I would do ->getTimestamp(), but in Aurelia I have no clue what to even try.
I did test with something like ${endDateTime| dateFormat:"HH:mm:ss" - startDateTime| dateFormat:"HH:mm:ss"} but this can't work as it doesn't convert the entire date time to seconds or minutes...
Therefore, is there a clean solution I can implement in my view?

I solved it using a value converter.
import moment = require("moment");

export class DurationValueConverter {
  public toView(startAt, endAt) {
    if (!endAt) {
      // If end date is missing, use the current date and time.
      endAt = moment();
    }
    const duration = moment.duration(moment(endAt).diff(moment(startAt)));
    return duration.humanize();
  }
}
Usage: ${startedAt | duration:endedAt}
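Note that humanize() returns an approximate phrase rather than an exact count. If the exact "63 minutes" from the question is needed, the underlying arithmetic looks roughly like this Python sketch (the function name format_duration is made up, and the same logic could be ported into the converter's toView):

```python
from datetime import datetime


def format_duration(start: datetime, end: datetime) -> str:
    """Express the elapsed time in whole minutes, or in seconds when under a minute."""
    seconds = int((end - start).total_seconds())
    if seconds < 60:
        return f"{seconds} seconds"
    return f"{seconds // 60} minutes"


start = datetime(2018, 9, 7, 11, 57)
end = datetime(2018, 9, 7, 13, 0)

print(format_duration(start, end))  # 63 minutes
```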

What you want is relative time formatting. It's on its way to browsers, but for now you will have to use a polyfill or library for it. One you can use is from Yahoo: https://github.com/yahoo/intl-relativeformat

Geb: Waiting/sleeping between tests

Is there a way to wait a set amount of time between tests? I need a solution to compensate for server lag. When creating a record, it takes a little bit of time before the record is searchable in my environment.
In the following code example, how would I wait 30 seconds between the first test and the second test and have no wait time between second test and third test?
class MySpec extends GebReportingSpec {
    // First Test
    def "should create a record named myRecord"() {
        given:
        to CreateRecordsPage
        when:
        name_field = "myRecord"
        and:
        saveButton.click()
        then:
        at IndexPage
    }

    // Second Test
    def "should find record named myRecord"() {
        given:
        to SearchPage
        when:
        search_query = "myRecord"
        and:
        searchButton.click()
        then:
        // haven't figured this part out yet, but would look for "myRecord" on the results page
    }

    // Third Test
    def "should delete the record named myRecord"() {
        // do the delete
    }
}

You probably don't want to wait a set amount of time - it will make your tests slow. You would ideally want to continue as soon as the record is added. You can use Geb's waitFor {} to poll for a condition to be fulfilled.
// Second Test
def "should find record named myRecord"() {
    when:
    to SearchPage
    then:
    waitFor(30) {
        search_query = "myRecord"
        searchButton.click()
        //verify that the record was found
    }
}
This will poll every half a second for 30 seconds for the condition to be fulfilled, passing as soon as it is and failing if it's still not fulfilled after 30 seconds.
To see what options you have for setting the waiting time and interval, have a look at the section on waiting in The Book of Geb. You might also want to check out the section on implicit assertions in waitFor blocks.
If your second feature method depends on the success of the first one, then you should probably consider annotating this specification with @Stepwise.
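The polling pattern itself is independent of Geb. Here is a minimal Python sketch of it, assuming a made-up helper named wait_for and a stand-in condition in place of a real search; it is not how Geb implements waitFor.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float, interval: float = 0.5) -> None:
    """Poll `condition` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return  # continue as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


attempts = []


def record_is_searchable() -> bool:
    # Stand-in for a real search; it succeeds on the third attempt.
    attempts.append(1)
    return len(attempts) >= 3


wait_for(record_is_searchable, timeout=5, interval=0.01)
print(len(attempts))  # 3
```

The test proceeds as soon as the record shows up and only pays the full timeout when something is actually wrong, which is the advantage over a fixed sleep.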

You should always try to use waitFor and check conditions wherever possible. However if you find there isn't a specific element you can check for, or any other condition to check, you can use this to wait for a specified amount of time:
def sleepForNSeconds(int n) {
    def originalMilliseconds = System.currentTimeMillis()
    waitFor(n + 1, 0.5) {
        (System.currentTimeMillis() - originalMilliseconds) > (n * 1000)
    }
}
I had to use this while waiting for some chart library animations to complete before capturing a screenshot in a report.

Thread.sleep(30000)
also does the trick. Of course still agree to "use waitFor whenever possible".
