Re-run airflow historical runs - airflow

I have a dag with the following parameters:
start_date=datetime(2020, 7, 6),
schedule_interval="0 12 * * *",
concurrency=2,
max_active_runs=6,
catchup=True
I had to re-process a year's worth of historical data, so I reset the dag run status for the past year. In the middle of re-processing, I realised I needed to re-process the latest few days first because of a change in business priorities. However, Airflow seems somewhat random in picking which days to run (though it often favours older days), so the tree view of my dag runs is a bit messed up, and it's going to take quite a while to catch up on all runs.
I had two choices:
Set all old dag run days to failure
Delete old dag run days
To avoid generating an excessive number of failure notifications, I chose the second option.
Here is a quick illustration. Before deleting, my tree view showed:
1 Sep 2021 - 1 Jan 2022: dag run successful
2 Jan 2022 - 3 Jan 2022: dag running
4 Jan 2022 - 1 Aug 2022: dag scheduled
2 Aug 2022 - 6 Aug 2022: dag run successful
7 Aug 2022 - 1 Sep 2022: dag scheduled
To speed up processing of the August data, I deleted the dag runs scheduled between 4 Jan and 1 Aug. The tree view now looks like this:
1 Sep 2021 - 1 Jan 2022: dag run successful
2 Jan 2022 - 3 Jan 2022: dag running
2 Aug 2022 - 6 Aug 2022: dag run successful
7 Aug 2022 - 1 Sep 2022: dag scheduled
Note that dag runs between 4 Jan 2022 and 1 Aug 2022 are now completely gone from the tree view.
Unfortunately, because the latest dag run is 6 Aug 2022, Airflow thinks only the runs from 7 Aug onwards need to catch up, so all the deleted runs between 4 Jan and 1 Aug are ignored.
So my question is: without re-processing the days that have already been re-processed, is there a way to tell Airflow that I need to re-run the days I deleted?
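One possible approach, sketched here under assumptions rather than as a confirmed answer: work out which days have no dag run at all and backfill only those gaps with the Airflow 2 CLI command `airflow dags backfill`, which accepts `--start-date` and `--end-date`. The helper names and the `my_dag_id` placeholder are hypothetical, the 12:00 run time is taken from `schedule_interval="0 12 * * *"`, and the sketch assumes the tree-view dates are the runs' logical dates. Check `airflow dags backfill --help` for your version before running the generated commands.

```python
from datetime import date, timedelta


def missing_ranges(existing, start, end):
    """Group the days in [start, end] that have no dag run into (first, last) ranges."""
    have = set(existing)
    ranges = []
    gap_start = None
    day = start
    while day <= end:
        if day in have:
            if gap_start is not None:
                # The current gap ended on the previous day.
                ranges.append((gap_start, day - timedelta(days=1)))
                gap_start = None
        elif gap_start is None:
            gap_start = day  # first missing day of a new gap
        day += timedelta(days=1)
    if gap_start is not None:
        ranges.append((gap_start, end))
    return ranges


def backfill_commands(ranges, dag_id, hour=12):
    """Build one backfill command per gap; `hour` mirrors the 0 12 * * * schedule (assumption)."""
    return [
        f"airflow dags backfill --start-date {first.isoformat()}T{hour:02d}:00:00 "
        f"--end-date {last.isoformat()}T{hour:02d}:00:00 {dag_id}"
        for first, last in ranges
    ]


def days_between(first, last):
    """Every date from first to last, inclusive."""
    return [first + timedelta(days=n) for n in range((last - first).days + 1)]


# Hypothetical usage mirroring the tree view after the deletion
# (successful, running and scheduled runs all still exist).
kept = days_between(date(2021, 9, 1), date(2022, 1, 3)) + days_between(date(2022, 8, 2), date(2022, 9, 1))
gaps = missing_ranges(kept, date(2021, 9, 1), date(2022, 9, 1))
for cmd in backfill_commands(gaps, "my_dag_id"):
    print(cmd)
```

Because the gap is computed from the runs that still exist, the already re-processed days on either side are left out of the backfill range.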

Related

Append list of logged in users to a log file using crontab?

I need to create a basic log file using a crontab job that appends a timestamp followed by a list of logged-in users. It must run at 23:59 each night.
(I have used 18 18 * * * as an example to make sure the job works for now)
So far, I have:
#!/bin/bash
59 23 * * * (date ; who) >> /root/userlogfile.txt
My crontab entry produces this output:
Fri Dec 9 18:18:01 UTC 2022
root console 00:00 Dec 9 18:15:15
The output I need is something like:
Fri 09 Dec 23:59:00 GMT 2022
user1 tty2 2017-11-30 22:00 (:0)
user5 pts/1 2017-11-30 20:35 (192.168.1.1)
How would I go about this?
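As an alternative to the crontab one-liner, here is a Python sketch of the same job: it appends a timestamp in the requested `Fri 09 Dec 23:59:00 GMT 2022` layout, followed by the output of `who`. The helper names, and the idea of calling `main()` from a script scheduled with `59 23 * * *`, are assumptions. Day and month names follow the system locale, and the zone abbreviation is whatever the machine's local time zone reports (GMT only if the server is set to it). The layout of the `who` lines depends on the `who` implementation, so the sketch passes them through unchanged.

```python
import subprocess
from datetime import datetime, timezone

LOG_PATH = "/root/userlogfile.txt"  # log file from the question


def format_stamp(now):
    """Format like 'Fri 09 Dec 23:59:00 GMT 2022' (names follow the locale)."""
    return now.strftime("%a %d %b %H:%M:%S %Z %Y")


def build_entry(now, who_output):
    """One log entry: the timestamp line, then the `who` lines unchanged."""
    return format_stamp(now) + "\n" + who_output.rstrip("\n") + "\n"


def main(log_path=LOG_PATH):
    # Take the current UTC time and convert it to the machine's local zone,
    # so %Z prints the local abbreviation (e.g. GMT).
    now = datetime.now(timezone.utc).astimezone()
    who_output = subprocess.run(
        ["who"], capture_output=True, text=True, check=True
    ).stdout
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(build_entry(now, who_output))
```

If you keep the one-liner in the crontab instead, every `%` in a `date +...` format has to be written as `\%`, because cron treats an unescaped `%` as a newline.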

OpenVPN Server TCP_CLIENT link local: (not bound)

I've recently been trying to set up an OpenVPN server on my Linux machine, but I keep getting the same error every time I try to connect to it.
My settings are as follows:
proto tcp
port 443
resolv-retry infinite
nobind
user nobody
group nogroup
cipher AES-256-CBC
auth SHA256
script-security 2
up /etc/openvpn/update-systemd-resolved
down /etc/openvpn/update-systemd-resolved
down-pre
dhcp-option DOMAIN-ROUTE .
I have checked the settings on my server and my local computer many times, and they all match, but I still don't know what to do. Here is the log from a connection attempt; thanks in advance!
Sat Nov 27 23:45:11 2021 OpenVPN 2.4.7 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Jul 19 2021
Sat Nov 27 23:45:11 2021 library versions: OpenSSL 1.1.1f 31 Mar 2020, LZO 2.10
Sat Nov 27 23:45:11 2021 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Sat Nov 27 23:45:11 2021 TCP/UDP: Preserving recently used remote address: [AF_INET]myserverip:443
Sat Nov 27 23:45:11 2021 Socket Buffers: R=[131072->131072] S=[16384->16384]
Sat Nov 27 23:45:11 2021 Attempting to establish TCP connection with [AF_INET]myserverip:443 [nonblock]
Sat Nov 27 23:45:12 2021 TCP connection established with [AF_INET]myserverip:443
Sat Nov 27 23:45:12 2021 TCP_CLIENT link local: (not bound)
Sat Nov 27 23:45:12 2021 TCP_CLIENT link remote: [AF_INET]myserverip:443
Sat Nov 27 23:45:12 2021 NOTE: UID/GID downgrade will be delayed because of --client, --pull, or --up-delay
Sat Nov 27 23:45:12 2021 Connection reset, restarting [0]
Sat Nov 27 23:45:12 2021 SIGUSR1[soft,connection-reset] received, process restarting
Sat Nov 27 23:45:12 2021 Restart pause, 5 second(s)
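Since the question depends on the client and server settings really being identical, a small Python sketch can compare the two files directive by directive instead of by eye. It assumes plain `directive arguments` lines with `#` or `;` comments. The default `keys` tuple is an illustrative assumption, not an authoritative list of options that must match, and the server snippet in the usage is hypothetical.

```python
def parse_directives(text):
    """Map each OpenVPN directive to the list of argument strings it appears with."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first run of whitespace
        args = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(args)
    return directives


def differing(client_text, server_text, keys=("proto", "port", "cipher", "auth")):
    """Return {directive: (client_values, server_values)} for the keys that differ."""
    client = parse_directives(client_text)
    server = parse_directives(server_text)
    return {
        key: (client.get(key), server.get(key))
        for key in keys
        if client.get(key) != server.get(key)
    }


client_conf = """proto tcp
port 443
cipher AES-256-CBC
auth SHA256
"""
# Hypothetical server file, only to show the shape of the output.
server_conf = """proto tcp
port 443
cipher AES-256-GCM
auth SHA256
"""
print(differing(client_conf, server_conf))
```

Directives that appear several times (such as `push` or `route`) are kept as lists, so repeated lines are compared as a whole rather than overwritten.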

Scraping a Table that has "Show More" button using R

I am trying to pull some economic data from Investing.com. Here is a link to the non-farm payrolls page I am looking to pull:
https://ca.investing.com/economic-calendar/nonfarm-payrolls-227
As you can see, once you click the "Show More" button, more rows are loaded. I would like to scrape all of the hidden data in the table.
If you inspect the page, you can easily see the HTML tags associated with each row. I was wondering if there is an easy way to scrape the data without using RSelenium.
Here is my current code, which only returns the 6 rows initially shown when first entering the site:
library(rvest)
x <- read_html("https://ca.investing.com/economic-calendar/nonfarm-payrolls-227") %>%
  html_nodes("table") %>% .[1] %>%
  html_table(fill = TRUE)
print(x)
# Release Date Time Actual Forecast Previous
1 May 03, 2019 (Apr) 08:30 263K 181K 189K NA
2 Apr 05, 2019 (Mar) 08:30 196K 175K 33K NA
3 Mar 08, 2019 (Feb) 09:30 20K 181K 311K NA
4 Feb 01, 2019 (Jan) 09:30 304K 165K 222K NA
5 Jan 04, 2019 (Dec) 09:30 312K 178K 176K NA
6 Dec 07, 2018 (Nov) 09:30 155K 200K 237K NA
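The question uses rvest, but the row-parsing step is language-neutral, so here is a standard-library Python sketch of it. A "Show More" button often loads its extra rows through a separate background request that you can find in the browser's developer tools (Network tab). That request's URL and parameters are not shown in the question, so fetching is left out here. The sample HTML is built from rows in the question's output, not captured from the site, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)  # includes text inside nested tags like <span>

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = (
    "<table><tr><th>Release Date</th><th>Time</th><th>Actual</th></tr>"
    "<tr><td>May 03, 2019 <span>(Apr)</span></td><td>08:30</td><td>263K</td></tr>"
    "</table>"
)
for row in parse_rows(sample):
    print(row)
```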

How to run cronjob on alternate weekday?

I have a script that runs every day at 1.00 AM.
But on every alternate Wednesday I need to change the time to 6.00 AM, which I currently do manually every Tuesday.
For example:
Wednesday Nov 09 2016 6.00 AM.
Wednesday Nov 23 2016 6.00 AM.
Wednesday Dec 07 2016 6.00 AM.
The main thing is that on the Wednesdays in between, the job should run at the regular time.
Using this bash trick, it can be done with 3 cron entries (possibly 2):
#The 10# arithmetic below needs bash; cron runs commands with /bin/sh by default
SHELL=/bin/bash
#Every day except Wednesdays at 1am
0 1 * * 0,1,2,4,5,6 yourCommand
#Every Wednesday at 1am, proceeds only on even weeks
0 1 * * 3 test $((10#$(date +\%W)\%2)) -eq 0 && yourCommand
#Every Wednesday at 6am, proceeds only on odd weeks
0 6 * * 3 test $((10#$(date +\%W)\%2)) -eq 1 && yourCommand
Change the -eq values to 0 or 1 depending on whether you want to start with an odd or an even week. As written, it matches your example: Wednesday Nov 09 2016 falls in week 45 according to date +%W, an odd week, so the 6am entry runs that day.
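To check which Wednesdays each entry fires on, here is a small Python sketch that applies the same parity test to the dates in the question (Python's `strftime("%W")` uses the same Monday-based week numbering as `date +%W`). The function names are hypothetical. It also shows an alternative worth considering: `%W` restarts every January, so parity can repeat across a year boundary. For example, Wednesday 26 Dec 2018 is week 52 and Wednesday 2 Jan 2019 is week 00, both even. Counting whole weeks from a fixed 6.00 AM Wednesday such as Nov 09 2016 keeps the alternation steady.

```python
from datetime import date

WEDNESDAY = 2  # date.weekday(): Monday is 0


def six_am_by_week_parity(day):
    """Same rule as the cron entries: 6 AM on Wednesdays in odd %W weeks."""
    return day.weekday() == WEDNESDAY and int(day.strftime("%W")) % 2 == 1


def six_am_by_anchor(day, anchor=date(2016, 11, 9)):
    """Alternative rule: every second Wednesday counted from a known 6 AM Wednesday."""
    return day.weekday() == WEDNESDAY and (day - anchor).days // 7 % 2 == 0


def run_hour(day):
    """Hour the job should start on `day` under the anchor rule."""
    return 6 if six_am_by_anchor(day) else 1


for d in (date(2016, 11, 9), date(2016, 11, 16), date(2016, 11, 23), date(2016, 12, 7)):
    print(d, d.strftime("%W"), run_hour(d))
```

The anchor rule is easy to express in Python but more awkward inside a one-line cron test, so whether it is worth it depends on how long the schedule has to keep running.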

Matplotlib date on y axis

I am trying to plot a series of sunset times in matplotlib but I get the following error:
"TypeError: Empty 'DataFrame': no numeric data to plot"
I have looked at several conversion options, e.g. matplotlib.dates.date2num, but that doesn't really fulfil my needs, as I would like to plot the values in a readable format, i.e. as times. All the examples I have found have times on the x-axis, but none have them on the y-axis.
Is there no way of accomplishing this? Does anyone have an idea?
I look forward to your replies.
Best regards, Arne
3 Jan 2013 16:44:00
4 Jan 2013 16:45:00
5 Jan 2013 16:46:00
6 Jan 2013 16:47:00
7 Jan 2013 16:48:00
8 Jan 2013 16:49:00
9 Jan 2013 16:51:00
10 Jan 2013 16:52:00
11 Jan 2013 16:53:00
12 Jan 2013 16:55:00
13 Jan 2013 16:56:00
14 Jan 2013 16:57:00
It's not quite clear from your question if you're trying to plot some unspecified data on the x-axis with date/time on the y-axis or if you're trying to plot days on the x-axis with times on the y-axis.
From your question, though, I'm going to assume it's the latter.
It sounds like you might be using pandas, but for the moment, I'll just assume you have two sequences of strings: One with the day, and another sequence with the time.
To treat a given axis as dates, just call ax.xaxis_date() or ax.yaxis_date(). In this case, both will actually be dates. (The times will have today as the day, though you won't see this directly.)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
date = ['3 Jan 2013', '4 Jan 2013', '5 Jan 2013', '6 Jan 2013', '7 Jan 2013',
        '8 Jan 2013', '9 Jan 2013', '10 Jan 2013', '11 Jan 2013', '12 Jan 2013',
        '13 Jan 2013', '14 Jan 2013']
time = ['16:44:00', '16:45:00', '16:46:00', '16:47:00', '16:48:00', '16:49:00',
        '16:51:00', '16:52:00', '16:53:00', '16:55:00', '16:56:00', '16:57:00']
# Convert to matplotlib's internal date format.
x = mdates.datestr2num(date)
y = mdates.datestr2num(time)
fig, ax = plt.subplots()
ax.plot(x, y, 'ro-')
ax.yaxis_date()
ax.xaxis_date()
# Optional. Just rotates x-ticklabels in this case.
fig.autofmt_xdate()
plt.show()
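Another option, sketched here under assumptions, is to turn each time into a plain number (minutes after midnight) and label the ticks with a small callback. Because the y values are then ordinary floats, this should also sidestep the "no numeric data to plot" error. The helper names are hypothetical. The plotting lines are left as comments so the snippet runs without matplotlib, and they assume `matplotlib.ticker.FuncFormatter`, which calls the label function with `(value, position)`.

```python
from datetime import datetime


def to_minutes(hhmmss):
    """'16:44:00' -> 1004.0 minutes after midnight."""
    t = datetime.strptime(hhmmss, "%H:%M:%S")
    return t.hour * 60 + t.minute + t.second / 60


def minutes_label(value, pos=None):
    """Tick label for a minutes-after-midnight value, e.g. 1004.0 -> '16:44'."""
    total = int(round(value))
    return f"{total // 60:02d}:{total % 60:02d}"


dates = ["3 Jan 2013", "4 Jan 2013", "5 Jan 2013"]
times = ["16:44:00", "16:45:00", "16:46:00"]
days = [datetime.strptime(d, "%d %b %Y") for d in dates]
sunset = [to_minutes(t) for t in times]

# With matplotlib available, the plot itself would be roughly:
# import matplotlib.pyplot as plt
# from matplotlib.ticker import FuncFormatter
# fig, ax = plt.subplots()
# ax.plot(days, sunset, "ro-")
# ax.yaxis.set_major_formatter(FuncFormatter(minutes_label))
# plt.show()
```

Compared with `yaxis_date()`, this keeps the y-axis free of the hidden "today" date, at the cost of writing the label function yourself.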
