Airflow Audit Logs - airflow

I'm wondering what Airflow offers in the way of audit logs. My Airflow environment is running Airflow version 1.10 and uses the [ldap] section of the airflow.cfg file to authenticate against my company's Active Directory (AD). When someone logs into Airflow through the web UI, the user's name is written to the webserver's log (shown below). I'm wondering, though, whether Airflow can be modified to also log when a user turns a DAG on or off, creates a new Airflow Variable or Pool, clears a task, marks a task as success, or performs any other operation available to a user.
I need some sort of traceability of users' activities: to use Airflow at my work I have to get it through a security review by an architect, and he requires the ability to trace users' activities.
Is this ability offered out of the box by Airflow? I see that if I were to go with Google Cloud's Airflow service called Cloud Composer then I would get Audit Logs through their service but unfortunately I'm tied to the Amazon Web Services (AWS) ecosystem and I am maintaining Airflow myself (not provided through a service).
I see in the Airflow webserver logs that when I navigate the Airflow web UI it sends REST calls:
161.179.215.170 - - [17/Sep/2018:16:39:26 -0400] "GET /admin/ HTTP/1.1" 200 71942 "http://1.2.3.4:8080/admin/airflow/graph?dag_id=ARL_OnDemand" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
and when I log in I see it tells me the username (which is logged in the login function here https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/auth/backends/ldap_auth.py)
[2018-09-17 16:27:15,493] {ldap_auth.py:287} INFO - User foobaruser successfully authenticated
161.179.215.170 - - [17/Sep/2018:16:27:16 -0400] "POST /admin/airflow/login HTTP/1.1" 302 221 "http://1.2.3.4:8080/admin/airflow/login?next=%2Fadmin%2F" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
So I'm wondering if there's a way for me to update the webserver logs so that every time it logs a GET or POST request it also logs the client who sent the request. This would satisfy my audit log needs because I would always know what user did what in Airflow on the UI.
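The access-log lines above look like the common "combined" layout, in which the third field is the authenticated user and a dash means the server recorded none. A minimal sketch, assuming that layout applies here, shows how a line can be split into fields and why the user is missing from it. The names `ACCESS_LOG` and `parse_access_line` are hypothetical, and the user-agent in the sample is shortened.

```python
import re

# Assumed layout of one access-log line:
# host ident authuser [timestamp] "request" status bytes "referer" "user-agent"
ACCESS_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_access_line(line):
    """Return the named fields of one access-log line, or None if it does not match."""
    match = ACCESS_LOG.match(line)
    return match.groupdict() if match else None


# Login request from the question, with the user-agent shortened.
sample = (
    '161.179.215.170 - - [17/Sep/2018:16:27:16 -0400] '
    '"POST /admin/airflow/login HTTP/1.1" 302 221 '
    '"http://1.2.3.4:8080/admin/airflow/login?next=%2Fadmin%2F" "Mozilla/5.0"'
)
fields = parse_access_line(sample)
print(fields["method"], fields["path"], "authuser =", fields["authuser"])
```

In this sketch the `authuser` field comes back as a dash, which is consistent with the access log alone not being enough to tie a request to a user.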
Update:
In this article:
https://wecode.wepay.com/posts/improving-airflow-ui-security
Apparently Airflow 1.10 introduced a whole new web UI security architecture, and the original Flask UI will be deprecated in the future.
The part I found interesting and relevant to this post is where the author describes action logging as passive rather than preemptive. I wonder if that's related to audit logging?
During this time, several improvements were made on security,
including adding an action logging feature and creating a hard-coded
naive RBAC implementation. However, the action logging was passive
rather than preemptive, and the native RBAC implementation still
allowed read and write access to DAGs for all roles, so they didn’t
address our security concerns.
WORKING SOLUTION:
Despite saying I was on Airflow version 1.10, I was actually on version 1.9 :) On Airflow version 1.9 the Owner column in the Logs was always blank for me unless it said Airflow. But after upgrading to Airflow version 1.10 and connecting to my LDAP, I now see my LDAP username (kbridenstine) logged under Owner every time I run a modifying command!
And as icing on the cake, Airflow also logs when someone on the server runs an Airflow command (because you can modify Airflow via its CLI commands too). You can see this with the root and ec2-user accounts I was using for Airflow on the EC2 instance running Airflow.

I think the logs under AIRFLOW_WEB_SERVER_URL:PORT/admin/log/ should provide you with enough information, e.g. whether someone cleared a DAG using the UI or the CLI, as shown in the screenshot below.
Some of this metadata is retrieved from the MetaDB.
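Since these entries live in the metadata database, they can in principle be queried per user. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in, the column names are assumed to mirror the Airflow 1.10 `log` table (check your own schema), the rows and event names are made up, and `actions_by_user` is a hypothetical helper.

```python
import sqlite3

# Stand-in for the Airflow metadata database. The column names are an
# assumption based on the Airflow 1.10 `log` table; verify against your schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, dttm TEXT, dag_id TEXT, "
    "task_id TEXT, event TEXT, execution_date TEXT, owner TEXT, extra TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO log (dttm, dag_id, task_id, event, owner) VALUES (?, ?, ?, ?, ?)",
    [
        ("2018-09-18 10:00:00", "example_dag", None, "paused", "kbridenstine"),
        ("2018-09-18 10:05:00", "example_dag", "task_a", "clear", "kbridenstine"),
        ("2018-09-18 10:07:00", "example_dag", "task_a", "running", "airflow"),
    ],
)


def actions_by_user(connection, owner):
    """Return (dttm, event, dag_id, task_id) rows recorded for one owner, oldest first."""
    cursor = connection.execute(
        "SELECT dttm, event, dag_id, task_id FROM log WHERE owner = ? ORDER BY dttm",
        (owner,),
    )
    return cursor.fetchall()


user_rows = actions_by_user(conn, "kbridenstine")
for row in user_rows:
    print(row)
```

Against a real deployment the same SELECT would be run on the actual metadata database connection instead of the in-memory stand-in.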

Related

How to unblock IPAM Access in Windows Server 2022?

I'm using Windows Server 2022 and I'm stuck on my IPAM server task after step 4, "Start server discovery", when I proceed to step 5, "Select or add servers to manage and verify IPAM Access".
When I tried "Edit Server", I encountered the error shown in the screenshot below.
I have already run the commands below in PowerShell.
Invoke-IPAMGPOProvisioning –Domain depeddumaschools.com -GPOPrefixName DCSGROUP -IPAMServerFQDN WIN-LODU3GE5I1E.depeddumaschools.com -DelegatedGPOUser DEPEDDUMASCHOOL\Administrator
gpupdate /force
I still can't unblock the IPAM access, even though I have thoroughly followed the steps in the two articles below.
https://msftwebcast.com/2020/01/install-and-configure-ipam-in-windows-server-2019.html
https://mehic.se/2017/05/23/install-and-configure-ip-address-management-ipam-2016-part-1/
As you can see in my Group Policy Management below,
I was able to update the group policy on our domain controller. Is there anything I have missed in my settings and configuration along the way? Please advise. Thanks

WebDriver - headless issue

I need to automate the following website:
https://ekrs.ms.gov.pl/web/wyszukiwarka-krs/strona-glowna/index.html
When I work on my automation in my testing environment everything is fine, but there I use the normal "visible" mode.
On the end user's PC, however, it has to run in headless mode. When I checked my code in headless mode, I noticed that the website returns: The requested URL was rejected. Please consult with your administrator
Any idea why this issue occurs and how to solve it?
Thank you in advance
I also get the following information back from WebDriver:
Starting ChromeDriver 96.0.4664.45 (76e4c1bb2ab4671b8beba3444e61c0f17584b2fc-refs/branch-heads/4664#{#947}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
DevTools listening on ws://127.0.0.1:63205/devtools/browser/ffacc4cb-af7c-4157-881d-a8c7db522d30
[1206/145642.826:ERROR:command_buffer_proxy_impl.cc(125)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
[1206/145645.262:INFO:CONSOLE(402)] "The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page. https://...........goo.gl/7K7WLu", source: https://ekrs.ms.gov.pl/TSPD/08c5699bd4ab2000035ad69152344c2a5571187707e8019758fff5530615875b3778567088bde213?type=11 (402)
[1206/145645.263:INFO:CONSOLE(402)] "The ScriptProcessorNode is deprecated. Use AudioWorkletNode instead. (https://.........bit.ly/audio-worklet)", source: https://ekrs.ms.gov.pl/TSPD/08c5699bd4ab2000035ad69152344c2a5571187707e8019758fff5530615875b3778567088bde213?type=11 (402)
[1206/145645.264:INFO:CONSOLE(405)] "The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page. https://...........goo.gl/7K7WLu", source: https://ekrs.ms.gov.pl/TSPD/08c5699bd4ab2000035ad69152344c2a5571187707e8019758fff5530615875b3778567088bde213?type=11 (405)
[1206/145645.265:INFO:CONSOLE(408)] "The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page. https://...........goo.gl/7K7WLu", source: https://ekrs.ms.gov.pl/TSPD/08c5699bd4ab2000035ad69152344c2a5571187707e8019758fff5530615875b3778567088bde213?type=11 (408)
[1206/145645.265:ERROR:web_contents_delegate.cc(228)] WebContentsDelegate::CheckMediaAccessPermission: Not supported.
[1206/145645.265:ERROR:web_contents_delegate.cc(228)] WebContentsDelegate::CheckMediaAccessPermission: Not supported.
[1206/145645.306:ERROR:gl_utils.cc(318)] [.WebGL-0000249C00081B00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
[1206/145645.467:ERROR:gl_utils.cc(318)] [.WebGL-0000249C00081B00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
[1206/145645.564:ERROR:gl_utils.cc(318)] [.WebGL-0000249C00081B00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
[1206/145645.652:INFO:CONSOLE(0)] "[.WebGL-0000249C00081B00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels", source: https://ekrs.ms.gov.pl/TSPD/?type=20 (0)
[1206/145645.652:INFO:CONSOLE(0)] "[.WebGL-0000249C00081B00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels", source: https://ekrs.ms.gov.pl/TSPD/?type=20 (0)
[1206/145645.654:INFO:CONSOLE(0)] "[.WebGL-0000249C00081B00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels", source: https://ekrs.ms.gov.pl/TSPD/?type=20 (0)
EDIT: 2021/12/08
I finally found out that I had to add a user-agent capability set to Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36. Interestingly, when I used 60.0.3112.50 instead of 96.0.4664.93, my automation worked in headless mode as far as navigating to the desired website, but it stopped working, even in normal mode, when it came to using the website: navigation worked, but after filling in the form and submitting the data I got the same "consult with your administrator" error.
To clarify the matter:
Before I added the user-agent argument, both navigation and search worked in normal mode.
With the user-agent argument set to the outdated 60.0.3112.50, navigation works in normal mode but search stops working.
So now my question changes to:
Why, with an out-of-date user-agent setting, does navigation to the page work properly while the search on the page does not? Could it just be related to an unusual configuration or design of this site?
options.add_argument("disable-blink-features")
options.add_argument("disable-blink-features=AutomationControlled")
Only these two lines were needed as the solution, and now it works perfectly.
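For context, here is a minimal sketch of how the two arguments from the answer could be combined with headless mode and an optional user-agent override. The helpers `build_chrome_args` and `apply_args` and the `RecordingOptions` stand-in are hypothetical names introduced so the sketch runs without Selenium installed; whether these arguments are sufficient for a given site is not something this sketch can confirm.

```python
def build_chrome_args(headless=True, user_agent=None):
    """Collect Chrome command-line arguments, starting with the two from the answer."""
    args = [
        "disable-blink-features",
        "disable-blink-features=AutomationControlled",
    ]
    if headless:
        args.append("--headless")
    if user_agent:
        args.append("--user-agent=" + user_agent)
    return args


def apply_args(options, args):
    """Pass each argument to any object exposing add_argument()."""
    for arg in args:
        options.add_argument(arg)
    return options


class RecordingOptions:
    """Tiny stand-in for a browser options object; it only records arguments."""

    def __init__(self):
        self.arguments = []

    def add_argument(self, arg):
        self.arguments.append(arg)


opts = apply_args(RecordingOptions(), build_chrome_args(headless=True))
print(opts.arguments)
```

With Selenium available, a `webdriver.ChromeOptions()` instance would take the place of `RecordingOptions()`, since it exposes the same `add_argument` method.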

Airflow dag cannot find connection-id

I am managing a Google Cloud Composer environment which runs Airflow for a data engineering team. I have recently been asked to troubleshoot one of the DAGs they run, which is failing with this error: [12:41:18,119] {credentials_utils.py:23} WARNING - [redacted-name] connection ID not available, falling back to Google default credentials
The job is basically a data pipeline which reads from various sources and stores data into GBQ. The odd part is that they have a nearly identical DAG running for a different project and it works perfectly.
I have recreated the .json credentials for the service account behind the connection as well as the connection itself in Airflow. I have also sanitized the code to check for hidden spaces and the like.
My knowledge of Airflow is limited and I have not been able to find any similar issue in my research. Has anyone encountered this before?
So the DE team came back to me saying it was actually a deployment issue: an internal module involved in service account authentication was being used inside another DAG running in the stage environment, which made it impossible to fetch the credentials from the connection ID.

Google Cloud Composer The server encountered a temporary error and could not complete your request

After running for a couple of days, the Google Cloud Composer web UI returns a 502 Server Error indefinitely:
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
The only way to fix it is to recreate the Composer environment, but after running for a couple of days the new environment crashes with the same error.
Image version: composer-1.4.0-airflow-1.10.0
Python version: 3
Does anyone know the root cause?
I don't run Cloud Composer, but I suspect there's a case where all of the webserver's web worker threads have exited. This can sometimes happen when Airflow hits an extended timeout reading from or writing to the database, due either to a held lock or to network connection issues. It is probably configured to restart if it fully exits, but there are some cases where the airflow webserver command will still hang on without exiting even though all web workers have exited.
Alternatively, the 502 is related to the identity provider implemented for GCP. If that's the case, you might find you need to sign out of your Google login and use the sign-in flow provided by Airflow (if it responds to a private browser session or a signed-out session).
I was facing the same 502 error and it turned out to be an issue with the DAG itself. As mentioned here:
https://cloud.google.com/composer/docs/how-to/using/troubleshooting-dags
"The web server parses the DAG definition files, and a 502 gateway timeout can occur if there are errors in the DAG."
Visible in Composer / Monitoring
The web server was affected by an issue with the DAG itself. We solved it by deleting the recently added DAGs; after a couple of minutes the Airflow UI was back up.
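Since a broken DAG file can take the web server down, one cheap precaution is to check DAG files before uploading them. The sketch below is an assumption-laden illustration, not Composer's own mechanism: the hypothetical helper `find_broken_dag_files` only compiles each file, so it catches syntax errors but not failures that happen at import time (missing modules, bad connections, and so on).

```python
import os
import tempfile


def find_broken_dag_files(dag_folder):
    """Return {path: error message} for .py files in dag_folder that fail to compile."""
    broken = {}
    for name in sorted(os.listdir(dag_folder)):
        if not name.endswith(".py"):
            continue
        path = os.path.join(dag_folder, name)
        with open(path, "r", encoding="utf-8") as handle:
            source = handle.read()
        try:
            compile(source, path, "exec")  # syntax check only; nothing is executed
        except SyntaxError as exc:
            broken[path] = str(exc)
    return broken


# Demo on a throwaway folder with one valid and one invalid file.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "good_dag.py"), "w", encoding="utf-8") as f:
        f.write("x = 1\n")
    with open(os.path.join(folder, "bad_dag.py"), "w", encoding="utf-8") as f:
        f.write("def broken(:\n")
    broken_names = sorted(os.path.basename(p) for p in find_broken_dag_files(folder))

print(broken_names)
```

Running such a check in a deployment step would flag only the file with the syntax error, leaving valid files untouched.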

Mono.Security.Protocol.Tls.TlsException Received 0 bytes from stream in MVC under Mono

I have an ASP.NET MVC application running on Mono 4.0.5 under Ubuntu 15.04. The application works as expected while the internet access is available, but if the OS is restarted on a network without internet connection, I get the following error:
I have tried updating machine and user certificate stores without any success using "mozroots --import --sync --machine".
It should be noted that this error only occurs on the Login page (using Forms Authentication with MySQL provider with "requireSSL" set to "false").
I don't use SSL on any of my pages and don't have it enabled/configured in Apache/Mod_Mono configuration. The LoginController doesn't make any (e.g. HTTPS) requests either.
Also, I've tried running the application through XSP4, which produced exactly the same behavior.
Any help would be much appreciated...
After inspecting the logs, I noticed that the system date was set to 01/01/1970. After updating the date and restarting Apache, everything worked. I guess in my case NTP was updating the date/time on every boot, and without an internet connection the clock was falling back to the Unix epoch.
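A clock stuck at the Unix epoch is easy to detect at startup. The sketch below is a generic illustration in Python rather than anything specific to Mono: `clock_looks_unset` is a hypothetical helper, and the threshold year is an arbitrary assumption chosen to predate any plausible "now" for this setup.

```python
from datetime import datetime, timezone


def clock_looks_unset(now=None, earliest_plausible_year=2015):
    """Return True when the clock reports a date earlier than any plausible 'now'."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.year < earliest_plausible_year


epoch_result = clock_looks_unset(datetime(1970, 1, 1, tzinfo=timezone.utc))
recent_result = clock_looks_unset(datetime(2015, 6, 1, tzinfo=timezone.utc))
print(epoch_result, recent_result)
```

Logging a warning when such a check fires would have pointed at the date problem before any TLS error appeared.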
