How to fix Airflow logging? - airflow

I have S3 remote logging enabled, and Airflow is installed on an EC2 instance. My DAGs are running; however, a task does not always create a log, and then it fails. The error is as follows:
*** Falling back to local log
*** Log file does not exist: /home/ec2-user/airflow/logs/REMOVED/REMOVED/2022-03-07T07:00:00+00:00/2.log
*** Fetching from: http://ip-10-105-32-92.eu-west-1.compute.internal:8793/log/REMOVED/REMOVED/2022-03-07T07:00:00+00:00/2.log
*** Failed to fetch log file from worker. Client error '404 NOT FOUND' for url 'http://ip-10-105-32-92.eu-west-1.compute.internal:8793/log/REMOVED/REMOVED/2022-03-07T07:00:00+00:00/2.log'
For more information check: https://httpstatuses.com/404
After a few attempts (3-5), it eventually does end up working.
I have even disabled the remote logging in an attempt to debug, and it still doesn't work. Any suggestions?
Apache Airflow version: 2.2.4
We use a describe stacks API call to get the latest ECS Task Definition, and I've noticed we have lots of these errors:
An error occurred (Throttling) when calling the DescribeStacks operation (reached max retries: 4): Rate exceeded
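The "Rate exceeded" message means the DescribeStacks calls are hitting the API rate limit and the client's built-in retries are running out. One common mitigation is to retry with exponential backoff and jitter, so that tasks starting together do not all retry at the same instant. The following is a minimal sketch of that idea using only the standard library. `ThrottlingError` and `call_with_backoff` are hypothetical names introduced for illustration; with boto3 you would instead catch `botocore.exceptions.ClientError` and check that `response["Error"]["Code"]` is `"Throttling"`.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a rate-limit error such as 'Rate exceeded'."""


def call_with_backoff(func, max_attempts=8, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call func(), retrying on ThrottlingError with exponential backoff.

    Uses "full jitter": each wait is a random duration between 0 and the
    capped exponential delay for that attempt.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))


if __name__ == "__main__":
    state = {"calls": 0}

    def describe_stacks_stub():
        # Assumption: simulates an API that is throttled on the first two calls.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ThrottlingError("Rate exceeded")
        return {"Stacks": []}

    print(call_with_backoff(describe_stacks_stub))
```

If many tasks look up the same task definition, caching the result for a short time would probably reduce the number of calls further than retries alone.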

Related

Google Cloud Composer (Airflow) - Scalability issues

I'm facing some issues on my Cloud Composer instance resulting in failed tasks.
Details of instance configuration :
Composer image : composer-2.0.29-airflow-2.3.3 / Airflow version : 2.3.3
Airflow.cfg :
parallelism = 32 / dag_concurrency = 100 / worker_concurrency = 24
In terms of resources :
I have 60 DAGs, each of which can contain up to 55 tasks that need to run in parallel.
They don't do any compute, only some light PythonOperator/GCSOperator/BigQueryOperator.
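To put the configuration above in perspective, here is a rough back-of-the-envelope calculation using only the figures quoted in the question. It assumes a single scheduler and the worst case where every DAG reaches its widest fan-out at the same moment; the real load depends on schedules and is probably lower.

```python
# Figures quoted in the question.
dag_count = 60
max_parallel_tasks_per_dag = 55
parallelism = 32         # airflow.cfg: max running task instances per scheduler
worker_concurrency = 24  # airflow.cfg: task slots per Celery worker

# Worst case: every DAG fans out to its widest point simultaneously.
peak_demand = dag_count * max_parallel_tasks_per_dag

# Number of "waves" of running tasks needed to drain that peak
# (ceiling division without importing math).
waves = -(-peak_demand // parallelism)

# Workers needed before `parallelism` (not worker slots) becomes the limit.
workers_to_saturate = -(-parallelism // worker_concurrency)

print(f"peak runnable tasks:            {peak_demand}")
print(f"tasks allowed to run at once:   {parallelism}")
print(f"waves needed to drain the peak: {waves}")
print(f"workers to saturate the limit:  {workers_to_saturate}")
```

Under these assumptions most runnable tasks spend a long time queued, which is one plausible setting for the queued-task and zombie symptoms described below, although it does not by itself prove the cause.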
I often encounter this type of error:
*** Log file is not found: gs://xxx/xxx/attempt=2.log.
*** The task might not have been executed or worker executing it might have finished abnormally (e.g. was evicted).
*** Please, refer to https://cloud.google.com/composer/docs/how-to/using/troubleshooting-dags#common_issues hints to learn what might be possible reasons for a missing log.
All of my tasks have 3 retries, but when this happens, for some reason it stops at 2 retries and sends a failure email. I don't understand why. Example of the error in the email sent:
Try 2 out of 3
Exception:
Executor reports task instance finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
I also get random zombie tasks ("Detected as zombie").
My metrics are the following :
When I clear the task, it succeeds as it should.
(I don't have access to GKE but if it helps I can ask to have access)
Any advice to prevent these errors and understand what is happening?

Artifactory - Failed to update stats with error couldn't find versionIDs for the given paths

My Artifactory logs are showing the following errors with alarming frequency. The metadata service is up and healthy according to Artifactory, and aside from the log spam, it doesn't seem to be causing any problems. Does anyone have any ideas how to fix this?
[jfrt ] [ERROR] [af10ed1c492f4e88] [s.MetadataEventServiceImpl:346] [art-exec-6 ] - Unable to send statistics event to Metadata Server. Caught exception: Failed executing api/v1/stats, with response code: HTTP/1.1 500 Internal Server Error and response message: {"cause":"Internal error while processing request","message":"Failed to update stats with error couldn't find versionIDs for the given paths: couldn't find versionIDs for the given paths"}
Artifactory 7.27.10, running in Kubernetes
Using an external postgres 13 database
Using s3 as the storage backend
This is a known issue (documented internally as META-1180). It has been fixed, and the fix will be released with Artifactory 7.29, which is scheduled for release sometime over the next few weeks.

Corda throws error trying to generate the basic nodes

I am trying to generate the basic nodes (PartyA, PartyB and Notary) on Ubuntu 14 by running ./gradlew deployNodes or even ./gradlew clean deployNodes. The error reads:
... still waiting. If this is taking longer than usual, check the node logs.
Error while generating node info file /cordapp-template-java/build/nodes/Notary/logs
Error while generating node info file /cordapp-template-java/build/nodes/PartyB/logs
Error while generating node info file /cordapp-template-java/build/nodes/PartyA/logs
Task :deployNodes FAILED
FAILURE: Build failed with an exception.
What went wrong:
Execution failed for task ':deployNodes'.
Error while generating node info file. Please check the logs in /cordapp-template-java/build/nodes/Notary/logs.
Error while generating node info file. Please check the logs in /cordapp-template-java/build/nodes/Notary/logs.
The error logs do not provide any indication of error.
I have run into the above problem myself. From what I saw, it seemed to be a random incident on the Unix-based machine.
The issue was resolved after I moved the project to a different location. It is absurd, but I have never run into this issue again.

Execution failed for task ':kotlin-source:compileKotlin'

I tried to run the cordapp-example from the command prompt. But when I input the "gradlew.bat deployNodes" command, I got the error as below.
Execution failed for task ':kotlin-source:compileKotlin'.
Could not resolve all dependencies for configuration ':kotlin-source:compileClasspath'.
Could not determine artifacts for org.jolokia:jolokia-war:1.3.7
Could not get resource 'https://jcenter.bintray.com/org/jolokia/jolokia-war/1.3.7/jolokia-war-1.3.7.war'.
Could not HEAD 'https://jcenter.bintray.com/org/jolokia/jolokia-war/1.3.7/jolokia-war-1.3.7.war'.
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target
I ran the "keytool -importcert" command to import the certificate of jolokia-war into the Java cacerts, but the error was still there.
Does anyone know how to solve the problem?
The error message indicates that the compilation failed because your machine was unable to download the resource at this location: https://jcenter.bintray.com/org/jolokia/jolokia-war/1.3.7/jolokia-war-1.3.7.war.
This URL works for me and I was able to download the resource. You are most likely behind a firewall or otherwise unable to download this resource. You need to make your machine able to access this resource to compile and deploy the node.
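"PKIX path building failed" means the JVM could not chain the certificate it received to a root it trusts. If a proxy or firewall is intercepting TLS, the certificate your machine receives may not be the one the server actually sends, which would explain why importing the expected certificate did not help. As a hedged way to check, the sketch below prints the certificate your machine is actually presented with, so you can inspect its issuer. The function names are hypothetical, and the host must be reachable from your network for the call to succeed.

```python
import ssl
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[str, int]:
    """Extract the host and port from an https URL (default port 443)."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 443


def fetch_presented_certificate(url: str) -> str:
    """Return the PEM certificate presented for this URL.

    No validation is performed here, so this also works when the chain
    is untrusted, which is exactly the case being diagnosed.
    """
    return ssl.get_server_certificate(host_and_port(url))


if __name__ == "__main__":
    # URL taken from the error message in the question.
    url = ("https://jcenter.bintray.com/org/jolokia/jolokia-war/1.3.7/"
           "jolokia-war-1.3.7.war")
    print(fetch_presented_certificate(url))
```

If the issuer turns out to be your organization's own certificate authority, it is that CA certificate, rather than the server's certificate, that would need to be trusted by the JVM running Gradle.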

sbt-release plugin logs git push as error, despite it succeeding

I am using the sbt-release plugin.
The process seems to work; however, sbt logs the output of the final release step, pushChanges, as errors. Ideally, only actual errors would be logged at error level, since false errors can confuse automation.
Sample output here:
Push changes to the remote repository (y/n)? [y] y
[error] To git@git.mycompany.com:gsilin/s3-client.git
[error] 67277ef..a1b959f my_branch -> my_branch
[error] To git@git.mycompany.com:gsilin/s3-client.git
[error] * [new tag] v0.1.8 -> v0.1.8
my_branch in this case is not the master branch (as I'm testing this process on my own branch before it goes to master), could that be the issue?
I don't know if something has changed in the latest version, but sbt-release used to warn you before this push step that git sends its info to stderr, so it will be shown as error messages in sbt even though the process goes perfectly fine. So it's OK, don't worry.
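For automation, the practical consequence is to judge success by the exit code and not by whether anything was written to stderr. Here is a minimal sketch of that check. It uses a small child Python process as a stand-in for `git push` (an assumption made so the example runs without a repository): the child writes a progress line to stderr and then exits with code 0.

```python
import subprocess
import sys

# Stand-in for `git push`: writes a progress line to stderr, exits with 0.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('To example-remote\\n')",
]

result = subprocess.run(child, capture_output=True, text=True)

# The exit code, not the presence of stderr output, signals failure.
succeeded = result.returncode == 0

print("stderr:", result.stderr.strip())
print("exit code:", result.returncode)
print("succeeded:", succeeded)
```

A wrapper script around the release could apply the same rule to the sbt process itself: treat a non-zero exit code as failure and ignore `[error]`-prefixed lines that only echo git's progress output.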
