ERROR: (gcloud.compute.ssh) Could not fetch resource: - The resource was not found

I'm trying to run R on Google Cloud following Google's suggested tutorial. However, I've run into trouble at the final step of creating the cluster. When I create the cluster with
elasticluster start myslurmcluster
I get the following error message:
ERROR: (gcloud.compute.ssh) Could not fetch resource:
- The resource 'projects/MY_PROJECT/zones/us-central1-b/instances/myslurmcluster-frontend001' was not found
I had run through the previous stages of the tutorial several times with no problems, but I suspect the issue might be related to the SSH keys that let me sign in to my cluster.
Any help or advice gratefully received!

ERROR: (gcloud.compute.ssh) Could not fetch resource:
- The resource 'projects/MY_PROJECT/zones/us-central1-b/instances/myslurmcluster-frontend001' was not found
This error means that the instance you are trying to SSH into could not be found. The usual cause is that the instance's zone and the gcloud default zone are different: the command did not specify a zone, so the Compute Engine default zone was used, and the instance naturally isn't found there. Adding the zone option to the command solves the problem. The command format looks like:
gcloud compute --project "MY_PROJECT" ssh --zone "us-central1-b" "myslurmcluster-frontend001"
To see what your default region and zone settings are, run the following gcloud command:
gcloud compute project-info describe --project [PROJECT_ID]
where [PROJECT_ID] is your own project ID.
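If your instances usually live in one zone, a hedged sketch of setting that zone as the gcloud default so the --zone flag can be omitted (the zone value here is just the one from this question):
gcloud config set compute/zone us-central1-b
gcloud config get-value compute/zone
After that, gcloud compute ssh myslurmcluster-frontend001 should look in us-central1-b by default.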

Hope this helps. An easy way to find out instance zones without having to type or know the instance name:
gcloud compute instances list
This should list your instances and their details, including the ZONE column.

Related

How to improve Cloud Composer health?

I recently built 120 DAGs using Cloud Composer. They all functioned for a while.
They were all approximately the same. Each used a PythonOperator. Each made API calls to Google Search Console. Each collected 7-9k rows of GSC data into a pandas dataframe, then uploaded this to GCS buckets and BigQuery (partitioned and clustered).
Occasionally they would all fail for a day because the GSC auth token had been revoked, but no problem: create new credentials, upload, and continue. That situation lasted a couple of months. Now nothing runs.
From the start, the Cloud Composer health metric showed occasional red spots, but now the health is solid red every day.
I have found documentation about how to check the health, but not how to find why the health is so poor and fix it.
Can anyone point me in the right direction?
The environment health metric depends on a Composer-managed DAG named airflow_monitoring, which is triggered periodically by the airflow-monitoring pod. If this DAG hasn't been deleted, check the airflow-monitoring logs to see if there are any problems related to reading the DAG's run statuses. You can also try troubleshooting the error in Cloud Logging using the filter:
resource.type="cloud_composer_environment"
severity=ERROR
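For example, a hedged sketch of pulling those errors from the command line (project ID and limit are placeholders):
gcloud logging read 'resource.type="cloud_composer_environment" AND severity=ERROR' --project=MY_PROJECT --limit=50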
The liveness check failure could be due to the following reasons:
- A resource constraint (memory or CPU).
- A known issue with the Composer version. Please check the Composer release notes for any known issues.
- An Airflow configuration override such as core:default_timezone. If you've configured the core:default_timezone Airflow configuration, the Composer environment health will be shown as unhealthy. It is a known issue and the Composer product team is working on a resolution; a sketch for removing the override follows below.
Refer to this documentation for information on Cloud Composer’s environment health metric.
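If the default_timezone override turns out to be the culprit, a hedged sketch of removing it (environment name and location are placeholders; the property key follows the section-property convention used for Composer overrides):
gcloud composer environments update my-composer-env \
    --location us-central1 \
    --remove-airflow-configs core-default_timezone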
I was lucky enough to talk to someone from Google yesterday, who said what I need to do is recreate my Cloud Composer environment because I have insufficient CPU. He suggested choosing the flexible option when recreating.

How to get Firebase Cloud Messaging running on localhost

I'm trying to get a simple Firebase Cloud Messaging web app to run. The README of the sample app says:
1. On the command line run `firebase serve -p 8081` using the Firebase CLI tool to launch a local server.
That gives the error:
Error: Cannot understand what targets to deploy/serve. If you are using PowerShell make sure you place quotes around any comma-separated lists (ex: --only "functions,firestore").
So I tried
firebase serve -p 8081 .
which returns
Having trouble? Try firebase [command] --help
How can I get a local web app that receives messages running? And will similar troubles start again when hosting on Firebase?
I am stuck. Firebase feels like an endless series of weird documentation and weird results; every error has been encountered by others and there are long threads on each of them.
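One hedged thing to try, based on the error text asking for explicit targets (this assumes the sample is configured as a Hosting app in firebase.json):
firebase serve --only hosting -p 8081
If firebase.json has no hosting section yet, firebase init hosting would need to be run first.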

Airflow DAG cannot find connection ID

I am managing a Google Cloud Composer environment which runs Airflow for a data engineering team. I have recently been asked to troubleshoot one of the DAGs they run, which is failing with this error: [12:41:18,119] {credentials_utils.py:23} WARNING - [redacted-name] connection ID not available, falling back to Google default credentials
The job is basically a data pipeline which reads from various sources and stores data into GBQ. The odd part is that they have a nearly identical DAG running for a different project and it works perfectly.
I have recreated the .json credentials for the service account behind the connection, as well as the connection itself in Airflow. I have also checked the code for hidden spaces or similar issues.
My knowledge of Airflow is limited and I have not been able to find any similar issue in my research. Has anyone encountered this before?
The DE team came back to me saying it was actually a deployment issue: an internal module involved in service account authentication was being used inside another DAG running in the staging environment, which made it impossible to fetch credentials from the connection ID.
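For anyone debugging a similar warning, a minimal sketch (assuming Airflow 1.10-style imports; the connection ID is a placeholder) to verify that a connection ID actually resolves inside the environment, e.g. from a one-off PythonOperator:
from airflow.hooks.base_hook import BaseHook

# Raises an AirflowException if the connection ID is not defined in this environment,
# which quickly distinguishes a missing/misnamed connection from a bad credential.
conn = BaseHook.get_connection("my_gcp_connection")
print(conn.conn_type, conn.host, list(conn.extra_dejson.keys()))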

Google Cloud Composer (Apache Airflow) cannot access log files

I'm running a DAG in Google Cloud Composer (hosted Airflow) which runs fine in Airflow locally. All it does is print "Hello World". However, when I run it through Cloud Composer I receive the error:
*** Log file does not exist: /home/airflow/gcs/logs/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Fetching from: http://airflow-worker-d775d7cdd-tmzj9:8793/log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-d775d7cdd-tmzj9', port=8793): Max retries exceeded with url: /log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8825920160>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I've also tried making the DAG add data into a database, and it actually succeeds 50% of the time. However, it always returns this error message (and no other print statements or logs). Any help on why this might be happening is much appreciated.
We also faced the same issue, then raised a support ticket with GCP and got the following reply.
The message is related to the latency of syncing logs from Airflow workers to the web server; it takes at least a few minutes (depending on the number of objects and their size).
The total log size is not large, but it's enough to noticeably slow down synchronization, so we recommend cleaning up/archiving the logs.
Basically, we recommend relying on Stackdriver logs instead, because of the latency inherent in the design of this sync.
I hope this will help you solve the problem.
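If you go the cleanup/archive route, a hedged sketch using gsutil (bucket name, DAG ID, and archive destination are all placeholders):
gsutil -m mv gs://MY_ENV_BUCKET/logs/my_old_dag gs://MY_ENV_BUCKET/logs-archive/my_old_dag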
I have the same problem after upgrading Google Composer from 1.10.3 to 1.10.6.
I can see in my logs that Airflow is trying to get the logs from a bucket whose name ends in -tenant, while the bucket in my account ends in -bucket.
In the configuration, I can see something weird too.
## airflow.cfg
[core]
remote_base_log_folder = gs://us-east1-dada-airflow-xxxxx-bucket/logs
## also in the running configuration says
core remote_base_log_folder gs://us-east1-dada-airflow-xxxxx-tenant/logs env var
I wrote to Google support and they said the team is working on a fix.
EDIT:
I've been accessing my logs with gsutil, replacing the bucket name suffix with -bucket:
gsutil cat gs://us-east1-dada-airflow-xxxxx-bucket/logs/...../5.logs
I faced the same situation on multiple occasions.
When I looked at the log in the Airflow web UI as soon as a job finished, it gave me the same error, but when I checked the same logs in the UI a minute or two later, I could see them properly.
As per the above answers, it's a sync issue between the webserver and the worker node.
In general, the issue described here should be more of a sporadic one.
In certain situations, what can help is setting default_task_retries to a value that allows a task to be retried at least once; a per-DAG sketch follows below.
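A minimal sketch of the per-DAG equivalent, setting retries through default_args (DAG ID, dates, and the operator are placeholders; the config-level counterpart is default_task_retries under [core]):
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Allow each task one automatic retry before it is marked as failed.
default_args = {
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_retry_dag",
    default_args=default_args,
    start_date=datetime(2020, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    DummyOperator(task_id="noop")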
This issue has been resolved since at least Airflow version 1.10.10+composer.

Keystone log files always empty

I am walking through the Ubuntu walkthrough for installing OpenStack. I am past the Keystone stage and I find that it creates log files in /var/log/keystone which are always zero length. I also get the following message in response to many commands that otherwise work:
No handlers could be found for logger "keystoneclient.v2_0.client"
-- this may or may not be related. Any advice for a noob appreciated.
This is the Folsom release.
Keystone uses /etc/keystone/logging.conf to control logging levels. By default, the keystone logger will only show WARNING level messages and above:
[logger_root]
level=WARNING
handlers=file
If you change level to something lower (e.g., INFO or DEBUG) and then restart the keystone service, you should see log messages appear in /var/log/keystone/keystone.log.
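For example, a minimal sketch (assuming the stock Ubuntu packaging, where the service name is keystone): edit /etc/keystone/logging.conf so the root logger reads
[logger_root]
level=DEBUG
handlers=file
and then restart the service:
sudo service keystone restart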
