~/.ssh/id_rsa.pub not found error while installing Capistrano via Ansible playbook - WordPress

I am trying to install https://github.com/roots/bedrock-ansible to get a Bedrock deployment (http://roots.io/wordpress-stack/) running.
When I run "vagrant up", after some time I get the error:
TASK: [capistrano-setup | Setup deploy group] *********************************
skipping: [default]
TASK: [capistrano-setup | Setup deploy user] **********************************
skipping: [default]
TASK: [capistrano-setup | Adding public key to server] ************************
fatal: [default] => could not locate file in lookup: ~/.ssh/id_rsa.pub
FATAL: all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @/Users/johannes/site.retry
default : ok=46 changed=16 unreachable=1 failed=0
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
I do not have a clue how I can fix this. Do you have an idea?

It seems the role is trying to find your local public key. It should be in the location shown in the error message, '~/.ssh/id_rsa.pub', but it's not. So either you don't have one, or you keep it in another location.
If you're not familiar with generating SSH keys, you probably don't have one. I personally like the GitHub help page for this: https://help.github.com/articles/generating-ssh-keys/
(you only have to perform steps 1 and 2).
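If you prefer doing it from the terminal, a minimal sketch of those two steps looks like this (the email is only a comment/label; the default path is exactly what the role expects):
ssh-keygen -t rsa -b 4096 -C "you@example.com"
# accept the default location at the prompt; this creates ~/.ssh/id_rsa and the
# public half ~/.ssh/id_rsa.pub that the lookup in the error message is after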
If you do have SSH keys, but in a different location, the capistrano-install role in bedrock uses some variables:
deploy_user: deploy
deploy_keys:
- "~/.ssh/id_rsa.pub"
So you can set (multiple) public key files in the deploy_keys list and they will be added to the deploy_user's authorized keys.
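For example, a sketch of overriding those defaults (the exact vars file depends on how your playbook is laid out; group_vars/all is only an assumption here):
# group_vars/all (sketch)
deploy_user: deploy
deploy_keys:
  - "~/.ssh/id_rsa.pub"
  - "~/.ssh/another_key.pub"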
All this is needed because Capistrano will use the deploy user to connect to the remote server later; see http://blakesmith.me/2010/02/08/understanding-public-key-private-key-concepts.html for background on public/private key concepts.

Related

SELinux and cryptsetup: chown failed and can't access temporary keystore

I am trying to set up SELinux and an encrypted additional partition that I mount at startup using a systemd service.
If I run SELinux in permissive mode, everything runs ok (partition is correctly mounted, data can be accessed and service runs properly).
If I run SELinux in enforcing mode (enforcing=1), I am not able to mount the partition, and I get the error:
/dev/mapper/temporary-cryptsetup-1808: chown failed: Permission denied
sh[1777]: Failed to open temporary keystore device.
sh[1777]: Command failed with code 5: Input/output error
Any ideas to fix that?
Audit2allow does not return any additional rules to be added
Solved by assigning the lvm_exec_t context to cryptsetup.
In the lvm.fc file, cryptsetup was defined as /bin/cryptsetup, but I had to change it to /usr/sbin/cryptsetup, where the binary actually is.
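If you would rather not edit lvm.fc and rebuild the policy module, a roughly equivalent sketch using the standard SELinux tools (assuming the binary really lives at /usr/sbin/cryptsetup on your system) would be:
# add a file-context rule for the actual path and apply it to the file on disk
semanage fcontext -a -t lvm_exec_t '/usr/sbin/cryptsetup'
restorecon -v /usr/sbin/cryptsetup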

Ansible Ad-Hoc command with ssh keys

I would like to set up Ansible on my Mac. I've done something similar in GNS3 and it worked, but here there are more factors I need to take into account. So I have Ansible installed. I added hostnames in /etc/hosts and I can ping using the hostnames I provided there.
I have created an ansible folder which I am going to use and put ansible.cfg inside:
[defaults]
hostfile = ./hosts
host_key_checking = false
timeout = 5
inventory = ./hosts
In the same folder I have hosts file:
[tp-lab]
lab-acc0
When I try to run the following command: ansible tx-edge-acc0 -m ping
I am getting the following errors:
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
[WARNING]: Unhandled error in Python interpreter discovery for host tx-edge-acc0: unexpected output from Python interpreter discovery
[WARNING]: sftp transfer mechanism failed on [tx-edge-acc0]. Use ANSIBLE_DEBUG=1 to see detailed information
[WARNING]: scp transfer mechanism failed on [tx-edge-acc0]. Use ANSIBLE_DEBUG=1 to see detailed information
[WARNING]: Platform unknown on host tx-edge-acc0 is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change the meaning of that path. See
https://docs.ansible.com/ansible/2.10/reference_appendices/interpreter_discovery.html for more information.
tx-edge-acc0 | FAILED! => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python"
},
"changed": false,
"module_stderr": "Shared connection to tx-edge-acc0 closed.\r\n",
"module_stdout": "\r\nerror: unknown command: /bin/sh\r\n",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 0
Any idea what might be the problem here? Much appreciated.
At first glance it seems that your Ansible controller does not load the configuration file (especially ansible.cfg) when the playbook is fired.
(From the documentation) Ansible searches for configuration files in the following order, processing the first file it finds and ignoring the rest:
$ANSIBLE_CONFIG if the environment variable is set.
ansible.cfg if it’s in the current directory.
~/.ansible.cfg if it’s in the user’s home directory.
/etc/ansible/ansible.cfg, the default config file.
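A quick way to verify which file is actually being picked up (the exported path below is just an example):
$ export ANSIBLE_CONFIG=/home/ansible/ansible.cfg   # force a specific config file
$ ansible --version                                 # the "config file = ..." line shows what was loaded
$ ansible-config dump --only-changed                # lists only the settings that differ from the defaults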
Edit: for peace of mind, it is good to use full paths.
EDIT: based on the comments:
$ cat /home/ansible/ansible.cfg
[defaults]
host_key_checking = False
inventory = /home/ansible/hosts # <-- use full path to inventory file
$ cat /home/ansible/hosts
[servers]
server-a
server-b
Command & output:
# Supplying inventory host group!
$ ansible servers -m ping
server-a | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python"
},
"changed": false,
"ping": "pong"
}
server-b | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python"
},
"changed": false,
"ping": "pong"
}

Euca 5.0 Ansible Console Task Failing

Background:
I am only able to get past the Ansible console install/config tasks by adding --region localhost in /usr/share/eucalyptus-ansible/roles/cloud-post/tasks/console.yml wherever it calls tools that take that argument.
Otherwise each subtask fails like this: ["euca-describe-images: error: connection error (('Connection aborted.', gaierror(-2, 'Name or service not known')))"]
Running the commands from that playbook directly on the euca server being configured gives the same result unless I specify --region localhost.
Problem:
I'm stuck here: [cloud-post : update console route53 system domain for eucalyptus-cloud authentication]
Error: "euform-update-stack: error (ValidationError): No updates are to be performed.", "stderr_lines": ["euform-update-stack: error (ValidationError): No updates are to be performed."]
All services are running except the ImagingBackend, which is Not Ready.
No instances are running according to euca-describe-instances.
Images are available:
IMAGE ami-5be483c81cf8bd65c eucalyptus-console-image-5-0-823/eucalyptus-console-image-5-0-823.raw.manifest.xml 000216594841 available private x86_64 machine instance-store hvm
TAG image ami-5be483c81cf8bd65c type eucalyptus-console-image
TAG image ami-5be483c81cf8bd65c version 5.0.823
IMAGE ami-f31092ddb73e29af9 eucalyptus-service-image-v5.0.100/eucalyptus-service-image.raw.manifest.xml 000216594841 available private x86_64 machine instance-store hvm
TAG image ami-f31092ddb73e29af9 provides imaging,loadbalancing
TAG image ami-f31092ddb73e29af9 type eucalyptus-service-image
TAG image ami-f31092ddb73e29af9 version 5.0.100
---
all:
  hosts:
    exp-euca.lan.com:
    exp-enc-[01:02].lan.com:
  vars:
    vpcmido_public_ip_range: "192.168.100.5-192.168.100.254"
    vpcmido_public_ip_cidr: "192.168.100.1/24"
    cloud_system_dns_dnsdomain: "cloud.lan.com"
    cloud_public_port: 443
    eucalyptus_console_cloud_deploy: yes
    cloud_service_image_rpm: no
    cloud_properties:
      services.imaging.worker.ntp_server: "x.x.x.x"
      services.loadbalancing.worker.ntp_server: "x.x.x.x"
  children:
    cloud:
      hosts:
        exp-euca.lan.com:
    console:
      hosts:
        exp-euca.lan.com:
    node:
      hosts:
        exp-enc-[01:02].lan.com:
EDIT:
Solved. Details are in the comments of the marked answer.
The name error most likely means that DNS for the domain cloud.lan.com is not being correctly delegated to your deployment. To test this, check if the nameserver is found:
dig +short NS cloud.lan.com
You should see "ns1.cloud.lan.com" and should then be able to use that nameserver to resolve services, e.g.
dig +short ec2.cloud.lan.com @ns1.cloud.lan.com
which should return the IP of the host for the compute service.
The second item is a bug in the ansible playbook that occurs when the stack is already present and up to date. To work around it, you can either update your playbook or delete the stack before running the playbook. Depending on how far the playbook progressed you may have a script to do this:
/usr/local/bin/console-manage-stack -a delete
The related playbook change is https://github.com/AppScale/ats-deploy/pull/36

Error while mounting EFS to an instance in my Elastic Beanstalk environment

I followed this procedure for attaching an EFS file system to instances created using EB:
https://aws.amazon.com/premiumsupport/knowledge-center/elastic-beanstalk-mount-efs-volumes/#:~:text=In%20an%20Elastic%20Beanstalk%20environment,scale%20up%20to%20multiple%20instances.
But the Elastic Beanstalk logs show the following error:
[Instance: i-06593*****] Command failed on instance. Return code: 1 Output: (TRUNCATED)...fs ... mount -t efs -o tls fs-d9****:/ /efs Failed to resolve "fs-d9****.efs.us-east-1.amazonaws.com" - check that your file system ID is correct. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail. ERROR: Mount command failed!. command 01_mount in .ebextensions/storage-efs-mountfilesystem.config failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
I just used **** in the EFS ID for security.
Based on the comments.
The solution was to create a new EFS filesystem instead of using the original one.
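For reference, the mount helper builds the DNS name from whatever file system ID the environment is given, so that ID must belong to an EFS file system in the same region and VPC as the instances. A sketch of how the ID is usually passed in through .ebextensions (the option names follow the AWS sample config and are an assumption for your setup):
# .ebextensions/storage-efs-mountfilesystem.config (sketch, not the full AWS sample)
option_settings:
  aws:elasticbeanstalk:application:environment:
    FILE_SYSTEM_ID: fs-xxxxxxxx   # must be an existing EFS file system reachable from the environment
    MOUNT_DIRECTORY: /efs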

Task fails due to not being able to read log file

Composer is failing a task because it is not able to read a log file; it complains about incorrect encoding.
Here's the log that appears in the UI:
*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)
*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I tried viewing the file in the Google Cloud console and it also throws an error:
Failed to load
Tracking Number: 8075820889980640204
But I am able to download the file via gsutil.
When I view the file, it seems to have text overriding other text.
I can't show the entire file but it looks like this:
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
The #-#{} pieces seem to be "on top of" the typical log.
I faced the same problem. In my case the problem was that I had removed the google_cloud_default connection that was being used to retrieve the logs.
Check the configuration and look for the connection name.
[core]
remote_log_conn_id = google_cloud_default
Then check that the credentials used for that connection have the right permissions to access the GCS bucket.
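A rough way to check both from the command line (the airflow subcommand differs between 1.10 and 2.x, and the bucket name is a placeholder):
$ airflow connections get google_cloud_default   # Airflow 2.x; Airflow 1.10 uses "airflow connections --list"
$ gsutil ls gs://bucket/logs/                    # run with the same service account credentials to confirm read access to the log bucket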
I'm having a similar problem with viewing logs in GCP Cloud Composer. It doesn't appear to be preventing the failing DAG task from running, though. What it looks like is a permissions error between GKE and the storage bucket where the log files are kept.
You can still view the logs by going into your cluster's storage bucket, in the same directory as your /dags folder, where you should also see a logs/ folder.
Your Helm chart should set up a global env:
- name: AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
  value: "google-cloud-platform://"
Then you should deploy a Dockerfile with the root account only (not the airflow account); additionally, set up your Helm uid and gid as:
uid: 50000 #airflow user
gid: 50000 #airflow group
Then upgrade the Helm chart with the new config.
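For example (release name, chart reference, and values file are placeholders for your deployment):
$ helm upgrade my-airflow apache-airflow/airflow -f values.yaml -n airflow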
*** Unable to read remote log from gs://bucket
1) I found the solution after assigning the required roles to the service account.
2) The SA key (JSON or txt) has to be added and configured for the connection set in remote_log_conn_id = google_cloud_default.
3) Restart the Airflow scheduler and webserver.
4) Restart the DAGs in Airflow.
You can then find the logs in the GCS bucket where remote logging is configured.
