Step timed out while step is verifying the SSM Agent availability on the target instance(s). SSM Agent on Instances: [i-xxxxxxx] are not functioning - patch

I am working on "Patch an AMI and update an Auto Scaling group" and followed the AWS document to configure but I am stuck at "Task 3: Create a runbook, patch the AMI, and update the Auto Scaling group" with the below error. To fix it I have added "user data" while starting the instance(startInstances). As it's accepting only base64, converted and provided base64(UmVzdGFydC1TZXJ2aWNlIEFtYXpvblNTTUFnZW50Cg==).
I tried to execute with the below user data but both are not working, even I tried to apply a new step with the same commands but failed to patch AMI.
Tried the below script:
<powershell> powershell.exe -Command Start-Service -Name AmazonSSMAgent </powershell> <persist>true</persist>
Tried to start and restart SSM agent.
Restart-Service AmazonSSMAgent
base64: UmVzdGFydC1TZXJ2aWNlIEFtYXpvblNTTUFnZW50Cg==
YAML sample:
mainSteps:
- name: startInstances
action: 'aws:runInstances'
timeoutSeconds: 1200
maxAttempts: 1
onFailure: Abort
inputs:
ImageId: '{{ sourceAMIid }}'
InstanceType: m3.large
MinInstanceCount: 1
MaxInstanceCount: 1
SubnetId: '{{ subnetId }}'
UserData: UmVzdGFydC1TZXJ2aWNlIEFtYXpvblNTTUFnZW50Cg==
Still, I am seeing the below error.
Step timed out while step is verifying the SSM Agent availability on the target instance(s). SSM Agent on Instances: [i-xxxxxxxx] are not functioning. Please refer to Automation Service Troubleshooting Guide for more diagnosis details.
Your suggestion/solutions help me a lot. Thank you.

I have troubleshoot and fixed the issue.
The issue was security group is missing on the instance. To communicate the SendCommand API of SSM Service with SSM agent on instance needs a security group that allows HTTPS port 443. I have attached SG allowing 443 port then the SSM agent can communicate with the EC2 instance.
EC2 instance IAM role should have SSM agent full access policy attached to it.
We might get the same issue when the SSM agent is not running on the EC2 instance, for that we need to provide user-data or add a new step in YAML or JSON on systems manager Documents.
If you are working on a windows instance use the below script to start the SSM agent. If its Linux server uses Linux script/commands.
{
"schemaVersion": "2.0",
"description": "Start SSM agent on instance",
"mainSteps": [
{
"action": "aws:runPowerShellScript",
"name": "runPowerShellScript",
"inputs": {
"runCommand": [
"Start-Service AmazonSSMAgent"
]
}
}
]
}

Related

AWS credentials not found for celery-k8s deployment

I'm trying to run dagster using celery-k8s and using the examples/celery-k8s as a start. upon running the pipeline from playground I get
Initialization of resources [s3, io_manager] failed.
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I have configured aws credentials in env variables as mentioned in the document
deployments:
- name: "user-code-deployment-test"
image:
repository: "somasays/dagster-usercode-example"
tag: "0.5"
pullPolicy: Always
dagsterApiGrpcArgs:
- "-f"
- "/workspace/repo.py"
port: 3030
env:
AWS_ACCESS_KEY_ID: AAAAAAAAAAAAAAAAAAAAAAAAA
AWS_SECRET_ACCESS_KEY: qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
AWS_DEFAULT_REGION: eu-central-1
and I can also see these values are set in the env variables of the pod and can also access the s3 location after pip install awscli and aws s3 ls see the screenshot below the job pod however throws Unable to locate credentials
Please help
The deployment configuration applies to the user code servers. Meanwhile the celery executor runs your pipeline code in separate kubernetes jobs. To provide your secrets there, you will want to configure the env_secrets field of the celery-k8s executor in your pipeline run config.
See https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-k8s/dagster_k8s/job.py#L321-L327 for details on the config.

EUCA 4.4.5 VPCMIDO Instances Terminate at Launch

I have achieved a small test cloud on 3 pieces of hardware. It works fine when in EDGE mode but when I try to configure it for VPCMIDO, new instances begin to launch but then timeout and move to a terminated state. I can also see the instances' initial volume and config data appear in the NC and CC data directories. Below is my system layout and network.json.
HOST 1 : CLC/UFS/WALRUS/MIDO CLUSTER/MIDO GATEWAY/MIDOLMAN AGENT:
em1 (All Services including Mido Cluster): 10.0.0.21
em3 (Target VPCMIDO Adapter): 10.0.0.22
HOST 2 : CC/SC
em1 : 10.0.0.23
HOST 3 : NC/MIDOLMAN AGENT
em1 : 10.0.0.24
{
"Mido": {
"Gateways": [
{
"Ip": "10.0.0.22",
"ExternalDevice": "em3",
"ExternalCidr": "192.168.0.0/16",
"ExternalIp": "192.168.0.2",
"ExternalRouterIp": "192.168.0.1"
}
]
},
"Mode": "VPCMIDO",
"PublicIps": [
"10.0.100.1-10.0.100.254"
]
}
I may be misunderstanding the intent of reserving an interface just for the mido gateway. All of my eucalyptus/zookeeper/cassandra/midonet configs use the 10.0.0.21 interface and seem to communicate fine. The midonet tunnel zone reports my CLC host and NC host successfully in the tunnel zone. The only part of my config that references the interface I intend to use for the midonet gateway is the network.json. No errors were returned at any time during my config so I think I may be missing something conceptual.
You may need to start eucanetd as described here:
https://docs.eucalyptus.cloud/eucalyptus/4.4.5/index.html#install-guide/starting_euca_clc.html
The eucanetd component in vpcmido mode runs on the cloud controller and is responsible for controlling midonet.
When eucanetd is not running instances will fail to start as the required network resources will not be created.
I configured a bridge on the NC and instances were able to launch and I no longer got an error in my nc.log. Docs and the eucalyptus.conf file comments tell me I shouldn't need to do this in VPCMIDO netowrking mode: https://docs.eucalyptus.cloud/eucalyptus/4.4.5/index.html#install-guide/configuring_bridge.html
Despite all that adding the bridge fixed this issue.

Jenkins Artifactory plugin: a script in "Declarative Pipeline Syntax" – how to bypass the configured proxy?

https://www.jfrog.com/confluence/display/RTF/Working+With+Pipeline+Jobs+in+Jenkins
This article attempts to describe all the details as well for "Declarative Pipeline Syntax" as for "Scripted Pipeline Syntax".
but for "Declarative …" it does not describe how to bypass the proxy whereas for "Scripted …" it does: "server.bypassProxy = true".
So how would I specify bypassing the proxy in a Jenkins pipeline script with "Declarative Pipeline Syntax"?
In rtServer add bypassProxy: true:
rtServer (
id: "Artifactory-1",
url: "http://my-artifactory-domain/artifactory",
username: "user",
password: "password",
bypassProxy: true
)

After changing paswords in vault.yml, deployment fails in trellis

I had a wordpress site setup using Trellis. Initially I had set up the server and deployed without encrypting the vault.yml.
Once everything was working fine I changed the passwords in vault.yml and encrypted the file. But my deployment fails now.
And I get the following error-
TASK [deploy : WordPress Installed?]
**************************
System info:
Ansible 2.6.3; Darwin
Trellis version (per changelog): "Allow customizing Nginx `worker_connections`"
---------------------------------------------------
non-zero return code
Error: Error establishing a database connection. This either means that
the username and password information in your `wp-config.php` file is
incorrect or we can’t contact the database server at `localhost`. This
could mean your host’s database server is down.
fatal: [mysite.org]: FAILED! => {"changed": false,
"cmd": ["wp", "core", "is-installed", "--skip-plugins", "--skip-
themes", "--require=/srv/www/mysite.org/shared/tmp_multisite_constants.php"], "delta":
"0:00:00.224955", "end": "2019-01-04 16:59:01.531111",
"failed_when_result": true, "rc": 1, "start": "2019-01-04
16:59:01.306156", "stderr_lines": ["Error: Error establishing a
database connection. This either means that the username and password
information in your `wp-config.php` file is incorrect or we can’t
contact the database server at `localhost`. This could mean your host’s
database server is down."], "stdout": "", "stdout_lines": []}
to retry, use: --limit
#/Users/praneethavelamuri/Desktop/path/to/my/project/trellis/deploy.retry
Is there any step I missed? I followed these steps-
ansible-playbook server.yml -e env=staging
./bin/deploy.sh staging mysite.org
change passwords in staging/vault.yml
set vault password
inform ansible about password
encrypt the file
commit the file and push the repo
re deploy and then I get the error!
I got it solved. I have changed the sudo user password too in my vault. so ssh into server and changing sudo password to the password mentioned in vault and then provisioning it and then deploying solved the issue.

Why does Meteor Up (MUP) fail on authentication?

I am currently trying to deploy a Meteor project to an external server for the first time. The server is hosted by DigitalOcean, running ubuntu 16.04, and has an SSH key set up for password-free access.
The error I am getting from MUP is:
[159.203.165.13] - Setup Docker
events.js:165
throw er; // Unhandled 'error' event
^
Error: All configured authentication methods failed
at tryNextAuth (/usr/lib/node_modules/mup/node_modules/nodemiral/node_modules/ssh2/lib/client.js:290:17)
at SSH2Stream.onUSERAUTH_FAILURE (/usr/lib/node_modules/mup/node_modules/nodemiral/node_modules/ssh2/lib/client.js:469:5)
at SSH2Stream.emit (events.js:180:13)
at parsePacket (/usr/lib/node_modules/mup/node_modules/ssh2-streams/lib/ssh.js:3647:10)
at SSH2Stream._transform (/usr/lib/node_modules/mup/node_modules/ssh2-streams/lib/ssh.js:551:13)
at SSH2Stream.Transform._read (_stream_transform.js:185:10)
at SSH2Stream._read (/usr/lib/node_modules/mup/node_modules/ssh2-streams/lib/ssh.js:212:15)
at SSH2Stream.Transform._write (_stream_transform.js:173:12)
at doWrite (_stream_writable.js:410:12)
at writeOrBuffer (_stream_writable.js:396:5)
at SSH2Stream.Writable.write (_stream_writable.js:294:11)
at Socket.ondata (_stream_readable.js:651:20)
at Socket.emit (events.js:180:13)
at addChunk (_stream_readable.js:274:12)
at readableAddChunk (_stream_readable.js:261:11)
at Socket.Readable.push (_stream_readable.js:218:10)
Emitted 'error' event at:
at tryNextAuth (/usr/lib/node_modules/mup/node_modules/nodemiral/node_modules/ssh2/lib/client.js:292:12)
at SSH2Stream.onUSERAUTH_FAILURE (/usr/lib/node_modules/mup/node_modules/nodemiral/node_modules/ssh2/lib/client.js:469:5)
[... lines matching original stack trace ...]
at Socket.Readable.push (_stream_readable.js:218:10)
At this point I have tried several solutions involving the mup file as per other recommendations such as:
1) Adding in a password - Gives the exact same error as though the change didn't occur.
2) Adding in the same SSH key that I use for authentication to the server as per digital ocean - Says 'privateKey value does not contain a (valid) private key'. I have tried both the key that is used for authentication to the server and every other key I could find short of generating a new one just for Meteor's use.
3) Leaving both blank and allowing it to 'try' ssh-agent - pretends it doesn't know what ssh-agent is and throws an error saying the same thing as when I use a password.
I have looked through and followed the same instructions in the following article: http://meteortips.com/deployment-tutorial/digitalocean-part-1/
This article assumes that there are only two possible states. One being that an ssh key has NOT been used or set up so it needs to be generated. The second being that an ssh key exists and is set up exactly where they expect it. Unfortunately I seem to be in a different situation. I generated a key using putty prior to setting up the D.O server and created the droplet using that. After creation, the file did not exist. The only thing in the ~/.ssh/ directory was a single file named "authorized_keys" that held the key I would use to connect to the server. This file cannot be used, nor any file on the server in the other ssh key locations.I also tried copying over the file directly onto the server to no avail as well.
In some vain hope at finding a solution I also tried running these same commands in both the Meteor build bundle an the source code folder. Neither worked. I should mention that although this is the only article I still have open to try for a solution, I have tried every one I could find using MUP.
If anyone can point me in the right direction with this so I can stop flailing wildly in the dark I would be incredibly grateful.
Edit: As requested, below is the current mup.js file with removed credentials
module.exports = {
servers: {
one: {
// TODO: set host address, username, and authentication method
host: '111.111.111.11',
username: 'root',
// ssh-agent: '/home/Meteor/MeteorKey.pem'
pem: '~/.ssh/id_rsa.pub'
// password: 'password1'
// or neither for authenticate from ssh-agent
}
},
app: {
// TODO: change app name and path
name: 'app-name',
path: '../',
servers: {
one: {},
},
buildOptions: {
serverOnly: true,
},
env: {
// TODO: Change to your app's url
// If you are using ssl, it needs to start with https://
ROOT_URL: 'http://www.app-name.com',
MONGO_URL: 'mongodb://mongodb/meteor',
MONGO_OPLOG_URL: 'mongodb://mongodb/local',
},
docker: {
// change to 'abernix/meteord:base' if your app is using Meteor 1.4 - 1.5
image: 'abernix/meteord:node-8.4.0-base',
},
// Show progress bar while uploading bundle to server
// You might need to disable it on CI servers
enableUploadProgressBar: true
},
mongo: {
version: '3.4.1',
servers: {
one: {}
}
},
// (Optional)
// Use the proxy to setup ssl or to route requests to the correct
// app when there are several apps
// proxy: {
// domains: 'mywebsite.com,www.mywebsite.com',
The error message you are receiving:
Error: All configured authentication methods failed
Means that the SSH connection is failing. So the credentials you are using (pity you removed them from the config) are not working. Try using a command line ssh using these same credentials, and then trouble shoot that - once you can ssh into the server, then mup should be able to do it's work.
You can get more information out of ssh by specifying one or more -v parameters, eg:
ssh -v -v my_user#remote.com
and it will give you information about the authentication methods it is trying as it goes through them. This will help you narrow down the problem.

Resources