AWS CodeDeploy - Script at specified location: scripts/validate_service.sh failed with exit code 1

My deployments fail on the last step, ValidateService, with this error message:
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.
My validate_service.sh contains:
#!/bin/bash
# verify we can access our webpage successfully
curl -v --silent localhost:80 2>&1 | grep Welcome
Can someone advise what I should change?

The script's return value is what matters. Yours looks good to me; I just added a couple of seconds to wait until the application starts up.
If you run your script with bash -x and use a pipeline of commands, you should also add set -o pipefail so that the whole pipeline fails when one of its commands fails.
Check out my script:
#!/bin/bash
sleep 5
curl http://localhost:3009 | grep Welcome
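If the application needs a variable amount of time to come up, a retry loop with an explicit exit status is a bit more robust than a fixed sleep, since CodeDeploy only looks at the script's exit code. A minimal sketch, assuming the page on port 80 contains "Welcome" as in the question (the retry count and delay are arbitrary):
#!/bin/bash
# Poll the page a few times before declaring the ValidateService hook failed.
for attempt in {1..10}; do
    if curl --silent --fail http://localhost:80 | grep -q Welcome; then
        echo "Service is up"
        exit 0
    fi
    echo "Attempt $attempt: service not ready yet, retrying in 3s..."
    sleep 3
done
echo "Service did not become healthy in time" >&2
exit 1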

Related

How to avoid duplicating workers with Symfony local server?

I found out that I have quite a lot of Symfony local web server workers registered (around 35), and the number keeps growing. I usually just start the server with symfony serve and then kill it (Ctrl + \) when it is no longer needed. Apparently killing it leaves a worker behind, as seen in symfony server:status. Running symfony serve again just creates a new worker.
symfony server:status output:
Local Web Server
Not Running
Workers
PID 6327: /usr/bin/php7.2 -S 127.0.0.1:43653 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
PID 24596: /usr/bin/php7.2 -S 127.0.0.1:37789 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
PID 6575: /usr/bin/php7.4 -S 127.0.0.1:42505 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
PID 41550: /usr/bin/php7.4 -S 127.0.0.1:36313 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
...
Environment Variables
None
So my questions regarding this:
#1: is it possible to quickly kill the server? I assume symfony server:stop is the more correct way, but that requires an additional console window and entering the command.
#2: how do I kill those workers that are registered from previous sessions? Trying e.g. kill 6327 says that there's no such process. They're also not gone after a system restart.
Those extra workers bother me because the server log output in the console is duplicated for each one of them, so right now each request to the server produces around 3k lines of log output in the console, which makes it pretty useless.
I have the same problem after upgrading to Symfony CLI version v4.19.0...
My (very) bad workaround:
rm /home/myusername/.symfony/var/83247c3521c3ac3990bf3f823ef473db0a9445e1/*
Edit: this answer is not accurate, as hinted at by @CrSrr's answer above.
The symfony command adds data to both the ./log and ./var directories. Deleting entries in only one of those does not remove the appearance of non-existent workers in the project directory. I was fooled by checking status in a directory where the server:start had never been run.
A bug report is on file with symfony here.
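Building on that, a blunt cleanup sketch, assuming the Symfony CLI keeps its state under ~/.symfony (var/ for worker state, log/ for logs) as in the paths shown above; newer CLI versions may use a different directory, and you should only run this while no server is supposed to be running:
#!/bin/bash
# Ask the CLI to stop anything it still knows about, then wipe stale state.
symfony server:stop || true
# Worker/run state and per-project logs live under ~/.symfony.
rm -rf ~/.symfony/var/* ~/.symfony/log/*
# The stale workers should now be gone.
symfony server:status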
I just faced a similar issue. The PIDs were nowhere to be found.
PS G:\workspace\joined> symfony server:status
Local Web Server
Not Running
Workers
PID 7732: C:\php\php-cgi.exe -b 63801 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
PID 19324: C:\php\php-cgi.exe -b 62927 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
PID 17968: C:\php\php-cgi.exe -b 50197 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
PID 14040: C:\php\php-cgi.exe -b 55075 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
Environment Variables
None
On Windows, the log files are kept in %USERPROFILE%\.symfony. There's most likely a similar location in your home directory on other systems. Deleting all the contents of that directory allowed a new Windows Terminal app to show:
PS G:\workspace> symfony server:status
Local Web Server
Not Running
Workers
No Workers
Environment Variables
None
Run symfony server:stop to stop the server, then run symfony serve again to start it. Good luck.

How to clear failing DAGs using the CLI in Airflow

I have some failing DAGs, let's say from 1st Feb to 20th Feb. From that date onward, all of them succeeded.
I tried to use the CLI (instead of doing it twenty times in the web UI):
airflow clear -f -t * my_dags.my_dag_id
But I have a weird error:
airflow: error: unrecognized arguments: airflow-webserver.pid airflow.cfg airflow_variables.json my_dags.my_dag_id
EDIT 1:
As @tobi6 explained, the * was indeed causing trouble.
Knowing that, I tried this command instead:
airflow clear -u -d -f -t ".*" my_dags.my_dag_id
but it only returns failed task instances (the -f flag). The -d and -u flags don't seem to work, because task instances downstream and upstream of the failed ones are ignored (not returned).
EDIT 2:
As @tobi6 suggested, using -s and -e makes it possible to select all DAG runs within a date range. Here is the command:
airflow clear -s "2018-04-01 00:00:00" -e "2018-04-01 00:00:00" my_dags.my_dag_id
However, adding the -f flag to the command above only returns failed task instances. Is it possible to select all failed task instances of all failed DAG runs within a date range?
If you use an asterisk * in the Linux bash, the shell will automatically expand it.
That means it will replace the asterisk with all files in the current working directory and then execute your command.
Quoting the pattern helps to avoid the automatic expansion:
airflow clear -f -t "*" my_dags.my_dag_id
One solution I've found so far is executing SQL directly (MySQL in my case):
update task_instance t left join dag_run d on d.dag_id = t.dag_id and d.execution_date = t.execution_date
set t.state = null,
    d.state = 'running'
where t.dag_id = '<your_dag_id>'
  and t.execution_date > '2020-08-07 23:00:00'
  and d.state = 'failed';
It will clear all task states on failed dag_runs, as if the 'Clear' button had been pressed for the entire DAG run in the web UI.
In Airflow 2.2.4 the airflow clear command was deprecated.
You can now run:
airflow tasks clear -s <your_start_date> -e <end_date> <dag_id>
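To also cover the date-range part of the question with the newer syntax, something like the following should work; this is a sketch assuming Airflow 2.x, where the old -f flag is spelled --only-failed, and the dates and DAG id are placeholders taken from the question:
# Clear only the failed task instances of my_dags.my_dag_id between 1st and 20th Feb.
airflow tasks clear --only-failed \
    -s "2018-02-01" -e "2018-02-20" \
    my_dags.my_dag_id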

paramiko and nohup ''

OK, so I have Paramiko v2.2.1 and I am trying to log in to a machine and restart a service. Inside the service script, a process is basically started via nohup. However, if I allow Paramiko to disconnect as soon as it is done, the started process terminates with a PIPE signal when it writes to stdout.
If I start the service by ssh'ing into the box and starting it manually, there is no issue and it runs fine in the background. Also, if I add a long sleep (10 seconds) before disconnecting (closing) Paramiko, it also seems to work just fine.
The service is started from an init.d script via a line like this:
env LD_LIBRARY_PATH=$bin_path nohup $bin_path/ServerLoop.sh \
"$bin_path/Service service args" "$@" &
Where ServerLoop.sh simply calls the service forever in a loop like this so it will never die:
SERVER=$1
shift
ARGS=$@
logger $ARGS
while [ 1 ]; do
$SERVER $ARGS
STATUS=$?
logger "$SERVER terminated with exit code: $STATUS. Server has been restarted"
sleep 1
done
I have noticed that when I start the service by ssh'ing into the box, I get a nohup.out file written to the root. However, when I run it through Paramiko, no nohup.out is written anywhere on the system. I.e. this is after I manually ssh into the box and start the service:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
/nohup.out
And this is after I run through paramiko:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
As I understand it, nohup only redirects the output to nohup.out "if standard output is a terminal" (from the manual); otherwise it assumes the output is already going to a file, so it does not redirect. Hence I tried the following:
In [43]: import paramiko
In [44]: paramiko.__version__
Out[44]: '2.2.1'
In [45]: ssh = paramiko.SSHClient()
In [46]: ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
In [47]: ssh.connect(ip, username='root', password=not_for_so_sorry, look_for_keys=False, allow_agent=False)
In [48]: stdin, stdout, stderr = ssh.exec_command("tty")
In [49]: stdout.read()
Out[49]: 'not a tty\n'
So I am thinking that nohup is not redirecting to nohup.out when I run the command through Paramiko because tty is not returning a terminal. I don't know why adding a sleep(10) would fix this, though, as the service is quite verbose when run on the command line.
I have also noticed that if the service is started from a manual ssh session, its tty in the ps ax output is still set to the ssh tty; however, if the process is started by Paramiko, its tty in the ps ax output is set to "?". Since both processes are run through nohup, I would have expected this to be the same.
If the problem is that nohup is indeed not redirecting the output to nohup.out because of the missing tty, is there a way to force this to happen, or a better way to run this sort of command via Paramiko?
Thanks all, any help with this would be great :)
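No answer is quoted here, but a common workaround is to make sure the remote command's standard streams don't point at the SSH channel at all, so nothing receives a SIGPIPE when Paramiko tears the channel down, and nohup's "is stdout a terminal?" check stops mattering. A sketch of the command string you would pass to exec_command (the service path and log file are just placeholders):
# Redirect stdin/stdout/stderr away from the SSH channel so the process
# survives the channel being closed by Paramiko.
nohup /etc/init.d/myservice start > /tmp/myservice.log 2>&1 < /dev/null &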

Run multiple instances of RStudio in a web browser

I have RStudio Server installed on a remote AWS server (Ubuntu) and want to run several projects at the same time (one of which takes a lot of time to finish). On Windows there is a simple GUI solution like 'Open Project in New Window'. Is there something similar for RStudio Server?
Simple question, but I failed to find a solution, except for this related question for Macs, which offers:
Run multiple rstudio sessions using projects
but how?
While running batch scripts is certainly a good option, it's not the only solution. Sometimes you may still want interactive use in different sessions rather than having to do everything as batch scripts.
Nothing stops you from running multiple instances of RStudio Server on your Ubuntu server on different ports. (I find this particularly easy to do by launching RStudio through docker, as outlined here.) Because an instance will keep running even when you close the browser window, you can easily launch several instances and switch between them. You'll just have to log in again when you switch.
Unfortunately, RStudio Server still prevents you from having multiple instances open in the browser at the same time (see the help forum). This is not a big issue, as you just have to log in again, but you can work around it by using different browsers.
EDIT: Multiple instances are fine, as long as they are not on the same browser, same browser user AND same IP address; e.g. a session on 127.0.0.1 and another on 0.0.0.0 would be fine. More importantly, the instances keep on running even if they are not 'open', so this really isn't a problem. The only thing to note is that you would have to log back in to access the instance.
As for projects, you'll see you can switch between projects using the 'projects' button on the top right, but while this preserves your other sessions, I do not think it actually supports simultaneous code execution. You need multiple instances of the R environment running to actually do that.
UPDATE 2020: Okay, it's now 2020 and there are lots of ways to do this.
For running scripts or functions in a new R environment, check out:
the callr package
The RStudio jobs panel
Run new R sessions or scripts from one or more terminal sessions in the RStudio terminal panel
Log out and log in to the RStudio Server as a different user (this requires multiple users to be set up in the container; obviously not a good workflow for a single user, but just noting that many different users can access the same RStudio Server instance without problems).
Of course, spinning up multiple docker sessions on different ports is still a good option as well. Note that many of the ways listed above still do not allow you to restart the main R session, which prevents you from reloading installed packages, switching between projects, etc., which is clearly not ideal. I think it would be fantastic if switching between projects in an RStudio (Server) session allowed jobs in the previously active project to keep running in the background, but I have no idea whether that's in the cards for the open source version.
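For the docker route mentioned above, a minimal sketch of two independent RStudio Server instances on different ports, assuming the rocker/rstudio image (the container names and password are placeholders):
# Each container has its own R session; map them to different host ports.
docker run -d --name rstudio1 -p 8787:8787 -e PASSWORD=changeme rocker/rstudio
docker run -d --name rstudio2 -p 8788:8787 -e PASSWORD=changeme rocker/rstudio
# Then browse to http://<server>:8787 and http://<server>:8788 and log in as user 'rstudio'.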
Often you don't need several instances of RStudio. In that case, just save your code in an .R file and launch it from the Ubuntu command line (maybe using screen):
Rscript script.R
That will launch a separate R session which does the work without freezing your RStudio. You can pass arguments too, for example:
# script.R -
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
start = '2015-08-01'
} else {
start = args[1]
}
In the console:
Rscript script.R 2015-11-01
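If the batch run should keep going after you log out of the server, you can wrap the same call in nohup or screen; a sketch (the log file name and session name are placeholders):
# Run detached so it survives closing the SSH session; output goes to a log file.
nohup Rscript script.R 2015-11-01 > script.log 2>&1 &
# Or keep it in a named screen session:
screen -S longjob
Rscript script.R 2015-11-01
# Detach with Ctrl-a d, reattach later with: screen -r longjob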
I think you need RStudio Server Pro to be able to log in with multiple users/sessions.
You can see the comparison table below for reference.
https://www.rstudio.com/products/rstudio-server-pro/
Installing another instance of RStudio Server is less than ideal.
Linux server admins, fear not. You just need root access or a kind admin.
Create a group to use: groupadd Rwarrior
Create an additional user with the same home directory as your primary RStudio login:
useradd -d /home/user1 user2
Add primary and new user into Rwarrior group:
gpasswd -a user2 Rwarrior
gpasswd -a user1 Rwarrior
Take care of the permissions for your primary home directory:
cd /home
chown -R user1:Rwarrior /home/user1
chmod -R 770 /home/user1
chmod g+s /home/user1
Set password for the new user:
passwd user2
Open a new browser window in incognito/private browsing mode and log in to RStudio with the new user you created. Enjoy.
I run multiple RStudio servers by isolating them in Singularity instances. Download the Singularity image with the command singularity pull shub://nickjer/singularity-rstudio
I use two scripts:
run-rserver.sh:
Find a free port
#!/bin/env bash
set -ue
thisdir="$(dirname "${BASH_SOURCE[0]}")"
# Return 0 if the port $1 is free, else return 1
is_port_free(){
port="$1"
set +e
netstat -an |
grep --color=none "^tcp.*LISTEN\s*$" | \
awk '{gsub("^.*:","",$4);print $4}' | \
grep -q "^$port\$"
r="$?"
set -e
if [ "$r" = 0 ]; then return 1; else return 0; fi
}
# Find a free port
find_free_port(){
local lower_port="$1"
local upper_port="$2"
for ((port=lower_port; port <= upper_port; port++)); do
if is_port_free "$port"; then r=free; else r=used; fi
if [ "$r" = "used" -a "$port" = "$upper_port" ]; then
echo "Ports $lower_port to $upper_port are all in use" >&2
exit 1
fi
if [ "$r" = "free" ]; then break; fi
done
echo $port
}
port=$(find_free_port 8080 8200)
echo "Access RStudio Server on http://localhost:$port" >&2
"$thisdir/cexec" \
rserver \
--www-address 127.0.0.1 \
--www-port $port
cexec:
Create a dedicated config directory for each instance
Create a dedicated temporary directory for each instance
Use the singularity instance mechanism to avoid that forked R sessions are adopted by PID 1 and stay around after the rserver has shut down. Instead, they become children of the Singularity instance and are killed when that shuts down.
Map the current directory to the directory /data inside the container and set that as the home folder (this step might not be necessary if you don't care about reproducible paths on every machine)
#!/usr/bin/env bash
# Execute a command in the container
set -ue
if [ "${1-}" = "--help" ]; then
cat <<'EOF'
Usage: cexec command [args...]
Execute `command` in the container. This script starts the Singularity
container and executes the given command therein. The project root is mapped
to the folder `/data` inside the container. Moreover, a temporary directory
is provided at `/tmp` that is removed after the end of the script.
EOF
exit 0
fi
thisdir="$(dirname "${BASH_SOURCE[0]}")"
container="rserver_200403.sif"
# Create a temporary directory
tmpdir="$(mktemp -d -t cexec-XXXXXXXX)"
# We delete this directory afterwards, so its important that $tmpdir
# really has the path to an empty, temporary dir, and nothing else!
# (for example empty string or home dir)
if [[ ! "$tmpdir" || ! -d "$tmpdir" ]]; then
echo "Error: Could not create temp dir $tmpdir"
exit 1
fi
# check if temp dir is empty (this might be superfluous, see
# https://codereview.stackexchange.com/questions/238439)
tmpcontent="$(ls -A "$tmpdir")"
if [ ! -z "$tmpcontent" ]; then
echo "Error: Temp dir '$tmpdir' is not empty"
exit 1
fi
# Start Singularity instance
instancename="$(basename "$tmpdir")"
# Maybe also superfluous (like above)
rundir="$(readlink -f "$thisdir/.run/$instancename")"
if [ -e "$rundir" ]; then
echo "Error: Runtime directory '$rundir' exists already!" >&2
exit 1
fi
mkdir -p "$rundir"
singularity instance start \
--contain \
-W "$tmpdir" \
-H "$thisdir:/data" \
-B "$rundir:/data/.rstudio" \
-B "$thisdir/.rstudio/monitored/user-settings:/data/.rstudio/monitored/user-settings" \
"$container" \
"$instancename"
# Delete the temporary directory after the end of the script
trap "singularity instance stop '$instancename'; rm -rf '$tmpdir'; rm -rf '$rundir'" EXIT
singularity exec \
--pwd "/data" \
"instance://$instancename" \
"$#"

Why does OpenMPI use a different server given a different -n setting?

I am testing out OpenMPI, provided and compiled by another user (I am using soft links to his directories for bin, include, etc. - all the mandatory directories), but I ran into this weird thing:
First of all, if I run mpirun with an -n setting <= 10, I can run the example below. testrunmpi.py simply prints out "run." from each core.
# I am in serverA.
bash-3.2$ /home/karl/bin/mpirun -n 10 ./testrunmpi.py
run.
run.
run.
run.
run.
run.
run.
run.
run.
run.
However, when I tried running -n more than 10, I will run into this:
bash-3.2$ /home/karl/bin/mpirun -n 24 ./testrunmpi.py
karl@serverB's password: Could not chdir to home directory /home/karl: No such file or directory
bash: /home/karl/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 19203) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
bash-3.2$
bash-3.2$
Permission denied, please try again.
karl@serverB's password:
Permission denied, please try again.
karl@serverB's password:
I see that the work is dispatched to serverB, even though I am on serverA. I don't have any account on serverB. But if I invoke mpirun with -n <= 10, the work stays on serverA.
This is strange, so I checked /home/karl/etc/openmpi-default-hostfile and tried setting the following:
serverA slots=24 max_slots=24
serverB slots=0 max_slots=32
But the problem persists and still gives the same error message as above. What must I do in order to have my program run on serverA only?
The default hostfile in Open MPI is system-wide, i.e. its location is determined while the library is being built and installed and there is no user-specific version of it. The actual location can be obtained by running the ompi_info command like this:
$ ompi_info --param orte orte | grep orte_default_hostfile
MCA orte: parameter "orte_default_hostfile" (current value: <LOOK HERE>, data source: default value)
You can override the list of hosts in several different ways. First, you can provide your own hostfile via the -hostfile option to mpirun. If so, you don't have to put hosts with zero slots inside it - simply omit machines that you have no access to. For example:
localhost slots=10 max_slots=10
serverA slots=24 max_slots=24
You can also change the path to the default hostfile by setting the orte_default_hostfile MCA parameter:
$ mpirun --mca orte_default_hostfile /path/to/your/hostfile -n 10 executable
Instead of passing the --mca option each time, you can set the value in an exported environment variable called OMPI_MCA_orte_default_hostfile. This could be set in your shell's dot-rc file, e.g. in .bashrc if you use Bash.
You can also specify the list of nodes directly via the -H (or -host) option.
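Putting that together, either of the following should keep all ranks on serverA; this is a sketch based on the options described above (the hostfile path is a placeholder, and the host:slots form of -H is only available in newer Open MPI versions):
# A user hostfile (~/my_hostfile) that only lists serverA:
#   serverA slots=24 max_slots=24
/home/karl/bin/mpirun --hostfile ~/my_hostfile -n 24 ./testrunmpi.py
# Or, on newer Open MPI versions, give the slot count directly with -H:
/home/karl/bin/mpirun -H serverA:24 -n 24 ./testrunmpi.py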
