How to prevent a step from failing in Bitbucket Pipelines?

I am running all my test cases and some of them fail sometimes. The pipeline detects this and fails the step and the build, which blocks the next step (zipping the report folder) from being executed. I want to send that zip file as an email attachment.
Here is my bitbucket-pipelines.yml file
custom: # Pipelines that can only be triggered manually
  QA2: # The name that is displayed in the list in the Bitbucket Cloud GUI
    - step:
        image: openjdk:8
        caches:
          - gradle
        size: 2x # double resources available for this step to 8G
        script:
          - apt-get update
          - apt-get install zip
          - cd config/geb
          - ./gradlew -DBASE_URL=qa2 clean BSchrome_win # This step fails
          - cd build/reports
          - zip -r testresult.zip BSchrome_winTest
        after-script: # On test execution completion or build failure, send test report to e-mail lists
          - pipe: atlassian/email-notify:0.3.11
            variables:
              <<: *email-notify-config
              TO: 'email#email.com'
              SUBJECT: "Test result for QA2 environment"
              BODY_PLAIN: |
                Please find the attached test result report to the email.
              ATTACHMENTS: config/geb/build/reports/testresult.zip
The commands
- cd build/reports
and
- zip -r testresult.zip BSchrome_winTest
do not get executed, because ./gradlew -DBASE_URL=qa2 clean BSchrome_win failed.
I don't want Bitbucket to fail the step and stop the queued commands from executing.

The bitbucket-pipelines.yml file is just running bash/shell commands on Unix. The script runner checks the return status code of each command to see whether it succeeded (status 0) or failed (non-zero status). So you can use various techniques to control this status code:
Add " || true" to the end of your command
./gradlew -DBASE_URL=qa2 clean BSchrome_win || true
When you add "|| true" to the end of a shell command, it means "ignore any errors, and always return a success code 0". More info:
Bash ignoring error for a particular command
https://www.cyberciti.biz/faq/bash-get-exit-code-of-command/
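Applied to the script from the question, only the gradlew line changes, so the zip commands still run even when the tests fail (a minimal sketch of the script section):
script:
  - apt-get update
  - apt-get install zip
  - cd config/geb
  - ./gradlew -DBASE_URL=qa2 clean BSchrome_win || true # ignore a test failure so the following commands still run
  - cd build/reports
  - zip -r testresult.zip BSchrome_winTest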
Use "gradlew --continue" flag
./gradlew -DBASE_URL=qa2 clean BSchrome_win --continue
The "--continue" flag can be used to prevent a single test failure from stopping the whole task. So if one test or sub-step fails, gradle will try to continue running the other tests until all are run. However, it may still return an error, if an important step failed. More info: Ignore Gradle Build Failure and continue build script?
Move the 2 steps to the after-script section
after-script:
- cd config/geb # You may need this, if the current working directory is reset. Check with 'pwd'
- cd build/reports
- zip -r testresult.zip BSchrome_winTest
If you move the 2 zip commands to the after-script section, they will always run, regardless of the success/fail status of the main script section.
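Combined with the e-mail pipe from the question, the whole after-script could then look roughly like this (a sketch; the zip is created in a subshell so the working directory seen by the pipe stays at the clone root):
after-script:
  - (cd config/geb/build/reports && zip -r testresult.zip BSchrome_winTest) # create the attachment first
  - pipe: atlassian/email-notify:0.3.11
    variables:
      <<: *email-notify-config
      TO: 'email#email.com'
      SUBJECT: "Test result for QA2 environment"
      BODY_PLAIN: |
        Please find the attached test result report to the email.
      ATTACHMENTS: config/geb/build/reports/testresult.zip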

A better solution
If you want all the commands in your script to execute regardless of errors then put set +e at the top of your script.
If you just want to ignore the error for one particular command then put set +e before that command and set -e after it.
Example:
- set +e
- ./gradlew -DBASE_URL=qa2 clean BSchrome_win # This step fails
- set -e
This also works for a group of commands:
- set +e
- cd config/geb
- ./gradlew -DBASE_URL=qa2 clean BSchrome_win # This step fails
- cd config/geb
- set -e
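If you still want to know afterwards whether the tests failed (for example to mention it in the e-mail), you can record the exit code before re-enabling set -e. A minimal sketch, assuming the script items share one shell session (which the set +e trick above already relies on); GRADLE_STATUS is just an illustrative variable name:
- cd config/geb
- set +e
- ./gradlew -DBASE_URL=qa2 clean BSchrome_win
- GRADLE_STATUS=$? # 0 on success, non-zero if the tests failed
- set -e
- cd build/reports
- zip -r testresult.zip BSchrome_winTest
- echo "Gradle exit status was $GRADLE_STATUS"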

I had a similar problem: a command that normally takes 1 minute, but sometimes stalls and hits the 2-hour max build timeout (and corrupts my Cypress installation)...
I wrapped my command with the timeout command and then OR'd the result with true.
eg. I changed this:
- yarn
to this:
- timeout 5m yarn || yarn cypress install --force || true # Sometimes this stalls, so kill it if it takes more than 5m and reinstall cypress
- timeout 5m yarn # Try again (in case it failed on previous line). Should be quick

Related

AWS Code Deploy - Script at specified location: scripts/validate_service.sh failed with exit code 1

My deployments fail on the last step, Validate Service, with the error message:
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.
My validate_service.sh contains:
#!/bin/bash
# verify we can access our webpage successfully
curl -v --silent localhost:80 2>&1 | grep Welcome
Can someone advise what I should change?
Script return value matters. Yours looks good to me. I just added a couple of seconds of waiting until the application starts up.
If your script runs a pipeline of commands (like curl ... | grep ...), you should also add set -o pipefail so the whole pipeline fails when any command in it fails.
Checkout my script:
#!/bin/bash
sleep 5
curl http://localhost:3009 | grep Welcome
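Putting both points together, a minimal sketch of validate_service.sh for the setup in the question (assuming the application listens on port 80, as in the original script):
#!/bin/bash
set -o pipefail # fail the whole pipeline if curl itself fails, not only grep

sleep 5 # give the application a few seconds to start up
curl --silent http://localhost:80 2>&1 | grep Welcome # exits non-zero if "Welcome" is not found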

How to clear failing DAGs using the CLI in Airflow

I have some failing DAGs, let's say from 1st-Feb to 20th-Feb. From that date onward, all of them succeeded.
I tried to use the CLI (instead of doing it twenty times with the Web UI):
airflow clear -f -t * my_dags.my_dag_id
But I have a weird error:
airflow: error: unrecognized arguments: airflow-webserver.pid airflow.cfg airflow_variables.json my_dags.my_dag_id
EDIT 1:
As @tobi6 explained, the * was indeed causing trouble.
Knowing that, I tried this command instead:
airflow clear -u -d -f -t ".*" my_dags.my_dag_id
but it only returns failed task instances (the -f flag). The -d and -u flags don't seem to work, because task instances downstream and upstream of the failed ones are ignored (not returned).
EDIT 2:
As @tobi6 suggested, using -s and -e permits selecting all DAG runs within a date range. Here is the command:
airflow clear -s "2018-04-01 00:00:00" -e "2018-04-01 00:00:00" my_dags.my_dag_id
However, adding the -f flag to the command above only returns failed task instances. Is it possible to select all failed task instances of all failed DAG runs within a date range?
If you use an asterisk * in the Linux bash, the shell will automatically expand it.
Meaning it will replace the asterisk with all files in the current working directory and then execute your command.
Quoting the pattern helps to avoid the automatic expansion:
airflow clear -f -t "*" my_dags.my_dag_id
One solution I've found so far is by executing SQL (MySQL in my case):
update task_instance t left join dag_run d on d.dag_id = t.dag_id and d.execution_date = t.execution_date
set t.state=null,
    d.state='running'
where t.dag_id = '<your_dag_id>'
    and t.execution_date > '2020-08-07 23:00:00'
    and d.state='failed';
It will clear all task states on failed dag_runs, as if the 'Clear' button were pressed for the entire dag run in the web UI.
In Airflow 2.2.4 the airflow clear command was deprecated.
You can now run:
airflow tasks clear -s <your_start_date> -e <end_date> <dag_id>
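To come back to the original goal (re-running only the failed task instances over a date range), the old flags appear to carry over to the new sub-command. A sketch, assuming -f/--only-failed is available in your Airflow version and using the dates from the question:
airflow tasks clear --only-failed \
    -s "2018-02-01 00:00:00" \
    -e "2018-02-20 23:59:59" \
    my_dags.my_dag_id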

script not running through cron. Works fine when executed manually

I have a main shell script that calls the hana.scr script from within it. hana.scr contains the code below.
chmod 777 /data/auto/SLT.out; rm -rf /data/auto/SLT.out; hdbsql -n plhesappr61 -i 00 -u USR -p $#^F#$GGG -o /data/auto/SLT.out "Select sum("ERPACC_RPPCLNT200"."VABD"."NETWR") FROM "ACC_CLNT"."VFKH" inner join "ACC_CLNT"."VNRO" on ("ACC_CLNT"."VNRO"."VBELN"="ACC_CLNT"."VFKH"."VBELN") where FKART in ('ZFP1','ZFP3') and FKDAT = (select ADD_DAYS (TO_DATE (current_date, 'YYYY-MM-DD'), -1) "add_days" from dummy) group by FKDAT";
When I run the main script manually, it calls this script fine and the SLT.out file is also generated.
But when I schedule it in cron, the main script executes just fine, except for hana.scr, which does not seem to execute, because it does not even remove the old file as per the second command (rm) in hana.scr.
The cron job runs as the same user I use when running the script manually.
I read that these issues happen if cron does not get the same environment to run in. I tried to source the UNIX profile of the user before executing hana.scr as well, but was not successful.
Below is the cron entry which runs the main script (which calls hana.scr from within). I used absolute paths:
37 0,2,3,4,5,6 * * * /data/esb/auto/./main.sh R > /data/esb/auto/main.log
The hana.scr is executed in the below manner:
./hana.scr
check6=$?
if [ $check6 = "1" ]
then
    echo "***********HANA counts were not generated**********"
fi
After /data/esb/auto/./main.sh, your current directory is not changed to /data/esb/auto/. I think you started main.sh from the command line while your $PWD was the same directory where hana.scr is located.
Test it from the commandline with
cd /
/data/esb/auto/main.sh
How to fix?
The worst solution is changing the crontab line into
37 0,2,3,4,5,6 * * * cd /data/esb/auto; /data/esb/auto/main.sh R > /data/esb/auto/main.log
That is a workaround for the crontab but main.sh still fails when started from a different directory.
Slightly better is using the complete path in main.sh when you call hana.scr
myscriptdir=/data/esb/auto
..
${myscriptdir}/hana.scr
If you later move the folders, you will need to edit the files and fix the paths.
You can use a config file with the settings, or let main.sh figure out which directory it is in:
Getting the source directory of a Bash script from within
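A minimal sketch of that last approach, assuming main.sh is a bash script (this is the pattern from the linked question):
# at the top of main.sh: resolve the directory this script lives in
myscriptdir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# then always call helper scripts via that absolute path
"${myscriptdir}/hana.scr"
check6=$?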

Memory issue with meteor up (mup) on Digital Ocean

I couldn't find existing posts related to my issue. On a Digital Ocean Droplet, mup setup went fine, but when I try to deploy, I get the following error. Any ideas? Thanks!
root#ts:~/ts-deploy# mup deploy
Meteor Up: Production Quality Meteor Deployments
Building Started: /root/TS/
Bundling Error: code=137, error:
-------------------STDOUT-------------------
Figuring out the best package versions to use. This may take a moment.
-------------------STDERR-------------------
bash: line 1: 31217 Killed meteor build --directory /tmp/dc37af3e-eca0-4a19-bf1a-d6d38bb8f517
Below are the logs. node -v indicates I am using 0.10.31. How do I check which script is exiting with the error? Any other ideas? Thanks!
error: Forever detected script exited with code: 1
error: Script restart attempt #106
Meteor requires Node v0.10.29 or later.
error: Forever detected script exited with code: 1
error: Script restart attempt #107
Meteor requires Node v0.10.29 or later.
error: Forever detected script exited with code: 1
error: Script restart attempt #108
stepping down to gid: meteoruser
stepping down to uid: meteoruser
After I went back to an old backup of the DO Droplet and re-ran mup setup and mup deploy, I now get this in the command-line output:
Building Started: /root/TS
Bundling Error: code=134, error:
-------------------STDOUT-------------------
Figuring out the best package versions to use. This may take a moment.
-------------------STDERR-------------------
FATAL ERROR: JS Allocation failed - process out of memory
bash: line 1: 1724 Aborted (core dumped) meteor build --directory /tmp/bfdbcb45-9c61-435f-9875-3fb304358996
and this in the logs:
>> stepping down to gid: meteoruser
>> stepping down to uid: meteoruser
Exception while invoking method 'login' TypeError: Cannot read property '0' of undefined
at ServiceConfiguration.configurations.remove.service (app/server/accounts.js:7:26)
at Object.Accounts.insertUserDoc (packages/accounts-base/accounts_server.js:1024)
at Object.Accounts.updateOrCreateUserFromExternalService (packages/accounts-base/accounts_server.js:1189)
at Package (packages/accounts-oauth/oauth_server.js:45)
at packages/accounts-base/accounts_server.js:383
at tryLoginMethod (packages/accounts-base/accounts_server.js:186)
at runLoginHandlers (packages/accounts-base/accounts_server.js:380)
at Meteor.methods.login (packages/accounts-base/accounts_server.js:434)
at maybeAuditArgumentChecks (packages/ddp/livedata_server.js:1594)
at packages/ddp/livedata_server.js:648
Exception while invoking method 'login' TypeError: Cannot read property '0' of undefined
at ServiceConfiguration.configurations.remove.service (app/server/accounts.js:7:26)
at Object.Accounts.insertUserDoc (packages/accounts-base/accounts_server.js:1024)
at Object.Accounts.updateOrCreateUserFromExternalService (packages/accounts-base/accounts_server.js:1189)
at Package (packages/accounts-oauth/oauth_server.js:45)
at packages/accounts-base/accounts_server.js:383
at tryLoginMethod (packages/accounts-base/accounts_server.js:186)
at runLoginHandlers (packages/accounts-base/accounts_server.js:380)
at Meteor.methods.login (packages/accounts-base/accounts_server.js:434)
at maybeAuditArgumentChecks (packages/ddp/livedata_server.js:1594)
at packages/ddp/livedata_server.js:648
The memory issue stems from using DigitalOcean's $5 Droplet. To solve the problem, I added swap to the server, as explained in detail below.
Create and enable the swap file using the dd command:
sudo dd if=/dev/zero of=/swapfile bs=1024 count=256k
"of=/swapfile" designates the file's name; in this case the name is swapfile.
Next, prepare the swap file by creating a Linux swap area:
sudo mkswap /swapfile
The results display:
Setting up swapspace version 1, size = 262140 KiB
no label, UUID=103c4545-5fc5-47f3-a8b3-dfbdb64fd7eb
Finish up by activating the swap file:
sudo swapon /swapfile
You will then be able to see the new swap file when you view the swap summary.
swapon -s
Filename Type Size Used Priority
/swapfile file 262140 0 -1
This file will last on the virtual private server until the machine reboots. You can ensure that the swap is permanent by adding it to the fstab file.
Open up the file:
sudo nano /etc/fstab
Paste in the following line:
/swapfile none swap sw 0 0
Swappiness in the file should be set to 10. Skipping this step may cause poor performance, whereas setting it to 10 will cause swap to act as an emergency buffer, preventing out-of-memory crashes.
You can do this with the following commands:
echo 10 | sudo tee /proc/sys/vm/swappiness
echo vm.swappiness = 10 | sudo tee -a /etc/sysctl.conf
To prevent the file from being world-readable, you should set up the correct permissions on the swap file:
sudo chown root:root /swapfile
sudo chmod 0600 /swapfile
This only worked for me after increasing the swap space to 1 GB:
Turn all swap off:
sudo swapoff -a
Resize the swapfile:
sudo dd if=/dev/zero of=/swapfile bs=1M count=1024
Make the swapfile usable again:
sudo mkswap /swapfile
Turn swap back on:
sudo swapon /swapfile

Baffling Debian installation script: service starts but then quits mysteriously - Why?

I've been struggling with this one for a while and I'm thoroughly baffled.
I have this postinst Debian script that is supposed to start a service once installation (of the service executable) is complete. As best I can tell, the service does start successfully, but it then immediately and mysteriously quits. Restarting the service from the command-line works fine once Synaptic concludes.
I tried writing a dummy package to verify this. The dummy package installs /etc/init/service-dummy.conf and a symbolic link to that file, named /etc/init.d/service-dummy (just like the original service). The contents of service-dummy.conf are the same as service.conf. The dummy starts the service...and then the service keeps on running. So I can't even reproduce my problem!
The postinst script does this:
#!/bin/sh
set -e
case "$1" in
configure)
# (instructions which config, make and install the freshly installed source code)
ldconfig
echo "Install concluded"
if [ -e "/etc/init/service-dummy.conf" ]; then
echo "Starting service-dummy root service" | tee service.log
service service-dummy restart | tee --append service.log
else
echo "service-dummy.conf not installed"
fi
echo "Postinst complete"
;;
*)
echo "postinst called with unknown argument '$1'" >&2
;;
esac
# exit 1 to ensure installer stalls
exit 1
Synaptic displays the log:
...
Starting service-dummy root service
stop: Unknown instance:
service-dummy start/running, process 9207
Postinst complete
dpkg: error processing service-dummy (--configure):
subprocess installed post-installation script returned error exit status 1
...
It's as if upstart needed to be refreshed?
I tried a few more things and then I did get it to work, sort of: I start the service, then abort the script with an exit 1; when the script runs a second time (postinst with the same parameters, which is how I detect the second run), I start the service again, and this time it sticks.
A key clue is in the log:
Postinst complete (aborting script)
dpkg: error processing service-dummy (--configure):
subprocess installed post-installation script returned error exit status 1
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
Errors were encountered while processing:
service-dummy
E: Sub-process /usr/bin/dpkg returned an error code (1)
A package failed to install. Trying to recover:
Setting up service-dummy ...
service-dummy postinst configure
Starting service-dummy a second time
stop: Unknown instance:
service-dummy start/running, process 4034
Postinst complete (aborting script recovery attempt)
So I guess my question now becomes:
How do I force ldconfig to not defer its processing?
Found the right clue here: http://lists.debian.org/debian-glibc/2008/07/msg00169.html
It turns out apt-get temporarily prevents the use of ldconfig by replacing it with something else. The solution to my problem was simply to call ldconfig.real instead of ldconfig in the script.
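In the postinst that could look like the following sketch, assuming the diverted binary lives at /sbin/ldconfig.real as described in the linked thread:
# run the real ldconfig if apt has diverted the wrapper, otherwise fall back
if [ -x /sbin/ldconfig.real ]; then
    /sbin/ldconfig.real
else
    ldconfig
fi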
