Baffling Debian installation script: service starts but then quits mysteriously - Why?

I've been struggling with this one for a while and I'm thoroughly baffled.
I have this postinst Debian script that is supposed to start a service once installation (of the service executable) is complete. As best I can tell, the service does start successfully, but it then immediately and mysteriously quits. Restarting the service from the command-line works fine once Synaptic concludes.
I tried writing a dummy package to verify this. The dummy package installs /etc/init/service-dummy.conf and a symbolic link to that file, named /etc/init.d/service-dummy (just like the original service). The contents of service-dummy.conf are the same as service.conf. The dummy starts the service...and then the service keeps on running. So I can't even reproduce my problem!
The postinst script does this:
#!/bin/sh
set -e
case "$1" in
    configure)
        # (instructions which configure, make and install the freshly installed source code)
        ldconfig
        echo "Install concluded"
        if [ -e "/etc/init/service-dummy.conf" ]; then
            echo "Starting service-dummy root service" | tee service.log
            service service-dummy restart | tee --append service.log
        else
            echo "service-dummy.conf not installed"
        fi
        echo "Postinst complete"
        ;;
    *)
        echo "postinst called with unknown argument '$1'" >&2
        ;;
esac
# exit 1 to ensure installer stalls
exit 1
Synaptic displays the log:
...
Starting service-dummy root service
stop: Unknown instance:
service-dummy start/running, process 9207
Postinst complete
dpkg: error processing service-dummy (--configure):
subprocess installed post-installation script returned error exit status 1
...
It's as if upstart needed to be refreshed?
I tried more things and eventually got it to work, sort of: I start the service, then abort the script with an exit 1; when the script runs a second time (postinst is called with the same arguments, so I detect the second run another way), I start the service again, and this time it sticks.
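Roughly, the hack looks like this (the marker file below is purely for illustration; my script detects the second run differently):
# illustration only: hypothetical marker file to tell the two passes apart
MARKER=/var/lib/service-dummy.postinst-ran
if [ ! -e "$MARKER" ]; then
    touch "$MARKER"
    service service-dummy restart || true
    echo "Postinst complete (aborting script)"
    exit 1   # abort so the installer's recovery pass runs postinst again
else
    echo "Starting service-dummy a second time"
    service service-dummy restart
    echo "Postinst complete (aborting script recovery attempt)"
    exit 1
fi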
A key clue is in the log:
Postinst complete (aborting script)
dpkg: error processing service-dummy (--configure):
subprocess installed post-installation script returned error exit status 1
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
Errors were encountered while processing:
service-dummy
E: Sub-process /usr/bin/dpkg returned an error code (1)
A package failed to install. Trying to recover:
Setting up service-dummy ...
service-dummy postinst configure
Starting service-dummy a second time
stop: Unknown instance:
service-dummy start/running, process 4034
Postinst complete (aborting script recovery attempt)
So I guess my question now becomes:
How do I force ldconfig to not defer its processing?

Found the right clue here: http://lists.debian.org/debian-glibc/2008/07/msg00169.html
It turns out apt-get temporarily prevents use of ldconfig by replacing it with something else. The solution to my problem was simply to call ldconfig.real instead of ldconfig in the script.
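In the postinst above, that is a one-line change:
# deferred by the temporary stand-in that apt-get puts in place:
ldconfig
# calls the real binary, so the shared-library cache is updated immediately:
ldconfig.real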

Related

OpenStack DevStack installation error: g-api did not start in Ubuntu 18.04

I'm installing OpenStack using the All-In-One Single Machine setup, and I run the stack.sh script for the DevStack setup. When starting the Glance service, I get the following error on my console:
++:: curl -g -k --noproxy '*' -s -o /dev/null -w '%{http_code}' http://10.10.20.10/image
+:: [[ 503 == 503 ]]
+:: sleep 1
+functions:wait_for_service:485 rval=124
+functions:wait_for_service:490 time_stop wait_for_service
+functions-common:time_stop:2310 local name
+functions-common:time_stop:2311 local end_time
+functions-common:time_stop:2312 local elapsed_time
+functions-common:time_stop:2313 local total
+functions-common:time_stop:2314 local start_time
+functions-common:time_stop:2316 name=wait_for_service
+functions-common:time_stop:2317 start_time=1602763779096
+functions-common:time_stop:2319 [[ -z 1602763779096 ]]
++functions-common:time_stop:2322 date +%s%3N
+functions-common:time_stop:2322 end_time=1602763839214
+functions-common:time_stop:2323 elapsed_time=60118
+functions-common:time_stop:2324 total=569
+functions-common:time_stop:2326 _TIME_START[$name]=
+functions-common:time_stop:2327 _TIME_TOTAL[$name]=60687
+functions:wait_for_service:491 return 124
+lib/glance:start_glance:480 die 480 'g-api did not start'
+functions-common:die:198 local exitcode=0
+functions-common:die:199 set +o xtrace
[Call Trace]
./stack.sh:1306:start_glance
/opt/stack/devstack/lib/glance:480:die
[ERROR] /opt/stack/devstack/lib/glance:480 g-api did not start
Error on exit
World dumping... see /opt/stack/logs/worlddump-2020-10-15-121040.txt for details
neutron-dhcp-agent: no process found
neutron-l3-agent: no process found
neutron-metadata-agent: no process found
neutron-openvswitch-agent: no process found
I also tried increasing the timeout duration, but it still failed, and I also verified that devstack@g-api.service is in the active state. Can someone let me know the exact reason behind this issue and how to resolve it?
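(For reference, the timeout that expires above is presumably DevStack's SERVICE_TIMEOUT; rval=124 is the exit status of the timeout command. Assuming that is the knob meant by "increasing the timeout duration", it can be raised in local.conf:)
[[local|localrc]]
# give slow services more time before wait_for_service gives up (the default is 60 seconds)
SERVICE_TIMEOUT=120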
The only solution is to reload the entire system, including the OS.

Docker: Permission denied when running rocker/shiny-verse and rocker/shiny images

I'm trying to run Shiny Server on an EC2 instance running Ubuntu.
Following this page, I run this command: docker run --rm -p 3838:3838 rocker/shiny.
I get the following warning and error:
s6-supervise shiny-server: warning: unable to spawn ./run - waiting 10 seconds
s6-supervise (child): fatal: unable to exec run: Exec format error
After that, those two outputs just repeat every 10 seconds. I can't even kill the process, so I have to close the terminal and start another one.
I'm new to Docker and have no real clue how to proceed from here. Any help would be appreciated.
Updates:
At least I've tracked down what s6-supervise is: part of a set of utilities revolving around process supervision and management, logging, and system initialization.

paramiko and nohup ''

OK, so I have paramiko v2.2.1 and I am trying to log in to a machine and restart a service. Inside the service script, a process is started via nohup. However, if I let paramiko disconnect as soon as it is done, the started process terminates with a PIPE signal when it writes to stdout.
If I start the service by ssh'ing into the box and manually starting it, there is no issue and it runs in the background fine. Also, if I add a long sleep 10 before disconnecting (close) paramiko, it also seems to work just fine.
The service is started via a init.d script via a line like this:
env LD_LIBRARY_PATH=$bin_path nohup $bin_path/ServerLoop.sh \
    "$bin_path/Service service args" "$@" &
Where ServerLoop.sh simply calls the service forever in a loop like this so it will never die:
SERVER=$1
shift
ARGS=$@
logger $ARGS
while [ 1 ]; do
    $SERVER $ARGS
    STATUS=$?   # capture the exit code for the log line below
    logger "$SERVER terminated with exit code: $STATUS. Server has been restarted"
    sleep 1
done
I have noticed that when I start the service by ssh'ing into the box, I get a nohup.out file written to the root. However, when I run through paramiko, I get no nohup.out written anywhere on the system... i.e. this is after I manually ssh into the box and start the service:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
/nohup.out
And this is after I run through paramiko:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
As I understand it nohup will only redirect the output to nohup.out if "If standard output is a terminal" (from the manual), otherwise it thinks it is saving the output to a file so it does not redirect. Hence I tried the following:
In [43]: import paramiko
In [44]: paramiko.__version__
Out[44]: '2.2.1'
In [45]: ssh = paramiko.SSHClient()
In [46]: ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
In [47]: ssh.connect(ip, username='root', password=not_for_so_sorry, look_for_keys=False, allow_agent=False)
In [48]: stdin, stdout, stderr = ssh.exec_command("tty")
In [49]: stdout.read()
Out[49]: 'not a tty\n'
So I am thinking that nohup is not redirecting to nohup.out when I run it through paramiko because tty is not returning a terminal. I don't know why adding a sleep(10) would fix this, though, as the service is quite verbose if run on the command line.
I have also noticed that if the service is started from a manual ssh, its tty in the ps ax output is still set to the ssh tty... however, if the process is started by paramiko, its tty in the ps ax output is set to "?"... since both processes are run through nohup, I would have expected this to be the same.
If the problem is that nohup is indeed not redirecting the output to nohup.out because of the tty, is there a way to force this to happen, or a better way to run this sort of command via paramiko?
Thanks all, any help with this would be great :)
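(For what it's worth, one way to take nohup's tty check out of the picture would be to make the redirection explicit in the command string passed to exec_command; an untested sketch, with a made-up service name and log path:)
# detach stdin/stdout/stderr from the SSH channel entirely, so the remote
# process never writes to a pipe that closes when paramiko disconnects
nohup /etc/init.d/myservice restart > /tmp/myservice.out 2>&1 < /dev/null &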

solr / tomcat7 doesn't come back up after a crash

I keep getting issues with Solr crashing on my server. It's hardly a busy site, so I'm baffled as to why it keeps doing it.
Anyway, as an interim measure, I've written a shell script that runs on a cron as root:
#!/bin/bash
declare -a arr=(tomcat7 nginx mysql);
for i in "${arr[@]}"
do
    echo "Checking $i"
    if (( $(ps -ef | grep -v grep | grep $i | wc -l) > 0 ))
    then
        echo "$i is running!!!"
    else
        echo "service $i start\n"
        service $i start
    fi
done
# re-run, but this time do a restart if it's still not going!
for i in "${arr[@]}"
do
    echo "Checking $i"
    if (( $(ps -ef | grep -v grep | grep $i | wc -l) > 0 ))
    then
        echo "$i is running!!!"
    else
        service $i restart
    fi
done
...then this cron (as root):
*/5 * * * * bash /root/script-checks.sh
The cron itself seems to run just fine:
Checking tomcat7
service tomcat7 start\n
Checking nginx
nginx is running!!!
Checking mysql
mysql is running!!!
Checking tomcat7
Checking nginx
nginx is running!!!
Checking mysql
mysql is running!!!
...and Tomcat's status seems OK:
root@domain:~# service tomcat7 status
● tomcat7.service - LSB: Start Tomcat.
Loaded: loaded (/etc/init.d/tomcat7)
Active: active (exited) since Mon 2016-03-21 06:33:28 GMT; 4 days ago
Process: 2695 ExecStart=/etc/init.d/tomcat7 start (code=exited, status=0/SUCCESS)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
...yet my script can't connect to Solr:
Could not parse JSON response: malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "Can't connect to loc...") at /srv/www/domain.net/www/cgi-bin/admin/WebService/Solr/Response.pm line 42. Can't connect to localhost:8080 Connection refused at /usr/share/perl5/LWP/Protocol/http.pm line 49.
If I manually run a "restart":
service tomcat7 restart
...it then starts working again. It's almost like the 2nd part of my shell script isn't working.
Any suggestions?
My Solr versions are as follows:
Solr Specification Version: 3.6.2.2014.10.31.18.33.47
Solr Implementation Version: 3.6.2 debian - pbuilder - 2014-10-31 18:33:47
Lucene Specification Version: 3.6.2
UPDATE: I've read that sometimes updating the maxThreads can help with crashes, so I've changed it to 10,000:
<Connector port="8443" protocol="org.apache.coyote.http11.Http11Protocol"
maxThreads="10000" SSLEnabled="true" scheme="https" secure="true"
clientAuth="false" sslProtocol="TLS" />
I guess time will tell, to see if this fixes the issue.
OK, well I never got to the bottom of why it wouldn't restart... but I have worked out why it was crashing. Before, we had it on a 2048MB RAM Linode server, but when we moved over to Apache2, I set up a 1024MB server and was going to upgrade it to the 2048MB one once we had it all working. However, we put it live but I forgot to move it to the 2048MB server, so Nginx/Apache2/Tomcat/MySQL etc. were all trying to run on a pretty slow server.
We found that Solr was dying with an OOM (out of memory) error, which is what gave us the clue.
Hopefully this helps someone else, who may come across this.
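(If you suspect the same failure mode, two quick checks; the catalina.out path assumes the stock Debian/Ubuntu tomcat7 package layout:)
# kernel OOM killer activity
dmesg | grep -i -E "out of memory|killed process"
# JVM-level OutOfMemoryError in Tomcat's own log
grep -i "outofmemoryerror" /var/log/tomcat7/catalina.out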

Memory issue with meteor up (mup) on Digital Ocean

I couldn't find existing posts related to my issue. On a Digital Ocean Droplet, mup setup went fine, but when I try to deploy, I get the following error. Any ideas? Thanks!
root@ts:~/ts-deploy# mup deploy
Meteor Up: Production Quality Meteor Deployments
Building Started: /root/TS/
Bundling Error: code=137, error:
-------------------STDOUT-------------------
Figuring out the best package versions to use. This may take a moment.
-------------------STDERR-------------------
bash: line 1: 31217 Killed meteor build --directory /tmp/dc37af3e-eca0-4a19-bf1a-d6d38bb8f517
Below are the logs. node -v indicates I am using 0.10.31. How do I check which script is exiting with the error? Any other ideas? Thanks!
error: Forever detected script exited with code: 1
error: Script restart attempt #106
Meteor requires Node v0.10.29 or later.
error: Forever detected script exited with code: 1
error: Script restart attempt #107
Meteor requires Node v0.10.29 or later.
error: Forever detected script exited with code: 1
error: Script restart attempt #108
stepping down to gid: meteoruser
stepping down to uid: meteoruser
After I went back to an old backup of the DO Droplet, and re-ran mup setup and mup deploy, I now get this in the command line output
Building Started: /root/TS
Bundling Error: code=134, error:
-------------------STDOUT-------------------
Figuring out the best package versions to use. This may take a moment.
-------------------STDERR-------------------
FATAL ERROR: JS Allocation failed - process out of memory
bash: line 1: 1724 Aborted (core dumped) meteor build --directory /tmp/bfdbcb45-9c61-435f-9875-3fb304358996
and this in the logs:
>> stepping down to gid: meteoruser
>> stepping down to uid: meteoruser
Exception while invoking method 'login' TypeError: Cannot read property '0' of undefined
at ServiceConfiguration.configurations.remove.service (app/server/accounts.js:7:26)
at Object.Accounts.insertUserDoc (packages/accounts-base/accounts_server.js:1024)
at Object.Accounts.updateOrCreateUserFromExternalService (packages/accounts-base/accounts_server.js:1189)
at Package (packages/accounts-oauth/oauth_server.js:45)
at packages/accounts-base/accounts_server.js:383
at tryLoginMethod (packages/accounts-base/accounts_server.js:186)
at runLoginHandlers (packages/accounts-base/accounts_server.js:380)
at Meteor.methods.login (packages/accounts-base/accounts_server.js:434)
at maybeAuditArgumentChecks (packages/ddp/livedata_server.js:1594)
at packages/ddp/livedata_server.js:648
Exception while invoking method 'login' TypeError: Cannot read property '0' of undefined
at ServiceConfiguration.configurations.remove.service (app/server/accounts.js:7:26)
at Object.Accounts.insertUserDoc (packages/accounts-base/accounts_server.js:1024)
at Object.Accounts.updateOrCreateUserFromExternalService (packages/accounts-base/accounts_server.js:1189)
at Package (packages/accounts-oauth/oauth_server.js:45)
at packages/accounts-base/accounts_server.js:383
at tryLoginMethod (packages/accounts-base/accounts_server.js:186)
at runLoginHandlers (packages/accounts-base/accounts_server.js:380)
at Meteor.methods.login (packages/accounts-base/accounts_server.js:434)
at maybeAuditArgumentChecks (packages/ddp/livedata_server.js:1594)
at packages/ddp/livedata_server.js:648
The memory issue stems from using DigitalOcean's $5 Droplet. To solve the problem, I added swap to the server, as explained in detail below.
Create and enable the swap file using the dd command:
sudo dd if=/dev/zero of=/swapfile bs=1024 count=256k
“of=/swapfile” designates the file’s name. In this case the name is swapfile.
Next, prepare the swap file by creating a Linux swap area:
sudo mkswap /swapfile
The results display:
Setting up swapspace version 1, size = 262140 KiB
no label, UUID=103c4545-5fc5-47f3-a8b3-dfbdb64fd7eb
Finish up by activating the swap file:
sudo swapon /swapfile
You will then be able to see the new swap file when you view the swap summary.
swapon -s
Filename Type Size Used Priority
/swapfile file 262140 0 -1
This file will last on the virtual private server until the machine reboots. You can ensure that the swap is permanent by adding it to the fstab file.
Open up the file:
sudo nano /etc/fstab
Paste in the following line:
/swapfile none swap sw 0 0
Swappiness in the file should be set to 10. Skipping this step may cause poor performance, whereas setting it to 10 will cause swap to act as an emergency buffer, preventing out-of-memory crashes.
You can do this with the following commands:
echo 10 | sudo tee /proc/sys/vm/swappiness
echo vm.swappiness = 10 | sudo tee -a /etc/sysctl.conf
To prevent the file from being world-readable, you should set up the correct permissions on the swap file:
sudo chown root:root /swapfile
sudo chmod 0600 /swapfile
This only worked for me after increasing the swap space to 1 GB:
Turn all swap off:
sudo swapoff -a
Resize the swapfile:
sudo dd if=/dev/zero of=/swapfile bs=1M count=1024
Make the swapfile usable:
sudo mkswap /swapfile
Turn swap on again:
sudo swapon /swapfile
