EC2 startup script gets stuck on wget - r

I have the following script for some number crunching
#!/bin/bash
sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install -y r-base r-base-dev htop s3cmd p7zip-full
wget https://s3.amazonaws.com/#######/###.7z
7z e ###.7z
sudo R CMD BATCH --slave --no-timing --vanilla "--args 0 1 100 200 500 2" SOME-ROUTINE.R
s3cmd put *.results s3://#########/
on EC2. I upload the script as file at the Launch Instance->Instance Details->User Data
The machine fires up, updates and upgrades but then it does not execute wget and does not download the file. When i SSH in the Instance and run the exact same commands the process completes without problems.
Any ideas why wget does not work?
Any other alternatives?
EC

It is always a bit of guessing, but here is how I would debug this:
My first suggestion would be to check for special characters in the S3 URL. This might cause the wget call to fail.
Second, I would give an explicit output path to wget with the -O option. While you are editing the command, you can also add -o to output logging information.
Last step is to check your access rights to the S3 bucket. Perhaps you can try to put the file on another webspace to see if the command executes then.

Related

Building a Docker Image a little more quickly [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
Improve this question
I plan on using a service to receive and grade student code submissions for a class I'll be teaching next semester.
For each assignment (there are many) a shell script runs to build a Docker image. I upload a zip file to the website, and among all the compressed files, there is this one:
#!/usr/bin/env bash
# these lines install R on the virtual machine
apt-get install -y libxml2-dev libcurl4-openssl-dev libssl-dev
apt-get install -y r-base
# these lines install the packages that is needed both
# 1. the student code
# 2. the autograding code
# Note that
# a. devtools is for install_github This is temporary and will be changed once the updates to gradeR have made it to CRAN.
Rscript -e "install.packages('devtools')"
Rscript -e "library(devtools); install_github('tbrown122387/gradeR')"
# These are packages that many students in the class will use
Rscript -e "install.packages('tidyverse')"
Rscript -e "install.packages('stringr')"
The problem though is that this takes about 20 minutes. How do I speed this up? I'm totally new to Docker containers.
First, I'd suggest building a base image containing all of the tools and packages that you think you'll need. There's no need to be picky, because you only need to do this once. That's kind of the whole point of Docker -- portability and reuse.
FROM ubuntu:bionic
RUN apt-get update && apt-get install -y libxml2-dev libcurl4-openssl-dev libssl-dev r-base
RUN Rscript -e "install.packages('tidyverse')"
RUN Rscript -e "install.packages('stringr')"
...
Build that image and tag it as grader:1.0.0 or whatever.
Then, when it's time to grade, just mount the assignments and grading code using the -v, --volume option to docker run. You don't need to alter the container to make files accessible within it.
docker run \
--rm \
-it \
-v /path/to/assignments:/data/assignments \
-v /path/to/autograder:/data/autograder \
grader:1.0.0 \
/bin/bash
If at some point you need to add some packages, you can rebuild it by modifying the original Dockerfile or extend it by using it as the base of your next image:
FROM grader:1.0.0
RUN apt-get update && apt-get install -y the-package-i-forgot
Build it, tag it.
Use rocker/tidyverse image from Docker Hub instead of whatever image you're using.
First:
docker pull rocker/tidyverse
Then add this line:
FROM rocker/verse

Install systemd service on Debian installation

I'm building custom Debian ISO with simple-cdd utility. It worked well till the moment when I attached my own .deb package.
build-simple-cdd --dist stretch --profiles moj --force-root --local-packages /root/iso/deb
build-simple-cdd works properly, because I saw my deb package in tmp directory structure and iso image is created successfully. However debian installation fails
I suspect, that postinst script fails, since it uses systemctl command when it may be unavailable.
#!/bin/sh
set -e
echo $1
if [ "$1" = "configure" ]; then
echo "Configuring privileges..."
chown user:user /usr/bin/Koncentrator
chmod 0755 /usr/bin/Koncentrator
echo "Enabling Koncentrator services..."
systemctl daemon-reload
systemctl enable Xvfb.service
systemctl enable Koncentrator.service
fi
I've added systemd dependency to control file, but it doesn't work.
I made workaround for this issue. simple-cdd allows to prepare post installation script. apt install is called there without problems. Two steps are required to use this solution:
Add deb package to installation disk. This is configured via profile configuration file (moj.conf):
all_extras="$all_extras /root/iso/files/customapackage_0.1.3.deb"
Run apt install in moj.postinst script:
#!/bin/sh
mount /dev/cdrom /media/cdrom
cd /media/cdrom/simple-cdd
apt install ./custompackage_0.1.3.deb
cd /
sync
umount /media/cdrom
If you want to debug your postinst script, you can insert there long sleep:
#!/bin/sh
sleep 10000000
...
And switch terminal (Ctrl+Alt+F1-6) during finish-install phase. Than call chroot /target to switch in-target environemnent

How to Choose R Server's R as Default in Operationalization, Remote R Workspace and RStudio Server?

So I've set up an Azure Data Science Virtual Machine on Linux (Ubuntu) and I've executed the following on the terminal to enable Remote R workspace, RStudio Server, R Server Operationalization and hadoop:
sudo apt update
sudo apt -y upgrade
# Hadoop is installed but doesn't seem to appear on the PATH or have its environment variable set by default
sudo echo "" >> ~/.bashrc
sudo echo "export PATH="'$'"PATH:/opt/hadoop/hadoop-2.7.4/bin" >> ~/.bashrc
sudo echo "export HADOOP_HOME=/opt/hadoop/hadoop-2.7.4" >> ~/.bashrc
#
source ~/.bashrc
#Setting up a password as none exists to begin with because of private key selection in the installation
#RStudio Server requires a password though
"MyPassword\nMyPassword\n" | sudo passwd sshuser
#Unfortunately hadoop fails on Data Science Virtual Machine
#error: mkdir: Call From IM-DSonUbuntu/192.168.5.4 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
# hadoop fs -mkdir /user/RevoShare/rserve2
# hadoop fs -chmod uog+rwx /user/RevoShare/rserve2
sudo mkdir -p /var/RevoShare/rserve2
sudo chmod uog+rwx /var/RevoShare/rserve2
# hadoop fs -mkdir /user/RevoShare/sshuser
# hadoop fs -chmod uog+rwx /user/RevoShare/sshuser
sudo mkdir -p /var/RevoShare/sshuser
sudo chmod uog+rwx /var/RevoShare/sshuser
#Setting up R Server Operationalisation
cd /opt/microsoft/mlserver/9.2.1/o16n
sudo dotnet Microsoft.MLServer.Utils.AdminUtil/Microsoft.MLServer.Utils.AdminUtil.dll -silentoneboxinstall MyPassword
#They say this Data Science Virtual Machine already has RStudio Server, but even though the port 8787 is open, it's nowhere to be found! So installing it now, and after the installation it's accessible by refreshing the page that failed before.
#Perhaps it's not installed then? Or a service is not running like it shoudl?
#https://www.rstudio.com/products/rstudio/download-server/
wget https://download2.rstudio.org/rstudio-server-1.1.414-amd64.deb
yes | sudo gdebi rstudio-server-1.1.414-amd64.deb
#They are small, leave them for debug reasons - lets have evidence the script run thus far.
#sudo rm rstudio-server-1.1.414-amd64.deb
# Remote R workspace Service needs dotnet sdk
curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
sudo mv microsoft.gpg /etc/apt/trusted.gpg.d/microsoft.gpg
sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-xenial-prod xenial main" > /etc/apt/sources.list.d/dotnetdev.list'
sudo apt update
sudo apt -y install dotnet-sdk-2.0.0
sudo apt install libxml2-dev
#Downloading and installing the Remote R service
wget -O rtvs-daemon.tar.gz https://aka.ms/r-remote-services-linux-binary-current
tar -xvzf rtvs-daemon.tar.gz
sudo ./rtvs-install -s
sudo systemctl enable rtvsd
sudo systemctl start rtvsd
#sudo rm rtvs-daemon.tar.gz
#sudo rm rtvs-install
#Fixing Remote R: For some reason, even though 'sudo systemctl enable rtvsd' runs, after every reboot the service won't become automatically active. So let's fix that.
wget https://sa0im0general.blob.core.windows.net/general-blob-container/StartRemoteRAfterReboot.sh
sudo mv StartRemoteRAfterReboot.sh /var/RevoShare/StartRemoteRAfterReboot.sh
sudo /sbin/shutdown -r 5
sudo chown root /etc/rc.local
sudo chmod 755 /etc/rc.local
sudo systemctl enable rc-local.service
sudo -s
sudo find /etc/ -name "rc.local" -exec sed -i 's/exit 0//g' {} \;
sudo echo "" >> /etc/rc.local
sudo echo "sh /var/RevoShare/StartRemoteRAfterReboot.sh" >> /etc/rc.local
sudo echo "exit 0" >> /etc/rc.local
exit
I've also tried, one by one, these, to see if it makes any difference to the RStudio Server (it didn't, but even if it did, I want a global solution to work on Remote R Workspace Service and R Server Operationalisation as well, not only RStudio Server):
#Configuring RStudio Server to see the R Server R
sudo echo "rsession-which-r=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> /etc/rstudio/rserver.conf
export RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R
sudo echo "RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> ~/.profile
source ~/.profile
sudo echo "RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> ~/.bashrc
source ~/.bashrc
sudo echo "PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R" >> ~/.bashrc
export PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R
source ~/.bashrc
The problem is that even though "which R" points to R Server's R, i.e. typing "sudo R" will show the message "Loading Microsoft R Server packages, version 9.2.1." and will load packages like RevoScaleR, everything else fails to do so.
Accessing the RStudio Server with http://THE-IP-GOES-HERE.westeurope.cloudapp.azure.com:8787 and logging in with the initial user ("sshuser") (or with any other user for that matter) will NOT load R Server and RevoScaleR rx functions are unavailable
Using my local Visual Studio 2017 to access the remote workspace via "Add connection" on "Workspaces" tab loads MRO and says:
Installed R versions:
[0] Microsoft R Open '3.4.1.1347' (Default)
And finally, when I use R Server's Operationalisation and log in with "mrsdeploy" package's "remoteLogin()" R Server packages like RevoScaleR are not loaded again, so things like "rxSummary(~., data=iris)" fail with error 'could not find function "rxSummary"'
The exact same thing happened when I deployed from azure a "Machine Learning Server 9.2.1 on Linux (Ubuntu)".
I don't want to just use the regular open source R, I want to be able to use the R Server - that's why I deployed this VM. How can I make it so that everything loads R Server's R, not Microsoft R Open? (Like I'm able to do from terminal using "R")
As a result of my having tried all of this and the fact that R Server is loaded in the console, my mind now goes to permissions. Could it be that by default the Data Science VM doesn't have the correct permissions to allow these?
I'm at a loss
RStudio Server is installed on the Ubuntu DSVM, but the service is disabled by default as it does not support SSL. You can enable it with systemctl enable rstudio-server, then start it with systemctl start rstudio-server.
RStudio Server uses the same R as Microsoft R Server, but the .libPaths are different, which is why you cannot load the MRS packages. You will need to manually set the .libPaths so they match.

Luarocks error on ubuntu

I am trying to run Neuraltalk2 on Ubuntu. But I am getting an error as follows:
parag#parag:~/torch$ sudo luarocks install nn
[sudo] password for parag:
Error: No results matching query were found.
I followed the following steps uptill now:
sudo curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
sudo git clone https://github.com/torch/distro.git ~/torch --recursive
sudo cd ~/torch;
sudo ./install.sh
sudo source ~/.bashrc
Please help!
Try running this all without sudo. The last line, especially, sudo source ~/.bashrc does not work because source is meant to operate on the shell you are currently running. If you run it with sudo, it will load .bashrc into the temporary subshell created by sudo (in practice having no effect).
Your error message indicates that luarocks was installed correctly, but it failed to find the rock. Make sure the name of the rock is correct, try searching it with the luarocks search command, and check your configuration running luarocks with no arguments (it will display the name of your config files in use, helping you to troubleshoot the issue).

Run a service automatically in a docker container

I'm setting up a simple image: one that holds Riak (a NoSQL database). The image starts the Riak service with riak start as a CMD. Now, if I run it as a daemon with docker run -d quintenk/riak-dev, it does start the Riak process (I can see that in the logs). However, it closes automatically after a few seconds. If I run it using docker run -i -t quintenk/riak-dev /bin/bash the riak process is not started (UPDATE: see answers for an explanation for this). In fact, no services are running at all. I can start it manually using the terminal, but I would like Riak to start automatically. I figure this behavior would occur for other services as well, Riak is just an example.
So, running/restarting the container should automatically start Riak. What is the correct approach of setting this up?
For reference, here is the Dockerfile with which the image can be created (UPDATE: altered using the chosen answer):
FROM ubuntu:12.04
RUN apt-get update
RUN apt-get install -y openssh-server curl
RUN curl http://apt.basho.com/gpg/basho.apt.key | apt-key add -
RUN bash -c "echo deb http://apt.basho.com precise main > /etc/apt/sources.list.d/basho.list"
RUN apt-get update
RUN apt-get -y install riak
RUN perl -p -i -e 's/(?<=\{http,\s\[\s\{")127\.0\.0\.1/0.0.0.0/g' /etc/riak/app.config
EXPOSE 8098
CMD /bin/riak start && tail -F /var/log/riak/erlang.log.1
EDIT: -f changed to -F in CMD in accordance to sesm his remark
MY OWN ANSWER
After working with Docker for some time I picked up the habit of using supervisord to tun my processes. If you would like example code for that, check out https://github.com/Krijger/docker-cookbooks. I use my supervisor image as a base for all my other images. I blogged on using supervisor here.
To keep docker containers running, you need to keep a process active in the foreground.
So you could probably replace that last line in your Dockerfile with
CMD /bin/riak console
Or even
CMD /bin/riak start && tail -F /var/log/riak/erlang.log.1
Note that you can't have multiple lines of CMD statements, only the last one gets run.
Using tail to keep container alive is a hack. Also, note, that with -f option container will terminate when log rotation happens (this can be avoided by using -F instead).
A better solution is to use supervisor. Take a look at this tutorial about running Riak in a Docker container.
The explanation for:
If I run it using docker run -i -t quintenk/riak-dev /bin/bash the riak process is not started
is as follows. Using CMD in the Dockerfile is actually the same functionality as starting the container using docker run {image} {command}. As Gigablah remarked only the last CMD is used, so the one written in the Dockerfile is overwritten in this case.
By using CMD /bin/riak start && tail -f /var/log/riak/erlang.log.1 in the Buildfile, you can start the container as a background process using docker run -d {image}, which works like a charm.
"If I run it using docker run -i -t quintenk/riak-dev /bin/bash the riak process is not started"
It sounds like you only want to be able to monitor the log when you attach to the container. My use case is a little different in that I want commands started automatically, but I want to be able to attach to the container and be in a bash shell. I was able to solve both of our problems as follows:
In the image/container, add the commands you want automatically started to the end of the /etc/bash.bashrc file.
In your case just add the line /bin/riak start && tail -F /var/log/riak/erlang.log.1, or put /bin/riak start and tail -F /var/log/riak/erlang.log.1 on separate lines depending on the functionality desired.
Now commit your changes to your container, and run it again with: docker run -i -t quintenk/riak-dev /bin/bash. You'll find the commands you put in the bashrc are already running as you attach.
Because I want a clean way to have the process exit later I make the last command a call to the shell's read which causes that process to block until I later attach to it and hit enter.
arthur#macro:~/docker$ sudo docker run -d -t -i -v /raid:/raid -p 4040:4040 subsonic /bin/bash -c 'service subsonic start && read -p "waiting"'
WARNING: Docker detected local DNS server on resolv.conf. Using default external servers: [8.8.8.8 8.8.4.4]
f27229a260c9
arthur#macro:~/docker$ sudo docker ps
[sudo] password for arthur:
ID IMAGE COMMAND CREATED STATUS PORTS
35f253bdf45a subsonic:latest /bin/bash -c service 2 days ago Up 2 days 4040->4040
arthur#macro:~/docker$ sudo docker attach 35f253bdf45a
arthur#macro:~/docker$ sudo docker ps
ID IMAGE COMMAND CREATED STATUS PORTS
as you can see the container exits after you attach to it and unblock the read.
You can of course use a more sophisticated script than read -p if you need to do other clean up, such as stopping services and saving logs etc.
I use a simple trick whenever I start building a new docker container. To keep it alive, I use a ping in the entrypoint script.
So in the Dockerfile, when using debian, for instance, I make sure I can ping.
This is btw, always nice, to check what is accessible from within the container.
...
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
&& apt-get install -y iputils-ping
...
ENTRYPOINT ["entrypoint.sh"]
And in the entrypoint.sh file
#!/bin/bash
...
ping 10.10.0.1 >/dev/null 2>/dev/null
I use this instead of CMD bash, as I always wind up using a startup file.

Resources