SageMaker fails when trying to add Lifecycle Configuration for keeping custom environments persistent after restart - jupyter-notebook

I want to create environment in SageMaker on AWS with miniconda, and make it available as kernels in Jupyter when I restart the session. But the SageMaker keep failing.
I followed the instructions found in here:
https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-lifecycle-script-timeout/
basically it says:
"Create a custom, persistent Conda installation on the notebook instance's Amazon Elastic Block Store (Amazon EBS) volume: Run the on-create script in the terminal of an existing notebook instance. This script uses Miniconda to create a separate Conda installation on the EBS volume (/home/ec2-user/SageMaker/). Then, run the on-start script as a lifecycle configuration to make the custom environment available as a kernel in Jupyter. This method is recommended for more technical users, and it is a better long-term solution."
I run this on-create.sh script on the terminal on Jupyter:
on-create.sh:
#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
unset SUDO_UID
# Install a separate conda installation via Miniconda
WORKING_DIR=/home/ec2-user/SageMaker/custom-environments
mkdir -p "$WORKING_DIR"
wget https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh -O "$WORKING_DIR/miniconda.sh"
bash "$WORKING_DIR/miniconda.sh" -b -u -p "$WORKING_DIR/miniconda"
rm -rf "$WORKING_DIR/miniconda.sh"
# Create a custom conda environment
source "$WORKING_DIR/miniconda/bin/activate"
KERNEL_NAME="conda-test-env"
PYTHON="3.6"
conda create --yes --name "$KERNEL_NAME" python="$PYTHON"
conda activate "$KERNEL_NAME"
pip install --quiet ipykernel
# Customize these lines as necessary to install the required packages
conda install --yes numpy
pip install --quiet boto3
EOF
and it creates the "conda-test-env" environment as expected.
Then I add the on-start.sh as lifestyle configuration:
#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
unset SUDO_UID
source "/home/ec2-user/SageMaker/custom-environments/miniconda/bin/activate"
conda activate conda-test-env
python -m ipykernel install --user --name "conda-test-env" --display-name "conda-test-env"
# Optionally, uncomment these lines to disable SageMaker-provided Conda functionality.
# echo "c.EnvironmentKernelSpecManager.use_conda_directly = False" >> /home/ec2-user/.jupyter/jupyter_notebook_config.py
# rm /home/ec2-user/.condarc
EOF
then I update the instance with the new configuration,
and when I start my notebook instance, after few minutes it fails.
I'll appreciate any help.

Related

Application logs to stdout with Shiny Server and Docker

I have a Docker container running a shiny app (Dockerfile here).
Shiny server logs are output to stdout and application logs are written to /var/log/shiny-server. I'm deploying this container to AWS Fargate and logging applications only display stdout which makes debugging an application when deployed challenging. I'd like to write the application logs to stdout.
I've tried a number of potential solutions:
I've tried the solution provided here, but have had no luck.. I added the exec xtail /var/log/shiny-server/ to my shiny-server.sh as the last line in the file. App logs are not written to stdout
I noticed that writing application logs to stdout is now the default behavior in rocker/shiny, but as I'm using rocker/verse:3.6.2 (upgraded from 3.6.0 today) along with RUN export ADD=shiny, I don't think this is standard behavior for the rocker/verse:3.6.2 container with Shiny add-on. As a result, I don't get the default behavior out of the box.
This issue on github suggests an alternative method of forcing application logging to stdout by way of an environment variable SHINY_LOG_STDERR=1 set at runtime but I'm not Linux-savvy enough to know where that env variable needs to be set to be effective. I found this documentation from Shiny Server v1.5.13 which gave suggestions in which file to set the environment variable depending on Linux distro; however, the output from my container when I run cat /etc/os-release is:
which doesn't really line up with any of the distributions in the Shiny Server documentation, thus making the documentation unhelpful.
I tried adding adding the environment variable from the github issue above in the docker run command, i.e.,
docker run --rm -e SHINY_LOG_STDERR=1 -p 3838:3838 [my image]
as well as
docker run --rm -e APPLICATION_LOGS_TO_STDOUT=true -p 3838:3838 [my image]
and am still not getting the logs to stdout.
I must be missing something here. Can someone help me identify how to successfully get application logs to stdout successfully?
You can add the line ENV SHINY_LOG_STDERR=1 to your Dockerfile (at least, this works with rocker/shiny, not sure about rocker/verse), such as with your Dockerfile:
FROM rocker/verse:3.6.2
## Add shiny capabilities to container
RUN export ADD=shiny && bash /etc/cont-init.d/add
## Install curl and xtail
RUN apt-get update && apt-get install -y \
curl \
xtail
## Add pip3 and other Python packages
RUN sudo apt-get update -y && apt-get install -y python3-pip
RUN pip3 install boto3
## Add R packages
RUN R -e "install.packages(c('shiny', 'tidyverse', 'tidyselect', 'knitr', 'rmarkdown', 'jsonlite', 'odbc', 'dbplyr', 'RMySQL', 'DBI', 'pander', 'sciplot', 'lubridate', 'zoo', 'stringr', 'stringi', 'openxlsx', 'promises', 'future', 'scales', 'ggplot2', 'zip', 'Cairo', 'tinytex', 'reticulate'), repos = 'https://cran.rstudio.com/')"
## Update and install
RUN tlmgr update --self --all
RUN tlmgr install ms
RUN tlmgr install beamer
RUN tlmgr install pgf
#Copy app dir and theme dirs to their respective locations
COPY iarr /srv/shiny-server/iarr
COPY iarr/reports/interim_annual_report/theme/SwCustom /opt/TinyTeX/texmf-dist/tex/latex/beamer/
#Force texlive to find my custom beamer theme
RUN texhash
EXPOSE 3838
## Add shiny-server information
COPY shiny-server.sh /usr/bin/shiny-server.sh
COPY shiny-customized.config /etc/shiny-server/shiny-server.conf
## Add dos2unix to eliminate Win-style line-endings and run
RUN apt-get update -y && apt-get install -y dos2unix
RUN dos2unix /usr/bin/shiny-server.sh && apt-get --purge remove -y dos2unix && rm -rf /var/lib/apt/lists/*
# Enable Logging from stdout
ENV SHINY_LOG_STDERR=1
RUN ["chmod", "+x", "/usr/bin/shiny-server.sh"]
CMD ["/usr/bin/shiny-server.sh"]

Install systemd service on Debian installation

I'm building custom Debian ISO with simple-cdd utility. It worked well till the moment when I attached my own .deb package.
build-simple-cdd --dist stretch --profiles moj --force-root --local-packages /root/iso/deb
build-simple-cdd works properly, because I saw my deb package in tmp directory structure and iso image is created successfully. However debian installation fails
I suspect, that postinst script fails, since it uses systemctl command when it may be unavailable.
#!/bin/sh
set -e
echo $1
if [ "$1" = "configure" ]; then
echo "Configuring privileges..."
chown user:user /usr/bin/Koncentrator
chmod 0755 /usr/bin/Koncentrator
echo "Enabling Koncentrator services..."
systemctl daemon-reload
systemctl enable Xvfb.service
systemctl enable Koncentrator.service
fi
I've added systemd dependency to control file, but it doesn't work.
I made workaround for this issue. simple-cdd allows to prepare post installation script. apt install is called there without problems. Two steps are required to use this solution:
Add deb package to installation disk. This is configured via profile configuration file (moj.conf):
all_extras="$all_extras /root/iso/files/customapackage_0.1.3.deb"
Run apt install in moj.postinst script:
#!/bin/sh
mount /dev/cdrom /media/cdrom
cd /media/cdrom/simple-cdd
apt install ./custompackage_0.1.3.deb
cd /
sync
umount /media/cdrom
If you want to debug your postinst script, you can insert there long sleep:
#!/bin/sh
sleep 10000000
...
And switch terminal (Ctrl+Alt+F1-6) during finish-install phase. Than call chroot /target to switch in-target environemnent

How to mount Cloud Filestore in GCP AI platform Jupyter notebook?

I want to mount a Cloud Filestore instance in a GCP AI Platform Jupyter notebook instance so that I don't have to upload all of my data into the notebook.
I followed the instructions at https://cloud.google.com/filestore/docs/mounting-fileshares, but get these error messages:
root#0084329abd1b:/home# mount <IP_ADDRESS>:/streams cfs
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
root#0084329abd1b:/home# mount -o nolock <IP_ADDRESS>:/streams cfs
mount.nfs: Operation not permitted
From your terminal, you can do something like this.
mkdir des_bucket
gcsfuse --debug_gcs --implicit-dirs src_bucket des_bucket
Create a Filestore instance link
Crerate a Google VM instance link
Create a Notebook AI instance link
On the VM instance run the commands:
sudo apt-get -y update
sudo apt-get -y install nfs-common
sudo mkdir test
# fileshare remote target
sudo mount 111.11.111.11:/fileshare test
sudo chmod go+rw test
echo 'This is a test' > test/testfile
ls test
#testfile
On the Notebook AI instance run the commands link:
sudo apt-get -y update
sudo apt-get -y install nfs-common
sudo mkdir test
# fileshare remote target
sudo mount 111.11.111.11:/fileshare /test
ls test
#testfile
You can also check link

Docker Theme Jupyter

I recently installed the data science notebook for Jupyter but i cant seem to install any themes on it.
Using the local version i have installed a dark theme and i am used to it.
Following this guide and the install section I tried making /custom/ folder and added a cascading style sheet into the mounted volume. But it doesn't seem to work.
Is there anyway i can install a custom theme on the docker image?
My workaround is: add the custom.css file locally under '~/.jupyter/custom/'. When the docker container runs, it will automatically adopt the theme.
You can actually install jupyter-themes and opt for a theme on Dockerfile itself. Here is a example for reference
Install jupyter-themes using pip
RUN pip3 install jupyterthemes
Opt for a theme
CMD ["bash", "-c", "jt -t solarizedd -T -N && jupyter notebook --port=8888 --no-browser --ip=0.0.0.0 --allow-root --notebook-dir=/home/user/workdir"]
Here is a complete docker file sample for reference
FROM ubuntu:20.04
# Install Python and other dependencies
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
wget
# Install Jupyter
RUN pip3 install jupyter
RUN pip3 install jupyterthemes
# Create a user with a home directory
RUN useradd --create-home --home-dir /home/user user
USER user
# Mount a volume for the working directory
VOLUME /home/user/workdir
# Set the default command to launch Jupyter
CMD ["bash", "-c", "jt -t solarizedd -T -N && jupyter notebook --port=8888 --no-browser --ip=0.0.0.0 --allow-root --notebook-dir=/home/user/workdir"]
# CMD ["jupyter", "notebook", "--no-browser", "--ip=0.0.0.0", "--allow-root", "--notebook-dir=/home/user/workdir"]
You can also include the said command on a docker-compose file, if you are using a docker image
version: "3"
services:
notebook:
image: jupyter/datascience-notebook
ports:
- "8888:8888"
environment:
JUPYTER_ENABLE_LAB: "yes"
volumes:
- .:/home/user/workdir
command: bash -c "pip install jupyterthemes && jt -t solarizedd -T -N"
Here i have chosen the solarizedd theme, just tweak and include the theme that best suits for you.
Hope this helps

How to Choose R Server's R as Default in Operationalization, Remote R Workspace and RStudio Server?

So I've set up an Azure Data Science Virtual Machine on Linux (Ubuntu) and I've executed the following on the terminal to enable Remote R workspace, RStudio Server, R Server Operationalization and hadoop:
sudo apt update
sudo apt -y upgrade
# Hadoop is installed but doesn't seem to appear on the PATH or have its environment variable set by default
sudo echo "" >> ~/.bashrc
sudo echo "export PATH="'$'"PATH:/opt/hadoop/hadoop-2.7.4/bin" >> ~/.bashrc
sudo echo "export HADOOP_HOME=/opt/hadoop/hadoop-2.7.4" >> ~/.bashrc
#
source ~/.bashrc
#Setting up a password as none exists to begin with because of private key selection in the installation
#RStudio Server requires a password though
"MyPassword\nMyPassword\n" | sudo passwd sshuser
#Unfortunately hadoop fails on Data Science Virtual Machine
#error: mkdir: Call From IM-DSonUbuntu/192.168.5.4 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
# hadoop fs -mkdir /user/RevoShare/rserve2
# hadoop fs -chmod uog+rwx /user/RevoShare/rserve2
sudo mkdir -p /var/RevoShare/rserve2
sudo chmod uog+rwx /var/RevoShare/rserve2
# hadoop fs -mkdir /user/RevoShare/sshuser
# hadoop fs -chmod uog+rwx /user/RevoShare/sshuser
sudo mkdir -p /var/RevoShare/sshuser
sudo chmod uog+rwx /var/RevoShare/sshuser
#Setting up R Server Operationalisation
cd /opt/microsoft/mlserver/9.2.1/o16n
sudo dotnet Microsoft.MLServer.Utils.AdminUtil/Microsoft.MLServer.Utils.AdminUtil.dll -silentoneboxinstall MyPassword
#They say this Data Science Virtual Machine already has RStudio Server, but even though the port 8787 is open, it's nowhere to be found! So installing it now, and after the installation it's accessible by refreshing the page that failed before.
#Perhaps it's not installed then? Or a service is not running like it shoudl?
#https://www.rstudio.com/products/rstudio/download-server/
wget https://download2.rstudio.org/rstudio-server-1.1.414-amd64.deb
yes | sudo gdebi rstudio-server-1.1.414-amd64.deb
#They are small, leave them for debug reasons - lets have evidence the script run thus far.
#sudo rm rstudio-server-1.1.414-amd64.deb
# Remote R workspace Service needs dotnet sdk
curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
sudo mv microsoft.gpg /etc/apt/trusted.gpg.d/microsoft.gpg
sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-xenial-prod xenial main" > /etc/apt/sources.list.d/dotnetdev.list'
sudo apt update
sudo apt -y install dotnet-sdk-2.0.0
sudo apt install libxml2-dev
#Downloading and installing the Remote R service
wget -O rtvs-daemon.tar.gz https://aka.ms/r-remote-services-linux-binary-current
tar -xvzf rtvs-daemon.tar.gz
sudo ./rtvs-install -s
sudo systemctl enable rtvsd
sudo systemctl start rtvsd
#sudo rm rtvs-daemon.tar.gz
#sudo rm rtvs-install
#Fixing Remote R: For some reason, even though 'sudo systemctl enable rtvsd' runs, after every reboot the service won't become automatically active. So let's fix that.
wget https://sa0im0general.blob.core.windows.net/general-blob-container/StartRemoteRAfterReboot.sh
sudo mv StartRemoteRAfterReboot.sh /var/RevoShare/StartRemoteRAfterReboot.sh
sudo /sbin/shutdown -r 5
sudo chown root /etc/rc.local
sudo chmod 755 /etc/rc.local
sudo systemctl enable rc-local.service
sudo -s
sudo find /etc/ -name "rc.local" -exec sed -i 's/exit 0//g' {} \;
sudo echo "" >> /etc/rc.local
sudo echo "sh /var/RevoShare/StartRemoteRAfterReboot.sh" >> /etc/rc.local
sudo echo "exit 0" >> /etc/rc.local
exit
I've also tried, one by one, these, to see if it makes any difference to the RStudio Server (it didn't, but even if it did, I want a global solution to work on Remote R Workspace Service and R Server Operationalisation as well, not only RStudio Server):
#Configuring RStudio Server to see the R Server R
sudo echo "rsession-which-r=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> /etc/rstudio/rserver.conf
export RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R
sudo echo "RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> ~/.profile
source ~/.profile
sudo echo "RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R" >> ~/.bashrc
source ~/.bashrc
sudo echo "PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R" >> ~/.bashrc
export PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R
source ~/.bashrc
The problem is that even though "which R" points to R Server's R, i.e. typing "sudo R" will show the message "Loading Microsoft R Server packages, version 9.2.1." and will load packages like RevoScaleR, everything else fails to do so.
Accessing the RStudio Server with http://THE-IP-GOES-HERE.westeurope.cloudapp.azure.com:8787 and logging in with the initial user ("sshuser") (or with any other user for that matter) will NOT load R Server and RevoScaleR rx functions are unavailable
Using my local Visual Studio 2017 to access the remote workspace via "Add connection" on "Workspaces" tab loads MRO and says:
Installed R versions:
[0] Microsoft R Open '3.4.1.1347' (Default)
And finally, when I use R Server's Operationalisation and log in with "mrsdeploy" package's "remoteLogin()" R Server packages like RevoScaleR are not loaded again, so things like "rxSummary(~., data=iris)" fail with error 'could not find function "rxSummary"'
The exact same thing happened when I deployed from azure a "Machine Learning Server 9.2.1 on Linux (Ubuntu)".
I don't want to just use the regular open source R, I want to be able to use the R Server - that's why I deployed this VM. How can I make it so that everything loads R Server's R, not Microsoft R Open? (Like I'm able to do from terminal using "R")
As a result of my having tried all of this and the fact that R Server is loaded in the console, my mind now goes to permissions. Could it be that by default the Data Science VM doesn't have the correct permissions to allow these?
I'm at a loss
RStudio Server is installed on the Ubuntu DSVM, but the service is disabled by default as it does not support SSL. You can enable it with systemctl enable rstudio-server, then start it with systemctl start rstudio-server.
RStudio Server uses the same R as Microsoft R Server, but the .libPaths are different, which is why you cannot load the MRS packages. You will need to manually set the .libPaths so they match.

Resources