Bootstrap action for EMR - emr

While bootstrapping on AWS EMR, I am getting the following error. Any clues how to resolve it?
/mnt/var/lib/bootstrap-actions/1/STAR: /lib/libc.so.6: version 'GLIBC_2.14' not found (required by /mnt/var/lib/bootstrap-actions/1/STAR)

It's probably caused by the instance's libc6 (glibc) being older than 2.14, the version the STAR binary requires.
You can SSH into the EC2 instance that the EMR job created by following this: Open an SSH Tunnel to the Master Node
Then update the packages. For example, if your instance runs Ubuntu or Debian, run sudo apt-get update followed by sudo apt-get upgrade (update alone only refreshes the package lists). The command depends on which Linux distribution your EC2 instance is running. The default EMR job uses Debian, whereas Amazon Linux is based on Red Hat and uses yum instead.
See if this works.
If this is actually the problem, you can add the package update commands (with the -y flag to skip the Y/N prompt) at the start of your bootstrap script.
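Before changing anything, it may help to confirm that the installed glibc really is older than what the binary asks for. The following is only a rough sketch: it assumes a glibc-based system where getconf GNU_LIBC_VERSION works and GNU sort with -V is available, and version_ge is a helper name made up for this example.

```shell
#!/bin/bash
# Return success (0) if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.14"   # taken from the GLIBC_2.14 error message above

# getconf prints something like "glibc 2.12"; keep only the version number.
installed="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}' || true)"

if version_ge "${installed:-0}" "$required"; then
    echo "glibc ${installed} is new enough"
else
    echo "glibc ${installed:-unknown} is older than ${required}"
fi
```

If this reports an older version, the package update described above is worth trying; if it reports 2.14 or newer, the cause is probably something else.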

Related

Virtuoso installation stuck on "VAD Sticker vad_dav.xml creation" on 2nd attempt

I've been attempting to install virtuoso-opensource as per the README here: https://github.com/openlink/virtuoso-opensource
The first time I ran make, it breezed past "VAD Sticker vad_dav.xml creation" to "Starting Virtuoso Server" and then complained netstat: command not found.
I installed netstat via sudo apt-get install net-tools and ran make again, and now it always gets stuck on "VAD Sticker vad_dav.xml creation", even when I start from scratch by deleting the repo folder and re-cloning it. make install gets stuck at the same point.
Does anyone have any ideas as to how to get past this?
I'm running Ubuntu Server 20.04.1 LTS on an AWS EC2 instance.
I'll note the requirement for netstat to build VADs.
By far the most common cause of a hang at that stage is a stale virtuoso process still lurking from an earlier run.
pkill virtuoso and run make again.
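If you want to see whether anything was actually left running before killing it, something along these lines should do. It is a minimal sketch that assumes pgrep and pkill (from procps) are installed; kill_stale is a made-up helper name, not part of Virtuoso.

```shell
#!/bin/bash
# Kill any process whose name matches the pattern and report what happened.
kill_stale() {
    local pattern="$1"
    if pgrep "$pattern" >/dev/null; then
        pkill "$pattern"
        echo "killed stale ${pattern} process(es)"
    else
        echo "no ${pattern} process running"
    fi
}

kill_stale virtuoso
```

Run it from any directory, then run make again in the source tree.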
HTH

Docker : Ubuntu/Shiny R : error when I try to run my own custom environment

I’m quite new to Docker, and I’d like to create a Docker environment with exactly the same configuration as my production server. The container will be used as a local development environment for one specific R Shiny Server application.
Here are my settings:
I’m working locally on Windows 7
Server is Ubuntu 18.04.1 LTS
Server R version : 3.5.1
I managed to use rocker/rstudio, but it doesn’t let me control the R version; furthermore, it’s based on the Debian distribution.
So, quite innocently, I tried to build my own Dockerfile based on existing Dockerfiles, to perform the installation from Ubuntu -> R -> RStudio + Shiny Server.
My image builds successfully, but I get an error when I try to run it with the following command line:
docker run -p 8787:8787 -e PASSWORD=Mypswd -v /c/Users/njeanray/Documents/Myproject:/home/rstudio/myproject rstudio:R3.5.1
Please, find my Dockerfile at this place :
https://wetransfer.com/downloads/972d94d2ec730ecb8afbc2b315c8fbb020200429094458/3c31aa
It’s quite weird, because I took the code from the rocker/rstudio Dockerfile, and running rocker/rstudio works…
How can I run my environment on Ubuntu 18.04 with R 3.5.1 and RStudio?
Can you tell me what I'm doing wrong?
Many thanks in advance,
Best regards
I created a Docker image from the Dockerfile you shared. It is hosted at https://hub.docker.com/r/aktechthoughts/r-studio-docker.
It is working fine.

s3cmd: how to use server side encryption?

I'm trying to encrypt some files on Amazon S3 using server side encryption. According to this link
http://s3tools.org/kb/item9.htm
I should only add this flag
--server-side-encryption
on the put or sync command I'm trying to run, but when I do that I get a "s3cmd: error: no such option: --server-side-encryption" message.
How do I run this command to use server side encryption?
s3cmd put file.zip s3://test/file.zip
I'm using ubuntu 14.04 server 64 bits.
You need a more recent version of s3cmd than the one in the Ubuntu repositories. Use the master branch of github.com/s3tools/s3cmd (preferred), or the copy in the Debian experimental repository.
If you've upgraded, make sure you don't have any remnants of the old version. I had this issue because I had installed the first copy via the system package manager, but installed the upgrade via Python (pip). This left me with the impression that I had upgraded, when in fact the old version was still installed.
I discovered this because
dpkg -l 's3*'
still lists v1.1 while
pip list | grep s3
shows 1.6.1
I fixed the issue by uninstalling the old package using the system package manager.
sudo dpkg -r s3cmd
Then when the cron job ran, it ran the Python package version 1.6.1 and no errors occurred.
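A quick way to spot this kind of duplicate install is to list every copy of the command that sits on your PATH, in lookup order. This is just a sketch: find_all_on_path is a helper name invented for this example, and it only looks at PATH, so it will not notice copies installed elsewhere.

```shell
#!/bin/bash
# Print every executable file named $1 found on PATH, one per line,
# in the order the shell would search for it.
find_all_on_path() {
    local name="$1" dir
    local IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        if [ -f "${dir:-.}/$name" ] && [ -x "${dir:-.}/$name" ]; then
            echo "${dir:-.}/$name"
        fi
    done
    return 0
}

find_all_on_path s3cmd
```

If more than one line is printed, the first one is the copy that actually runs. Keep in mind that cron jobs often run with a shorter PATH than your login shell, so the copy cron picks up may differ.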

What is the relation between RStudio and RServe?

I'm new to R and I decided to put R on a machine I have and see if I can remotely run code that is on my desktop computer.
While searching for "how to do" that, I came across the names "Rserve" and "RStudio". As far as I could tell, RServe is a package (actually, it seems to be the package) which I can use to configure the server, while RStudio is an IDE.
My question is: does RStudio use RServe "under the hood"? And, if it doesn't, then how does RStudio compare to RServe? (I.e., which one is better and why?)
[I figured out that this question could possibly be a duplicate, but I couldn't find any similar question]
Rserve is a client/server implementation written in pure C that starts a server and spawns multiple processes, each with its own R workspace. These are processes rather than threads because of R's limitations with multithreading. It uses the QAP protocol as its primary transport between the client and the server. You execute commands from a client (PHP, Java, C++) against the server, and it returns REXP objects that are essentially mappings to R's underlying SEXP data objects. Rserve also offers a WebSockets mode that can transmit data over WebSockets, but the API is not well documented. It also supports basic authentication through a configuration file.
RStudio is a C++ and GWT application that provides a web-based front end to R. AFAIK it uses JSON as its primary transport and supports authentication through PAM. Each user has a workspace configured in their home directory. It runs a server very similar to, but not the same as, Rserve to communicate with R using Rcpp. It also has its own plotting driver that wraps the plot device so that it can pick up the plots to be served to the UI. It has much more functionality, such as stepping through your code from the UI and viewing workspace variables.
Functionally they are similar in that they provide a client/server connection to R, but IMHO the comparison stops there.
I believe they are separate projects (though I could be wrong). I've never heard of RServe and there does not appear to be any mention of it in the documentation for RStudio. I have used and would recommend RStudio Server. It is relatively easy to set up and super easy to use once it is set up. This is a helpful guide to setting up a server on Amazon EC2:
#Create a user, home directory and set password
sudo useradd rstudio
sudo mkdir /home/rstudio
sudo passwd rstudio
#Enter Password
sudo chmod -R 0777 /home/rstudio
#Update all files from the default state
sudo apt-get update
sudo apt-get upgrade
#Be Able to get R 3.0
sudo add-apt-repository 'deb http://cran.rstudio.com/bin/linux/ubuntu precise/'
#Update files to use CRAN mirror
#Don't worry about error message
sudo apt-get update
#Install latest version of R
#Install without verification
sudo apt-get install r-base
#Install a few background files
sudo apt-get install gdebi-core
sudo apt-get install libapparmor1
#Change to a writeable directory
#Download & Install RStudio Server
cd /tmp
wget http://download2.rstudio.org/rstudio-server-0.97.551-amd64.deb
sudo gdebi rstudio-server-0.97.551-amd64.deb
#Once you’ve run the above commands, you can access RStudio through your local browser. Navigate to the Public DNS of your instance on port 8787, similar to:
#http://ec2-50-19-18-120.compute-1.amazonaws.com:8787
The earlier answers are about 3 years old and contain outdated information, such as here.
Updated correction
RStudio is a company that provides the open source RStudio IDE for R. It also sells commercial products such as RStudio Server Pro, which is marketed with load balancing and related features. Apparently, the successful open source project has led the way to a commercial market.
You may also mean Microsoft R Server, which is now called Microsoft Machine Learning Server?
There is also RServer by RStudio.
Anyway, how to install both can be found here.

Nginx and passenger 3.0.0 on mac - why does it fail on startup?

I've been trying to set up nginx 0.8.53 and passenger 3.0.0 on my dev
environment - osx snow leopard and REE. I manually compiled nginx
with the passenger module linked in.
When I tried running passenger, it had a problem - ENV['PATH']
appeared to be nil, so the split on it inside
PlatformInfo.find_command raised an exception. That method was called
while trying to determine the OS name - looking for the sw_vers command.
I tweaked the source and told it that it was macosx and then it
complained that it couldn't find the Rails 2.3.8 gem. This is
probably related to the first problem.
I'm not sure how to troubleshoot this. When I su -i and sudo nobody,
both users let me start irb and see the expected value for
ENV['PATH'], so I'm not sure why it's not working when passenger is
running.
One possibility: Passenger launches as the user that owns the config/environment.rb file (or the config.ru file, if you have one) - make sure that file's owner is something sensible.
I don't know how you start Nginx, but you can write a launcher script for Nginx that starts Nginx with a specific environment, like this:
#!/bin/bash
export PATH=whatever
exec /path/to/nginx
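A slightly fuller version of that launcher might look like the sketch below. The directories in EXTRA_DIRS are assumptions for illustration only (put whatever holds your REE ruby and gem binaries there), and prepend_path and launch_nginx are helper names made up for this example. It copes with the case from the question where PATH is empty when the script starts.

```shell
#!/bin/bash
# Prepend a directory list to an existing PATH value, coping with an empty one.
prepend_path() {
    local extra="$1" current="$2"
    if [ -n "$current" ]; then
        echo "${extra}:${current}"
    else
        echo "${extra}"
    fi
}

# Hypothetical directories: replace with wherever your Ruby (REE) and gems live.
EXTRA_DIRS="/opt/ruby-enterprise/bin:/usr/local/bin:/usr/bin:/bin"

# Start nginx with a known PATH; exec replaces this shell with nginx,
# so nginx (and Passenger inside it) inherits the exported environment.
launch_nginx() {
    local nginx_bin="$1"
    PATH="$(prepend_path "$EXTRA_DIRS" "${PATH:-}")"
    export PATH
    exec "$nginx_bin"
}

# Usage (the path is a placeholder, as in the script above):
# launch_nginx /path/to/nginx
```

Whether Passenger then passes that PATH on to the Rails processes it spawns is something to verify on your setup; this only guarantees what nginx itself starts with.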

Resources