Install Spark on Windows for sparklyr

I have tried several tutorials on setting up Spark and Hadoop in a Windows environment, especially alongside R. This one resulted in this error by the time I hit figure 9:
This tutorial from RStudio is giving me issues as well. When I get to the
sc <- spark_connect(master = "local")
step, I get this familiar error:
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (1652): Gateway in port (8880) did not respond.
Path: C:\Users\jvangeete\spark-2.0.2-bin-hadoop2.7\bin\spark-submit2.cmd
Parameters: --class, sparklyr.Backend, "C:\Users\jvangeete\Documents\R\win-library\3.3\sparklyr\java\sparklyr-2.0-2.11.jar", 8880, 1652
---- Output Log ----
The system cannot find the path specified.
---- Error Log ----
This port issue is similar to the one I get when trying to assign the "yarn-client" parameter inside spark_connect(...) as well, when trying it from Ms. Zaidi's tutorial, here. (That tutorial has its own issues, which I've put up on a board, here, if anyone's interested.)
The TutorialsPoint walkthrough gets me through fine if I first install an Ubuntu VM, but I'm using Microsoft R(RO) so I'd like to figure this out in Windows, not least of all because it appears that Mr. Emaasit is in the first tutorial able to run a command I cannot with .\bin\sparkR.
Most generally I am trying to understand how to install and run Spark together with R using preferably sparklyr, in Windows.
UPDATE 1: This is what's in the directories:
UPDATE 2: This is my R-session and system info
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 3.1
year 2016
month 06
day 21
svn rev 70800
language R
version.string R version 3.3.1 (2016-06-21)
nickname Bug in Your Hair

Download the spark_hadoop tar file from
http://spark.apache.org/downloads.html
Install the sparklyr package from CRAN, then install Spark from the downloaded tar file:
spark_install_tar(tarfile = "path/to/spark_hadoop.tar")
If you still get an error, untar the file manually and set the SPARK_HOME environment variable to point to the extracted spark_hadoop directory.
Then try executing the following in the R console:
library(sparklyr)
sc <- spark_connect(master = "local")

Related

GnuTLS Error -50: Cannot install packages from Github in Rstudio?

The following lines prompt an error (which differs from this question):
library(devtools)
install_github("StatsWithR/statsr") # the same for any other open repos
The error is this:
Error: Failed to install 'unknown package' from GitHub:
Error -50 setting GnuTLS cipher list starting with +VERS-TLS1.3
Then I wanted to work around the problem caused by devtools, so I tried githubinstall, but the error became this:
Error in curl::curl_download(input, tmpFile, mode = "wb", quiet = !showProgress) :
Error -50 setting GnuTLS cipher list starting with +VERS-TLS1.3
It seems obvious that the issue is caused by GnuTLS, and I found a similar thread (git/jenkins TLS issue), but I could not find any useful hints there.
I also tried install_url, install_local, and install_git, but all failed.
Here is the version information for R, which I recently upgraded from 3.4.4 (2018):
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 1.2
year 2021
month 11
day 01
svn rev 81115
language R
version.string R version 4.1.2 (2021-11-01)
nickname Bird Hippie
I have been stuck with devtools and install_github. What can I try next?
I installed r-base and RStudio using apt install in an existing container, and the default settings (the default R version is 3.4.4) caused the issues I ran into. I then purged the R environment and created a new one in a new Docker container following this blog: Running RStudio Server with Docker.

Is RCurl currently broken on Windows? (error:1407742E:SSL)

Good day one and all.
I have been using RCurl to load https://raw. tables from our GitHub repository for data cleanup and analysis. Recently (maybe two weeks ago) every script using:
read.csv(text = getURL())
stopped working, throwing an error like this:
Error in function (type, msg, asError = TRUE) : error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Unfortunately my understanding of SSL and related issues is very limited (I still git using login credentials in bash).
The thing is, on my Debian machine at work, the code executes flawlessly. And after removing getURL() and sticking to a simple read.csv(), even the Windows code works.
That is fine, but I have functions that depend on url.exists(), which is also broken, and I have been unable to find a replacement for it.
version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 0.3
year 2020
month 10
day 10
svn rev 79318
language R
version.string R version 4.0.3 (2020-10-10)
nickname Bunny-Wunnies Freak Out
Thank you in advance for anything that might get me closer to a solution.
The problem is that the RCurl package for Windows is linked against a very old version of curl (7.40). For reasons I do not know, even the most recent RCurl packages still link to that old version.
In order to use a recent version of curl, you have to install RCurl from source using Rtools (https://cran.r-project.org/bin/windows/Rtools/).
To compile RCurl you need to install curl with the relevant header files by running the following commands in "Rtools bash":
pacman -Sy
pacman -S mingw-w64-{i686,x86_64}-curl
After that, you can install RCurl from the source:
install.packages("RCurl", type = "source")

Multiple versions of R installed - terminal launching wrong / different R from RStudio

I am attempting to compile my R package, and realized that I have multiple versions of R installed on my mac, which is giving me difficulty. When I run 'which R' from terminal, I receive this:
Home$ which R
/Users/Home/anaconda2/bin/R
Home$ R
R version 3.2.2 (2015-08-14) -- "Fire Safety"
However, when I launch RStudio from my applications folder, and type 'version' in the console, I get this:
> version
_
platform x86_64-apple-darwin13.4.0
arch x86_64
os darwin13.4.0
system x86_64, darwin13.4.0
status
major 3
minor 3.2
year 2016
month 10
day 31
svn rev 71607
language R
version.string R version 3.3.2 (2016-10-31)
nickname Sincere Pumpkin Patch
So I have 3.3.2 on RStudio (the version I want for compiling my package), and 3.2.2 from anaconda being launched in terminal when I type R in terminal.
How can I fix this? Do I have to change my path to find the correct version of R when I launch from terminal? How do I find the correct path?
Thanks!
I bet anaconda has just inserted its path at the front of your PATH variable and is overriding your newer 3.3.2 version at the terminal. If you want your 'RStudio' version to be the default version that pops up at your terminal when you type 'R', then you gotta modify your PATH. No biggie.
First, figure out which R version RStudio points to. Type the following into your RStudio console:
Sys.which("R")
I bet you'll see something like /usr/local/bin/R. So that's what you have to add to the front of your PATH (minus the '/R').
To confirm that anaconda has messed you up, open up your terminal and check out your PATH:
echo $PATH
You'll probably see /Users/YOURNAME/anaconda2/bin as the first entry in your PATH, and further down you'll see /usr/local/bin. We have to flip this order. There are a million ways to fix this. Here's the quick and dirty solution -- add the following to the bottom of your .bash_profile
export PATH="/usr/local/bin:$PATH"
And type R --version in your terminal to confirm that your default R has changed.
You might get fancy later with sed or awk if having two /usr/local/bin entries in your PATH annoys you (as it would me).
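For what it's worth, the duplicate-entry cleanup can also be done without sed or awk. The following is only a sketch under assumptions: path_prepend_unique is a made-up helper name (not part of any tool mentioned here), and the sample PATH string is invented for illustration. It prints a PATH-style string with the given directory moved to the front and any later copy of it dropped.

```shell
# Hypothetical helper: print the PATH-style string $2 with directory $1
# placed first and any other copy of $1 (and empty entries) removed.
# The body runs in a subshell, so the IFS and globbing changes do not leak.
path_prepend_unique() (
    dir=$1
    result=$1
    IFS=':'
    set -f   # disable globbing so entries are not expanded as patterns
    for entry in $2; do
        if [ -n "$entry" ] && [ "$entry" != "$dir" ]; then
            result="$result:$entry"
        fi
    done
    printf '%s\n' "$result"
)

# Example with an invented PATH value:
path_prepend_unique /usr/local/bin "/Users/YOURNAME/anaconda2/bin:/usr/bin:/usr/local/bin"
# prints /usr/local/bin:/Users/YOURNAME/anaconda2/bin:/usr/bin
```

In .bash_profile this could replace the plain export line with export PATH="$(path_prepend_unique /usr/local/bin "$PATH")", assuming the function is defined earlier in the same file.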
First go to the directory /Library/Frameworks/R.framework/Versions
Here you should see various versions of R that you have installed.
To change to say version 3.4 use the following sequence of commands in the Terminal:
cd /Library/Frameworks/R.framework/Versions
unlink Current
ln -s 3.4 Current
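The same three commands can be wrapped with a couple of safety checks. This is a rough sketch, not a tested tool: switch_r_version is a hypothetical name, and on a real system the R.framework directory may require sudo to modify.

```shell
# Hypothetical helper: repoint the Current symlink inside a versions directory.
# Usage: switch_r_version <versions-dir> <version>
# The body runs in a subshell, so the cd does not affect the calling shell.
switch_r_version() (
    versions_dir=$1
    version=$2
    # Refuse to touch the link if the requested version is not installed.
    if [ ! -d "$versions_dir/$version" ]; then
        echo "No such version directory: $versions_dir/$version" >&2
        exit 1
    fi
    cd "$versions_dir" || exit 1
    # Only ever replace a symlink, never a real file or directory.
    if [ -e Current ] && [ ! -L Current ]; then
        echo "Current exists and is not a symlink; leaving it alone" >&2
        exit 1
    fi
    rm -f Current
    ln -s "$version" Current
)

# Example call (commented out because it changes the system R):
# switch_r_version /Library/Frameworks/R.framework/Versions 3.4
```

The check for a missing version directory matters because otherwise the old link would be removed and replaced with a dangling one.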
I highly recommend RSwitch. It's a tiny program that allows you to select from any installed R version, press one button, restart your R session and you'll be using the selected R version.

Cannot run R from terminal after upgrading to macOS Sierra

I recently updated my macbook to macOS Sierra (Version 10.12.3 (16D32)), and I am no longer able to run R directly from Terminal:
DN51ssqi:~ kjytay$ R
-bash: R: command not found
DN51ssqi:~ kjytay$ R --version
-bash: R: command not found
Opening R from the Applications folder or from RStudio works fine. Anyone experience this issue/has been able to fix it?
Here is my R version information:
platform x86_64-apple-darwin13.4.0
arch x86_64
os darwin13.4.0
system x86_64, darwin13.4.0
status
major 3
minor 3.2
year 2016
month 10
day 31
svn rev 71607
language R
version.string R version 3.3.2 (2016-10-31)
nickname Sincere Pumpkin Patch
This is just a guess, but I'm thinking this is probably an issue with your PATH settings, which might have been overwritten when you upgraded*. Seems worth a try at least. This is from the RStudio support pages**:
R from source (including MacPorts and Homebrew)
When R is installed from CRAN on OS X the R executable is installed at
/usr/bin/R. However, if R is installed directly from source or via a
package manager like MacPorts or Homebrew, then the R executable is
installed to either /usr/local/bin/R (Homebrew) or /opt/local/bin/R
(MacPorts). In order to support these variations, RStudio scans for
the R executable in the following sequence:
/usr/bin/R
/usr/local/bin/R
/opt/local/bin/R
If RStudio is not able to locate R by scanning these locations, it
will fall back to using whatever version of R is located at
/Library/Frameworks/R.framework/.
If RStudio is finding R OK, then you must have it at one of these locations. Make sure these locations are in your $PATH list:
In the Terminal:
echo $PATH
This displays your current PATH. If any of the locations in the RStudio quote are missing, you can check whether R is located there by running it with its full path. For example:
/usr/local/bin/R
If that works to start R, just add that location to your PATH:
export PATH=$PATH:/usr/local/bin
So that OSX knows where to find it!
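The manual check described above can be sketched as a small script. The helper names first_executable and dir_on_path are made up for this illustration; the three candidate paths are simply the ones listed in the RStudio quote, and R may well live somewhere else on a given machine.

```shell
# Hypothetical helper: print the first argument that is an executable file.
first_executable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated PATH-style string $2.
dir_on_path() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Scan the three locations from the RStudio quote and report what to do.
r_bin=$(first_executable /usr/bin/R /usr/local/bin/R /opt/local/bin/R) || r_bin=""
if [ -z "$r_bin" ]; then
    echo "R not found in the three scanned locations"
elif dir_on_path "$(dirname "$r_bin")" "$PATH"; then
    echo "$r_bin found, and its directory is already on PATH"
else
    echo "$r_bin found; add $(dirname "$r_bin") to PATH"
fi
```

Wrapping the PATH string in colons before matching avoids false positives such as /usr/local matching /usr/local/bin.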
* It's been noted elsewhere that homebrew breaks, for example, on upgrade to Sierra. Here's a blog post outlining some steps an R user might like to take after the upgrade: http://www.statsblogs.com/2017/01/26/upgrading-to-macos-sierra-nee-osx-for-r-users/
** Here's the support page where the quote is from https://support.rstudio.com/hc/en-us/articles/200486138-Using-Different-Versions-of-R

RStudio can not use git in Yosemite

After upgrading my Mac to Yosemite, I am not able to use git in RStudio anymore.
(I can still use source tree or git independently from RStudio)
Not sure whether it is related to the PATH issue posted here:
Running system command from R console cannot locate installed programs since upgrading to Mac OSX 10.10
I tried the above solution, but it did not work.
In RStudio, I specified the path to Git in Tools/Global Options.../"Git/SVN" correctly (as I used before)
But in Tools/Project Options.../"Git/SVN", under "Version control system", the only option left is (None).
RStudio: 0.98.1074 (updated to 0.98.1083, still does not work)
version
_
platform x86_64-apple-darwin10.8.0
arch x86_64
os darwin10.8.0
system x86_64, darwin10.8.0
status
major 3
minor 1.0
year 2014
month 04
day 10
svn rev 65387
language R
version.string R version 3.1.0 (2014-04-10)
nickname Spring Dance
I encountered the same problem. In your terminal, change to the directory you used with git (cd path/to/directory) and type git status. In my case I received the message:
Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.
Apparently, I had to agree to the Xcode licence. When you re-run the line with sudo (sudo git status) and enter your password in the terminal you get the whole license displayed. Now you only need to type agree and everything is done. Re-start RStudio and the git functionality is back again.
Best wishes,
Marco
