Setting up sparklyr in R

I am working on setting up sparklyr with R, but I keep getting an error message. I have essentially typed in:
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "2.1.0")
sc <- spark_connect(master = "local")
However, when I get to creating my Spark connection, I receive the following error message:
Using Spark: 2.1.0
Error in if (a[k] > b[k]) return(1) else if (a[k] < b[k]) return(-1L) :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: running command '"C:\WINDOWS\SYSTEM32\java.exe" -version' had status 2
2: In compareVersion(parsedVersion, "1.7") : NAs introduced by coercion
Any thoughts?
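The two warnings point at Java rather than sparklyr itself: running java -version exited with status 2, so the version string that sparklyr parses comes back as NA and the comparison inside compareVersion fails. A minimal diagnostic sketch, assuming the cause is a missing or misconfigured JDK (the JDK path below is hypothetical; substitute your actual install):
# Check that R can run Java at all; this should print a version string
system2("java", "-version")
# Point JAVA_HOME at a JDK 8 installation (hypothetical path) and retry
Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jdk1.8.0_181")
library(sparklyr)
sc <- spark_connect(master = "local")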

Related

RISmed R package fails to run EUtilsGet function

I'm using the RISmed package to run a search query for the word "hsv".
search_topic_hsv <- "HSV"
search_query_hsv <- EUtilsSummary(search_topic_hsv, retmax = 27000, mindate = 1970, maxdate = 2022)
summary(search_query_hsv)  # reports 26149 records
records_hsv <- EUtilsGet(search_query_hsv)
The EUtilsGet function gives this error:
Error in readLines(collapse(EUtilsFetch, "&retmode=xml"), warn = FALSE, :
cannot read from connection
In addition: Warning message:
In readLines(collapse(EUtilsFetch, "&retmode=xml"), warn = FALSE, :
URL 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35245365,35243667,35243023,35241838,35240194,35239848,35230726,35229412,35228129,35225980,35225036,35223260,35217873,35216177,35215908,35215799,35214758,35214644,35212944,35208817,35208572,35206263,35205731,35204477,35201702,35202955,35202734,35202567,35201013,35200997,35200729,35200645,35199176,35199165,35198914,35198826,35197947,35197285,35196784,35193038,35187895,35187000,35186664,35185822,35182142,35180012,35179120,35178081,35176987,35175220,35172828,35168095,35167501,35167411,35165387,35164537,35163675,35163086,35161988,35157734,35155582,35154980,35154946,35154584,35150885,35146210,35145504,35144835,35144523,35143960,35142052,35141364,35141210,35141072,35130299,35126336,35121324,35120255,35119473,35119328,35115776,35114961,35114414,35112018,35111489,35110532,35107381,35106980,35106191,35105326,35105322,35103749,35102178,35101046,35100650,35100323,35099587,35097177,35087520,35085968,35083659,35082770,350825 [... truncated]
Is this a PubMed connection issue or a package issue?
packageVersion("RISmed")
[1] ‘2.3.0’
Thanks in advance.
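One thing worth ruling out: EUtilsGet builds a single efetch GET request carrying every PMID the summary returned, and with roughly 26,000 IDs that URL can grow past what the connection will accept, which would explain "cannot read from connection". A hedged workaround sketch that splits the 1970-2022 window into smaller date ranges so each request stays short (the ten-year chunking is arbitrary):
library(RISmed)
years <- seq(1970, 2022, by = 10)
records_list <- list()
for (i in seq_along(years)) {
  s <- EUtilsSummary("HSV", retmax = 27000,
                     mindate = years[i], maxdate = min(years[i] + 9, 2022))
  records_list[[i]] <- EUtilsGet(s)
  Sys.sleep(1)  # be polite to the NCBI servers between requests
}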

How to make SparkR run

I have been trying to make SparkR work without success.
I have read previous questions and blogs, yet I haven't been able to make it work.
First I had issues installing SparkR; I finally think I installed it, but then I could not make it run.
Here is my detailed code with the different options I tried to make it run.
I am currently using RStudio with R version 3.6.0.
Any help will be appreciated!!
#***************************#
#Installing Spark Option 1
#***************************#
install.packages("SparkR")
'''
Does not work
'''
Sys.setenv("JAVA_HOME" = "D:/Program Files/Java/jdk1.8.0_181")
Sys.getenv("JAVA_HOME")
#***************************#
#Installing Spark Option 2
#***************************#
#Find Spark Versions
jsonlite::fromJSON("https://api.github.com/repos/apache/spark/tags")$name
if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/spark#v2.4.6', subdir='R/pkg')
Sys.setenv(SPARK_HOME='D:/spark-2.3.1-bin-hadoop2.7')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
'''
Installation didn't work
'''
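One likely reason Option 2 fails at the install step: in devtools references, # selects a pull request number, while a tag or commit is selected with @, so 'apache/spark#v2.4.6' cannot resolve. Assuming the intent was the release tag, and matching it to the 2.3.1 binaries already in SPARK_HOME, the line would presumably be:
devtools::install_github('apache/spark@v2.3.1', subdir = 'R/pkg')  # '@' selects a tag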
#***************************#
#Installation Spark Option 3
#***************************#
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "2.3.1")
install.packages("https://cran.r-project.org/src/contrib/Archive/SparkR/SparkR_2.3.0.tar.gz", repos = NULL, type="source")
library(SparkR)
'''
One of the two installations worked
'''
#***************************#
#Starting Spark Option 1
#***************************#
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R","lib")))
sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g"))
'''
Spark package found in SPARK_HOME: D:/spark-2.3.1-bin-hadoop2.7
Launching java with spark-submit command D:/spark-2.3.1-bin-hadoop2.7/bin/spark-submit2.cmd --driver-memory "2g" sparkr-shell C:\Users\FELIPE~1\AppData\Local\Temp\RtmpKOxYkx\backend_port34a0263f43f5
Error in if (len > 0) { : argument has length zero
'''
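The "argument has length zero" error is a generic symptom that the spark-submit JVM never started, so SparkR has nothing to read back from the backend port file. A quick sanity-check sketch before retrying, assuming the usual cause (Java missing from the session or not version 8, which Spark 2.3.x expects):
Sys.setenv(JAVA_HOME = "D:/Program Files/Java/jdk1.8.0_181")  # path from the question
system2("java", "-version")  # should report a 1.8.x version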
#***************************#
#Starting Spark Option 2
#***************************#
Sys.setenv("JAVA_HOME" = "D:/Program Files/Java/jdk1.8.0_181")
Sys.getenv("JAVA_HOME")
sparkEnvir <- list(spark.num.executors='5', spark.executor.cores='5')
# initializing Spark context
sc <- sparkR.init(sparkHome = "'D:/spark-2.3.1-bin-hadoop2.7'",
sparkEnvir = sparkEnvir)
'''
Error in sparkR.sparkContext(master, appName, sparkHome, convertNamedListToEnv(sparkEnvir), :
JVM is not ready after 10 seconds
In addition: Warning message:
sparkR.init is deprecated.
Use sparkR.session instead.
See help("Deprecated")
'''
#***************************#
#Starting Spark Option 3
#***************************#
Sys.setenv("JAVA_HOME" = "D:/Program Files/Java/jdk1.8.0_181")
Sys.getenv("JAVA_HOME")
sparkEnvir <- list(spark.num.executors='5', spark.executor.cores='5')
# initializing Spark context
sc <- sparkR.session(sparkHome = "'D:/spark-2.3.1-bin-hadoop2.7'",
sparkEnvir = sparkEnvir)
'''
Spark not found in SPARK_HOME: D:/spark-2.3.1-bin-hadoop2.7
Spark package found in SPARK_HOME: D:/spark-2.3.1-bin-hadoop2.7
Launching java with spark-submit command D:/spark-2.3.1-bin-hadoop2.7/bin/spark-submit2.cmd sparkr-shell C:\Users\FELIPE~1\AppData\Local\Temp\RtmpKOxYkx\backend_port34a082b15e1
Error in if (len > 0) { : argument has length zero
'''
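A problem visible in Options 2 and 3: the sparkHome string contains embedded single quotes, so Spark is asked to look in the literal path 'D:/spark-2.3.1-bin-hadoop2.7' (quotes included), which matches the "Spark not found in SPARK_HOME" line above. Also, sparkR.session() takes sparkConfig rather than the deprecated sparkEnvir argument of sparkR.init(). A consolidated sketch under the question's own paths:
Sys.setenv(JAVA_HOME = "D:/Program Files/Java/jdk1.8.0_181")
Sys.setenv(SPARK_HOME = "D:/spark-2.3.1-bin-hadoop2.7")  # no inner quotes
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]",
               sparkHome = Sys.getenv("SPARK_HOME"),
               sparkConfig = list(spark.driver.memory = "2g"))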

R Internationalization and environment

I'm using R 3.4.3 on Ubuntu 16.04, and I don't quite understand internationalization:
> Sys.setenv("LANGUAGE" = "en_US")
> 2+x
Error: object 'x' not found
> Sys.setenv("LANGUAGE" = "fr_FR")
> 2+x
Erreur : objet 'x' introuvable
> Sys.setenv("LANGUAGE" = "en_US")
> 2+x
Erreur : objet 'x' introuvable
More specifically, I don't understand why the last error message is printed in French. Stranger still, other messages are still displayed in English. For instance:
> log(-1)
[1] NaN
Warning message:
In log(-1) : NaNs produced
And when I do the same trick (Sys.setenv("LANGUAGE" = "fr_FR") and then Sys.setenv("LANGUAGE" = "en_US")), the message is displayed in French.
Why can't I get the messages to switch back to English, and is there a workaround?
Try using
options(tz="Europe/Stockholm")
and/or
Sys.setenv(TZ="Europe/Stockholm")
Sys.setlocale("LC_ALL", 'en_US.UTF-8')
If you want your settings to persist, add those to your .Rprofile like so:
.First <- function() {
  options(tz = "Europe/Stockholm")    # your tz
  Sys.setenv(TZ = "Europe/Stockholm") # your tz
}
Clean your .RData and restart.
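On the language question itself: gettext caches a message catalog once it has been used, so flipping LANGUAGE back mid-session often has no effect on strings that were already translated, which would explain the behaviour above. The reliable route, as far as I know, is to set the language before any translated message is emitted, e.g. with LANGUAGE=en in ~/.Renviron, or at the very top of the session:
Sys.setenv(LANGUAGE = "en")  # must run before the first translated message
(Newer R versions, 4.2.0 and later, add Sys.setLanguage(), which tries to handle the cached-catalog case.)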

Not able to convert R data frame to Spark DataFrame

When I try to convert my local dataframe in R to Spark DataFrame using:
raw.data <- as.DataFrame(sc, raw.data)
I get this error:
17/01/24 08:02:04 WARN RBackendHandler: cannot find matching method class org.apache.spark.sql.api.r.SQLUtils.getJavaSparkContext. Candidates are:
17/01/24 08:02:04 WARN RBackendHandler: getJavaSparkContext(class org.apache.spark.sql.SQLContext)
17/01/24 08:02:04 ERROR RBackendHandler: getJavaSparkContext on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
This question is similar to sparkR on AWS: Unable to load native-hadoop library.
You don't need to use sc if you are using the latest version of Spark. I am using the SparkR package, version 2.0.0, in RStudio. Please go through the following code (used to connect an R session to a SparkR session):
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "path-to-spark home/spark-2.0.0-bin-hadoop2.7")
}
library(SparkR)
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R","lib")))
sparkR.session(enableHiveSupport = FALSE, master = "spark://master url:7077", sparkConfig = list(spark.driver.memory = "2g"))
Following is the output of R console:
> data<-as.data.frame(iris)
> class(data)
[1] "data.frame"
> data.df<-as.DataFrame(data)
> class(data.df)
[1] "SparkDataFrame"
attr(,"package")
[1] "SparkR"
Use this example code:
library(SparkR)
library(readr)
# Initialize the Spark context and the SQL context (deprecated API in Spark 2.x;
# see the session-based sketch below)
sc <- sparkR.init(appName = "data")
sqlContext <- sparkRSQL.init(sc)
# Read a local CSV into an R data frame, then convert it to a Spark DataFrame
old_df <- read_csv("/home/mx/data.csv")
old_df <- data.frame(old_df)
new_df <- createDataFrame(sqlContext, old_df)
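For Spark 2.x, where sparkR.init() and sparkRSQL.init() are deprecated (as the first answer notes), the equivalent would presumably be sparkR.session() plus createDataFrame() without an explicit sqlContext:
library(SparkR)
library(readr)
sparkR.session(appName = "data")
old_df <- data.frame(read_csv("/home/mx/data.csv"))
new_df <- createDataFrame(old_df)  # no sqlContext argument needed in 2.x
class(new_df)  # "SparkDataFrame"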

Error while using transformation function in R

I was working with the baby names data set and encountered the error below while using the transform function. Any guidance/suggestions would be highly appreciated. I reinstalled the packages, but to no avail.
Mac OS X (Mountain Lion)
R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
library(stringr)
require(stringr)
bnames1 <- transform(bnames1,
                     first   = tolower(str_sub(name, 1, 1)),
                     last    = tolower(str_sub(name, -1, -1)),  # -1, -1 selects the last letter
                     vowels  = vowels(name),   # vowels() is a helper defined earlier in the exercise
                     length  = nchar(name),
                     per1000 = 10000 * prop,
                     one_par = 1 / prop
)
Error in tolower(str_sub(name, 1, 1)) :
lazy-load database '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/stringr/R/stringr.rdb' is corrupt
In addition: Warning messages:
1: In tolower(str_sub(name, 1, 1)) :
restarting interrupted promise evaluation
2: In tolower(str_sub(name, 1, 1)) : internal error -3 in R_decompress1
Internal error -3 is often a consequence of installing a package on top of one that is already loaded. Restart R, then restart your application. There may be other issues, but until you do this you won't get much further.
Try
remove.packages("stringr")
install.packages("stringr")
