I am trying to get H2O working with sparklyr on my Spark cluster (YARN).
spark_version(sc) = 2.4.4
My Spark cluster is running version 2.4.4.
According to this page, the Sparkling Water version compatible with my Spark is 2.4.5, and the matching H2O release is rel-xu patch version 3. However, when I install this version I am prompted to update my H2O install to the next release (REL-ZORN). Between the H2O guides and the sparklyr guides, the instructions are confusing and at times contradictory.
Since this is a YARN deployment and not a local one, unfortunately I can't provide a reprex to help with troubleshooting.
url <- "http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/5/sparkling-water-2.4.5.zip"
download.file(url = url,"sparkling-water-2.4.5.zip")
unzip("sparkling-water-2.4.5.zip")
# RUN THESE CMDs FROM THE TERMINAL
cd sparkling-water-2.4.5
bin/sparkling-shell --conf "spark.executor.memory=1g"
# RUN THESE FROM WITHIN RSTUDIO
install.packages("sparklyr")
library(sparklyr)
# REMOVE PRIOR INSTALLS OF H2O
if ("package:rsparkling" %in% search()) { detach("package:rsparkling", unload = TRUE) }
if ("package:h2o" %in% search()) { detach("package:h2o", unload = TRUE) }
if (isNamespaceLoaded("h2o")){ unloadNamespace("h2o") }
remove.packages("h2o")
# INSTALLING REL-ZORN (3.36.0.3) WHICH IS REQUIRED FOR SPARKLING WATER 3.36.0.3
install.packages("h2o", type = "source", repos = "https://h2o-release.s3.amazonaws.com/h2o/rel-zorn/3/R")
# INSTALLING FROM S3 SINCE CRAN NO LONGER SUPPORTED
install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.4/3.36.0.3-1-2.4/R")
# AS PER THE GUIDE
options(rsparkling.sparklingwater.version = "2.4.5")
library(rsparkling)
# SPECIFY THE CONFIGURATION
config <- sparklyr::spark_config()
config[["spark.yarn.queue"]] <- "my_data_science_queue"
config[["sparklyr.backend.timeout"]] <- 36000
config[["spark.executor.cores"]] <- 32
config[["spark.driver.cores"]] <- 32
config[["spark.executor.memory"]] <- "40g"
config[["spark.executor.instances"]] <- 8
config[["sparklyr.shell.driver-memory"]] <- "16g"
config[["spark.default.parallelism"]] <- "8"
config[["spark.rpc.message.maxSize"]] <- "256"
# MAKE A SPARK CONNECTION
sc <- sparklyr::spark_connect(
  master = "yarn",
  spark_home = "/opt/mapr/spark/spark",
  config = config,
  log = "console",
  version = "2.4.4"
)
When I try to establish an H2O context with the next chunk, I get the following error:
h2o_context(sc)
Error in h2o_context(sc) : could not find function "h2o_context"
Any pointers as to where I'm going wrong would be greatly appreciated.
Please see this tutorial. Newer versions of rsparkling use H2OContext.getOrCreate(h2oConf) instead of h2o_context(sc).
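A minimal sketch of what that change looks like, assuming an rsparkling 3.x release in which H2OConf() takes no arguments (older releases took the Spark connection instead) and a sparklyr connection that is already open. The wrapper name start_h2o_context is made up for illustration.

```r
# Hypothetical wrapper; nothing runs until it is called.
# Assumption: a connection created with sparklyr::spark_connect() is already
# open, since (as far as I recall) rsparkling 3.x picks up the active one.
start_h2o_context <- function() {
  h2o_conf <- rsparkling::H2OConf()              # default H2O configuration
  rsparkling::H2OContext.getOrCreate(h2o_conf)   # used in place of h2o_context(sc)
}

# Usage, after spark_connect() has succeeded:
# hc <- start_h2o_context()
```

Calling the functions through the rsparkling:: prefix keeps the sketch independent of whether library(rsparkling) has been attached.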
I read into R a CSV file with large numbers formatted in scientific notation.
I used a couple of statistical R functions (MSE and RMSPE) on the numbers and got an incorrect answer (I checked it in Excel).
When I changed the format in the CSV file to ordinary number format, i.e. with lots of zeroes, the R functions calculated correctly.
What was I doing wrong?
Thanks for any insights,
Claire
UPDATE: console output added. I am using R 4.0.2. I have imported two CSV files: one called MPERRORS.csv, with the original scientific notation format, and a second called CBERRORS.csv, saved in number format. I believe the issue has to do with how Excel converts numbers in scientific notation format.
The code is below, and I have also pasted in the results. If you look at the number 6.89E+11, it shows as 689000000000 in the formula bar, but if you convert it to number format you get 689116020736. Apologies if this is wrong; I am a newbie with minimal R experience, as you will have guessed.
CLAIRE
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 0.2
year 2020
month 06
day 22
svn rev 78730
language R
version.string R version 4.0.2 (2020-06-22)
MPERRORS RESULTS
3.359375e+20 MSE function for MPERRORS
3.359375e+20 mse function for MPERRORS
0.01878106 RMSPE function
0.991949 R2 function
0.9916312 gofR2 function
CBERRORS results
2.94363e+20 MSE
2.94363e+20 mse
0.01805762 RMSPE
0.9929211 R2
version
library(ineq)
library(Metrics)
library(MLmetrics)
library(ehaGoF)
ERRORS1<-read.csv(file = 'MPerrors.csv')
ERRORS2<-read.csv(file = 'CBerrors.csv')
str(ERRORS1)
str(ERRORS2)
hist1<-ERRORS1[,2]
base1<-ERRORS1[,3]
print(hist1)
dput(head(ERRORS1,10))
MSE(base1,hist1)
mse(base1,hist1)
RMSPE(base1,hist1)
R2_Score(base1,hist1)
gofRSq(base1,hist1, dgt = 7)
hist2<-ERRORS2[,2]
base2<-ERRORS2[,3]
print(hist2)
dput(head(ERRORS2,10))
MSE(base2,hist2)
mse(base2,hist2)
RMSPE(base2,hist2)
R2_Score(base2,hist2)
gofRSq(base2,hist2, dgt = 7)
# MPERRORS FIRST 10 LINES
structure(list(Time..Year. = 1990:1999, real.gdp.at.market.prices =
c(6.89e+11, 7.51e+11, 7.27e+11, 7.55e+11, 7.85e+11, 7.99e+11, 8.53e+11,
8.95e+11,
9.67e+11, 1.02e+12), X..BusinessAsUsual = c(6.79e+11, 7.25e+11,
7.31e+11, 7.66e+11, 7.76e+11, 7.86e+11, 8.26e+11, 8.84e+11, 9.56e+11,
1.01e+12), Diff = c(9.93e+09, 2.54e+10, -4.32e+09, -1.05e+10,
9.4e+09, 1.36e+10, 2.7e+10, 1.02e+10, 1.13e+10, 1.49e+10)), row.names =
c(NA,
10L), class = "data.frame")
#CBERRORS FIRST 10 LINES
structure(list(Time..Year. = 1990:1999, real.gdp.at.market.prices =
c(689116020736,
750739980288, 726938025984, 755445989376, 785442996224, 799333023744,
852837007360, 894628003840, 966879019008, 1021999972352), X..BusinessAsUsual
= c(679182532608,
725334294528, 731261042688, 765934698496, 776039104512, 785780506624,
825845153792, 884472348672, 955611414528, 1007061172224), Diff = c(9.93e+09,
2.54e+10, -4.32e+09, -1.05e+10, 9.4e+09, 1.36e+10, 2.7e+10, 1.02e+10,
1.13e+10, 1.49e+10)), row.names = c(NA, 10L), class = "data.frame")
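The precision loss described in the update can be reproduced without Excel. This is a small sketch using the first three GDP values from the two dput() outputs above; the variable names are only for illustration.

```r
# Values as they appear in the scientific-notation file (3 significant digits)
gdp_sci <- as.numeric(c("6.89E+11", "7.51E+11", "7.27E+11"))

# The same three years as stored in the number-format file
gdp_full <- c(689116020736, 750739980288, 726938025984)

# Absolute and relative size of what the 3-digit notation threw away
abs_diff <- gdp_sci - gdp_full
rel_diff <- abs(abs_diff) / gdp_full

print(abs_diff)
print(max(rel_diff))
```

R parses both files correctly; the two inputs simply contain different numbers. Differences of this size get squared by MSE, which would be consistent with the two files producing different results.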
Is there any way to run an applescript within R?
I found this reference in an R FAQ on CRAN
From release 1.3.1 R has partial support for AppleScripts. This means two things: you can run applescripts from inside R using the command applescript() (see the corresponding help)
But in my current version of R
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
neither applescript() nor ?applescript() returns anything.
Thanks, Simon
Those features aren't in modern R versions (IIRC they hark back to pre-macOS/Mac OS X days).
However, the applescript() function performed no magic:
applescript <- function(script_source, extra_args = c()) {
  # Join multi-line scripts into a single string
  script_source <- paste0(script_source, collapse = "\n")
  # Write the script to a temp file that is removed when the function exits
  tf <- tempfile(fileext = ".applescript")
  on.exit(unlink(tf), add = TRUE)
  cat(script_source, file = tf)
  # Run it with osascript and capture its standard output
  osascript <- Sys.which("osascript")
  args <- c(extra_args, tf)
  system2(
    command = osascript,
    args = args,
    stdout = TRUE
  ) -> res
  invisible(res)
}
So you can do anything with it, like open a folder:
applescript(
  sprintf(
    'tell app "Finder" to open POSIX file "%s"',
    Sys.getenv("R_DOC_DIR")
  )
)
or, query an app and return data:
res <- applescript('
tell application "iTunes"
set r_name to name of current track
set r_artist to artist of current track
end
return "artist=" & r_artist & "\ntrack=" & r_name
')
print(res)
## [1] "artist=NICO Touches the Walls" "track=Hologram"
For (mebbe) easier usage (I say "mebbe" as the pkg relies on reticulate for some things) I added this to the macthekinfe macOS-centric R package.
I am new to R. I have just installed R 3.2.2 and RStudio 0.99 under Windows 8.
As per section 3.2, Testing an Installation, of the Help contents,
I tried to run the following five commands.
Sys.setenv(LC_COLLATE = "C", LANGUAGE = "en")
library("tools")
testInstalledBasic("both")
testInstalledPackages(scope = "base", errorsAreFatal = FALSE)
testInstalledPackages(scope = "recommended", errorsAreFatal = FALSE)
The first two worked fine. The third threw an error, as shown here:
> Sys.setenv(LC_COLLATE = "C", LANGUAGE = "en")
> library("tools")
> testInstalledBasic("both")
running strict specific tests
running code in ‘eval-etc.R’
unable to open output file
FAILED
[1] 1
Warning message:
running command '"C:/PROGRA~1/R/R-32~1.2/bin/x64/R" CMD BATCH --vanilla --no-timing "eval-etc.R" "eval-etc.Rout"' had status 2
>
What is the problem?
Could it be related to setting directories/permissions?
You seem to be missing the diff file from Rtools in the tests folder. Simply copying it there should do the trick.
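Both suspicions in this thread (a missing diff tool and a test directory that R cannot write to) can be checked from R before rerunning the tests. A rough sketch, assuming the tests run in the tests folder under R.home(); dir_is_writable is a made-up helper, and file.access() can be misleading about directories on Windows, so treat a TRUE as a hint only.

```r
# Hypothetical helper: TRUE if `path` is an existing directory that
# file.access() reports as writable (mode = 2 means write permission).
dir_is_writable <- function(path) {
  dir.exists(path) && file.access(path, mode = 2) == 0
}

# Assumption: testInstalledBasic() writes its .Rout files here
tests_dir <- file.path(R.home(), "tests")

cat("tests directory:", tests_dir, "\n")
cat("writable:       ", dir_is_writable(tests_dir), "\n")
cat("diff on PATH:   ", nzchar(Sys.which("diff")), "\n")
```

If the directory is reported as not writable, starting R as administrator is one way to test whether permissions are the cause.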
I was working with the baby names data set and encountered the error below while using the transform function. Any guidance or suggestions would be highly appreciated. I reinstalled the packages, but to no avail.
Mac OS X (Mountain Lion)
R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
library(stringr)
require(stringr)
bnames1 <- transform(bnames1,
  first = tolower(str_sub(name, 1, 1)),
  # last character: both start and end must be -1
  last = tolower(str_sub(name, -1, -1)),
  vowels = vowels(name),
  length = nchar(name),
  per1000 = 10000 * prop,
  one_par = 1 / prop
)
Error in tolower(str_sub(name, 1, 1)) :
lazy-load database '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/stringr/R/stringr.rdb' is corrupt
In addition: Warning messages:
1: In tolower(str_sub(name, 1, 1)) :
restarting interrupted promise evaluation
2: In tolower(str_sub(name, 1, 1)) : internal error -3 in R_decompress1
Internal error -3 is often a consequence of installing a package on top of a copy that is already loaded. Restart R and restart your application. There may be other issues, but until you do this you won't get much further.
Try
remove.packages("stringr")
install.packages("stringr")
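To avoid hitting the same corruption again, it may help to confirm that the package is not loaded before reinstalling it. A small sketch; safe_to_reinstall is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: TRUE when `pkg` has no loaded namespace in this session,
# which is the situation in which reinstalling it is least likely to cause trouble.
safe_to_reinstall <- function(pkg) {
  !isNamespaceLoaded(pkg)
}

if (safe_to_reinstall("stringr")) {
  message("stringr is not loaded; remove.packages() and install.packages() can be run now")
} else {
  message("stringr is loaded; restart R before reinstalling")
}
```

The block only reports the state of the session and does not install anything itself.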