Error with LSTM RNN in mxnet (R environment)

I'm trying to set up an LSTM RNN using mxnet in R. However, while trying to train my network I get the following messages, and R keeps crashing with a fatal error:
"[00:36:08] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor./matrix_op-inl.h:155: Using target_shape will be deprecated.
[00:36:08] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor./matrix_op-inl.h:155: Using target_shape will be deprecated.
[00:36:08] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor./matrix_op-inl.h:155: Using target_shape will be deprecated."
Here is my code:
# install.packages("drat", repos="https://cran.rstudio.com")
# drat:::addRepo("dmlc")
# install.packages("mxnet")
rm(list = ls())
require(mxnet)
require(mlbench)

inputData <- read.table(file.path(getwd(), "Data", "input.csv"),
                        header = TRUE, sep = ",")
inputData$X <- as.Date(inputData$X)
inputData <- na.omit(inputData)

# First 80% of rows for training, the rest for testing
index <- 1:(nrow(inputData) * 0.8)
train.dates <- inputData[index, 1]
test.dates <- inputData[-index, 1]
inputData[, 1] <- NULL
train <- inputData[index, ]
test <- inputData[-index, ]
train.x <- data.matrix(train[, -ncol(train)])
test.x <- data.matrix(test[, -ncol(test)])
train.y <- train[, ncol(train)]
test.y <- test[, ncol(test)]

# Build labels by shifting the flattened input by one position (with wraparound)
get.label <- function(X) {
  label <- array(0, dim = dim(X))
  d <- dim(X)[1]
  w <- dim(X)[2]
  for (i in 0:(w - 1)) {
    for (j in 1:d) {
      label[i * d + j] <- X[(i * d + j) %% (w * d) + 1]
    }
  }
  return(label)
}

X.train.label <- get.label(t(train.x))
X.val.label <- get.label(t(test.x))
X.train <- list(data = t(train.x), label = X.train.label)
X.val <- list(data = t(test.x), label = X.val.label)

# Hyperparameters
batch.size <- 1
seq.len <- 32
num.hidden <- 16
num.embed <- 16
num.lstm.layer <- 1
num.round <- 1
learning.rate <- 0.1
wd <- 0.00001
clip_gradient <- 1
update.period <- 1

model <- mx.lstm(X.train, X.val,
                 ctx = mx.cpu(),
                 num.round = num.round,
                 update.period = update.period,
                 num.lstm.layer = num.lstm.layer,
                 seq.len = seq.len,
                 num.hidden = num.hidden,
                 num.embed = num.embed,
                 num.label = 15,
                 batch.size = batch.size,
                 input.size = 15,
                 initializer = mx.init.uniform(0.1),
                 learning.rate = learning.rate,
                 wd = wd,
                 clip_gradient = clip_gradient)
The input dataset consists of a Date column, 15 features, and the target value.
Please help me. Thanks in advance!

The message that you receive is only a warning, and you can ignore it. The real problem is a mismatch of shapes. If I run your code I receive:
[14:06:36] src/ndarray/ndarray.cc:348: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (1,15) to.shape=(1,32)
To fix this problem, set seq.len = 15, since you have 15 features. If you update seq.len and run your code, you will see that training starts (notice that I also receive the same warnings as you):
[14:08:17] src/operator/tensor/./matrix_op-inl.h:159: Using target_shape will be deprecated.
[14:08:17] src/operator/tensor/./matrix_op-inl.h:159: Using target_shape will be deprecated.
[14:08:17] src/operator/tensor/./matrix_op-inl.h:159: Using target_shape will be deprecated.
Iter [1] Train: Time: 0.263811111450195 sec, NLL=2.71622828266634, Perp=15.1231742012938
Iter [1] Val: NLL=2.51107457406329, Perp=12.3181597260587
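For reference, here is a minimal sketch of the corrected call; it is identical to the call in the question except that seq.len now matches the 15 features:
model <- mx.lstm(X.train, X.val,
                 ctx = mx.cpu(),
                 num.round = num.round,
                 update.period = update.period,
                 num.lstm.layer = num.lstm.layer,
                 seq.len = 15,  # must match the number of features
                 num.hidden = num.hidden,
                 num.embed = num.embed,
                 num.label = 15,
                 batch.size = batch.size,
                 input.size = 15,
                 initializer = mx.init.uniform(0.1),
                 learning.rate = learning.rate,
                 wd = wd,
                 clip_gradient = clip_gradient)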

Related

How to fix C function R_nc4_get_vara_double returned error in ncdf4 parallel processing in R

I want to download .nc data through OPeNDAP from remote storage. I use a parallel backend with a foreach %dopar% loop as follows:
# INPUTS
inputs <- commandArgs(trailingOnly = TRUE)
interimpath <- as.character(inputs[1])
gcm <- as.character(inputs[2])
period <- as.character(inputs[3])
var <- as.character(inputs[4])
source <- 'MACAV2'
cat('\n\n EXTRACTING DATA FOR', var, gcm, period, '\n\n')

# CHANGING LIBRARY PATHS
.libPaths("/storage/home/htn5098/local_lib/R40")  # local library for packages
setwd('/storage/work/h/htn5098/DataAnalysis')
source('./src/Rcodes/CWD_function_package.R')  # load helper functions, e.g. ncarray2matrix

# CALLING PACKAGES
library(foreach)
library(doParallel)
library(parallel)
library(filematrix)

# REGISTERING CORES FOR PARALLEL PROCESSING
no_cores <- detectCores()
cl <- makeCluster(no_cores)
registerDoParallel(cl)
invisible(clusterEvalQ(cl, .libPaths("/storage/home/htn5098/local_lib/R40")))  # import the library path into the workers
invisible(clusterEvalQ(cl, library(ncdf4)))

# EXTRACTING DATA FROM THE .NC FILES TO MATRIX FORM
url <- readLines('./data/external/MACAV2_OPENDAP_allvar_allgcm_allperiod.txt')
links <- grep(x = url, pattern = paste0('.*', var, '.*', gcm, '_.*', period), value = TRUE)
start <- c(659, 93, 1)   # lon, lat, time
count <- c(527, 307, -1)
spfile <- read.csv('./data/external/SERC_MACAV2_Elev.csv', header = TRUE)
grids <- sort(unique(spfile$Grid))
clusterExport(cl, list('ncarray2matrix', 'start', 'count', 'grids'))  # export data to the workers
cat('\nChecking when downloading all grids\n')

# k <- foreach(x = links, .packages = c('ncdf4')) %dopar% {
#   nc <- nc_open(x)
#   nc.var <- ncvar_get(nc, varid = names(nc$var), start = start, count = count)
#   nc_close(nc)
#   return(nc.var)
# }
k <- foreach(x = links, .packages = c('ncdf4'), .errorhandling = 'pass') %dopar% {
  nc <- nc_open(x)
  print(nc)
  nc.var <- ncvar_get(nc, varid = names(nc$var), start = c(659, 93, 1), count = c(527, 307, -1))
  nc_close(nc)
  Sys.sleep(10)  # pause between requests
  return(dim(nc.var))
}
# k <- parSapply(cl, links, function(x) {
#   nc <- nc_open(x)
#   nc.var <- ncvar_get(nc, varid = names(nc$var), start = start, count = count)
#   nc_close(nc)
#   return(nc.var)
# })
print(k)
However, I keep getting this error:
<simpleError in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, scaleFact, start = start, count = count, verbose = verbose, signedbyte = signedbyte, collapse_degen = collapse_degen): C function R_nc4_get_vara_double returned error>
What could be the reason for this problem? Can you recommend a solution for this that is time-efficient (I have to repeat this for about 20 files)?
Thank you.
I had the same error in my code. The problem was not the code itself; it was one of the files that I wanted to read. There was something wrong with it, so R couldn't open it. I identified the file and downloaded it again, and the same code worked perfectly.
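To help find which file is the culprit, here is a minimal sketch (not the exact code I used) that reads each URL serially and reports the failure instead of stopping; it reuses links, start, and count from the question:
library(ncdf4)
for (x in links) {
  result <- tryCatch({
    nc <- nc_open(x)
    nc.var <- ncvar_get(nc, varid = names(nc$var), start = start, count = count)
    nc_close(nc)
    "ok"
  }, error = function(e) conditionMessage(e))
  cat(x, '->', result, '\n')
}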
I also encountered the same error. For me, restarting the R session did the trick.

R Model returning the error: Too many open devices

I've been working on creating a training model in R for MS Azure. When I initially set up the model, it all worked fine. Now it continuously returns the error below:
{"error":{"code":"LibraryExecutionError","message":"Module execution encountered an internal library error.","details":[{"code":"FailedToEvaluateRScript","target":"Score Model (RPackage)","message":"The following error occurred during evaluation of R script: R_tryEval: return error: Error in png(file = \"3e25ea05d5bc49d683f4471ff40780bcrViz%03d.png\", bg = \"transparent\") : \n too many open devices\n"}]}}
I haven't changed anything, and have looked around online only to find references to other issues. My code is as follows:
Trainer R Script
# Modify datatype, apply factor levels, replace NA with 0
library(e1071)  # provides svm()
x <- dataset
for (i in seq_along(x)) {
  if (class(x[[i]]) == "character") {
    # Convert type
    x[[i]] <- type.convert(x[[i]])
    # Apply levels
    # levels(x[[i]]) <- levels(cols_modeled[, names(x)[i]])  # linked with levels in model
  }
  if (is.numeric(x[[i]]) && any(is.na(x[[i]]))) {
    # print("*** Updating NA to 0")
    x[[i]][is.na(x[[i]])] <- 0  # replace only the NA entries
  }
}
df1 <- x
rm(x)
set.seed(1234)
model <- svm(Paid ~ ., data = df1, type = "C")
Scorer R Script
library(e1071)
scores <- data.frame( predicted_result = predict(model, dataset))
Has anyone come across this before?
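For context, the "devices" in the error are R graphics devices: every png() call opens one, and a standard R build allows only a limited number (typically 63) to be open at once, so repeated runs that never close their devices eventually hit the cap. A minimal base-R illustration (not Azure-specific) of inspecting and closing them:
dev.list()       # lists the graphics devices that are currently open
graphics.off()   # closes all open graphics devices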

Dredge error in is.data.frame(data): object not found

I am trying to run pdredge using the following example code, where the data is located at
https://github.com/aditibhaskar/help/blob/master/gages_urbanizing_and_ref_with_trends_cut_to_20years_2018-12-02.Rdata
library(MuMIn)
require(snow)
require(parallel)
variable.list <- c("log(hden_change_divided_by_hdenStart)", "hden_peak_change", "DRAIN_SQKM", "PPTAVG_BASIN", "SNOW_PCT_PRECIP", "log(HIRES_LENTIC_PCT)", "FRAGUN_BASIN", "BFI_AVE", "CLAYAVE", "WTDEPAVE", "AWCAVE", "RD_STR_INTERS", "RIP800_FOREST", "BAS_COMPACTNESS", "STREAMS_KM_SQ_KM", "RRMEAN", "SLOPE_PCT", "PCT_1ST_ORDER", "FRESHW_WITHDRAWAL")
y.365 <- lm(paste0("(SlopePct.365 - Ref1.SlopePct.365) ~", paste(variable.list, collapse="+")), data=gages)
options(na.action = "na.fail")
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
clusterType <- if(length(find.package("snow", quiet = TRUE))) "SOCK" else "PSOCK"
clust <- try(makeCluster(getOption("cl.cores", no_cores), type = clusterType))
dredge.365 <- pdredge(y.365, rank="AICc", trace=2, cluster=clust)
From this I get these errors:
2097152: In is.data.frame(data) :
  object 'gages' not found (model 2097151 skipped)
Error in pdredge(y.365, rank = "AICc", trace = 2, cluster = clust) :
  the result is empty
What am I doing wrong? Thanks.
You forgot to export the model's data to the cluster nodes:
clusterExport(clust, "gages")
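Put together, a minimal sketch of the corrected sequence, assuming the object stored in the linked .Rdata file is named gages and the variables from the question are in scope:
load("gages_urbanizing_and_ref_with_trends_cut_to_20years_2018-12-02.Rdata")
clust <- try(makeCluster(getOption("cl.cores", no_cores), type = clusterType))
clusterExport(clust, "gages")  # make the model's data visible on every node
dredge.365 <- pdredge(y.365, rank = "AICc", trace = 2, cluster = clust)
stopCluster(clust)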

DEoptim error: objective function result has different length than parameter matrix due to foreachArgs specification

I have a very odd DEoptim error that I have "fixed", but do not understand.
I do not have issues when I use DEoptim's parallel capabilities with the parallel package (i.e., pType = 1). However, when I use foreach instead (which I must use on the grid-computing setup available to me), I do. Below is an MRE of a much simplified version of the issue: pType = 1 works, but pType = 2 with foreachArgs specified returns an error:
objective function result has different length than parameter matrix
When I do not specify foreachArgs the issue goes away. Does anyone have thoughts about the root cause of this issue?
library(zoo)
library(parallel)
library(doParallel)
library(DEoptim)

myfunc1 <- function(params) {
  s <- myfunc2(params, ncal, n_left_cens, astats, X_ret, disc_length, X_acq, POP_0, POP_ann_growth)
  loss_func(s)
}
myfunc2 <- function(params, ncal, n_left_cens, astats, X_ret, disc_length, X_acq, POP_0, POP_ann_growth) {
  sum(params) + ncal + n_left_cens + astats + X_ret + disc_length + X_acq + POP_0 + POP_ann_growth
}
loss_func <- function(s) {
  s
}

# General setup
ncal <- 1
n_left_cens <- 1
astats <- 1
disc_length <- 1
POP_0 <- 1
POP_ann_growth <- 1
X_acq <- 1
X_ret <- 1
params <- c(1, 1)
W <- 1
paral <- TRUE
itermax <- 100
ncores <- detectCores()
cltype <- ifelse(.Platform$OS.type != "windows", "FORK", "PSOCK")
trace <- TRUE

# Bounds for search for DEoptim
lower <- rep(-1, length(params))
upper <- lower * -1

# parallel: works
pType <- 1
parVar <- c("myfunc1", "myfunc2", "loss_func", "W", "ncal", "n_left_cens", "astats", "X_ret",
            "disc_length", "X_acq", "POP_0", "POP_ann_growth")
foreachArguments <- list("myfunc1", "myfunc2", "loss_func", "ncal", "n_left_cens", "astats",
                         "X_ret", "disc_length", "X_acq", "POP_0", "POP_ann_growth")
clusters <- makeCluster(ncores, type = cltype)
registerDoParallel(clusters)
clusterExport(cl = clusters, varlist = foreachArguments, envir = environment())
results <- DEoptim(fn = myfunc1, lower = lower, upper = upper,
                   DEoptim.control(itermax = itermax, trace = trace, parallelType = pType,
                                   parVar = parVar))
showConnections(all = TRUE)
closeAllConnections()

# foreach with foreachArgs specified: doesn't work
pType <- 2
clusters <- makeCluster(ncores, type = cltype)
registerDoParallel(clusters)
clusterExport(cl = clusters, varlist = foreachArguments, envir = environment())
results <- DEoptim(fn = myfunc1, lower = lower, upper = upper,
                   DEoptim.control(itermax = itermax, trace = trace, parallelType = pType,
                                   foreachArgs = foreachArguments))
showConnections(all = TRUE)
closeAllConnections()

# foreach with foreachArgs unspecified: works
pType <- 2
foreachArguments <- list("myfunc1", "myfunc2", "loss_func", "ncal", "n_left_cens", "astats",
                         "X_ret", "disc_length", "X_acq", "POP_0", "POP_ann_growth")
clusters <- makeCluster(ncores, type = cltype)
registerDoParallel(clusters)
clusterExport(cl = clusters, varlist = foreachArguments, envir = environment())
results <- DEoptim(fn = myfunc1, lower = lower, upper = upper,
                   DEoptim.control(itermax = itermax, trace = trace, parallelType = pType))
showConnections(all = TRUE)
closeAllConnections()
From ?DEoptim.control:
foreachArgs: A list of named arguments for the ‘foreach’ function from
the package ‘foreach’. The arguments ‘i’, ‘.combine’ and ‘.export’ are
not possible to set here; they are set internally.
You seem to be conflating this with the behavior of parVar:
parVar: Used if ‘parallelType=1’; a list of variable names (as strings)
that need to exist in the environment for use by the objective function
or are used as arguments by the objective function.
You need to specify the arguments passed to foreach as name = value pairs. For example:
foreachArguments <- list(.export = c("myfunc1", "myfunc2", "loss_func", "ncal",
                                     "n_left_cens", "astats", "X_ret", "disc_length",
                                     "X_acq", "POP_0", "POP_ann_growth"))
I'm not sure what's causing that specific error, but the fix is "don't do that." ;)
Here's an example of how you'd actually use the foreachArgs argument. Note that I'm setting the .verbose argument to make foreach print diagnostics:
library(doParallel)
library(DEoptim)
clusters <- makeCluster(detectCores())
registerDoParallel(clusters)
obj_func <- function(params) { sum(params) }
results <- DEoptim(fn = obj_func, lower = c(-1, -1), upper = c(1, 1),
                   DEoptim.control(parallelType = 2, foreachArgs = list(.verbose = TRUE)))
stopCluster(clusters)

Loading CRAN packages to use with emrlapply() from JD Long's 'segue' package?

I'm using JD Long's segue package (https://code.google.com/p/segue/) to do some parallel computing, and am running into an issue loading CRAN packages on the EC2 instances.
First, I created an EMR cluster like so:
myCluster <- createCluster(numInstances = 5,
                           cranPackages = c("RWeka", "tm"),
                           masterInstanceType = "m1.large",
                           slaveInstanceType = "m1.large",
                           location = "us-east-1c")
Per the documentation, I specified which packages I want to load (in this case, RWeka and tm).
The cluster seems to start properly, with no error messages. I am using RStudio on Linux Mint 17 with R version 3.0.2.
I wrote a function getTerms.jobAd which takes a character string and calls some functions from the packages above, and am using emrlapply() like so:
> jobAdTerms <- emrlapply(myCluster, X = as.list(jobAds[1:2, 3]), FUN = getTerms.jobAd)
RUNNING - 2014-06-24 17:05:19
RUNNING - 2014-06-24 17:05:50
WAITING - 2014-06-24 17:06:20
When I check the jobAdTerms list that is supposed to be returned, I get:
> jobAdTerms
[[1]]
[1] "error caught by Segue: Error in function (txt) : could not find function \"Corpus\"\n"
[[2]]
[1] "error caught by Segue: Error in function (txt) : could not find function \"Corpus\"\n"
Obviously, Corpus is one of the functions from the tm package.
What am I doing wrong? And how can I remedy this situation? Thanks!!
EDIT
Here's the function I am calling:
nGramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 4))

getTerms.jobAd <- function(txt) {
  tmp <- tolower(txt)
  tmp <- gsub('\\s*<.*?>|[:;,#$%^&*()?]|(?<=[a-zA-Z])\\.(?= |$)', '', tmp, perl = TRUE)
  txt.Corpus <- Corpus(VectorSource(tmp))
  txt.Corpus <- tm_map(txt.Corpus, stripWhitespace)
  txt.TFV <- termFreq(txt.Corpus[[1]], control = list(dictionary = jobTags[, 1], wordLengths = c(1, Inf)))
  txt.TFV2 <- termFreq(txt.Corpus[[1]], control = list(tokenize = nGramTokenizer, dictionary = jobTags[, 1], wordLengths = c(1, Inf)))
  jobTerms <- rowSums(as.matrix(c(txt.TFV, txt.TFV2)))
  return(jobTerms)
}
EDIT 2
Here's how you can reproduce the error:
data(crude)  # the crude corpus ships with the tm package
jobAdTerms <- emrlapply(myCluster, X = as.list(crude), FUN = getTerms.jobAd)
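One hedged guess, not confirmed in this thread: since the workers report that Corpus cannot be found, it may help to attach the packages inside the function that gets shipped to the EMR nodes, so each worker loads tm and RWeka before using them. A sketch with a hypothetical wrapper getTerms.jobAd.safe:
getTerms.jobAd.safe <- function(txt) {
  require(tm)     # attach tm on the worker so Corpus() and tm_map() are found
  require(RWeka)  # attach RWeka so NGramTokenizer() is found
  getTerms.jobAd(txt)
}
jobAdTerms <- emrlapply(myCluster, X = as.list(crude), FUN = getTerms.jobAd.safe)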
